Scrapy Not Crawling Subsequent Pages In Order
I am writing a crawler to get the names of items from a website. The website has 25 items per page and multiple pages (200 for some item types). Here is the code: from scrapy
Solution 1:
Scrapy is an asynchronous framework. It uses non-blocking I/O, so it doesn't wait for one request to finish before starting the next.
Since multiple requests can be in flight at the same time, there is no way to know in advance the order in which the parse() method will receive the responses.
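To see why, here is a minimal simulation. It assumes Python's asyncio as a stand-in for the Twisted reactor that Scrapy actually uses, and `fake_fetch` and `crawl_concurrently` are hypothetical names, not Scrapy APIs. All "requests" are started at once, and each response is handled as soon as its download finishes, much like Scrapy calling parse() as each download completes.

```python
import asyncio
import random


async def fake_fetch(page: int) -> dict:
    """Hypothetical stand-in for a download: random latency, then a fake 'response'."""
    await asyncio.sleep(random.uniform(0.0, 0.05))
    return {"page": page, "items": [f"item-{page}-{i}" for i in range(3)]}


async def crawl_concurrently(pages: range) -> list[int]:
    """Start every request at once and record the order in which responses are handled."""
    handled: list[int] = []
    tasks = [asyncio.create_task(fake_fetch(p)) for p in pages]
    # as_completed yields results in completion order, not in request order
    for next_done in asyncio.as_completed(tasks):
        response = await next_done
        handled.append(response["page"])  # this is where parse() would run
    return handled


if __name__ == "__main__":
    # Prints the pages in whatever order they happened to finish; varies per run
    print(asyncio.run(crawl_concurrently(range(1, 11))))
```

Every page is handled exactly once, but the order depends on which simulated download happens to finish first, which is the behavior described in the question.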
In short, Scrapy is not designed to extract data in a particular order. If you absolutely need to preserve order, there are some ideas here: Scrapy Crawl URLs in Order
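Two common ways to get ordered output can be sketched the same way, again with asyncio standing in for Scrapy and with hypothetical helper names (`fake_fetch`, `crawl_then_sort`, `crawl_sequentially`): keep full concurrency but tag every item with its page number and position and sort once the crawl is done, or fetch pages strictly one after another.

```python
import asyncio
import random


async def fake_fetch(page: int) -> list[str]:
    """Hypothetical download stand-in: random latency, then three item names for the page."""
    await asyncio.sleep(random.uniform(0.0, 0.05))
    return [f"item-{page}-{i}" for i in range(3)]


async def crawl_then_sort(pages: range) -> list[str]:
    """Idea 1: fetch all pages concurrently, tag items with (page, position), sort at the end."""
    tagged: list[tuple[int, int, str]] = []

    async def fetch_and_tag(page: int) -> None:
        for pos, name in enumerate(await fake_fetch(page)):
            tagged.append((page, pos, name))  # pages arrive in random order

    await asyncio.gather(*(fetch_and_tag(p) for p in pages))
    # Sorting on (page, position) restores the on-site order
    return [name for _, _, name in sorted(tagged)]


async def crawl_sequentially(pages: range) -> list[str]:
    """Idea 2: request page N+1 only after page N has been parsed (no concurrency)."""
    names: list[str] = []
    for page in pages:
        names.extend(await fake_fetch(page))
    return names


if __name__ == "__main__":
    print(asyncio.run(crawl_then_sort(range(1, 4))))
    print(asyncio.run(crawl_sequentially(range(1, 4))))
```

In a real spider, the first idea corresponds to passing the page number to the callback (for example via `cb_kwargs` or `meta` on the Request) and sorting the exported items afterwards; the second corresponds to yielding the next page's Request from inside parse() instead of scheduling every page URL up front. The sequential approach is simpler but gives up the concurrency that makes Scrapy fast.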