
Scrapy Not Crawling Subsequent Pages In Order

I am writing a crawler to get the names of items from a website. The website has 25 items per page and multiple pages (200 for some item types). Here is the code: from scrapy

Solution 1:

Scrapy is an asynchronous framework. It uses non-blocking I/O, so it does not wait for one request to finish before starting the next.

Because several requests can be in flight at the same time, there is no way to know the order in which the parse() method will receive the responses.
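To make the effect concrete, here is a small simulation. It is only a sketch: it uses the standard-library asyncio module as a stand-in for Scrapy's own event loop, and the page numbers and latencies are made up for illustration.

```python
import asyncio

# Simulated response latency per page, in seconds (made-up values).
LATENCY = {1: 0.06, 2: 0.02, 3: 0.04}


async def fetch(page, arrivals):
    """Pretend to download one page, then record that its response arrived."""
    await asyncio.sleep(LATENCY[page])  # non-blocking wait, like network I/O
    arrivals.append(page)


async def crawl():
    arrivals = []
    # All three "requests" are in flight at once; none waits for the previous one.
    await asyncio.gather(*(fetch(page, arrivals) for page in (1, 2, 3)))
    return arrivals


arrival_order = asyncio.run(crawl())
print("requested:", [1, 2, 3])
print("arrived:  ", arrival_order)
```

With these delays the responses arrive as 2, 3, 1: the fastest page comes back first, regardless of the order in which the requests were issued. On a real site the latencies vary from run to run, so the arrival order does too.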

My point is that Scrapy is not designed to extract data in a particular order. If you absolutely need to preserve order, there are some ideas here: Scrapy Crawl URLs in Order
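One option that avoids fighting the scheduler, and not necessarily what the linked answer proposes, is to record where each item came from and sort after the crawl. In Scrapy the page number can be handed to the callback through the request's `cb_kwargs` argument. The sketch below shows only the sorting step; the `page` and `position` fields and the sample items are assumptions made up for illustration.

```python
# Hypothetical items, in the order the callbacks happened to produce them.
# "page" and "position" are assumed fields recorded at scrape time.
scraped = [
    {"page": 2, "position": 0, "name": "item-25"},
    {"page": 1, "position": 1, "name": "item-1"},
    {"page": 2, "position": 1, "name": "item-26"},
    {"page": 1, "position": 0, "name": "item-0"},
]


def in_site_order(items):
    """Return the items sorted by page, then by position within the page."""
    return sorted(items, key=lambda item: (item["page"], item["position"]))


ordered_names = [item["name"] for item in in_site_order(scraped)]
print(ordered_names)
```

This keeps the crawl fully concurrent and moves the ordering into a cheap post-processing step, at the cost of holding or re-reading the items once the crawl has finished.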
