
Scrapy Works Fine Until Page 12 Of Asp Site, Then 500 Error

My first scraping project with Python/Scrapy. Site is http://pabigtrees.com/ with 78 pages and 20 items (trees) per page. This is the full spider with a few changes to provide a mi

Solution 1:

The __VIEWSTATE is indeed what is causing you trouble.

If you take a look at the navigation of the site you're trying to scrape, you'll see it only links to 10 other pages:

[Screenshot: the site's navigation links]

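Those are the only 10 pages of this search you can reach from the current page (that is, with its current view state). The next 10 become accessible from page 11 of the search. If every request is built from page 1's response, this would explain why the spider breaks at page 12: it is the first page that page 1's view state has no link to.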

One possible solution would be to check in parse_page() if you're on page 11 (or 21, or 31...), and if so, create the requests for the next 10 pages.
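A minimal sketch of that check, written as plain Python so it runs without a crawl. The total of 78 pages comes from the question. The rule that page p (for p = 1, 11, 21, ...) links to pages p+1 through p+10 is an assumption inferred from the navigation described above and may not match the site exactly. The function name `pages_to_request` is hypothetical.

```python
# Sketch of the paging rule described above, assuming (hypothetically):
#   - 78 result pages in total (from the question),
#   - from page p, where p is 1, 11, 21, ..., the pager links to the next
#     10 pages (p + 1 through p + 10).
TOTAL_PAGES = 78
PAGER_SIZE = 10


def pages_to_request(current_page, total_pages=TOTAL_PAGES, pager_size=PAGER_SIZE):
    """Return the page numbers to schedule from ``current_page``.

    Only pages 1, 11, 21, ... schedule new requests. Every other page
    returns an empty list, because the pages it links to were already
    scheduled from the page that opened its block of 10.
    """
    if (current_page - 1) % pager_size != 0:
        return []
    last = min(current_page + pager_size, total_pages)
    return list(range(current_page + 1, last + 1))


if __name__ == "__main__":
    # In parse_page() you would yield one FormRequest per number returned here,
    # built from the *current* response so it carries this page's __VIEWSTATE.
    for page in (1, 5, 11, 71):
        print(page, pages_to_request(page))
```

Because each batch is scheduled from the response for page 1, 11, 21, ..., every request is built on a view state whose pager actually links to the page being requested.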

Also, you only need to populate the formdata fields you want to change: FormRequest.from_response() takes care of the ones stored in hidden input fields, such as __VIEWSTATE or __EVENTVALIDATION.
