
Fetching A Lot Of Urls In Python With Google App Engine

In my subclass of RequestHandler, I am trying to fetch a range of URLs:

```python
class GetStats(webapp2.RequestHandler):
    def post(self):
        lastpage = 50
        for page in range(1, lastpage):
            ...
```

Solution 1:

It's failing because you only have 60 seconds to return a response to the user, and fetching that many URLs is presumably taking longer than that.

You will want to use the deferred library: https://cloud.google.com/appengine/articles/deferred

It lets you create a background task with a 10-minute timeout. You can then return to the user instantly, and they can "pick up" the results at a later time via another handler (that you create). If collecting all the URLs takes longer than 10 minutes, you'll have to split the work into further tasks.

See this article: https://cloud.google.com/appengine/articles/deadlineexceedederrors

to understand why a request cannot run longer than 60 seconds.
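The splitting step above can be sketched in plain Python. The batch size and the `fetch_batch` task name below are illustrative assumptions; in the real handler each batch would be handed to `deferred.defer` as described in the linked article:

```python
# Sketch: split the page numbers into batches so each deferred task
# finishes well within the task queue's 10-minute deadline. The batch
# size of 10 is an assumption; tune it to how long each fetch takes.
def chunk_pages(pages, batch_size):
    return [pages[i:i + batch_size] for i in range(0, len(pages), batch_size)]

batches = chunk_pages(list(range(1, 51)), batch_size=10)
# each batch would then be enqueued, e.g. deferred.defer(fetch_batch, batch)
```

Each task only fetches its own slice of pages, so no single task risks hitting the deadline.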

Solution 2:

Edit: This might come from App Engine quotas and limits. Sorry for the previous answer:

This looks like server-side protection against DDoS attacks or scraping from a single client. You have a few options:

  • Wait after a certain number of requests before continuing.

  • Make requests from several clients that have different IP addresses and send the results back to your main script (renting separate servers for this might be costly).

  • Check whether the website has an API for accessing the data you need.

You should also take care: the site owner could block or blacklist your IP if he decides your requests are abusive.
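The first option above, waiting between requests, can be sketched as a simple throttled fetch loop. The `fetch` callable is injected here for illustration; in an App Engine handler it would be something like `urlfetch.fetch`, and the delay value is an assumption to tune:

```python
import time

def fetch_with_throttle(urls, fetch, delay_seconds=1.0):
    """Fetch URLs one at a time, pausing between requests so a single
    client is less likely to trip the site's rate limiting."""
    results = []
    for url in urls:
        results.append(fetch(url))
        time.sleep(delay_seconds)  # be polite between requests
    return results
```

A longer delay lowers the chance of being blocked but stretches the total runtime, which matters under App Engine's request and task deadlines.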
