Fetching a Lot of URLs in Python with Google App Engine
Solution 1:
It's failing because you only have 60 seconds to return a response to the user, and I'm going to guess it's taking longer than that.
You will want to use this: https://cloud.google.com/appengine/articles/deferred
to create a task that has a 10-minute timeout. Then you can return instantly to the user, and they can "pick up" the results later via another handler (that you create). If collecting all the URLs takes longer than 10 minutes, you'll have to split the work across further tasks.
See this: https://cloud.google.com/appengine/articles/deadlineexceedederrors
to understand why you cannot go longer than 60 seconds.
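The splitting step above can be sketched as a plain batching helper. This is a minimal sketch: the batch size of 50 is an arbitrary assumption, and `fetch_batch` is a hypothetical handler you would write yourself; only the `deferred.defer` call shown in the comment is App Engine-specific.

```python
def chunk_urls(urls, batch_size=50):
    """Yield successive batches of at most batch_size URLs, so each
    batch can be fetched inside one task's 10-minute deadline."""
    for i in range(0, len(urls), batch_size):
        yield urls[i:i + batch_size]

# On App Engine you would enqueue one deferred task per batch, e.g.:
#   from google.appengine.ext import deferred
#   for batch in chunk_urls(all_urls):
#       deferred.defer(fetch_batch, batch)  # fetch_batch: hypothetical worker
```

Tuning `batch_size` is a matter of measuring how long your slowest URLs take and staying comfortably under the task deadline.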
Solution 2:
Edit: This might come from App Engine quotas and limits; sorry for my previous answer.
This looks like server-side protection against DDoS or scraping from a single client. You have a few options:
Wait between batches of queries before continuing.
Make requests from several clients with different IP addresses and send the results back to your main script (renting separate servers for this might be costly).
Check whether the website has an API for accessing the data you need.
Also be careful: the site owner could block or blacklist your IP if they decide your requests are abusive.
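The first option (pausing between queries) can be sketched as a small throttling wrapper. Everything here is an assumption for illustration: `fetch` is any callable you supply (e.g. one that wraps `urlfetch` or `requests`), and the delay and batch size are placeholder values to tune against the target site's tolerance.

```python
import time

def fetch_politely(urls, fetch, delay=1.0, batch_size=10):
    """Fetch URLs one by one, pausing for `delay` seconds after every
    `batch_size` requests so the target server is not flooded."""
    results = []
    for i, url in enumerate(urls):
        if i and i % batch_size == 0:
            time.sleep(delay)  # back off between batches
        results.append(fetch(url))
    return results
```

A fixed sleep is the simplest approach; if the server still rejects you, the usual next step is exponential backoff on error responses.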