Why Is XMLFeedSpider Failing To Iterate Through The Designated Nodes?
I'm trying to parse PLoS's RSS feed to pick up new publications. The RSS feed is located here. Below is my spider:
from scrapy.contrib.spiders import XMLFeedSpider
class
Solution 1:
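You need to handle namespaces. The feed's elements live in the Atom namespace (http://www.w3.org/2005/Atom), so the itertag has to carry a prefix that is bound to that namespace: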
class PLoSSpider(XMLFeedSpider):
    name = "plos"
    namespaces = [('atom', 'http://www.w3.org/2005/Atom')]
    itertag = 'atom:entry'
    iterator = 'xml'  # this is also important: the default is 'iternodes'
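Why the prefix matters can be seen without Scrapy at all. The following is a minimal sketch using only Python's standard library; the miniature feed and the helper name count_entries are made up for illustration and are not part of the original spider. It assumes the real feed declares the Atom namespace as its default namespace, which is what the namespaces setting above implies.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature Atom feed (an assumption for illustration).
# The default namespace on <feed> applies to every child element.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry><title>First article</title></entry>
  <entry><title>Second article</title></entry>
</feed>"""

# Same prefix-to-URI binding as the spider's `namespaces` attribute.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def count_entries(xml_text):
    """Return (matches without a namespace, matches with the atom prefix)."""
    root = ET.fromstring(xml_text)
    # A bare tag name only matches elements that are in no namespace.
    plain = root.findall("entry")
    # The prefixed name is resolved through the ATOM_NS mapping.
    qualified = root.findall("atom:entry", ATOM_NS)
    return len(plain), len(qualified)


if __name__ == "__main__":
    print(count_entries(SAMPLE_FEED))
```

The bare name finds nothing because every entry element is really named {http://www.w3.org/2005/Atom}entry, which matches the symptom in the question: the spider runs, but no nodes are iterated.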
Working example:
# Newer Scrapy releases expose this class as scrapy.spiders.XMLFeedSpider.
from scrapy.contrib.spiders import XMLFeedSpider


class PLoSSpider(XMLFeedSpider):
    name = "plos"
    # Bind the 'atom' prefix to the Atom namespace used by the feed.
    namespaces = [('atom', 'http://www.w3.org/2005/Atom')]
    itertag = 'atom:entry'
    iterator = 'xml'
    allowed_domains = ["plosone.org"]
    start_urls = [
        ('http://www.plosone.org/article/feed/search'
         '?unformattedQuery=*%3A*&sort=Date%2C+newest+first')
    ]

    def parse_node(self, response, node):
        # Called once for every node matching itertag.
        print(node)
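Once the entries are being iterated, the next step is usually to pull fields out of each one instead of printing the node. The sketch below shows that idea with the standard library rather than Scrapy's selectors, so it is an illustration only; the sample feed, the example.org URLs, and the function name extract_entries are all assumptions. Note that child elements need the same namespace prefix as the entry itself.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content; the real feed's entries carry more fields.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>First article</title>
    <link href="http://example.org/articles/1"/>
  </entry>
  <entry>
    <title>Second article</title>
    <link href="http://example.org/articles/2"/>
  </entry>
</feed>"""


def extract_entries(xml_text):
    """Return one dict per Atom entry with its title and link URL."""
    root = ET.fromstring(xml_text)
    items = []
    for entry in root.findall("atom:entry", ATOM_NS):
        # Children of an entry are in the Atom namespace as well.
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        items.append({
            "title": title.strip(),
            "url": link.get("href") if link is not None else None,
        })
    return items


if __name__ == "__main__":
    for item in extract_entries(SAMPLE_FEED):
        print(item)
```

Forgetting the prefix on the child paths would produce the same kind of silent empty result as forgetting it on itertag.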