Why Is XMLFeedSpider Failing To Iterate Through The Designated Nodes?

I'm trying to parse PLoS's RSS feed to pick up new publications. The RSS feed is located here. Below is my spider: from scrapy.contrib.spiders import XMLFeedSpider class

Solution 1:

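You need to handle namespaces. The feed's elements belong to the Atom namespace, so declare that namespace in namespaces and use its prefix in itertag: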

class PLoSSpider(XMLFeedSpider):
    name = "plos"

    namespaces = [('atom', 'http://www.w3.org/2005/Atom')]
    itertag = 'atom:entry'
    iterator = 'xml'  # this is also important
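
To see why the prefix matters, here is a minimal sketch that uses Python's standard xml.etree.ElementTree rather than Scrapy. The sample feed, the find_entries helper and the variable names are all made up for illustration; only the Atom namespace URI comes from the answer above. It shows that a bare 'entry' lookup matches nothing in a namespaced document, while the prefixed lookup finds the entries.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
NAMESPACES = {"atom": ATOM_NS}

# Hypothetical miniature feed, invented for this example.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry><title>First article</title></entry>
  <entry><title>Second article</title></entry>
</feed>"""


def find_entries(xml_text, use_namespace):
    """Return the <entry> children of the feed root, with or without the prefix."""
    root = ET.fromstring(xml_text)
    if use_namespace:
        # 'atom:entry' is resolved to '{http://www.w3.org/2005/Atom}entry'
        return root.findall("atom:entry", NAMESPACES)
    # A bare tag name only matches elements that are in no namespace
    return root.findall("entry")


bare_matches = find_entries(SAMPLE_FEED, use_namespace=False)
prefixed_matches = find_entries(SAMPLE_FEED, use_namespace=True)
titles = [entry.find("atom:title", NAMESPACES).text for entry in prefixed_matches]

print(len(bare_matches))      # 0
print(len(prefixed_matches))  # 2
print(titles)                 # ['First article', 'Second article']
```

The spider attributes above play the same role: namespaces registers the prefix, and itertag refers to the element through that prefix.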

Working example:

from scrapy.spiders import XMLFeedSpider  # scrapy.contrib.spiders in Scrapy < 1.0


class PLoSSpider(XMLFeedSpider):
    name = "plos"

    namespaces = [('atom', 'http://www.w3.org/2005/Atom')]
    itertag = 'atom:entry'
    iterator = 'xml'

    allowed_domains = ["plosone.org"]
    start_urls = [
        'http://www.plosone.org/article/feed/search'
        '?unformattedQuery=*%3A*&sort=Date%2C+newest+first',
    ]

    def parse_node(self, response, node):
        # node is a Selector wrapping one matched <entry> element
        print(node)
