
Python Xml Iterating Over Elements Takes A Lot Of Memory

I have some very big XML files (~100-150 MB each). One element in my XML is M (for member), which is a child of HH (household), i.e. each household contains one or more members.

Solution 1:

etree is going to consume a lot of memory (yes, even with iterparse(), which still builds the whole tree as it reads), and sax is really clunky. However, pulldom to the rescue!
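A quick way to see the iterparse() point: with the default behaviour, every element that has been parsed stays attached to the tree, so memory grows with the size of the file. The sketch below assumes a hypothetical miniature input with a `<root>` wrapper around the HH/M elements, and the helper name `count_retained_elements` is made up for illustration. It parses with `xml.etree.ElementTree.iterparse` without clearing anything and then counts what is still reachable from the root.

```python
import io
import xml.etree.ElementTree as ET


def count_retained_elements(xml_bytes):
    """Parse with iterparse() (nothing is cleared) and count how many
    elements are still reachable from the root afterwards."""
    root = None
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start" and root is None:
            root = elem  # the first start event is the document root
        # ... per-element processing would go here ...
    # root.iter() yields the root itself plus every descendant
    return sum(1 for _ in root.iter())


# Hypothetical miniature input: 3 households, one member each.
sample = b"<root>" + b"<HH><M/></HH>" * 3 + b"</root>"
print(count_retained_elements(sample))  # 7: root + 3 HH + 3 M
```

The count scales linearly with the number of households, which is exactly what hurts on a 100-150 MB file.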

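```python
from xml.dom import pulldom

doc = pulldom.parse('large.xml')
for event, node in doc:
    if event == pulldom.START_ELEMENT and node.tagName == 'HH':
        # Node is 'empty' here: its children have not been read yet
        doc.expandNode(node)
        # Now we got it all: the whole HH subtree, including its M children
        if is_valid_hh(node):  # is_valid_hh() is your own check
            ...  # do things
```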
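Here is a self-contained version of the same pulldom loop that can be run as-is. It is a sketch under assumptions: it uses `pulldom.parseString` on a small hypothetical sample instead of the real file, assumes each HH carries an `id` attribute, and stands in a hypothetical `is_valid_hh` (a household counts as valid if it has at least one M child) for whatever check the question actually uses.

```python
from xml.dom import pulldom

# Hypothetical sample input; the real files are ~100-150 MB.
SAMPLE = """<root>
  <HH id="1"><M age="34"/><M age="7"/></HH>
  <HH id="2"></HH>
  <HH id="3"><M age="51"/></HH>
</root>"""


def is_valid_hh(hh):
    """Hypothetical stand-in: a household is valid if it has any M element."""
    return len(hh.getElementsByTagName("M")) > 0


def members_per_valid_household(xml_text):
    """Stream the document with pulldom, expanding one HH at a time,
    and return {household id: number of M children} for valid households."""
    result = {}
    doc = pulldom.parseString(xml_text)
    for event, node in doc:
        if event == pulldom.START_ELEMENT and node.tagName == "HH":
            # Read only this household's subtree into DOM nodes
            doc.expandNode(node)
            if is_valid_hh(node):
                result[node.getAttribute("id")] = len(node.getElementsByTagName("M"))
    return result


print(members_per_valid_household(SAMPLE))  # {'1': 2, '3': 1}
```

For the real files, swap `pulldom.parseString(xml_text)` for `pulldom.parse('large.xml')`; the loop body stays the same. Because `expandNode()` consumes the events for the M children itself, the outer loop only ever sees the HH start events and the surrounding text.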

It's one of those libraries that nobody seems to know about unless they have had to use it. The docs are at, e.g., https://docs.python.org/3.7/library/xml.dom.pulldom.html
