Python Xml Iterating Over Elements Takes A Lot Of Memory
I have some very big XML files (roughly 100-150 MB each). One element in my XML is M (for member), which is a child of HH (household), i.e. each household contains one or more
Solution 1:
etree is going to consume a lot of memory (yes, even with iterparse()), and sax is really clunky. However, pulldom to the rescue!
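from xml.dom import pulldom

doc = pulldom.parse('large.xml')
for event, node in doc:
    if event == pulldom.START_ELEMENT and node.tagName == 'special':
        # Node is 'empty' here: only the start tag and its attributes are available
        doc.expandNode(node)
        # Now we got it all: the node's full subtree has been read in
        if is_valid_hh(node):  # is_valid_hh is your own validation function
            ...  # do things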
It's one of those libraries that nobody seems to know about unless they have had to use it. Docs: https://docs.python.org/3.7/library/xml.dom.pulldom.html
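For something you can run without a large file, here is a minimal sketch of the same pattern on an in-memory document. The sample XML, the `id` and `age` attributes, and the `count_members` helper are all made up for illustration; only the HH and M element names come from the question.

```python
from xml.dom import pulldom

# Hypothetical sample data: households (HH) containing members (M).
SAMPLE = """<root>
  <HH id="1"><M age="34"/><M age="7"/></HH>
  <HH id="2"><M age="51"/></HH>
</root>"""


def count_members(xml_text):
    """Return a dict mapping each HH id to its number of M descendants."""
    counts = {}
    doc = pulldom.parseString(xml_text)
    for event, node in doc:
        if event == pulldom.START_ELEMENT and node.tagName == "HH":
            # Pull in the subtree of this one household only.
            doc.expandNode(node)
            counts[node.getAttribute("id")] = len(node.getElementsByTagName("M"))
    return counts


print(count_members(SAMPLE))  # {'1': 2, '2': 1}
```

Only one HH subtree is expanded at a time, and the M elements inside it are consumed by expandNode(), so the loop never sees them as separate events. For a file on disk, swap pulldom.parseString(...) for pulldom.parse('large.xml') as in the snippet above.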