Python - Process Disappearing and Freezing when Working with a Large Dict?
I'm in the process of dealing with XML files 85 MB in size, and I'm trying to process one of them. I download the zip, extract it to disk, convert the XML to a Python dict, change a couple of things, save the dict, and send it to MongoDB. Everything works except the conversion to a Python dict: at that step the process freezes/disappears.
I'm running the script on a VM with Ubuntu 13.04 Server, 4 cores @ 2.6 GHz, 16 GB of RAM, and a 1 TB 15,000 RPM drive. I'm monitoring the script as it runs: Python takes 12% of RAM for 7 minutes, and then it's gone. The process falls off the high-usage list and the output piped to the terminal stops moving. When I kill it with Ctrl+Z, it returns "Write failed: Broken pipe".
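To see how close the script gets to the limit before it vanishes, each progress message could also report the process's peak memory. This is a minimal sketch, assuming a Unix system (the `resource` module does not exist on Windows); `peak_rss_mb` and `log_step` are hypothetical helper names, and a call such as `log_step(' * converting ' + x)` would stand in for the plain `print` lines in the loop.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident memory of this process so far, in megabytes (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        peak /= 1024.0  # macOS reports bytes; Linux reports kilobytes
    return peak / 1024.0  # kilobytes -> megabytes


def log_step(label):
    """Print a progress label with the peak memory so far, flushed immediately
    so the last line is not lost if the process is killed mid-step."""
    print("%-45s peak RSS: %9.1f MB" % (label, peak_rss_mb()))
    sys.stdout.flush()
```

Because `ru_maxrss` is a high-water mark, the value printed just before the process disappears shows roughly how much memory the previous step had reached.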
The last thing printed on the terminal is " * converting dailyprice_0505_eur.xml.zip", which makes me suspect xmltodict, but I'm stuck. Example code is below, and the data link should work if you're willing to test it out for me. Any help is appreciated! Thanks.
    # importing
    import urllib, xmltodict, os
    from zipfile import ZipFile

    # getting the working dir
    abspath = os.path.abspath(__file__)
    root = os.path.dirname(abspath) + "/"
    print "current working directory: " + root

    # defining
    urlauth = 'https://dl.dropboxusercontent.com/u/9235267/'
    dailypricefl = ['dailyprice_0505_eur.xml.zip']
    dailypricedict = {}

    for x in dailypricefl:
        print ' * downloading', x
        # save next to the script, since the later steps all use root + x
        urllib.urlretrieve(urlauth + x, root + x)

        print ' * extracting', x
        with ZipFile(root + x, "r") as z:
            z.extractall(root)

        print ' * converting', x
        f = open(root + x.replace(".zip", ""))
        data = xmltodict.parse(f.read())  # whole file parsed into one nested dict
        f.close()

        print ' * adding currency to dict', x
        # currency code from the file name, e.g. 'dailyprice_0505_eur.xml.zip' -> 'EUR'
        currency = x.split('_')[-1].split('.')[0].upper()
        for y in data['prices']['price']:
            y.update({"currency": currency})

        print ' * amending', x
        dailypricedict.update(data)

        print ' * deleting', x
        os.remove(root + x)
        os.remove(root + x.replace(".zip", ""))
        print ' * finished', x
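One way to avoid holding the whole document in memory at once is to stream it instead of parsing it in one go. The sketch below swaps xmltodict for the standard library's `xml.etree.ElementTree.iterparse` (a different technique from the question's whole-file parse), reads the XML straight out of the zip without extracting it, and yields one plain dict per `<price>` element, clearing finished elements as it goes. It assumes the `<prices>`/`<price>` layout and lowercase tag names shown in this thread, text-only child fields, and a single XML file per archive; `iter_prices` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
import zipfile


def iter_prices(zip_path, currency):
    """Yield one dict per <price> element without building the whole tree.

    Assumes (from the thread, not verified against the real files): a root
    <prices> element whose <price> children hold simple text-only fields,
    and exactly one XML member inside the archive.
    """
    with zipfile.ZipFile(zip_path) as zf:
        member = zf.namelist()[0]  # assumption: single XML file per zip
        with zf.open(member) as xml_file:
            context = ET.iterparse(xml_file, events=("start", "end"))
            _, root = next(context)  # first event is the start of <prices>
            for event, elem in context:
                if event == "end" and elem.tag == "price":
                    record = {child.tag: child.text for child in elem}
                    record["currency"] = currency
                    yield record
                    # Discard the <price> elements processed so far so the
                    # in-memory tree never grows beyond one record.
                    root.clear()
```

Each record can then be handed to MongoDB individually or in small batches, so peak memory is bounded by one batch rather than by the size of the whole file.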
I wonder if it's a memory error, despite the specs you listed. When I attempt to load the XML in xml.dom.minidom, I get this:
    Traceback (most recent call last):
      File "C:\Python27\lib\xml\dom\minidom.py", line 1930, in parseString
        return expatbuilder.parseString(string)
      File "C:\Python27\lib\xml\dom\expatbuilder.py", line 940, in parseString
        return builder.parseString(string)
      File "C:\Python27\lib\xml\dom\expatbuilder.py", line 223, in parseString
        parser.Parse(string, True)
      File "C:\Python27\lib\xml\dom\expatbuilder.py", line 751, in start_element_handler
        node = minidom.Element(qname, uri, prefix, localname)
      File "C:\Python27\lib\xml\dom\minidom.py", line 653, in __init__
        self._attrs = {}   # attributes are double-indexed:
    MemoryError
This does seem to work on my side of things, though:
    >>> import xmltodict, os
    >>> data = open('price.xml').read()
    >>> xml = xmltodict.parse(data)
    >>> xml['prices']['price'][0]
    OrderedDict([(u'code', u'ad1550.301.1'), (u'startdate', u'2013-08-24'), (u'enddate', u'2013-09-30'), (u'rentalprice', u'126.00'), (u'midweekrentalprice', u'0.00'), (u'weekendrentalprice', u'0.00'), (u'fixprice', u'0.00')])
That takes 800 MB on my system, and I don't seem to get any errors. However, if I try to parse with xmltodict after I've attempted to use minidom, I get something more like this:
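To put a number on how much a parse costs relative to the input size, a sketch like the following compares minidom with ElementTree on synthetic data. It assumes Python 3.4+ for `tracemalloc` (which Python 2.7 lacks); `make_sample_xml` generates hypothetical rows shaped like the `<prices>`/`<price>` records above, and ElementTree stands in for xmltodict only because it ships with Python. `tracemalloc` sees only allocations made through Python's own allocators, so treat the figures as indicative rather than as total process memory.

```python
import tracemalloc
import xml.dom.minidom
import xml.etree.ElementTree as ET


def make_sample_xml(n_prices):
    """Build a hypothetical document shaped like the <prices>/<price> data above."""
    row = ("<price><code>AD{0}</code><startdate>2013-08-24</startdate>"
           "<enddate>2013-09-30</enddate><rentalprice>126.00</rentalprice></price>")
    return "<prices>" + "".join(row.format(i) for i in range(n_prices)) + "</prices>"


def peak_bytes(parse, data):
    """Return the peak memory traced while parse(data) runs, keeping the
    parsed result alive until the peak has been read."""
    tracemalloc.start()
    try:
        tree = parse(data)  # held so the result is counted at the peak
        _, peak = tracemalloc.get_traced_memory()
        return peak
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    data = make_sample_xml(50000)
    print("input size:       %10d bytes" % len(data))
    print("minidom peak:     %10d bytes" % peak_bytes(xml.dom.minidom.parseString, data))
    print("ElementTree peak: %10d bytes" % peak_bytes(ET.fromstring, data))
```

Scaling `n_prices` up shows how the in-memory representation grows relative to the raw text, which is the ratio that matters for an 85 MB file.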
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\site-packages\xmltodict.py", line 228, in parse
        parser.Parse(xml_input, True)
    xml.parsers.expat.ExpatError: out of memory: line 1, column 0
which suggests rather directly that, because both are based on expat, memory is not getting freed between runs of the script (or iterations of the loop), and you'll need to refactor somehow.
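If memory really is not being returned between iterations, one blunt but reliable refactor is to give each file its own short-lived process, so the operating system reclaims everything when that process exits. This is a sketch under assumptions: `process_one_file` is a hypothetical placeholder for the per-file steps in the question's loop (here it only reports which process handled the file), and the `fork` start method is requested explicitly, which needs Python 3.4+ on a POSIX system such as the Ubuntu VM described above.

```python
import multiprocessing
import os


def process_one_file(name):
    """Hypothetical placeholder for the per-file work in the question's loop
    (download, extract, parse, amend, store). Returns the file name and the
    PID of the worker process that handled it."""
    return name, os.getpid()


def process_all(names):
    # "fork" avoids re-importing this module in the child (POSIX only).
    ctx = multiprocessing.get_context("fork")
    # maxtasksperchild=1 retires each worker after one task, and chunksize=1
    # makes every task a single file, so each file runs in a fresh process
    # whose memory goes back to the OS when it exits.
    pool = ctx.Pool(processes=1, maxtasksperchild=1)
    try:
        return pool.map(process_one_file, names, chunksize=1)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    for name, pid in process_all(["dailyprice_0505_eur.xml.zip"]):
        print(name, "handled by PID", pid)
```

The `chunksize=1` argument matters: without it, `map` may group several files into one task, and they would then share a single worker process.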