xml - Python - Process Disappearing and Freezing when Working with a Large Dict?


I'm in the process of dealing with XML files that are 85MB in size, and I'm trying to process one of them. What I'm doing is: download the zip, extract it to disk, convert the XML to a Python dict, change a couple of things, save the dict, and send it to MongoDB. Except when converting to a Python dict, the process freezes/disappears.

I'm running the script on a VM: Ubuntu 13.04 Server, 4 cores @ 2.6GHz, 16GB of RAM, and a 1TB 15,000RPM drive. I'm monitoring the script as it runs: Python climbs to about 12% of RAM over 7 minutes, and then it's gone; the process falls off the high-usage list and the pipe to the terminal doesn't move. When I kill it with Ctrl+Z it returns "write failed: broken pipe".

The last thing printed to the terminal is "converting dailyprice_0505_eur.xml.zip", which makes me suspect maybe xmltodict, but I'm stuck. Below is example code and the data, should anyone be willing to help me test it out. Any help is appreciated! Thanks.

#importing
import urllib, xmltodict, os
from zipfile import ZipFile

#getting working dir
abspath = os.path.abspath(__file__)
root = os.path.dirname(abspath) + "/"
print "current working directory: " + root

#defining
urlauth = 'https://dl.dropboxusercontent.com/u/9235267/'
dailypricefl = ['dailyprice_0505_eur.xml.zip']
dailypricedict = {}

for x in dailypricefl:
    print '  * downloading', x
    urllib.urlretrieve(urlauth + x, x)

    print '  * extracting', x
    with ZipFile(x, "r") as z:
        z.extractall(root)

    print '  * converting', x
    f = open(root + x.replace(".zip", ""))
    data = xmltodict.parse(f.read())
    f.close()

    print '  * adding currency to dict', x
    for y in data['prices']['price']:
        y.update({"currency": x[-7:].replace(".xml", "").upper()})

    print '  * amending', x
    dailypricedict.update(data)

    print '  * deleting', x
    os.remove(root + x)
    os.remove(root + x.replace(".zip", ""))
    print '  * finished', x
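A quick way to check the memory theory is to log peak RSS at each step of the loop. A minimal sketch, assuming a Linux/Unix host where resource.getrusage reports ru_maxrss in kilobytes; if the number climbs toward the machine's limit right before the process vanishes, the kernel OOM killer is the likely culprit.

import resource

def log_mem(label):
    # ru_maxrss is the peak resident set size so far; on Linux it is reported in KB
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print '  [mem] %s: peak RSS %.1f MB' % (label, peak_kb / 1024.0)

# e.g. around the suspect step:
# log_mem('before parse')
# data = xmltodict.parse(f.read())
# log_mem('after parse')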

I wonder if it's not a memory error despite the specs you listed. When I attempt to load the XML with xml.dom.minidom I get this:

Traceback (most recent call last):
  File "C:\Python27\lib\xml\dom\minidom.py", line 1930, in parseString
    return expatbuilder.parseString(string)
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 751, in start_element_handler
    node = minidom.Element(qname, uri, prefix, localname)
  File "C:\Python27\lib\xml\dom\minidom.py", line 653, in __init__
    self._attrs = {}   # attributes are double-indexed:
MemoryError

This does seem to work on my side of things, though:

>>> import xmltodict, os
>>> data = open('price.xml').read()
>>> xml = xmltodict.parse(data)
>>> xml['prices']['price'][0]
OrderedDict([(u'code', u'ad1550.301.1'),
             (u'startdate', u'2013-08-24'),
             (u'enddate', u'2013-09-30'),
             (u'rentalprice', u'126.00'),
             (u'midweekrentalprice', u'0.00'),
             (u'weekendrentalprice', u'0.00'),
             (u'fixprice', u'0.00')])

That takes about 800MB on my system, but I don't seem to get any errors. However, if I try to parse with xmltodict after I've attempted to use minidom, I get more like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\xmltodict.py", line 228, in parse
    parser.Parse(xml_input, True)
xml.parsers.expat.ExpatError: out of memory: line 1, column 0

Which suggests rather directly that, because both are based on expat, memory is not getting freed between runs of the script (or iterations of the loop), and you'll need to refactor somehow.
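One way that refactor could look: xmltodict can stream with item_depth and item_callback, handling each <price> element as it is parsed instead of materializing the whole 85MB document as one dict. A rough sketch, assuming the 'prices'/'price' layout shown above; the prices list and the hard-coded currency are placeholders for whatever you actually do with each item (e.g. a MongoDB insert):

import xmltodict

currency = 'EUR'   # in the real script this would come from the filename
prices = []        # stand-in for e.g. a pymongo collection insert

def handle_price(path, item):
    # called once per <price> element (depth 2: <prices><price>...</price></prices>)
    item['currency'] = currency
    prices.append(item)
    return True    # keep parsing; a falsy return aborts

with open('dailyprice_0505_eur.xml') as f:
    xmltodict.parse(f, item_depth=2, item_callback=handle_price)

This keeps only one <price> dict in memory at a time, and since each file is consumed in a single streaming pass, nothing accumulates between iterations of the loop.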

