Parsing an XML file for unknown elements using Python ElementTree

Question

I want to extract all the tag names and their corresponding data from a multi-purpose XML file, then save that information into a Python dictionary (e.g. tag = key, data = value). The catch is that the tag names and values are unknown and of unknown quantity.

    <some_root_name>
        <tag_x>bubbles</tag_x>
        <tag_y>car</tag_y>
        <tag...>42</tag...>
    </some_root_name>

I'm using ElementTree and can successfully extract the root tag and can extract values by referencing the tag names, but haven't been able to find a way to simply iterate over the tags and data without referencing a tag name.

Any help would be great.

Thank you.

Kristofer · Accepted Answer · 2012-01-11 11:35:21Z

7

from lxml import etree as ET

xmlString = """
    <some_root_name>
        <tag_x>bubbles</tag_x>
        <tag_y>car</tag_y>
        <tag...>42</tag...>
    </some_root_name> """

document = ET.fromstring(xmlString)
# iter() walks the root and all of its descendants; getiterator() is deprecated
for elementtag in document.iter():
    print("elementtag name:", elementtag.tag)

EDIT: To read from a file instead of from a string (parse() returns an ElementTree, which can be iterated the same way):

document = ET.parse("myxmlfile.xml")
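For the file case, here is a minimal sketch using the standard library's xml.etree.ElementTree (an assumption; the answer above uses lxml, whose parse() behaves similarly). No conversion to a string is needed. The helper name xml_file_to_dict and the sample file contents are hypothetical, and the sketch only looks at direct children of the root.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def xml_file_to_dict(path):
    """Map each direct child tag of the root to its text (hypothetical helper)."""
    root = ET.parse(path).getroot()   # parse() returns an ElementTree, not an Element
    return {child.tag: child.text for child in root}

# Demo: write a small hypothetical file, then read it back.
sample = "<some_root_name><tag_x>bubbles</tag_x><tag_y>car</tag_y></some_root_name>"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myxmlfile.xml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(sample)
    result = xml_file_to_dict(path)

print(result)
```

In real use you would pass the path of your own .xml file straight to the helper; the temporary file is only there to keep the sketch self-contained.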

edited Jan 11, 2012 at 11:35

answered Jan 11, 2012 at 10:16

Kristofer


3 Comments

Markus Over a year ago

Thanks for the reply, that should work well. I am using .xml files (not an xml string). Do I need to convert the file to a string before I can iterate through it? If so, could you please tell me how to do it? StringIO? Thanks again.

Loïc G. Over a year ago

from xml.etree should be from lxml.etree, no?

John Machin Over a year ago

or use import xml.etree.ElementTree as ET ... unlike lxml, this and its faster C-coded sibling cElementTree come bundled with Python.

John Machin · Accepted Answer · 2012-01-11 12:06:08Z

2

>>> import xml.etree.ElementTree as et  # cElementTree was removed in Python 3.9
>>> xml = """
...    <some_root_name>
...         <tag_x>bubbles</tag_x>
...         <tag_y>car</tag_y>
...         <tag...>42</tag...>
...     </some_root_name>
... """
>>> doc = et.fromstring(xml)
>>> print(dict((el.tag, el.text) for el in doc))
{'tag_x': 'bubbles', 'tag_y': 'car', 'tag...': '42'}

If you really want 42 instead of '42', you'll need to work a little harder and less elegantly.
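One possible way to do that, sketched under the assumption that only integer-looking values should be converted. The helper name coerce and the tag name tag_z are hypothetical; anything that int() rejects is left as it was.

```python
import xml.etree.ElementTree as et

def coerce(text):
    """Return int(text) when the text parses as an integer, else the text unchanged."""
    try:
        return int(text)
    except (TypeError, ValueError):   # None (empty element) or non-numeric text
        return text

# tag_z is a hypothetical stand-in for the elided tag in the question.
xml = ("<some_root_name><tag_x>bubbles</tag_x>"
       "<tag_y>car</tag_y><tag_z>42</tag_z></some_root_name>")
doc = et.fromstring(xml)
converted = {el.tag: coerce(el.text) for el in doc}
print(converted)
```

The same pattern extends to float() or other types, at the cost of guessing what the document's author intended each value to be.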

answered Jan 11, 2012 at 12:06

John Machin


unutbu · Accepted Answer · 2012-01-11 12:46:47Z

You could use xml.sax.handler to parse the XML:

import xml.sax as sax
import xml.sax.handler as saxhandler
import pprint

class TagParser(saxhandler.ContentHandler):
    # http://docs.python.org/library/xml.sax.handler.html#contenthandler-objects
    def __init__(self):
        super().__init__()
        self.tags = {}
        self.tag = None   # name of the most recently opened element
        self.data = []    # text chunks collected for that element
    def startElement(self, name, attrs):
        self.tag = name
        self.data = []
    def endElement(self, name):
        if self.tag:
            self.tags[self.tag] = ''.join(self.data)
            self.tag = None
    def characters(self, content):
        # characters() may be called several times per element, so accumulate
        self.data.append(content)

parser = TagParser()
src = '''\
<some_root_name>
    <tag_x>bubbles</tag_x>
    <tag_y>car</tag_y>
    <tag...>42</tag...>
</some_root_name>'''
sax.parseString(src, parser)
pprint.pprint(parser.tags)

yields

{'tag...': '42', 'tag_x': 'bubbles', 'tag_y': 'car'}

Thanks for the reply. I'm not familiar with xml.sax. Is it possible to get an output that is more like {'tag_x:bubbles','tag_y:car','tag...:42'}?

@Markus: Of course it is. unutbu didn't read your question properly. You should be able to initialise self.tags as a dict and change the self.tags.append line to what you want.

@JohnMachin OK, that's pretty straightforward. Thanks for all your answers, John.
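For the 'tag:value' format asked about in the comments, one option is to post-process the dictionary rather than change the handler. A sketch, assuming a dict of tags like the one printed above; the literal braces in the requested output would make a set of strings in Python, so a list is used here to keep the order.

```python
# Assumed input: a tag -> text dict such as the one the handler builds.
tags = {'tag_x': 'bubbles', 'tag_y': 'car', 'tag...': '42'}

# Join each key and value with a colon.
pairs = ['{}:{}'.format(tag, value) for tag, value in tags.items()]
print(pairs)
```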

martineau · Accepted Answer · 2013-11-24 03:00:57Z

0

This can be done using lxml in Python:

from lxml import etree

myxml = """
          <root>
             value
          </root> """

doc = etree.XML(myxml)

d = {}
for element in doc.iter():
    key = element.tag
    value = element.text
    d[key] = value

print(d)
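Two things to watch with this loop: doc.iter() also visits the root element itself, and a plain dict keeps only the last value when a tag occurs more than once. A sketch of one way around both, assuming the standard library's xml.etree.ElementTree in place of lxml and a hypothetical document with a repeated tag:

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical input in which <item> appears twice.
myxml = "<root><item>a</item><item>b</item><other>c</other></root>"
doc = ET.fromstring(myxml)

d = defaultdict(list)
for element in doc.iter():
    if element is doc:        # iter() yields the root first; skip it
        continue
    d[element.tag].append(element.text)

print(dict(d))
```

Every value becomes a list, which is less convenient for single-occurrence tags but avoids silently dropping data when the quantity of tags is unknown.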

edited Nov 24, 2013 at 3:00

martineau


answered Jan 11, 2012 at 10:10

Nava


2 Comments

Markus Over a year ago

Another great answer and it looks a bit more compact, thank you. The same question I asked Kristofer: do I need to convert the XML file I'm trying to read into an XML string before using iter? Is that easy to do?

John Machin Over a year ago

-1 It's NOT a great answer. Instead of d= {key:value}, it should have d[key] = value.
