I'm trying to create a simple XML parser where each XML schema has its own parser class, but I can't figure out the best way to do it. In effect, I would like to do something like this:

in = sys.stdin
xmldoc = minidom.parse(in).documentElement

xmlParser = xmldoc.nodeName
parser = xmlParser()
out = parser.parse(xmldoc)

I'm also not quite sure whether I'm getting the document root name correctly, but that's the idea: create an object of a class whose name matches the document root, and use the parse() function in that class to parse and handle the input.

What would be the simplest way to achieve this? I've been reading about introspection and templates but haven't been able to figure it out yet. I've done a similar thing in Java in the past and, AFAIK, Ruby also makes this simple. What's the Pythonic way?

  • This question is worthless without details. By "parsing", do you mean "extracting data from DOM"? Or do you want to build an entire XML parser from scratch? Or do you mean a validator? ...? Commented Sep 1, 2010 at 13:51
  • How would that be relevant? I want to be able to call a Python class based on the document root name of an XML file. I don't think it's relevant what exactly I'm going to do in those classes. Commented Sep 1, 2010 at 16:39

2 Answers

1

As pointed out by Mark in his comment, to get a reference to a class that you know the name of at runtime, you use getattr.

doc = minidom.parse(sys.stdin)
# is equivalent to
doc = getattr(minidom, "parse")(sys.stdin)

Below is a corrected version of your pseudo-code.

from xml.dom import minidom
import sys
import myParsers # a module containing your parsers

xmldoc = minidom.parse(sys.stdin).documentElement

myParserName = xmldoc.nodeName
myParserClass = getattr(myParsers, myParserName)
# create an instance of myParserClass by calling it with the documentElement
parser = myParserClass(xmldoc)
# do whatever you want with the instance of your parser class
output = parser.generateOutput()

getattr raises an AttributeError if the attribute doesn't exist, so you can wrap the call in a try...except, or pass a third argument to getattr, which will be returned if the attribute isn't found.
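Both ways of handling a missing parser can be sketched roughly as below. This is only an illustration, not the one true layout: the Invoice and UnknownDocument classes, the parse_document helpers, and the SimpleNamespace standing in for the myParsers module are all made-up names, and minidom.parseString is used instead of reading sys.stdin so the snippet runs on its own.

```python
from types import SimpleNamespace
from xml.dom import minidom


class Invoice:
    """Hypothetical parser for documents whose root element is <Invoice>."""

    def __init__(self, root):
        self.root = root

    def generateOutput(self):
        items = self.root.getElementsByTagName("item")
        return "invoice with %d item(s)" % len(items)


class UnknownDocument:
    """Hypothetical fallback used when no parser matches the root name."""

    def __init__(self, root):
        self.root = root

    def generateOutput(self):
        return "no parser for <%s>" % self.root.nodeName


# Stand-in for the myParsers module (assumption: in real code this would
# be an imported module holding one class per root element name).
my_parsers = SimpleNamespace(Invoice=Invoice)


def parse_document(xml_text):
    """Look the parser up with a default, so unknown roots get a fallback."""
    root = minidom.parseString(xml_text).documentElement
    parser_class = getattr(my_parsers, root.nodeName, UnknownDocument)
    return parser_class(root).generateOutput()


def parse_document_strict(xml_text):
    """Look the parser up without a default and translate the error."""
    root = minidom.parseString(xml_text).documentElement
    try:
        parser_class = getattr(my_parsers, root.nodeName)
    except AttributeError:
        raise ValueError("unsupported root element: %s" % root.nodeName)
    return parser_class(root).generateOutput()


print(parse_document("<Invoice><item/><item/></Invoice>"))
print(parse_document("<Order/>"))
```

One thing to keep in mind: the root name comes from the incoming document, and getattr on a real module will happily return anything the module exposes, including names it merely imported. If the input isn't trusted, an explicit dict mapping root names to classes is a safer lookup table.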

1

I think most Python programmers would just use lxml to parse their XML. If you still want to wrap that in classes you could, but as delnan said in his comment, it's a bit unclear what you really mean.

from lxml import etree

tree = etree.parse('my_doc.xml')
for element in tree.getroot():
    ...

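A couple of side notes: if other programmers are going to be reading your code, you should try to at least roughly follow PEP 8. More importantly, "in" is a reserved keyword in Python, so it can't be used as a variable name; in = sys.stdin is a syntax error. Assigning to builtin names is best avoided as well.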

2 Comments

This is just a simple test server where the script receives an XML file and returns something. I thought I'd make it a bit more clever so that it's easy to add more per-schema tests on the received XML (validity checking etc.), i.e. I could just check that the XML file is correct. My plan was to have the parsers named after the document root, but this is beside the point, as I was more interested in the reflection/introspection part of my question: is it possible to create an object if we have its class name as a string?
Well, it's simple to instantiate an existing class if you know its name. You can just use parser_class = getattr(module, class_name). I think this is what you are asking for. If you want to dynamically generate a class based on a string name, you can actually do that too, but I don't think that's what you want.
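
For completeness, generating a class from a string name can be done with the three-argument form of the builtin type(). A minimal sketch, assuming a made-up make_parser_class helper and a trivial parse method (neither comes from the question):

```python
def make_parser_class(class_name):
    """Build a new class whose name is only known at runtime."""

    def parse(self, root_name):
        # Illustrative behaviour only: report which class handled the input.
        return "%s handled <%s>" % (type(self).__name__, root_name)

    # type(name, bases, namespace) creates a class object dynamically.
    return type(class_name, (object,), {"parse": parse})


InvoiceParser = make_parser_class("InvoiceParser")
parser = InvoiceParser()
print(parser.parse("Invoice"))
```

For the use case in the question, looking up an existing class with getattr is usually the simpler choice; building classes on the fly only pays off when they really can't be written ahead of time.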
