
I have an XSD file of the following format:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:type name="type1">
        <xsd:example>
          <xsd:description>This is the description of said type1 tag</xsd:description>
        </xsd:example>
    </xsd:type>
    <xsd:type name="type2">
        <xsd:example>
          <xsd:description>This is the description of said type2 tag</xsd:description>
        </xsd:example>
    </xsd:type>
    <xsd:type name="type3">
        <xsd:example>
          <xsd:description>This is the description of said type3 tag</xsd:description>
        </xsd:example>
    </xsd:type>
</xsd:schema>

and the following XML file:

<theRoot>
    <type1>hi from type1</type1>
    <theChild>
        <type2>hi from type2</type2>
        <type3>hi from type3</type3>
    </theChild>
</theRoot>

I'd like to retrieve the text of the xsd:description element that sits under the xsd:type element whose name attribute is "type1". In other words, I'd like to retrieve "This is the description of said type1 tag".

I have tried to do this with lxml in the following way using Python:

from lxml import etree
XSDDoc = etree.parse(xsdFile)
root = XSDDoc.getroot()
result = root.findall(".//xsd:type/xsd:example/xsd:description[@name='type1']", root.nsmap)

I've followed the example and solution mentioned here. However, my code returns an empty list instead of the description element.

For reference, my Python version is: Python 2.7.10

EDIT: When I use an example provided in the answer by retrieving the XML structure from a string, the result is as expected. However, when I try to retrieve from a file, I get empty lists returned (or None).

I am doing the following:

  • Retrieving the XML from a file
  • Including a variable to denote the name attribute (as it is dynamic)

The code loops over each node in a separate XML file, then looks up the description for that node's name in the XSD file:

import os

XMLDoc = etree.parse(open(xmlFile))

for Node in XMLDoc.xpath('//*'):
    nameVariable = os.path.basename(XMLDoc.getpath(Node))
    root = XSDDoc.getroot()
    description = XSDDoc.find(".//xsd:type[@name='{0}']/xsd:example/xsd:description".format(nameVariable), root.nsmap)

If I try to print description.text, I get:

AttributeError: 'NoneType' object has no attribute 'text'

  • What exactly have you tried? In the code in the question, you don't attempt to get the xsd:description element (which is the grandchild of xsd:type). Commented Nov 19, 2019 at 12:19
  • @mzjn sorry, as I've had to remove some sensitive information, I've left out the remaining path following xsd:type. I have edited the question to reflect my exact code. Commented Nov 19, 2019 at 12:25
  • That is not really the "exact" code (what is nameVariable?) Please provide a minimal reproducible example. Commented Nov 19, 2019 at 14:52
  • I have edited my question. nameVariable is simply a string. Commented Nov 19, 2019 at 14:55
  • Sorry to nag about this, but when I ask for a minimal reproducible example, I mean complete but minimal code (and XML) that I can copy, paste and run without changing anything. Commented Nov 19, 2019 at 14:59

1 Answer

The predicate ([@name='type1']) must be applied in the right place. The name attribute is on the xsd:type element. This should work:

result = root.findall(".//xsd:type[@name='type1']/xsd:example/xsd:description", root.nsmap)

# result is a list
for r in result:
    print(r.text)
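
To see why the placement matters, here is a small sketch that runs both paths against a trimmed copy of the schema from the question. The explicit NS mapping (used instead of root.nsmap) and the fallback import are choices made for this illustration only; the fallback is there so the snippet also runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption for this sketch: the stdlib module is close enough for find/findall.
    import xml.etree.ElementTree as etree

# Trimmed copy of the schema from the question (XML declaration omitted).
XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:type name="type1">
        <xsd:example>
            <xsd:description>This is the description of said type1 tag</xsd:description>
        </xsd:example>
    </xsd:type>
</xsd:schema>"""

# Explicit prefix-to-URI mapping, written out by hand for this example.
NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}
root = etree.fromstring(XSD)

# Predicate on xsd:description: that element has no name attribute, so nothing matches.
wrong = root.findall(".//xsd:type/xsd:example/xsd:description[@name='type1']", NS)

# Predicate on xsd:type: selects the type first, then walks down to its description.
right = root.findall(".//xsd:type[@name='type1']/xsd:example/xsd:description", NS)

print(len(wrong))
print([d.text for d in right])
```

The first path matches nothing, while the second returns the single description element.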

In case you only want a single node, you can use find instead of findall. Complete example:

from lxml import etree

xsdFile = """
<root xmlns:xsd='http://whatever.com'>
 <xsd:type name="type1">
     <xsd:example>
       <xsd:description>This is the description of said type1 tag</xsd:description>
     </xsd:example>
 </xsd:type>
</root>"""

root = etree.fromstring(xsdFile)
result = root.find(".//xsd:type[@name='type1']/xsd:example/xsd:description", root.nsmap)

print(result.text)
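
Regarding the edited question: the loop calls find once for every element in the XML file, including theRoot and theChild, which have no matching xsd:type entry in the sample schema. For those names find returns None, which would explain the AttributeError. Note also that getpath can append a positional index such as [2] for repeated siblings, so the sketch below uses the element's tag name instead. The describe helper and the NS mapping are hypothetical names introduced for this illustration, and the fallback import is only there so the snippet runs without lxml.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption for this sketch: the stdlib module is close enough for find/iter.
    import xml.etree.ElementTree as etree

NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:type name="type1">
        <xsd:example>
            <xsd:description>This is the description of said type1 tag</xsd:description>
        </xsd:example>
    </xsd:type>
    <xsd:type name="type2">
        <xsd:example>
            <xsd:description>This is the description of said type2 tag</xsd:description>
        </xsd:example>
    </xsd:type>
</xsd:schema>"""

XML = """<theRoot>
    <type1>hi from type1</type1>
    <theChild>
        <type2>hi from type2</type2>
    </theChild>
</theRoot>"""


def describe(xsd_root, name):
    """Return the description text for the given type name, or None if absent."""
    path = ".//xsd:type[@name='{0}']/xsd:example/xsd:description".format(name)
    node = xsd_root.find(path, NS)
    return node.text if node is not None else None


xsd_root = etree.fromstring(XSD)
xml_root = etree.fromstring(XML)

descriptions = {}
for node in xml_root.iter():
    if not isinstance(node.tag, str):
        continue  # skip comments and processing instructions
    name = node.tag.rsplit("}", 1)[-1]  # drop a "{namespace}" prefix if present
    descriptions[name] = describe(xsd_root, name)

for name, text in descriptions.items():
    print(name, "->", text)
```

When reading from files, etree.parse(path).getroot() can stand in for the two fromstring calls; the lookup itself stays the same.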

3 Comments

Thank you for your answer. However, that piece of code returns an empty list, rather than anything containing the value within the tag.
Also, I believe the code would return a list object. How can I extract the value attribute from that list object?
Thank you for your help again. I have edited my question based off your answer. Please have a look when you can.
