I am looking to turn this XML file (countries) into a table in a CSV file. However, I am having trouble parsing it and extracting the data from it. I am trying to turn it into a table with these 5 columns:

 ['CtryNm', 'CcyNm', 'Ccy', 'CcyNbr', 'CcyMnrUnts']

This is a snippet showing how the XML file is structured:

<ISO_4217 Pblshd="2024-01-01">
    <CcyTbl>
        <CcyNtry>
            <CtryNm>AFGHANISTAN</CtryNm>
            <CcyNm>Afghani</CcyNm>
            <Ccy>AFN</Ccy>
            <CcyNbr>971</CcyNbr>
            <CcyMnrUnts>2</CcyMnrUnts>
        </CcyNtry>
        <CcyNtry>
            <CtryNm>ZZ07_No_Currency</CtryNm>
            <CcyNm>The codes assigned for transactions where no currency is involved</CcyNm>
            <Ccy>XXX</Ccy>
            <CcyNbr>999</CcyNbr>
            <CcyMnrUnts>N.A.</CcyMnrUnts>
        </CcyNtry>
    </CcyTbl>
</ISO_4217>

I tried doing

def parse_xml(xml_file):
    tree = ET.parse(xml_file)
    root = tree.getroot()
    return root

def extract_data(root):
    data = []
    for record in root.findall('CcyNtry'):
        row = {}
        for field in record:
            row[field.tag] = field.text
        data.append(row)
    return data

def main(xml_file, csv_file):
    root = parse_xml(xml_file)
    data = extract_data(root)
    df = to_dataframe(data)
    to_csv(df, csv_file)

main(countries, 'output.csv')

However, this just seems to produce an empty file. Does anyone know what I'm doing wrong here? Or is there a simpler way to turn the data from this XML into a dataframe?

4 Answers

Version with BeautifulSoup:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.six-group.com/dam/download/financial-information/data-center/iso-currrency/lists/list-one.xml"

soup = BeautifulSoup(requests.get(url).content, "xml")

data = []
for c in soup.select("CcyNtry"):
    data.append({t.name: t.text for t in c.find_all()})

df = pd.DataFrame(data)
print(df.head(10))

# to save to CSV use:
# df.to_csv('data.csv', index=False)

Prints:

                CtryNm                  CcyNm  Ccy CcyNbr CcyMnrUnts
0          AFGHANISTAN                Afghani  AFN    971          2
1        ÅLAND ISLANDS                   Euro  EUR    978          2
2              ALBANIA                    Lek  ALL    008          2
3              ALGERIA         Algerian Dinar  DZD    012          2
4       AMERICAN SAMOA              US Dollar  USD    840          2
5              ANDORRA                   Euro  EUR    978          2
6               ANGOLA                 Kwanza  AOA    973          2
7             ANGUILLA  East Caribbean Dollar  XCD    951          2
8           ANTARCTICA  No universal currency  NaN    NaN        NaN
9  ANTIGUA AND BARBUDA  East Caribbean Dollar  XCD    951          2

Or, with pandas alone (note that CcyNbr is inferred as a number here, so 008 becomes 8.0):

df = pd.read_xml(url, xpath=".//CcyNtry")
print(df)

Prints:

                                                         CtryNm                                                              CcyNm   Ccy  CcyNbr CcyMnrUnts
0                                                   AFGHANISTAN                                                            Afghani   AFN   971.0          2
1                                                 ÅLAND ISLANDS                                                               Euro   EUR   978.0          2
2                                                       ALBANIA                                                                Lek   ALL     8.0          2
3                                                       ALGERIA                                                     Algerian Dinar   DZD    12.0          2
4                                                AMERICAN SAMOA                                                          US Dollar   USD   840.0          2
5                                                       ANDORRA                                                               Euro   EUR   978.0          2

...

Without any external library:

import csv
import xml.etree.ElementTree as ET

xml = '''<ISO_4217 Pblshd="2024-01-01">
    <CcyTbl>
        <CcyNtry>
            <CtryNm>AFGHANISTAN</CtryNm>
            <CcyNm>Afghani</CcyNm>
            <Ccy>AFN</Ccy>
            <CcyNbr>971</CcyNbr>
            <CcyMnrUnts>2</CcyMnrUnts>
        </CcyNtry>
        <CcyNtry>
            <CtryNm>ZZ07_No_Currency</CtryNm>
            <CcyNm>The codes assigned for transactions where no currency is involved</CcyNm>
            <Ccy>XXX</Ccy>
            <CcyNbr>999</CcyNbr>
            <CcyMnrUnts>N.A.</CcyMnrUnts>
        </CcyNtry>
    </CcyTbl>
</ISO_4217>'''

headers = None
data = []
root = ET.fromstring(xml)
for entry in root.findall('.//CcyNtry'):
    if not headers:
        headers = [e.tag for e in list(entry)]
    data.append([e.text for e in list(entry)])

with open("my_file.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(data)

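You were really close. root.findall('CcyNtry') only searches the direct children of the root element, and the only direct child of ISO_4217 is CcyTbl, so it finds nothing. You just need to get the intermediate CcyTbl node and find the CcyNtry nodes beneath that: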

def extract_data(root):
    table = root.find("CcyTbl")

    data = []
    for record in table.findall("CcyNtry"):
        ...

Or, like balderman's solution shows, use some pseudo-XPath to look anywhere down the tree for the CcyNtry nodes, root.findall('.//CcyNtry').
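To make the difference between the three lookups concrete, here is a minimal sketch using only the standard library. The XML string is a trimmed-down stand-in for the file in the question (only two child elements per entry), and the variable names are just for illustration:

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's file; same nesting, fewer fields.
xml_text = """<ISO_4217 Pblshd="2024-01-01">
    <CcyTbl>
        <CcyNtry><CtryNm>AFGHANISTAN</CtryNm><Ccy>AFN</Ccy></CcyNtry>
        <CcyNtry><CtryNm>ZZ07_No_Currency</CtryNm><Ccy>XXX</Ccy></CcyNtry>
    </CcyTbl>
</ISO_4217>"""

root = ET.fromstring(xml_text)

# A bare tag name only matches direct children of root (here: just CcyTbl).
direct = root.findall("CcyNtry")

# Step down to CcyTbl first, then look at its direct children.
via_table = root.find("CcyTbl").findall("CcyNtry")

# ".//" searches all descendants, at any depth.
anywhere = root.findall(".//CcyNtry")

print(len(direct), len(via_table), len(anywhere))  # 0 2 2
```

The first lookup is what the original extract_data() does, which is why the CSV came out empty.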

Now, data looks like:

[
    {
        "CtryNm": "AFGHANISTAN",
        "CcyNm": "Afghani",
        "Ccy": "AFN",
        "CcyNbr": "971",
        "CcyMnrUnts": "2",
    },
    {
        "CtryNm": "ZZ07_No_Currency",
        "CcyNm": "The codes assigned for transactions where no currency is involved",
        "Ccy": "XXX",
        "CcyNbr": "999",
        "CcyMnrUnts": "N.A.",
    },
]

From that, you could just pass it to Python's csv.DictWriter. You have to tell the DictWriter which keys to expect/use, so just pass in the first row of data:

import csv

with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=data[0])
    writer.writeheader()
    writer.writerows(data)

| CtryNm           | CcyNm                                                             | Ccy | CcyNbr | CcyMnrUnts |
|------------------|-------------------------------------------------------------------|-----|--------|------------|
| AFGHANISTAN      | Afghani                                                           | AFN | 971    | 2          |
| ZZ07_No_Currency | The codes assigned for transactions where no currency is involved | XXX | 999    | N.A.       |

You can also tell the DictWriter to use specific fields, and ignore any keys it finds that don't match with the extrasaction keyword:

writer = csv.DictWriter(f, fieldnames=["CtryNm", "CcyNm"], extrasaction="ignore")

| CtryNm           | CcyNm                                                             |
|------------------|-------------------------------------------------------------------|
| AFGHANISTAN      | Afghani                                                           |
| ZZ07_No_Currency | The codes assigned for transactions where no currency is involved |
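One thing to watch for with the full list: judging by the NaN values in the pandas output of another answer, some entries (ANTARCTICA, for example) appear to lack some of the child elements. Below is a rough sketch of the whole pipeline that tolerates that, by naming the five columns up front and letting DictWriter fill the gaps via restval. The function name entries_to_csv, the FIELDS constant, and the sample XML (including the shape of the ANTARCTICA entry) are my own assumptions for the demo, not something taken from the real file:

```python
import csv
import io
import xml.etree.ElementTree as ET

# The five columns from the question, in the desired order.
FIELDS = ["CtryNm", "CcyNm", "Ccy", "CcyNbr", "CcyMnrUnts"]


def entries_to_csv(xml_text, out):
    """Write one CSV row per CcyNtry element to the file-like object `out`."""
    root = ET.fromstring(xml_text)
    writer = csv.DictWriter(
        out,
        fieldnames=FIELDS,
        restval="",             # value written for keys missing from a row
        extrasaction="ignore",  # silently skip unexpected child tags
    )
    writer.writeheader()
    for entry in root.iterfind(".//CcyNtry"):
        writer.writerow({child.tag: child.text for child in entry})


# Hypothetical sample: the second entry has no Ccy, CcyNbr or CcyMnrUnts.
sample = """<ISO_4217>
  <CcyTbl>
    <CcyNtry>
      <CtryNm>AFGHANISTAN</CtryNm><CcyNm>Afghani</CcyNm>
      <Ccy>AFN</Ccy><CcyNbr>971</CcyNbr><CcyMnrUnts>2</CcyMnrUnts>
    </CcyNtry>
    <CcyNtry>
      <CtryNm>ANTARCTICA</CtryNm><CcyNm>No universal currency</CcyNm>
    </CcyNtry>
  </CcyTbl>
</ISO_4217>"""

buffer = io.StringIO()
entries_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass open("output.csv", "w", newline="") as the second argument. Because everything stays a string, codes such as 008 keep their leading zeros.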

You can use pandas read_xml() and to_csv():

import pandas as pd

url = "https://www.six-group.com/dam/download/financial-information/data-center/iso-currrency/lists/list-one.xml"

df = pd.read_xml(url, xpath=".//CcyNtry")
df.to_csv("output.csv")

The output file looks like:

,CtryNm,CcyNm,Ccy,CcyNbr,CcyMnrUnts
0,AFGHANISTAN,Afghani,AFN,971.0,2
1,ÅLAND ISLANDS,Euro,EUR,978.0,2
2,ALBANIA,Lek,ALL,8.0,2
….
