Replace xml value based on csv columns

Question

I am trying to replace class names in XML files based on CSV columns. The XML files are annotation files.

This is the format of the XML:

<annotation>
<folder>./test_xmls</folder>
<filename>000048_Panorama.jpg</filename>
<path>./images000048_Panorama.jpg</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>4000</width>
<height>2000</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>AAAA</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>

My CSV contains an "original" column and a "change to" column.

The format is:

| original | change to |
|----------|-----------|
| AAAA     | class_A   |
| ...      | ...       |

The CSV has more than 20000 rows, which cover all the <name>AAAA</name> values of 80000 XML files.

I want to match each XML name, such as AAAA, against the CSV. If it exists in the "original" column, I want to replace it with the corresponding value from the "change to" column, for example AAAA becomes class_A.

I tried to write Python code, but it doesn't work. My code is here:

import xml.etree.ElementTree as ET
import os
import pandas as pd
from collections import defaultdict
import csv
from csv import reader


with open('table.csv', mode='r') as inp:
    reader = csv.reader(inp)
    dict_from_csv = {rows[0]:rows[2] for rows in reader}

#print(dict_from_csv)

root_path = "./xmls"

xml_list = sorted(os.listdir(root_path))

for xml_file in xml_list:
    xml_path = os.path.join(root_path,xml_file)
    # parse xml file
    tree = ET.parse(xml_path)
    # get root node
    root = tree.getroot()
    for member in root.findall('object'):
        sub_child = member[0].text
        print(sub_child)
    for key, value in dict_from_csv.items():
        if sub_child in key:
            sub_child = sub_child.replace(sub_child, value)
            #print(xml)
        xml_file.write(sub_child)  
        print("Classes are changed : " + xml_path)

Any help would be appreciated.

Thank you

Kafka4PresidentNow · Accepted Answer · 2021-07-15 15:17:29Z

The following code should do what you want:

import lxml.html   # check https://pypi.org/project/lxml/
from csv import reader
from os.path import exists
import glob


def update_xml(path: str) -> None:
    with open('./convertions.csv', 'r') as convertions, open(path, 'r') as annotation:  # noqa: E501
        tree = lxml.html.fromstring(annotation.read())
        csv_reader = reader(convertions)

        for idx, row in enumerate(csv_reader, start=1):
            if idx == 1:
                continue

            original, change_to = row

            tags = tree.xpath(f".//name[text()='{original}']")

            for tag in tags:
                tag.text = change_to

                print(f'Changed class {original} to {change_to} in {path}')

    with open(path, 'wb') as annotation:
        new_content = lxml.html.tostring(tree)

        if new_content.strip():
            annotation.write(new_content)

    print(f'Processing on {path} done')


if __name__ == '__main__':
    for xml_file in glob.glob('*.xml'):
        if exists(xml_file):
            update_xml(path=xml_file)

annotation.xml:

<annotation>
<folder>./test_xmls</folder>
<filename>000048_Panorama.jpg</filename>
<path>./images000048_Panorama.jpg</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>4000</width>
<height>2000</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>AAAA</name>
<name>BBBB</name>
<name>CCCC</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox></bndbox></object></annotation>

convertions.csv:

original,change to
AAAA,class_A
BBBB,class_B
CCCC,class_C
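
For comparison, here is a minimal sketch of the same replacement using only the standard library and a dictionary lookup. It is not the code above, just one possible variant: the helper name `rename_classes` and the `ZZZZ` entry are made up for illustration, and the sample is a trimmed-down version of annotation.xml. With more than 20000 CSV rows, looking each <name> up in a dict once may be cheaper than running one XPath query per CSV row.

```python
import xml.etree.ElementTree as ET


def rename_classes(xml_text: str, mapping: dict) -> str:
    """Return xml_text with every <name> found in mapping replaced.

    Hypothetical helper for illustration; not part of any library.
    """
    root = ET.fromstring(xml_text)
    for name in root.iter('name'):
        # name.text can be None for an empty tag, so guard before stripping.
        new_name = mapping.get((name.text or '').strip())
        if new_name is not None:
            name.text = new_name
    return ET.tostring(root, encoding='unicode')


# Trimmed-down sample; ZZZZ is an invented name that is not in the mapping.
sample = (
    '<annotation><object>'
    '<name>AAAA</name><name>BBBB</name><name>ZZZZ</name>'
    '</object></annotation>'
)
mapping = {'AAAA': 'class_A', 'BBBB': 'class_B', 'CCCC': 'class_C'}

result = rename_classes(sample, mapping)
print(result)
```

Names that do not appear in the mapping (ZZZZ here) are left as they are.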

Hi @Caiolgnm, thanks for your effort. I do not want to create a new XML file. I already have multiple XML files and just want to update them based on the CSV. I tried to modify your code, but it still throws an error like "too many values to unpack". Can you help me? - user12321371, Commented Jul 15, 2021 at 2:46
The code here does not create a new XML file; it updates the XML file passed to it in place. It opens the CSV file, checks each line, and uses its values to update the XML file. Is that not what you want? - Kafka4PresidentNow, Commented Jul 15, 2021 at 12:50
Thank you for your time. Yes, I want to update the same XML. I think your answer opens only one XML file. I was trying to open multiple XML files through glob, but it didn't go well. I have to update more than 70k XML files. - user12321371, Commented Jul 15, 2021 at 13:58
I've edited the code to meet your criteria. Please upvote and accept the answer if it is useful to you. - Kafka4PresidentNow, Commented Jul 15, 2021 at 15:18
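
On the "too many values to unpack" error mentioned in the comments: it typically appears when a row has more columns than the two names it is unpacked into. The code in the question reads rows[2], which hints that the real table.csv may have more than two columns, but that is only a guess. Below is a rough sketch of a loader that picks columns by index instead of unpacking. The function name `load_mapping`, its parameters, and the extra "note" column in the demo data are all assumptions for illustration.

```python
import csv
import io


def load_mapping(csv_file, key_col: int = 0, value_col: int = 1) -> dict:
    """Build {original: change_to} from an open CSV file object.

    Hypothetical helper: key_col and value_col are zero-based column
    indices chosen by the caller.
    """
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row, if there is one
    needed = max(key_col, value_col) + 1
    mapping = {}
    for row in reader:
        # Blank or short lines are skipped instead of raising an error.
        if len(row) < needed:
            continue
        mapping[row[key_col].strip()] = row[value_col].strip()
    return mapping


# Demo data with an invented middle column and a blank line.
demo = io.StringIO(
    "original,note,change to\n"
    "AAAA,first,class_A\n"
    "\n"
    "BBBB,second,class_B\n"
)
mapping = load_mapping(demo, key_col=0, value_col=2)
print(mapping)
```

For a real file, something like `with open('table.csv', newline='') as f: mapping = load_mapping(f)` should work, with the indices adjusted to the actual layout.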

user12321371 · Accepted Answer · 2021-07-15 02:48:43Z

import csv
import os
import xml.etree.ElementTree as ET

if __name__ == '__main__':
    # Read the CSV once into a lookup table: original name -> new name.
    with open('./table.csv', 'r', newline='') as convertions:
        csv_reader = csv.reader(convertions)
        next(csv_reader, None)  # skip the header row
        # Index the columns instead of unpacking the whole row, so rows with
        # extra columns do not raise "too many values to unpack".
        # Assumes the first two columns hold the names; adjust if they do not.
        mapping = {row[0].strip(): row[1].strip()
                   for row in csv_reader if len(row) >= 2}

    root_path = "./xmls"

    for xml_file in sorted(os.listdir(root_path)):
        if not xml_file.endswith('.xml'):
            continue
        xml_path = os.path.join(root_path, xml_file)
        # parse xml file
        tree = ET.parse(xml_path)
        changed = False

        # The class name is the <name> tag directly under each <object>.
        for tag in tree.getroot().findall('./object/name'):
            change_to = mapping.get((tag.text or '').strip())
            if change_to is not None:
                print(f'Changed class {tag.text} to {change_to} in {xml_path}')
                tag.text = change_to
                changed = True

        # Write back to the same file, and only when something changed.
        if changed:
            tree.write(xml_path)
