Reading an irregular column CSV file using Pandas causes errors

Question

I have a CSV file:

"==CNAME=="
""
"Tool Name","v2.1"
"Name:","MATT B"
"E-Mail:","[email protected]"
"Phone Number:","987654321"
""

When I run the following script,

import pandas as pd
import os

def convert_csv_to_xlsx(csv_file_path):
    csv_file = r"C:\Users\aztec\Desktop\del.csv"
    df_csv = pd.read_csv(csv_file)
    excel_file = os.path.splitext(csv_file)[0] + '.xlsx'
    df_csv.to_excel(excel_file, index=False)
    return excel_file

I get the following error,

Error reading the CSV file, Error tokenizing data. C error: Expected 1 fields in line 3, saw 2

I tried specifying delimiters, but the same error persists. I want to know why this is happening and how to resolve it.

The CSV file is not regular (it has a variable number of fields), and pandas doesn't seem to support that. I would use the csv module and openpyxl to convert it, not pandas. - Jean-François Fabre, Commented Nov 30, 2024 at 22:31
Your file is not structured the way a regular CSV file is: you have rows of key, value instead of rows whose columns hold values for a specific dimension (with the first line as the header). The error says that a line contained more fields than expected. Since the first line has only one column, one column is what's expected for the rest of the file as well. - MatsLindh, Commented Nov 30, 2024 at 22:31
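
To make the point in these comments concrete, here is a small standard-library sketch that counts the fields on each line of a shortened copy of the sample file (the e-mail row is left out). The helper names `field_counts` and `first_overflow` are made up for illustration. The check only imitates the idea behind the error message, namely comparing each line against the width of the first line; it is not pandas' actual parser.

```python
import csv
import io

# Shortened copy of the file from the question (e-mail row omitted).
SAMPLE = '''"==CNAME=="
""
"Tool Name","v2.1"
"Name:","MATT B"
"Phone Number:","987654321"
""
'''

def field_counts(text):
    """Return the number of fields csv.reader finds on each line."""
    return [len(row) for row in csv.reader(io.StringIO(text))]

def first_overflow(counts):
    """Find the first line with more fields than line 1 (hypothetical helper).

    Returns (line_number, expected, seen), or None if no line is wider.
    """
    expected = counts[0]
    for line_no, seen in enumerate(counts, start=1):
        if seen > expected:
            return line_no, expected, seen
    return None

counts = field_counts(SAMPLE)
print(counts)                  # [1, 1, 2, 2, 2, 1]
print(first_overflow(counts))  # (3, 1, 2)
```

The tuple `(3, 1, 2)` lines up with the reported message "Expected 1 fields in line 3, saw 2": the first line sets the expected width to one field, and line 3 is the first to exceed it.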

Charles Duffy · Accepted Answer · 2024-12-01 03:13:35Z

Pandas is designed to work with data in a consistent format, where a CSV file contains a table structured as rows (one per record) and columns (one per data element within each record). Your CSV file doesn't qualify: its rows have different numbers of fields, and there is no consistent relationship between column and meaning. It therefore makes sense to avoid pandas and read the file with the standard-library csv module, while writing the output directly with an XLSX library of your choice. If you like, you can pick the same one pandas uses; pandas doesn't include Excel support itself either and relies on a library dependency for that purpose.

import csv
import os.path
import sys

from openpyxl import Workbook

def convert_csv_to_xlsx(csv_file_path):
    # Derive the output name by swapping the extension for .xlsx.
    excel_filename = os.path.splitext(csv_file_path)[0] + '.xlsx'
    print(f'Writing to {excel_filename!r}', file=sys.stderr)

    wb = Workbook()
    ws = wb.active
    # newline='' lets the csv module handle line endings itself.
    with open(csv_file_path, 'r', newline='') as csv_file:
        # Rows may differ in length; append() writes each one as-is.
        for row in csv.reader(csv_file):
            ws.append(row)
    wb.save(excel_filename)
    return excel_filename

convert_csv_to_xlsx('del.csv')
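
If a DataFrame is still needed, one possible workaround (not part of the answer above, and only a sketch) is to tell pandas up front how many columns to expect. It assumes that no row has more than `max_fields` fields; shorter rows are then padded with NaN. The helper name `read_ragged_csv` and the column names `col0`, `col1` are made up for illustration, and the sample data is a shortened copy of the file from the question.

```python
import io

import pandas as pd

# Shortened copy of the file from the question (e-mail row omitted).
SAMPLE = '''"==CNAME=="
""
"Tool Name","v2.1"
"Name:","MATT B"
"Phone Number:","987654321"
'''

def read_ragged_csv(source, max_fields=2):
    """Read a CSV whose rows have at most max_fields fields (hypothetical helper).

    header=None stops pandas from treating the first line as a header, and
    names= fixes the column count, so shorter rows are filled with NaN.
    """
    names = [f"col{i}" for i in range(max_fields)]
    # dtype=str keeps values such as the phone number as text.
    return pd.read_csv(source, header=None, names=names, dtype=str)

df = read_ragged_csv(io.StringIO(SAMPLE))
print(df)
```

This only sidesteps the tokenizer error. The result is still a list of key/value lines rather than a real table, so the csv-module approach above remains the better fit for a plain conversion to XLSX.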
