
I have been trying to open an Excel file (in both xlsx and csv format) using Python pandas, and I am getting UTF-8 encoding errors. I have also tried passing different encoding values, but that did not solve the issue.

Please help me understand and solve the issue.

This is the code:

import pandas as pd
excel_file = 'Task1/Data_task1.xlsx'
data =  pd.read_excel(excel_file, encoding='utf-8', errors = 'ignore')
print(data)

Error:

File "c:\Users\nivas\Desktop\Srinivas\Internship\Dealroomo\Task1\task1.py", line 4, in <module>
    print(data)
  File "C:\Users\nivas\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 3140-3145: character maps to <undefined>
  • Does this answer your question? UnicodeEncodeError: 'charmap' codec can't encode characters Commented Dec 30, 2019 at 12:27
  • You need to pass the correct value for encoding. Since it is an Excel file, maybe encoding='iso8859-1' can help. Commented Dec 30, 2019 at 12:32
  • cp1252 and cp1251 are common as well. You need to figure out what encoding is used on your Excel file. Take a look here. Commented Dec 30, 2019 at 12:36
  • The traceback shows that the problem is in print(), not read_excel(), so the culprit is the Windows terminal (console/cmd.exe), which here uses cp1252 as its default encoding: print() tries to encode the displayed data to cp1252 and fails. Some people change the default encoding in the Windows registry. Search for "encoding windows registry 65001". Commented Dec 30, 2019 at 12:51
  • Change default code page of Windows console to UTF-8 Commented Dec 30, 2019 at 12:54
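
To illustrate the point made in the comments above (print() failing on a cp1252 console, not read_excel()), here is a minimal sketch. The helper names console_safe and use_utf8_stdout and the sample string are made up for the example; sys.stdout.reconfigure is assumed to be available, which requires Python 3.7 or later.

```python
import sys


def console_safe(text, encoding="cp1252"):
    """Return text with characters the given codec cannot encode replaced by '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


def use_utf8_stdout():
    """Switch sys.stdout to UTF-8 (Python 3.7+). Return True if it was switched."""
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is None:
        # stdout was replaced by an object without reconfigure(); leave it alone.
        return False
    reconfigure(encoding="utf-8", errors="replace")
    return True


# Hypothetical sample: 'é' exists in cp1252, the Cyrillic letter does not.
sample = "caf\u00e9 \u0416"

# This is what print() effectively attempts on a cp1252 console.
failed = None
try:
    sample.encode("cp1252")
except UnicodeEncodeError as exc:
    failed = exc  # same exception class as in the traceback above

use_utf8_stdout()
print(console_safe(sample))  # lossy fallback that any cp1252 console can show
```

Calling use_utf8_stdout() before print(data) avoids the exception inside Python; whether the characters then display correctly still depends on the console's code page and font. console_safe is a lossy fallback when the console cannot be changed.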

1 Answer

In my experience, text exported from Excel and Python do not play well together, and many times the encoding simply does not work; I do not know why.

Two possible solutions:

  1. Convert the file to CSV (.txt/.csv format) and see if you can choose the encoding manually inside Excel.

  2. Run the program on Ubuntu Linux, using LibreOffice instead of Excel. Again, you will need to convert the file to .csv. However, LibreOffice seems to handle encodings much better than Excel. For whatever reason, Excel can refuse to convert or remove the unusual characters that raise a UnicodeError in Python.
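
Once the file has been converted to CSV as suggested above, one way to find a usable encoding is to try a few candidates on the raw bytes. This is only a rough sketch: the function name first_working_encoding and the candidate list are my own choices, and a decode that succeeds does not prove the encoding is the right one (iso8859-1 in particular accepts any byte sequence).

```python
def first_working_encoding(raw, candidates=("utf-8-sig", "cp1252", "iso8859-1")):
    """Return the first candidate codec that decodes raw without an error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical byte samples standing in for the contents of an exported CSV.
legacy_bytes = "caf\u00e9".encode("cp1252")  # b'caf\xe9' is not valid UTF-8
utf8_bytes = "caf\u00e9".encode("utf-8")

legacy_guess = first_working_encoding(legacy_bytes)
utf8_guess = first_working_encoding(utf8_bytes)
print(legacy_guess, utf8_guess)
```

For a real file you would read the bytes with open(path, "rb").read() and pass the result on as pd.read_csv(path, encoding=...). Note that an .xlsx workbook is not a plain-text file, so this guessing only applies to the CSV export.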

Best of luck
