
I am trying to read an .xls file from a URL into a pandas DataFrame. However, I am getting the following AssertionError:

Traceback (most recent call last):
  File "c:\Sample_project\venv\excel_read_v1.py", line 17, in <module>
    df = pd.read_excel("file.xls",
  File "C:\Sample_project\venv\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Sample_project\venv\lib\site-packages\pandas\io\excel\_base.py", line 457, in read_excel
    io = ExcelFile(io, storage_options=storage_options, engine=engine)
  File "C:\Sample_project\venv\lib\site-packages\pandas\io\excel\_base.py", line 1419, in __init__
    self._reader = self._engines[engine](self.io, storage_options=storage_options)
  File "C:\Sample_project\venv\lib\site-packages\pandas\io\excel\_xlrd.py", line 25, in __init__
    super().__init__(filepath_or_buffer, storage_options=storage_options)
  File "C:\Sample_project\venv\lib\site-packages\pandas\io\excel\_base.py", line 518, in __init__
    self.book = self.load_workbook(self.handles.handle)
  File "C:\Sample_project\venv\lib\site-packages\pandas\io\excel\_xlrd.py", line 38, in load_workbook
    return open_workbook(file_contents=data)
  File "C:\Sample_project\venv\lib\site-packages\xlrd\__init__.py", line 172, in open_workbook
    bk = open_workbook_xls(
  File "C:\Sample_project\venv\lib\site-packages\xlrd\book.py", line 104, in open_workbook_xls
    bk.parse_globals()
  File "C:\Sample_project\venv\lib\site-packages\xlrd\book.py", line 1211, in parse_globals
    self.handle_sst(data)
  File "C:\Sample_project\venv\lib\site-packages\xlrd\book.py", line 1178, in handle_sst
    self._sharedstrings, rt_runlist = unpack_SST_table(strlist, uniquestrings)
  File "C:\Sample_project\venv\lib\site-packages\xlrd\book.py", line 1472, in unpack_SST_table
    assert _unused_i == nstrings - 1
AssertionError
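The failing line is a plain `assert` statement inside xlrd's `unpack_SST_table`, which appears to check that the number of strings unpacked from the workbook's shared string table (SST) matches the count the file declares. Python removes `assert` statements when it runs with the `-O` flag, so one blunt workaround, assuming the rest of the workbook parses cleanly, is to run the script as `python -O excel_read_v1.py`. The snippet below only demonstrates the `-O` behaviour itself; whether xlrd then returns complete data for this particular file is unverified, so it is worth comparing the result against a manually cleaned copy.

```python
import subprocess
import sys

# Run a one-line program under `python -O`. With -O the assert statement is
# compiled out, so execution continues to the print call.
optimized = subprocess.run(
    [sys.executable, "-O", "-c", "assert False; print('assert skipped')"],
    capture_output=True,
    text=True,
)
print(optimized.stdout.strip())  # assert skipped

# The same program without -O stops with an AssertionError (non-zero exit code).
normal = subprocess.run(
    [sys.executable, "-c", "assert False; print('assert skipped')"],
    capture_output=True,
    text=True,
)
print(normal.returncode != 0)  # True
```

Note that `-O` disables every `assert` in the process, not only the one in xlrd, which is why it is a workaround rather than a fix.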

I read other suggestions on Stack Overflow that removing the last few empty rows from the Excel file would make it work. I tried that by downloading the file to a local folder, removing the last two empty rows, and then reading the file from the local folder, and that works. But I need the code to handle this while reading from the URL so that the process can be automated.

I have tried using openpyxl and xlrd to read the file.
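openpyxl only reads the Office Open XML formats (.xlsx, .xlsm and their templates), so it cannot open a legacy .xls workbook, while xlrd 2.0 and later reads only legacy .xls. Before picking a reader it can help to check what the download actually is. A minimal sketch, relying on the fact that legacy .xls files are OLE2 compound documents (signature `D0 CF 11 E0 A1 B1 1A E1`) and .xlsx files are ZIP archives (starting with `PK\x03\x04`); the markup check is only a heuristic, and `sniff_excel_format` is a hypothetical helper name.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (BIFF inside an OLE2 container)
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx / .xlsm (OOXML zip package)


def sniff_excel_format(content: bytes) -> str:
    """Guess the real format of downloaded spreadsheet bytes (hypothetical helper)."""
    if content.startswith(OLE2_MAGIC):
        return "xls"   # pandas engine="xlrd"
    if content.startswith(ZIP_MAGIC):
        return "xlsx"  # pandas engine="openpyxl"
    # Heuristic: some servers send an HTML table (or XML) under an .xls name.
    if content[:512].lstrip().startswith(b"<"):
        return "html"  # try pandas.read_html instead
    return "unknown"


print(sniff_excel_format(OLE2_MAGIC + b"\x00" * 8))    # xls
print(sniff_excel_format(b"PK\x03\x04rest-of-zip"))    # xlsx
print(sniff_excel_format(b"  <html><table></table>"))  # html
```

With the code below, this would be called as `sniff_excel_format(r.content)` right after the download.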

Code snapshot:

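import requests
import pandas as pd

url = "..."  # actual URL masked in the question

r = requests.get(url)
r.raise_for_status()  # stop on HTTP errors instead of saving an error page
# use the same file name for writing and reading
with open("maskedfile.xls", "wb") as output:
    output.write(r.content)

df = pd.read_excel("maskedfile.xls", sheet_name="maskedsheetname")

# raw string so the backslashes in the Windows path are not treated as escapes
df.to_csv(r"C:\Sample_project\maskedfile.csv", index=False)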
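The intermediate file is optional: `pandas.read_excel` also accepts a file-like object, so the downloaded bytes can be wrapped in `io.BytesIO`. A minimal sketch, where the URL and sheet name are the masked placeholders from the question and `read_excel_from_url` is a hypothetical name. The same bytes still reach xlrd, so on its own this does not avoid the AssertionError; it only removes the temporary-file step.

```python
import io

import pandas as pd
import requests


def read_excel_from_url(url: str, sheet_name: str) -> pd.DataFrame:
    """Download a legacy .xls workbook and parse it without writing to disk."""
    response = requests.get(url, timeout=60)
    response.raise_for_status()  # fail early on HTTP errors
    # read_excel accepts file-like objects; "xlrd" is the pandas engine for legacy .xls
    return pd.read_excel(
        io.BytesIO(response.content), sheet_name=sheet_name, engine="xlrd"
    )


# Usage (placeholders, as in the question):
# df = read_excel_from_url("...", sheet_name="maskedsheetname")
# df.to_csv(r"C:\Sample_project\maskedfile.csv", index=False)
```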
  • Please provide enough code so others can better understand or reproduce the problem.
    – Community Bot
    Commented Jun 24, 2022 at 20:10
  • Masked code added. Commented Jun 27, 2022 at 13:24
  • Any suggestions? One workaround I was given is also failing: I have been asked to check whether I can convert the xls file to xlsx, but even for that I need to read the xls file. Every package I have tried fails in one way or another. Commented Jul 1, 2022 at 11:29
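One way to convert .xls to .xlsx without parsing it in Python is to let LibreOffice do the conversion headlessly; it uses its own .xls reader, which may tolerate the inconsistency that trips xlrd (untested against this file). A minimal sketch, assuming LibreOffice is installed and `soffice` is on PATH (otherwise pass its full path); `build_convert_command` and `convert_xls_to_xlsx` are hypothetical helpers.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_convert_command(src: Path, outdir: Path, soffice: str = "soffice") -> list[str]:
    """Command line for a headless LibreOffice .xls -> .xlsx conversion."""
    return [soffice, "--headless", "--convert-to", "xlsx", "--outdir", str(outdir), str(src)]


def convert_xls_to_xlsx(src: Path, outdir: Path, soffice: str = "soffice") -> Path:
    """Run the conversion and return the path LibreOffice is expected to write."""
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(src, outdir, soffice), check=True)
    # LibreOffice keeps the base name and swaps the extension.
    return outdir / (src.stem + ".xlsx")


# Usage (requires LibreOffice; file and sheet names are the masked ones from the question):
# xlsx_path = convert_xls_to_xlsx(Path("maskedfile.xls"), Path("converted"))
# df = pd.read_excel(xlsx_path, sheet_name="maskedsheetname", engine="openpyxl")
```

Because the conversion runs as a separate process, it slots into the same automated download step without any manual editing of the file.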
