All Questions

1 vote
1 answer
286 views

Encoding Issue When Attempting to Convert Hindi Script PDF to CSV in Python

I'm currently attempting to convert a PDF file containing Hindi text in Devanagari script to a CSV file using the fitz (PyMuPDF) library in Python, but when I read in the text I encounter a strange encoding issue. Here ...
cedratcarlisle
1 vote
0 answers
80 views

PDF scraping, tabula-py - columns do not correspond with "true" values of PDF file

I am stuck again with PDF scraping: the columns do not correspond to some of the values that I obtain for them. Basically, I want to obtain a CSV file, but first I want to ...
Michael Picazo
0 votes
0 answers
244 views

Python Tabula: Reading in PDF to Python as Pandas Dataframe

I am scraping PDF data from a website, but they changed their PDF formatting, so I can no longer use the solution that worked for every other PDF. I am unsure of an alternative method. Hello everyone, I am trying to ...
jare2620
1 vote
0 answers
172 views

Tabula-py: reading tables from a pdf that contains form fields

I'm trying to read a PDF that contains multiple tables with form fields for ticks/checkmarks, free text, numbers, dropdown selections, etc. Unfortunately, the dataframes that are returned don't ...
gokepler