All Questions
4 questions
1 vote, 1 answer, 286 views
Encoding Issue When Attempting to Convert Hindi Script PDF to CSV in Python
I'm currently attempting to convert a PDF file containing Hindi Devanagari script to a CSV file using the fitz library in Python, but when I read in the text I encounter a strange encoding issue.
Here ...
1 vote, 0 answers, 80 views
PDF scraping with tabula-py: columns do not correspond with "true" values of PDF file
I'm stuck again on PDF scraping: some of the values I obtain end up in columns they do not belong to. Basically, I want to obtain a CSV file, but first I want to ...
0 votes, 0 answers, 244 views
Python Tabula: Reading in PDF to Python as Pandas Dataframe
I scrape PDF data from a website, but they changed their PDF formatting, so the solution that worked for every other PDF no longer works. I'm unsure of an alternative method.
Hello everyone,
I am trying to ...
1 vote, 0 answers, 172 views
Tabula-py: reading tables from a pdf that contains form fields
I'm trying to read a PDF that contains multiple tables with form fields for ticks/checkmarks, free text, numbers, dropdown selections, etc.
Unfortunately the dataframes that are returned don't ...