All Questions

0 votes
1 answer
136 views

PDF Scraping - All Objects Passed were None

I am attempting to create a simple PDF scraper using pandas and pdfquery. I want to take the data I need from each page of the PDF by using the XML coordinates, put it into a dataframe and then save ...
Andrew Martin
0 votes
0 answers
49 views

Error in pluck: object not found -- trying to create loop to scrape data from multiple PDFs with uniform formatting

Thanks to other articles on this website, I managed to put together a script that will do the following: Collect PDF file names from directory and put into a list. Start a data frame using target ...
Hana Peri
1 vote
0 answers
3k views

How to extract data from messy PDF file with no standard formatting?

I am working on this PDF file to parse the tabular data out of it. I was hoping to use tabula or PyPDF2 to extract tables out of it but the data in PDF is not stored in tables. So, I chose pdfplumber ...
Aamir Khan Maarofi
0 votes
1 answer
148 views

Pandas DataFrame combine multi row spanning column

I have a complex scraped dataframe that looks like this: For context, the original data from a PDF looks like so: DataFrame info: <class 'pandas.core.frame.DataFrame'> RangeIndex: 26 entries, ...
user1757703