Newest 'pdf-scraping+dataframe' Questions

0 votes

1 answer

136 views

PDF Scraping - All Objects Passed were None

I am attempting to create a simple PDF scraper using pandas and pdfquery. I want to take the data I need from each page of the PDF by using the XML coordinates, put it into a DataFrame and then save ...

Andrew Martin

1

asked Oct 24, 2023 at 18:08

0 votes

0 answers

49 views

Error in pluck: object not found -- trying to create loop to scrape data from multiple PDFs with uniform formatting

Thanks to other articles on this website, I managed to put together a script that will do the following: Collect PDF file names from directory and put into a list. Start a data frame using target ...

Hana Peri

1

asked Dec 19, 2022 at 21:01

1 vote

0 answers

3k views

How to extract data from messy PDF file with no standard formatting?

I am working on this PDF file to parse the tabular data out of it. I was hoping to use tabula or PyPDF2 to extract the tables, but the data in the PDF is not stored in tables. So, I chose pdfplumber ...

Aamir Khan Maarofi

157

asked Dec 14, 2021 at 12:33

0 votes

1 answer

148 views

Pandas DataFrame combine multi row spanning column

I have a complex scraped dataframe that looks like this: For context, the original data from a PDF looks like so: DataFrame info: <class 'pandas.core.frame.DataFrame'> RangeIndex: 26 entries, ...

user1757703

3,015

asked May 15, 2020 at 17:48
