All Questions
Tagged with pdf-scraping dataframe
4 questions
0 votes, 1 answer, 136 views
PDF Scraping - All Objects Passed were None
I am attempting to create a simple PDF scraper using pandas and pdfquery. I want to take the data I need from each page of the PDF by using the XML coordinates, put it into a DataFrame and then save ...
0 votes, 0 answers, 49 views
Error in pluck: object not found -- trying to create loop to scrape data from multiple PDFs with uniform formatting
Thanks to other articles on this website, I managed to put together a script that will do the following:
Collect PDF file names from directory and put into a list.
Start a data frame using target ...
1 vote, 0 answers, 3k views
How to extract data from messy PDF file with no standard formatting?
I am trying to parse the tabular data out of this PDF file. I was hoping to use tabula or PyPDF2 to extract the tables, but the data in the PDF is not stored as tables, so I chose pdfplumber ...
0 votes, 1 answer, 148 views
Pandas DataFrame combine multi row spanning column
I have a complex scraped DataFrame that looks like this:
For context, the original data from a PDF looks like so:
DataFrame info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, ...