Newest 'pdf-scraping+python' Questions

1 vote

0 answers

78 views

PDF Scraping in Python

I am having trouble scraping certain data from PDF files in Python. There are no console errors, but when the CSV is produced, the columns Owner's First Name - Zip Code are either filled with the ...

user29394340

11

asked Jan 29 at 15:15

1 vote

0 answers

24 views

PDF Scraping with Templated Document

I cannot scrape some details from a PDF file. Some documents are scraped completely, while others are not. This is the issue I am encountering. I am scraping a sample PDF file. CASE1: Definition and ...

Donna Esperas

11

asked Jul 25, 2024 at 14:28

1 vote

1 answer

32 views

Issue in Pdf download using request module in python

import requests pdf_url = "https://www.alexandrina.sa.gov.au/__data/assets/pdf_file/0028/1619614/Council-Special-Meeting-Agenda-11-June-2024.pdf" pdf_path = 'Test.pdf' response = requests....

Krupesh Pandya

11

asked Jun 14, 2024 at 7:19

3 votes

1 answer

525 views

pdfplumber not picking up column & issue with multiline data

I'm struggling with two things in a PDF extraction script I've written. The first is that the script isn't picking up the last column, 'Serial Number'. I've boxed the area I'm interested ...

Mark k

153

asked Mar 6, 2024 at 15:10

1 vote

1 answer

286 views

Encoding Issue When Attempting to Convert Hindi Script PDF to CSV in Python

I'm currently attempting to convert a PDF file containing Hindi Devanagari script to a CSV file using the fitz library in Python, but when I read in the text I encounter a strange encoding issue. Here ...

cedratcarlisle

11

asked Mar 4, 2024 at 23:54

0 votes

0 answers

720 views

ModuleNotFoundError: No module named 'langchain'

I tried to extract data from an unstructured PDF file in Python in VS Code. I searched for solutions on Google without any improvement. I hit an error when I tried to import the LangChain library, ...

joj abd

1

asked Feb 19, 2024 at 16:53

0 votes

1 answer

294 views

Extracted images from pdf, look like rotated, and inverted

Quick question: are there any big errors in my code, apart from it being messy? Why do the images extracted from a PDF using PyMuPDF look inverted and upside down? I made some changes to the ...

user40208

1

asked Feb 5, 2024 at 23:45

1 vote

0 answers

80 views

PDF scraping, tabula py - columns do not correspond with "true" values of PDF file

I am stuck again with PDF scraping and observe that some of the values I obtain do not correspond to their columns. Basically, I want to obtain a CSV file, but first I want to ...

Michael Picazo

15

asked Nov 28, 2023 at 16:28

0 votes

1 answer

136 views

PDF Scraping - All Objects Passed were None

I am attempting to create a simple PDF scraper using pandas and pdfquery. I want to take the data I need from each page of the PDF by using the XML coordinates, put it into a DataFrame and then save ...

Andrew Martin

1

asked Oct 24, 2023 at 18:08

1 vote

1 answer

115 views

Pdfminer randomly changes text size when converting pdf to html

An example of the type of pdf I'm trying to scrape. I'm trying to scrape a pdf document for the number of papers, where the names of papers are in a specific font and size (10px). Given that other ...

gamer220

17

asked Oct 6, 2023 at 23:50

0 votes

0 answers

1k views

Why is this code using PyMuPDF not extracting all the images in a PDF?

I'm trying to extract images from an invoice for an equipment order and each time I run the code I only get 4 of 8 or 9 total photos on each page. Are there some PDFs that are just not compatible with ...

Asia Vassos

1

asked Sep 26, 2023 at 19:27

0 votes

1 answer

191 views

Python - Fitz pdf Skimmer - Question on how to return a sentences with keywords

I'm in the process of creating a PDF skimmer that reads a legal document, searches for keywords, returns the individual sentences that the keywords are a part of, then updates a checklist based on the ...

Ravi Thehidden

1

asked Aug 23, 2023 at 18:03
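
The keyword-to-sentence step described in this question can be sketched in plain Python. This is only a minimal illustration, not the asker's code: the function name `sentences_with_keywords` and the sample text are hypothetical, and the input string is assumed to have already been extracted from the PDF (for example with PyMuPDF's `page.get_text()`). The sentence split is deliberately naive and will mis-split on abbreviations such as "No." or "Inc.".

```python
import re


def sentences_with_keywords(text, keywords):
    """Return the sentences in `text` that contain any keyword (case-insensitive).

    Hypothetical helper for illustration; matching is by substring,
    so the keyword "rent" would also match "current".
    """
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lowered = [k.lower() for k in keywords]
    return [s for s in sentences if any(k in s.lower() for k in lowered)]


# Made-up sample text standing in for text extracted from a legal PDF.
sample = (
    "The tenant shall pay rent monthly. "
    "Notices must be in writing. "
    "Rent is due on the first."
)
hits = sentences_with_keywords(sample, ["rent"])
for sentence in hits:
    print(sentence)
```

For real legal text, a proper sentence tokenizer and whole-word matching would likely be more reliable than this regex split and substring test.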

0 votes

0 answers

244 views

Python Tabula: Reading in PDF to Python as Pandas Dataframe

I am scraping PDF data from a website, but they changed their PDF formatting, so I can no longer use the solution that worked for every other PDF. I am unsure of an alternative method. Hello everyone, I am trying to ...

jare2620

33

asked Jul 28, 2023 at 17:24

0 votes

0 answers

543 views

Cleaning Unstructured PDF data

Raw Data: The input is a PDF containing the student placement details of a university. It is completely unstructured and needs to be cleaned up before processing. The Expected CSV file ...

gurukishoreg78

11

asked May 17, 2023 at 14:41

-1 votes

1 answer

283 views

Scrape data from PDF with python but not from a table or a normal te

Hello, and thank you in advance for helping me. I am trying to scrape data from a PDF. This is the PDF data: What I want to do is extract data from it like this: I tried to do it with ...

tous

37

asked Apr 15, 2023 at 22:55
