All Questions

0 votes
0 answers
471 views

Extract only the body text of a PDF, not the bullet points, headings and subheadings, using the Python pdfplumber library

Code:
import pdfplumber
ecdata = ""
with pdfplumber.open("XYZ Transcript.pdf") as pdf:
    for i in range(len(pdf.pages)):
        print("Page No.: ", i+1)
...
Kituva Ravindran Praveen
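One possible approach to the question above, offered as a rough sketch rather than a tested solution: pdfplumber exposes each page's characters as dictionaries (`page.chars`) that include `text` and `size` keys, so headings can often be dropped by keeping only the characters set in the most common font size. The helper name `body_text` and the `tolerance` parameter are made up for illustration, and this assumes body text really is the dominant size in the document. Bullet points set in the same size as the body would still need a separate rule.

```python
from collections import Counter


def body_text(chars, tolerance=0.5):
    """Keep only characters whose font size matches the dominant size.

    `chars` is a list of dicts with "text" and "size" keys, the same
    shape as pdfplumber's `page.chars` (hypothetical helper, not part
    of pdfplumber itself).
    """
    # Count font sizes of non-whitespace characters only, so that
    # spacing does not skew the estimate of the body size.
    sizes = Counter(round(c["size"], 1) for c in chars if c["text"].strip())
    if not sizes:
        return ""
    body_size = sizes.most_common(1)[0][0]
    # Keep characters (including spaces) close to the dominant size.
    return "".join(
        c["text"] for c in chars if abs(c["size"] - body_size) <= tolerance
    )


# Usage with pdfplumber would look roughly like:
#   with pdfplumber.open("XYZ Transcript.pdf") as pdf:
#       for page in pdf.pages:
#           print(body_text(page.chars))
```

Filtering on size alone is a heuristic; documents that use bold or a different font name for headings at the same size would need an extra check on the `fontname` key.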
2 votes
2 answers
2k views

Python PdfMiner - How to get the info on the orientation of each word/sentence included in a pdf?

Target: I want to extract the info on the orientation of each word or sentence from a PDF like the attached one. The reason for this is that I want to keep the text only from the orientation with zero ...
Vagelis
  • 66
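A hedged sketch of how orientation could be derived for the question above: a PDF text matrix `(a, b, c, d, e, f)` encodes rotation in its first two entries, so the angle is `atan2(b, a)`. To my knowledge pdfminer.six's `LTChar` objects carry a `matrix` attribute and pdfplumber's char dictionaries carry a `matrix` key, but treat that as an assumption to verify against your installed version. The function names and the dict-based input below are illustrative only.

```python
import math


def rotation_degrees(matrix):
    """Return the rotation encoded by a PDF text matrix, in whole degrees.

    `matrix` is a 6-tuple (a, b, c, d, e, f); only a and b are needed
    to recover the angle of the text baseline.
    """
    a, b, _c, _d, _e, _f = matrix
    return round(math.degrees(math.atan2(b, a))) % 360


def keep_unrotated(chars):
    """Keep only characters whose baseline is at zero degrees.

    `chars` is a list of dicts with a "matrix" key (assumed shape).
    """
    return [c for c in chars if rotation_degrees(c["matrix"]) == 0]


chars = [
    {"text": "A", "matrix": (1, 0, 0, 1, 10, 20)},   # horizontal
    {"text": "B", "matrix": (0, 1, -1, 0, 10, 20)},  # rotated 90 degrees
]
print([c["text"] for c in keep_unrotated(chars)])
```

Working per character keeps the sketch independent of how words or sentences are grouped; grouping can be applied after the rotated characters are filtered out.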
1 vote
0 answers
49 views

Trying to extract data from PDFs, make sense of it, and upload it to a database

I've got many PDFs which contain data like name, address, contact info, email IDs and many more details. I am trying to write a program to convert this data into a text file and using different ...
suyash joshi
420 votes
13 answers
467k views

Python module for converting PDF to text [closed]

Is there any Python module to convert PDF files into text? I tried one piece of code found on ActiveState which uses pypdf, but the generated text had no spaces between words and was of no use.
cnu
  • 37.3k