All Questions

0 votes
1 answer
411 views

Scrapy script meant to scrape PDF and DOC files is not working properly

I am trying to implement a similar script in my project, following this blog post: https://www.imagescape.com/blog/scraping-pdf-doc-and-docx-scrapy/ The code of the spider class from the source: ...
glitchy_itchy
6 votes
3 answers
58k views

How to scrape PDFs using Python; specific content only

I am trying to get data from PDFs available on the site https://usda.library.cornell.edu/concern/publications/3t945q76s?locale=en For example, if I look at the November 2019 report https://downloads....
Camilia
  • 81
0 votes
2 answers
624 views

How to read a PDF file line by line and create a CSV

Here is my PDF. I found THIS and used it to scrape my PDF. 6 BEDROOMS NameAddressUnitSizeKeyRentSq FtMove in DateNotesTenant Prop # Texan 261009 West 26th3076x3$4,6952,1368/15/14$1,000 Bonus (1) ...
Alexxio
  • 1,101