Download pdf in memory python

Question

I want to open a pdf in my Python program. So far that works.

from pyPdf import PdfFileReader
existing_pdf = PdfFileReader(open(path_to_pdf, "rb"))

Right now I open the PDF from my local disk, but I want to fetch it from the internet instead. Note that I don't want to save existing_pdf as it is: once I have fetched it from the internet, I will manipulate it and then save the result.

I think I need BytesIO + urllib2, but I can't figure it out. Can somebody help me?

So let's say I want to create the variable existing_pdf with the content of http://tug.ctan.org/tex-archive/macros/latex/contrib/logpap/example.pdf in it, but without first downloading that file to disk and then opening it. I want to download it 'in memory' and create the variable existing_pdf, which I can later modify in my program.

EDIT:

  import urllib2
  from io import BytesIO
  from pyPdf import PdfFileReader

  response = urllib2.urlopen("URL")
  pdf_file = BytesIO(response.read())

  existing_pdf = PdfFileReader(pdf_file)

It simply hangs: PdfFileReader(pdf_file) never returns.

  ....
  existing_pdf = PdfFileReader(pdf_file)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 374, in __init__
  self.read(stream)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 705, in read
  line = self.readNextEndLine(stream)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 870, in readNextEndLine
  line = x + line
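One thing worth ruling out when the reader hangs or fails is that the downloaded bytes are not a complete PDF at all (for example an HTML error page or a truncated transfer). A well-formed PDF starts with a `%PDF-` header and ends with an `%%EOF` marker, and pyPdf's `read()` looks for that marker by working backwards from the end of the stream, which is where `readNextEndLine` in the traceback comes in. The helper below, `looks_like_pdf`, is a hypothetical Python 3 sketch of that sanity check; it is only a heuristic, and the 1024-byte tail window is an assumption, not something pyPdf defines.

```python
def looks_like_pdf(data: bytes, tail_window: int = 1024) -> bool:
    """Heuristic check that `data` is plausibly a complete PDF file.

    Hypothetical helper (not part of pyPdf/pypdf): it only checks for the
    '%PDF-' header at the start and an '%%EOF' marker near the end.
    """
    if not data.startswith(b"%PDF-"):
        # e.g. an HTML error page or an empty response body
        return False
    # The EOF marker may be followed by trailing newlines, so search the
    # last `tail_window` bytes instead of requiring it at the very end.
    return b"%%EOF" in data[-tail_window:]


# Usage: check the downloaded bytes before handing them to the PDF reader.
print(looks_like_pdf(b"%PDF-1.4\n...\n%%EOF\n"))      # True
print(looks_like_pdf(b"<html>404 Not Found</html>"))  # False
```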

systemjack · Accepted Answer · 2017-03-02 09:04:37Z

11

Did you try the requests package?

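import requests
from StringIO import StringIO
from pyPdf import PdfFileReader

URL = "http://tug.ctan.org/tex-archive/macros/latex/contrib/logpap/example.pdf"
r = requests.get(URL)
pdf_file = StringIO(r.content)  # r.content is the raw response body
existing_pdf = PdfFileReader(pdf_file)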
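The snippet above is Python 2. In Python 3 the `StringIO` module is gone, and `io.StringIO` only accepts text (`str`), while `r.content` from requests is `bytes`, so the in-memory file has to be an `io.BytesIO`. The stdlib-only sketch below shows the difference; the sample bytes are made up and merely stand in for `r.content`.

```python
import io

# Made-up stand-in for r.content: requests returns the body as bytes.
content = b"%PDF-1.4\n% sample bytes\n%%EOF\n"

# io.BytesIO wraps bytes in a seekable, binary file-like object,
# similar to what open(path, "rb") gives you.
pdf_file = io.BytesIO(content)
print(pdf_file.read(5))  # b'%PDF-'

# io.StringIO is for text only; passing bytes raises TypeError in Python 3.
try:
    io.StringIO(content)
except TypeError as exc:
    print("StringIO rejected bytes:", exc)
```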

This worked for me:

import os
import urllib2
from io import BytesIO
URL = "http://tug.ctan.org/tex-archive/macros/latex/contrib/logpap/example.pdf"
response = urllib2.urlopen(URL)
p = BytesIO(response.read())
p.seek(0, os.SEEK_END)  # jump to the end to measure the download size
print p.tell()
# 79577
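Seeking to the end, as above, is a handy way to check how many bytes arrived, but it leaves the stream positioned at the end, so any code that afterwards reads from the current position gets no data until the position is reset with `seek(0)`. A small Python 3 sketch (the sample bytes are made up) showing the rewind, plus `getbuffer().nbytes` as a way to get the size without moving the position at all:

```python
import io
import os

# Made-up stand-in for response.read()
p = io.BytesIO(b"%PDF-1.4\n% sample\n%%EOF\n")

# Size via seek/tell, as in the answer above; this moves the position to the end.
p.seek(0, os.SEEK_END)
size = p.tell()
print(p.read())  # b'': nothing left to read from here

# Rewind before passing the stream to anything that reads from the current position.
p.seek(0)
print(p.read(5))  # b'%PDF-'

# Alternative: getbuffer().nbytes reports the size without moving the position.
print(size == p.getbuffer().nbytes)  # True
```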

edited Mar 2, 2017 at 9:04

answered Mar 2, 2017 at 8:26

systemjack


3 Comments

Bosiwow Over a year ago

Yeah, I just tried that and it worked! But I don't know why the urllib2 version doesn't work.

systemjack Over a year ago

Looks like it should have. I find requests to be less finicky.

Danny Vu Over a year ago

For the second example, in Python 3 you want to import urllib.request instead of urllib2 (which no longer exists in Python 3; it was split into urllib.request and urllib.error), and the call would be response = urllib.request.urlopen(URL)

Sai · Accepted Answer · 2021-07-08 10:56:19Z

0

import os
from urllib.request import urlopen
from io import BytesIO
URL = "http://tug.ctan.org/tex-archive/macros/latex/contrib/logpap/example.pdf"
response = urlopen(URL)
p = BytesIO(response.read())
p.seek(0, os.SEEK_END)
print(p.tell())

urllib2 does not exist in Python 3 (it was split into urllib.request and urllib.error), so use urllib.request as in the example above.
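The same approach can be wrapped in a small helper. This is a sketch, not taken from either answer: `fetch_into_memory` and its `timeout` default are hypothetical choices. It closes the response with a `with` block, passes a timeout so a stalled connection raises an error rather than blocking indefinitely, and returns a fresh `BytesIO` positioned at the start, ready to be passed to a PDF reader.

```python
import io
from urllib.request import urlopen


def fetch_into_memory(url: str, timeout: float = 30.0) -> io.BytesIO:
    """Download `url` completely into an in-memory, seekable binary buffer.

    Hypothetical helper: nothing is written to disk. The timeout (seconds)
    applies to blocking socket operations such as connecting and reading.
    """
    with urlopen(url, timeout=timeout) as response:
        data = response.read()
    # A new BytesIO starts at position 0, so it can be read immediately.
    return io.BytesIO(data)


# Usage (requires network access):
# pdf_file = fetch_into_memory(
#     "http://tug.ctan.org/tex-archive/macros/latex/contrib/logpap/example.pdf")
```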

answered Jul 8, 2021 at 10:56

Sai
