
I have to scrape a website that uses JavaScript to display its content. I can only use standard libraries, because the script will run on a server where no browser can be installed. I found Selenium, but it requires a browser, which in my case is not possible to install.

Any ideas or solutions?

  • Why don't you rely on Scrapy for doing the task? Avoid reinventing the wheel.
    – narko
    Commented Sep 18, 2015 at 7:11
  • You can use Requests library.
    – Vikas Ojha
    Commented Sep 18, 2015 at 7:12
  • Scrapy and BeautifulSoup are pretty good libraries for this.
    Commented Sep 18, 2015 at 7:41
  • These modules (Requests, BeautifulSoup) could not do it.
    Commented Sep 18, 2015 at 7:59
  • @Shafiq Do you mind if I ask why requests and bs4 couldn't complete the task? These would have been my first go-to solutions.
    – pmccallum
    Commented Sep 18, 2015 at 8:09

2 Answers


Have a look at Ghost.py (http://jeanphix.me/Ghost.py/). It doesn't require a browser.

pip install Ghost.py

from ghost import Ghost

ghost = Ghost()
# open() loads the page and executes its JavaScript via headless WebKit,
# returning the page object and the resources it fetched
page, resources = ghost.open('http://stackoverflow.com/')

You didn't mention how the website uses JavaScript, but if it makes AJAX requests triggered by user interaction, you will need something like Selenium to automate that behaviour. Here you can find a short tutorial on how to scrape with Scrapy + Selenium. This of course requires a browser installed on your machine.
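If the content arrives via AJAX, an option that stays within the standard library (as the question requires) is to skip the JavaScript entirely and call the underlying JSON endpoint yourself: open the browser's developer tools on another machine, watch the Network tab for XHR/fetch requests, and reproduce them with `urllib`. A minimal sketch, assuming Python 3; the endpoint URL below is hypothetical and stands in for whatever URL you discover:

```python
import json
import urllib.request

def fetch_json(url, timeout=10):
    """Fetch a URL and decode its body as JSON, using only the standard library."""
    # Some sites reject requests without a browser-like User-Agent header
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Hypothetical endpoint -- replace with the real URL found in the Network tab:
# data = fetch_json("http://example.com/api/items?page=1")
# for item in data["items"]:
#     print(item["title"])
```

This only works when the dynamic content comes from a discoverable HTTP endpoint; if the page builds its data purely client-side, you are back to a headless engine like Ghost.py.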
