Explicit waits in Selenium and PhantomJS with Python

Question

I'm trying to scrape some data off this site, and off many other "wines" on this site, and I'm using Selenium to do so because it's a JS site. However, my code only works sometimes; other times it returns no values at all, even though nothing has changed.

I think I should use explicit waits with Selenium to overcome this, but I'm not sure how to integrate them, so any guidance would be helpful!

My code is:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import lxml.html

def ct_content(url):
    browser = webdriver.PhantomJS()
    browser.get(url)
    wait = WebDriverWait(browser, 10)  # created, but never used below
    html = browser.page_source
    html = lxml.html.fromstring(html)
    try:
        content = html.xpath('//a[starts-with(@href, "list.asp?Table=List")]/text()')
        browser.quit()
        return content
    except:
        browser.quit()
        return False

Thanks!

Can you share several links and an element that you want to scrape on all those pages? (Andersson, commented Jul 28, 2017 at 19:44)
Here are a few of the pages: cellartracker.com/wine.asp?iWine=901787, cellartracker.com/wine.asp?iWine=709965, cellartracker.com/wine.asp?iWine=1912334. What I want is '//a[starts-with(@href, "list.asp?Table=List")]/text()', which is the table on each page that shows Vintage, Type, Producer, Variety, Designation, etc. (user7910719, commented Jul 28, 2017 at 19:58)

Andersson · Accepted Answer · 2017-07-28 20:45:39Z

2

Try a more specific XPath:

//ul[@class="twin_set_list"]//a/text()

Also, there is no need to use lxml. Simply try:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 s for the links to be present, then read each link's text
data = [link.get_attribute('textContent')
        for link in wait(browser, 10).until(
            EC.presence_of_all_elements_located((By.XPATH, '//ul[@class="twin_set_list"]//a')))]

edited Jul 28, 2017 at 20:45

answered Jul 28, 2017 at 20:12

Andersson

15 Comments

user7910719 Over a year ago

The issue with not using waits is that content is only returned sometimes; other times it just returns an empty list for me.

Andersson Over a year ago

Did you try the code above? No kind of wait can solve this, as the required data is already in the initial HTML source...
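If the links really are in the initial HTML, as Andersson says, the page source can be parsed right after browser.get(url), the way the question already does with lxml, but with the more specific XPath from the answer. A minimal sketch follows; twin_set_from_html is a hypothetical name, and the sample markup is invented for illustration rather than copied from cellartracker.com.

```python
import lxml.html

# More specific XPath from the accepted answer, selecting the link text nodes
TWIN_SET_TEXT_XPATH = '//ul[@class="twin_set_list"]//a/text()'


def twin_set_from_html(html):
    """Parse an HTML string (e.g. browser.page_source) and return the link texts."""
    tree = lxml.html.fromstring(html)
    return [text.strip() for text in tree.xpath(TWIN_SET_TEXT_XPATH)]


# Invented markup for illustration; not the real cellartracker.com page
sample = """
<html><body>
  <ul class="twin_set_list">
    <li><a href="list.asp?Table=List">2015</a></li>
    <li><a href="list.asp?Table=List">Red</a></li>
  </ul>
</body></html>
"""
print(twin_set_from_html(sample))  # ['2015', 'Red']
```

If this returns an empty list for a real page_source, the links were not in the HTML at that moment, which is exactly the case an explicit wait is meant to cover.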

user7910719 Over a year ago

Yeah, I tried it. It worked the first time; the second time it returned an empty list.

Andersson Over a year ago

Second time == on another page?

Andersson Over a year ago

Hmm... This is weird. OK, try the updated answer with an explicit wait applied.


TitusLucretius · Answer · 2017-07-28 20:35:49Z

0

It looks like you never actually use the wait you create (WebDriverWait is an explicit wait, not an implicit one). This is how I would write the script with an explicit wait:

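from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

def ct_content(url):
    browser = webdriver.PhantomJS()
    try:
        browser.get(url)
        wait = WebDriverWait(browser, 10)
        # Block for up to 10 seconds until the first matching link is clickable
        content = wait.until(EC.element_to_be_clickable(
            (By.XPATH, '//a[starts-with(@href, "list.asp?Table=List")]')))
        return content.text
    except TimeoutException:
        return False
    finally:
        # Always close the browser, whether the wait succeeded or timed out
        browser.quit()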
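To see what wait.until(...) adds over reading page_source immediately, here is a simplified, standard-library-only sketch of the polling loop an explicit wait performs: call a condition repeatedly until it returns a truthy value or the timeout expires, then return that value. wait_until and WaitTimeout are hypothetical names, and this is not Selenium's actual implementation (WebDriverWait, for example, also ignores NoSuchElementException between polls by default).

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            # Return the truthy value itself, as WebDriverWait.until does
            return value
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


calls = {"n": 0}


def rows_loaded():
    # Pretend the rows only appear on the third poll, like a slow JS page
    calls["n"] += 1
    return ["2015", "Red"] if calls["n"] >= 3 else []


print(wait_until(rows_loaded, timeout=5, poll=0.01))  # ['2015', 'Red']
```

An empty list is falsy, so the loop keeps polling instead of returning it; that is the difference from scraping page_source once and sometimes getting an empty result.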

Also, the way to set implicit waits is:

browser.implicitly_wait(10) # seconds

edited Jul 28, 2017 at 20:35

Andersson

answered Jul 28, 2017 at 19:47

TitusLucretius

4 Comments

Andersson Over a year ago

It also looks like you never use ExplicitWait... Your code is wrong!

TitusLucretius Over a year ago

My bad. I changed the nonexistent driver variable to browser and added the missing closing paren. Is this any better? If not, what's happening?

TitusLucretius Over a year ago

@Andersson content = wait.until(...) uses the explicit wait, no?

Andersson Over a year ago

content = wait.until(...) looks like an explicit wait... Can you tell us what the xpath() method is and how it should work?