
I'm fairly new to Selenium and am trying to build a web scraper that locates a dropdown menu and then selects specific options from it. The code below was working at one point but now seems to have stopped working. Ideally, I would like to select a year from the dropdown menu.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import requests
from selenium.webdriver.common.by import By
import pandas as pd
import csv
import sqlite3
import time
from selenium.webdriver.common.keys import Keys
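from selenium.webdriver.support.select import Select
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC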


service = Service()
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=service, options=options)
# Department of Home Affairs news archive
url = 'https://www.homeaffairs.gov.au/news-media/archive#'
driver.get(url)

#parse html using beautiful soup
res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')

a = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, '/html/body/form/div[3]/div[2]/div[2]/div[3]/main/div/div[5]/div[3]/div/div[2]/div[3]/div[2]/div/div/div[1]/ha-news-archive/div/ha-news-filters/div/div/div/div/div/div[1]/div/select')))
select = Select(driver.find_element(By.XPATH, '/html/body/form/div[3]/div[2]/div[2]/div[3]/main/div/div[5]/div[3]/div/div[2]/div[3]/div[2]/div/div/div[1]/ha-news-archive/div/ha-news-filters/div/div/div/div/div/div[1]/div/select'))
#select by visible text
select.select_by_visible_text('2024')

As you can see, I have tried using WebDriverWait in case the page is still loading and the element is not yet visible, but I still receive the following error, which leads me to believe that Selenium cannot find the option for the year 2024:

--> 137     raise NoSuchElementException(f"Could not locate element with visible text: {text}")

NoSuchElementException: Message: Could not locate element with visible text: 2024; For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception

I've also tried selecting by value and trying other years, with no luck. I'm not sure why this code would suddenly stop working, and I would really appreciate any insights.

  • That xpath looks horribly brittle. Is there really no better way to identify the element? Commented Sep 4 at 12:46
  • I've tried replacing the select element with the following select = Select(driver.find_element(By.CLASS_NAME, 'form_control ng-valid ng-dirty ng-touched')) and this also seems to not work. Commented Sep 4 at 12:59
  • Selenium doesn't support compound class names like that. Try: 'form_control.ng-valid.ng-dirty.ng-touched' Commented Sep 4 at 13:28
  • Thanks for clarifying on the compound names for class -- that's helpful to know. Unfortunately, I did not have success with this. Commented Sep 4 at 14:21
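As the comments point out, a space-separated class attribute cannot be passed to By.CLASS_NAME as-is. Joining the classes with dots gives a CSS compound selector meaning "has all of these classes", which can be used with By.CSS_SELECTOR. One further likely pitfall: ng-valid, ng-dirty and ng-touched are Angular form-state classes, and ng-dirty/ng-touched are normally only added after a user interacts with the control. On a freshly loaded page the select would probably carry ng-pristine/ng-untouched instead, so a selector built from the state classes may never match. The helper names below are hypothetical, and this is only a sketch of the string manipulation:

```python
def compound_class_selector(class_attr: str, tag: str = "") -> str:
    """Turn a space-separated class attribute into a CSS compound selector.

    "a b c" -> ".a.b.c", or "select.a.b.c" when tag="select".
    The result is meant for driver.find_element(By.CSS_SELECTOR, ...).
    """
    classes = class_attr.split()  # split on any whitespace, dropping empties
    return tag + "".join("." + c for c in classes)


def drop_state_classes(class_attr: str) -> str:
    """Remove Angular form-state classes, which change as the user interacts
    with the control, keeping only the classes that should be stable."""
    volatile = {"ng-valid", "ng-invalid", "ng-pristine", "ng-dirty",
                "ng-touched", "ng-untouched", "ng-pending"}
    return " ".join(c for c in class_attr.split() if c not in volatile)
```

For example, compound_class_selector(drop_state_classes("form_control ng-valid ng-dirty ng-touched"), tag="select") yields "select.form_control". A class-only selector like that may still match several selects on the page, which is one reason to anchor on the "Year" label instead.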

2 Answers


The problem is that the page finishes loading its HTML while background scripts are still running that populate the Year dropdown and other parts of the page. You can see this if you watch the page load. You can even put a breakpoint on the .select_by_* line, wait for the page to finish populating the dropdown, and then resume the script: it works.
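One way to confirm this diagnosis without a debugger is to log what the dropdown actually contains at the moment the selection runs. The sketch below assumes select is a selenium.webdriver.support.select.Select; its options property returns the option WebElements, each of which exposes .text and get_attribute("value"). The helper name describe_options is hypothetical:

```python
def describe_options(select):
    """Return (value, visible text) pairs for every <option> currently in a
    Selenium Select wrapper, e.g. to log what the dropdown holds right now."""
    return [(opt.get_attribute("value"), opt.text.strip()) for opt in select.options]
```

Calling print(describe_options(select)) immediately after driver.get(url) and again after a short pause should show the difference: if the first call reports one entry or none while the second lists the years, the years are being filled in after the initial page load.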

To fix this, I added a wait until the Year dropdown contains more than one option.

NOTE: I also changed your XPath to something more robust. It finds the "Year" LABEL and then locates its sibling SELECT. This should be fairly resilient unless the page changes significantly.

This is working code.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.wait import WebDriverWait

url = 'https://www.homeaffairs.gov.au/news-media/archive'
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)

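# Wait up to 10 seconds for each condition below
wait = WebDriverWait(driver, 10)

# Locate the <select> that follows the "Year" label
year = Select(wait.until(EC.visibility_of_element_located((By.XPATH, "//label[text()='Year']/following-sibling::select"))))
# The options are filled in by background scripts; wait until more than one is present
wait.until(lambda driver: len(year.options) > 1)
year.select_by_value('2024')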
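The lambda wait above can also be packaged as a reusable expected condition. WebDriverWait.until accepts any callable that takes the driver, keeps polling while it returns a falsy value, and returns whatever truthy value it finally produces. This is a sketch: select_has_options is a hypothetical name, and "tag name" is the string value of By.TAG_NAME, used here so the class itself needs no Selenium imports.

```python
class select_has_options:
    """Expected condition for WebDriverWait: the <select> matched by `locator`
    holds more than `min_options` <option> elements.

    Returns the <select> WebElement once it is populated, otherwise False so
    that WebDriverWait keeps polling until its timeout.
    """

    def __init__(self, locator, min_options=1):
        self.locator = locator          # a (by, value) tuple, e.g. (By.XPATH, "...")
        self.min_options = min_options  # default 1 mirrors the "> 1" check above

    def __call__(self, driver):
        # Re-locate on every poll so a re-rendered <select> is picked up fresh.
        element = driver.find_element(*self.locator)
        # "tag name" is the value of selenium's By.TAG_NAME.
        options = element.find_elements("tag name", "option")
        return element if len(options) > self.min_options else False
```

Usage would look like year = Select(WebDriverWait(driver, 10).until(select_has_options((By.XPATH, "//label[text()='Year']/following-sibling::select")))) followed by year.select_by_value('2024'). If the page re-renders the select while the wait is running, passing ignored_exceptions=[StaleElementReferenceException] (from selenium.common.exceptions) to WebDriverWait keeps a stale reference from aborting the wait.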

The webpage is built with JavaScript and AJAX calls. Although the page appears to have finished loading, AJAX calls are still running in the background until the "Ask a question" chatbot is rendered.


Solution

To select the year 2024 from the dropdown, you need to:

  • Close the header banner so the viewport is maximized.
  • Wait for the chatbot to become visible, so that all the desired elements have been fully rendered.
  • Construct a logical XPath that uniquely identifies the element.
  • Select the year 2024.

Code Block:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select

driver = webdriver.Chrome()
driver.get("https://www.homeaffairs.gov.au/news-media/archive#")
# Dismiss the header alert banner
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.header-alert-hide-btn>i"))).click()
# The chatbot button is rendered last, so its visibility is used as the signal that the background AJAX calls have finished
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "button[title='Start Chat']")))
# Pick 2024 from the <select> that follows the "Year" label
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//label[contains(., 'Year')]/following-sibling::select")))).select_by_visible_text('2024')
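The two answers anchor on the label slightly differently: text()='Year' requires a text node of the label to be exactly "Year" (surrounding whitespace would break it), while contains(., 'Year') would also match any other label that merely contains the word. A middle ground is normalize-space(.)='Year', which compares the label's whole text with leading and trailing whitespace stripped. A hypothetical helper that builds such a locator, as a sketch:

```python
def label_sibling_xpath(label_text: str, tag: str = "select") -> str:
    """Build an XPath for the nearest `tag` element that follows a <label>
    whose whitespace-normalised text equals `label_text`.

    XPath 1.0 string literals have no escape character, so this sketch picks
    whichever quote the text does not contain and refuses text with both.
    """
    if "'" not in label_text:
        literal = f"'{label_text}'"
    elif '"' not in label_text:
        literal = f'"{label_text}"'
    else:
        raise ValueError("label text contains both ' and \"")
    return f"//label[normalize-space(.)={literal}]/following-sibling::{tag}[1]"
```

It would be used as driver.find_element(By.XPATH, label_sibling_xpath("Year")), combined with one of the waits described above.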
