Can't Find The Right Way To Grab Part Numbers From A Webpage Using Requests
Solution 1:
The difficulty is getting the driver to click the 'Product list' button, so I came up with this solution:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.common.exceptions import TimeoutException, StaleElementReferenceException
from selenium import webdriver
import time
class NoPartsNumberException(Exception):
    pass
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("https://www.festo.com/cat/en-id_id/products_ADNH")
wait.until(ec.frame_to_be_available_and_switch_to_it(wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, "object")))))
wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "#btn-group-cookie > input[value='Accept all cookies']"))).click()
driver.switch_to.default_content()
wait.until(ec.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@name='CamosIF']")))
endtime = time.time() + 30
while True:
    try:
        if time.time() > endtime:
            raise NoPartsNumberException('No parts number found')
        product_list = wait.until(ec.element_to_be_clickable((By.XPATH, "//div[@id='f24']")))
        product_list.click()
        part_numbers_elements = wait.until(ec.visibility_of_all_elements_located((By.XPATH, "//div[contains(@id, 'v471')]")))
        break
    except (TimeoutException, StaleElementReferenceException):
        pass
part_numbers = [p.text for p in part_numbers_elements[1:]]
print(part_numbers)
driver.close()
This way the driver keeps clicking the 'Product list' button until the window containing the part numbers opens, so you wait much less than the 10 seconds of the hardcoded time.sleep() in your code.
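The retry loop above can be pulled out into a small helper. This is only a minimal sketch of the pattern, with no Selenium involved: `retry_until_deadline` and `flaky_action` are hypothetical names, and `flaky_action` merely stands in for "click the tab, then read the part numbers".

```python
import time


def retry_until_deadline(action, timeout, retry_on, pause=0.0):
    """Call `action` until it succeeds or `timeout` seconds have passed.

    Exceptions listed in `retry_on` trigger another attempt; any other
    exception propagates immediately. Hypothetical helper, not a Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return action()
        except retry_on:
            if time.monotonic() > deadline:
                raise TimeoutError(f"gave up after {timeout} seconds")
            time.sleep(pause)


# Stand-in for the click-and-read step: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_action():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("content not rendered yet")
    return ["539691", "539692"]


result = retry_until_deadline(flaky_action, timeout=5, retry_on=(LookupError,))
print(result, attempts["count"])  # ['539691', '539692'] 3
```

With Selenium, `retry_on` would presumably be `(TimeoutException, StaleElementReferenceException)` as in the loop above.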
Solution 2:
To grab part numbers from the webpage using Selenium you need to:
- Induce WebDriverWait for the object frame to be available and switch to it.
- Induce WebDriverWait for the desired element to be clickable and click on Accept all cookies.
- Switch back to the default_content().
- Induce WebDriverWait for the desired frame to be available and switch to it.
- Induce WebDriverWait for the staleness_of() the stale element.
- Click on the tab with the text Product list using execute_script().
You can use the following Locator Strategies:
driver.get('https://www.festo.com/cat/en-id_id/products_ADNH')
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME,"object")))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input.btn.btn-primary#accept-all-cookies"))).click()
driver.switch_to.default_content()
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#CamosIFId")))
WebDriverWait(driver, 20).until(EC.staleness_of(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Product list']")))))
driver.execute_script("arguments[0].click();", WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[text()='Product list']"))))
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='ah']/img//following::div[2]")))])
driver.quit()
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console Output:
['539691', '539692', '539693', '539694']
Solution 3:
I believe you have covered the iframe and WebDriverWait concepts well.
The site seems to re-render the content a few times before you can actually get the right element and click on it. That is why you had to add a sleep of 10 seconds.
There is a belief that EC must be used with WebDriverWait. EC is only a collection of helper classes that retrieve an element with some defined properties (e.g. visible, hidden, clickable...).
In your case, ec.visibility_of_all_elements_located was a good choice. But once the element is retrieved, the DOM is re-rendered, and you will get a StaleElementReferenceException if you use the WebElement click method. I also believe that a click using JS will simply be ignored, as the element passed in is no longer present.
Since until() can be used to determine when to return an element, why not utilize it and create our own EC class:
class SelectProductTab(object):
    def __init__(self, locator):
        self.locator = locator
        self._selected_background_image = 'url("IMG?i=ec2a883936d53541a030c2ddb511e7e8&s=p")'

    def __call__(self, driver):
        els = driver.find_elements(*self.locator)
        if len(els) > 0:
            els[0].click()
        else:
            return False
        return els[0] if self.__is_selected(els[0]) else False

    def __is_selected(self, el):
        return self._selected_background_image in el.get_attribute('style')
This class will do the following:
- Retrieve the element
- Click on it
- Ensure the desired tab is selected, i.e. that the click actually worked
- Upon the tab being selected, returns the element back to the caller
One part is not handled by the class, because WebDriverWait already supports it: exception handling. In your case, you will be facing StaleElementReferenceException.
wait = WebDriverWait(driver, 30, ignored_exceptions=(StaleElementReferenceException, ))
Then call until() with your own implementation of an EC class:
wait.until(SelectProductTab((By.CSS_SELECTOR, "[id='r17'] > [id='f24']")))
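To see how the pieces fit together without opening a browser, here is a rough sketch that uses stand-ins only. `FakeTab`, `FakeDriver`, `FakeStaleError`, `ClickUntilSelected` and `poll_until` are all hypothetical names; `poll_until` is a simplified imitation of the polling idea, not Selenium's actual implementation. The fake tab raises on the first click (as if the DOM had been re-rendered) and only reports itself as selected after the third click.

```python
import time


class FakeStaleError(Exception):
    """Stand-in for Selenium's StaleElementReferenceException."""


class FakeTab:
    """Hypothetical element that only looks selected after the third click."""

    def __init__(self):
        self.clicks = 0

    def click(self):
        self.clicks += 1
        if self.clicks == 1:
            raise FakeStaleError("DOM was re-rendered")

    def get_attribute(self, name):
        return "background: selected" if self.clicks >= 3 else ""


class FakeDriver:
    def __init__(self):
        self.tab = FakeTab()

    def find_elements(self, *locator):
        return [self.tab]


class ClickUntilSelected:
    """Same shape as SelectProductTab: click, then confirm the click worked."""

    def __init__(self, locator, marker):
        self.locator = locator
        self.marker = marker

    def __call__(self, driver):
        els = driver.find_elements(*self.locator)
        if not els:
            return False
        els[0].click()
        return els[0] if self.marker in els[0].get_attribute("style") else False


def poll_until(driver, condition, timeout, ignored=(), poll=0.0):
    """Call `condition(driver)` until it returns something truthy."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition(driver)
            if value:
                return value
        except ignored:
            pass  # an ignored exception is treated as "not ready yet"
        if time.monotonic() > deadline:
            raise TimeoutError("condition never became truthy")
        time.sleep(poll)


driver = FakeDriver()
tab = poll_until(driver, ClickUntilSelected(("css", "#f24"), "selected"),
                 timeout=5, ignored=(FakeStaleError,))
print(tab.clicks)  # 3
```

The first poll hits the stale error and is swallowed, the second clicks but the tab is not yet selected so the condition returns False, and the third returns the element. That is the behaviour the real wait relies on when it is given ignored_exceptions.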
Full code:
with webdriver.Chrome(ChromeDriverManager().install(), options=options) as driver:
    driver.get(link)
    wait = WebDriverWait(driver, 15)
    wait.until(EC.frame_to_be_available_and_switch_to_it(
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "object")))))
    wait.until(EC.presence_of_element_located(
        (By.CSS_SELECTOR, "#btn-group-cookie > input[value='Accept all cookies']"))).click()
    driver.switch_to.default_content()
    wait.until(EC.frame_to_be_available_and_switch_to_it(
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "iframe#CamosIFId")))))
    # Sleep was removed, click is now handled inside our own EC class + will ensure the tab is selected
    wait = WebDriverWait(driver, 30, ignored_exceptions=(StaleElementReferenceException, ))
    wait.until(SelectProductTab((By.CSS_SELECTOR, "[id='r17'] > [id='f24']")))
    for elem in wait.until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[data-ctcwgtname='tabTable'] [id^='v471_']")))[1:]:
        print(elem.text)
Output:
539691
539692
539693
539694
Note: you need to add the following import:
from selenium.common.exceptions import StaleElementReferenceException