
Scraping Dynamic Information

I recently started coding and use Python with PyCharm. I installed and imported the packages I need, such as Selenium. For my first project I tried to get the 'address' information

Solution 1:

If you want the element's address, just get the element and print its text.

driver.get("https://randomstreetview.com/")
wait = WebDriverWait(driver, 10)
elem = wait.until(EC.presence_of_element_located((By.ID, "address")))
print(elem.text)

Element

<div id="address">Nordre Ringvej 97, 2600 Glostrup, Dänemark</div>

Outputs

Nordre Ringvej 97, 2600 Glostrup, Dänemark

Imports

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Solution 2:

To print the text value, you can use any of the following locator strategies:

  • Using id and get_attribute("textContent"):

    driver.get("https://randomstreetview.com/#fullscreen")
    print(driver.find_element(By.ID, "address").get_attribute("textContent"))
    
  • Using css_selector and get_attribute("innerHTML"):

    driver.get("https://randomstreetview.com/#fullscreen")
    print(driver.find_element(By.CSS_SELECTOR, "div#address").get_attribute("innerHTML"))
    
  • Using xpath and text attribute:

    driver.get("https://randomstreetview.com/#fullscreen")
    print(driver.find_element(By.XPATH, "//div[@id='address']").text)
    

Ideally, use WebDriverWait with presence_of_element_located() so that the lookup waits until the element is present in the DOM. You can use any of the following locator strategies:

  • Using ID and get_attribute("textContent"):

    print(WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID, "address"))).get_attribute("textContent"))
    
  • Using CSS_SELECTOR and text attribute:

    print(WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div#address"))).text)
    
  • Using XPATH and get_attribute():

    print(WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//div[@id='address']"))).get_attribute("innerHTML"))
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
  • Console Output:

    Ciudad Pérdida 10, La Sabana, 39799 Acapulco, Guerrero, Mexico
    

You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".
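To make the waiting behaviour concrete, here is a minimal sketch of the polling idea behind an explicit wait: call a condition repeatedly until it returns something truthy or a timeout elapses. This is not Selenium's implementation. The names wait_until, WaitTimeout and address_present are hypothetical, LookupError stands in for "element not found yet", and the page is simulated so the sketch runs without a browser.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # Stand-in for "element not found yet": ignore it and keep polling.
            pass
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Simulated page: the address only "appears" on the third poll.
attempts = {"count": 0}


def address_present():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("address not in the DOM yet")
    return "Ciudad Pérdida 10, La Sabana, 39799 Acapulco, Guerrero, Mexico"


print(wait_until(address_present, timeout=2.0, poll=0.01))
```

With Selenium itself, WebDriverWait(driver, 20).until(...) plays the role of wait_until here, and the expected condition plays the role of address_present.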



Solution 3:

To piggyback on Arundeep Chohan's answer: the reason you are unable to get the address is that it is a hidden element.

Check out this post: "Python Selenium: Finds h1 element but returns empty text string".

TL;DR: "text property allow you to get text from only visible elements while textContent attribute also allow to get text of hidden one..."

This code also works, using a CSS selector:

element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div#address')))

print(element.get_attribute('textContent'))
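If you are not sure whether an element is visible, one option is a small helper that tries the text property first and falls back to textContent when it comes back empty. The following is only a sketch: element_text is a hypothetical helper, not part of Selenium, and FakeElement is a stand-in for a WebElement so the example runs without a browser.

```python
def element_text(element):
    """Return the element's visible text, or its textContent if that is empty."""
    visible = element.text  # empty string when the element is not displayed
    if visible and visible.strip():
        return visible.strip()
    # textContent is read from the DOM, so it is available for hidden elements too.
    raw = element.get_attribute("textContent")
    return (raw or "").strip()


class FakeElement:
    """Stand-in for a WebElement (assumption: only .text and get_attribute are used)."""

    def __init__(self, text, text_content):
        self.text = text
        self._attrs = {"textContent": text_content}

    def get_attribute(self, name):
        return self._attrs.get(name)


# A hidden element: no visible text, but textContent still holds the address.
hidden = FakeElement("", "  Nordre Ringvej 97, 2600 Glostrup, Dänemark ")
print(element_text(hidden))
```

In a real script you would pass the element returned by WebDriverWait(...).until(...) instead of a FakeElement.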
