Why Is HTML Returned By Requests Different From The Real Page HTML?

January 30, 2023 Post a Comment

I'm trying to scrape a webpage for getting some data to work with, one of the web pages I want to scrape is this one https://www.etoro.com/people/sparkliang/portfolio, the problem

Solution 1:

It happens because the page uses dynamic javascript to create the DOM elements. So you won't be able to accomplish it using requests. Instead you should use selenium with a webdriver and wait for the elements to be created before scraping.

You can try downloading ChromeDriver executable here. And if you paste it in the same folder as your script you can run:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import os

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--window-size=1920x1080")
chrome_options.add_argument("--headless")
chrome_driver = os.getcwd() + "\\chromedriver.exe" # CHANGE THIS IF NOT SAME FOLDER
driver = webdriver.Chrome(options=chrome_options, executable_path=chrome_driver)

url = 'https://www.etoro.com/people/sparkliang/portfolio'
driver.get(url)
html_text = driver.page_source

jobs = WebDriverWait(driver, 20).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'p.i-portfolio-table-hat-fullname'))
)
for job in jobs:
    print(job.text)

Here we use selenium with WebDriverWait and EC to ensure that all the elements wil exist when we try to scrape the info we're looking for.

Outputs

Facebook
Apple
Walt Disney
Alibaba
JD.com
Mastercard
...

Python Freelancers

Why Is HTML Returned By Requests Different From The Real Page HTML?

Solution 1:

Outputs

Post a Comment for "Why Is HTML Returned By Requests Different From The Real Page HTML?"