How Do I Control Selenium PDF and Excel File Download Behavior?
I want to download all the tender documents from this URL: 'http://www.ha.org.hk/haho/ho/bssd/T18G014Pc.htm'. I'm using Selenium to go through each tender link and download the file.
Solution 1:
Try requests and BeautifulSoup to download all the documents:
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

base_url = "http://www.ha.org.hk"
tender = "T18G014Pc"
page_url = f"{base_url}/haho/ho/bssd/{tender}.htm"

with requests.Session() as session:
    r = session.get(page_url)
    # get all document links
    docs = BeautifulSoup(r.text, "html.parser").select("a[href]")
    for doc in docs:
        # resolve the link against the tender page in case it is relative
        href = urljoin(page_url, doc.attrs["href"])
        name = doc.text
        print(f"name: {name}, href: {href}")
        # open document page
        r = session.get(href)
        # get file path from the window.open('...', ...) call on that page
        match = re.search(r"(?<=window\.open\(')(.*)(?=',)", r.text)
        if match is None:
            # no window.open call found, so this is not a document page
            continue
        file_path = match.group(0)
        file_name = file_path.split("/")[-1]
        # get file and save
        r = session.get(f"{base_url}/{file_path}")
        with open(file_name, "wb") as f:
            f.write(r.content)
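The regular-expression step is the least obvious part of the answer, so here is a minimal sketch that runs it on its own without any network access. The sample markup, the `sample_tender.pdf` path, and the `extract_file_path` helper are assumptions made up for illustration; the real document pages may be laid out differently.

```python
import re

# Hypothetical snippet of a document page; the real markup may differ.
sample_html = (
    "<a href=\"#\" "
    "onclick=\"window.open('/haho/ho/bssd/sample_tender.pdf', '_blank')\">"
    "Download</a>"
)


def extract_file_path(html):
    """Return the first argument of window.open('...', ...), or None."""
    # Lookbehind anchors on "window.open('", lookahead stops before "',".
    match = re.search(r"(?<=window\.open\(')(.*)(?=',)", html)
    return match.group(0) if match else None


file_path = extract_file_path(sample_html)
file_name = file_path.split("/")[-1]  # last path segment is the file name
print(file_path)
print(file_name)
```

Because `.*` is greedy, the match runs to the last `',` on the same line. If a page ever puts several quoted arguments on one line, switching to `(.*?)` would probably be the safer pattern.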