Download Files Using Python 3.4 From Google Patents

I would like to download (using Python 3.4) all (.zip) files on the Google Patent Bulk Download Page http://www.google.com/googlebooks/uspto-patents-grants-text.html (I am aware th

Solution 1:

As I understand it, you are looking for a way to simulate left-clicking each file link so that it downloads automatically. If so, you can use Selenium, something like:

from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile

# Configure Firefox to save downloads without prompting
profile = FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)  # 2 = use the custom folder set below
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.dir", 'D:\\')  # choose folder to download to
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", 'application/octet-stream')

driver = webdriver.Firefox(firefox_profile=profile)
driver.get('https://www.google.com/googlebooks/uspto-patents-grants-text.html#2015')
# Locate one link by its text; use a loop to cover all zip files
filename = driver.find_element_by_xpath('//a[contains(text(),"ipg150106.zip")]')
filename.click()

Update: use the MIME type 'application/octet-stream' for the zip files instead of "application/zip". It should work now. :)

Solution 2:

The HTML you are downloading is the page of links. You need to parse the HTML to find all the download links. You could use a library like Beautiful Soup to do this.
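
As a rough sketch of the parsing step, here is a version that uses the standard library's html.parser in place of Beautiful Soup, so it runs without installing anything. The names LinkCollector and extract_zip_links and the example.com sample link are made up for illustration; they are not from the question or the Google page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_zip_links(html_text):
    """Return the hrefs that end in .zip, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    return [link for link in collector.links if link.lower().endswith(".zip")]


# Hypothetical sample markup, standing in for the real page
sample = ('<a href="http://example.com/ipg150106.zip">ipg150106.zip</a> '
          '<a href="#2015">2015</a>')
print(extract_zip_links(sample))
```

With Beautiful Soup the same idea would be a loop over the page's anchor tags; the filtering on ".zip" stays the same.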

However, the page is very regularly structured so you could use a regular expression to get all the download links:

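import re
import urllib.request

# read() returns bytes in Python 3; decode so the str pattern can match
html = urllib.request.urlopen(url).read().decode('utf-8', errors='replace')
links = re.findall('<a href="(.*?)">', html)  # non-greedy: one match per link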
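
Once you have the list of links, the remaining step is to keep the .zip ones and save them. A minimal sketch, assuming the links may be relative to the page URL and that a plain urllib.request.urlretrieve call is enough for each file. The helper names zip_urls and download_all and the fetch parameter are hypothetical, added here only for illustration.

```python
import os
import urllib.request
from urllib.parse import urljoin, urlsplit


def zip_urls(links, base_url):
    """Resolve links against base_url and keep only those ending in .zip."""
    absolute = (urljoin(base_url, link) for link in links)
    return [u for u in absolute if urlsplit(u).path.lower().endswith(".zip")]


def download_all(urls, dest_dir, fetch=urllib.request.urlretrieve):
    """Save each URL into dest_dir, skipping files that already exist.

    fetch is a hypothetical hook (defaulting to urlretrieve) so the
    download call can be swapped out, for example when testing.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for u in urls:
        target = os.path.join(dest_dir, os.path.basename(urlsplit(u).path))
        if not os.path.exists(target):
            fetch(u, target)
        saved.append(target)
    return saved
```

Something like download_all(zip_urls(links, url), 'patents') would then fetch everything into a local folder named patents. Skipping existing files means an interrupted run can be restarted, although a partially written file would need to be deleted by hand first.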
