Unable To Use Two Threads To Execute Two Functions Within A Script
I've created a scraper using Python in combination with Thread to make the execution faster. The scraper is supposed to parse all the links available within the webpage ended with
Solution 1:
Try updating alphabetical_links so that it starts its own threads:
import requests
import threading
from lxml import html

main_url = "https://www.houzz.com/proListings/letter/{}"

def alphabetical_links(mainurl):
    response = requests.get(mainurl).text
    tree = html.fromstring(response)
    links_on_page = [container.attrib['href'] for container in tree.cssselect(".proSitemapLink a")]
    # Start one "inner" thread per link found on this page.
    threads = []
    for link in links_on_page:
        thread = threading.Thread(target=sub_links, args=(link,))
        thread.start()
        threads.append(thread)
    # Wait for every inner thread before this function returns.
    for thread in threads:
        thread.join()

def sub_links(process_links):
    response = requests.get(process_links).text
    root = html.fromstring(response)
    for container in root.cssselect(".proListing"):
        # Fall back to an empty string when an element is missing.
        try:
            name = container.cssselect("h2 a")[0].text
        except Exception:
            name = ""
        try:
            phone = container.cssselect(".proListingPhone")[0].text
        except Exception:
            phone = ""
        print(name, phone)

if __name__ == '__main__':
    linklist = []
    # chr(97) to chr(122) gives the letters "a" to "z".
    for link in [main_url.format(chr(page)) for page in range(97, 123)]:
        thread = threading.Thread(target=alphabetical_links, args=(link,))
        thread.start()
        linklist.append(thread)
    for thread in linklist:
        thread.join()
Note that this is just an example of how to manage "inner threads". Because so many threads start at the same time, your system might fail to start some of them for lack of resources, and you will get a RuntimeError: can't start new thread exception. In that case, try a ThreadPool instead, which caps the number of threads running at once.
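As a rough sketch of the thread pool idea, the snippet below uses ThreadPoolExecutor from the standard concurrent.futures module to cap how many threads run at once. The names run_in_pool and fake_fetch, and the worker counts, are assumptions made for illustration and are not part of the original answer; fake_fetch is a placeholder so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_pool(worker, items, max_workers=10):
    """Call worker(item) for every item, using at most max_workers threads.

    run_in_pool and the default of 10 workers are illustrative choices.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() hands items to idle threads and yields results in input order;
        # leaving the with-block waits for all submitted work to finish.
        return list(pool.map(worker, items))


def fake_fetch(url):
    """Hypothetical stand-in for a network-bound function such as sub_links."""
    return len(url)


if __name__ == "__main__":
    urls = ["https://www.houzz.com/proListings/letter/{}".format(chr(c))
            for c in range(97, 123)]
    # 26 URLs are processed, but never more than 5 threads exist at a time.
    print(run_in_pool(fake_fetch, urls, max_workers=5))
```

In the scraper, something like run_in_pool(sub_links, links_on_page) could replace the loop that starts one thread per link. To keep the total thread count bounded, it is probably simpler to collect all links first and then process them in a single pool, rather than nesting one pool inside another.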