
Unable To Use Two Threads To Execute Two Functions Within A Script

I've created a scraper using Python in combination with Thread to make the execution faster. The scraper is supposed to parse all the links available within the webpage ended with

Solution 1:

Try updating alphabetical_links so that it starts its own threads, one for each link it finds:

import requests
import threading
from lxml import html

main_url = "https://www.houzz.com/proListings/letter/{}"


def alphabetical_links(mainurl):
    response = requests.get(mainurl).text
    tree = html.fromstring(response)
    links_on_page = [container.attrib['href'] for container in tree.cssselect(".proSitemapLink a")]
    threads = []
    for link in links_on_page:
        thread = threading.Thread(target=sub_links, args=(link,))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()


def sub_links(process_links):
    response = requests.get(process_links).text
    root = html.fromstring(response)

    for container in root.cssselect(".proListing"):
        try:
            name = container.cssselect("h2 a")[0].text
        except Exception:
            name = ""
        try:
            phone = container.cssselect(".proListingPhone")[0].text
        except Exception:
            phone = ""
        print(name, phone)

if __name__ == '__main__':
    linklist = []
    for link in [main_url.format(chr(page)) for page in range(97, 123)]:
        thread = threading.Thread(target=alphabetical_links, args=(link,))
        thread.start()
        linklist.append(thread)


    for thread in linklist:
        thread.join()

Note that this is just an example of how to manage "inner threads". Because so many threads start at the same time, your system might fail to start some of them for lack of resources, and you will get a RuntimeError: can't start new thread exception. In that case, try a thread pool (for example, ThreadPool) to cap the number of threads running at once.
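As a rough sketch of the pooled approach, the snippet below uses concurrent.futures.ThreadPoolExecutor from the standard library, which is one way to get a thread pool and not necessarily the one the answer had in mind. The names fake_fetch and run_bounded are hypothetical, and fake_fetch only stands in for the real requests.get plus parsing work so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
import string

MAIN_URL = "https://www.houzz.com/proListings/letter/{}"


def fake_fetch(url):
    # Hypothetical stand-in for the real download-and-parse step:
    # it just returns the last path segment in upper case.
    return url.rsplit("/", 1)[-1].upper()


def run_bounded(worker, items, max_workers=8):
    # No more than max_workers threads exist at once; remaining items
    # wait in the executor's internal queue until a thread is free.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input items.
        return list(pool.map(worker, items))


letter_urls = [MAIN_URL.format(letter) for letter in string.ascii_lowercase]
results = run_bounded(fake_fetch, letter_urls, max_workers=8)
print(results[:3])
```

Swapping fake_fetch for a function that actually downloads and parses a page should keep the same shape. The max_workers value of 8 is an arbitrary assumption and would need tuning for the target site and machine.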

Solution 2:

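You can start more threads the same way you started the first one, giving each thread its own target function. Here link stands for whichever URL each function should process:

from threading import Thread

t1 = Thread(target=alphabetical_links, kwargs={'mainurl': link})
t1.start()

t2 = Thread(target=sub_links, kwargs={'process_links': link})
t2.start()

# Wait for both threads to finish before continuing.
t1.join()
t2.join()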
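If the second function needs the links that the first one discovers, one possible way to connect the two threads is a queue.Queue. The sketch below is only an illustration of that idea, not the original poster's code: produce_links and consume_links are hypothetical stand-ins for alphabetical_links and sub_links, and the example.com URLs are made up so the snippet runs offline.

```python
import queue
import threading

SENTINEL = None  # placed on the queue to signal that no more links will come


def produce_links(out_queue):
    # Stand-in for alphabetical_links: emits made-up listing URLs.
    for letter in "abc":
        out_queue.put("https://example.com/listing/{}".format(letter))
    out_queue.put(SENTINEL)


def consume_links(in_queue, results):
    # Stand-in for sub_links: handles each URL as soon as it arrives.
    while True:
        link = in_queue.get()
        if link is SENTINEL:
            break
        results.append(link.rsplit("/", 1)[-1])


link_queue = queue.Queue()
results = []

t1 = threading.Thread(target=produce_links, args=(link_queue,))
t2 = threading.Thread(target=consume_links, args=(link_queue, results))
t1.start()
t2.start()
t1.join()
t2.join()

print(results)
```

With a single consumer the links are processed in the order they were queued. Running several consumers would need one sentinel per consumer, or some other shutdown signal.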
