Extract Text From Website Source Code

I want to extract information from a website link: http://www.website.com. There is a string that appears a few times, 'STRING TO CAPTURE', but I want to capture only the FIRST time it appears. It

Solution 1:

Download and install BeautifulSoup, then:

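import urllib.request
from bs4 import BeautifulSoup, Comment

# Written for Python 3 with the beautifulsoup4 package (imported as bs4)
html = urllib.request.urlopen('http://www.website.com').read()
soup = BeautifulSoup(html, 'html.parser')
texts = soup.find_all(string=True)

def get_stuff(element):
    # Skip text inside tags that are never rendered on the page
    if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
        return False
    # Skip HTML comments (bs4 represents them as Comment objects)
    elif isinstance(element, Comment):
        return False
    return True

visible_texts = list(filter(get_stuff, texts))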

Source: BeautifulSoup Grab Visible Webpage Text
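
The answer above collects every visible text fragment but stops short of picking out the first occurrence the question asks for. A minimal sketch of that last step follows, assuming visible_texts is a list of strings; the helper name first_occurrence and the sample data are hypothetical and only stand in for the real page content.

```python
def first_occurrence(texts, needle):
    """Return the first text fragment that contains `needle`, or None."""
    # next() stops at the first hit, so later duplicates are never inspected.
    return next((t.strip() for t in texts if needle in t), None)


# Hypothetical sample standing in for the visible_texts list built above.
sample_texts = [
    "\n",
    "Welcome to the site",
    "  STRING TO CAPTURE: first value ",
    "STRING TO CAPTURE: second value",
]

first = first_occurrence(sample_texts, "STRING TO CAPTURE")
print(first)
```

With real data you would call first_occurrence(visible_texts, 'STRING TO CAPTURE'). Returning None when nothing matches lets the caller tell a page without the string apart from an empty match.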
