Extract Text From Website Source Code
I want to extract information from a website link: http://www.website.com. There is a string that appears a few times, 'STRING TO CAPTURE', but I want to capture only the FIRST time it appears. It
Solution 1:
Download and install BeautifulSoup, then:
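import re
import urllib
import BeautifulSoup  # BeautifulSoup 3 (Python 2), as used in this answer

html = urllib.urlopen('http://www.website.com').read()
soup = BeautifulSoup.BeautifulSoup(html)
texts = soup.findAll(text=True)  # every text node in the page

def get_stuff(element):
    # Skip text inside tags that are not rendered, and HTML comments
    if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
        return False
    elif re.match('<!--.*-->', str(element)):
        return False
    return True
visible_texts = filter(get_stuff, texts)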
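The filter above collects the visible text nodes but does not yet pick out the first occurrence of the target string. A minimal sketch of that last step follows, assuming Python 3 and using the standard library's html.parser in place of BeautifulSoup so it runs without extra installs. The names VisibleTextParser, first_visible_match, HIDDEN_TAGS and sample_html are hypothetical and used only for illustration; the sample page is made up, and in practice the HTML would come from the downloaded page.

```python
from html.parser import HTMLParser

# Tags whose text is not rendered on the page (assumption: same list as the filter above).
HIDDEN_TAGS = {'style', 'script', 'head', 'title'}


class VisibleTextParser(HTMLParser):
    """Collect text nodes that sit outside HIDDEN_TAGS; comments are ignored."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0   # how many hidden tags we are currently inside
        self.texts = []         # visible text nodes, in document order

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        # HTML comments never reach this method; they go to handle_comment.
        if self.hidden_depth == 0 and data.strip():
            self.texts.append(data.strip())


def first_visible_match(html, needle):
    """Return the first visible text node containing needle, or None."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    for text in parser.texts:
        if needle in text:
            return text
    return None


# Made-up page for illustration only.
sample_html = """<html><head><title>STRING TO CAPTURE in title</title>
<script>var s = 'STRING TO CAPTURE in script';</script></head>
<body><!-- STRING TO CAPTURE in comment -->
<p>first: STRING TO CAPTURE here</p>
<p>second: STRING TO CAPTURE again</p></body></html>"""

print(first_visible_match(sample_html, 'STRING TO CAPTURE'))
```

Because the text nodes are kept in document order, returning on the first hit is enough; occurrences inside the title, script and comment are skipped before the match is considered.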