Web Scraping From .aspx Site Using Python
I'm attempting to scrape some data from this site: https://fortress.wa.gov/esd/file/warn/Public/SearchWARN.aspx I am able to get the first 11 pages using my method but for some rea
Solution 1:
You can use this example to get all pages (67 in total) from the site. It reads every <input> value dynamically on each request, so it always sends the correct __VIEWSTATE etc.:
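import requests
from bs4 import BeautifulSoup

url = 'https://fortress.wa.gov/esd/file/warn/Public/SearchWARN.aspx'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')


def get_data(soup, page_num):
    # Copy every <input> field (including __VIEWSTATE) from the current page.
    data = {}
    for i in soup.select('input'):
        data[i['name']] = i.get('value', '')
    # Drop the search button, presumably so the postback is handled as paging
    # rather than as a click on "Search".
    del data['ucPSW$btnSearchCompany']
    # Ask the grid to switch to the requested page.
    data['__EVENTTARGET'] = 'ucPSW$gvMain'
    data['__EVENTARGUMENT'] = 'Page${}'.format(page_num)
    data['__LASTFOCUS'] = ''
    return data


page = 1
while True:
    print('Page {}...'.format(page))
    total = 1
    # Select data rows only: skip the header row and the row holding the nested pager table.
    for total, tr in enumerate(soup.select('#ucPSW_gvMain > tr:not(:has(table)):has(td)'), 1):
        tds = [td.get_text(strip=True) for td in tr.select('td')]
        print('{:<3}{:<50}{:<25}{:<15}{:<15}{:<15}{:<15}{:<15}'.format(total, *tds))

    # A full page has 15 rows; a shorter page is treated as the last one.
    if total % 15:
        break

    page += 1
    # Post the fields taken from the current page to fetch the next one.
    soup = BeautifulSoup(requests.post(url, get_data(soup, page)).content, 'html.parser')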
Prints:
Page 1...
1  Safran Cabin Materials, LLC                       Marysville and Newport   6/23/2020      85             Layoff         Permanent      6/24/2020
2  Swissport Fueling                                 SeaTac                   5/8/2020       69             Layoff         Permanent      6/19/2020
3  Swissport USA, Inc                                SeaTac                   5/22/2020      62             Layoff         Permanent      6/19/2020
4  Swissport USA, Inc                                SeaTac                   3/20/2020      167            Layoff         Temporary      6/19/2020
5  Tool Gauge and Machine Works                      Tacoma                   6/17/2020      59             Layoff         Permanent      6/18/2020
6  Hyatt Corporation Motif Seattle                   Seattle                  3/14/2020      91             Layoff         Temporary      6/18/2020
7  Jacobsen Daniel's Enterprise, Inc                 Tacoma                   6/12/2020      1              Layoff         Permanent      6/18/2020
8  Benchmark Stevenson, LLC d/b/a Skamania Lodge     Stevenson                3/18/2020      185            Layoff         Temporary      6/17/2020
9  Seattle Art Museum                                Seattle                  7/5/2020       76             Layoff         Temporary      6/16/2020
10 Chihuly Garden & Glass                            Seattle                  3/21/2020      97             Layoff         Temporary      6/16/2020
11 Seattle Center                                    Seattle                  3/21/2020      182            Layoff         Temporary      6/16/2020
12 Sekisui Aerospace                                 Renton and Sumner        6/12/2020      111            Layoff         Permanent      6/15/2020
13 Pioneer Human Services                            Seattle                  8/14/2020      59             Layoff         Permanent      6/15/2020
14 Crista Senior Living                              Shoreline                8/16/2020      156            Closure        Permanent      6/15/2020
15 Hyatt Corporation / Hyatt Regency Bellevue        Bellevue                 3/15/2020      223            Layoff         Temporary      6/15/2020
Page 2...
1  Toray Composite Materials America, Inc            Tacoma                   8/8/2020       146            Layoff         Permanent      6/12/2020
2  Embassy Suites Seattle Bellevue                   Seattle                  6/1/2020       57             Layoff         Temporary      6/12/2020
3  Triumph Aerospace Structures                      Spokane                  6/15/2020      12             Layoff         Permanent      6/11/2020
4  Hyatt Corporation / Hyatt Regency Lake Washington Renton                   6/30/2020      129            Layoff         Temporary      6/9/2020
5  Lamb Weston, Inc                                  Connell, WA              6/15/2020      360            Layoff         Temporary      6/8/2020
6  Lamb Weston, Inc                                  Warden                   6/15/2020      300            Layoff         Temporary      6/8/2020
...and so on.
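To see what is actually sent on each paging request, here is a minimal offline sketch of the same form-data step. It is not the code from the answer: it swaps BeautifulSoup for the standard library's html.parser so it runs without a network connection or extra packages. The sample HTML, the dummy field values, the ucPSW$txtSearchCompany field name, and the helper names InputCollector and build_postback are all assumptions made up for illustration; only __VIEWSTATE, __EVENTTARGET, __EVENTARGUMENT, __LASTFOCUS, ucPSW$gvMain and ucPSW$btnSearchCompany come from the answer above.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class InputCollector(HTMLParser):
    """Hypothetical helper: collect name/value pairs from named <input> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name:
            # An input without a value attribute is sent as an empty string.
            self.fields[name] = attr_map.get("value") or ""


def build_postback(html, page_num):
    """Build the form data for a paging postback, mirroring get_data() above."""
    collector = InputCollector()
    collector.feed(html)
    data = dict(collector.fields)
    # Leave the search button out, as the answer's code does.
    data.pop("ucPSW$btnSearchCompany", None)
    data["__EVENTTARGET"] = "ucPSW$gvMain"
    data["__EVENTARGUMENT"] = "Page${}".format(page_num)
    data["__LASTFOCUS"] = ""
    return data


# Made-up stand-in for the real page; the values here are placeholders.
SAMPLE_HTML = """
<form method="post" action="./SearchWARN.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dummy-viewstate" />
  <input type="hidden" name="__EVENTVALIDATION" value="dummy-validation" />
  <input type="text" name="ucPSW$txtSearchCompany" />
  <input type="submit" name="ucPSW$btnSearchCompany" value="Search" />
</form>
"""

payload = build_postback(SAMPLE_HTML, 2)
for key, value in payload.items():
    print("{:<28}{!r}".format(key, value))

# This is the kind of body requests.post(url, payload) would send.
print(urlencode(payload))
```

The point to take away is that the hidden fields are re-read from the most recent response each time. Reusing the fields from the first page for every later request is a common reason paging stops working partway through on sites like this, although the truncated question does not show enough to say that was the cause here.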