Remove 'urllib.error.HTTPError: HTTP Error 302:' From Urlreq(url)
Hey guys what's up? :) I'm trying to scrape a website with some URL parameters. If I use url1, url2 or url3 it WORKS properly and prints the regular output I want (HTML) ->
Solution 1:
If you use the requests package and add a User-Agent to the headers, it looks like all 4 of those links return a 200 response. So try adding the User-Agent header:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}
import requests
from bs4 import BeautifulSoup as soup
# create urls
url1 = 'https://en.titolo.ch/sale'
url2 = 'https://en.titolo.ch/sale?limit=108'
url3 = 'https://en.titolo.ch/sale?category_styles=29838_21212'
url4 = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}
url_list = [url1, url2, url3, url4]
for url in url_list:
    # open a connection to each url and grab the page
    response = requests.get(url, headers=headers)
    print(response.status_code)
Output:
200
200
200
200
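If you want to see whether a 302 is still happening behind those 200s, requests follows redirects by default and keeps the intermediate responses in response.history. A minimal sketch (redirect_chain is just a helper name made up here, not part of requests):

```python
def redirect_chain(response):
    """List (status_code, url) for every redirect hop, ending with the final response.

    Expects a requests.Response, or anything with .status_code, .url and .history.
    """
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops
```

Inside the loop above you could call print(redirect_chain(response)) after each requests.get to see which URLs, if any, were redirected.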
So, for the fourth URL on its own:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}
url = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'
r = requests.get(url, headers=headers)
html = r.text
print(html)
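If you'd rather stay with urllib (that's where the error in the title comes from), the same idea can be sketched with urllib.request.Request. Treat this as a sketch only: build_request and fetch_html are names made up here, and whether the site answers urllib with a 200 once the header is set is an assumption, since only the requests version was checked.

```python
from urllib.request import Request, urlopen

# Same browser-like User-Agent as in the requests example.
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}


def build_request(url, headers=HEADERS):
    """Wrap the url in a Request carrying the custom headers (no network access yet)."""
    return Request(url, headers=headers)


def fetch_html(url):
    """Open the url with the custom headers and return the decoded body.

    Assumption: the page is UTF-8; bytes that cannot be decoded are replaced.
    """
    with urlopen(build_request(url)) as response:
        return response.read().decode('utf-8', errors='replace')
```

Calling print(fetch_html(url)) with the url above would then print the page, provided the server accepts the request.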