How To Scrape Specific Ids From A Webpage

March 17, 2024 Post a Comment

I need to do some real estate market research and for this in need the prices, and other values from new houses. So my idea was to go on the website where i get the information.

Solution 1:

Something like this? There are 68 keys in a dictionary that are ids. I use regex to grab the same script as you are after and trim of an unwanted character, then load with json.loads and access the json object as shown in image at bottom.

import requests
import json
from bs4 import BeautifulSoup as bs
import re

res = requests.get('https://www.immobilienscout24.de/Suche/S-T/Wohnung-Kauf/Nordrhein-Westfalen/Duesseldorf/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/true?enteredFrom=result_list')
soup = bs(res.content, 'lxml')
r = re.compile(r'resultListModel:(.*)')
data = soup.find('script', text=r).text
script = r.findall(data)[0].rstrip(',')
#resultListModel: 
results = json.loads(script)
ids = list(results['searchResponseModel']['entryInformation'].keys())
print(ids)

Ids:

Since website updated:

import requests
import json
from bs4 import BeautifulSoup as bs
import re

res = requests.get('https://www.immobilienscout24.de/Suche/S-T/Wohnung-Kauf/Nordrhein-Westfalen/Duesseldorf/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/true?enteredFrom=result_list')
soup = bs(res.content, 'lxml')
r = re.compile(r'resultListModel:(.*)')
data = soup.find('script', text=r).text
script = r.findall(data)[0].rstrip(',')
results = json.loads(script)
ids = [item['@id'] for item in results['searchResponseModel']['resultlist.resultlist']['resultlistEntries'][0]['resultlistEntry']]
print(ids)

Python Freelancers

How To Scrape Specific Ids From A Webpage

Solution 1:

Post a Comment for "How To Scrape Specific Ids From A Webpage"