
How To Scrape Specific Ids From A Webpage

I need to do some real estate market research, and for this I need the prices and other values of new houses. My idea was to scrape them from the website where I get the information.

Solution 1:

Something like this? The page embeds the listing data as JSON inside a script tag, and the entryInformation dictionary in it has 68 keys, each of which is a listing id. I use a regex to grab the same script you are after, trim off the trailing comma, load the result with json.loads and read the keys of that JSON object.

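import re
import json

import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.immobilienscout24.de/Suche/S-T/Wohnung-Kauf/Nordrhein-Westfalen/Duesseldorf/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/true?enteredFrom=result_list'
res = requests.get(url)
res.raise_for_status()  # stop early if the page is blocked or missing
soup = bs(res.content, 'lxml')

# The listing data sits in a <script> as "resultListModel: {...},"
r = re.compile(r'resultListModel:(.*)')
data = soup.find('script', string=r).string
# Take the rest of that line and trim the trailing comma so it is valid JSON
script = r.search(data).group(1).strip().rstrip(',')
results = json.loads(script)

# Every key of entryInformation is a listing id
ids = list(results['searchResponseModel']['entryInformation'].keys())
print(ids)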
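As a hedged alternative to trimming the comma with rstrip, the sketch below reads the object with json.JSONDecoder.raw_decode, which parses exactly one JSON value and then stops, so whatever follows it is simply ignored (this also copes with JSON spread over several lines, which the (.*) regex does not). It searches the raw HTML for the resultListModel: marker instead of locating the script tag first. SAMPLE_HTML is a made-up, heavily trimmed stand-in for the real page, and extract_result_list_model is a hypothetical helper name.

```python
import json
import re

# Hypothetical, heavily trimmed page so the sketch runs offline; the real
# immobilienscout24.de markup is far larger and may differ.
SAMPLE_HTML = """
<script>
var IS24 = {
  resultListModel: {"searchResponseModel": {"entryInformation": {"111": {}, "222": {}}}},
  otherModel: {}
};
</script>
"""

# Matches the marker plus any whitespace before the opening brace
MARKER = re.compile(r"resultListModel:\s*")


def extract_result_list_model(html: str) -> dict:
    """Parse the JSON object that follows 'resultListModel:' in the page source."""
    match = MARKER.search(html)
    if match is None:
        raise ValueError("'resultListModel:' not found in page")
    # raw_decode reads one JSON value starting at match.end() and ignores
    # everything after it, so the trailing ',' needs no stripping.
    model, _end = json.JSONDecoder().raw_decode(html, match.end())
    return model


model = extract_result_list_model(SAMPLE_HTML)
ids = list(model["searchResponseModel"]["entryInformation"])
print(ids)  # ['111', '222']
```

Against the live page this would be called as extract_result_list_model(res.text), assuming the site still embeds resultListModel in the same way.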

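Update: since the website changed its page structure, the ids now sit under resultlist.resultlist as the '@id' field of each entry: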

import re
import json

import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.immobilienscout24.de/Suche/S-T/Wohnung-Kauf/Nordrhein-Westfalen/Duesseldorf/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/true?enteredFrom=result_list'
res = requests.get(url)
res.raise_for_status()  # stop early if the page is blocked or missing
soup = bs(res.content, 'lxml')

# Same extraction as before: the JSON text after "resultListModel:" on its line
r = re.compile(r'resultListModel:(.*)')
data = soup.find('script', string=r).string
script = r.search(data).group(1).strip().rstrip(',')
results = json.loads(script)

# New layout: each listing is an entry whose '@id' field holds its id
entries = results['searchResponseModel']['resultlist.resultlist']['resultlistEntries'][0]['resultlistEntry']
ids = [item['@id'] for item in entries]
print(ids)
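Because the layout has already changed once, here is a small hedged sketch that accepts either structure used in this answer: the older entryInformation keys or the newer '@id' fields. extract_ids, OLD_MODEL and NEW_MODEL are hypothetical; the sample dicts only mirror the key paths from the two snippets, while the live JSON carries many more fields.

```python
def extract_ids(model: dict) -> list:
    """Return listing ids from a parsed resultListModel, old or new layout."""
    search = model["searchResponseModel"]
    if "resultlist.resultlist" in search:
        # Newer layout: the id is the '@id' field of each result entry
        entries = search["resultlist.resultlist"]["resultlistEntries"][0]["resultlistEntry"]
        return [entry["@id"] for entry in entries]
    # Older layout: the ids are the keys of entryInformation
    return list(search["entryInformation"])


# Hypothetical minimal models that follow the two key paths above
OLD_MODEL = {"searchResponseModel": {"entryInformation": {"111": {}, "222": {}}}}
NEW_MODEL = {
    "searchResponseModel": {
        "resultlist.resultlist": {
            "resultlistEntries": [
                {"resultlistEntry": [{"@id": "333"}, {"@id": "444"}]}
            ]
        }
    }
}

print(extract_ids(OLD_MODEL))  # ['111', '222']
print(extract_ids(NEW_MODEL))  # ['333', '444']
```

Checking for the newer key first and falling back to the older one keeps a single code path working if the site serves either version.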
