How Do I Scrape ::before Element In A Website Using Selenium Python
Solution 1:
You don't need Selenium. The instructions that give the :before pseudo-elements their content are carried in the page's CSS style rules. There, the two- or three-letter strings after .icon- (e.g. acb) map to the span elements that house the :before content. The values after \9d0 are one more than the actual value shown. You can build a dictionary from these pairs of values (applying that adjustment) and use it to decode the number at each :before from the span's class value.
Example of how 2/3 letter strings map to content:
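A minimal sketch of that mapping on a made-up stylesheet fragment, assuming each rule has the form .icon-xxx:before{content:"\9d0NN"}. The class suffixes, the code points, and the helper name build_cipher are invented for illustration and are not taken from the site:

```python
import re

# Hypothetical stylesheet fragment: class names and code points are invented
# for illustration and are not copied from the site.
sample_css = r'''
.icon-acb:before{content:"\9d002"}
.icon-yz:before{content:"\9d001"}
.icon-dc:before{content:"\9d011"}
'''

def build_cipher(css_text):
    """Map each .icon-xxx suffix to the character it is assumed to display."""
    keys = re.findall(r'-(\w+):before', css_text)
    # The digits after 9d0 are one more than the value shown.
    values = [int(v) - 1 for v in re.findall(r'9d0(\d+)', css_text)]
    cipher = dict(zip(keys, values))
    # As in the code that follows, the entry that decodes to 10 stands for "+".
    return {k: ('+' if v == 10 else str(v)) for k, v in cipher.items()}

cipher = build_cipher(sample_css)
# Class suffixes as they might appear, in order, on the phone-number spans.
decoded = ''.join(cipher[name] for name in ['dc', 'acb', 'yz', 'yz'])
print(cipher)   # {'acb': '1', 'yz': '0', 'dc': '+'}
print(decoded)  # +100
```

Running the logic on a fixed string like this makes it easy to check the off-by-one adjustment before pointing it at the live page.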
My method is perhaps a little verbose as I am not that familiar with Python but the logic should be clear.
import re

import requests
from bs4 import BeautifulSoup

url = 'https://www.justdial.com/Bangalore/Spardha-Mithra-IAS-KAS-Coaching-Centre-Opposite-Maruthi-Medicals-Vijayanagar/080PXX80-XX80-140120184741-R6P8_BZDET?xid=QmFuZ2Fsb3JlIEJhbmsgRXhhbSBUdXRvcmlhbHM='
res = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(res.content, 'lxml')

# The second <style type="text/css"> block holds the .icon-xxx:before rules.
cipherKey = str(soup.select('style[type="text/css"]')[1])
# Class suffixes (e.g. "acb") and the digits that follow 9d0, in matching order.
keys = re.findall(r'-(\w+):before', cipherKey)
values = [int(item) - 1 for item in re.findall(r'9d0(\d+)', cipherKey)]
cipherDict = dict(zip(keys, values))
# The class whose adjusted value is 10 is treated as the "+" sign.
plusKey = list(cipherDict.keys())[list(cipherDict.values()).index(10)]
cipherDict[plusKey] = '+'
# Take the icon class of each span in the phone block and decode it.
decodeElements = [item['class'][1].replace('icon-', '') for item in soup.select('.telCntct span[class*="icon"]')]
telephoneNumber = ''.join(str(cipherDict.get(i)) for i in decodeElements)
print(telephoneNumber)
Solution 2:
You can also get the :before content from the computed style:
chars = driver.execute_script("return [...document.querySelectorAll('.telCntct a.tel span')].map(span => window.getComputedStyle(span,':before').content)")
But in this case you're left with weird unicode content that you then have to map to numbers.
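One way that mapping might look is sketched below. It rests on two assumptions that are not confirmed above: that each computed content value is a single quoted character, and that the character's code point, written in hex, is 9d0 followed by the same decimal digits used in Solution 1 (so the shown value is those digits minus one, with 10 standing for "+"). The function name decode_before_content and the sample values are hypothetical:

```python
def decode_before_content(contents, prefix='9d0'):
    """Decode computed ::before content strings into a phone number string.

    Assumes each entry looks like '"<glyph>"' and that the glyph's hex code
    point is the prefix followed by decimal digits (an unverified assumption).
    """
    decoded = []
    for content in contents:
        glyph = content.strip('"\'')
        if not glyph:
            continue
        code = format(ord(glyph[0]), 'x')
        if not code.startswith(prefix):
            continue  # e.g. "none" for spans without a ::before rule
        value = int(code[len(prefix):]) - 1
        decoded.append('+' if value == 10 else str(value))
    return ''.join(decoded)

# Invented sample values standing in for the list returned by execute_script.
sample = ['"\U0009d011"', '"\U0009d002"', '"\U0009d001"', 'none']
print(decode_before_content(sample))  # +10
```

With Selenium, the chars list returned by the execute_script call would be passed in place of sample.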