
Does Beautifulsoup .select() Method Support Use Of Regex?

Suppose I want to parse an HTML document using BeautifulSoup and use CSS selectors to find specific tags. I would 'soupify' it by doing from bs4 import BeautifulSoup soup = Beaut

Solution 1:

The soup.select() function only supports CSS syntax; regular expressions are not part of that.

You can use CSS attribute selectors to match attributes whose value ends with a given text:

soup.select('#abc a[href$="xyz"]')

See the CSS attribute selectors documentation over on MSDN.
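
As a rough illustration of how far plain CSS attribute selectors can go without regular expressions, here is a minimal sketch. The HTML snippet, its URLs, and the variable names are made up for the example; only select() and the standard $=, ^= and *= operators are taken as given.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example.
html = """
<div id="abc">
  <a href="http://example.com/1.html">one</a>
  <a href="http://example.com/files/report.xyz">two</a>
  <a href="https://other.example.org/page">three</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# [href$="..."] matches attribute values that END with the text.
ends_with = soup.select('#abc a[href$="xyz"]')

# [href^="..."] matches attribute values that START with the text.
starts_with = soup.select('#abc a[href^="http://example.com/"]')

# [href*="..."] matches attribute values that CONTAIN the text anywhere.
contains = soup.select('#abc a[href*="other"]')

print([a.get_text() for a in ends_with])
print([a.get_text() for a in starts_with])
print([a.get_text() for a in contains])
```

These three operators cover prefix, suffix and substring tests. Anything that needs a real pattern, such as "one or more digits", still has to go through the Beautiful Soup API.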

You can always use the results of a CSS selector to continue the search:

import re
for element in soup.select('#abc'):
    # Raw string with escaped dots, so "." only matches a literal period
    child_elements = element.find_all(href=re.compile(r'^http://example\.com/\d+\.html'))
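
To make the combination concrete, here is a small self-contained sketch of narrowing with a CSS selector first and then filtering by regular expression. The markup and the names pattern, matches and hrefs are assumptions made for illustration, and the pattern is anchored with $ as well, which is a choice made here rather than something the answer requires.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example.
html = """
<div id="abc">
  <a href="http://example.com/123.html">numeric</a>
  <a href="http://example.com/about.html">about</a>
  <a href="http://example.com/45xhtml">no literal dot</a>
</div>
<div id="other">
  <a href="http://example.com/999.html">outside</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Raw string with escaped dots: "\." matches only a literal period,
# and "\d+" requires at least one digit before ".html".
pattern = re.compile(r"^http://example\.com/\d+\.html$")

matches = []
# Step 1: the CSS selector restricts the search to the #abc container.
for element in soup.select("#abc"):
    # Step 2: find_all() applies the regex to each href attribute value.
    matches.extend(element.find_all("a", href=pattern))

hrefs = [a["href"] for a in matches]
print(hrefs)
```

The link inside #other is never considered because the selector excludes it, and the escaped dot keeps a value like 45xhtml from slipping through.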

Note that, as the element.select() documentation states:

This is a convenience for users who know the CSS selector syntax. You can do all this stuff with the Beautiful Soup API. And if CSS selectors are all you need, you might as well use lxml directly: it’s a lot faster, and it supports more CSS selectors. But this lets you combine simple CSS selectors with the Beautiful Soup API.

Emphasis mine.
