
How To Keep Characters With Regular Expressions That I Don't Want To Delete In Python?

I use this code to delete all tag elements in HTML.

import re
MyString = 'aaa Radio and television. very popular in the world today.&

Solution 1:

Capture <br> tags in group 1, match every other tag with a second alternative, and replace each match with \1. When a <br> tag matched, \1 puts it back unchanged; when any other tag matched, group 1 is empty, so the tag is removed. Replace

(?i)(<br\/?>)|<[^>]*>

with \1. The (?i) inline modifier makes the regex case-insensitive, so it also matches <BR> and <BR/>. Alternatively, pass flags=re.IGNORECASE to re.sub. Use the keyword form, because the fourth positional argument of re.sub is count, not flags.
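To see why replacing with \1 keeps only the <br> tags, it can help to inspect what group 1 holds for each match. The following is a minimal sketch; the `sample` string is made up for illustration and is not from the question. It relies on Python 3.5 or later, where a reference to a group that did not take part in the match is replaced with an empty string.

```python
import re

# Same pattern as above, with the flag passed by keyword instead of (?i).
pattern = re.compile(r'(<br/?>)|<[^>]*>', flags=re.IGNORECASE)

# Illustrative input, not taken from the question.
sample = '<p>a<br>b<BR/>c</p>'

# Group 1 is None whenever the second alternative matched.
groups = [m.group(1) for m in pattern.finditer(sample)]
print(groups)  # [None, '<br>', '<BR/>', None]

# Substituting \1 therefore restores <br> tags and drops everything else.
print(pattern.sub(r'\1', sample))  # a<br>b<BR/>c
```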


Your updated Python code:

import re
MyString = 'aaa<p>Radio and television.<br></p><p>very<br/> popular <BR>in the <BR/>world today.</p><p>Millions of people watch TV. </p><p>That’s because a radio is very small <span_style=":_black;">98.2%</span></p><p>and it‘s easy to carry. <span_style=":_black;">haha100%</span></p>bb'
# Keep <br> tags (group 1) and drop every other tag
MyString = re.sub(r'(?i)(<br/?>)|<[^>]*>', r'\1', MyString)
print(MyString)

This prints the string with the <br> tags kept and all other tags removed:

aaaRadio and television.<br>very<br/> popular <BR>in the <BR/>world today.Millions of people watch TV. That’s because a radio is very small 98.2%and it‘s easy to carry. haha100%bb

Alternatively, use a negative lookahead so that the pattern never matches a <br> tag in the first place:

(?i)<(?!br/?>)[^>]*>

and replace every match with an empty string.
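As a quick check of what the lookahead accepts and rejects, the sketch below tests the pattern against a few individual tags. The tag list is an assumption chosen for illustration. Note that `<br />` (with a space) is not protected by this lookahead, so it would be removed along with the other tags.

```python
import re

pattern = re.compile(r'<(?!br/?>)[^>]*>', flags=re.IGNORECASE)

# Illustrative tags; True means the pattern matches, so the tag would be removed.
tags = ['<p>', '</p>', '<br>', '<BR/>', '<br />', '<bread>']
removed = {tag: bool(pattern.fullmatch(tag)) for tag in tags}

for tag, is_removed in removed.items():
    print(tag, 'removed' if is_removed else 'kept')
```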


Python code using the negative lookahead regex:

import re
MyString = 'aaa<p>Radio and television.<br></p><p>very<br/> popular <BR>in the <BR/>world today.</p><p>Millions of people watch TV. </p><p>That’s because a radio is very small <span_style=":_black;">98.2%</span></p><p>and it‘s easy to carry. <span_style=":_black;">haha100%</span></p>bb'
# Remove every tag except <br> and <br/>
MyString = re.sub(r'(?i)<(?!br/?>)[^>]*>', '', MyString)
print(MyString)

Prints,

aaaRadio and television.<br>very<br/> popular <BR>in the <BR/>world today.Millions of people watch TV. That’s because a radio is very small 98.2%and it‘s easy to carry. haha100%bb
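If more than one tag needs to be kept, or if <br> may appear with whitespace or attributes, the lookahead idea can be generalized. The helper below is a hypothetical sketch, not part of the original answer: the name `strip_tags_except` and its `keep` parameter are assumptions, and the input string is made up. It uses a word boundary after the tag name so that `<br />` is kept while `<bread>` is still removed. Like any regex approach it treats HTML as plain text, so it may misbehave on comments or on attribute values that contain `>`.

```python
import re

def strip_tags_except(html, keep=("br",)):
    """Remove all tags except those whose name is listed in `keep` (hypothetical helper)."""
    if not keep:
        # Nothing to protect: remove every tag.
        return re.sub(r"<[^>]*>", "", html)
    names = "|".join(re.escape(name) for name in keep)
    # Reject a match when '<' is followed by an optional '/' and a kept tag name.
    pattern = re.compile(rf"<(?!/?(?:{names})\b)[^>]*>", flags=re.IGNORECASE)
    return pattern.sub("", html)

# Illustrative input, not taken from the question.
sample = 'aaa<p>Radio<br>and<BR />television.</p><b>bold</b>bb'

print(strip_tags_except(sample))
# aaaRadio<br>and<BR />television.boldbb

print(strip_tags_except(sample, keep=("br", "b")))
# aaaRadio<br>and<BR />television.<b>bold</b>bb
```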
