How To Keep Characters With Regular Expressions That I Don't Want To Delete In Python?
Radio and television.
very
popular in the world today.&
Solution 1:
You can capture <br>
tags separately in group1 and capture any other tag separately and replace the whole match with \1
to retain <br>
tags and remove rest other tags. Replace
(?i)(<br\/?>)|<[^>]*>
with \1
. Also added (?i)
inline modifier (you can also pass re.IGNORECASE
as fourth argument in re.sub
to make it case-insensitive) to make the regex case insensitive for also matching it with <BR>
or <BR/>
Your updated Python code,
import re
MyString = 'aaa<p>Radio and television.<br></p><p>very<br/> popular <BR>in the <BR/>world today.</p><p>Millions of people watch TV. </p><p>That’s because a radio is very small <span_style=":_black;">98.2%</span></p><p>and it‘s easy to carry. <span_style=":_black;">haha100%</span></p>bb'
MyString = re.sub('(?i)(<br/?>)|<[^>]*>', r'\1', MyString)
print(MyString)
Prints the string with br
tag only and rest tags removed,
aaaRadio and television.<br>very<br/> popular <BR>in the <BR/>world today.Millions of people watch TV. That’s because a radio is very small 98.2%and it‘s easy to carry. haha100%bb
In another approach, you can also use a negative look ahead to reject tags that are br
using this regex,
(?i)<(?!br/?>)[^>]*>
and just replace it with empty string.
Regex Demo using negative lookahead to reject
Python code using negative lookahead regex,
import re
MyString = 'aaa<p>Radio and television.<br></p><p>very<br/> popular <BR>in the <BR/>world today.</p><p>Millions of people watch TV. </p><p>That’s because a radio is very small <span_style=":_black;">98.2%</span></p><p>and it‘s easy to carry. <span_style=":_black;">haha100%</span></p>bb'
MyString = re.sub('(?i)<(?!br/?>)[^>]*>', r'', MyString)
print(MyString)
Prints,
aaaRadio and television.<br>very<br/> popular <BR>in the <BR/>world today.Millions of people watch TV. That’s because a radio is very small 98.2%and it‘s easy to carry. haha100%bb
Post a Comment for "How To Keep Characters With Regular Expressions That I Don't Want To Delete In Python?"