Match C++ Strings And String Literals Using Regex In Python
I am trying to match Strings (both between double & single quotes) and String Literals in C++ source files. I am using the re library in Python. I have reached the point where
Solution 1:
You can grab all the string literals with the following regex:
r'(?P<prefix>(?:\bu8|\b[LuU])?)(?:"(?P<dbl>[^"\\]*(?:\\.[^"\\]*)*)"|\'(?P<sngl>[^\'\\]*(?:\\.[^\'\\]*)*)\')|R"([^"(]*)\((?P<raw>.*?)\)\4"'
Explanation:
(?P<prefix>(?:\bu8|\b[LuU])?) - (group named "prefix") the optional prefix: either u8 or one of L, u, U, each required to start at a word boundary
(?:"(?P<dbl>[^"\\]*(?:\\.[^"\\]*)*)" - a double-quoted string literal, with the contents between the " characters captured into the group named "dbl". This part matches ", then 0+ characters other than \ and ", followed by any number (0+) of repetitions of an escape sequence (\\.) followed by 0+ characters other than \ and " (it is an unrolled version of (?:[^"\\]|\\.)*), and then the closing "
| - or
\'(?P<sngl>[^\'\\]*(?:\\.[^\'\\]*)*)\') - a single-quoted literal, with the contents between the ' characters captured into the group named "sngl". It works the same way as the double-quoted branch above.
| - or
R"([^"(]*)\((?P<raw>.*?)\)\4" - a raw string literal, with the contents captured into the group named "raw". First, R is matched, then ", followed by 0+ characters other than " and ( that are captured as the delimiter into Group 4 (named groups also get numeric IDs, so "prefix", "dbl" and "sngl" are Groups 1 to 3). After the opening (, the contents are matched with a lazy construct (use re.S if the strings span multiple lines), up to the first ) that is followed by the contents of Group 4 (the raw string literal delimiter) and the final ".
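The two less obvious ideas in the explanation, the unrolled loop and the delimiter backreference, can be checked in isolation. The sketch below is only an illustration: the names ALTERNATION, UNROLLED and RAW and the sample inputs are made up here, and the raw-string pattern uses \1 instead of \4 because it stands alone and the delimiter is its first group.

```python
import re

# Body of a double-quoted literal, written two ways (illustrative names).
ALTERNATION = re.compile(r'"((?:[^"\\]|\\.)*)"')       # plain alternation
UNROLLED = re.compile(r'"([^"\\]*(?:\\.[^"\\]*)*)"')   # unrolled form used in the answer

samples = [
    r'"plain"',
    r'"with \"escaped\" quotes"',
    r'"trailing backslash \\"',
    r'""',
]

# Both forms should accept the same inputs and capture the same contents.
agree = []
for text in samples:
    a = ALTERNATION.fullmatch(text)
    u = UNROLLED.fullmatch(text)
    agree.append(a is not None and u is not None and a.group(1) == u.group(1))

# Raw string literal on its own: group 1 is the delimiter, so the backreference is \1.
# re.S lets the lazy .*? run across the newline inside the literal.
RAW = re.compile(r'R"([^"(]*)\((.*?)\)\1"', re.S)
m = RAW.search('R"xy(line one\n)" still inside)xy"')
delimiter, body = m.group(1), m.group(2)
print(agree, repr(delimiter), repr(body))
```

The first )" in the raw sample does not end the literal, because it is not followed by the xy delimiter. That is the job of the backreference.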
Sample Python demo:
import re
p = re.compile(r'(?P<prefix>(?:\bu8|\b[LuU])?)(?:"(?P<dbl>[^"\\]*(?:\\.[^"\\]*)*)"|\'(?P<sngl>[^\'\\]*(?:\\.[^\'\\]*)*)\')|R"([^"(]*)\((?P<raw>.*?)\)\4"')
s = "\"text'\\\"here\"\nL'text\\'\"here'\nu8\"text'\\\"here\"\nu'text\\'\"here'\nU\"text'\\\"here\"\nR\"delimiter(text\"'\"here)delimiter\""
print(s)
print('---------Regex works below ---------')
for x in p.finditer(s):
    # Compare against None: an empty literal such as "" captures an empty (falsy) string
    if x.group("dbl") is not None:
        print(x.group("dbl"))
    elif x.group("sngl") is not None:
        print(x.group("sngl"))
    else:
        print(x.group("raw"))
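If the kind of literal and its prefix are needed along with the contents, the same pattern can be wrapped in a small generator. This is a minimal sketch, not part of the original answer: the names LITERAL and iter_literals and the sample C++ snippet are assumptions made for illustration, and the pattern is compiled with re.S so that raw literals may span lines.

```python
import re

# Same pattern as in the answer, split across lines for readability.
LITERAL = re.compile(
    r'(?P<prefix>(?:\bu8|\b[LuU])?)'
    r'(?:"(?P<dbl>[^"\\]*(?:\\.[^"\\]*)*)"'
    r'|\'(?P<sngl>[^\'\\]*(?:\\.[^\'\\]*)*)\')'
    r'|R"([^"(]*)\((?P<raw>.*?)\)\4"',
    re.S,
)

def iter_literals(source):
    """Yield (kind, prefix, contents) for every literal found in source."""
    for m in LITERAL.finditer(source):
        for kind in ("dbl", "sngl", "raw"):
            contents = m.group(kind)
            # A group that did not take part in the match is None;
            # an empty literal yields "" and must still be reported.
            if contents is not None:
                yield kind, m.group("prefix") or "", contents
                break

# Hypothetical C++ snippet: a prefixed string, a char literal,
# a two-line raw string and an empty string.
code = 'auto a = u8"x\\"y"; char c = \'\\n\'; auto r = R"d(one\ntwo)d"; auto e = "";'
found = list(iter_literals(code))
for item in found:
    print(item)
```

Two limitations follow from the pattern itself. The prefix group belongs only to the quoted branches, so a prefixed raw literal such as u8R"(...)" is reported with an empty prefix. The regex also knows nothing about comments, so quotes inside C++ comments are matched as well.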