Match C++ Strings And String Literals Using Regex In Python
I am trying to match Strings (both between double & single quotes) and String Literals in C++ source files. I am using the re library in Python. I have reached the point where
Solution 1:
You can grab all the string literals with the following regex:
r'(?P<prefix>(?:\bu8|\b[LuU])?)(?:"(?P<dbl>[^"\\]*(?:\\.[^"\\]*)*)"|\'(?P<sngl>[^\'\\]*(?:\\.[^\'\\]*)*)\')|R"([^"(]*)\((?P<raw>.*?)\)\4"'
Explanation:
(?P<prefix>(?:\bu8|\b[LuU])?) - (group named "prefix") the optional encoding prefix: u8, L, u or U, each preceded by a word boundary so that it is not the tail of a longer identifier. It applies only to the two quoted branches, not to raw strings.
(?:"(?P<dbl>[^"\\]*(?:\\.[^"\\]*)*)" - a double-quoted string literal, with the contents between the quotes captured into the group named "dbl". It matches a double quote, then zero or more characters other than a backslash and a double quote, followed by any number of sequences made of an escape sequence (\\.) and zero or more characters other than a backslash and a double quote, and finally the closing double quote. This is an unrolled version of (?:[^"\\]|\\.)*.
| - or
\'(?P<sngl>[^\'\\]*(?:\\.[^\'\\]*)*)\') - a single-quoted literal (a character literal in C++), with the contents between the quotes captured into the group named "sngl". It works the same way as the double-quoted branch above.
| - or
R"([^"(]*)\((?P<raw>.*?)\)\4" - a raw string literal, with its contents captured into the group named "raw". First, R is matched, then a double quote followed by zero or more characters other than a double quote and (. This is the delimiter, captured into Group 4 (named groups are numbered too, so prefix, dbl and sngl are Groups 1 to 3). Then the contents are matched lazily (use re.S if the strings may span several lines) up to the first ) that is followed by the contents of Group 4 (the delimiter) and the closing double quote.
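The points in the explanation can be checked directly in Python. The sketch below is illustrative only (the names LITERAL, UNROLLED and ALTERNATION are made up for it): it confirms that the unnamed delimiter group really is number 4, spot-checks on a few sample strings (not a proof) that the unrolled body accepts the same strings as (?:[^"\\]|\\.)*, and shows that a raw string ends only at ) + delimiter + double quote.

```python
import re

# The pattern from the answer, compiled without flags.
LITERAL = re.compile(r'(?P<prefix>(?:\bu8|\b[LuU])?)(?:"(?P<dbl>[^"\\]*(?:\\.[^"\\]*)*)"|\'(?P<sngl>[^\'\\]*(?:\\.[^\'\\]*)*)\')|R"([^"(]*)\((?P<raw>.*?)\)\4"')

# 1. Named groups are numbered too, so the unnamed delimiter group is number 4.
assert dict(LITERAL.groupindex) == {"prefix": 1, "dbl": 2, "sngl": 3, "raw": 5}
assert LITERAL.groups == 5

# 2. The unrolled body accepts the same strings as the plain alternation
#    (checked on a few samples only).
UNROLLED = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')
ALTERNATION = re.compile(r'"(?:[^"\\]|\\.)*"')
for text in ['"plain"', '"esc\\"aped"', '"tail\\\\"', '""', '"unterminated', '"bad"quote"']:
    assert (UNROLLED.fullmatch(text) is None) == (ALTERNATION.fullmatch(text) is None)

# 3. A raw string ends only at ")" + delimiter + '"', so its body may contain )"
m = LITERAL.search('R"x(a)" b)x"')
print(m.group(4), m.group("raw"))  # prints: x a)" b
```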
Sample Python demo:
import re

# Optional prefix + "..." or '...' (escape-aware), or a raw string R"delim(...)delim"
p = re.compile(r'(?P<prefix>(?:\bu8|\b[LuU])?)(?:"(?P<dbl>[^"\\]*(?:\\.[^"\\]*)*)"|\'(?P<sngl>[^\'\\]*(?:\\.[^\'\\]*)*)\')|R"([^"(]*)\((?P<raw>.*?)\)\4"')
# Sample C++ source: one literal per line
s = "\"text'\\\"here\"\nL'text\\'\"here'\nu8\"text'\\\"here\"\nu'text\\'\"here'\nU\"text'\\\"here\"\nR\"delimiter(text\"'\"here)delimiter\""
print(s)
print('---------Regex works below ---------')
for x in p.finditer(s):
    # Compare with None: an empty literal ("" or '') gives '', which is falsy
    if x.group("dbl") is not None:
        print(x.group("dbl"))
    elif x.group("sngl") is not None:
        print(x.group("sngl"))
    else:
        print(x.group("raw"))
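A possible variation on the demo, wrapped in a reusable function. This is a sketch under assumptions made only for this example: the helper name extract_literals and its (prefix, kind, body) return shape are hypothetical, the pattern is compiled with re.S so that multi-line raw strings match (as the explanation suggests), and each group is compared with None so that empty literals such as "" are still reported.

```python
import re

# Same pattern as the answer, split across lines for readability and compiled
# with re.S so that a raw string's body may contain newlines.
CPP_LITERAL = re.compile(
    r'(?P<prefix>(?:\bu8|\b[LuU])?)'
    r'(?:"(?P<dbl>[^"\\]*(?:\\.[^"\\]*)*)"'
    r'|\'(?P<sngl>[^\'\\]*(?:\\.[^\'\\]*)*)\')'
    r'|R"([^"(]*)\((?P<raw>.*?)\)\4"',
    re.S,
)


def extract_literals(source):
    """Hypothetical helper: return (prefix, kind, body) for each literal in source.

    kind is the name of the group that matched: 'dbl', 'sngl' or 'raw'.
    """
    found = []
    for m in CPP_LITERAL.finditer(source):
        for kind in ("dbl", "sngl", "raw"):
            body = m.group(kind)
            # Compare with None: an empty literal such as "" gives '', which is falsy.
            if body is not None:
                # The prefix group is not part of the raw branch, so it is None there.
                found.append((m.group("prefix") or "", kind, body))
                break
    return found


source = (
    'auto a = u8"hi\\n";\n'
    "char c = L'x';\n"
    'auto r = R"xy(line1\nline2)xy";\n'
    'auto e = "";\n'
)
for literal in extract_literals(source):
    print(literal)
```

For this sample it yields four tuples, among them ('', 'raw', 'line1\nline2') for the multi-line raw string and ('', 'dbl', '') for the empty literal. Raw literals always report an empty prefix, because the pattern does not capture a prefix in front of R.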