
Python Re.split And Attaching Matched Group To Either Right Or Left Side Of The Split

From this example:

>>> import re
>>> re.split(r'(\W)', 'foo/bar spam\neggs')
['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']

is there a straightforward way to associate the capture group with either the left or the right side of the split?

Solution 1:

No, it is not possible with a capturing split alone. I'm not aware of any regex engine that supports this sort of thing. Splitting means splitting: you can keep the separator or discard it, but you can't lump it in with the pieces between the splits, because the separator is distinct from the things it separates.
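Although a capturing split can't do the attaching by itself, the alternating list it returns (piece, separator, piece, ...) can be post-processed. The following is a minimal sketch of that post-processing approach, which is a swap-in and not part of this answer; `split_attached` is a hypothetical helper name, and it assumes the pattern contains exactly one capturing group:

```python
import re

def split_attached(pattern, text, side="right"):
    """Split `text` on `pattern` and glue each captured separator onto the
    piece to its right (side="right") or to its left (side="left").

    Hypothetical helper: assumes `pattern` has exactly one capturing group,
    so re.split returns [piece, sep, piece, sep, ..., piece].
    """
    parts = re.split(pattern, text)
    pieces = parts[0::2]  # text between separators
    seps = parts[1::2]    # captured separators (one fewer than pieces)
    if side == "right":
        # Prepend each separator to the piece that follows it.
        return [pieces[0]] + [s + p for s, p in zip(seps, pieces[1:])]
    if side == "left":
        # Append each separator to the piece that precedes it.
        return [p + s for p, s in zip(pieces, seps)] + [pieces[-1]]
    raise ValueError("side must be 'left' or 'right'")

print(split_attached(r"(\W)", "foo/bar spam\neggs", side="right"))
print(split_attached(r"(\W)", "foo/bar spam\neggs", side="left"))
```

This keeps the original capturing regex unchanged; the price is an extra pass over the result list.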

With the third-party regex module you can do it fairly simply, but it does require changing the original regex:

>>> import regex
>>> regex.split(r'(?=\W)', 'foo/bar spam\neggs', flags=regex.V1)
['foo', '/bar', ' spam', '\neggs']

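Unlike the builtin re module in Python versions before 3.7, the regex module allows splitting on zero-width matches, so you can use a lookahead to split at positions where the next character matches \W. (Since Python 3.7, re.split also accepts patterns that can match an empty string.)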
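Assuming Python 3.7 or later, the same lookahead works with the builtin re module, and swapping it for a lookbehind attaches each separator to the left piece instead. A short sketch of both directions:

```python
import re

text = "foo/bar spam\neggs"

# Lookahead: split *before* each \W character, so it starts the next piece.
right = re.split(r"(?=\W)", text)
# Lookbehind: split *after* each \W character, so it ends the previous piece.
left = re.split(r"(?<=\W)", text)

print(right)  # ['foo', '/bar', ' spam', '\neggs']
print(left)   # ['foo/', 'bar ', 'spam\n', 'eggs']
```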

In the example you added in your edit, you can use a lookahead even with plain re, because the separator itself (the comma) is not zero-width:

>>> import re
>>> [re.split(r',(?=\S)', x) for x in csv_data]
[['Some good data', 'Id 5'],
 ['Some bad data, like, really bad, dude', 'Id 6']]

Solution 2:

If that is the case, you could use a regex based on a negative lookahead, like the one below.

>>> import re
>>> csv_data = [
...     'Some good data,Id 5',
...     'Some bad data, like, really bad, dude,Id 6'
... ]
>>> [re.split(r',(?!\s)', i) for i in csv_data]
[['Some good data', 'Id 5'], ['Some bad data, like, really bad, dude', 'Id 6']]

,(?!\s) matches every comma that is not followed by a whitespace character. Splitting on those commas gives the desired output.
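The negative lookahead ,(?!\s) and the positive lookahead ,(?=\S) from Solution 1 give the same result on this data. They differ only when a comma is the last character: a negative lookahead succeeds at the end of the string, while a positive one needs an actual non-whitespace character to match. A small check, using the sample data above plus a made-up string 'a,b,':

```python
import re

csv_data = [
    "Some good data,Id 5",
    "Some bad data, like, really bad, dude,Id 6",
]

# Positive lookahead: comma followed by a non-whitespace character.
positive = [re.split(r",(?=\S)", row) for row in csv_data]
# Negative lookahead: comma NOT followed by whitespace (also true at end of string).
negative = [re.split(r",(?!\s)", row) for row in csv_data]
print(positive == negative)  # True on this data

# A trailing comma is where they diverge.
print(re.split(r",(?=\S)", "a,b,"))  # ['a', 'b,']
print(re.split(r",(?!\s)", "a,b,"))  # ['a', 'b', '']
```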
