Regular Expression: Matching And Grouping A Variable Number Of Space Separated Words
Solution 1:
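re.match only matches at the start of the string. Use re.search instead, which scans the whole string for the first match.
.*? captures the shortest possible text between two words/expressions (. matches any character except a newline, * means 0 or more occurrences, and the trailing ? makes the quantifier non-greedy, i.e. shortest match).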
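import re
my_str = "foo hello world baz 33"
# Lazily capture everything between "foo" and the first following "baz"
my_pattern = r'foo\s(.*?)\sbaz'
p = re.search(my_pattern, my_str, re.I)  # re.I makes the match case-insensitive
result = p.group(1).split()
print(result)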
['hello', 'world']
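To illustrate the difference between re.match and re.search, here is a minimal sketch. The input string with a leading "say" is made up for this example and is not from the question:

```python
import re

pattern = r'foo\s(.*?)\sbaz'
# Hypothetical input in which "foo" is not at the start of the string
text = "say foo hello world baz 33"

# re.match only tries to match at position 0, so it finds nothing here
print(re.match(pattern, text))            # None

# re.search scans forward until the pattern matches somewhere
print(re.search(pattern, text).group(1))  # hello world
```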
EDIT:
If foo or baz is missing, re.search returns None. If you need to fall back to the entire string in that case, use an if-else:
if p is not None:
    result = p.group(1).split()
else:
    result = my_str  
Why the ? in the pattern:
Suppose there are multiple occurrences of the word baz:
my_str =  "foo hello world baz 33 there is another baz"  
Using pattern = r'foo\s(.*)\sbaz' gives the longest (greedy) match:
'hello world baz 33 there is another'
whereas using pattern = r'foo\s(.*?)\sbaz' returns the shortest match:
'hello world'
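The two patterns can be compared side by side. This is a small sketch using the string from above; the variable names greedy and lazy are only for illustration:

```python
import re

my_str = "foo hello world baz 33 there is another baz"

# Greedy: .* grabs as much as possible, so the match runs up to the last "baz"
greedy = re.search(r'foo\s(.*)\sbaz', my_str).group(1)

# Lazy: .*? grabs as little as possible, so the match stops at the first "baz"
lazy = re.search(r'foo\s(.*?)\sbaz', my_str).group(1)

print(greedy)  # hello world baz 33 there is another
print(lazy)    # hello world
```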
Solution 2:
[This is not a solution; I only try to explain why this is not possible]
What you're after is something like this:
foo\s(\w+\s)+baz\s(\d+)
The cool part would be (\w+\s)+, which repeats the capturing group.
The problem is that most regex flavors store only the last match of a repeated capturing group; earlier captures are overwritten.
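Python's re module behaves this way, which a short check makes visible (the sample string is borrowed from Solution 1):

```python
import re

m = re.search(r'foo\s(\w+\s)+baz\s(\d+)', "foo hello world baz 33")

# Group 1 was matched twice ("hello " and "world "), but only the
# last repetition is kept
print(m.groups())  # ('world ', '33')
```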
I recommend looping over the string with a simpler regex instead.
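One way this could look is a two-step approach: capture the whole run of words as a single group, then pull the individual words out with a second, simpler regex. The helper name words_between and the exact pattern are assumptions made for this sketch, not something from the question:

```python
import re

def words_between(text):
    """Hypothetical helper: return (words, number), or None if there is no match."""
    # Step 1: capture the whole run of words between "foo" and "baz" in ONE
    # group; the inner (?:...) group is non-capturing and only does the repeating
    m = re.search(r'foo\s+((?:\w+\s+)+?)baz\s+(\d+)', text)
    if m is None:
        return None
    # Step 2: extract each word from the captured run
    return re.findall(r'\w+', m.group(1)), int(m.group(2))

print(words_between("foo hello world baz 33"))  # (['hello', 'world'], 33)
```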
Hope it helps
Solution 3:
Use index to find foo and baz, then split the substring between them:
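def find_between(s, first, last):
    try:
        # Position just past the opening marker
        start = s.index(first) + len(first)
        # Look for the closing marker only after the opening one
        end = s.index(last, start)
        return s[start:end].split()
    except ValueError:
        # str.index raises ValueError when a marker is missing;
        # return an empty list so the result is always a list
        return []
s = "foo hello world baz 33"
start = "foo"
end = "baz"
print(find_between(s, start, end))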