Regular Expression: Matching And Grouping A Variable Number Of Space Separated Words

I have a string: 'foo hello world baz 33'. The part between foo and baz will be some number of space-separated words (one or more). I want to match this string with an re that will

Solution 1:

re.match only matches at the start of the string. Use re.search instead, which finds the pattern anywhere in the string.
.*? matches the shortest possible text between two words/expressions (. means any character except a newline, * means 0 or more occurrences, and ? makes the match as short as possible).

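import re
my_str = "foo hello world baz 33"
# Capture everything between "foo" and the nearest "baz" (non-greedy)
my_pattern = r'foo\s(.*?)\sbaz'
p = re.search(my_pattern, my_str, re.I)  # re.I: case-insensitive
result = p.group(1).split()  # split the captured text on whitespace
print(result)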

['hello', 'world']

EDIT:

If foo or baz is missing and you need to return the entire string, use an if-else:

if p is not None:
    result = p.group(1).split()
else:
    result = my_str  

Why the ? in the pattern:
Suppose there are multiple occurrences of the word baz:

my_str =  "foo hello world baz 33 there is another baz"  

Using pattern = r'foo\s(.*)\sbaz' will produce the longest (greedy) match:

'hello world baz 33 there is another'

whereas using pattern = r'foo\s(.*?)\sbaz' will return the shortest match:

'hello world'
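
To see the difference side by side, here is a small sketch that runs both patterns against the string with two occurrences of baz (the variable names greedy and lazy are just for illustration):

```python
import re

my_str = "foo hello world baz 33 there is another baz"

# Greedy: .* grabs as much as possible, so the match runs to the LAST " baz"
greedy = re.search(r'foo\s(.*)\sbaz', my_str)
# Non-greedy: .*? grabs as little as possible, so the match stops at the FIRST " baz"
lazy = re.search(r'foo\s(.*?)\sbaz', my_str)

print(greedy.group(1))  # hello world baz 33 there is another
print(lazy.group(1))    # hello world
```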

Solution 2:

[This is not a solution, but I will try to explain why it is not possible]

What you're after is something like this:

foo\s(\w+\s)+baz\s(\d+)

The cool part would be (\w+\s)+, which repeats the capturing group. The problem is that most regex flavors store only the last match in that capturing group; earlier captures are overwritten.
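
A quick check with Python's re module shows the overwriting in practice. This is only a sketch using the pattern above on the question's example string:

```python
import re

m = re.match(r'foo\s(\w+\s)+baz\s(\d+)', "foo hello world baz 33")

# The group matched "hello " first and then "world ";
# only the last repetition is kept.
print(repr(m.group(1)))  # 'world '
print(m.group(2))        # 33
```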

I recommend looping over the string with a simpler regex.
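
One possible way to do that, as a rough sketch: capture the whole run of words in a single outer group, then pull the individual words out in a second pass. The helper name words_between is made up for this example, and it assumes the words consist of \w characters only:

```python
import re

def words_between(s):
    # Step 1: one outer group around the repeated (non-capturing) word group,
    # so the whole run of words is captured instead of only the last one.
    m = re.search(r'foo\s((?:\w+\s)+)baz\s(\d+)', s)
    if m is None:
        return None
    # Step 2: a simpler regex extracts each word from the captured run.
    words = re.findall(r'\w+', m.group(1))
    return words, int(m.group(2))

print(words_between("foo hello world baz 33"))  # (['hello', 'world'], 33)
```

Returning None when nothing matches is a choice made for this sketch; returning the original string or an empty list would work just as well, depending on what the caller expects.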

Hope it helps


Solution 3:

Use index to find foo and baz, then split the substring between them.

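def find_between(s, first, last):
    try:
        # Start just past the first marker, then look for the last marker after it
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end].split()
    except ValueError:
        # A marker is missing: return an empty list to match the normal return type
        return []

s = "foo hello world baz 33"
start = "foo"
end = "baz"
print(find_between(s, start, end))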
