Regex: Why Do Empty Strings Get Included (in A List Of Tuples) In Re.findall()?
Solution 1:
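Whenever you use a capturing group, re.findall returns its submatch, even when that group did not participate in the match and is therefore empty. Your pattern has 3 capturing groups, so every item in the findall result is a 3-tuple, and the groups that belong to the alternative that did not match show up as empty strings.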
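To see where the empty strings come from, here is a minimal sketch that uses the three-group pattern quoted in Solution 2 on a short, made-up input (the sample text is an assumption, not the asker's data). Each match comes back as a 3-tuple, and the groups of the alternative that did not match are filled with ''.

```python
import re

# The original three-group pattern: groups 1 and 2 belong to the first
# alternative, group 3 to the second (IPv4) alternative.
pattern = re.compile(
    r'(([a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4})|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
)

# Hypothetical sample text: one colon-separated address, one IPv4 address.
text = "fe80::1:2:3 and 10.0.0.1"

result = pattern.findall(text)
for item in result:
    print(item)
# The IPv4 match prints as ('', '', '10.0.0.1'): groups 1 and 2 did not
# participate, so findall reports them as empty strings.
```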
On regex101.com, you can see these non-participating groups by turning them on in Options.
You can tighten up your regex by dropping the capturing groups (the outer ones entirely, the repeated inner one turned into a non-capturing (?:...) group):
(?:[a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
Or, even shorter:
(?:[a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4}|\d{1,3}(?:\.\d{1,3}){3}
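A quick way to check that the two tightened patterns behave the same is to run both over a few inputs. The sample strings below are made up for illustration; since neither pattern has capturing groups, findall returns plain strings.

```python
import re

# The two tightened patterns from above; neither has capturing groups.
long_form = re.compile(
    r'(?:[a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
)
short_form = re.compile(
    r'(?:[a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4}|\d{1,3}(?:\.\d{1,3}){3}'
)

# Hypothetical sample inputs, chosen only for illustration.
samples = [
    "fe80::1:2:3 and 10.0.0.1",
    "version=TLSv1.2",
    "192.168.0.254 via 127.0.0.1",
]

results = {}
for s in samples:
    # Both variants should return the same list of whole-match strings.
    assert long_form.findall(s) == short_form.findall(s)
    results[s] = short_form.findall(s)
    print(results[s])
```

The (?:\.\d{1,3}){3} form just folds the repeated "dot plus octet" into a non-capturing group, so it adds nothing to the findall result.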
And since the regex pattern does not contain capturing groups, re.findall will only return the whole matches, not capturing group contents:
import re
p = re.compile(r'(?:[a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
test_str = "from mail.example.com (example.com. [213.239.250.131]) by\n mx.google.com with ESMTPS id xc4si15480310lbb.82.2014.10.26.06.16.58 for\n <alex@example.com> (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256\n bits=128/128); Sun, 26 Oct 2014 06:16:58 -0700 (PDT)"
print(re.findall(p, test_str))
Output of the online Python demo:
['213.239.250.131', '014.10.26.06']
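More generally, the shape of what re.findall returns depends only on how many capturing groups the pattern has: no groups gives the whole matches, one group gives that group's text, and two or more give tuples with one string per group. A small sketch on a hypothetical input string:

```python
import re

text = "10.0.0.1 and 192.168.0.254"  # hypothetical sample input

# No capturing groups: each item is the whole match.
no_groups = re.findall(r'\d{1,3}(?:\.\d{1,3}){3}', text)
print(no_groups)   # ['10.0.0.1', '192.168.0.254']

# One capturing group: each item is that group's text only.
one_group = re.findall(r'(\d{1,3})(?:\.\d{1,3}){3}', text)
print(one_group)   # ['10', '192']

# Two capturing groups: each item is a tuple with one string per group.
two_groups = re.findall(r'(\d{1,3})\.\d{1,3}\.\d{1,3}\.(\d{1,3})', text)
print(two_groups)  # [('10', '1'), ('192', '254')]
```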
Solution 2:
These empty strings come from the capturing groups. When you use alternation (|), the groups belonging to the alternative that did not match are returned as empty strings.
(([a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4})|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
The first alternative contains 2 groups:
(([a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4})
and the alternative after the | contains the third:
(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
Put simply, each pair of round brackets defines a capturing group, and that group shows up in the result whether or not it matched.
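If you want to keep the capturing groups (for example, to tell which alternative matched), one option is re.finditer, whose match objects still expose the full match as group(0). Note that Match.group returns None, not '', for a group that did not participate. The sample text and the "colon-separated"/"IPv4" labels below are illustrative assumptions.

```python
import re

# Original three-group pattern, kept unchanged.
pattern = re.compile(
    r'(([a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4})|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
)
text = "fe80::1:2:3 and 10.0.0.1"  # hypothetical sample input

found = []
for m in pattern.finditer(text):
    # group(0) is always the whole match, however many groups the pattern has.
    # group(1) is None (not '') when the first alternative did not participate.
    kind = "colon-separated" if m.group(1) is not None else "IPv4"
    found.append((kind, m.group(0)))

print(found)
# [('colon-separated', 'fe80::1:2:3'), ('IPv4', '10.0.0.1')]
```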