Python Regex - Find Numbers With Comma In String
Solution 1:
You can use
>>> re.sub(r"(\d),(\d)", r"\1.\2", "Foo Bar, FooTown, $100,00")
'Foo Bar, FooTown, $100.00'
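If the goal is to split the string afterwards, the substitution can be followed by a plain str.split. The following is a minimal sketch rather than a definitive implementation: the helper name split_fields is hypothetical, and it assumes that a comma sitting between two digits is always a decimal separator.

```python
import re


def split_fields(text):
    """Rewrite digit,digit as digit.digit, then split on the remaining commas."""
    # Assumption: a comma between two digits is a decimal separator.
    normalized = re.sub(r"(\d),(\d)", r"\1.\2", text)
    # Strip the whitespace that follows each separating comma.
    return [field.strip() for field in normalized.split(",")]


print(split_fields("Foo Bar, FooTown, $100,00"))
```

Note that this changes the data itself ($100,00 becomes $100.00), which may or may not be what you want.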
Solution 2:
You can also use negative lookaheads and lookbehinds, an often-forgotten feature of Python's very powerful regular expression machinery.
You can write a regular expression that splits on any comma that is either not followed by a digit or not preceded by a digit.
#!/usr/bin/env python
import re
samples=[
"Foo Bar, FooTown, $100,00",
"$100,00, Foo Bar, FooTown",
"Foo Bar, $100,00, FooTown",
"$100,00, Foo Bar, FooTown,",
]
# Raw string, so that \d reaches the regex engine unchanged.
myRegex = re.compile(r",(?!\d)|(?<!\d),")
for sample in samples:
    parts = myRegex.split(sample)
    print("%s sample split: %s (%s items)" % (sample, parts, len(parts)))
Outputs:
Foo Bar, FooTown, $100,00 sample split: ['Foo Bar', ' FooTown', ' $100,00'] (3 items)
$100,00, Foo Bar, FooTown sample split: ['$100,00', ' Foo Bar', ' FooTown'] (3 items)
Foo Bar, $100,00, FooTown sample split: ['Foo Bar', ' $100,00', ' FooTown'] (3 items)
$100,00, Foo Bar, FooTown, sample split: ['$100,00', ' Foo Bar', ' FooTown', ''] (4 items)
I feel sorry for the people who developed Python's re module; I have very rarely seen this kind of lookaround actually used.
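As the outputs show, the fields keep their leading spaces and a trailing comma produces an empty last item. One possible way to tidy that up is sketched below; the names SEPARATOR and clean_split are hypothetical, and dropping empty fields is an assumption about what the caller wants.

```python
import re

# Same idea as above: a comma separates fields unless it has a digit on both sides.
SEPARATOR = re.compile(r",(?!\d)|(?<!\d),")


def clean_split(text):
    """Split on separating commas, strip whitespace, and drop empty fields."""
    return [part.strip() for part in SEPARATOR.split(text) if part.strip()]


print(clean_split("$100,00, Foo Bar, FooTown,"))
```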
Solution 3:
A regex replace of the pattern (\d),(\d) with \1.\2 will work. The \d matches any digit, and the parentheses around it capture that digit so it can be reused: in the replacement, \1 refers to the first captured digit and \2 to the second.
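One caveat with the capturing pattern (\d),(\d): each match consumes the digits on both sides of the comma, so a digit cannot serve as the right-hand side of one match and the left-hand side of the next. The sketch below contrasts it with a lookaround-based variant, which is a swapped-in alternative and not part of the answer above; the sample string "1,2,3" is made up for illustration.

```python
import re

text = "1,2,3"

# Capturing groups consume the digits: after "1,2" is matched,
# the "2" is no longer available to match ",3".
with_groups = re.sub(r"(\d),(\d)", r"\1.\2", text)

# Lookarounds only assert that digits are present without consuming them,
# so every comma between two digits is replaced.
with_lookarounds = re.sub(r"(?<=\d),(?=\d)", ".", text)

print(with_groups)       # 1.2,3
print(with_lookarounds)  # 1.2.3
```

For a string such as "$100,00", where no digit is shared between two commas, both patterns give the same result.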
Solution 4:
Rather than fixing your data, why not fix your split?
>>> import re
>>> s = "Foo Bar, FooTown, $100,00"
>>> re.split(r'(?<!\d),|,(?!\d)', s)
['Foo Bar', ' FooTown', ' $100,00']
This uses negative lookahead and lookbehind assertions so that a comma is treated as a separator unless it has a digit on both sides.
edit: Changed the regular expression from r'(?<!\d),(?!\d)' to r'(?<!\d),|,(?!\d)' to properly handle strings like "$100,00, Foo Bar, FooTown". Thanks to BorrajaX for pointing out my error in the comments.
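To see why the edit matters, here is a small comparison of the two patterns on the problematic string. The variable names are illustrative only.

```python
import re

s = "$100,00, Foo Bar, FooTown"

# First attempt: the comma must have no digit on EITHER side, so the comma
# right after "00" (which is preceded by a digit) is not treated as a separator.
first_attempt = re.split(r"(?<!\d),(?!\d)", s)

# Corrected: it is enough that one side is not a digit.
corrected = re.split(r"(?<!\d),|,(?!\d)", s)

print(first_attempt)  # ['$100,00, Foo Bar', ' FooTown']
print(corrected)      # ['$100,00', ' Foo Bar', ' FooTown']
```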