Extracting Year From String In Python
Solution 1:
There are all sorts of ways to do it; here are several options:
dateutil parser in "fuzzy" mode:
In [1]: s = 'years since 1250-01-01 0:0:0'
In [2]: from dateutil.parser import parse
In [3]: parse(s, fuzzy=True).year  # the resulting year is an integer
Out[3]: 1250
regular expressions with a capturing group:
In [2]: import re
In [3]: re.search(r"years since (\d{4})", s).group(1)
Out[3]: '1250'
splitting by "since" and then by a dash:
In [2]: s.split("since", 1)[1].split("-", 1)[0].strip()
Out[2]: '1250'
or maybe even splitting on the first dash and taking the last four characters of the first substring:
In [2]: s.split("-", 1)[0][-4:]
Out[2]: '1250'
The last two involve more "moving parts" and might not be applicable depending on possible variations of the input string.
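To make the "moving parts" point concrete, here is a small sketch that wraps the regex and the two splitting approaches in functions and runs them on the original string and on two hypothetical variants (an extra hyphenated word, and a string without "since"). The function names are made up for illustration; the regex version also returns None instead of raising when nothing matches.

```python
import re


def year_by_regex(s):
    """Return the year after 'years since' as an int, or None if there is no match."""
    m = re.search(r"years since (\d{4})", s)
    return int(m.group(1)) if m else None


def year_by_split(s):
    """Split on 'since', then on the first dash; raises IndexError if 'since' is missing."""
    return s.split("since", 1)[1].split("-", 1)[0].strip()


def year_by_dash_slice(s):
    """Take the last four characters before the first dash in the whole string."""
    return s.split("-", 1)[0][-4:]


original = "years since 1250-01-01 0:0:0"
# Hypothetical variant: a hyphen appears before the date.
variant = "calendar-years since 1250-01-01 0:0:0"

print(year_by_regex(original), year_by_split(original), year_by_dash_slice(original))
# 1250 1250 1250
print(year_by_regex(variant), year_by_split(variant), year_by_dash_slice(variant))
# 1250 1250 ndar

# Hypothetical variant without the word "since".
no_since = "years after 1250-01-01"
print(year_by_regex(no_since))  # None
try:
    year_by_split(no_since)
except IndexError:
    print("split approach failed: no 'since' in the string")
```

The dash-slicing version silently returns garbage ("ndar") when an earlier hyphen appears, which is the kind of input variation the remark above warns about.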
Solution 2:
You can use a regex with a capture group around the four digits, while also making sure a particular pattern surrounds them. I would probably look for the following sequence:
four digits, in a capture group: (\d{4})
a hyphen: -
two digits: \d{2}
a hyphen: -
two digits: \d{2}
Giving: (\d{4})-\d{2}-\d{2}
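One way to keep the component-by-component breakdown visible in code is Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments. The sketch below compiles the same pattern both ways; the month/day labels are an assumption based on the YYYY-MM-DD shape of the input.

```python
import re

# re.VERBOSE ignores whitespace and allows "#" comments inside the pattern,
# so each component of (\d{4})-\d{2}-\d{2} can be annotated.
DATE_YEAR = re.compile(
    r"""
    (\d{4})   # four digits, captured as group 1 (the year)
    -         # hyphen
    \d{2}     # two digits (presumably the month)
    -         # hyphen
    \d{2}     # two digits (presumably the day)
    """,
    re.VERBOSE,
)

# The same pattern written compactly.
COMPACT = re.compile(r"(\d{4})-\d{2}-\d{2}")

s = "years since 1250-01-01 0:0:0"
print(DATE_YEAR.search(s).group(1))  # 1250
print(COMPACT.search(s).group(1))    # 1250
```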
Demo:
>>> import re
>>> d = re.findall(r'(\d{4})-\d{2}-\d{2}', 'years since 1250-01-01 0:0:0')
>>> d
['1250']
>>> d[0]
'1250'
If you need it as an int, just convert it:
>>> int(d[0])
1250
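Because re.findall returns one captured group per match, the same pattern extends naturally to strings that contain several dates, and it returns an empty list rather than raising when there is no date. A short sketch, using a made-up input string:

```python
import re

PATTERN = r"(\d{4})-\d{2}-\d{2}"

# Hypothetical input with two dates; findall yields one captured year per match.
text = "valid from 1250-01-01 until 1300-12-31"
years = [int(y) for y in re.findall(PATTERN, text)]
print(years)  # [1250, 1300]

# With no match, findall returns an empty list instead of raising.
print(re.findall(PATTERN, "no date here"))  # []
```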
Solution 3:
The following regex should make the four digit year available as the first capture group:
^.*(\d{4})-\d{2}-\d{2}.*$
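A sketch of applying this anchored pattern with re.match, writing the capture group as (\d{4}). One behaviour worth knowing: the leading .* is greedy, so if a string contains several dates, it backtracks only as far as the last one, and the captured year is the last year. The two-date string below is hypothetical; a non-greedy .*? picks the first date instead.

```python
import re

# ^.* and .*$ make the regex span the whole string; re.match anchors at the start.
PATTERN = re.compile(r"^.*(\d{4})-\d{2}-\d{2}.*$")

m = PATTERN.match("years since 1250-01-01 0:0:0")
print(m.group(1))  # 1250

# Greedy .* backtracks from the end, so the LAST date's year is captured.
m = PATTERN.match("from 1250-01-01 to 1300-12-31")
print(m.group(1))  # 1300

# Non-greedy .*? stops as early as possible, so the FIRST date's year is captured.
m = re.match(r"^.*?(\d{4})-\d{2}-\d{2}.*$", "from 1250-01-01 to 1300-12-31")
print(m.group(1))  # 1250
```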