Guessing Date Format For Many Identically-formatted Dates In Python

I have a large set of datetime strings and it can be safely assumed that they're all identically formatted. For example, I might have the set of dates '7/1/13 0:45', '5/2/13 6:21',

Solution 1:

Check out https://github.com/jeffreystarr/dateinfer

It seems somewhat abandoned, but it may suit your needs.

Solution 2:

Have you tried using dateutil.parser.parse on the tokenized time strings from the set?

It's often very robust to a wide range of formats, and when it does fail, the errors usually make it obvious how to massage your data into a format that it can handle.

In [10]: import dateutil.parser

In [11]: dateutil.parser.parse("7/1/13 0:45")
Out[11]: datetime.datetime(2013, 7, 1, 0, 45)

Do watch out for ambiguities in the data. For example, if your timestamps are not on a 24-hour clock and carry no am/pm marker, "3:00 pm" and "3:00 am" on the same date would be recorded identically. Unless you have some way of assigning am/pm to the data, no parser can get you out of that issue.
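To make the am/pm point concrete, here is a small sketch. It uses the standard library's datetime.strptime rather than dateutil, and the sample strings are made up for illustration (they are not from the question's data). It also assumes an English/C locale, since %p is locale-dependent.

```python
from datetime import datetime

# Hypothetical sample strings, chosen only to illustrate the ambiguity.
no_marker = datetime.strptime("7/1/13 3:00", "%m/%d/%y %H:%M")
with_pm = datetime.strptime("7/1/13 3:00 PM", "%m/%d/%y %I:%M %p")
with_am = datetime.strptime("7/1/13 3:00 AM", "%m/%d/%y %I:%M %p")

print(no_marker.hour)  # 3  (indistinguishable from 3 AM)
print(with_pm.hour)    # 15
print(with_am.hour)    # 3
```

Without the marker, the string can only ever be read as the morning hour, so the afternoon information is already lost before any parser sees it.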

If your date strings are stored in an iterable then you can use map to apply the parse function to all of the strings:

In [12]: the_dates = ["7/1/13 0:45", "12/2/14 1:38", "4/30/13 12:12"]

In [13]: list(map(dateutil.parser.parse, the_dates))  # list() is needed on Python 3, where map is lazy
Out[13]: 
[datetime.datetime(2013, 7, 1, 0, 45),
 datetime.datetime(2014, 12, 2, 1, 38),
 datetime.datetime(2013, 4, 30, 12, 12)]

If you need some of the extra arguments to dateutil.parser.parse that indicate the formatting to use, you can use functools.partial to bind those keyword arguments first, and then use map as above to apply the partial function.

For example, suppose you wanted to be extra careful that DAY is treated as the first number. You could always call parse with the extra argument dayfirst=True, or you could pre-bind this argument and treat it like a new function that always had this property.

In [42]: import functools

In [43]: new_parse = functools.partial(dateutil.parser.parse, dayfirst=True)

In [44]: list(map(new_parse, the_dates))  # list() is needed on Python 3, where map is lazy
Out[44]: 
[datetime.datetime(2013, 1, 7, 0, 45),
 datetime.datetime(2014, 2, 12, 1, 38),
 datetime.datetime(2013, 4, 30, 12, 12)]

In [45]: new_parse.keywords 
Out[45]: {'dayfirst': True}

In [46]: new_parse.func 
Out[46]: <function dateutil.parser.parse>

(Note that in this example, the third date cannot be parsed day-first, since neither 30 nor 13 can be a month, so the parser falls back to its default month-first reading for that string.)
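Since all the strings are assumed to share one format, that third date can settle the question for the whole set. The following is a minimal sketch of that idea using the standard library's datetime.strptime instead of dateutil: it tries a short list of candidate formats and keeps only those that parse every string. The candidate list and the helper name consistent_formats are assumptions for illustration, not part of any library.

```python
from datetime import datetime

# Assumed candidate list; extend it with whatever formats you expect to see.
CANDIDATE_FORMATS = [
    "%m/%d/%y %H:%M",  # month first
    "%d/%m/%y %H:%M",  # day first
    "%Y-%m-%d %H:%M",  # ISO-like
]

def consistent_formats(date_strings, candidates=CANDIDATE_FORMATS):
    """Return the candidate formats that parse every string in the set."""
    matches = []
    for fmt in candidates:
        try:
            for s in date_strings:
                datetime.strptime(s, fmt)
        except ValueError:
            continue  # at least one string does not fit this format
        matches.append(fmt)
    return matches

the_dates = ["7/1/13 0:45", "12/2/14 1:38", "4/30/13 12:12"]
formats = consistent_formats(the_dates)
print(formats)  # ['%m/%d/%y %H:%M']

# Parse the whole set with the single surviving format.
parsed = [datetime.strptime(s, formats[0]) for s in the_dates]
```

If more than one format survives (for instance, with only the first two dates both the month-first and day-first readings fit), the data alone cannot decide and you need outside knowledge about the source.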
