Working With Mixed Datetime Formats In Pandas
I read a file into a pandas DataFrame with dates that vary in format: either the American YYYY-MM-DD or the European DD.MM.YYYY. They come as strings. I would like to for
Solution 1:
Use to_datetime with each format separately and errors='coerce', so values that do not match the given format become missing values (NaT). Then combine the two results with Series.fillna:
import pandas as pd
df = pd.DataFrame({'date': ['2000-01-12', '2015-01-23', '20.12.2015', '31.12.2009']})
print (df)
date
0 2000-01-12
1 2015-01-23
2 20.12.2015
3 31.12.2009
date1 = pd.to_datetime(df['date'], errors='coerce', format='%Y-%m-%d')
date2 = pd.to_datetime(df['date'], errors='coerce', format='%d.%m.%Y')
df['date'] = date1.fillna(date2)
print (df)
date
0 2000-01-12
1 2015-01-23
2 2015-12-20
3 2009-12-31
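The same idea extends to more than two formats. The following is a minimal sketch, not part of the original answer: the helper name parse_mixed_dates and the sample values are assumptions made for illustration. It tries each format in turn and fills only the values that are still missing.

```python
import pandas as pd

def parse_mixed_dates(values, formats):
    """Hypothetical helper: try each format in order, keep the first match."""
    # Start with the first format; non-matching strings become NaT.
    result = pd.to_datetime(values, errors="coerce", format=formats[0])
    for fmt in formats[1:]:
        # Fill only the gaps left by the earlier formats.
        result = result.fillna(pd.to_datetime(values, errors="coerce", format=fmt))
    return result

raw = pd.Series(["2000-01-12", "20.12.2015", "not a date"])
parsed = parse_mixed_dates(raw, ["%Y-%m-%d", "%d.%m.%Y"])
print(parsed)

# Values that matched none of the formats stay NaT, so they can be counted.
print(parsed.isna().sum())
```

Because errors='coerce' silently turns bad input into NaT, counting the remaining missing values afterwards is a cheap way to notice strings that matched no format.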
The question also asks to "ideally have them in the second format".
Datetimes in Python/pandas are displayed as YYYY-MM-DD by default. A custom format is possible, but the values are then converted to strings, so datetime-like functions no longer work on them:
df['date'] = df['date'].dt.strftime('%d.%m.%Y')
print (df)
date
0 12.01.2000
1 23.01.2015
2 20.12.2015
3 31.12.2009
print (type(df.loc[0, 'date']))
<class 'str'>
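One way to get both is to keep the datetime column and store the formatted strings in a separate column. This is a sketch under that assumption; the column name date_str is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2000-01-12", "2015-12-20"])})

# Keep the real datetimes and add a display-only string column.
df["date_str"] = df["date"].dt.strftime("%d.%m.%Y")

print(df["date"].dt.year.tolist())   # [2000, 2015]
print(df["date_str"].tolist())       # ['12.01.2000', '20.12.2015']

# The string column has no datetime accessor any more.
try:
    df["date_str"].dt.year
except AttributeError as exc:
    print("strings have no .dt accessor:", exc)
```

Sorting, filtering, and date arithmetic should use the date column, while date_str is only for output.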
Solution 2:
Check which of the two formats each value has and apply pandas.to_datetime with that format.
df = pd.DataFrame({'date': ['2000-01-12', '2015-01-23', '20.12.2015',
'31.12.2009']})
print(df)
date
0 2000-01-12
1 2015-01-23
2 20.12.2015
3 31.12.2009
def date_formator(date):
    if '-' in date:
        return pd.to_datetime(date, format='%Y-%m-%d')
    else:
        return pd.to_datetime(date, format='%d.%m.%Y')
df.date.apply(date_formator)
0 2000-01-12
1 2015-01-23
2 2015-12-20
3 2009-12-31
Name: date, dtype: datetime64[ns]
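Since apply calls to_datetime once per row, the same separator check can also be done column-wise. The following is a sketch of that variant, not the original author's code; the variable names are illustrative. It relies on to_datetime turning missing values into NaT.

```python
import pandas as pd

s = pd.Series(["2000-01-12", "2015-01-23", "20.12.2015", "31.12.2009"])

# True where the string uses '-' as separator (YYYY-MM-DD).
is_dash = s.str.contains("-", regex=False)

# Series.where blanks out the rows of the other format; to_datetime maps
# those missing values to NaT and parses the rest with a strict format.
dash_dates = pd.to_datetime(s.where(is_dash), format="%Y-%m-%d")
dot_dates = pd.to_datetime(s.where(~is_dash), format="%d.%m.%Y")

result = dash_dates.fillna(dot_dates)
print(result)
```

Unlike the errors='coerce' approach in Solution 1, a malformed string raises an error here instead of silently becoming NaT.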