Dataframe Groupby - Return Delta Time For Log Entries
I've got some log data that I'd like to first group by user_id and then pick out, say, the 2nd entry. That's done below. The missing step is computing the age of each entry relative to the first entry in its group.
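For context, a minimal sketch of the "pick the 2nd entry per group" step the question refers to (the data is taken from the sample output below; `.nth(1)` is one way to do it, not necessarily the asker's original code):

```python
import pandas as pd

# Sample frame reconstructed from the Out[22] display further down.
dd = pd.DataFrame({
    'date': ['2013-12-29 17:56:01', '2013-12-29 19:44:09', '2013-12-29 19:58:05',
             '2013-12-29 20:00:09', '2013-12-29 20:13:35', '2013-12-29 20:19:56'],
    'item_id': [0, 4, 6, 8, 9, 1],
    'user_id': [6, 8, 3, 3, 6, 6],
})

# Second row of each user_id group (0-based); groups with a single row drop out.
second_entries = dd.groupby('user_id').nth(1)
```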
Solution 1:
First convert the date from a string column to datetime64[ns] dtype
In [21]: dd['date'] = pd.to_datetime(dd['date'])
In [22]: dd
Out[22]:
date item_id user_id
0 2013-12-29 17:56:01 0 6
1 2013-12-29 19:44:09 4 8
2 2013-12-29 19:58:05 6 3
3 2013-12-29 20:00:09 8 3
4 2013-12-29 20:13:35 9 6
5 2013-12-29 20:19:56 1 6
[6 rows x 3 columns]
sort by the date
In [23]: dd.sort_index(by='date')
Out[23]:
date item_id user_id
0 2013-12-29 17:56:01 0 6
1 2013-12-29 19:44:09 4 8
2 2013-12-29 19:58:05 6 3
3 2013-12-29 20:00:09 8 3
4 2013-12-29 20:13:35 9 6
5 2013-12-29 20:19:56 1 6
[6 rows x 3 columns]
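(Note: `sort_index(by=...)` is the old 0.13-era spelling; on current pandas the equivalent call, not shown in the original answer, would be `dd.sort_values('date')`.)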
define a function to diff on that column (and just return the rest of the group)
In [4]: def f(x):
   ...:     x['diff'] = x['date'] - x['date'].iloc[0]
   ...:     return x
   ...:
In [5]: dd.sort_index(by='date').groupby('user_id').apply(f)
Out[5]:
date item_id user_id diff
0 2013-12-29 17:56:01 0 6 00:00:00
1 2013-12-29 19:44:09 4 8 00:00:00
2 2013-12-29 19:58:05 6 3 00:00:00
3 2013-12-29 20:00:09 8 3 00:02:04
4 2013-12-29 20:13:35 9 6 02:17:34
5 2013-12-29 20:19:56 1 6 02:23:55
[6 rows x 4 columns]
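As an aside (not part of the original answer), the same per-group diff can be computed without `apply` on a recent pandas by broadcasting each group's first timestamp with `transform`:

```python
# Equivalent without apply, assuming a modern pandas (sort_values / transform).
out = dd.sort_values('date').copy()
out['diff'] = out['date'] - out.groupby('user_id')['date'].transform('first')
```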
The diff is now a timedelta64[ns]; see here for how to convert/round to a specific frequency (e.g. days).
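For example (an illustration, not necessarily the method in the linked docs), a timedelta column can be turned into days or seconds via the `.dt` accessor on a current pandas:

```python
# Age of each entry in whole days and in seconds (assumes a pandas version
# with the .dt accessor on Series).
dd['diff_days'] = dd['diff'].dt.days
dd['diff_secs'] = dd['diff'].dt.total_seconds()
```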
This is with pandas 0.13 (releasing in the next day or two). Most of this will work in 0.12 as well.