
Dataframe Groupby - Return Delta Time For Log Entries

I've got some log data that I'd like to first group by user_id and then pick out, say, the 2nd entry. That's done below. The missing step is the age of each entry relative to the fir

Solution 1:

First, convert the date column from strings to the datetime64[ns] dtype:

In [21]: dd['date'] = pd.to_datetime(dd['date'])

In [22]: dd
Out[22]: 
                 date  item_id  user_id
0 2013-12-29 17:56:01        0        6
1 2013-12-29 19:44:09        4        8
2 2013-12-29 19:58:05        6        3
3 2013-12-29 20:00:09        8        3
4 2013-12-29 20:13:35        9        6
5 2013-12-29 20:19:56        1        6

[6 rows x 3 columns]
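For anyone following along without the original data, here is a minimal sketch that rebuilds the six rows shown above and applies the same conversion. The frame construction is an assumption for illustration; only the `pd.to_datetime` call comes from the answer.

```python
import pandas as pd

# Rows copied from the output above; the dates start out as plain strings.
dd = pd.DataFrame({
    "date": [
        "2013-12-29 17:56:01",
        "2013-12-29 19:44:09",
        "2013-12-29 19:58:05",
        "2013-12-29 20:00:09",
        "2013-12-29 20:13:35",
        "2013-12-29 20:19:56",
    ],
    "item_id": [0, 4, 6, 8, 9, 1],
    "user_id": [6, 8, 3, 3, 6, 6],
})

# Parse the strings into a datetime column so they can be subtracted later.
dd["date"] = pd.to_datetime(dd["date"])
print(dd.dtypes)
```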

Sort by the date:

In [23]: dd.sort_index(by='date')
Out[23]: 
                 date  item_id  user_id
0 2013-12-29 17:56:01        0        6
1 2013-12-29 19:44:09        4        8
2 2013-12-29 19:58:05        6        3
3 2013-12-29 20:00:09        8        3
4 2013-12-29 20:13:35        9        6
5 2013-12-29 20:19:56        1        6

[6 rows x 3 columns]

Define a function that diffs that column against the first entry in the group (and just returns the rest of the group unchanged):

In [4]: def f(x):
   ...:     x['diff'] = x['date']-x['date'].iloc[0]
   ...:     return x
   ...: 

In [5]: dd.sort_index(by='date').groupby('user_id').apply(f)
Out[5]: 
                 date  item_id  user_id     diff
0 2013-12-29 17:56:01        0        6 00:00:00
1 2013-12-29 19:44:09        4        8 00:00:00
2 2013-12-29 19:58:05        6        3 00:00:00
3 2013-12-29 20:00:09        8        3 00:02:04
4 2013-12-29 20:13:35        9        6 02:17:34
5 2013-12-29 20:19:56        1        6 02:23:55

[6 rows x 4 columns]
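On recent pandas releases, where `sort_index` no longer accepts `by`, the same result can be sketched with `sort_values` and a groupby `transform`. This is an alternative spelling, not the code from the answer, and the sample frame is rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

# Sample rows copied from the output above (assumed setup for illustration).
dd = pd.DataFrame({
    "date": pd.to_datetime([
        "2013-12-29 17:56:01",
        "2013-12-29 19:44:09",
        "2013-12-29 19:58:05",
        "2013-12-29 20:00:09",
        "2013-12-29 20:13:35",
        "2013-12-29 20:19:56",
    ]),
    "item_id": [0, 4, 6, 8, 9, 1],
    "user_id": [6, 8, 3, 3, 6, 6],
})

# sort_values is the current way to order rows by a column.
dd = dd.sort_values("date")

# transform("first") broadcasts each user's earliest date onto all of
# that user's rows, so the subtraction lines up row by row.
first_seen = dd.groupby("user_id")["date"].transform("first")
dd["diff"] = dd["date"] - first_seen
print(dd)
```

Using `transform` avoids mutating the group inside `apply` and keeps the original index, so the new column can be assigned straight back onto the frame.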

The diff column is now a timedelta64[ns]; see the pandas documentation on time deltas for how to convert or round it to a specific frequency (e.g. days).
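As a rough sketch of what such a conversion can look like on current pandas, the snippet below takes the four distinct diff values from the output above and expresses them as seconds, fractional hours, fractional days, and whole minutes. The variable names are illustrative only.

```python
import pandas as pd

# The distinct diff values from the output above, as a timedelta Series.
diff = pd.Series(pd.to_timedelta(["00:00:00", "00:02:04", "02:17:34", "02:23:55"]))

# Total elapsed time as float seconds.
seconds = diff.dt.total_seconds()

# Dividing by a Timedelta gives a plain float in that unit.
hours = diff / pd.Timedelta(hours=1)
days = diff / pd.Timedelta(days=1)

# floor keeps the timedelta dtype but drops anything below one minute.
whole_minutes = diff.dt.floor("min")

print(seconds.tolist())
print(whole_minutes.tolist())
```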

This is with pandas 0.13 (releasing in the next day or two). Most of this will work in 0.12 as well.
