Filter Dataframe By Multiple Date Ranges
Given a dataframe with observations, how can rows be returned that are within +/-X days of a given list of dates? I came up with the following function, but is there a simpler more
Solution 1:
From your DataFrame:
>>> import pandas as pd
>>> from io import StringIO
>>> df = pd.read_csv(StringIO("""
date,event
2012-01-01 12:30:00,event1
2012-01-01 12:30:12,event2
2012-01-01 12:30:12,event3
2012-01-02 12:28:29,event4
2012-02-01 12:30:29,event4
2012-02-01 12:30:38,event5
2012-03-01 12:31:05,event6
2012-03-01 12:31:38,event7
2012-06-01 12:31:44,event8
2012-07-01 10:31:48,event9
2012-07-01 11:32:23,event10"""))
>>> # The timestamps have no fractional seconds, so the format omits ".%f"
>>> df['date'] = pd.to_datetime(df['date'], format="%Y-%m-%d %H:%M:%S")
>>> df
                  date    event
0  2012-01-01 12:30:00   event1
1  2012-01-01 12:30:12   event2
2  2012-01-01 12:30:12   event3
3  2012-01-02 12:28:29   event4
4  2012-02-01 12:30:29   event4
5  2012-02-01 12:30:38   event5
6  2012-03-01 12:31:05   event6
7  2012-03-01 12:31:38   event7
8  2012-06-01 12:31:44   event8
9  2012-07-01 10:31:48   event9
10 2012-07-01 11:32:23  event10
First, we shift the date column by one row and subtract it from the original date column, which gives the number of whole days between consecutive rows:
>>> g = df['date'].sub(df['date'].shift(1)).dt.days
>>> g
0      NaN
1      0.0
2      0.0
3      0.0
4     30.0
5      0.0
6     29.0
7      0.0
8     92.0
9     29.0
10     0.0
Name: date, dtype: float64
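Note that .dt.days keeps only the whole-day part of each gap, so a gap of 1 day 23 hours counts as 1. If that matters, the raw timedeltas can be compared against a pd.Timedelta instead. The following is a minimal sketch under that assumption; the dates series is made-up sample data (not the DataFrame above), and diff() is used as a shorthand for the shift-and-subtract step:

```python
import pandas as pd

# Hypothetical sample data, chosen so that one gap is 1 day 23 hours.
dates = pd.Series(pd.to_datetime([
    "2012-01-01 12:00:00",
    "2012-01-01 18:00:00",
    "2012-01-03 17:00:00",   # 1 day 23 h after the previous row
    "2012-02-01 12:00:00",
]))

# diff() gives the same result as dates.sub(dates.shift(1)).
gaps = dates.diff()

# .dt.days drops the sub-day part: 1 day 23 h becomes 1.0.
whole_days = gaps.dt.days

X = 1
# Flag rows that start a new group, using whole days (as in the answer)...
starts_new_group_days = whole_days.gt(X)
# ...or using the exact timedelta, which also catches the 1 day 23 h gap.
starts_new_group_exact = gaps.gt(pd.Timedelta(days=X))
```

With whole days, only the last row starts a new group; with the exact comparison, the third row does as well. Which behaviour is wanted depends on how "within X days" is meant to be read.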
Then, we flag every gap greater than X (here, 1 day) and apply a cumsum to those flags, so that each run of nearby rows gets its own number. Grouping by that number gives the expected result:
>>> X = 1
>>> df.groupby(g.gt(X).cumsum()).apply(print)
                 date   event
0 2012-01-01 12:30:00  event1
1 2012-01-01 12:30:12  event2
2 2012-01-01 12:30:12  event3
3 2012-01-02 12:28:29  event4
                 date   event
4 2012-02-01 12:30:29  event4
5 2012-02-01 12:30:38  event5
                 date   event
6 2012-03-01 12:31:05  event6
7 2012-03-01 12:31:38  event7
                 date   event
8 2012-06-01 12:31:44  event8
                  date    event
9  2012-07-01 10:31:48   event9
10 2012-07-01 11:32:23  event10
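The grouping above clusters rows that are close to each other. If the goal is instead what the question literally asks (rows within +/-X days of a given list of dates), a boolean mask built from one interval per target date may be closer to it. This is only a sketch: the helper filter_near_dates, the targets list and the small DataFrame are hypothetical names and data introduced for illustration.

```python
import pandas as pd

def filter_near_dates(df, targets, days, column="date"):
    """Return the rows whose `column` lies within +/- `days` of any target."""
    window = pd.Timedelta(days=days)
    mask = pd.Series(False, index=df.index)
    for target in pd.to_datetime(targets):
        # between() includes both endpoints by default.
        mask |= df[column].between(target - window, target + window)
    return df[mask]

# Hypothetical example data and target dates.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2012-01-01 12:30:00",
        "2012-01-02 12:28:29",
        "2012-02-01 12:30:29",
        "2012-03-01 12:31:05",
        "2012-07-01 10:31:48",
    ]),
    "event": ["event1", "event2", "event3", "event4", "event5"],
})
targets = ["2012-01-01", "2012-03-02"]

# Keeps event1 (within a day of 2012-01-01) and event4 (within a day of 2012-03-02).
near = filter_near_dates(df, targets, days=1)
```

The loop does one vectorised comparison per target date, which should be fine for a short list of dates; for a very long list, other approaches (for example an interval-based lookup) might be worth considering.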