Pandas Advanced Groupby And Filter By Date
Given the input DataFrame, how do I create the output DataFrame by keeping, for each ID, only the rows up to and including the first time target == 1? In other words, how do I remove every later occurrence for each ID?
Solution 1:
You could keep only the rows in each group where the cumulative sum of target is <= 1, then group again and drop a zero that directly follows a one by testing the shifted target with .ne(1).
import pandas as pd

df = pd.DataFrame({'ID': ['a1', 'a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a2'],
                   'date': ['2019-11-01',
                            '2019-12-01',
                            '2020-01-01',
                            '2020-02-01',
                            '2020-03-01',
                            '2019-11-01',
                            '2019-12-01',
                            '2020-03-01',
                            '2020-04-01'],
                   'target': [0, 0, 1, 1, 0, 0, 1, 0, 1]})

# Keep rows while the running count of ones within each ID is at most 1
df = df.loc[df.groupby('ID')['target'].cumsum() <= 1]
# Drop a row whose previous row in the same ID already had target == 1
df = df.loc[df.groupby('ID')['target'].shift(1).ne(1)]
Output:
   ID        date  target
0  a1  2019-11-01       0
1  a1  2019-12-01       0
2  a1  2020-01-01       1
5  a2  2019-11-01       0
6  a2  2019-12-01       1
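One caveat with the two-step filter: it only drops a zero that directly follows the first one, so a group such as 0, 1, 0, 0, 1 would still keep its last zero. A possible single-mask alternative is sketched below. It assumes target only takes the values 0 and 1 and that rows are already ordered by date within each ID; the helper name keep_until_first_one is made up for illustration and is not part of the original answer.

```python
import pandas as pd


def keep_until_first_one(df, id_col="ID", target_col="target"):
    """Keep, per group, the rows up to and including the first target == 1.

    Assumes a 0/1 target and rows already sorted within each group.
    """
    # Number of ones strictly before each row inside its group:
    # running count of ones minus the row's own value.
    ones_before = df.groupby(id_col)[target_col].cumsum() - df[target_col]
    return df.loc[ones_before.eq(0)]


df = pd.DataFrame({
    "ID": ["a1", "a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2"],
    "date": ["2019-11-01", "2019-12-01", "2020-01-01", "2020-02-01",
             "2020-03-01", "2019-11-01", "2019-12-01", "2020-03-01",
             "2020-04-01"],
    "target": [0, 0, 1, 1, 0, 0, 1, 0, 1],
})

result = keep_until_first_one(df)
print(result)
```

On the sample data this keeps the same rows as the two-step filter (index 0, 1, 2, 5, 6), and it needs only one groupby pass.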
Solution 2:
import numpy as np
import pandas as pd
from io import StringIO

data = StringIO("""
uid, date, target
a1, 2019-11-01, 0
a1, 2019-12-01, 0
a1, 2020-01-01, 1
a1, 2020-02-01, 1
a1, 2020-03-01, 0
a2, 2019-11-01, 0
a2, 2019-12-01, 1
a2, 2020-03-01, 0
a2, 2020-04-01, 1
"""
)
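# Column names carry a leading space in the CSV header, so strip them
df = pd.read_csv(data).rename(columns=lambda x: x.strip())

def filter_in_group(group: pd.DataFrame) -> pd.DataFrame:
    # Position of the first row holding the maximum target, i.e. the first 1
    ind = np.argmax(group.target)
    # Keep everything up to and including that row
    return group.loc[:, ['date', 'target']].iloc[:ind + 1]

df_filtered = (
    df
    .groupby('uid')
    .apply(filter_in_group)
    .reset_index()
    .drop('level_1', axis=1)
)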
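Note that np.argmax returns 0 when a group contains no 1 at all, so filter_in_group would keep only the first row of such a group. Below is a minimal sketch of a guarded variant. It assumes that groups without a 1 should be kept whole and that rows should be ordered by date within each uid before filtering; filter_in_group_safe and the extra a3 group are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd


def filter_in_group_safe(group: pd.DataFrame) -> pd.DataFrame:
    """Return rows up to and including the first target == 1.

    Assumption: a group with no 1 is returned unchanged.
    """
    hits = np.flatnonzero(group["target"].to_numpy() == 1)
    if hits.size == 0:
        return group
    return group.iloc[: hits[0] + 1]


df = pd.DataFrame({
    "uid": ["a1"] * 5 + ["a2"] * 4 + ["a3"] * 2,
    "date": pd.to_datetime([
        "2019-11-01", "2019-12-01", "2020-01-01", "2020-02-01", "2020-03-01",
        "2019-11-01", "2019-12-01", "2020-03-01", "2020-04-01",
        "2019-11-01", "2019-12-01",
    ]),
    "target": [0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0],
})

# Order by date inside each uid so "first" means earliest in time
df = df.sort_values(["uid", "date"])

# Filter each group and stitch the pieces back together
filtered = pd.concat([filter_in_group_safe(g) for _, g in df.groupby("uid")])
print(filtered)
```

Iterating over the groups and concatenating keeps the original index and the uid column, so no reset_index or drop step is needed afterwards.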