How To Apply An Accumulative Custom Aggregation Function With A Group By On Pandas

I have the following DataFrame df = pd.DataFrame({'model': ['A0', 'A0', 'A1', 'A1','A0', 'A0', 'A1', 'A1', 'A0', 'A0', 'A1', 'A1'], 'y_true': [1, 2, 3, 3, 4, 5

Solution 1:

This should do it:

import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, explained_variance_score

df = pd.DataFrame({
    'model': ['A0', 'A0', 'A1', 'A1','A0', 'A0', 'A1', 'A1', 'A0', 'A0', 'A1', 'A1'],
    'week': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'y_true': [1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    'y_pred': [0, 1, 5, 5, 7, 8, 8, 12, 8, 7, 14, 15]
})

def metrics(df):
    df['mae'] = mean_absolute_error(df.y_true, df.y_pred)
    df['mse'] = mean_squared_error(df.y_true, df.y_pred)
    df['evs'] = explained_variance_score(df.y_true, df.y_pred)
    return df


# groupby model, week and keep all values of y_true/y_pred as lists
df_group = df.groupby(['model', 'week']).agg(list)

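# accumulate the y_true and y_pred lists week by week within each model
# (double brackets select several columns; group_keys=False keeps the original index)
df_group = df_group.groupby('model', group_keys=False)[['y_true', 'y_pred']].apply(lambda x: x.cumsum())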

# apply metrics to new columns
df_group.apply(metrics, axis=1)
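
The accumulation step above depends on cumsum concatenating Python lists held in an object column, which may not behave the same way in every pandas version. As a hedged alternative, here is a minimal sketch that builds the same cumulative lists with itertools.accumulate from the standard library. The helper name accumulate_lists and the `_cum` column suffix are assumptions made for illustration, not part of the answer.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({
    'model': ['A0', 'A0', 'A1', 'A1', 'A0', 'A0', 'A1', 'A1', 'A0', 'A0', 'A1', 'A1'],
    'week': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'y_true': [1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    'y_pred': [0, 1, 5, 5, 7, 8, 8, 12, 8, 7, 14, 15],
})

# One row per (model, week); each cell holds that week's values as a list.
df_group = df.groupby(['model', 'week'])[['y_true', 'y_pred']].agg(list)


def accumulate_lists(col):
    # Hypothetical helper: accumulate() combines items with +, and + on
    # lists concatenates, so each row ends up with every value seen so far.
    return pd.Series(list(accumulate(col)), index=col.index)


for col in ['y_true', 'y_pred']:
    # Accumulate separately inside each model so weeks never mix across models.
    pieces = [accumulate_lists(s) for _, s in df_group.groupby(level='model')[col]]
    df_group[f'{col}_cum'] = pd.concat(pieces)

print(df_group[['y_true_cum', 'y_pred_cum']])
```

Because groupby sorts its keys, the weeks are visited in ascending order within each model, which is what makes the running concatenation meaningful.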

Solution 2:

This builds on RubenB's answer: a small modification of his code gives what was asked for.

Start from the same grouping step:

df_group = df.groupby(['model', 'week']).agg(lambda x: list(x))

We can use cumsum on certain parts:

for col in ['y_true','y_pred']:
    df_group[f'{col}_cum'] = None
df_group = df_group.reset_index().set_index('model')  # this is for convenience
for col in ['y_true','y_pred']:
    for model in df_group.index:  # now we do this once per model
        df_group.loc[model,f'{col}_cum'] = df_group.loc[model,col].cumsum()

And finally, as RubenB did:

df_group.apply(metrics, axis=1)
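
One thing to watch: metrics as defined in Solution 1 reads y_true and y_pred, so applied to this frame it would score each week on its own rather than the accumulated `_cum` columns. A minimal sketch of a row-wise function that reads the cumulative columns and returns a new Series instead of mutating the row is shown below. The name row_metrics is hypothetical, the two-row frame is typed in by hand from model A0's first two weeks, and NumPy formulas stand in for the scikit-learn calls to keep the sketch dependency-light.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for df_group: cumulative lists for model A0, weeks 1 and 2.
cum = pd.DataFrame(
    {
        'y_true_cum': [[1, 2], [1, 2, 4, 5]],
        'y_pred_cum': [[0, 1], [0, 1, 7, 8]],
    },
    index=pd.MultiIndex.from_tuples([('A0', 1), ('A0', 2)], names=['model', 'week']),
)


def row_metrics(row):
    # Hypothetical helper: score the accumulated values held in one row.
    err = np.asarray(row['y_true_cum']) - np.asarray(row['y_pred_cum'])
    return pd.Series({'mae': np.abs(err).mean(), 'mse': (err ** 2).mean()})


# Returning a Series from each row makes apply() build one column per metric.
scores = cum.apply(row_metrics, axis=1)
print(scores)
```

Swapping the NumPy expressions for mean_absolute_error and mean_squared_error should give the same numbers, since both are plain averages of the per-sample errors.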

Here is an attempt without the extra loop, although it turns into a messy lambda function:

df_group = df.groupby(['model', 'week']).agg(lambda x: list(x))
df_group = df_group.reset_index()
for col in ['y_true','y_pred']:
    df_group[f'{col}_cum'] = df_group.apply(lambda x:
         df_group.loc[(df_group.model==x.model)&(df_group.week<=x.week),col].sum(),axis=1)

And finally:

df_group.set_index(['model','week']).apply(metrics, axis=1)
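
For metrics that are plain averages of per-row errors, such as MAE and MSE, the lists can be avoided altogether. The sketch below is an alternative technique, not the method from either answer: it keeps running sums of the errors and of the row counts per model and divides them. It assumes the metric decomposes this way, so it does not cover explained_variance_score as written. The column names abs_err, sq_err and n are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'model': ['A0', 'A0', 'A1', 'A1', 'A0', 'A0', 'A1', 'A1', 'A0', 'A0', 'A1', 'A1'],
    'week': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'y_true': [1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    'y_pred': [0, 1, 5, 5, 7, 8, 8, 12, 8, 7, 14, 15],
})

err = df['y_true'] - df['y_pred']

# Per (model, week): total absolute error, total squared error, and row count.
per_week = (
    df.assign(abs_err=err.abs(), sq_err=err ** 2, n=1)
      .groupby(['model', 'week'])[['abs_err', 'sq_err', 'n']]
      .sum()
)

# Running totals within each model, in week order.
running = per_week.groupby(level='model').cumsum()

# Cumulative mean = running error total / running row count.
cum_metrics = pd.DataFrame({
    'mae': running['abs_err'] / running['n'],
    'mse': running['sq_err'] / running['n'],
})
print(cum_metrics)
```

This stays vectorized and avoids both the per-row lambda and the object columns of lists, at the cost of only working for metrics that can be written as a ratio of running sums.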
