
Efficient Use Of Numpy To Process In Blocks Of Rows

I need to iterate over a set of unique accounts (AccountID in the example code below) and calculate a selection of features for each unique AccountID (currently just showing Target

Solution 1:

If I understand correctly, I think you need a forward-looking rolling sum of BCol within each AccountID, divided by CCol:

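import numpy as np
import pandas as pd

df = pd.DataFrame({'AccountID': [1,  1,       1, 2,   1,  2,  1,      2, 2],
                   'RefDay':    [1,  2,       3, 1,   4,  2,  5,      3, 4],
                   'BCol':      [1., 2., np.nan, 1., 3., 2., 1., np.nan, 2.] ,
                   'CCol':      [3., 2.,     3., 1., 3., 4., 5.,     2., 1.] })
# Sort so each account's rows are contiguous and in RefDay order
df = df.sort_values(by=['AccountID','RefDay']).reset_index(drop=True)

# Replace with 6 in real data
periods = 3
# Per account: sum BCol (NaN counted as 0) over the current row and the next
# periods - 1 rows, then divide by CCol
result = df.groupby('AccountID').apply(lambda g: g['BCol'].fillna(0).rolling(periods).sum().shift(-periods + 1) / g['CCol'])
# Order by the original row index (level 1) before assigning positionally
df['TargetColumn'] = result.sort_index(level=1).values
print(df)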

Output:

   AccountID  BCol  CCol  RefDay  TargetColumn
0          1   1.0   3.0       1      1.000000
1          1   2.0   2.0       2      2.500000
2          1   NaN   3.0       3      1.333333
3          1   3.0   3.0       4           NaN
4          1   1.0   5.0       5           NaN
5          2   1.0   1.0       1      3.000000
6          2   2.0   4.0       2      1.000000
7          2   NaN   2.0       3           NaN
8          2   2.0   1.0       4           NaN
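
As a possible variation (a sketch, not part of the original answer), the same calculation can be written with groupby(...).transform, which returns a result aligned to the DataFrame's own index, so no positional assignment through .values is needed. The function name add_forward_ratio is hypothetical; the column names and the periods value are taken from the example above.

```python
import numpy as np
import pandas as pd


def add_forward_ratio(df, periods=3):
    """Hypothetical helper: per AccountID, sum BCol over the current row and
    the next periods - 1 rows (NaN counted as 0), then divide by CCol."""
    out = df.sort_values(['AccountID', 'RefDay']).reset_index(drop=True)
    filled = out['BCol'].fillna(0)
    # transform keeps the original index, so the result lines up row by row
    fwd_sum = filled.groupby(out['AccountID']).transform(
        lambda s: s.rolling(periods).sum().shift(-periods + 1)
    )
    out['TargetColumn'] = fwd_sum / out['CCol']
    return out


df = pd.DataFrame({'AccountID': [1, 1, 1, 2, 1, 2, 1, 2, 2],
                   'RefDay':    [1, 2, 3, 1, 4, 2, 5, 3, 4],
                   'BCol':      [1., 2., np.nan, 1., 3., 2., 1., np.nan, 2.],
                   'CCol':      [3., 2., 3., 1., 3., 4., 5., 2., 1.]})
result = add_forward_ratio(df)
print(result)
```

Because the assignment is index-aligned, this version should not depend on the order in which the grouped pieces come back.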
