
Efficient Use Of Numpy To Process In Blocks Of Rows

I need to iterate over a set of unique accounts (AccountID in the example code below) and calculate a selection of features for each unique AccountID (currently just showing Target

Solution 1:

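If I understand correctly, you need a forward-looking rolling sum of BCol within each AccountID, divided by CCol on the same row: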

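import numpy as np
import pandas as pd

df = pd.DataFrame({'AccountID': [1,  1,       1, 2,   1,  2,  1,      2, 2],
                   'RefDay':    [1,  2,       3, 1,   4,  2,  5,      3, 4],
                   'BCol':      [1., 2., np.nan, 1., 3., 2., 1., np.nan, 2.] ,
                   'CCol':      [3., 2.,     3., 1., 3., 4., 5.,     2., 1.] })
# Sort so each account's rows are contiguous and ordered by day
df = df.sort_values(by=['AccountID','RefDay']).reset_index(drop=True)

# Replace with 6 in real data
periods = 3
# Per account: sum BCol (NaN counted as 0) over the current row and the
# next periods - 1 rows, then divide by CCol on the current row
result = df.groupby('AccountID').apply(lambda g: g['BCol'].fillna(0).rolling(periods).sum().shift(-periods + 1) / g['CCol'])
# sortlevel() has been removed from pandas; sort_index(level=1) orders the
# result by the original row index so the values line up with df
df['TargetColumn'] = result.sort_index(level=1).values
print(df)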

Output:

   AccountID  BCol  CCol  RefDay  TargetColumn
0          1   1.0   3.0       1      1.000000
1          1   2.0   2.0       2      2.500000
2          1   NaN   3.0       3      1.333333
3          1   3.0   3.0       4           NaN
4          1   1.0   5.0       5           NaN
5          2   1.0   1.0       1      3.000000
6          2   2.0   4.0       2      1.000000
7          2   NaN   2.0       3           NaN
8          2   2.0   1.0       4           NaN
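
If the groupby().apply() call turns out to be slow on many accounts, one possible variation is to compute the windowed sum with transform() and do the division once on the whole column. This is only a sketch under the same assumptions as above (rows sorted by AccountID and RefDay, NaN in BCol treated as 0); the name forward_sum is made up for illustration, and whether it is actually faster on your data would need to be measured.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'AccountID': [1,  1,       1, 2,   1,  2,  1,      2, 2],
                   'RefDay':    [1,  2,       3, 1,   4,  2,  5,      3, 4],
                   'BCol':      [1., 2., np.nan, 1., 3., 2., 1., np.nan, 2.],
                   'CCol':      [3., 2.,     3., 1., 3., 4., 5.,     2., 1.]})
df = df.sort_values(by=['AccountID', 'RefDay']).reset_index(drop=True)

periods = 3  # same window length as in the answer above

# forward_sum (hypothetical name): for each row, the sum of BCol over that row
# and the next periods - 1 rows of the same account. transform() returns a
# Series aligned with df's index, so no re-sorting of the result is needed.
forward_sum = (
    df['BCol'].fillna(0)
      .groupby(df['AccountID'])
      .transform(lambda s: s.rolling(periods).sum().shift(-periods + 1))
)

# The division is done once, on whole columns
df['TargetColumn'] = forward_sum / df['CCol']
print(df)
```

The TargetColumn values should match the output above. Rows that do not have periods - 1 following rows within the same account stay NaN.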
