Efficient Use Of Numpy To Process In Blocks Of Rows
I need to iterate over a set of unique accounts (AccountID in the example code below) and calculate a selection of features for each unique AccountID (currently just showing Target
Solution 1:
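IIUC, you need a forward-looking rolling sum per account: for each row, sum BCol (treating NaN as 0) over that row and the next periods - 1 rows of the same account, then divide by CCol. Rows whose window would run past the end of their account get NaN: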
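import numpy as np
import pandas as pd

df = pd.DataFrame({'AccountID': [1, 1, 1, 2, 1, 2, 1, 2, 2],
                   'RefDay': [1, 2, 3, 1, 4, 2, 5, 3, 4],
                   'BCol': [1., 2., np.nan, 1., 3., 2., 1., np.nan, 2.],
                   'CCol': [3., 2., 3., 1., 3., 4., 5., 2., 1.]})
# Rows of each account must be contiguous and ordered by RefDay
df = df.sort_values(by=['AccountID', 'RefDay']).reset_index(drop=True)
# Replace with 6 in real data
periods = 3
# Per account: treat NaN in BCol as 0, take a rolling sum over `periods` rows,
# then shift it up so each row gets the sum of itself and the next periods - 1 rows.
# transform() returns a result aligned with df's index, so no re-sorting is needed.
forward_sum = df.groupby('AccountID')['BCol'].transform(
    lambda s: s.fillna(0).rolling(periods).sum().shift(-periods + 1))
df['TargetColumn'] = forward_sum / df['CCol']
print(df)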
Output:
   AccountID  BCol  CCol  RefDay  TargetColumn
0          1   1.0   3.0       1      1.000000
1          1   2.0   2.0       2      2.500000
2          1   NaN   3.0       3      1.333333
3          1   3.0   3.0       4           NaN
4          1   1.0   5.0       5           NaN
5          2   1.0   1.0       1      3.000000
6          2   2.0   4.0       2      1.000000
7          2   NaN   2.0       3           NaN
8          2   2.0   1.0       4           NaN
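Since the question title asks about NumPy, here is a rough sketch of the same target computed on plain NumPy arrays instead of groupby/rolling. It assumes, as the answer does, that rows are already sorted by AccountID and RefDay and that the window counts rows rather than days. forward_ratio is a hypothetical helper name, not a NumPy or pandas function. The idea: treat NaN as 0, build one prefix-sum array over all rows, and get each forward window sum as the difference of two prefix sums; windows that would cross into the next account (or past the end of the data) are set to NaN.

```python
import numpy as np
import pandas as pd


def forward_ratio(account_ids, b, c, periods):
    """Hypothetical helper: for each row, sum b over that row and the next
    periods - 1 rows of the same account (NaN counted as 0), divided by c.
    Rows whose window would run past the end of their account get NaN.
    Assumes rows are already sorted by account (and by day within an account)."""
    account_ids = np.asarray(account_ids)
    b = np.nan_to_num(np.asarray(b, dtype=float))  # NaN -> 0, like fillna(0)
    c = np.asarray(c, dtype=float)
    n = len(b)

    # Prefix sums with a leading 0, so sum(b[i:j]) == csum[j] - csum[i]
    csum = np.concatenate(([0.0], np.cumsum(b)))

    # Start index of each contiguous account block, and one past its last row
    starts = np.flatnonzero(np.r_[True, account_ids[1:] != account_ids[:-1]])
    ends = np.r_[starts[1:], n]
    # For every row, the (exclusive) end of the block it belongs to
    block_end = np.repeat(ends, ends - starts)

    idx = np.arange(n)
    window_end = idx + periods
    valid = window_end <= block_end  # full window fits inside the account

    out = np.full(n, np.nan)
    out[valid] = (csum[window_end[valid]] - csum[idx[valid]]) / c[valid]
    return out


df = pd.DataFrame({'AccountID': [1, 1, 1, 2, 1, 2, 1, 2, 2],
                   'RefDay': [1, 2, 3, 1, 4, 2, 5, 3, 4],
                   'BCol': [1., 2., np.nan, 1., 3., 2., 1., np.nan, 2.],
                   'CCol': [3., 2., 3., 1., 3., 4., 5., 2., 1.]})
df = df.sort_values(by=['AccountID', 'RefDay']).reset_index(drop=True)

periods = 3  # 6 in the real data, per the answer
df['TargetColumn'] = forward_ratio(df['AccountID'].to_numpy(),
                                   df['BCol'].to_numpy(),
                                   df['CCol'].to_numpy(),
                                   periods)
print(df)
```

On this sample it should give the same TargetColumn values as the groupby version. One caveat: differences of prefix sums can drift slightly from pandas' rolling sums on long arrays with large running totals, so compare the two with np.allclose rather than exact equality.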