Apply A Function On A Column Of A Dataframe Depending On The Value Of Another Column And Then Groupby
Solution 1:
You can use np.where to multiply the values in column 'A' by 2 wherever the value in column 'condition' is either 0 or 2.
example['A'] = np.where(example['condition'].isin([0, 2]), example['A'] * 2, example['A'])
To sum A per condition subject to the threshold, first add a column to example that flags whether A is greater than 2.5, then aggregate over condition and that flag.
example['check_A'] = np.where(example['A'] > 2.5, 1, 0)
new = example.groupby(['condition','check_A'])['A'].apply(lambda c: c.abs().sum())
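For reference, here is a self-contained sketch of the two steps above. The sample frame is an assumption, reconstructed from the outputs printed in the other answers on this page (A running from 0 to 9), so treat the numbers as illustrative rather than as the asker's exact data.

```python
import numpy as np
import pandas as pd

# Assumed sample data, reconstructed from outputs shown elsewhere on this page.
example = pd.DataFrame({
    "A": range(10),
    "condition": [0, 1, 0, 1, 2, 0, 1, 2, 2, 1],
})

# Double A only where condition is 0 or 2; leave the other rows untouched.
example["A"] = np.where(example["condition"].isin([0, 2]), example["A"] * 2, example["A"])

# Flag rows with A above 2.5, then sum |A| for each (condition, flag) pair.
example["check_A"] = np.where(example["A"] > 2.5, 1, 0)
new = example.groupby(["condition", "check_A"])["A"].apply(lambda c: c.abs().sum())
print(new)
```

The result is indexed by (condition, check_A), so the rows with check_A equal to 1 hold the sums over values above the threshold.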
Solution 2:
First we get all the rows where condition is 0 or 2. Then we multiply the A values of these rows by two and use GroupBy.sum, while using query to keep only the rows where A >= 2.5.
m = example['condition'].isin([0,2])
example['A'] = np.where(m, example['A'].mul(2), example['A'])
grpd = example.query('A.ge(2.5)').groupby('condition', as_index=False)['A'].sum()
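Output
   condition   A
0          0  28
1          1  18
2          2  76
Note that 28 and 76 are the totals obtained when the doubling step has been applied twice to the frame; with a single pass over the original data (A running from 0 to 9) the sums are 14, 18 and 38.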
Details of GroupBy.sum:
First we use query to get all the rows where A >= 2.5:
example.query('A.ge(2.5)')
    A  condition
2   4          0
3   3          1
4   8          2
5  10          0
6   6          1
7  14          2
8  16          2
9   9          1
Then we use groupby on condition to get each group of unique values, in this case all rows with 0, 1 and 2:
for _, d in example.query('A.ge(2.5)').groupby('condition', as_index=False):
    print(d, '\n')
    A  condition
2   8          0
5  20          0

   A  condition
3  3          1
6  6          1
9  9          1

    A  condition
4  16          2
7  28          2
8  32          2
So once we have the separate groups, we can use the .sum method to sum the whole A column of each one:
for _, d in example.query('A.ge(2.5)').groupby('condition', as_index=False):
    print(d['A'].sum(), '\n')
28

18

76
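Because the doubling overwrites example['A'] in place, running it a second time doubles the values again. One way around that, sketched here on an assumed sample frame (A running from 0 to 9, reconstructed from the outputs on this page), is to build the doubled column on a copy with assign and leave the original untouched. This is a variation on the approach above, not the answer's own code.

```python
import pandas as pd

# Assumed sample data, reconstructed from outputs shown elsewhere on this page.
example = pd.DataFrame({
    "A": range(10),
    "condition": [0, 1, 0, 1, 2, 0, 1, 2, 2, 1],
})

# Boolean mask of the rows whose A should be doubled.
m = example["condition"].isin([0, 2])

# Series.where keeps A where the condition (~m) holds and takes A * 2 elsewhere.
# assign returns a new frame, so `example` itself is not modified.
doubled = example.assign(A=example["A"].where(~m, example["A"] * 2))

# Keep rows with A >= 2.5, then sum A per condition.
grpd = doubled.query("A >= 2.5").groupby("condition", as_index=False)["A"].sum()
print(grpd)
#    condition   A
# 0          0  14
# 1          1  18
# 2          2  38
```

Since the doubling happens exactly once on a fresh frame here, the totals are 14, 18 and 38, and the cell can be re-run without changing them.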
Solution 3:
You were quite close in your original attempt. In particular, I would bring the condition out into its own separate function to enhance readability, and then apply the function to the data frame with axis=1:
def f(row):
    # Double A when condition is 0 or 2
    if row["condition"] == 0 or row["condition"] == 2:
        return int(row["A"] * 2)
    return row["A"]  # Base case: leave A unchanged

example['B'] = example.apply(f, axis=1)  # Apply to rows of the 'example' df
# Keep the 'condition' column: the groupby below still needs it
example
   A  condition   B
0  0          0   0
1  1          1   1
2  2          0   4
3  3          1   3
4  4          2   8
5  5          0  10
6  6          1   6
7  7          2  14
8  8          2  16
9  9          1   9
Then, to apply your groupby operation:
example[example["A"] > 2.5].groupby("condition")["A"].apply(lambda x: np.sum(np.abs(x)))
condition
0     5
1    18
2    19
Name: A, dtype: int64
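Row-wise apply is readable, but it calls a Python function once per row. If speed matters, a vectorized equivalent is possible. The sketch below uses Series.mask as one option and checks it against the row-wise function; the sample frame is an assumption reconstructed from the table printed above.

```python
import pandas as pd

# Assumed sample data, matching the A and condition columns printed above.
example = pd.DataFrame({
    "A": range(10),
    "condition": [0, 1, 0, 1, 2, 0, 1, 2, 2, 1],
})

def f(row):
    # Double A when condition is 0 or 2, otherwise return A unchanged.
    if row["condition"] == 0 or row["condition"] == 2:
        return int(row["A"] * 2)
    return row["A"]

# Row-wise version: one Python call per row.
row_wise = example.apply(f, axis=1)

# Vectorized version: mask replaces A with A * 2 where the condition is True.
vectorized = example["A"].mask(example["condition"].isin([0, 2]), example["A"] * 2)

print(row_wise.tolist() == vectorized.tolist())  # True
```

Both versions produce the same B column; the vectorized one avoids the per-row function call.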
Solution 4:
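Try this (df stands for the example frame; condition % 2 == 0 selects the even values, which here are 0 and 2):
df.loc[df['condition'] % 2 == 0, 'A'] = df['A'] * 2
Output: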
    A  condition
0   0          0
1   1          1
2   4          0
3   3          1
4   8          2
5  10          0
6   6          1
7  14          2
8  16          2
9   9          1
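The parity test matches isin([0, 2]) only while condition takes no other even values. A small sketch on hypothetical data (the value 4 is made up for illustration and does not appear in the question) shows where the two selections diverge:

```python
import pandas as pd

# Hypothetical data: condition 4 is even but is not in the list [0, 2].
df = pd.DataFrame({"A": [1, 1, 1, 1], "condition": [0, 1, 2, 4]})

even = df["condition"] % 2 == 0        # True for 0, 2 and 4
listed = df["condition"].isin([0, 2])  # True for 0 and 2 only

# Double A under each selection without modifying df.
by_parity = df["A"].mask(even, df["A"] * 2)
by_membership = df["A"].mask(listed, df["A"] * 2)

print(by_parity.tolist())      # [2, 1, 2, 2]
print(by_membership.tolist())  # [2, 1, 2, 1]
```

If the intent is specifically "condition is 0 or 2", isin states that directly and stays correct if other values show up later.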