
Apply A Function On A Column Of A Dataframe Depending On The Value Of Another Column And Then Groupby

Assume the dataframe with column 'A' and column 'condition' as reproduced by the code below.

example = pd.DataFrame({'A': range(10), 'condition': [0,1,0,1,2,0,1,2,2,1]})

I want to

Solution 1:

You can use np.where to multiply the values in column 'A' by 2 wherever the value in column 'condition' is either 0 or 2.

example['A'] = np.where(example['condition'].isin([0,2]), example['A']*2,example['A'])
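As a minimal, self-contained sketch of the np.where step: the frame is rebuilt from the question, and the result is written to a copy so that re-running the snippet does not double the values a second time. The names doubled and mask are illustrative only and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question so the snippet runs on its own.
example = pd.DataFrame({'A': range(10), 'condition': [0, 1, 0, 1, 2, 0, 1, 2, 2, 1]})

# `doubled` and `mask` are hypothetical names; working on a copy leaves
# `example` untouched, so running this twice gives the same result.
doubled = example.copy()
mask = doubled['condition'].isin([0, 2])
doubled['A'] = np.where(mask, doubled['A'] * 2, doubled['A'])

print(doubled['A'].tolist())  # [0, 1, 4, 3, 8, 10, 6, 14, 16, 9]
```

Assigning back to example['A'] in place, as in the line above, gives the same values on the first run but keeps doubling on every later run.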

To sum A within each condition group according to the criterion, first add a column to example that flags whether A is greater than 2.5, then aggregate over condition and that flag.

example['check_A'] = np.where(example['A'] > 2.5, 1, 0)
new = example.groupby(['condition','check_A'])['A'].apply(lambda c: c.abs().sum())
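The following sketch shows what the grouped result looks like on a freshly built frame and how the A > 2.5 groups could be pulled out of it. It assumes the doubling step has been applied exactly once; totals and above are illustrative names. Because A is never negative here, a plain sum is used in place of the abs().sum() lambda, which returns the same numbers for this data.

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({'A': range(10), 'condition': [0, 1, 0, 1, 2, 0, 1, 2, 2, 1]})
example['A'] = np.where(example['condition'].isin([0, 2]), example['A'] * 2, example['A'])

# Flag rows with A above 2.5, then sum A per (condition, flag) pair.
example['check_A'] = np.where(example['A'] > 2.5, 1, 0)
totals = example.groupby(['condition', 'check_A'])['A'].sum()

# Keep only the groups where the flag is 1; `above` is a hypothetical name.
above = totals.xs(1, level='check_A')
print(above)  # sums per condition: 14, 18 and 38
```

The result of the groupby is a Series with a two-level index (condition, check_A), so xs is one way to slice out a single flag value.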

Solution 2:

First we select the rows where condition is 0 or 2 and multiply their A values by two. Then we use query to keep only the rows where A >= 2.5 and apply GroupBy.sum per condition.

m = example['condition'].isin([0,2])
example['A'] = np.where(m, example['A'].mul(2), example['A'])
grpd = example.query('A.ge(2.5)').groupby('condition', as_index=False)['A'].sum()

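Output

   condition   A
0          0  28
1          1  18
2          2  76

These totals correspond to A having been doubled twice (for example, when the np.where line from Solution 1 had already been run on the same dataframe). On a freshly created example the sums are 14, 18 and 38.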

Details GroupBy.sum:

First we use query to get all the rows where A >= 2.5:

example.query('A.ge(2.5)')

    A  condition
2   4          0
3   3          1
4   8          2
5  10          0
6   6          1
7  14          2
8  16          2
9   9          1

Then we use groupby on condition to get each group of unique values, in this case all rows with 0, 1 and 2:

for _, d in example.query('A.ge(2.5)').groupby('condition'):
    print(d, '\n')

    A  condition
2   8          0
5  20          0

   A  condition
3  3          1
6  6          1
9  9          1

    A  condition
4  16          2
7  28          2
8  32          2

Once we have the separate groups, we can use the .sum method to total the A column of each one:

for _, d in example.query('A.ge(2.5)').groupby('condition'):
    print(d['A'].sum(), '\n')

28

18

76
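To tie the steps of this solution together, here is a minimal end-to-end sketch on a freshly created frame. It assumes the doubling is applied exactly once, and it uses the plain comparison 'A >= 2.5' inside query as an equivalent of 'A.ge(2.5)'. The names filtered and manual are illustrative.

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({'A': range(10), 'condition': [0, 1, 0, 1, 2, 0, 1, 2, 2, 1]})

# Double A where condition is 0 or 2.
m = example['condition'].isin([0, 2])
example['A'] = np.where(m, example['A'].mul(2), example['A'])

# Filter first, then aggregate per condition.
filtered = example.query('A >= 2.5')
grpd = filtered.groupby('condition', as_index=False)['A'].sum()

# Summing each group by hand should agree with the aggregated frame.
manual = {int(key): int(group['A'].sum()) for key, group in filtered.groupby('condition')}
print(manual)  # {0: 14, 1: 18, 2: 38}
```

Iterating over the groupby of the filtered frame, rather than over the already aggregated grpd, is what exposes the individual rows of each group.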

Solution 3:

You were quite close in your original attempt. In particular, I would bring the condition out into its own separate function to enhance readability, and then apply the function to the data frame with axis=1:

def f(row):
    if row["condition"] == 0 or row["condition"] == 2:
        return int(row["A"] * 2)
    return row["A"]   # Base condition

example['B'] = example.apply(f, axis=1)   # Apply to rows of 'example' df

# Keep the 'condition' column: the groupby step further down groups on it.

example

   A  condition   B
0  0          0   0
1  1          1   1
2  2          0   4
3  3          1   3
4  4          2   8
5  5          0  10
6  6          1   6
7  7          2  14
8  8          2  16
9  9          1   9

Then, to apply your groupby operation:

example[example["A"] > 2.5].groupby("condition")["A"].apply(lambda x: np.sum(np.abs(x)))

condition
0     5
1    18
2    19
Name: A, dtype: int64
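Note that the groupby above filters and sums the original column A, which is why it returns 5, 18 and 19. If the doubled values are what you want to aggregate, one option is to group on B instead. The sketch below is an assumption about that intent rather than part of the original answer; it also checks the row-wise function against a vectorized np.where. The names expected and by_b are illustrative.

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({'A': range(10), 'condition': [0, 1, 0, 1, 2, 0, 1, 2, 2, 1]})

def f(row):
    # Double A when condition is 0 or 2, otherwise leave it unchanged.
    if row['condition'] in (0, 2):
        return int(row['A'] * 2)
    return int(row['A'])

example['B'] = example.apply(f, axis=1)

# Vectorized equivalent, used here only as a cross-check.
expected = np.where(example['condition'].isin([0, 2]), example['A'] * 2, example['A'])
assert example['B'].tolist() == expected.tolist()

# Filter and sum on B (the doubled values) instead of the original A.
by_b = example[example['B'] > 2.5].groupby('condition')['B'].sum()
print(by_b.tolist())  # [14, 18, 38]
```

The row-wise apply is easy to read, while the vectorized form is usually the faster choice on large frames.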

Solution 4:

Try this (df is the example dataframe; condition % 2 == 0 selects the rows where condition is 0 or 2):

df.loc[df['condition']%2==0, 'A'] = df['A']*2

Output:

    A  condition
0   0          0
1   1          1
2   4          0
3   3          1
4   8          2
5  10          0
6   6          1
7  14          2
8  16          2
9   9          1
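A short sketch of the same masked assignment, assuming df is built exactly like example in the question. It checks that the parity test and an explicit isin([0, 2]) pick the same rows for this data; the two would differ if condition could take other even values such as 4. The names even and in_set are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': range(10), 'condition': [0, 1, 0, 1, 2, 0, 1, 2, 2, 1]})

even = df['condition'] % 2 == 0
in_set = df['condition'].isin([0, 2])
# Same rows for this data; a condition value of 4 would match `even` only.
assert even.equals(in_set)

# Restrict the right-hand side to the same rows as the left-hand side.
df.loc[in_set, 'A'] = df.loc[in_set, 'A'] * 2
print(df['A'].tolist())  # [0, 1, 4, 3, 8, 10, 6, 14, 16, 9]
```

Using isin states the intended values directly, which may be the safer choice if the set of condition values can grow.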
