
Summing Rows Based On Keyword Within Index

I am trying to sum multiple rows together based on a keyword that is part of the index, but not the entire index. For example, the index could look like

Solution 1:

Maybe this, which splits each index label on `_` and groups by the second piece:

df.groupby(df.index.to_series()
           .str.split('_', expand=True)[1]
          )['Count'].sum()

Output:

1
Apple      45
Banana    100
Name: Count, dtype: int64
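A self-contained version of Solution 1 is sketched below so it can be run directly. The question's original data is not shown, so the `Count` values here are hypothetical, and the labels are assumed to follow the `<number>_<keyword>_<colour>` pattern used in Solution 2.

```python
import pandas as pd

# Hypothetical data: the keyword is the second '_'-separated part of each label
df = pd.DataFrame(
    {"Count": [10, 20, 30, 5, 15]},
    index=["1234_Banana_Green", "4321_Banana_Yellow", "2244_Banana_Brown",
           "12345_Apple_Red", "1267_Apple_Blue"],
)

# Split each index label on '_' and keep the middle piece (column 1) as the group key
keyword = df.index.to_series().str.split("_", expand=True)[1]

# groupby aligns the key Series with df on the shared index, then sums Count per keyword
totals = df.groupby(keyword)["Count"].sum()
print(totals)
```

Because the key Series shares `df`'s index, pandas matches each row to its keyword by label rather than by position.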

Solution 2:

Given the following dataframe:

import pandas as pd

raw_data = {'id': ['1234_Banana_Green', '4321_Banana_Yellow',
                   '2244_Banana_Brown', '12345_Apple_Red',
                   '1267_Apple_Blue']}

df = pd.DataFrame(raw_data).set_index(['id'])

Try this code:

df = df.reset_index()  # move 'id' back into a regular column
df['extracted_keyword'] = df['id'].apply(lambda x: x.split('_')[1])  # middle part of each id
df.groupby(["extracted_keyword"]).count()  # number of rows per keyword

This gives:

                   id
extracted_keyword    
Apple               2
Banana              3
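Note that `count()` returns the number of rows per keyword, not a sum. If the frame has a numeric column to add up, the same grouping works with `sum()`. The sketch below assumes a hypothetical `Count` column alongside the ids, and swaps the `apply`/`lambda` for the vectorized `.str.split(...).str[1]`, which extracts the same middle piece.

```python
import pandas as pd

# Hypothetical 'Count' column added to Solution 2's ids so there is something to sum
raw_data = {
    "id": ["1234_Banana_Green", "4321_Banana_Yellow", "2244_Banana_Brown",
           "12345_Apple_Red", "1267_Apple_Blue"],
    "Count": [10, 20, 30, 5, 15],
}
df = pd.DataFrame(raw_data).set_index("id")

df = df.reset_index()
# Vectorized equivalent of df['id'].apply(lambda x: x.split('_')[1])
df["extracted_keyword"] = df["id"].str.split("_").str[1]

# sum() adds up the Count values; count() only counts rows per keyword
summed = df.groupby("extracted_keyword")["Count"].sum()
counted = df.groupby("extracted_keyword")["Count"].count()
print(summed)
print(counted)
```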

If you want to restore the original index, add this at the end:

df = df.set_index(['id'])
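As an alternative that avoids resetting and restoring the index at all, `groupby` also accepts a function, which pandas calls on each index label to get its group key. This is a sketch with the same hypothetical `Count` column as above.

```python
import pandas as pd

# Hypothetical data indexed by 'id', as in Solution 2, plus a Count column to sum
df = pd.DataFrame(
    {"Count": [10, 20, 30, 5, 15]},
    index=pd.Index(["1234_Banana_Green", "4321_Banana_Yellow", "2244_Banana_Brown",
                    "12345_Apple_Red", "1267_Apple_Blue"], name="id"),
)

# The function receives each index label and returns its keyword
totals = df.groupby(lambda label: label.split("_")[1])["Count"].sum()
print(totals)

# df itself is untouched and still indexed by 'id'
print(df.index.name)
```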
