
How To Detect Change In Last 2 Months Starting From Specific Row In Pandas DataFrame

Let's say we have a dataframe like this:

Id  Policy_id  Start_Date  End_Date    Fee1  Fee2  Last_dup
0   b123       2019/02/24  2019/03/23  0     23    0
1   b123

Solution 1:

I think adding a "transaction number column" for each policy will make this easier. Then you can just de-dupe the transactions to see if there are "changed" rows.

Look at the following for example:

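import pandas as pd

# Toy data: b123 repeats the same fees; c123 has one row with a different Fee1.
dat = [['b123', 234, 522], ['b123', 234, 522], ['c123', 34, 23],
       ['c123', 38, 23], ['c123', 34, 23]]

cols = ['Policy_id', 'Fee1', 'Fee2']

df = pd.DataFrame(dat, columns=cols)

# Number the rows within each policy: 1, 2, 3, ...
df['transaction_id'] = df.groupby('Policy_id').cumcount() + 1

# Drop rows whose policy and fees repeat an earlier row.
df2 = df[cols].drop_duplicates()

# Re-attach the transaction numbers of the surviving rows (join aligns on the index).
final_df = df2.join(df[['transaction_id']])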

The output is:

  Policy_id  Fee1  Fee2  transaction_id
0      b123   234   522               1
2      c123    34    23               1
3      c123    38    23               2

And since b123 only has one transaction after de-duping, you know that nothing changed. Something had to change with c123.

You can get all the changed transactions with final_df[final_df.transaction_id > 1].
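
If you only need the list of policies that changed, a minimal sketch is to count the distinct fee combinations per policy. It rebuilds the example data so it runs on its own; the names n_versions and changed are made up for illustration.

```python
import pandas as pd

dat = [['b123', 234, 522], ['b123', 234, 522], ['c123', 34, 23],
       ['c123', 38, 23], ['c123', 34, 23]]
cols = ['Policy_id', 'Fee1', 'Fee2']
df = pd.DataFrame(dat, columns=cols)

# Number of distinct (Fee1, Fee2) combinations seen for each policy.
n_versions = df.drop_duplicates().groupby('Policy_id').size()

# A policy with more than one combination had at least one change.
changed = n_versions[n_versions > 1].index.tolist()
print(changed)  # ['c123']
```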

As mentioned, you might have to do some other math with the dates, but this should get you most of the way there.
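
One thing to watch: de-duping compares a row against every earlier row, so a fee that changes and later goes back to its old value is dropped as a duplicate (the last c123 row in the example). A rough sketch of an alternative, assuming the rows are already in chronological order within each policy, is to compare each row with the one just before it in the same policy. The changed column name is made up for illustration.

```python
import pandas as pd

dat = [['b123', 234, 522], ['b123', 234, 522], ['c123', 34, 23],
       ['c123', 38, 23], ['c123', 34, 23]]
df = pd.DataFrame(dat, columns=['Policy_id', 'Fee1', 'Fee2'])

fee_cols = ['Fee1', 'Fee2']

# Fees of the previous row within the same policy (NaN for a policy's first row).
prev = df.groupby('Policy_id')[fee_cols].shift()

# The first row of each policy has nothing to compare against, so never flag it.
is_first = df.groupby('Policy_id').cumcount() == 0

# A row is "changed" if any fee differs from the previous row of its policy.
df['changed'] = df[fee_cols].ne(prev).any(axis=1) & ~is_first

changed_policies = df.loc[df['changed'], 'Policy_id'].unique().tolist()
print(changed_policies)  # ['c123']
```

Here both the 34 to 38 change and the 38 to 34 change back are flagged, which the de-dupe approach would not show.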

Edit: If you want to only look at the last two months, you can filter the DataFrame prior to running the above.

How to do this:

Make a variable for your filtered date like so:

from datetime import date, timedelta
filtered_date = date.today() - timedelta(days=60)

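Then I would use the pyjanitor package's filter_date method. Filter on whichever date column you want; Start_Date seems the most reasonable to me. In current pyjanitor releases the keyword is start_date (older releases called it start).

import janitor

# Start_Date must be a datetime column (e.g. via pd.to_datetime) before filtering.
df = df.filter_date("Start_Date", start_date=filtered_date)

Once you run import janitor, pyjanitor registers its methods on pandas DataFrames, so every DataFrame has filter_date available.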

You can see more filter_date examples in the pyjanitor documentation.
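
If you would rather not add a dependency, the same filter can be sketched in plain pandas. This assumes Start_Date holds strings in the YYYY/MM/DD form shown in the question. The sample rows and the fixed today value are made up so the result is reproducible.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    'Policy_id': ['b123', 'b123', 'c123', 'c123'],
    'Start_Date': ['2019/02/24', '2019/05/24', '2019/03/10', '2019/06/01'],
    'Fee1': [0, 0, 34, 38],
})

# Parse the strings into timestamps so the comparison is chronological.
df['Start_Date'] = pd.to_datetime(df['Start_Date'], format='%Y/%m/%d')

# Fixed reference date for a reproducible example; use pd.Timestamp.today() in practice.
today = pd.Timestamp('2019-06-30')
cutoff = today - pd.Timedelta(days=60)

# Keep only the rows that start on or after the cutoff.
recent = df[df['Start_Date'] >= cutoff]
print(recent)
```

If calendar months matter more than a flat 60 days, pd.DateOffset(months=2) can be subtracted from today instead of the Timedelta. Run the de-dupe steps on recent rather than on the full DataFrame.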

