
Pandas: Drop Duplicates In Col[a] Keeping Row Based On Condition On Col[b]

Given the dataframe: df = pd.DataFrame({'col1': ['A', 'A', 'A','B','B'], 'col2': ['type1', 'type2', 'type1', 'type2', 'type1'] , 'hour': ['18:03:30','18:00:48', '18:13:46', '18:11:

Solution 1:

df.drop_duplicates(['col1', 'col2'], keep='last')
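A minimal sketch of a caveat with this approach, using hypothetical data because the question's values are truncated above. `keep='last'` keeps the last occurrence in the current row order, not the row with the latest 'hour'. If the rows are not already in chronological order, it can keep an earlier time:

```python
import pandas as pd

# Hypothetical data (not the question's truncated values):
# the two ('A', 'type1') rows are deliberately NOT in chronological order.
df = pd.DataFrame({
    'col1': ['A', 'A', 'A'],
    'col2': ['type1', 'type1', 'type2'],
    'hour': ['18:13:46', '18:03:30', '18:00:48'],
})

# keep='last' keeps the last row per ('col1', 'col2') pair by position,
# so for ('A', 'type1') it keeps row 1 (18:03:30), the EARLIER time.
result = df.drop_duplicates(['col1', 'col2'], keep='last')
print(result)
```

Here the kept ('A', 'type1') row has hour 18:03:30 even though 18:13:46 is later, which is why the row order matters.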

Solution 2:

Following anky_91's comment, I solved it like this:

df.sort_values('hour').drop_duplicates(['col1', 'col2'], keep='last')

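This sorts by the 'hour' column first, so keep='last' is guaranteed to keep the row with the latest 'hour' for each ('col1', 'col2') pair. Sorting the raw strings works here because the times are zero-padded 'HH:MM:SS' values (assuming they all fall on the same day), so string order matches chronological order.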
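A minimal sketch of the sort-then-drop approach on hypothetical data (again, not the question's truncated values). The trailing `sort_index()` is an optional addition, not part of the original answer; it only restores the original row order for readability:

```python
import pandas as pd

# Hypothetical data with rows deliberately out of chronological order.
df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'B', 'B'],
    'col2': ['type1', 'type1', 'type2', 'type2', 'type2'],
    'hour': ['18:13:46', '18:03:30', '18:00:48', '18:11:00', '18:05:10'],
})

# Zero-padded 'HH:MM:SS' strings sort chronologically (same-day assumption),
# so after sorting, the last row of each ('col1', 'col2') pair is the latest one.
latest = (
    df.sort_values('hour')
      .drop_duplicates(['col1', 'col2'], keep='last')
      .sort_index()  # optional: restore the original row order
)
print(latest)
```

Each ('col1', 'col2') pair now keeps its latest hour: 18:13:46 for ('A', 'type1'), 18:00:48 for ('A', 'type2'), and 18:11:00 for ('B', 'type2'). If the times could span several days or were not zero-padded, converting the column first (for example with `pd.to_datetime` on full timestamps) would be safer than sorting the strings.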
