Pandas - Insert Rows Where Data Is Missing

I have a dataset, here is an example: df = DataFrame({'Seconds_left':[5,10,15,25,30,35,5,10,15,30], 'Team':['ATL','ATL','ATL','ATL','ATL','ATL','SAS','SAS','SAS','SAS'], 'Fouls': [

Solution 1:

Create a MultiIndex and reindex + reset_index:

idx = pd.MultiIndex.from_product([df['Team'].unique(), 
                                  np.arange(5, df['Seconds_left'].max()+1, 5)],
                                 names=['Team', 'Seconds_left'])

df.set_index(['Team', 'Seconds_left']).reindex(idx).reset_index()
Out: 
   Team  Seconds_left  Fouls
0   ATL             5    1.0
1   ATL            10    2.0
2   ATL            15    3.0
3   ATL            20    NaN
4   ATL            25    3.0
5   ATL            30    4.0
6   ATL            35    5.0
7   SAS             5    5.0
8   SAS            10    4.0
9   SAS            15    1.0
10  SAS            20    NaN
11  SAS            25    NaN
12  SAS            30    1.0
13  SAS            35    NaN
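For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same reindex approach. The question's listing is cut off, so the Fouls values below are an assumption read back from the non-NaN rows of the output above. The helper name fill_missing_seconds and its step parameter are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data; Fouls values are assumed, read back from the answer's output.
df = pd.DataFrame({
    'Seconds_left': [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    'Team': ['ATL'] * 6 + ['SAS'] * 4,
    'Fouls': [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})


def fill_missing_seconds(frame, step=5):
    """Return frame with one row per (Team, Seconds_left) on a regular grid."""
    # Every team paired with every multiple of `step` up to the largest value seen.
    grid = pd.MultiIndex.from_product(
        [frame['Team'].unique(),
         np.arange(step, frame['Seconds_left'].max() + 1, step)],
        names=['Team', 'Seconds_left'],
    )
    # reindex keeps existing rows and creates NaN rows for missing grid points.
    return frame.set_index(['Team', 'Seconds_left']).reindex(grid).reset_index()


result = fill_missing_seconds(df)
print(result)
```

Note that reindex needs each (Team, Seconds_left) pair to be unique in the input. If the real data can contain duplicates, they would have to be aggregated first.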

Solution 2:

An approach using groupby and merge:

df_left = pd.DataFrame({'Seconds_left':[5,10,15,20,25,30,35]})

df_out = df.groupby('Team', as_index=False).apply(lambda x: x.merge(df_left, how='right', on='Seconds_left'))

df_out['Team'] = df_out['Team'].ffill()

df_out = df_out.reset_index(drop=True).sort_values(by=['Team','Seconds_left'])

print(df_out)

Output:

    Fouls  Seconds_left Team
0     1.0             5  ATL
1     2.0            10  ATL
2     3.0            15  ATL
6     NaN            20  ATL
3     3.0            25  ATL
4     4.0            30  ATL
5     5.0            35  ATL
7     5.0             5  SAS
8     4.0            10  SAS
9     1.0            15  SAS
11    NaN            20  SAS
12    NaN            25  SAS
10    1.0            30  SAS
13    NaN            35  SAS
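A related merge-based variant, not the answer's own method, avoids groupby/apply and the forward fill. It builds the full Team x Seconds_left grid with a cross join and then left-merges the data onto it. This is only a sketch: it assumes pandas 1.2 or later (for how='cross'), and the Fouls values are again an assumption read back from the output above.

```python
import pandas as pd

# Sample data; Fouls values are assumed, read back from the answer's output.
df = pd.DataFrame({
    'Seconds_left': [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    'Team': ['ATL'] * 6 + ['SAS'] * 4,
    'Fouls': [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})

# Every team paired with every expected time step (cross join).
teams = pd.DataFrame({'Team': df['Team'].unique()})
seconds = pd.DataFrame({'Seconds_left': list(range(5, 40, 5))})
grid = teams.merge(seconds, how='cross')

# Left merge: grid rows with no match in df keep NaN in Fouls.
df_out = grid.merge(df, how='left', on=['Team', 'Seconds_left'])
print(df_out)
```

Because Team comes from the grid rather than from the merged rows, it is never missing, so no fill step is needed. The rows also come out already ordered by team and time.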

Solution 3:

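import pandas as pd
import numpy as np

# Start from an empty frame with two columns
df = pd.DataFrame(columns = ['a', 'b'])

# Append one row at the next integer label; np.nan marks the missing value
df.loc[len(df)] = [1, np.nan]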
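Applied to the question's kind of data, the same row-append idea might look like the sketch below. The small ATL-only frame is made up for illustration, and the approach assumes the index is the default 0..n-1, so that len(df) is an unused label.

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the data with the 20-second row missing.
df = pd.DataFrame({
    'Seconds_left': [5, 10, 15, 25],
    'Team': ['ATL'] * 4,
    'Fouls': [1.0, 2.0, 3.0, 3.0],
})

# Append the missing row; values follow the column order of the frame.
df.loc[len(df)] = [20, 'ATL', np.nan]

# Move the appended row back into time order.
df = df.sort_values(['Team', 'Seconds_left']).reset_index(drop=True)
print(df)
```

This is fine for inserting a handful of known rows by hand. For filling a whole grid of missing time steps, the reindex or merge approaches scale better.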
