Pandas - Insert Rows Where Data Is Missing
I have a dataset, here is an example: df = DataFrame({'Seconds_left':[5,10,15,25,30,35,5,10,15,30], 'Team':['ATL','ATL','ATL','ATL','ATL','ATL','SAS','SAS','SAS','SAS'], 'Fouls': [
Solution 1:
Build a MultiIndex containing every (Team, Seconds_left) combination, reindex against it, then call reset_index:
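import numpy as np
import pandas as pd

# Every team paired with every 5-second step up to the largest observed value
idx = pd.MultiIndex.from_product([df['Team'].unique(),
                                  np.arange(5, df['Seconds_left'].max()+1, 5)],
                                 names=['Team', 'Seconds_left'])
df.set_index(['Team', 'Seconds_left']).reindex(idx).reset_index()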
Out:
   Team  Seconds_left  Fouls
0   ATL             5    1.0
1   ATL            10    2.0
2   ATL            15    3.0
3   ATL            20    NaN
4   ATL            25    3.0
5   ATL            30    4.0
6   ATL            35    5.0
7   SAS             5    5.0
8   SAS            10    4.0
9   SAS            15    1.0
10  SAS            20    NaN
11  SAS            25    NaN
12  SAS            30    1.0
13  SAS            35    NaN
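For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same reindex idea. The Fouls values are inferred from the output shown above (the question's own data is cut off), and the helper name fill_missing_rows and its step parameter are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data; the Fouls values are inferred from the output shown above.
df = pd.DataFrame({
    'Seconds_left': [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    'Team': ['ATL'] * 6 + ['SAS'] * 4,
    'Fouls': [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})


def fill_missing_rows(frame, step=5):
    """Hypothetical helper: one row per (Team, Seconds_left) grid point.

    Assumes the grid starts at `step` and runs to the largest observed
    Seconds_left in increments of `step`.
    """
    idx = pd.MultiIndex.from_product(
        [frame['Team'].unique(),
         np.arange(step, frame['Seconds_left'].max() + 1, step)],
        names=['Team', 'Seconds_left'],
    )
    # Grid points that have no matching row come back with NaN in Fouls.
    return frame.set_index(['Team', 'Seconds_left']).reindex(idx).reset_index()


result = fill_missing_rows(df)
missing = result[result['Fouls'].isna()]
print(result)
```

One thing to watch: as far as I know, reindex refuses to work when the (Team, Seconds_left) pairs are not unique, so duplicates would need to be aggregated or dropped first.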
Solution 2:
An approach using groupby and merge:
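# The full set of time steps every team should have
df_left = pd.DataFrame({'Seconds_left':[5,10,15,20,25,30,35]})
# Right-merge each team's rows onto the full set; missing steps get NaN
df_out = df.groupby('Team', as_index=False).apply(lambda x: x.merge(df_left, how='right', on='Seconds_left'))
# Team is NaN on the inserted rows, so carry the last seen value forward
# (.ffill() replaces the deprecated fillna(method='ffill'))
df_out['Team'] = df_out['Team'].ffill()
df_out = df_out.reset_index(drop=True).sort_values(by=['Team','Seconds_left'])
print(df_out)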
Output:
    Fouls  Seconds_left Team
0     1.0             5  ATL
1     2.0            10  ATL
2     3.0            15  ATL
6     NaN            20  ATL
3     3.0            25  ATL
4     4.0            30  ATL
5     5.0            35  ATL
7     5.0             5  SAS
8     4.0            10  SAS
9     1.0            15  SAS
11    NaN            20  SAS
12    NaN            25  SAS
10    1.0            30  SAS
13    NaN            35  SAS
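A possible variation on the merge idea, in case forward-filling Team feels fragile: depending on the row order the merge returns, a team that lacks its first time step could pick up the previous team's label. The sketch below builds the full Team x Seconds_left grid first and left-merges the data onto it, so Team is never missing. It assumes pandas 1.2 or later for how='cross'; the Fouls values are inferred from the output shown above, and the names teams, seconds and grid are only illustrative.

```python
import pandas as pd

# Sample data; the Fouls values are inferred from the output shown above.
df = pd.DataFrame({
    'Seconds_left': [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    'Team': ['ATL'] * 6 + ['SAS'] * 4,
    'Fouls': [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})

# One frame per dimension of the grid.
teams = pd.DataFrame({'Team': df['Team'].unique()})
seconds = pd.DataFrame({'Seconds_left': range(5, 40, 5)})

# Cartesian product: every team paired with every time step.
grid = teams.merge(seconds, how='cross')

# Left merge keeps every grid row; rows with no match get NaN in Fouls.
df_out = grid.merge(df, how='left', on=['Team', 'Seconds_left'])
print(df_out)
```

Since the left merge follows the order of grid, the result is already ordered by team and time step, so no extra sort or ffill is needed here.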
Solution 3:
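import pandas as pd
import numpy as np
df = pd.DataFrame(columns = ['a', 'b'])
# Append a row at the next integer position; np.nan marks the missing value
# (np.NaN was removed in NumPy 2.0, so use the lowercase spelling)
df.loc[len(df)] = [1,np.nan]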
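Applied to the question's data, the same row-append trick might look like the sketch below. This is only one way to wire it up: the Fouls values are inferred from the outputs shown in the earlier solutions, the grid of 5 to 35 in steps of 5 is assumed, and the frame is assumed to have the default 0..n-1 index so that len(df) is always an unused label.

```python
import itertools

import numpy as np
import pandas as pd

# Sample data; the Fouls values are inferred from the outputs shown earlier.
df = pd.DataFrame({
    'Seconds_left': [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    'Team': ['ATL'] * 6 + ['SAS'] * 4,
    'Fouls': [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})
# Use floats up front so the column can hold NaN without changing dtype.
df['Fouls'] = df['Fouls'].astype(float)

# (Team, Seconds_left) pairs that are already present.
existing = set(zip(df['Team'].tolist(), df['Seconds_left'].tolist()))

# Append one NaN row for each pair missing from the assumed grid.
for team, sec in itertools.product(df['Team'].unique().tolist(), range(5, 40, 5)):
    if (team, sec) not in existing:
        # Values follow the column order: Seconds_left, Team, Fouls.
        df.loc[len(df)] = [sec, team, np.nan]

df = df.sort_values(['Team', 'Seconds_left']).reset_index(drop=True)
print(df)
```

Appending row by row grows the frame one step at a time, so this is probably only reasonable for small datasets; the reindex or merge approaches should scale better.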