
Fill In Dates And Use Previous Values

My pandas dataframe looks like the below:

country  date        gd
US       01-01-2014  2
US       01-01-2015  3
US       01-01-2013  0.4
UK

Solution 1:

In [67]: today = pd.to_datetime(pd.datetime.now()).normalize()

In [68]: l = df.country.nunique()

In [72]: df.append(pd.DataFrame({'country': df.country.unique(), 'date': [today]*l, 'gd': [np.nan]*l})) \
    ...:   .sort_values('date') \
    ...:   .groupby('country') \
    ...:   .resample('1D', on='date') \
    ...:   .mean() \
    ...:   .reset_index() \
    ...:   .ffill()
    ...:
Out[72]:
     country       date   gd
0         UK 2000-01-01  0.7
1         UK 2000-01-02  0.7
2         UK 2000-01-03  0.7
3         UK 2000-01-04  0.7
4         UK 2000-01-05  0.7
5         UK 2000-01-06  0.7
6         UK 2000-01-07  0.7
7         UK 2000-01-08  0.7
8         UK 2000-01-09  0.7
9         UK 2000-01-10  0.7
...      ...        ...  ...
8059      US 2017-07-09  3.0
8060      US 2017-07-10  3.0
8061      US 2017-07-11  3.0
8062      US 2017-07-12  3.0
8063      US 2017-07-13  3.0
8064      US 2017-07-14  3.0
8065      US 2017-07-15  3.0
8066      US 2017-07-16  3.0
8067      US 2017-07-17  3.0
8068      US 2017-07-18  3.0

[8069 rows x 3 columns]
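The session above relies on pd.datetime and DataFrame.append, both of which were removed in pandas 2.0. Below is a minimal sketch of the same approach (one placeholder row per country at the end date, daily resampling, then forward-filling) written with pd.concat. It assumes the sample data from the code in Solution 3, a fixed end date of 2017-07-18 in place of today so the result is reproducible, and it selects gd before resampling and forward-fills within each country instead of across the whole frame; those are adjustments, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data, taken from the DataFrame built in Solution 3 on this page.
df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'UK', 'UK', 'UK'],
    'date': pd.to_datetime(['01-01-2014', '01-01-2015', '01-01-2013',
                            '01-01-2000', '02-01-2001', '01-01-2016']),
    'gd': [2.0, 3.0, 0.4, 0.7, 0.5, 1.0],
})

# Assumption: a fixed end date for reproducibility.
# Use pd.Timestamp.today().normalize() to extend the data up to today instead.
end = pd.Timestamp('2017-07-18')

# One placeholder row per country at the end date, with gd missing,
# so that each country's daily range is stretched out to `end`.
countries = df['country'].unique()
placeholders = pd.DataFrame({
    'country': countries,
    'date': [end] * len(countries),
    'gd': [np.nan] * len(countries),
})

out = (
    pd.concat([df, placeholders], ignore_index=True)
    .sort_values('date')
    .set_index('date')
    .groupby('country')['gd']
    .resample('1D')                    # one bin per calendar day within each country
    .mean()                            # days with no observation become NaN
    .groupby(level='country').ffill()  # carry values forward within each country only
    .reset_index()
)
print(out)
```

With that end date the result has 8069 rows, the same row count as in the session above.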

Solution 2:

s = df.set_index(['country', 'date']).gd
today = pd.datetime.today()

def then2now(x):
    x = x.xs(x.name)
    mn = x.index.min()
    return x.reindex(pd.date_range(mn, today, name='date')).ffill()

s.groupby(level='country').apply(then2now).reset_index()

     country       date   gd
0         UK 2000-01-01  0.7
400       UK 2001-02-04  0.5
800       UK 2002-03-11  0.5
1200      UK 2003-04-15  0.5
1600      UK 2004-05-19  0.5
2000      UK 2005-06-23  0.5
2400      UK 2006-07-28  0.5
2800      UK 2007-09-01  0.5
3200      UK 2008-10-05  0.5
3600      UK 2009-11-09  0.5
4000      UK 2010-12-14  0.5
4400      UK 2012-01-18  0.5
4800      UK 2013-02-21  0.5
5200      UK 2014-03-28  0.5
5600      UK 2015-05-02  0.5
6000      UK 2016-06-05  1.0
6400      UK 2017-07-10  1.0
6800      US 2014-01-27  2.0
7200      US 2015-03-03  3.0
7600      US 2016-04-06  3.0
8000      US 2017-05-11  3.0
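A self-contained sketch of then2now for pandas 2.x, where pd.datetime no longer exists. It assumes the sample data from the code in Solution 3 and a fixed end date of 2017-07-18 standing in for today; the listing above appears to show every 400th row, so the sketch prints result.iloc[::400]. The reset_index(name='gd') call is an added safeguard so the value column is always named gd.

```python
import pandas as pd

# Sample data, taken from the DataFrame built in Solution 3 on this page.
df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'UK', 'UK', 'UK'],
    'date': pd.to_datetime(['01-01-2014', '01-01-2015', '01-01-2013',
                            '01-01-2000', '02-01-2001', '01-01-2016']),
    'gd': [2.0, 3.0, 0.4, 0.7, 0.5, 1.0],
})

# Assumption: fixed end date; pd.Timestamp.today().normalize() gives "today".
today = pd.Timestamp('2017-07-18')

s = df.set_index(['country', 'date'])['gd']

def then2now(x):
    # x holds one country's values; groupby sets x.name to that country's label.
    x = x.xs(x.name)                  # drop the country level, keep the date index
    mn = x.index.min()
    # every calendar day from the first observation up to `today`, then forward-fill
    return x.reindex(pd.date_range(mn, today, name='date')).ffill()

result = s.groupby(level='country').apply(then2now).reset_index(name='gd')
print(result.iloc[::400])
```

Because the fill happens inside then2now, each country is forward-filled on its own and values cannot leak from one country into the next.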

Solution 3:

You could make date the index and then use reindex to expand the dates and ffill to forward-fill the NaNs:

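def expand_dates(grp):
    # grp is one country's gd values, indexed by date
    start = grp.index.min()
    end = today
    index = pd.date_range(start, end, freq='D')  # every calendar day from start to today
    return grp.reindex(index).ffill()             # missing days become NaN, then take the previous value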
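To see what reindex and ffill each contribute, here is a minimal sketch on a single hypothetical Series with only two observations: reindex inserts the missing calendar days as NaN, and ffill then copies the most recent earlier value into them.

```python
import pandas as pd

# Hypothetical series: values observed on 2000-01-01 and 2000-01-04 only.
s = pd.Series([0.7, 0.5], index=pd.to_datetime(['2000-01-01', '2000-01-04']))

daily_index = pd.date_range('2000-01-01', '2000-01-06', freq='D')
expanded = s.reindex(daily_index)  # 6 rows; days without data are NaN
filled = expanded.ffill()          # each NaN takes the most recent earlier value

print(pd.DataFrame({'expanded': expanded, 'filled': filled}))
```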

Use groupby/apply to call expand_dates once for each group and concatenate the results:

df = df.groupby('country')['gd'].apply(expand_dates)

Correction: My first answer forward-filled the entire DataFrame as the last step: df = df.ffill(). That is correct only if each country's first gd value is not NaN. If the starting row(s) for a country have NaN gd value(s), forward-filling the whole frame can overwrite them with values from the previous country. Yikes. The more robust approach is to forward-fill each group separately, as shown by piRSquared. Forward-filling once instead of once per group would gain very little, because the number of ffill calls is bounded by the number of countries (a fairly small number), and guarding against a potential bug matters far more than that limited performance gain.
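The contamination described above can be reproduced with a tiny hypothetical frame in which country B's first gd value is missing; this is a minimal sketch comparing a whole-column ffill with a per-group ffill.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data in which country B's first gd value is missing.
df = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'gd': [1.0, 2.0, np.nan, 5.0],
})

# Forward-filling the whole column lets A's last value leak into B's first row.
whole_frame = df['gd'].ffill()

# Forward-filling per group keeps B's leading NaN, since B has no earlier value.
per_group = df.groupby('country')['gd'].ffill()

print(pd.DataFrame({'whole_frame': whole_frame, 'per_group': per_group}))
```

In the whole-frame version, B's first row silently becomes 2.0 (A's value); the per-group version leaves it as NaN, which is the honest answer when a country has no earlier observation.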


import numpy as np
import pandas as pd

df = pd.DataFrame({'country': ['US', 'US', 'US', 'UK', 'UK', 'UK'],
                   'date': ['01-01-2014', '01-01-2015', '01-01-2013', '01-01-2000', '02-01-2001', '01-01-2016'],
                   'gd': [2.0, 3.0, 0.4, 0.7, 0.5, 1.0]})
df['date'] = pd.to_datetime(df['date'])
today = pd.Timestamp('today')

def expand_dates(grp):
    # grp is one country's gd Series, indexed by date
    start = grp.index.min()
    end = today
    index = pd.date_range(start, end, freq='D')
    # add the missing days (as NaN), then forward-fill within this country only
    return grp.reindex(index).ffill()

df = df.set_index('date')
df = df.groupby('country')['gd'].apply(expand_dates)
print(pd.concat([df.head(), df.tail()]))

yields

country
UK       2000-01-01    0.7
         2000-01-02    0.7
         2000-01-03    0.7
         2000-01-04    0.7
         2000-01-05    0.7
US       2017-07-14    3.0
         2017-07-15    3.0
         2017-07-16    3.0
         2017-07-17    3.0
         2017-07-18    3.0
Name: gd, dtype: float64
