Fill In Dates And Use Previous Values
Solution 1:
In [67]: today = pd.to_datetime(pd.datetime.now()).normalize()

In [68]: l = df.country.nunique()

In [72]: df.append(pd.DataFrame({'country': df.country.unique(), 'date': [today]*l, 'gd': [np.nan]*l})) \
    ...:   .sort_values('date') \
    ...:   .groupby('country') \
    ...:   .resample('1D', on='date') \
    ...:   .mean() \
    ...:   .reset_index() \
    ...:   .ffill()
    ...:
Out[72]:
     country       date   gd
0         UK 2000-01-01  0.7
1         UK 2000-01-02  0.7
2         UK 2000-01-03  0.7
3         UK 2000-01-04  0.7
4         UK 2000-01-05  0.7
5         UK 2000-01-06  0.7
6         UK 2000-01-07  0.7
7         UK 2000-01-08  0.7
8         UK 2000-01-09  0.7
9         UK 2000-01-10  0.7
...      ...        ...  ...
8059      US 2017-07-09  3.0
8060      US 2017-07-10  3.0
8061      US 2017-07-11  3.0
8062      US 2017-07-12  3.0
8063      US 2017-07-13  3.0
8064      US 2017-07-14  3.0
8065      US 2017-07-15  3.0
8066      US 2017-07-16  3.0
8067      US 2017-07-17  3.0
8068      US 2017-07-18  3.0

[8069 rows x 3 columns]
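The session above relies on DataFrame.append and pd.datetime, neither of which is available in pandas 2.0 and later. Here is a minimal sketch of the same idea using pd.concat. The sample frame is the one from Solution 3. The fixed end date and the names sentinel, combined and result are assumptions made for illustration. Unlike the session above, the forward-fill here is done per country rather than once over the whole frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'UK', 'UK', 'UK'],
    'date': pd.to_datetime(['2014-01-01', '2015-01-01', '2013-01-01',
                            '2000-01-01', '2001-02-01', '2016-01-01']),
    'gd': [2.0, 3.0, 0.4, 0.7, 0.5, 1.0],
})

# Assumption: a fixed end date so the result is reproducible;
# use pd.Timestamp.now().normalize() for the real current day.
today = pd.Timestamp('2017-07-18')

# One placeholder row per country, dated today, so that resampling
# extends every country's range up to the end date.
countries = df['country'].unique()
sentinel = pd.DataFrame({'country': countries,
                         'date': [today] * len(countries),
                         'gd': [np.nan] * len(countries)})

combined = pd.concat([df, sentinel], ignore_index=True)
combined['date'] = pd.to_datetime(combined['date'])

result = (
    combined.sort_values('date')
            .set_index('date')
            .groupby('country')['gd']
            .resample('D')             # one row per calendar day, per country
            .mean()                    # days without data become NaN
            .groupby(level='country')
            .ffill()                   # fill within each country only
            .reset_index()
)

print(result.head())
print(result.tail())
```

Selecting the gd column before resampling keeps the string-valued country column out of the mean computation.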
Solution 2:
s = df.set_index(['country', 'date']).gd
today = pd.datetime.today()

def then2now(x):
    x = x.xs(x.name)
    mn = x.index.min()
    return x.reindex(pd.date_range(mn, today, name='date')).ffill()

s.groupby(level='country').apply(then2now).reset_index()

     country       date   gd
0         UK 2000-01-01  0.7
400       UK 2001-02-04  0.5
800       UK 2002-03-11  0.5
1200      UK 2003-04-15  0.5
1600      UK 2004-05-19  0.5
2000      UK 2005-06-23  0.5
2400      UK 2006-07-28  0.5
2800      UK 2007-09-01  0.5
3200      UK 2008-10-05  0.5
3600      UK 2009-11-09  0.5
4000      UK 2010-12-14  0.5
4400      UK 2012-01-18  0.5
4800      UK 2013-02-21  0.5
5200      UK 2014-03-28  0.5
5600      UK 2015-05-02  0.5
6000      UK 2016-06-05  1.0
6400      UK 2017-07-10  1.0
6800      US 2014-01-27  2.0
7200      US 2015-03-03  3.0
7600      US 2016-04-06  3.0
8000      US 2017-05-11  3.0
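As a hedged variant for recent pandas releases, where pd.datetime no longer exists, the same per-country reindex can be sketched with pd.Timestamp and Series.droplevel. The sample frame is the one from Solution 3. The fixed end date and the names full_range and out are assumptions for illustration. Printing every 400th row appears to reproduce the row labels in the listing above.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'UK', 'UK', 'UK'],
    'date': pd.to_datetime(['2014-01-01', '2015-01-01', '2013-01-01',
                            '2000-01-01', '2001-02-01', '2016-01-01']),
    'gd': [2.0, 3.0, 0.4, 0.7, 0.5, 1.0],
})

# Assumption: a fixed end date so the result is reproducible.
today = pd.Timestamp('2017-07-18')

s = df.set_index(['country', 'date'])['gd']

def then2now(x):
    # Drop the country level so the group is indexed by date alone.
    x = x.droplevel('country')
    full_range = pd.date_range(x.index.min(), today, name='date')
    # Expand to daily rows, then carry the last known value forward.
    return x.reindex(full_range).ffill()

out = s.groupby(level='country').apply(then2now).reset_index()
print(out.iloc[::400])
```

Because the fill happens inside then2now, each country is filled only from its own earlier observations.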
Solution 3:
You could make date the index and then use reindex to expand the dates and ffill to forward-fill the NaNs:
def expand_dates(grp):
    start = grp.index.min()
    end = today
    index = pd.date_range(start, end, freq='D')
    return grp.reindex(index).ffill()
Use groupby/apply to call expand_dates once for each group and concatenate the results:
df = df.groupby('country')['gd'].apply(expand_dates)
Correction: My first answer forward-filled the entire DataFrame as the last step: df = df.ffill(). That is correct only if each country's first gd value is not NaN. If the starting row(s) for a certain country have NaN gd value(s), then forward-filling may contaminate those gd values with values from another country. Yikes. The more robust and correct method is to forward-fill once for each group, as shown by piRSquared. Any performance gain from forward-filling once instead of many times on smaller DataFrames would be minor, since the number of ffill calls is limited by the number of countries (a pretty low number), and safeguarding against a potential bug is far more important than that limited gain.
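To make the contamination concrete, here is a small sketch with made-up values (the frame and the names global_fill and group_fill are assumptions for illustration). The US block starts with a NaN, so a frame-wide ffill pulls in the last UK value, while a per-group ffill leaves it missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first US row has no gd value.
df = pd.DataFrame({
    'country': ['UK', 'UK', 'US', 'US'],
    'date': pd.to_datetime(['2000-01-01', '2000-01-02',
                            '2000-01-01', '2000-01-02']),
    'gd': [0.7, 0.5, np.nan, 3.0],
})

# Frame-wide fill: the US NaN is filled from the preceding UK row.
global_fill = df['gd'].ffill()

# Per-group fill: values never cross a country boundary.
group_fill = df.groupby('country')['gd'].ffill()

print(pd.DataFrame({'country': df['country'],
                    'global_fill': global_fill,
                    'group_fill': group_fill}))
```

In the printed frame, the first US row shows 0.5 under global_fill (a UK value) and NaN under group_fill.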
import numpy as np
import pandas as pd
df = pd.DataFrame({'country': ['US', 'US', 'US', 'UK', 'UK', 'UK'], 'date': ['01-01-2014', '01-01-2015', '01-01-2013', '01-01-2000', '02-01-2001', '01-01-2016'], 'gd': [2.0, 3.0, 0.4, 0.7, 0.5, 1.0]})
df['date'] = pd.to_datetime(df['date'])
today = pd.Timestamp('today')
def expand_dates(grp):
    start = grp.index.min()
    end = today
    index = pd.date_range(start, end, freq='D')
    return grp.reindex(index).ffill()
df = df.set_index('date')
df = df.groupby('country')['gd'].apply(expand_dates)
print(pd.concat([df.head(), df.tail()]))
yields
country
UK       2000-01-01    0.7
         2000-01-02    0.7
         2000-01-03    0.7
         2000-01-04    0.7
         2000-01-05    0.7
US       2017-07-14    3.0
         2017-07-15    3.0
         2017-07-16    3.0
         2017-07-17    3.0
         2017-07-18    3.0
Name: gd, dtype: float64
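The result printed above is a Series whose second index level (the dates) has no name. If a flat DataFrame with country, date and gd columns is wanted, one possible follow-up is to name the levels and reset the index. The sketch below uses a small stand-in Series with made-up values rather than the full result; the names expanded and tidy are assumptions.

```python
import pandas as pd

# Stand-in for the expanded Series: a (country, date) MultiIndex
# whose date level is unnamed, as in the output above.
index = pd.MultiIndex.from_product(
    [['UK', 'US'], pd.date_range('2017-07-17', periods=2, freq='D')],
    names=['country', None],
)
expanded = pd.Series([1.0, 1.0, 3.0, 3.0], index=index, name='gd')

# Name both levels, then move them into ordinary columns.
tidy = expanded.rename_axis(['country', 'date']).reset_index()
print(tidy)
```

Alternatively, passing name='date' to pd.date_range inside expand_dates, as Solution 2 does, should give the level its name up front.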