Pandas Compute Hourly Average And Set At Middle Of Interval
I want to compute the hourly mean for a time series of wind speed and direction, but I want to set the time at the half hour. So, the average for values from 14:00 to 15:00 will be at 14:30.
Solution 1:
So the easiest way is to resample and then use linear interpolation:
In [21]: rng = pd.date_range('1/1/2011', periods=72, freq='H')
In [22]: ts = pd.Series(np.random.randn(len(rng)), index=rng)
In [23]: ts.head()
Out[23]:
2011-01-01 00:00:00    0.796704
2011-01-01 01:00:00   -1.153179
2011-01-01 02:00:00   -1.919475
2011-01-01 03:00:00    0.082413
2011-01-01 04:00:00   -0.397434
Freq: H, dtype: float64
In [24]: ts2 = ts.resample('30T').interpolate()
In [25]: ts2.head()
Out[25]:
2011-01-01 00:00:00    0.796704
2011-01-01 00:30:00   -0.178237
2011-01-01 01:00:00   -1.153179
2011-01-01 01:30:00   -1.536327
2011-01-01 02:00:00   -1.919475
Freq: 30T, dtype: float64
I believe this is what you need.
Edit to add clarifying example
Perhaps it's easier to see what's going on without random data:
In [29]: ts.head()
Out[29]:
2011-01-01 00:00:00    0
2011-01-01 01:00:00    1
2011-01-01 02:00:00    2
2011-01-01 03:00:00    3
2011-01-01 04:00:00    4
Freq: H, dtype: int64
In [30]: ts2 = ts.resample('30T').interpolate()
In [31]: ts2.head()
Out[31]:
2011-01-01 00:00:00    0.0
2011-01-01 00:30:00    0.5
2011-01-01 01:00:00    1.0
2011-01-01 01:30:00    1.5
2011-01-01 02:00:00    2.0
Freq: 30T, dtype: float64
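For anyone who wants to paste this into a script rather than an IPython session, here is a minimal sketch of the same idea. It assumes the input is already hourly, uses the '30min' alias in place of '30T' (the 'T' alias is deprecated in recent pandas releases), and the name half_hour is only for illustration.

```python
import numpy as np
import pandas as pd

# Hourly series 0, 1, 2, ... so the interpolated values are easy to check.
rng = pd.date_range("2011-01-01", periods=72, freq="60min")
ts = pd.Series(np.arange(len(rng)), index=rng)

# Upsample to 30 minutes and fill the new timestamps by linear interpolation.
ts2 = ts.resample("30min").interpolate()

# Keep only the timestamps that fall on the half hour (hh:30).
half_hour = ts2[ts2.index.minute == 30]
print(half_hour.head())
```

Note that each hh:30 value here is the midpoint between two neighbouring hourly samples, so this only stands in for an hourly mean when the data has one sample per hour.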
Solution 2:
This post is already several years old and uses an API that has long been deprecated. Modern pandas provides the resample method, which is easier to use than pandas.TimeGrouper. However, resample can only label each interval by its left or right edge; labelling an interval by its midpoint is not readily available.
It is not hard to do, though.
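To make the left/right limitation concrete, here is a small sketch with made-up data (the names idx, s, left and right are only for illustration). The label argument of resample accepts 'left' or 'right', so the same hourly means end up stamped at either edge of the hour, never at hh:30.

```python
import pandas as pd

# Two hours of 10-minute samples with values 0..11.
idx = pd.date_range("2019-11-20", periods=12, freq="10min")
s = pd.Series(range(12), index=idx)

# Same bins and same means, only the timestamp attached to each bin differs.
left = s.resample("60min", label="left").mean()
right = s.resample("60min", label="right").mean()

print(left)   # labelled 00:00 and 01:00
print(right)  # labelled 01:00 and 02:00
```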
First we fill in the data that we want to resample:
import datetime
import pandas as pd

# 100 samples at 10-minute spacing, starting at 2019-11-20 00:00
ts_g = [datetime.datetime.fromisoformat('2019-11-20') +
        datetime.timedelta(minutes=10*x) for x in range(0, 100)]
dg = {'ws': range(0, 100), 'wdir': range(0, 100)}
df_g = pd.DataFrame(data=dg, index=ts_g, columns=['ws', 'wdir'])
df_g.head()
The output would be:
                     ws  wdir
2019-11-20 00:00:00   0     0
2019-11-20 00:10:00   1     1
2019-11-20 00:20:00   2     2
2019-11-20 00:30:00   3     3
2019-11-20 00:40:00   4     4
Now we first resample to 30-minute intervals:
grouped_g = df_g.resample('30min')
halfhourly_ws_g = grouped_g['ws'].mean()
halfhourly_ws_g.head()
The output would be:
2019-11-20 00:00:00     1
2019-11-20 00:30:00     4
2019-11-20 01:00:00     7
2019-11-20 01:30:00    10
2019-11-20 02:00:00    13
Freq: 30T, Name: ws, dtype: int64
Finally the trick to get the centered intervals:
hourly_ws_g = halfhourly_ws_g.add(halfhourly_ws_g.shift(1)).div(2)\
.loc[halfhourly_ws_g.index.minute % 60 == 30]
hourly_ws_g.head()
This would produce the expected output:
2019-11-20 00:30:00     2.5
2019-11-20 01:30:00     8.5
2019-11-20 02:30:00    14.5
2019-11-20 03:30:00    20.5
2019-11-20 04:30:00    26.5
Freq: 60T, Name: ws, dtype: float64
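As a possible alternative (not part of the answer above), the same labels can be obtained by taking the ordinary hourly mean and then moving the index forward by half an hour. This is a sketch under the assumption that the default left-labelled hourly bins are the ones you want; the names df, hourly and centered are only for illustration.

```python
import pandas as pd

# Same test data as above: 100 samples at 10-minute spacing.
idx = pd.date_range("2019-11-20", periods=100, freq="10min")
df = pd.DataFrame({"ws": range(100), "wdir": range(100)}, index=idx)

# Hourly mean, labelled by the left edge of each bin (the default).
hourly = df["ws"].resample("60min").mean()

# Move each label from the left edge to the midpoint of its hour.
centered = hourly.copy()
centered.index = hourly.index + pd.Timedelta(minutes=30)

print(centered.head())
```

The first five values match the output above. One difference worth knowing: averaging two half-hour means weights both halves equally, so the two approaches can disagree when the halves contain different numbers of samples, as happens in the incomplete last hour of this test data.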