How To Convert Calendar Year To Water Year In Pandas
This question has been solved with R, but I haven't seen useful examples with Python. I would like to learn how to convert calendar-year (1/1/1990 to 12/31/2010) discharge data to water-year data.
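In the USGS convention, a water year runs from October 1 through September 30 and is named for the calendar year in which it ends. A minimal scalar sketch of that rule, using only the standard library (the function name `water_year` is hypothetical, chosen for illustration):

```python
from datetime import date


def water_year(d: date) -> int:
    """Return the water year (Oct 1 to Sep 30) that contains date d."""
    # October, November and December belong to the water year that ends next calendar year
    return d.year + 1 if d.month >= 10 else d.year


print(water_year(date(1990, 1, 1)))    # 1990
print(water_year(date(1990, 10, 1)))   # 1991
print(water_year(date(2010, 12, 31)))  # 2011
```

So the 1/1/1990 to 12/31/2010 range in the question spans water years 1990 through 2011, with the first and last water years only partially covered.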
Solution 1:
Using data from USGS 03335500 WABASH RIVER AT LAFAYETTE, IN
- Dates: 2001-10-01 to 2017-09-30
- The 'datetime' column must be set to a datetime dtype, either by importing the data with parse_dates or by using pd.to_datetime() after importing the data.
- Use pandas.Series.where to determine the water year.
  - Use the .dt accessor to extract the month and year numbers.
  - If the month number is less than 10, the water year is .dt.year; otherwise, the water year is .dt.year + 1.
  - This is about 13 times faster than the .apply function from the other answer, for the 282757 rows in this DataFrame.
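If the dates were read as plain strings (no parse_dates), pd.to_datetime() converts the column first so that the .dt accessor works. A self-contained sketch on three made-up timestamps around the October 1 boundary (illustrative values, not USGS data):

```python
import pandas as pd

# Illustrative timestamps stored as strings, as they would be without parse_dates
df = pd.DataFrame({"datetime": ["2001-09-30 23:00", "2001-10-01 00:00", "2002-01-15 12:00"]})

# Convert to datetime64 so the .dt accessor becomes available
df["datetime"] = pd.to_datetime(df["datetime"])

# Months 1-9 keep the calendar year; months 10-12 roll forward one year
df["water_year"] = df["datetime"].dt.year.where(
    df["datetime"].dt.month < 10, df["datetime"].dt.year + 1
)
print(df)
```

The last hour of September 2001 stays in water year 2001, while October 2001 and January 2002 both land in water year 2002.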
import pandas as pd
# Load the data
df = pd.read_csv('WabashRiver_Flow.csv', parse_dates=['datetime'])
# drop na values
df = df.dropna()
# determine the water year
df['water_year'] = df.datetime.dt.year.where(df.datetime.dt.month < 10, df.datetime.dt.year + 1)
# display(df.head())
  agency_cd  site_no            datetime tz_cd  discharge_cfps  water_year
0      USGS  3335500 2001-10-01 00:00:00   EST            2610        2002
1      USGS  3335500 2001-10-01 01:00:00   EST            2610        2002
2      USGS  3335500 2001-10-01 02:00:00   EST            2610        2002
3      USGS  3335500 2001-10-01 03:00:00   EST            2630        2002
4      USGS  3335500 2001-10-01 04:00:00   EST            2630        2002
Calculate the mean discharge rate by water year
annual_mean_discharge_rate = df.groupby('water_year')[['discharge_cfps']].mean()
# display(annual_mean_discharge_rate)
            discharge_cfps
water_year
2002           9379.829589
2003           8678.468324
2004           8562.505005
2005           8928.776256
2006           6710.805312
2007          10331.564789
2008          10626.336623
2009           8972.046607
2010           5298.569557
2011          10519.540869
2012           9013.624424
2013           9007.924205
2014           9079.561658
2015          12267.393776
2016           6445.875810
2017          10240.721464
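To see how the grouping splits readings at the water-year boundary, here is a sketch on four made-up hourly values (illustrative only), two on each side of 2001-10-01; `annual_mean` is a hypothetical name:

```python
import pandas as pd

# Illustrative values only: two readings before and two after 1 Oct 2001
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2001-09-30 22:00", "2001-09-30 23:00",
        "2001-10-01 00:00", "2001-10-01 01:00",
    ]),
    "discharge_cfps": [100.0, 200.0, 300.0, 500.0],
})
df["water_year"] = df.datetime.dt.year.where(df.datetime.dt.month < 10, df.datetime.dt.year + 1)

# Each water year's mean uses only the readings that fall inside that water year
annual_mean = df.groupby("water_year")[["discharge_cfps"]].mean()
print(annual_mean)
```

Water year 2001 averages the two September readings (150.0) and water year 2002 averages the two October readings (400.0).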
annual_mean_discharge_rate.plot.bar(figsize=(8, 6), xlabel='Water Year', ylabel='Discharge (cubic feet / sec)', legend=False)
%%timeit comparison
- pandas.Series.where compared to pandas.Series.apply with np.where, and to the function from the other answer.
- Series.where is vectorized, while .apply is not: .apply calls a Python function once per element.
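np.where is itself vectorized; the slow part in the second timing below is calling it once per element through .apply. A sketch of two other whole-column forms that give the same result as Series.where on a few sample dates (not timed here; `dates`, `wy_np` and `wy_bool` are illustrative names):

```python
import numpy as np
import pandas as pd

dates = pd.Series(pd.to_datetime(["2001-09-30", "2001-10-01", "2002-12-31", "2003-01-01"]))

# Reference: the Series.where approach from Solution 1
wy_where = dates.dt.year.where(dates.dt.month < 10, dates.dt.year + 1)

# np.where applied once to whole arrays, not once per row
wy_np = pd.Series(np.where(dates.dt.month >= 10, dates.dt.year + 1, dates.dt.year))

# Booleans converted to 0/1, so October-December get +1
wy_bool = dates.dt.year + (dates.dt.month >= 10).astype(int)

print(wy_where.tolist(), wy_np.tolist(), wy_bool.tolist())
```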
import numpy as np

# function from other answer; updated because pd.datetime is deprecated
def assign_wy(row):
    # row is a single Timestamp when called through Series.apply
    if row.month >= 10:
        return row.year + 1
    else:
        return row.year
%%timeit
df.datetime.dt.year.where(df.datetime.dt.month < 10, df.datetime.dt.year + 1)
[out]:
66.9 ms ± 1.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
df.datetime.apply(lambda v: np.where(v.month >= 10, v.year + 1, v.year))
[out]:
1.38 s ± 23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
df.datetime.apply(lambda x: assign_wy(x))
[out]:
861 ms ± 9.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
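%%timeit is an IPython/Jupyter cell magic. Outside a notebook, the standard-library timeit module can run a similar comparison. A sketch on a synthetic daily series (the size and date range are arbitrary stand-ins for the USGS record, `vectorized` and `per_row` are illustrative names, and timings will vary by machine):

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic daily timestamps standing in for the real record
s = pd.Series(pd.date_range("2001-10-01", periods=20_000, freq="D"))


def vectorized():
    """Series.where approach: operates on whole columns at once."""
    return s.dt.year.where(s.dt.month < 10, s.dt.year + 1)


def per_row():
    """.apply approach: calls np.where once per element."""
    return s.apply(lambda v: np.where(v.month >= 10, v.year + 1, v.year))


for fn in (vectorized, per_row):
    # Best of 3 repeats of 3 calls each, reported per call
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{fn.__name__}: {best * 1000:.1f} ms per call")
```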
Solution 2:
You could use apply and write your own function to create a new column, WY.
If you have df:
                 Date  Discharge
0 2011-10-01 00:00:00        0.0
1 2011-10-01 01:00:00        0.0
2 2011-10-01 02:00:00        0.0
3 2011-10-01 03:00:00        0.0
4 2011-10-01 04:00:00        0.0
Then:
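import pandas as pd

# Assumes df['Date'] is already a datetime64 column
# (pd.datetime is deprecated, so the year is read directly from the Timestamp)
def assign_wy(row):
    # October through December belong to the next water year
    if row.Date.month >= 10:
        return row.Date.year + 1
    else:
        return row.Date.year

df['WY'] = df.apply(lambda x: assign_wy(x), axis=1)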
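A self-contained check of the row-wise approach on a few hourly rows around 2011-10-01 (discharge set to 0.0, like the sample above), alongside the vectorized Series.where form from Solution 1; the WY_vec column is a hypothetical name added only for comparison:

```python
import pandas as pd

# Sample rows straddling the October 1 boundary
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2011-09-30 22:00", "2011-09-30 23:00",
        "2011-10-01 00:00", "2011-10-01 01:00", "2011-10-01 02:00",
    ]),
    "Discharge": 0.0,
})


def assign_wy(row):
    # Row-wise rule: October-December count toward the next water year
    return row.Date.year + 1 if row.Date.month >= 10 else row.Date.year


df["WY"] = df.apply(assign_wy, axis=1)

# Vectorized equivalent, for comparison
df["WY_vec"] = df.Date.dt.year.where(df.Date.dt.month < 10, df.Date.dt.year + 1)
print(df)
```

Both columns agree; the row-wise version is easier to extend with custom logic, while the vectorized one scales better to large frames.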