
How To Convert Calendar Year To Water Year In Pandas

This question has been solved with R, but I haven't seen useful examples with Python. I would like to learn how to convert calendar year (1/1/1990 to 12/31/2010) discharge data to

Solution 1:

  • Using data from USGS 03335500 WABASH RIVER AT LAFAYETTE, IN

    • Dates: 2001-10-01 - 2017-09-30
  • The 'datetime' column must have a datetime dtype, either by importing the data with parse_dates or by calling pd.to_datetime() after importing.

  • Use pandas.Series.where to determine the water year.

    • Use the .dt accessor to extract month and year numbers.
    • If the month number is less than 10, the water year is the .dt.year, otherwise, the water year is the .dt.year + 1
    • About 13 times faster than the .apply function from Solution 2, for the 282757 rows in this DataFrame.
import pandas as pd

# Load the data
df = pd.read_csv('WabashRiver_Flow.csv', parse_dates=['datetime'])

# drop na values
df = df.dropna()

# determine the water year
df['water_year'] = df.datetime.dt.year.where(df.datetime.dt.month < 10, df.datetime.dt.year + 1)

# display(df.head())
  agency_cd  site_no            datetime tz_cd  discharge_cfps  water_year
0      USGS  3335500 2001-10-01 00:00:00   EST            2610        2002
1      USGS  3335500 2001-10-01 01:00:00   EST            2610        2002
2      USGS  3335500 2001-10-01 02:00:00   EST            2610        2002
3      USGS  3335500 2001-10-01 03:00:00   EST            2630        2002
4      USGS  3335500 2001-10-01 04:00:00   EST            2630        2002
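For readers without the CSV file, the same rule can be tried on a handful of hand-made timestamps. This is only a sketch: the `demo` frame and its dates are made up for illustration and are not USGS data. They are chosen to straddle the 1 October boundary.

```python
import pandas as pd

# Synthetic timestamps around the water-year boundary (illustrative, not USGS data)
demo = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2001-09-30 23:00",
        "2001-10-01 00:00",
        "2001-12-31 23:00",
        "2002-01-01 00:00",
        "2002-09-30 23:00",
        "2002-10-01 00:00",
    ]),
})

year = demo["datetime"].dt.year
month = demo["datetime"].dt.month

# Keep the calendar year where month < 10; otherwise use the following year
demo["water_year"] = year.where(month < 10, year + 1)

print(demo)
```

The last hour of September stays in the current water year, and midnight on 1 October rolls over to the next one.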

Calculate the mean discharge rate by water year

annual_mean_discharge_rate = df.groupby('water_year')[['discharge_cfps']].mean()

# display(annual_mean_discharge_rate)
            discharge_cfps
water_year                
2002           9379.829589
2003           8678.468324
2004           8562.505005
2005           8928.776256
2006           6710.805312
2007          10331.564789
2008          10626.336623
2009           8972.046607
2010           5298.569557
2011          10519.540869
2012           9013.624424
2013           9007.924205
2014           9079.561658
2015          12267.393776
2016           6445.875810
2017          10240.721464
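The grouping step can also be checked on a tiny made-up frame. In the sketch below the values are invented, and the extra `count` column is an addition not in the answer above: it may help spot water years that are only partly covered by the record.

```python
import pandas as pd

# Made-up discharge values; three fall in water year 2002, two in 2003
demo = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2001-10-01", "2002-03-15", "2002-09-30", "2002-10-01", "2003-06-01",
    ]),
    "discharge_cfps": [100.0, 200.0, 300.0, 400.0, 600.0],
})

year = demo["datetime"].dt.year
demo["water_year"] = year.where(demo["datetime"].dt.month < 10, year + 1)

# Mean discharge per water year, plus the number of observations behind each mean
summary = demo.groupby("water_year")["discharge_cfps"].agg(["mean", "count"])

print(summary)
```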

annual_mean_discharge_rate.plot.bar(figsize=(8, 6), xlabel='Water Year', ylabel='Discharge (cubic feet / sec)', legend=False)

[Figure: bar chart of mean discharge (cubic feet / sec) by water year]


%%timeit comparison

import numpy as np

# function from other answer; updated because pd.datetime is deprecated
def assign_wy(row):
    if row.month >= 10:
        return row.year + 1
    else:
        return row.year


%%timeit
df.datetime.dt.year.where(df.datetime.dt.month < 10, df.datetime.dt.year + 1)
[out]:
66.9 ms ± 1.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
df.datetime.apply(lambda v: np.where(v.month >= 10, v.year + 1, v.year))
[out]:
1.38 s ± 23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
df.datetime.apply(lambda x: assign_wy(x))
[out]:
861 ms ± 9.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
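Outside IPython, where the %%timeit magic is not available, a similar comparison can be sketched with the standard-library timeit module. The function names `vectorized` and `with_apply` and the synthetic daily date range are assumptions made for this example. Timings depend on the machine and on the pandas version, so none are quoted here.

```python
import timeit

import pandas as pd

# Synthetic daily timestamps covering the same span as the answer's data
dates = pd.Series(pd.date_range("2001-10-01", "2017-09-30", freq="D"))


def vectorized(s):
    # Series.where approach from Solution 1
    year = s.dt.year
    return year.where(s.dt.month < 10, year + 1)


def with_apply(s):
    # Element-wise approach, one Python call per timestamp
    return s.apply(lambda ts: ts.year + 1 if ts.month >= 10 else ts.year)


# Both approaches should agree before their speed is compared
assert (vectorized(dates) == with_apply(dates)).all()

t_vec = timeit.timeit(lambda: vectorized(dates), number=5)
t_apply = timeit.timeit(lambda: with_apply(dates), number=5)

print(f"where: {t_vec:.4f} s, apply: {t_apply:.4f} s (5 runs each)")
```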

Solution 2:

You could use apply and write your own function to create a new column WY:

If you have a df:

                 Date  Discharge
0 2011-10-01 00:00:00        0.0
1 2011-10-01 01:00:00        0.0
2 2011-10-01 02:00:00        0.0
3 2011-10-01 03:00:00        0.0
4 2011-10-01 04:00:00        0.0

Then:

import pandas as pd

def assign_wy(row):
    # pd.datetime is deprecated in newer pandas; the year arithmetic does not need it
    if row.Date.month >= 10:
        return row.Date.year + 1
    else:
        return row.Date.year

df['WY'] = df.apply(lambda x: assign_wy(x), axis=1)
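To run this approach end to end without a data file, a frame like the sample above can be built by hand. This is a sketch only: the final row (2012-10-01) is not part of the sample and is added here so that the output contains more than one water year.

```python
import pandas as pd

# Sample rows from above, plus one extra row (an addition) in the next water year
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2011-10-01 00:00:00",
        "2011-10-01 01:00:00",
        "2011-10-01 02:00:00",
        "2011-10-01 03:00:00",
        "2011-10-01 04:00:00",
        "2012-10-01 00:00:00",
    ]),
    "Discharge": [0.0] * 6,
})


def assign_wy(row):
    # October through December belong to the following water year
    if row.Date.month >= 10:
        return row.Date.year + 1
    return row.Date.year


# Row-wise apply: each row is passed to assign_wy as a Series
df["WY"] = df.apply(assign_wy, axis=1)

print(df)
```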
