Consider the following pandas Series with a DatetimeIndex of daily values (using day-of-year as an example):

import pandas as pd

dti = pd.date_range("2017-11-02", "2019-05-21", freq="D", inclusive="both")
s = pd.Series(dti.dayofyear.astype(float), index=dti)

print(s)
# 2017-11-02    306.0
# 2017-11-03    307.0
# 2017-11-04    308.0
# 2017-11-05    309.0
# 2017-11-06    310.0
#               ...
# 2019-05-17    137.0
# 2019-05-18    138.0
# 2019-05-19    139.0
# 2019-05-20    140.0
# 2019-05-21    141.0
# Freq: D, Length: 566, dtype: float64

How can I resample this to 3-month mean values, with the bins aligned so that one of them starts on 1 January? My best attempt does not align to this expected origin, i.e. it is off by one month:

r = s.resample("3MS", origin=pd.Timestamp("2018-01-01")).mean()

print(r)
# 2017-11-01    226.659341
# 2018-02-01     76.000000
# 2018-05-01    166.500000
# 2018-08-01    258.500000
# 2018-11-01    227.510870
# 2019-02-01     76.000000
# 2019-05-01    131.000000
# Freq: 3MS, dtype: float64

My workaround is to create a new Series that starts and ends on the desired resample boundaries, padding the non-existent data with NaN values:

dti2 = pd.date_range("2017-10-01", "2019-06-01", freq="D", inclusive="both")
# explicit float dtype, since recent pandas versions default an empty Series to object
s2 = pd.Series(index=dti2, dtype=float)
s2[s.index] = s.values
r2 = s2.resample("3MS").mean()

print(r2)
# 2017-10-01    335.5
# 2018-01-01     45.5
# 2018-04-01    136.0
# 2018-07-01    227.5
# 2018-10-01    319.5
# 2019-01-01     45.5
# 2019-04-01    116.0
# Freq: 3MS, dtype: float64

but this is not intuitive.

2 Answers

You don’t need to pad the series manually.

To get 3-month means aligned to January 1, use the quarterly start frequency with January as the anchor: `QS-JAN`. This will automatically create quarters starting in Jan, Apr, Jul, and Oct. For example:

r = s.resample("QS-JAN").mean()
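As a quick check of the alignment, here is a minimal sketch that rebuilds the Series from the question and inspects the bin labels produced by `QS-JAN`. It only uses the data defined in the question; the variable names are illustrative.

```python
import pandas as pd

# Same daily day-of-year Series as in the question
dti = pd.date_range("2017-11-02", "2019-05-21", freq="D")
s = pd.Series(dti.dayofyear.astype(float), index=dti)

r = s.resample("QS-JAN").mean()

# Every bin label should fall on the first day of Jan, Apr, Jul or Oct
months = sorted(set(r.index.month.tolist()))
print(months)

# The first bin is labelled with the quarter start that precedes the first data point
print(r.index[0])
```

No padding is needed: the partial quarters at both ends are averaged over whatever days are present.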

1 Comment

Thanks for enlightening me to the concept of anchored offsets!

You could use 'QS' (quarter start), which pandas reports as 'QS-JAN' in the output below:

import pandas as pd

dti = pd.date_range('2017-11-02', '2019-05-21', freq='D', inclusive='both')
s = pd.Series(dti.dayofyear.astype(float), index=dti)
r = s.resample('QS').mean()
print(r)

Output:

2017-10-01    335.5
2018-01-01     45.5
2018-04-01    136.0
2018-07-01    227.5
2018-10-01    319.5
2019-01-01     45.5
2019-04-01    116.0
Freq: QS-JAN, dtype: float64
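To illustrate how the anchor month controls the bin edges, the following sketch compares `QS` with `QS-JAN` and then tries a different anchor, `QS-FEB`, chosen here purely as an example. It assumes the same Series as in the question.

```python
import pandas as pd

# Same daily day-of-year Series as in the question
dti = pd.date_range("2017-11-02", "2019-05-21", freq="D")
s = pd.Series(dti.dayofyear.astype(float), index=dti)

# "QS" and "QS-JAN" should give identical bins and means
same = s.resample("QS").mean().equals(s.resample("QS-JAN").mean())
print(same)

# A different anchor month shifts every bin edge:
# "QS-FEB" starts quarters in Feb, May, Aug and Nov
r_feb = s.resample("QS-FEB").mean()
feb_months = sorted(set(r_feb.index.month.tolist()))
print(feb_months)
```

With this data, the `QS-FEB` bins start on 2017-11-01, which is the same alignment the question's `3MS` attempt produced.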
