Detecting seasonality ¶
Sign up to the DEA Sandbox to run this notebook interactively from a browser
Compatibility: Notebook currently compatible with both the
Products used: DEA Waterbodies
We often want to analyse a time series to detect long-term trends or events. Time series that span multiple months will have seasonal effects: for example, northern Australia is much wetter in summer due to monsoons. This seasonality may impact our ability to detect the trends or events we are interested in. This notebook provides a few different ways to detect seasonality and check whether deseasonalised data have been correctly deseasonalised. We will look at a seasonal waterbody time series as an example.
To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.
Load key Python packages and supporting functions for the analysis.
%matplotlib inline import calendar import matplotlib.pyplot as plt import matplotlib.cm import numpy as np import pandas as pd import scipy.stats import statsmodels.tsa.seasonal as sm_seasonal import statsmodels.tsa.stattools as tsa_stats import shapely import sys sys.path.insert(1, '../Tools/') from dea_tools.waterbodies import get_waterbody, get_time_series
/env/lib/python3.8/site-packages/geopandas/_compat.py:106: UserWarning: The Shapely GEOS version (3.8.0-CAPI-1.13.1 ) is incompatible with the GEOS version PyGEOS was compiled with (3.9.1-CAPI-1.14.2). Conversions between both will be slow. warnings.warn(
Choose a waterbody to analyse:
geohash = "r1tw92u7y" # Lake Tyrrell
Load the waterbody time series¶
ts = get_time_series(geohash)
<matplotlib.axes._subplots.AxesSubplot at 0x7f5265ce9bb0>
Then resample the time series to weekly and interpolate. Having a consistent gap between observations makes analysis of time series much easier. In particular, this notebook won’t work without an evenly spaced time series.
Here, we interpolate with
pandas since our time series is in a
xarray has a similar interpolate method that can interpolate over
ts = ts.resample("1W").mean().interpolate(method="time").pc_wet
assert not ts.isnull().any()
Deseasonalise it for comparison.
decomposition = sm_seasonal.seasonal_decompose(ts + 1, model="multiplicative")
ts_deseasonal = decomposition.trend + decomposition.resid - 1
ts_deseasonal = ts_deseasonal[pd.notnull(ts_deseasonal)]
The autocorrelation function (ACF) shows how correlated lagged data are. Seasonal data are highly correlated with a lag of one year. Let’s compute the ACF for both our waterbody time series and the deseasonalised waterbody time series.
acf = tsa_stats.acf(ts, nlags=52 * 3, fft=True) acf_deseasonal = tsa_stats.acf(ts_deseasonal, nlags=52 * 3, fft=True)
plt.plot(acf, label="seasonal") plt.plot(acf_deseasonal, label="deseasonalised") plt.xlabel("Lag (weeks)") plt.ylabel("ACF") for i in range(1, 3): plt.axvline(52 * i, c="grey", linestyle="--") plt.legend();
The seasonal peaks are clearly visible at the 52 and 104 week marks for the seasonal data, while no such peaks can be seen for the deseasonalised data.
A date can be thought of as an angle (the angle of the Earth’s position around the sun). In this way we can project the time series into polar coordinates. Non-seasonal data will circle around the origin and be a perfect circle on average; seasonal data will be offset from the origin. Long-term trends may show as spirals.
time_angle = 2 * np.pi * ts.index.dayofyear / 365.25
ax = plt.subplot(projection="polar") ax.plot(time_angle, ts) ax.set_xticks(np.linspace(0, 2 * np.pi, 12, endpoint=False)) ax.set_xticklabels(list(calendar.month_abbr)[1:]);
We can also bin the data by angle to make the circle easier to see.
n_bins = 52 binned_mean = scipy.stats.binned_statistic(time_angle, ts, bins=n_bins) binned_stdev = scipy.stats.binned_statistic( time_angle, ts, bins=n_bins, statistic="std" )
ax = plt.subplot(projection="polar") mean = np.resize(binned_mean.statistic, n_bins + 1) stdev = np.resize(binned_stdev.statistic, n_bins + 1) # np.resize is different to ndarray.resize! ax.plot(binned_mean.bin_edges, mean) ax.fill_between( binned_mean.bin_edges, np.clip(mean - stdev, 0, None), mean + stdev, alpha=0.2 ) # Get the centre of the circle so we can plot that too: proj = ax.transData.transform(np.stack([binned_mean.bin_edges, mean]).T) polygon = shapely.geometry.Polygon(proj) reproj = ax.transData.inverted().transform((polygon.centroid.x, polygon.centroid.y)) plt.scatter(*reproj, c="C0") ax.set_xticks(np.linspace(0, 2 * np.pi, 12, endpoint=False)) ax.set_xticklabels(list(calendar.month_abbr)[1:]);
This waterbody is wetter from May to September and drier from November to April. The centre of the circle is clearly offset. Let’s compare to the deseasonalised:
time_angle_deseasonal = 2 * np.pi * ts_deseasonal.index.dayofyear / 365.25
binned_mean_deseasonal = scipy.stats.binned_statistic( time_angle_deseasonal, ts_deseasonal, bins=n_bins ) binned_stdev_deseasonal = scipy.stats.binned_statistic( time_angle_deseasonal, ts_deseasonal, bins=n_bins, statistic="std" ) ax = plt.subplot(projection="polar") ax.plot(binned_mean.bin_edges, mean, label="seasonal") ax.fill_between( binned_mean.bin_edges, np.clip(mean - stdev, 0, None), mean + stdev, alpha=0.2 ) plt.scatter(*reproj, c="C0") mean_deseasonal = np.resize(binned_mean_deseasonal.statistic, n_bins + 1) stdev_deseasonal = np.resize(binned_stdev_deseasonal.statistic, n_bins + 1) # np.resize is different to ndarray.resize! ax.plot(binned_mean_deseasonal.bin_edges, mean_deseasonal, label="deseasonalised") ax.fill_between( binned_mean_deseasonal.bin_edges, np.clip(mean_deseasonal - stdev_deseasonal, 0, None), mean_deseasonal + stdev_deseasonal, alpha=0.2, ) proj_deseasonal = ax.transData.transform( np.stack([binned_mean_deseasonal.bin_edges, mean_deseasonal]).T ) polygon_deseasonal = shapely.geometry.Polygon(proj_deseasonal) reproj_deseasonal = ax.transData.inverted().transform( (polygon_deseasonal.centroid.x, polygon_deseasonal.centroid.y) ) plt.scatter(*reproj_deseasonal, c="C1") ax.set_xticks(np.linspace(0, 2 * np.pi, 12, endpoint=False)) ax.set_xticklabels(list(calendar.month_abbr)[1:]) ax.legend(bbox_to_anchor=(0, 1));
The deaseasonalised data are much more circular and the centre is very close to the origin.
We can convert this into a numerical measure to help determine how seasonal the data are (which would be useful for analysis en masse). One way to do this is to compute the Polsby-Popper score, which is the ratio of the area to the squared perimeter, multiplied by \(4\pi\). This is a measure of compactness and a circle is a maximally compact shape.
pp = 4 * np.pi * polygon.area / polygon.exterior.length ** 2 pp_deseasonal = ( 4 * np.pi * polygon_deseasonal.area / polygon_deseasonal.exterior.length ** 2 )
print("Seasonal Polsby-Popper:", round(pp, 2)) print("Deseasonalised Polsby-Popper:", round(pp_deseasonal, 2))
Seasonal Polsby-Popper: 0.89 Deseasonalised Polsby-Popper: 0.93
The closer we are to 1, the more circular the data.
Circularity alone doesn’t tell us that we have non-seasonal data. The circle also needs to be centred. We can measure the distance from the origin:
print("Seasonal offset:", reproj) print("Deseasonalised offset:", reproj_deseasonal)
Seasonal offset: 20.280551766496924 Deseasonalised offset: 0.5140836481600226
The closer we are to zero, the less offset from the origin our data are.
Seasonal subseries plot¶
A seasonal subseries plot groups data by period (in this case there are 12 months in the year, so the period is 12), and then plots each group. This can help detect both seasonality and change in seasonality over time.
plt.figure(figsize=(2 * 6.4, 4.8)) titles = ["seasonal", "deseasonalised"] monthly_means = [, ] for i, ts_ in enumerate([ts, ts_deseasonal]): plt.subplot(1, 2, i + 1) colours = matplotlib.cm.rainbow(np.linspace(0, 1, 12)) for month, t in ts_.groupby(ts_.index.month): plt.plot( month + np.linspace(0, 1, len(t)), t.rolling(10).mean(), c=colours[month - 1], ) plt.plot([month, month + 1], [t.mean()] * 2, c="k") plt.plot( [month + 0.5] * 2, [t.mean() - t.std(), t.mean() + t.std()], c="k", alpha=0.5, ) monthly_means[i].append(t.mean()) plt.xlabel("Month") plt.xticks(np.arange(1.5, 13.5), calendar.month_abbr[1:]) plt.ylabel("Percentage wet") plt.title(titles[i])
The seasonal plot again shows very clear maxima during winter and minima during summer. The deseasonalised plot has no such pattern, and the differences between monthly means are within standard error.
We can also aggregate this plot into a single number representing seasonality. There are many ways to do this — any measure of how deviant the means are from a horizontal line would work, for example. One such measure would be the average deviation from the mean:
monthly_means = np.array(monthly_means)
mad = abs(monthly_means - monthly_means.mean(axis=1, keepdims=True)).mean(axis=1)
print("Seasonal mean average deviation:", mad.round(2)) print("Deseasonalised mean average deviation:", mad.round(2))
Seasonal mean average deviation: 13.74 Deseasonalised mean average deviation: 0.41
The closer to zero, the less seasonal the data are.
Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the
open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on
Last modified: September 2021