# Working with time in Xarray ¶

• **Sign up to the DEA Sandbox** to run this notebook interactively from a browser

• Compatibility: Notebook currently compatible with both the NCI and DEA Sandbox environments

• Products used: ga_ls8c_ard_3

## Description¶

The xarray Python package provides many useful techniques for dealing with time series data that can be applied to Digital Earth Australia data. This notebook demonstrates how to use xarray techniques to:

1. Select different time periods of data (e.g. year, month, day) from an xarray.Dataset

2. Using datetime accessors to extract additional information from a dataset’s time dimension

3. Summarising time series data for different time periods using .groupby() and .resample()

4. Interpolating time series data to estimate landscape conditions at a specific date that the satellite did not observe

For additional information about the techniques demonstrated below, refer to the xarray Time series data guide.

## Getting started¶

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

[1]:

%matplotlib inline

import datacube
import matplotlib.pyplot as plt



### Connect to the datacube¶

[2]:

dc = datacube.Datacube(app='Working_with_time')



[3]:

# Set up a location for the analysis
query = {
'x': (142.41, 142.51),
'y': (-32.22, -32.32),
'time': ('2015-01-01', '2016-12-31'),
'output_crs': 'EPSG:3577',
'resolution': (-30, 30)
}

# Apply simple cloud mask (keep only clear, water or snow observations)

print(ds)


<xarray.Dataset>
Dimensions:    (time: 46, x: 342, y: 398)
Coordinates:
* time       (time) datetime64[ns] 2015-01-06T00:20:26.943424 ... 2016-12-26T00:20:38.145340
* y          (y) float64 -3.552e+06 -3.552e+06 ... -3.563e+06 -3.563e+06
* x          (x) float64 9.709e+05 9.71e+05 9.71e+05 ... 9.811e+05 9.812e+05
Data variables:
nbart_nir  (time, y, x) float64 nan nan nan nan nan ... nan nan nan nan nan
fmask      (time, y, x) float64 nan nan nan nan nan ... nan nan nan nan nan
Attributes:
crs:      EPSG:3577


## Working with time¶

### Indexing by time¶

We can select data for an entire year by passing a string to .sel():

[4]:

ds.sel(time='2015')


[4]:

<xarray.Dataset>
Dimensions:    (time: 23, x: 342, y: 398)
Coordinates:
* time       (time) datetime64[ns] 2015-01-06T00:20:26.943424 ... 2015-12-24T00:20:32.375097
* y          (y) float64 -3.552e+06 -3.552e+06 ... -3.563e+06 -3.563e+06
* x          (x) float64 9.709e+05 9.71e+05 9.71e+05 ... 9.811e+05 9.812e+05
Data variables:
nbart_nir  (time, y, x) float64 nan nan nan ... 3.123e+03 3.148e+03
fmask      (time, y, x) float64 nan nan nan nan nan ... 1.0 1.0 1.0 1.0 1.0
Attributes:
crs:      EPSG:3577

Or select a single month:

[5]:

ds.sel(time='2015-05')


[5]:

<xarray.Dataset>
Dimensions:    (time: 2, x: 342, y: 398)
Coordinates:
* time       (time) datetime64[ns] 2015-05-14T00:19:24.695756 2015-05-30T00:19:28.951301
* y          (y) float64 -3.552e+06 -3.552e+06 ... -3.563e+06 -3.563e+06
* x          (x) float64 9.709e+05 9.71e+05 9.71e+05 ... 9.811e+05 9.812e+05
Data variables:
nbart_nir  (time, y, x) float64 2.02e+03 2.276e+03 2.164e+03 ... nan nan nan
fmask      (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... nan nan nan nan nan
Attributes:
crs:      EPSG:3577

Or select a range of dates using slice(). This selects all observations between the two dates, inclusive of both the start and stop values:

[6]:

ds.sel(time=slice('2015-06', '2016-01'))


[6]:

<xarray.Dataset>
Dimensions:    (time: 15, x: 342, y: 398)
Coordinates:
* time       (time) datetime64[ns] 2015-06-15T00:19:41.139454 ... 2016-01-25T00:20:28.787852
* y          (y) float64 -3.552e+06 -3.552e+06 ... -3.563e+06 -3.563e+06
* x          (x) float64 9.709e+05 9.71e+05 9.71e+05 ... 9.811e+05 9.812e+05
Data variables:
nbart_nir  (time, y, x) float64 nan nan nan ... 3.324e+03 3.307e+03
fmask      (time, y, x) float64 nan nan nan nan nan ... 1.0 1.0 1.0 1.0 1.0
Attributes:
crs:      EPSG:3577

### Using the datetime accessor¶

xarray allows you to easily extract additional information from the time dimension in Digital Earth Australia data. For example, we can get a list of what season each observation belongs to:

[7]:

ds.time.dt.season


[7]:

<xarray.DataArray 'season' (time: 46)>
array(['DJF', 'DJF', 'DJF', 'DJF', 'MAM', 'MAM', 'MAM', 'MAM', 'MAM',
'MAM', 'JJA', 'JJA', 'JJA', 'JJA', 'JJA', 'SON', 'SON', 'SON',
'SON', 'SON', 'SON', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF',
'MAM', 'MAM', 'MAM', 'MAM', 'MAM', 'JJA', 'JJA', 'JJA', 'JJA',
'JJA', 'JJA', 'SON', 'SON', 'SON', 'SON', 'SON', 'SON', 'DJF',
'DJF'], dtype='<U3')
Coordinates:
* time     (time) datetime64[ns] 2015-01-06T00:20:26.943424 ... 2016-12-26T00:20:38.145340

Or the day of the year:

[8]:

ds.time.dt.dayofyear


[8]:

<xarray.DataArray 'dayofyear' (time: 46)>
array([  6,  22,  38,  54,  70,  86, 102, 118, 134, 150, 166, 182, 198,
214, 230, 246, 262, 278, 294, 310, 326, 342, 358,   9,  25,  41,
57,  73,  89, 105, 121, 137, 153, 169, 185, 201, 217, 233, 249,
265, 281, 297, 313, 329, 345, 361])
Coordinates:
* time     (time) datetime64[ns] 2015-01-06T00:20:26.943424 ... 2016-12-26T00:20:38.145340

### Grouping and resampling by time¶

xarray also provides some shortcuts for aggregating data over time. In the example below, we first group our data by season, then take the median of each group. This produces a new dataset with only four observations (one per season).

[9]:

# Group the time series into seasons, and take median of each time period
ds_seasonal = ds.groupby('time.season').median(dim='time')

# Plot the output
ds_seasonal.nbart_nir.plot(col='season', col_wrap=4)
plt.show()



We can also use the .resample() method to summarise our dataset into larger chunks of time. In the example below, we produce a median composite for every 6 months of data in our dataset:

[10]:

# Resample to combine each 6 months of data into a median composite
ds_resampled = ds.resample(time='6m').median()

# Plot the new resampled data
ds_resampled.nbart_nir.plot(col='time')
plt.show()


### Interpolating new timesteps¶

Sometimes, we want to return data for specific times/dates that weren’t observed by a satellite. To estimate what the landscape appeared like on certain dates, we can use the .interp() method to interpolate between the nearest two observations.

By default, the interp() method uses linear interpolation (method='linear'). Another useful option is method='nearest', which will return the nearest satellite observation to the specified date(s).

[11]:

# New dates to interpolate data for
new_dates = ['2016-08-25', '2016-09-01', '2016-09-05']

# Interpolate Landsat values for three new dates
ds_interp = ds.interp(time=new_dates)

# Plot the new interpolated data
ds_interp.nbart_nir.plot(col='time')
plt.show()



Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Compatible datacube version:

[12]:

print(datacube.__version__)

1.7+142.g7f8581cf


## Tags¶

Browse all available tags on the DEA User Guide’s Tags Index

Tags: sandbox compatible, NCI compatible, dc.load, time series analysis, landsat 8, groupby, indexing, interpolating, resampling, image compositing