Using load_ard to load and cloud mask multiple satellite sensors
**Sign up to the DEA Sandbox** to run this notebook interactively from a browser
Compatibility: Notebook currently compatible with both the NCI and DEA Sandbox environments
Products used: ga_ls5t_ard_3, ga_ls7e_ard_3, ga_ls8c_ard_3, s2a_ard_granule, s2b_ard_granule
Description
This notebook demonstrates how to use the load_ard function to import a time series of cloud-free observations from either multiple Landsat (i.e. Landsat 5, 7 and 8) or Sentinel-2 satellites (i.e. Sentinel-2A and 2B). The function will automatically apply pixel quality (e.g. cloud) masking or contiguity masking to the input data, and return all available data from multiple sensors as a single combined xarray.Dataset.
Optionally, the function can be used to return only observations that contain a minimum proportion of good quality (non-cloudy, non-shadowed) pixels. This can be used to extract a visually appealing time series of observations that are not affected by cloud.
The function supports the following products:
Landsat (GA Collection 3): ga_ls5t_ard_3, ga_ls7e_ard_3, ga_ls8c_ard_3
Sentinel-2 Definitive: s2a_ard_granule, s2b_ard_granule
Sentinel-2 Near Real Time: s2a_nrt_granule, s2b_nrt_granule
This notebook demonstrates how to use load_ard to:
Load and combine Landsat 5, 7 and 8 data into a single xarray.Dataset
Optionally apply a cloud mask to the resulting data
Filter resulting data to keep only cloud-free observations
Discard Landsat 7 SLC-off failure data
Filter data before loading using a custom function
Load and combine Sentinel-2A and Sentinel-2B data into a single xarray.Dataset
Lazily load data using Dask
Getting started
To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.
Load packages
[1]:
%matplotlib inline
import datacube
import matplotlib.pyplot as plt
import sys
sys.path.append('../Scripts')
from dea_datahandling import load_ard
Connect to the datacube
[2]:
dc = datacube.Datacube(app='Using_load_ard')
Loading multiple Landsat sensors
The load_ard function can be used to load a single, combined timeseries of cloud-masked data from multiple DEA products or satellite sensors. At its simplest, you can use the function similarly to dc.load by passing a set of spatiotemporal query parameters (e.g. x, y, time, measurements, output_crs, resolution, group_by, etc.) directly into the function (see the dc.load documentation for all possible options). The key difference from dc.load is that load_ard also requires an existing Datacube object, which is passed using the dc parameter. This gives us the flexibility to load data from development or experimental datacubes.
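Because the function accepts an existing Datacube object rather than creating its own, the same call can be pointed at a different datacube simply by constructing that object differently. The snippet below is a minimal sketch only: the 'dev' environment name is hypothetical and would need to be defined in your own datacube configuration.

# Connect to a hypothetical 'dev' datacube environment (assumption: an
# environment with this name exists in your datacube configuration)
dc_dev = datacube.Datacube(app='Using_load_ard', env='dev')

# Point load_ard at that datacube via the `dc` parameter
ds_dev = load_ard(dc=dc_dev,
                  products=['ga_ls8c_ard_3'],
                  x=(153.38, 153.47),
                  y=(-28.83, -28.92),
                  time=('2018-04', '2018-06'),
                  measurements=['nbart_green'])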
In the examples below, we load a single band of data (nbart_green) from the three Landsat Collection 3 products (Landsat 5, 7 and 8) by specifying products=['ga_ls5t_ard_3', 'ga_ls7e_ard_3', 'ga_ls8c_ard_3']. As it runs, the function prints each product it searches for data from, and the total number of observations (time steps) loaded once all sensors are combined.
Explicit syntax
The following example demonstrates how key parameters can be passed directly to load_ard.
[3]:
# Load available data from all three Landsat satellites
ds = load_ard(dc=dc,
              products=['ga_ls5t_ard_3', 'ga_ls7e_ard_3', 'ga_ls8c_ard_3'],
              x=(153.38, 153.47),
              y=(-28.83, -28.92),
              time=('2018-04', '2018-06'),
              measurements=['nbart_green'],
              output_crs='EPSG:3577',
              resolution=(-30, 30),
              group_by='solar_day')

# Print output data
print(ds)
Finding datasets
ga_ls5t_ard_3
ga_ls7e_ard_3
ga_ls8c_ard_3
Applying pixel quality/cloud mask
Loading 16 time steps
<xarray.Dataset>
Dimensions: (time: 16, x: 342, y: 380)
Coordinates:
* y (y) float64 -3.304e+06 -3.304e+06 ... -3.316e+06 -3.316e+06
spatial_ref int32 3577
* x (x) float64 2.05e+06 2.051e+06 ... 2.061e+06 2.061e+06
* time (time) datetime64[ns] 2018-04-01T23:43:28.557464 ... 2018-06-28T23:41:33.464057
Data variables:
nbart_green (time, y, x) float32 nan nan nan nan ... 559.0 405.0 395.0
Attributes:
crs: EPSG:3577
grid_mapping: spatial_ref
Query syntax
The following example demonstrates how key parameters can be stored in a query dictionary, to be passed as a single parameter to load_ard. The query can then be reused in other load_ard calls.
[4]:
# Create a reusable query
query = {
    'x': (153.38, 153.47),
    'y': (-28.83, -28.92),
    'time': ('2019-01', '2019-05'),
    'measurements': ['nbart_green'],
    'output_crs': 'EPSG:3577',
    'resolution': (-30, 30),
    'group_by': 'solar_day'
}

# Load available data from all three Landsat satellites
ds = load_ard(dc=dc,
              products=['ga_ls5t_ard_3', 'ga_ls7e_ard_3', 'ga_ls8c_ard_3'],
              **query)

# Print output data
print(ds)
Finding datasets
ga_ls5t_ard_3
ga_ls7e_ard_3
ga_ls8c_ard_3
Applying pixel quality/cloud mask
Loading 22 time steps
<xarray.Dataset>
Dimensions: (time: 22, x: 342, y: 380)
Coordinates:
* y (y) float64 -3.304e+06 -3.304e+06 ... -3.316e+06 -3.316e+06
spatial_ref int32 3577
* x (x) float64 2.05e+06 2.051e+06 ... 2.061e+06 2.061e+06
* time (time) datetime64[ns] 2019-01-06T23:42:23.012115 ... 2019-05-30T23:42:14.416872
Data variables:
nbart_green (time, y, x) float32 nan nan nan nan ... 543.0 394.0 330.0
Attributes:
crs: EPSG:3577
grid_mapping: spatial_ref
Working with cloud masking
By plotting a time slice from the data we loaded above, you can see an area of white pixels where clouds have been masked out and set to NaN:
[5]:
# Plot single observation
ds.isel(time=5).nbart_green.plot()
plt.show()

By default, load_ard applies a pixel quality mask to the loaded data using the fmask band. The default mask is created from the fmask categories ['valid', 'snow', 'water'], which preserves clear (non-cloudy, non-shadowed) land, snow and water pixels, and sets all invalid, cloudy or shadowed pixels to NaN. This can be customised using the fmask_categories parameter. To deactivate cloud masking completely, set mask_pixel_quality=False:
[6]:
# Load available data with cloud masking deactivated
ds = load_ard(dc=dc,
              products=['ga_ls5t_ard_3', 'ga_ls7e_ard_3', 'ga_ls8c_ard_3'],
              mask_pixel_quality=False,
              **query)

# Plot single observation
ds.isel(time=5).nbart_green.plot()
plt.show()
Finding datasets
ga_ls5t_ard_3
ga_ls7e_ard_3
ga_ls8c_ard_3
Loading 22 time steps
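As noted above, the set of pixel quality categories that are kept can also be customised via the fmask_categories parameter. The snippet below is a minimal sketch (the category list shown is an example only, not a recommendation) that keeps valid land and water pixels while masking out snow in addition to cloud and cloud shadow:

# Keep only 'valid' and 'water' fmask categories; cloudy, shadowed and snow
# pixels will all be set to NaN (example category list only)
ds_custom = load_ard(dc=dc,
                     products=['ga_ls5t_ard_3', 'ga_ls7e_ard_3', 'ga_ls8c_ard_3'],
                     fmask_categories=['valid', 'water'],
                     **query)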

Filtering to non-cloudy observations
In addition to masking out cloud, load_ard allows you to discard any satellite observation that contains less than a minimum proportion of good quality (e.g. non-cloudy) pixels. This can be used to obtain a time series of only clear, cloud-free observations.
To discard all observations with less than X% good quality pixels, use the min_gooddata parameter. For example, min_gooddata=0.99 will return only observations where less than 1% of pixels contain cloud, cloud shadow or other invalid data, resulting in a smaller number of clear, cloud-free images being returned by the function:
[7]:
# Load available data filtered to 99% clear observations
ds_noclouds = load_ard(dc=dc,
                       products=['ga_ls5t_ard_3',
                                 'ga_ls7e_ard_3',
                                 'ga_ls8c_ard_3'],
                       min_gooddata=0.99,
                       **query)

# Plot single observation
ds_noclouds.isel(time=0).nbart_green.plot()
plt.show()
Finding datasets
ga_ls5t_ard_3
ga_ls7e_ard_3
ga_ls8c_ard_3
Counting good quality pixels for each time step
Filtering to 2 out of 22 time steps with at least 99.0% good quality pixels
Applying pixel quality/cloud mask
Loading 2 time steps

Discarding Landsat 7 SLC-off failure data
On May 31, 2003, the Scan Line Corrector (SLC) that compensated for Landsat 7's forward motion failed, introducing linear data gaps into all subsequent Landsat 7 observations. For example, the following Landsat 7 image contains visible striping:
[8]:
# Plot Landsat data
ds.isel(time=1).nbart_green.plot()
[8]:
<matplotlib.collections.QuadMesh at 0x7f8c69df1940>

Although this data still contains valuable information, for some applications (e.g. generating clean composites from multiple images) it can be useful to discard Landsat 7 imagery acquired after the SLC failure. This data is known as "SLC-off" data.
This can be achieved in load_ard using the ls7_slc_off parameter. By default this is set to ls7_slc_off=True, which will include all SLC-off data. Set ls7_slc_off=False to discard this data instead; observe that the function now reports that it is ignoring SLC-off observations:
Finding datasets
ga_ls5t_ard_3
ga_ls7e_ard_3 (ignoring SLC-off observations)
ga_ls8c_ard_3
[9]:
# Load available data after discarding Landsat 7 SLC-off data
ds = load_ard(dc=dc,
              products=['ga_ls5t_ard_3', 'ga_ls7e_ard_3', 'ga_ls8c_ard_3'],
              ls7_slc_off=False,
              **query)

# Print output data
print(ds)
Finding datasets
ga_ls5t_ard_3
ga_ls7e_ard_3 (ignoring SLC-off observations)
ga_ls8c_ard_3
Applying pixel quality/cloud mask
Loading 14 time steps
<xarray.Dataset>
Dimensions: (time: 14, x: 342, y: 380)
Coordinates:
* y (y) float64 -3.304e+06 -3.304e+06 ... -3.316e+06 -3.316e+06
spatial_ref int32 3577
* x (x) float64 2.05e+06 2.051e+06 ... 2.061e+06 2.061e+06
* time (time) datetime64[ns] 2019-01-06T23:42:23.012115 ... 2019-05-30T23:42:14.416872
Data variables:
nbart_green (time, y, x) float32 nan nan nan nan ... 543.0 394.0 330.0
Attributes:
crs: EPSG:3577
grid_mapping: spatial_ref
Filtering data before load using a custom function
The load_ard function has a powerful predicate parameter that allows you to filter out satellite observations before they are actually loaded, using a custom function. Some examples of where this may be useful include:
Filtering to return data from a specific season (e.g. summer, winter)
Filtering to return data acquired on a particular day of the year
Filtering to return data based on an external dataset (e.g. data acquired during specific climatic conditions such as drought or flood)
A predicate function should take a datacube.model.Dataset object as an input (e.g. as returned from dc.find_datasets(product='ga_ls8c_ard_3', **query)[0]), and return either True or False. For example, a predicate function that returns True only for datasets acquired in April could test:
dataset.time.begin.month == 4
In the example below, we create a simple predicate function that will filter our data to return only satellite data acquired in April:
[10]:
# Simple function that returns True if month is April
def filter_april(dataset):
    return dataset.time.begin.month == 4

# Load data that passes the `filter_april` function
ds = load_ard(dc=dc,
              products=['ga_ls5t_ard_3', 'ga_ls7e_ard_3', 'ga_ls8c_ard_3'],
              predicate=filter_april,
              **query)

# Print output data
print(ds)
Finding datasets
ga_ls5t_ard_3
ga_ls7e_ard_3
ga_ls8c_ard_3
Filtering datasets using predicate function
Applying pixel quality/cloud mask
Loading 6 time steps
<xarray.Dataset>
Dimensions: (time: 6, x: 342, y: 380)
Coordinates:
* y (y) float64 -3.304e+06 -3.304e+06 ... -3.316e+06 -3.316e+06
spatial_ref int32 3577
* x (x) float64 2.05e+06 2.051e+06 ... 2.061e+06 2.061e+06
* time (time) datetime64[ns] 2019-04-04T23:34:45.825672 ... 2019-04-28T23:41:55.717144
Data variables:
nbart_green (time, y, x) float32 nan nan nan nan ... 599.0 513.0 489.0
Attributes:
crs: EPSG:3577
grid_mapping: spatial_ref
We can print the time steps returned by load_ard to verify that they now include only April observations (e.g. 2019-04-...):
[11]:
ds.time.values
[11]:
array(['2019-04-04T23:34:45.825672000', '2019-04-05T23:35:51.576906000',
'2019-04-12T23:41:59.899975000', '2019-04-20T23:34:11.272138000',
'2019-04-21T23:35:45.185905000', '2019-04-28T23:41:55.717144000'],
dtype='datetime64[ns]')
Filter to a single season
An example of a predicate function that will return data from a season of interest would look as follows:
def seasonal_filter(dataset, season=[12, 1, 2]):
    # Return True if the dataset's month falls within the specified season
    return dataset.time.begin.month in season
After applying this predicate function, running ds.time.dt.season demonstrates that the dataset contains only observations from the December, January and February ('DJF') period:
<xarray.DataArray 'season' (time: 37)>
array(['DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF',
'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF',
'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF',
'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF', 'DJF',
'DJF'], dtype='<U3')
Coordinates:
* time (time) datetime64[ns] 2016-01-05T10:27:44.213284 ... 2017-12-26T10:23:43.129624
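For reference, a call applying this predicate might look like the following minimal sketch; the multi-year time range is an assumption chosen so that several DJF periods fall within the query, and is not taken from the examples above:

# Load only observations acquired in December, January or February
# (the time range below is illustrative only)
ds_summer = load_ard(dc=dc,
                     products=['ga_ls5t_ard_3', 'ga_ls7e_ard_3', 'ga_ls8c_ard_3'],
                     predicate=seasonal_filter,
                     x=(153.38, 153.47),
                     y=(-28.83, -28.92),
                     time=('2016-01', '2017-12'),
                     measurements=['nbart_green'],
                     output_crs='EPSG:3577',
                     resolution=(-30, 30),
                     group_by='solar_day')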
Loading Sentinel-2 data
Data from the Sentinel-2A and Sentinel-2B satellites can also be loaded using load_ard. To do this, we need to specify Sentinel-2 products in place of the Landsat products used above.
The query parameter can be reused to load Sentinel-2 data with the same specifications used for the Landsat data above:
[12]:
# Load available data from both Sentinel 2 satellites
ds = load_ard(dc=dc,
              products=['s2a_ard_granule', 's2b_ard_granule'],
              **query)

# Print output data
print(ds)
Finding datasets
s2a_ard_granule
s2b_ard_granule
Applying pixel quality/cloud mask
Loading 31 time steps
<xarray.Dataset>
Dimensions: (time: 31, x: 342, y: 380)
Coordinates:
* y (y) float64 -3.304e+06 -3.304e+06 ... -3.316e+06 -3.316e+06
spatial_ref int32 3577
* x (x) float64 2.05e+06 2.051e+06 ... 2.061e+06 2.061e+06
* time (time) datetime64[ns] 2019-01-01T23:52:41.024000 ... 2019-05-31T23:52:51.024000
Data variables:
nbart_green (time, y, x) float32 786.0 646.0 664.0 702.0 ... nan nan nan
Attributes:
crs: EPSG:3577
grid_mapping: spatial_ref
As with Landsat, cloudy pixels are masked out of the resulting observations by default:
[13]:
# Plot single observation
ds.isel(time=2).nbart_green.plot()
plt.show()

Lazy loading with Dask
Rather than loading data directly (which can take a long time and use large amounts of memory), all datacube data can be lazily loaded using Dask. This can be a very useful approach when you need to load large amounts of data without crashing your analysis, or if you want to subsequently scale your analysis by distributing tasks in parallel across multiple workers.
The load_ard function can be easily adapted to lazily load data rather than loading it into memory by providing a dask_chunks parameter, using either the explicit or query syntax. The minimum required to lazily load data is dask_chunks={}, but chunking can also be performed spatially (e.g. dask_chunks={'x': 1000, 'y': 1000}) or by time (e.g. dask_chunks={'time': 1}), depending on the analysis being conducted.
Note: For more information about using Dask, refer to the Parallel processing with Dask notebook.
[14]:
# Lazily load available Sentinel 2 data
ds = load_ard(dc=dc,
              products=['s2a_ard_granule', 's2b_ard_granule'],
              dask_chunks={},
              **query)

# Print output data
print(ds)
Finding datasets
s2a_ard_granule
s2b_ard_granule
Applying pixel quality/cloud mask
Returning 31 time steps as a dask array
<xarray.Dataset>
Dimensions: (time: 31, x: 342, y: 380)
Coordinates:
* y (y) float64 -3.304e+06 -3.304e+06 ... -3.316e+06 -3.316e+06
spatial_ref int32 3577
* x (x) float64 2.05e+06 2.051e+06 ... 2.061e+06 2.061e+06
* time (time) datetime64[ns] 2019-01-01T23:52:41.024000 ... 2019-05-31T23:52:51.024000
Data variables:
nbart_green (time, y, x) float32 dask.array<chunksize=(1, 380, 342), meta=np.ndarray>
Attributes:
crs: EPSG:3577
grid_mapping: spatial_ref
Note that the data loads almost instantaneously, and that each of the arrays listed under Data variables is now described as a dask.array. To load the data into memory, you can run:
[15]:
ds.compute()
[15]:
<xarray.Dataset>
Dimensions: (time: 31, x: 342, y: 380)
Coordinates:
* y (y) float64 -3.304e+06 -3.304e+06 ... -3.316e+06 -3.316e+06
spatial_ref int32 3577
* x (x) float64 2.05e+06 2.051e+06 ... 2.061e+06 2.061e+06
* time (time) datetime64[ns] 2019-01-01T23:52:41.024000 ... 2019-05-31T23:52:51.024000
Data variables:
nbart_green (time, y, x) float32 786.0 646.0 664.0 702.0 ... nan nan nan
Attributes:
crs: EPSG:3577
grid_mapping: spatial_ref
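Chunk sizes can also be specified explicitly rather than leaving Dask to choose them. The snippet below is a minimal sketch (the chunk sizes are illustrative only, not recommendations) that requests one time step per chunk and 1000 x 1000 pixel spatial chunks, then inspects the resulting chunking:

# Lazily load Sentinel-2 data with explicit chunk sizes (illustrative only)
ds_chunked = load_ard(dc=dc,
                      products=['s2a_ard_granule', 's2b_ard_granule'],
                      dask_chunks={'time': 1, 'x': 1000, 'y': 1000},
                      **query)

# Each variable is now a chunked dask array; inspect its chunk sizes
print(ds_chunked.nbart_green.data.chunksize)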
Additional information
License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on GitHub.
Last modified: June 2020
Compatible datacube version:
[16]:
print(datacube.__version__)
1.8.0
Tags
Browse all available tags on the DEA User Guide’s Tags Index
Tags: sandbox compatible, NCI compatible, load_ard, time series analysis, landsat 5, landsat 7, landsat 8, sentinel 2, cloud masking, cloud filtering, pixel quality, SLC-off, predicate function, Dask, lazy loading