• Sign up to the DEA Sandbox to run this notebook interactively from a browser

• Compatibility: Notebook currently compatible with both the NCI and DEA Sandbox environments

• Products used: s2a_ard_granule

## Background¶

In the past, remote sensing researchers would reject partly cloud-affected scenes in favour of cloud-free scenes. However, multi-temporal analysis techniques increasingly make use of every quality assured pixel within a time series of observations. The capacity to automatically exclude low quality pixels (e.g. clouds, shadows or other invalid data) is essential for enabling these new remote sensing methods (GA 2016).

Analysis-ready satellite data from Digital Earth Australia includes pixel quality information that can be used to easily “mask” data (i.e. keep only certain pixels in an image) to obtain a time series containing only clear or cloud-free pixels.

## Description¶

In this notebook, we show how to mask Digital Earth Australia satellite data using boolean masks. The notebook demonstrates how to:

1. Load in a time series of satellite data including the fmask pixel quality band

2. Inspect the band’s flags_definition attributes

3. Create clear and cloud-free masks and apply these to the satellite data

4. Buffer cloudy/shadowed pixels by dilating the masks by a specified number of pixels

5. Mask out invalid nodata values and replace them with NaN

## Getting started¶

First we import relevant packages and connect to the datacube. Then we define our example area of interest and load in a time series of satellite data.

[1]:

%matplotlib inline

import sys
import scipy.ndimage
import xarray
import numpy
import datacube

sys.path.append("../Scripts")
from dea_plotting import rgb
from dea_datahandling import dilate



### Connect to the datacube¶

[2]:

dc = datacube.Datacube(app="Masking_data")


## Create a query and load satellite data¶

To demonstrate how to mask satellite data, we will load Sentinel 2 surface reflectance RGB data along with a pixel quality classification band called fmask.

Note: The fmask pixel quality band contains categorical data. When loading this kind of data, it is important to use a resampling method that does not alter the values of the input cells. In the example below, we resample fmask data using the “nearest” method which preserves its original values. For all other bands with continuous values, we use “average” resampling.

[3]:

# Create a reusable query
query = {"x": (153.15, 153.25),
"y": (-28.83, -28.92),
"time": ("2018-01", "2018-02")}

# Load data from the Sentinel 2 satellite
output_crs="EPSG:3577",
resolution=[-20, 20],
**query)

# Plot the data
rgb(data, col="time")


These images show that our area of interest intersects only partially with the satellite track, and therefore some areas in the image have no observed data.

The absence of satellite observation is indicated by a “nodata” value for the band, which is listed under the Attributes category of the returned xarray.DataArray.

[4]:

data.nbart_red

[4]:

<xarray.DataArray 'nbart_red' (time: 11, y: 579, x: 560)>
array([[[ 392,  431, ...,  265,  265],
[ 423,  354, ...,  260,  259],
...,
[ 544,  554, ...,  302,  305],
[ 535,  563, ...,  310,  330]],

[[ 749,  722, ..., -999, -999],
[ 755,  836, ..., -999, -999],
...,
[-999, -999, ..., -999, -999],
[-999, -999, ..., -999, -999]],

...,

[[ 433,  421, ..., -999, -999],
[ 458,  404, ..., -999, -999],
...,
[-999, -999, ..., -999, -999],
[-999, -999, ..., -999, -999]],

[[ 453,  469, ...,  891,  992],
[ 467,  434, ..., 1024, 1076],
...,
[ 722,  732, ...,  543,  565],
[ 697,  692, ...,  586,  587]]], dtype=int16)
Coordinates:
* time     (time) datetime64[ns] 2018-01-06T23:52:41.026000 ... 2018-02-25T23:52:41.026000
* y        (y) float64 -3.301e+06 -3.301e+06 ... -3.312e+06 -3.312e+06
* x        (x) float64 2.029e+06 2.029e+06 2.029e+06 ... 2.04e+06 2.04e+06
Attributes:
units:                1
nodata:               -999
spectral_definition:  {'response': [0.002584, 0.034529, 0.14997, 0.464826...
crs:                  EPSG:3577


We see that the nodata attribute reports the value -999, and we also see some of the pixels have that value.

We can find the classification scheme of the fmask band in its flags definition.

[5]:

data.fmask.attrs["flags_definition"]

[5]:

{'fmask': {'bits': [0, 1, 2, 3, 4, 5, 6, 7],
'values': {'0': 'nodata',
'1': 'valid',
'2': 'cloud',
'4': 'snow',
'5': 'water'},

[6]:

data.fmask.plot(col="time", col_wrap=4)

[6]:

<xarray.plot.facetgrid.FacetGrid at 0x7f9ff4de66d8>


We see that fmask also reports the nodata pixels, and along with the cloud and shadow pixels, it picked up the river as water too.

## Cloud-free images¶

We create a mask by specifying conditions that our pixels must satisfy. But we will only need the labels (not the values) to create a mask.

[7]:

# Create the mask based on "valid" pixels


[7]:

<xarray.plot.facetgrid.FacetGrid at 0x7f9ff48fbc88>


We can now get the clear images we want.

[8]:

# Apply the mask

rgb(clear, col="time")


### Cloud-free images¶

If we look carefully, we can see that we have lost the river too. Sometimes we may instead want to create a mask using a combination of different fmask features. For example, below we create a mask that will preserve pixels that are flagged as either valid, water or snow (and mask out any cloud or cloud shadow pixels):

[9]:

# Identify pixels that are either "valid", "water" or "snow"
)

rgb(cloud_free, col="time")


Sometimes we want our cloud masks to be more conservative and mask out more than just the pixels that fmask classified as cloud or cloud shadow. That is, sometimes we want a buffer around the cloud and the shadow. We can achieve this by dilating the mask using the dilate function from dea_datahandling.

[10]:

# Set the number of pixels we want to buffer cloud and cloud shadow by
dilation_in_pixels = 10

# Apply the dilate function to each array in cloud_free_mask
# and return a matching xarray.DataArray object
dilation_in_pixels,
keep_attrs=True)


[10]:

<xarray.plot.facetgrid.FacetGrid at 0x7f9ff40c9c18>

[11]:

# Apply the mask

rgb(buffered_cloud_free, col="time")


We sometimes may also want to mask out only the nodata pixels and retain everything the satellite observed. In this case we are effectively just replacing our nodata value with the more conventional NaN values.

[12]:

# Set invalid nodata pixels to NaN

rgb(valid_data, col='time')

[13]:

valid_data.nbart_red

[13]:

<xarray.DataArray 'nbart_red' (time: 11, y: 579, x: 560)>
array([[[ 392.,  431., ...,  265.,  265.],
[ 423.,  354., ...,  260.,  259.],
...,
[ 544.,  554., ...,  302.,  305.],
[ 535.,  563., ...,  310.,  330.]],

[[ 749.,  722., ...,   nan,   nan],
[ 755.,  836., ...,   nan,   nan],
...,
[  nan,   nan, ...,   nan,   nan],
[  nan,   nan, ...,   nan,   nan]],

...,

[[ 433.,  421., ...,   nan,   nan],
[ 458.,  404., ...,   nan,   nan],
...,
[  nan,   nan, ...,   nan,   nan],
[  nan,   nan, ...,   nan,   nan]],

[[ 453.,  469., ...,  891.,  992.],
[ 467.,  434., ..., 1024., 1076.],
...,
[ 722.,  732., ...,  543.,  565.],
[ 697.,  692., ...,  586.,  587.]]])
Coordinates:
* time     (time) datetime64[ns] 2018-01-06T23:52:41.026000 ... 2018-02-25T23:52:41.026000
* y        (y) float64 -3.301e+06 -3.301e+06 ... -3.312e+06 -3.312e+06
* x        (x) float64 2.029e+06 2.029e+06 2.029e+06 ... 2.04e+06 2.04e+06
Attributes:
units:                1
nodata:               -999
spectral_definition:  {'response': [0.002584, 0.034529, 0.14997, 0.464826...
crs:                  EPSG:3577


Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Compatible datacube version:

[14]:

print(datacube.__version__)

1.7


## Tags¶

Browse all available tags on the DEA User Guide’s Tags Index

Tags: sandbox compatible, NCI compatible, sentinel 2, dea_plotting, dea_datahandling, rgb, dilate, masking, cloud masking, buffering