Masking data
¶
**Sign up to the DEA Sandbox** to run this notebook interactively from a browser
Compatibility: Notebook currently compatible with both the
NCI
andDEA Sandbox
environmentsProducts used: s2a_ard_granule
Background¶
In the past, remote sensing researchers would reject partly cloud-affected scenes in favour of cloud-free scenes. However, multi-temporal analysis techniques increasingly make use of every quality assured pixel within a time series of observations. The capacity to automatically exclude low quality pixels (e.g. clouds, shadows or other invalid data) is essential for enabling these new remote sensing methods (GA 2016).
Analysis-ready satellite data from Digital Earth Australia includes pixel quality information that can be used to easily “mask” data (i.e. keep only certain pixels in an image) to obtain a time series containing only clear or cloud-free pixels.
Description¶
In this notebook, we show how to mask Digital Earth Australia satellite data using boolean masks. The notebook demonstrates how to:
Load in a time series of satellite data including the
fmask
pixel quality bandInspect the band’s
flags_definition
attributesCreate clear and cloud-free masks and apply these to the satellite data
Buffer cloudy/shadowed pixels by dilating the masks by a specified number of pixels
Mask out invalid nodata values and replace them with
NaN
Getting started¶
First we import relevant packages and connect to the datacube. Then we define our example area of interest and load in a time series of satellite data.
[1]:
%matplotlib inline
import sys
import scipy.ndimage
import xarray
import numpy
import datacube
from datacube.storage.masking import make_mask
from datacube.storage.masking import mask_invalid_data
sys.path.append("../Scripts")
from dea_plotting import rgb
from dea_datahandling import dilate
Connect to the datacube¶
[2]:
dc = datacube.Datacube(app="Masking_data")
Create a query and load satellite data¶
To demonstrate how to mask satellite data, we will load Sentinel 2 surface reflectance RGB data along with a pixel quality classification band called fmask
.
Note: The
fmask
pixel quality band contains categorical data. When loading this kind of data, it is important to use a resampling method that does not alter the values of the input cells. In the example below, we resamplefmask
data using the “nearest” method which preserves its original values. For all other bands with continuous values, we use “average” resampling.
[3]:
# Create a reusable query
query = {"x": (153.15, 153.25),
"y": (-28.83, -28.92),
"time": ("2018-01", "2018-02")}
# Load data from the Sentinel 2 satellite
data = dc.load(product="s2a_ard_granule",
measurements=["nbart_blue", "nbart_green", "nbart_red", "fmask"],
output_crs="EPSG:3577",
resolution=[-20, 20],
resampling={"fmask": "nearest", "*": "average"},
**query)
# Plot the data
rgb(data, col="time")

These images show that our area of interest intersects only partially with the satellite track, and therefore some areas in the image have no observed data.
The absence of satellite observation is indicated by a “nodata” value for the band, which is listed under the Attributes category of the returned xarray.DataArray
.
[4]:
data.nbart_red
[4]:
<xarray.DataArray 'nbart_red' (time: 11, y: 579, x: 560)>
array([[[ 392, 431, ..., 265, 265],
[ 423, 354, ..., 260, 259],
...,
[ 544, 554, ..., 302, 305],
[ 535, 563, ..., 310, 330]],
[[ 749, 722, ..., -999, -999],
[ 755, 836, ..., -999, -999],
...,
[-999, -999, ..., -999, -999],
[-999, -999, ..., -999, -999]],
...,
[[ 433, 421, ..., -999, -999],
[ 458, 404, ..., -999, -999],
...,
[-999, -999, ..., -999, -999],
[-999, -999, ..., -999, -999]],
[[ 453, 469, ..., 891, 992],
[ 467, 434, ..., 1024, 1076],
...,
[ 722, 732, ..., 543, 565],
[ 697, 692, ..., 586, 587]]], dtype=int16)
Coordinates:
* time (time) datetime64[ns] 2018-01-06T23:52:41.026000 ... 2018-02-25T23:52:41.026000
* y (y) float64 -3.301e+06 -3.301e+06 ... -3.312e+06 -3.312e+06
* x (x) float64 2.029e+06 2.029e+06 2.029e+06 ... 2.04e+06 2.04e+06
Attributes:
units: 1
nodata: -999
spectral_definition: {'response': [0.002584, 0.034529, 0.14997, 0.464826...
crs: EPSG:3577
We see that the nodata
attribute reports the value -999, and we also see some of the pixels have that value.
We can find the classification scheme of the fmask
band in its flags definition.
[5]:
data.fmask.attrs["flags_definition"]
[5]:
{'fmask': {'bits': [0, 1, 2, 3, 4, 5, 6, 7],
'values': {'0': 'nodata',
'1': 'valid',
'2': 'cloud',
'3': 'shadow',
'4': 'snow',
'5': 'water'},
'description': 'Fmask'}}
[6]:
data.fmask.plot(col="time", col_wrap=4)
[6]:
<xarray.plot.facetgrid.FacetGrid at 0x7f9ff4de66d8>

We see that fmask
also reports the nodata
pixels, and along with the cloud and shadow pixels, it picked up the river as water too.
Cloud-free images¶
Creating the clear-pixel mask¶
We create a mask by specifying conditions that our pixels must satisfy. But we will only need the labels (not the values) to create a mask.
[7]:
# Create the mask based on "valid" pixels
clear_mask = make_mask(data.fmask, fmask="valid")
clear_mask.plot(col="time", col_wrap=4)
[7]:
<xarray.plot.facetgrid.FacetGrid at 0x7f9ff48fbc88>

Applying the clear-pixel mask¶
We can now get the clear images we want.
[8]:
# Apply the mask
clear = data.where(clear_mask)
rgb(clear, col="time")

Cloud-free images¶
If we look carefully, we can see that we have lost the river too. Sometimes we may instead want to create a mask using a combination of different fmask
features. For example, below we create a mask that will preserve pixels that are flagged as either valid, water or snow (and mask out any cloud or cloud shadow pixels):
[9]:
# Identify pixels that are either "valid", "water" or "snow"
cloud_free_mask = (
make_mask(data.fmask, fmask="valid") |
make_mask(data.fmask, fmask="water") |
make_mask(data.fmask, fmask="snow")
)
# Apply the mask
cloud_free = data.where(cloud_free_mask)
rgb(cloud_free, col="time")

Mask dilation¶
Sometimes we want our cloud masks to be more conservative and mask out more than just the pixels that fmask
classified as cloud or cloud shadow. That is, sometimes we want a buffer around the cloud and the shadow. We can achieve this by dilating the mask using the dilate
function from dea_datahandling
.
[10]:
# Set the number of pixels we want to buffer cloud and cloud shadow by
dilation_in_pixels = 10
# Apply the `dilate` function to each array in `cloud_free_mask`
# and return a matching xarray.DataArray object
buffered_mask = xarray.apply_ufunc(dilate,
cloud_free_mask,
dilation_in_pixels,
keep_attrs=True)
buffered_mask.plot(col="time", col_wrap=4)
[10]:
<xarray.plot.facetgrid.FacetGrid at 0x7f9ff40c9c18>

[11]:
# Apply the mask
buffered_cloud_free = data.where(buffered_mask)
rgb(buffered_cloud_free, col="time")

Masking out invalid data¶
We sometimes may also want to mask out only the nodata pixels and retain everything the satellite observed. In this case we are effectively just replacing our nodata
value with the more conventional NaN
values.
[12]:
# Set invalid nodata pixels to NaN
valid_data = mask_invalid_data(data)
rgb(valid_data, col='time')

[13]:
valid_data.nbart_red
[13]:
<xarray.DataArray 'nbart_red' (time: 11, y: 579, x: 560)>
array([[[ 392., 431., ..., 265., 265.],
[ 423., 354., ..., 260., 259.],
...,
[ 544., 554., ..., 302., 305.],
[ 535., 563., ..., 310., 330.]],
[[ 749., 722., ..., nan, nan],
[ 755., 836., ..., nan, nan],
...,
[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan]],
...,
[[ 433., 421., ..., nan, nan],
[ 458., 404., ..., nan, nan],
...,
[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan]],
[[ 453., 469., ..., 891., 992.],
[ 467., 434., ..., 1024., 1076.],
...,
[ 722., 732., ..., 543., 565.],
[ 697., 692., ..., 586., 587.]]])
Coordinates:
* time (time) datetime64[ns] 2018-01-06T23:52:41.026000 ... 2018-02-25T23:52:41.026000
* y (y) float64 -3.301e+06 -3.301e+06 ... -3.312e+06 -3.312e+06
* x (x) float64 2.029e+06 2.029e+06 2.029e+06 ... 2.04e+06 2.04e+06
Attributes:
units: 1
nodata: -999
spectral_definition: {'response': [0.002584, 0.034529, 0.14997, 0.464826...
crs: EPSG:3577
Additional information¶
License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube
tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on
Github.
Last modified: December 2019
Compatible datacube version:
[14]:
print(datacube.__version__)
1.7
Tags¶
Browse all available tags on the DEA User Guide’s Tags Index
Tags: sandbox compatible, NCI compatible, sentinel 2, dea_plotting, dea_datahandling, rgb, dilate, masking, cloud masking, buffering