Masking data 96e02fbcfe264e62919ee2b7a001325e

  • Sign up to the DEA Sandbox to run this notebook interactively from a browser

  • Compatibility: Notebook currently compatible with both the NCI and DEA Sandbox environments

  • Products used: ga_ls8c_ard_3


In the past, remote sensing researchers would reject partly cloud-affected scenes in favour of cloud-free scenes. However, multi-temporal analysis techniques increasingly make use of every quality assured pixel within a time series of observations. The capacity to automatically exclude low quality pixels (e.g. clouds, shadows or other invalid data) is essential for enabling these new remote sensing methods.

Analysis-ready satellite data from Digital Earth Australia includes pixel quality information that can be used to easily “mask” data (i.e. keep only certain pixels in an image) to obtain a time series containing only clear or cloud-free pixels.


In this notebook, we show how to mask Digital Earth Australia satellite data using boolean masks. The notebook demonstrates how to:

  1. Load in a time series of satellite data including the fmask pixel quality band

  2. Inspect the band’s flags_definition attributes

  3. Create clear and cloud-free masks and apply these to the satellite data

  4. Buffer cloudy/shadowed pixels by dilating the masks by a specified number of pixels

  5. Clean cloudy/shadowed pixels to reduce false positive cloud detection

  6. Mask out invalid nodata values and replace them with nan

Getting started

First we import relevant packages and connect to the datacube. Then we define our example area of interest and load in a time series of satellite data.

%matplotlib inline

import scipy.ndimage
import xarray
import numpy
import datacube
from import make_mask
from import mask_invalid_data
from odc.algo import mask_cleanup

import sys
sys.path.insert(1, '../Tools/')
from dea_tools.plotting import rgb

/env/lib/python3.8/site-packages/datacube/storage/ DeprecationWarning: has moved to datacube.utils.masking
  warnings.warn(" has moved to datacube.utils.masking",
/env/lib/python3.8/site-packages/geopandas/ UserWarning: The Shapely GEOS version (3.8.0-CAPI-1.13.1 ) is incompatible with the GEOS version PyGEOS was compiled with (3.9.1-CAPI-1.14.2). Conversions between both will be slow.

Connect to the datacube

dc = datacube.Datacube(app="Masking_data")

Create a query and load satellite data

To demonstrate how to mask satellite data, we will load Landsat 8 surface reflectance RGB data along with a pixel quality classification band called fmask.

Note: The fmask pixel quality band contains categorical data. When loading this kind of data, it is important to use a resampling method that does not alter the values of the input cells. In the example below, we resample fmask data using the “nearest” method which preserves its original values. For all other bands with continuous values, we use “average” resampling.

# Create a reusable query
y, x = -31.492482, 115.610186
radius = 0.03
query = {"x": (x-radius, x+radius),
         "y": (y-radius, y+radius),
         "time": ("2015-07", "2015-10")}

# Load data from the Sentinel 2 satellite
data = dc.load(product="ga_ls8c_ard_3",
               measurements=["nbart_blue", "nbart_green", "nbart_red", "fmask"],
               resolution=[-30, 30],
               resampling={"fmask": "nearest", "*": "average"},

# Plot the data
rgb(data, col="time")

These images show that our area of interest partially crosses the edge of the satellite track (e.g. the first panel), and therefore some areas in the image have no observed data. By inspecting our first timestep, we see that the nodata attribute reports the value -999, and that some of the pixels have that value:

<xarray.DataArray 'nbart_red' (y: 247, x: 216)>
array([[-999, -999, -999, ..., -999, -999, -999],
       [-999, -999, -999, ..., -999, -999, -999],
       [-999, -999, -999, ..., -999, -999, -999],
       [-999, -999, -999, ...,  430,  420,  393],
       [-999, -999, -999, ...,  396,  388,  378],
       [-999, -999, -999, ...,  442,  396,  376]], dtype=int16)
    time         datetime64[ns] 2015-07-08T02:04:56.777964
  * y            (y) float64 -3.526e+06 -3.526e+06 ... -3.533e+06 -3.533e+06
  * x            (x) float64 -1.541e+06 -1.541e+06 ... -1.535e+06 -1.535e+06
    spatial_ref  int32 3577
    units:         1
    nodata:        -999
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

We can find the classification scheme of the fmask band in its flags definition:

{'fmask': {'bits': [0, 1, 2, 3, 4, 5, 6, 7],
  'values': {'0': 'nodata',
   '1': 'valid',
   '2': 'cloud',
   '3': 'shadow',
   '4': 'snow',
   '5': 'water'},
  'description': 'Fmask'}}
data.fmask.plot(col="time", col_wrap=4)
<xarray.plot.facetgrid.FacetGrid at 0x7fd3fa60afd0>

We see that fmask also reports the nodata pixels, and along with the cloud and shadow pixels, it picked up the ocean as water too.

Cloud-free images

Creating the clear-pixel mask

We create a mask by specifying conditions that our pixels must satisfy. But we will only need the labels (e.g. fmask="valid") to create a mask.

# Create the mask based on "valid" pixels
clear_mask = make_mask(data.fmask, fmask="valid")
clear_mask.plot(col="time", col_wrap=4)
<xarray.plot.facetgrid.FacetGrid at 0x7fd3c560f3a0>

Applying the clear-pixel mask

We can now get the clear images we want.

# Apply the mask
clear = data.where(clear_mask)
rgb(clear, col="time")

Cloud-free images

If we look carefully, we can see that we have lost the ocean too. Sometimes we may instead want to create a mask using a combination of different fmask features. For example, below we create a mask that will preserve pixels that are flagged as either valid, water or snow (and mask out any cloud or cloud shadow pixels):

# Identify pixels that are either "valid", "water" or "snow"
cloud_free_mask = (
    make_mask(data.fmask, fmask="valid") |
    make_mask(data.fmask, fmask="water") |
    make_mask(data.fmask, fmask="snow")

# Apply the mask
cloud_free = data.where(cloud_free_mask)
rgb(cloud_free, col="time")

Mask dilation and cleaning

Sometimes we want our cloud masks to be more conservative and mask out more than just the pixels that fmask classified as cloud or cloud shadow. That is, sometimes we want a buffer around the cloud and the shadow. We can achieve this by dilating the mask using the mask_cleanup function from odc.algo.

Because we now want to focus on cloud and cloud shadow pixels, we first do the opposite to our previous examples and create a mask which has True values if a pixel contains either cloud or cloud shadow, and False for all others (e.g. valid, snow, water). When we plot this data, cloud and cloud shadow will appear as yellow, and other pixels as purple.

# Identify pixels that are either "cloud" or "cloud_shadow"
cloud_shadow_mask = (
    make_mask(data.fmask, fmask="cloud") |
    make_mask(data.fmask, fmask="shadow")

# Plot
cloud_shadow_mask.plot(col="time", col_wrap=4)
<xarray.plot.facetgrid.FacetGrid at 0x7fd3c512ae80>

We now apply mask_cleanup. This function allows us to modify areas of cloud and cloud shadow using image processing techniques called morphological operations. These tools can be used to expand and contract regions of our image to change the shape of specific features - in this case, clouds and cloud shadow. Four of the most useful morphological techniques include:

  • Dilation: Expand (i.e. “dilate”) True values outward, resulting in larger True features

  • Erosion: Shrink (i.e. “erode”) True values inward, resulting in smaller True features

  • Closing: First dilate, then erode True pixels. This is used to fill small or narrow False gaps inside or between True features.

  • Opening: First erode, then dilate True pixels. This is used to remove small or narrow areas of True features, but preserve larger features.

All of these operations are applied using a specific radius to control how many pixels our clouds and shadows are dilated, eroded, closed or opened. For example, we can specify that our clouds and shadows are expanded by 5 pixels in all directions as follows:

# Dilate all cloud and cloud shadow pixels by 5 pixels in all directions
cloud_shadow_buffered = mask_cleanup(mask=cloud_shadow_mask,
                                     mask_filters=[("dilation", 5)])
cloud_shadow_buffered.plot(col="time", col_wrap=4)
<xarray.plot.facetgrid.FacetGrid at 0x7fd326a5e070>

Our clouds and shadows (yellow) have now been expanded outwards (compare this to the cloud_shadow_mask data we plotted earlier).

We can now apply this dilated cloud and shadow mask to our original data (note we need to reverse our mask using ~ so that valid pixels are marked with True as in our original un-dilated example).

# Apply the mask
buffered_cloud_free = data.where(~cloud_shadow_buffered)
rgb(buffered_cloud_free, col="time")

Cloud mask data from fmask can commonly include false positive clouds over features including urban areas, bright sandy beaches and salt pans. We can see an example of this in the sixth panel, where a length of beach has been erroneously mapped as cloud.

To reduce these false positives, it can often be useful to apply a morphological “opening” operation before we dilate our clouds and shadows. This operation can remove small or narrow clouds by first “shrinking” then “expanding” our cloud and shadow pixels. To apply an opening operation with a radius of 3, we can supply an extra ("opening", 3) processing step using mask_cleanup:

# Dilate all cloud and cloud shadow pixels by 5 pixels in all directions
cloud_shadow_buffered = mask_cleanup(mask=cloud_shadow_mask,
                                     mask_filters=[("opening", 3), ("dilation", 5)])

# Apply the mask
buffered_cloud_free = data.where(~cloud_shadow_buffered)

Now when we plot our data, we can see that the sixth panel is no longer affected by false positive clouds along the bright narrow beach.

Note: Morphological operations and radiuses are application-specific; make sure to experiment with a range of options to make sure they have the effect you are looking for.

rgb(buffered_cloud_free, col="time")

Masking out invalid data

Finally, we need to set all pixels containing -999 nodata values to nan so that they do not affect subsequent analyses. This can be performed automatically using the mask_invalid_data function:

# Set invalid nodata pixels to NaN
valid_data = mask_invalid_data(buffered_cloud_free)
rgb(valid_data, col='time')

When we inspect our data, we can see that all -999 values have now been replaced with nan:

<xarray.DataArray 'nbart_red' (y: 247, x: 216)>
array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ..., 430., 420., 393.],
       [ nan,  nan,  nan, ..., 396., 388., 378.],
       [ nan,  nan,  nan, ..., 442., 396., 376.]])
    time         datetime64[ns] 2015-07-08T02:04:56.777964
  * y            (y) float64 -3.526e+06 -3.526e+06 ... -3.533e+06 -3.533e+06
  * x            (x) float64 -1.541e+06 -1.541e+06 ... -1.535e+06 -1.535e+06
    spatial_ref  int32 3577
    units:         1
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Last modified: September 2021

Compatible datacube version:



Browse all available tags on the DEA User Guide’s Tags Index

Tags: sandbox compatible, NCI compatible, sentinel 2, dea_plotting, dea_datahandling, rgb, dilate, masking, cloud masking, buffering