Masking data 4cfc6eb8763347329e2163b713134859

Background

In the past, remote sensing researchers would reject partly cloud-affected scenes in favour of cloud-free scenes. However, multi-temporal analysis techniques increasingly make use of every quality assured pixel within a time series of observations. The capacity to automatically exclude low quality pixels (e.g. clouds, shadows or other invalid data) is essential for enabling these new remote sensing methods (GA 2016).

Analysis-ready satellite data from Digital Earth Australia includes pixel quality information that can be used to easily “mask” data (i.e. keep only certain pixels in an image) to obtain a time series containing only clear or cloud-free pixels.

Description

In this notebook, we show how to mask Digital Earth Australia satellite data using boolean masks. The notebook demonstrates how to:

  1. Load in a time series of satellite data including the fmask pixel quality band

  2. Inspect the band’s flags_definition attributes

  3. Create clear and cloud-free masks and apply these to the satellite data

  4. Buffer cloudy/shadowed pixels by dilating the masks by a specified number of pixels

  5. Mask out invalid nodata values and replace them with NaN


Getting started

First we import relevant packages and connect to the datacube. Then we define our example area of interest and load in a time series of satellite data.

[1]:
%matplotlib inline

import sys
import scipy.ndimage
import xarray
import numpy
import datacube
from datacube.storage.masking import make_mask
from datacube.storage.masking import mask_invalid_data

sys.path.append("../Scripts")
from dea_plotting import rgb
from dea_datahandling import dilate

Connect to the datacube

[2]:
dc = datacube.Datacube(app="Masking_data")

Create a query and load satellite data

To demonstrate how to mask satellite data, we will load Sentinel 2 surface reflectance RGB data along with a pixel quality classification band called fmask.

Note: The fmask pixel quality band contains categorical data. When loading this kind of data, it is important to use a resampling method that does not alter the values of the input cells. In the example below, we resample fmask data using the “nearest” method which preserves its original values. For all other bands with continuous values, we use “average” resampling.

[3]:
# Create a reusable query
query = {"x": (153.15, 153.25),
         "y": (-28.83, -28.92),
         "time": ("2018-01", "2018-02")}

# Load data from the Sentinel 2 satellite
data = dc.load(product="s2a_ard_granule",
               measurements=["nbart_blue", "nbart_green", "nbart_red", "fmask"],
               output_crs="EPSG:3577",
               resolution=[-20, 20],
               resampling={"fmask": "nearest", "*": "average"},
               **query)

# Plot the data
rgb(data, col="time")
../../_images/notebooks_Frequently_used_code_Masking_data_8_0.png

These images show that our area of interest intersects only partially with the satellite track, and therefore some areas in the image have no observed data.

The absence of satellite observation is indicated by a “nodata” value for the band, which is listed under the Attributes category of the returned xarray.DataArray.

[4]:
data.nbart_red
[4]:
<xarray.DataArray 'nbart_red' (time: 11, y: 579, x: 560)>
array([[[ 392,  431, ...,  265,  265],
        [ 423,  354, ...,  260,  259],
        ...,
        [ 544,  554, ...,  302,  305],
        [ 535,  563, ...,  310,  330]],

       [[ 749,  722, ..., -999, -999],
        [ 755,  836, ..., -999, -999],
        ...,
        [-999, -999, ..., -999, -999],
        [-999, -999, ..., -999, -999]],

       ...,

       [[ 433,  421, ..., -999, -999],
        [ 458,  404, ..., -999, -999],
        ...,
        [-999, -999, ..., -999, -999],
        [-999, -999, ..., -999, -999]],

       [[ 453,  469, ...,  891,  992],
        [ 467,  434, ..., 1024, 1076],
        ...,
        [ 722,  732, ...,  543,  565],
        [ 697,  692, ...,  586,  587]]], dtype=int16)
Coordinates:
  * time     (time) datetime64[ns] 2018-01-06T23:52:41.026000 ... 2018-02-25T23:52:41.026000
  * y        (y) float64 -3.301e+06 -3.301e+06 ... -3.312e+06 -3.312e+06
  * x        (x) float64 2.029e+06 2.029e+06 2.029e+06 ... 2.04e+06 2.04e+06
Attributes:
    units:                1
    nodata:               -999
    spectral_definition:  {'response': [0.002584, 0.034529, 0.14997, 0.464826...
    crs:                  EPSG:3577

We see that the nodata attribute reports the value -999, and we also see some of the pixels have that value.

We can find the classification scheme of the fmask band in its flags definition.

[5]:
data.fmask.attrs["flags_definition"]
[5]:
{'fmask': {'bits': [0, 1, 2, 3, 4, 5, 6, 7],
  'values': {'0': 'nodata',
   '1': 'valid',
   '2': 'cloud',
   '3': 'shadow',
   '4': 'snow',
   '5': 'water'},
  'description': 'Fmask'}}
[6]:
data.fmask.plot(col="time", col_wrap=4)
[6]:
<xarray.plot.facetgrid.FacetGrid at 0x7f9ff4de66d8>
../../_images/notebooks_Frequently_used_code_Masking_data_13_1.png

We see that fmask also reports the nodata pixels, and along with the cloud and shadow pixels, it picked up the river as water too.

Cloud-free images

Creating the clear-pixel mask

We create a mask by specifying conditions that our pixels must satisfy. But we will only need the labels (not the values) to create a mask.

[7]:
# Create the mask based on "valid" pixels
clear_mask = make_mask(data.fmask, fmask="valid")

clear_mask.plot(col="time", col_wrap=4)
[7]:
<xarray.plot.facetgrid.FacetGrid at 0x7f9ff48fbc88>
../../_images/notebooks_Frequently_used_code_Masking_data_17_1.png

Applying the clear-pixel mask

We can now get the clear images we want.

[8]:
# Apply the mask
clear = data.where(clear_mask)

rgb(clear, col="time")
../../_images/notebooks_Frequently_used_code_Masking_data_19_0.png

Cloud-free images

If we look carefully, we can see that we have lost the river too. Sometimes we may instead want to create a mask using a combination of different fmask features. For example, below we create a mask that will preserve pixels that are flagged as either valid, water or snow (and mask out any cloud or cloud shadow pixels):

[9]:
# Identify pixels that are either "valid", "water" or "snow"
cloud_free_mask = (
    make_mask(data.fmask, fmask="valid") |
    make_mask(data.fmask, fmask="water") |
    make_mask(data.fmask, fmask="snow")
)

# Apply the mask
cloud_free = data.where(cloud_free_mask)

rgb(cloud_free, col="time")
../../_images/notebooks_Frequently_used_code_Masking_data_22_0.png

Mask dilation

Sometimes we want our cloud masks to be more conservative and mask out more than just the pixels that fmask classified as cloud or cloud shadow. That is, sometimes we want a buffer around the cloud and the shadow. We can achieve this by dilating the mask using the dilate function from dea_datahandling.

[10]:
# Set the number of pixels we want to buffer cloud and cloud shadow by
dilation_in_pixels = 10

# Apply the `dilate` function to each array in `cloud_free_mask`
# and return a matching xarray.DataArray object
buffered_mask = xarray.apply_ufunc(dilate,
                                   cloud_free_mask,
                                   dilation_in_pixels,
                                   keep_attrs=True)

buffered_mask.plot(col="time", col_wrap=4)
[10]:
<xarray.plot.facetgrid.FacetGrid at 0x7f9ff40c9c18>
../../_images/notebooks_Frequently_used_code_Masking_data_25_1.png
[11]:
# Apply the mask
buffered_cloud_free = data.where(buffered_mask)

rgb(buffered_cloud_free, col="time")
../../_images/notebooks_Frequently_used_code_Masking_data_26_0.png

Masking out invalid data

We sometimes may also want to mask out only the nodata pixels and retain everything the satellite observed. In this case we are effectively just replacing our nodata value with the more conventional NaN values.

[12]:
# Set invalid nodata pixels to NaN
valid_data = mask_invalid_data(data)

rgb(valid_data, col='time')
../../_images/notebooks_Frequently_used_code_Masking_data_29_0.png
[13]:
valid_data.nbart_red
[13]:
<xarray.DataArray 'nbart_red' (time: 11, y: 579, x: 560)>
array([[[ 392.,  431., ...,  265.,  265.],
        [ 423.,  354., ...,  260.,  259.],
        ...,
        [ 544.,  554., ...,  302.,  305.],
        [ 535.,  563., ...,  310.,  330.]],

       [[ 749.,  722., ...,   nan,   nan],
        [ 755.,  836., ...,   nan,   nan],
        ...,
        [  nan,   nan, ...,   nan,   nan],
        [  nan,   nan, ...,   nan,   nan]],

       ...,

       [[ 433.,  421., ...,   nan,   nan],
        [ 458.,  404., ...,   nan,   nan],
        ...,
        [  nan,   nan, ...,   nan,   nan],
        [  nan,   nan, ...,   nan,   nan]],

       [[ 453.,  469., ...,  891.,  992.],
        [ 467.,  434., ..., 1024., 1076.],
        ...,
        [ 722.,  732., ...,  543.,  565.],
        [ 697.,  692., ...,  586.,  587.]]])
Coordinates:
  * time     (time) datetime64[ns] 2018-01-06T23:52:41.026000 ... 2018-02-25T23:52:41.026000
  * y        (y) float64 -3.301e+06 -3.301e+06 ... -3.312e+06 -3.312e+06
  * x        (x) float64 2.029e+06 2.029e+06 2.029e+06 ... 2.04e+06 2.04e+06
Attributes:
    units:                1
    nodata:               -999
    spectral_definition:  {'response': [0.002584, 0.034529, 0.14997, 0.464826...
    crs:                  EPSG:3577

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Last modified: December 2019

Compatible datacube version:

[14]:
print(datacube.__version__)
1.7

Tags

Browse all available tags on the DEA User Guide’s Tags Index

Tags: sandbox compatible, NCI compatible, sentinel 2, dea_plotting, dea_datahandling, rgb, dilate, masking, cloud masking, buffering