Loading cloud-free Sentinel 2 and Landsat from multiple satellites into one dataset

What does this notebook do?

This notebook demonstrates how to use the load_clearlandsat and load_clearsentinel2 functions to import a time series of cloud-free observations from both multiple Landsat satellites (i.e. Landsat 5, 7 and 8) and Sentinel 2 (i.e. S2A and S2B) as a single combined xarray dataset, and then plot the data as a cloud-free time series animation with each frame annotated by satellite name.

Requirements:

You need to run the following commands from the command line prior to launching jupyter notebooks from the same terminal so that the required libraries and paths are set:

module use /g/data/v10/public/modules/modulefiles

module load dea (if this notebook results in concatenation errors, try again after loading in module load dea/20180906)

module load ffmpeg

This notebook uses three external functions called load_clearlandsat, load_clearsentinel2 and animated_timeseries. These functions are available in the 10_Scripts folder of the dea-notebooks Github repository. Note that these functions have been developed by DEA users, not the DEA development team, and so are provided without warranty. If you find an error or bug in the functions, please either create an ‘Issue’ in the Github repository, or fix it yourself and create a ‘Pull’ request to contribute the updated function back into the repository (See the repository README for instructions on creating a Pull request).

Date: September 2018

Author: Robbi Bishop-Taylor

Tags: plot, animation, time_series, gif, movie, Landsat5, Landsat7, Landsat8, Sentinel2, load_clearlandsat, load_clearsentinel2, animated_timeseries, DEAPlotting, DEADataHandling, Scripts, Outputting data

Import modules and functions

[1]:
import os
import sys
import datacube
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
from datacube.utils import geometry
from datacube.utils.geometry import CRS
from datacube import Datacube

# Import external functions from dea-notebooks using relative link to Scripts
sys.path.append('../10_Scripts')
import DEAPlotting
import DEADataHandling

# Connect to datacube database
dc = datacube.Datacube(app='Time series animation')

Set up analysis variables

This sets the analysis extent, time period, output resolution and minimum proportion of cloud-free pixels used to filter the observations.

[2]:
# Set up centre of area to analyse, and a buffer in metres around this centrepoint
lat, lon, buffer_m = -32.3451254903, 142.328819458, 9000

# Set range of time to return data from both Landsat and Sentinel 2
time_range = ('2016-06-01', '2018-06-01')

# Set the output resolution for both datasets. (-25, 25) will resample Sentinel 2 to the default Landsat
# resolution of 25 m pixels, while higher values (e.g. (-100, 100)) can be useful for loading reduced
# resolution data for large areas without exceeding memory limits
resolution = (-25, 25)

# Set up minimum proportions of cloud-free pixels required to return Landsat and Sentinel2 observations.
# A proportion of 0.95 means that only observations with less than 5% cloud will be returned. Note that
# the Sentinel 2 cloud mask is often more aggressive, and sometimes requires a lower threshold
landsat_goodquality_prop, sentinel_goodquality_prop = 0.98, 0.95

Load in Landsat 5/7/8 and Sentinel 2A/2B timeseries data

Load in Landsat 5/7/8 timeseries

Load in Landsat data for the centrepoint, buffer and time range specified above. This uses the load_clearlandsat function, which loads Landsat NBAR, NBART or FC25 for multiple sensors (i.e. ls5, ls7, ls8), and returns a single xarray dataset containing only observations that contain greater than a given proportion of clear pixels. This function was designed to extract visually appealing time series of observations that are not affected by cloud, for example as an input to the animated_timeseries function from DEAPlotting.

The proportion of clear pixels is calculated by summing the pixels that are flagged as being problematic in the Landsat PQ25 layer. By default only cloudy or cloud shadowed pixels or pixels without valid data in every band are included in the calculation, but this can be customised using the mask_dict function.

If you encounter memory issues: By default, the function removes invalid -999 values by replacing them with NaN (mask_invalid_data=True). For large data extractions, it is recommended that you set both mask_pixel_quality=False and mask_invalid_data=False. Otherwise, all output variables will be coerced to float64 when NaN values are inserted into the array, potentially causing your data to use 4x as much memory. Be aware that the resulting arrays will contain invalid -999 values which should be considered in analyses.

[3]:
# Set up analysis data query using a buffer around a lat-long point (1280 x 720).
# This simply converts a lat long to Australian Albers, then creates a square analysis region
# by creating a square buffer around the point.
x, y = geometry.point(lon, lat, CRS('WGS84')).to_crs(CRS('EPSG:3577')).points[0]
query = {'x': (x - buffer_m, x + buffer_m),
         'y': (y - buffer_m, y + buffer_m),
         'time': time_range,
         'crs': 'EPSG:3577',
         'output_crs': 'EPSG:3577',
         'resolution': resolution}

# Load cloud free Landsat data for all sensors (LS5, LS7, LS8) for the above query. Setting
# `satellite_metadata=True` will return the data with a variable that gives the abbreviation
# of the satellite that made the observation
landsat_ds = DEADataHandling.load_clearlandsat(dc=dc, query=query, sensors=['ls5', 'ls7', 'ls8'],
                                               bands_of_interest=['red', 'green', 'blue'],
                                               masked_prop=landsat_goodquality_prop,
                                               satellite_metadata=True)
# Print and plot resulting data
print(landsat_ds)
landsat_ds[['red', 'green', 'blue']].isel(time=range(3, 7)).to_array().plot.imshow(robust=True, col='time')
Loading ls5 pixel quality
    Skipping ls5; no valid data for query
Ignoring SLC-off observations for ls7
Loading ls7 pixel quality
    Loading 0 filtered ls7 timesteps
Loading ls8 pixel quality
    Loading 25 filtered ls8 timesteps
Combining and sorting ls7, ls8 data
    Replacing invalid -999 values with NaN (data will be coerced to float64)
<xarray.Dataset>
Dimensions:    (time: 25, x: 721, y: 721)
Coordinates:
  * y          (y) float64 -3.556e+06 -3.556e+06 -3.556e+06 -3.556e+06 ...
  * x          (x) float64 9.542e+05 9.542e+05 9.542e+05 9.542e+05 9.543e+05 ...
  * time       (time) datetime64[ns] 2016-07-03T00:20:15 2016-08-04T00:20:25 ...
Data variables:
    red        (time, y, x) float64 1.447e+03 1.363e+03 1.422e+03 1.439e+03 ...
    green      (time, y, x) float64 877.0 823.0 845.0 856.0 830.0 746.0 ...
    blue       (time, y, x) float64 394.0 357.0 378.0 382.0 365.0 320.0 ...
    data_perc  (time) float64 0.9995 0.9998 0.9998 1.0 0.9997 1.0 1.0 0.9981 ...
    satellite  (time) object 'ls8' 'ls8' 'ls8' 'ls8' 'ls8' 'ls8' 'ls8' 'ls8' ...
Attributes:
    crs:      EPSG:3577
[3]:
<xarray.plot.facetgrid.FacetGrid at 0x7f7cbc678e10>
../../_images/notebooks_05_Temporal_analysis_LoadingCloudfreeSentinel2andLandsat_8_2.png

Load in Sentinel 2A/2B timeseries

This uses the load_clearsentinel2 function to load Sentinel 2 data for multiple sensors (i.e. s2a, s2b) and returns a single xarray dataset containing only observations that contain greater than a given proportion of clear pixels. The proportion of clear pixels is calculated by summing the pixels that are flagged as being problematic in the Sentinel pixel quality array (also named fmask). By default pixels flagged as nodata, cloud or shadow are used to calculate the number of unclear pixels, but this can be customised using the mask_values function.

Before Sentinel 2 data can be combined with Landsat, the band names need to be renamed to match Landsat band names.

[4]:
# Load cloud free Sentinel data for all sensors (S2A, S2B) for the above query. Setting
# `satellite_metadata=True` will return the data with a variable that gives the abbreviation
# of the satellite that made the observation
sentinel_ds = DEADataHandling.load_clearsentinel2(dc=dc, query=query, sensors=['s2a', 's2b'],
                                                  bands_of_interest=['nbart_red', 'nbart_green', 'nbart_blue'],
                                                  masked_prop=sentinel_goodquality_prop,
                                                  satellite_metadata=True)

# Rename bands to match Landsat to allow combining/concatenating
sentinel_ds.rename({'nbart_red': 'red', 'nbart_green': 'green', 'nbart_blue': 'blue'}, inplace=True)

# Print and plot sample of resulting data
print(sentinel_ds)
sentinel_ds[['red', 'green', 'blue']].isel(time=range(3, 7)).to_array().plot.imshow(robust=True, col='time')
Loading s2a pixel quality
    Loading 18 filtered s2a timesteps
Loading s2b pixel quality
    Loading 7 filtered s2b timesteps
Combining and sorting s2a, s2b data
    Replacing invalid -999 values with NaN (data will be coerced to float64)
<xarray.Dataset>
Dimensions:    (time: 25, x: 721, y: 721)
Coordinates:
  * y          (y) float64 -3.556e+06 -3.556e+06 -3.556e+06 -3.556e+06 ...
  * x          (x) float64 9.542e+05 9.542e+05 9.542e+05 9.542e+05 9.543e+05 ...
  * time       (time) datetime64[ns] 2016-06-28T00:31:43.654000 ...
Data variables:
    red        (time, y, x) float64 1.371e+03 1.039e+03 1.567e+03 1.309e+03 ...
    green      (time, y, x) float64 765.0 553.0 773.0 742.0 687.0 561.0 ...
    blue       (time, y, x) float64 368.0 295.0 376.0 385.0 317.0 222.0 ...
    data_perc  (time) float64 0.9965 0.9966 0.9973 0.9976 0.9738 0.9736 ...
    satellite  (time) <U3 's2a' 's2a' 's2a' 's2a' 's2a' 's2a' 's2a' 's2a' ...
Attributes:
    crs:      EPSG:3577
[4]:
<xarray.plot.facetgrid.FacetGrid at 0x7f7cd80e47b8>
../../_images/notebooks_05_Temporal_analysis_LoadingCloudfreeSentinel2andLandsat_10_2.png

Combine Landsat and Sentinel 2 into a single dataset

Here we take both Landsat and Sentinel 2 datasets, and combine them into a single dataset. This dataset is sorted so that the data can be animated in time order, regardless of what satellites made the observations. The satellite variable gives the name of the satellite that made each observation.

[5]:
# Combine into one dataset
combined_ds = xr.auto_combine([landsat_ds, sentinel_ds])

# Sort by time
combined_ds = combined_ds.sortby('time')

# Print resulting data
print(combined_ds)
<xarray.Dataset>
Dimensions:    (time: 50, x: 721, y: 721)
Coordinates:
  * y          (y) float64 -3.556e+06 -3.556e+06 -3.556e+06 -3.556e+06 ...
  * x          (x) float64 9.542e+05 9.542e+05 9.542e+05 9.542e+05 9.543e+05 ...
  * time       (time) datetime64[ns] 2016-06-28T00:31:43.654000 ...
Data variables:
    red        (time, y, x) float64 1.371e+03 1.039e+03 1.567e+03 1.309e+03 ...
    green      (time, y, x) float64 765.0 553.0 773.0 742.0 687.0 561.0 ...
    blue       (time, y, x) float64 368.0 295.0 376.0 385.0 317.0 222.0 ...
    data_perc  (time) float64 0.9965 0.9995 0.9998 0.9966 0.9998 0.9973 1.0 ...
    satellite  (time) object 's2a' 'ls8' 'ls8' 's2a' 'ls8' 's2a' 'ls8' 's2a' ...

Animate output

To demonstrate how the resulting time series derived from different satellite sensors fit together seamlessly, we can produce an RGB animation using data from both Landsat and Sentinel 2:

[6]:
# Create a prettier list of satellite names for printing over the animation
pretty_names = {'ls5': 'Landsat 5', 'ls7': 'Landsat 7', 'ls8': 'Landsat 8',
                's2a': 'Sentinel 2A', 's2b': 'Sentinel 2B'}
satellite_names = [pretty_names[i] for i in combined_ds.satellite.values]

# Produce an RGB animation that includes both Sentinel and Landsat observations, using
# the `title` parameter to print satellite names for each observation
DEAPlotting.animated_timeseries(ds=combined_ds,
                                output_path='combined_Landsat_Sentinel2.gif',
                                bands=['red', 'green', 'blue'],
                                interval=200,
                                width_pixels=500,
                                title=satellite_names,
                                percentile_stretch=[0.01, 0.99])
Generating 50 frame animation
    Exporting animation to combined_Landsat_Sentinel2.gif
../../_images/notebooks_05_Temporal_analysis_LoadingCloudfreeSentinel2andLandsat_14_1.png

Other useful features

Customise pixel quality mask

The load_clear* functions can be customised to use a different set of pixel quality values to determine if a pixel is good or bad quality. For Landsat, only pixels flagged as cloud or that are missing data in any band (contiguous=False) are used to the number of unclear pixels, but this can be customised using the mask_dict parameter. The default settings are equivelent to:

[7]:
# New query
query = {'x': (954163, 972163),
         'y': (-3573891, -3555891),
         'time': ('2001-06-01', '2005-06-01'),
         'crs': 'EPSG:3577',
         'output_crs': 'EPSG:3577',
         'resolution': (-25, 25)}

# Dictionary of values to pass to the `masking.make_mask` function
mask_dict = {'cloud_acca': 'no_cloud',
             'cloud_fmask': 'no_cloud',
             'contiguous': True}

DEADataHandling.load_clearlandsat(dc=dc, query=query, sensors=['ls5', 'ls7', 'ls8'],
                                  bands_of_interest=['red', 'green', 'blue'],
                                  masked_prop=0.999,
                                  mask_dict=mask_dict,
                                  satellite_metadata=True)
Loading ls5 pixel quality
    Loading 27 filtered ls5 timesteps
Ignoring SLC-off observations for ls7
Loading ls7 pixel quality
    Loading 12 filtered ls7 timesteps
Loading ls8 pixel quality
    Skipping ls8; no valid data for query
Combining and sorting ls5, ls7 data
    Replacing invalid -999 values with NaN (data will be coerced to float64)
[7]:
<xarray.Dataset>
Dimensions:    (time: 39, x: 721, y: 721)
Coordinates:
  * y          (y) float64 -3.556e+06 -3.556e+06 -3.556e+06 -3.556e+06 ...
  * x          (x) float64 9.542e+05 9.542e+05 9.542e+05 9.542e+05 9.543e+05 ...
  * time       (time) datetime64[ns] 2001-07-02T00:09:44.500000 ...
Data variables:
    red        (time, y, x) float64 2.018e+03 1.942e+03 1.912e+03 1.827e+03 ...
    green      (time, y, x) float64 1.081e+03 1.084e+03 1.09e+03 1.088e+03 ...
    blue       (time, y, x) float64 550.0 552.0 510.0 464.0 459.0 455.0 ...
    data_perc  (time) float64 1.0 1.0 0.9998 1.0 0.9997 0.9995 0.9996 0.9997 ...
    satellite  (time) <U3 'ls7' 'ls7' 'ls7' 'ls7' 'ls7' 'ls7' 'ls7' 'ls7' ...
Attributes:
    crs:      EPSG:3577

But we can use a more restrictive mask that also treats cloud shadow and saturated pixels as poor quality. Note that the code below returns a total of 35 observations compared to the 39 above:

[8]:
# Dictionary of values to pass to the `masking.make_mask` function
mask_dict = {'cloud_acca': 'no_cloud',
            'cloud_fmask': 'no_cloud',
            'cloud_shadow_acca': 'no_cloud_shadow',
            'cloud_shadow_fmask': 'no_cloud_shadow',
            'blue_saturated': False,
            'green_saturated': False,
            'red_saturated': False,
            'nir_saturated': False,
            'swir1_saturated': False,
            'swir2_saturated': False,
            'contiguous': True}

DEADataHandling.load_clearlandsat(dc=dc, query=query, sensors=['ls5', 'ls7', 'ls8'],
                                  bands_of_interest=['red', 'green', 'blue'],
                                  masked_prop=0.999,
                                  mask_dict=mask_dict,
                                  satellite_metadata=True)
Loading ls5 pixel quality
    Loading 26 filtered ls5 timesteps
Ignoring SLC-off observations for ls7
Loading ls7 pixel quality
    Loading 9 filtered ls7 timesteps
Loading ls8 pixel quality
    Skipping ls8; no valid data for query
Combining and sorting ls5, ls7 data
    Replacing invalid -999 values with NaN (data will be coerced to float64)
[8]:
<xarray.Dataset>
Dimensions:    (time: 35, x: 721, y: 721)
Coordinates:
  * y          (y) float64 -3.556e+06 -3.556e+06 -3.556e+06 -3.556e+06 ...
  * x          (x) float64 9.542e+05 9.542e+05 9.542e+05 9.542e+05 9.543e+05 ...
  * time       (time) datetime64[ns] 2001-07-02T00:09:44.500000 ...
Data variables:
    red        (time, y, x) float64 2.018e+03 1.942e+03 1.912e+03 1.827e+03 ...
    green      (time, y, x) float64 1.081e+03 1.084e+03 1.09e+03 1.088e+03 ...
    blue       (time, y, x) float64 550.0 552.0 510.0 464.0 459.0 455.0 ...
    data_perc  (time) float64 1.0 0.9999 0.9998 1.0 0.9995 0.9995 0.9999 ...
    satellite  (time) <U3 'ls7' 'ls7' 'ls7' 'ls7' 'ls7' 'ls7' 'ls7' 'ls7' ...
Attributes:
    crs:      EPSG:3577

PQ masking

Depending on the value set for masked_prop, the resulting observations will still contain a small proportion of poor quality pixels. We can mask out these remaining pixels and set them to NaN using mask_pixel_quality=True. This applies the PQ mask that was constructed from the values declared using mask_dict. First, view how the resulting layers appear without mask_pixel_quality=False (the default):

[9]:
# New query
query = {'x': (954163, 972163),
         'y': (-3573891, -3555891),
         'time': ('2010-06-01', '2011-06-01'),
         'crs': 'EPSG:3577',
         'output_crs': 'EPSG:3577',
         'resolution': (-25, 25)}

# With mask_pixel_quality=False
landsat_ds = DEADataHandling.load_clearlandsat(dc=dc, query=query, sensors=['ls5', 'ls7', 'ls8'],
                                               bands_of_interest=['red', 'green', 'blue'],
                                               masked_prop=0.7,
                                               mask_pixel_quality=False)

# Print and plot sample of resulting data
print(landsat_ds)
landsat_ds[['red', 'green', 'blue']].isel(time=1).to_array().plot.imshow(robust=True)


Loading ls5 pixel quality
    Loading 11 filtered ls5 timesteps
Ignoring SLC-off observations for ls7
Loading ls7 pixel quality
    Loading 0 filtered ls7 timesteps
Loading ls8 pixel quality
    Skipping ls8; no valid data for query
Combining and sorting ls5, ls7 data
    Replacing invalid -999 values with NaN (data will be coerced to float64)
<xarray.Dataset>
Dimensions:    (time: 11, x: 721, y: 721)
Coordinates:
  * y          (y) float64 -3.556e+06 -3.556e+06 -3.556e+06 -3.556e+06 ...
  * x          (x) float64 9.542e+05 9.542e+05 9.542e+05 9.542e+05 9.543e+05 ...
  * time       (time) datetime64[ns] 2010-09-05T00:10:38.500000 ...
Data variables:
    red        (time, y, x) float64 1.938e+03 1.82e+03 1.915e+03 1.794e+03 ...
    green      (time, y, x) float64 1.132e+03 1.083e+03 1.09e+03 1.144e+03 ...
    blue       (time, y, x) float64 544.0 571.0 574.0 575.0 521.0 492.0 ...
    data_perc  (time) float64 0.9996 0.7331 0.9245 0.8284 1.0 0.9062 0.9971 ...
Attributes:
    crs:      EPSG:3577
[9]:
<matplotlib.image.AxesImage at 0x7f7cb39f6780>
../../_images/notebooks_05_Temporal_analysis_LoadingCloudfreeSentinel2andLandsat_20_2.png

Now use mask_pixel_quality=True:

[10]:
# With mask_pixel_quality=True
landsat_ds = DEADataHandling.load_clearlandsat(dc=dc, query=query, sensors=['ls5', 'ls7', 'ls8'],
                                               bands_of_interest=['red', 'green', 'blue'],
                                               masked_prop=0.7,
                                               mask_pixel_quality=True)

# Print and plot sample of resulting data
print(landsat_ds)
landsat_ds[['red', 'green', 'blue']].isel(time=1).to_array().plot.imshow(robust=True)
Loading ls5 pixel quality
    Loading 11 filtered ls5 timesteps
Ignoring SLC-off observations for ls7
Loading ls7 pixel quality
    Loading 0 filtered ls7 timesteps
Loading ls8 pixel quality
    Skipping ls8; no valid data for query
Combining and sorting ls5, ls7 data
    Replacing invalid -999 values with NaN (data will be coerced to float64)
<xarray.Dataset>
Dimensions:    (time: 11, x: 721, y: 721)
Coordinates:
  * y          (y) float64 -3.556e+06 -3.556e+06 -3.556e+06 -3.556e+06 ...
  * x          (x) float64 9.542e+05 9.542e+05 9.542e+05 9.542e+05 9.543e+05 ...
  * time       (time) datetime64[ns] 2010-09-05T00:10:38.500000 ...
Data variables:
    red        (time, y, x) float64 1.938e+03 1.82e+03 1.915e+03 1.794e+03 ...
    green      (time, y, x) float64 1.132e+03 1.083e+03 1.09e+03 1.144e+03 ...
    blue       (time, y, x) float64 544.0 571.0 574.0 575.0 521.0 492.0 ...
    data_perc  (time, y, x) float64 0.9996 0.9996 0.9996 0.9996 0.9996 ...
Attributes:
    crs:      EPSG:3577
[10]:
<matplotlib.image.AxesImage at 0x7f7cbc6d5be0>
../../_images/notebooks_05_Temporal_analysis_LoadingCloudfreeSentinel2andLandsat_22_2.png
[ ]: