Introduction to DEA Landsat Surface Reflectance (Geoscience Australia Landsat Collection 3) 7853097cfdff419db8db20f325b4b570

Background

The United States Geological Survey’s (USGS) Landsat satellite program has been capturing images of the Australian continent for more than 30 years. These data are highly useful for land and coastal mapping studies.

In particular, the light reflected from the Earth’s surface (surface reflectance) is important for monitoring environmental resources (such as agricultural production and mining activities) over time.

We need to make accurate comparisons of imagery acquired at different times, seasons and geographic locations. However, inconsistencies can arise due to variations in atmospheric conditions, sun position, sensor view angle, surface slope and surface aspect. These inconsistencies need to be reduced or removed to ensure the data are consistent and can be compared over time.

What this product offers

DEA’s Landsat Surface Reflectance products take Landsat 5 Thematic Mapper (TM), Landsat 7 Enhanced Thematic Mapper Plus (ETM+) and Landsat 8 Operational Land Imager (OLI) imagery captured over the Australian continent and corrects for inconsistencies across land and coastal fringes. The result is accurate and standardised surface reflectance data, which is instrumental in identifying and quantifying environmental change.

DEA’s Landsat Surface Reflectance products form a single, cohesive Analysis Ready Data (ARD) package, which allows you to analyse surface reflectance data as is without the need to apply additional corrections.

It contains three sub-products that provide corrections or attribution information:

The resolution is a 30 m grid based on the USGS Landsat Collection 1 archive.

Applications

  • The development of derivative products to monitor land, inland waterways and coastal features, such as:

  • urban growth

  • coastal habitats

  • mining activities

  • agricultural activity (e.g. pastoral, irrigated cropping, rain-fed cropping)

  • water extent

  • The development of refined information products, such as:

  • areal units of detected surface water

  • areal units of deforestation

  • yield predictions of agricultural parcels

  • Compliance surveys

  • Emergency management

Publications

  • Li, F., Jupp, D. L. B., Reddy, S., Lymburner, L., Mueller, N., Tan, P., & Islam, A. (2010). An evaluation of the use of atmospheric and BRDF correction to standardize Landsat data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 3(3), 257–270. https://doi.org/10.1109/JSTARS.2010.2042281

  • Li, F., Jupp, D. L. B., Thankappan, M., Lymburner, L., Mueller, N., Lewis, A., & Held, A. (2012). A physics-based atmospheric and BRDF correction for Landsat data over mountainous terrain. Remote Sensing of Environment, 124, 756–770. https://doi.org/10.1016/j.rse.2012.06.018

    Note: For more technical information about the DEA Landsat Surface Reflectance products, visit the official Geoscience Australia DEA Landsat Surface Reflectance product listings.

Description

This notebook will demonstrate how to load data from the DEA Landsat Surface Reflectance products using the Digital Earth Australia datacube. Topics covered include:

  1. Inspecting the Landsat products and measurements available in the datacube

  2. Loading Landsat data for a specific location and time

  3. Plotting Landsat data in true and false colour

  4. Applying a cloud mask using the oa_fmask band

  5. Advanced: Loading Landsat data in its native projection and resolution

  6. Advanced: Filtering Landsat data by product metadata


Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

Load packages

Import Python packages that are used for the analysis.

[1]:
import sys
import datacube
import matplotlib.pyplot as plt
from datacube.utils import masking

sys.path.append('../Scripts')
from dea_plotting import rgb
from dea_datahandling import mostcommon_crs

Connect to the datacube

Connect to the datacube so we can access DEA data.

[2]:
dc = datacube.Datacube(app='DEA_Landsat_Surface_Reflectance')

Available products and measurements

List products

We can use datacube’s list_products functionality to inspect the DEA Landsat Surface Reflectance products that are available in the datacube. The table below shows the product names that we will use to load the data, a brief description of the data, and the satellite instrument that acquired the data.

[3]:
# List DEA Landsat Surface Reflectance products available in DEA
dc_products = dc.list_products()
display_columns = ['name', 'description', 'instrument']
dc_products[dc_products.description.str.contains(
    'Analysis Ready Data Collection 3').fillna(
        False)][display_columns].set_index('name')
[3]:
description instrument
name
ga_ls5t_ard_3 Geoscience Australia Landsat 5 Thematic Mapper... TM
ga_ls7e_ard_3 Geoscience Australia Landsat 7 Enhanced Themat... ETM
ga_ls8c_ard_3 Geoscience Australia Landsat 8 Operational Lan... OLI_TIRS

List measurements

We can further inspect the data available for each DEA Landsat Surface Reflectance product using datacube’s list_measurements functionality. The table below lists each of the measurements available in the data which have prefixes nbart_, nbar_ and oa_ that represent three main “sub-products” that provide corrections or attribution information for each Landsat product.

[4]:
dc_measurements = dc.list_measurements()
dc_measurements.loc['ga_ls8c_ard_3']
[4]:
name dtype units nodata aliases flags_definition spectral_definition
measurement
nbart_coastal_aerosol nbart_coastal_aerosol int16 1 -999 [nbart_band01, coastal_aerosol] NaN NaN
nbart_blue nbart_blue int16 1 -999 [nbart_band02, blue] NaN NaN
nbart_green nbart_green int16 1 -999 [nbart_band03, green] NaN NaN
nbart_red nbart_red int16 1 -999 [nbart_band04, red] NaN NaN
nbart_nir nbart_nir int16 1 -999 [nbart_band05, nir] NaN NaN
nbart_swir_1 nbart_swir_1 int16 1 -999 [nbart_band06, swir_1, swir1] NaN NaN
nbart_swir_2 nbart_swir_2 int16 1 -999 [nbart_band07, swir_2, swir2] NaN NaN
nbart_panchromatic nbart_panchromatic int16 1 -999 [nbart_band08, panchromatic] NaN NaN
oa_fmask oa_fmask uint8 1 0 [fmask] {'fmask': {'bits': [0, 1, 2, 3, 4, 5, 6, 7], '... NaN
oa_nbart_contiguity oa_nbart_contiguity uint8 1 255 [nbart_contiguity] {'contiguous': {'bits': [0], 'values': {'0': F... NaN
oa_azimuthal_exiting oa_azimuthal_exiting float32 1 NaN [azimuthal_exiting] NaN NaN
oa_azimuthal_incident oa_azimuthal_incident float32 1 NaN [azimuthal_incident] NaN NaN
oa_combined_terrain_shadow oa_combined_terrain_shadow uint8 1 255 [combined_terrain_shadow] NaN NaN
oa_exiting_angle oa_exiting_angle float32 1 NaN [exiting_angle] NaN NaN
oa_incident_angle oa_incident_angle float32 1 NaN [incident_angle] NaN NaN
oa_relative_azimuth oa_relative_azimuth float32 1 NaN [relative_azimuth] NaN NaN
oa_relative_slope oa_relative_slope float32 1 NaN [relative_slope] NaN NaN
oa_satellite_azimuth oa_satellite_azimuth float32 1 NaN [satellite_azimuth] NaN NaN
oa_satellite_view oa_satellite_view float32 1 NaN [satellite_view] NaN NaN
oa_solar_azimuth oa_solar_azimuth float32 1 NaN [solar_azimuth] NaN NaN
oa_solar_zenith oa_solar_zenith float32 1 NaN [solar_zenith] NaN NaN
oa_time_delta oa_time_delta float32 1 NaN [time_delta] NaN NaN

Surface Reflectance NBAR

Radiance data collected by Landsat 8 OLI-TIRS sensors can be affected by atmospheric conditions, sun position, sensor view angle, surface slope and surface aspect. These need to be reduced or removed to ensure the data can be compared consistently over time and space to identify and quantify environmental change. NBAR data (i.e. measurements with the nbar_ prefix above) takes Landsat imagery captured over the Australian continent and corrects the inconsistencies across land and coastal fringes using Nadir corrected Bi-directional reflectance distribution function Adjusted Reflectance (NBAR).

Note: NBAR data is currently only available in the NCI environment, not the DEA Sandbox.

Surface Reflectance NBART (with terrain illumination correction)

NBART data (i.e. measurements with the nbart_ prefix above) are also processed using NBAR, but have an additional terrain illumination correction applied using a Digital Surface Model (DSM) to correct for varying terrain (see Figure 1). NBART is typically the default choice for most analyses as removing terrain illumination and shadows allows changes in the landscape to be compared more consistently across time.

Comparison between NBAR and NBART

Figure 1: The animation above demonstrates how the NBART correction results in a significantly more two-dimensional looking image that is less affected by terrain illumination and shadow. Black pixels in the NBART image represent areas of deep terrain shadow that can’t be corrected as they’re determined not to be viewable by either the sun or the satellite. These are represented by -999 nodata values in the data. For more details about the difference between NBAR and NBART data, refer to the Introduction to Digital Earth Australia notebook, and the official DEA Landsat Surface Reflectance product listings.

Surface Reflectance Observation Attributes (OA)

The surface reflectance data produced by NBAR and NBART require accurate and reliable data provenance and information about the satellite data’s origins, derivations, methodology and processes. This allows the data to be reproducible and can increase the reliability of derivative applications. Attribution labels such as the location of cloud and cloud shadow pixels can be used to mask out these particular features from the surface reflectance analysis, or used as training data for machine learning algorithms. DEA’s Landsat surface reflectance data provides this information as “Observation Attribute” measurements with the prefix oa_. In the following Cloud masking section, we will demonstrate how to use OA data to create clear, cloud-free Landsat imagery.

The table produced by .list_measurements() also contains important information about the data type, nodata values and aliases available for each measurement contained in the product. Aliases (e.g. blue) can be used instead of the official measurement name (e.g. nbart_blue) when loading data (see next step).

Loading data

Now that we know what products and measurements are available for the products, we can load data from the datacube using dc.load.

In the example below, we will load data from Landsat 8 for Canberra from August 2018. We will load data from six spectral satellite bands, as well as cloud masking data ('oa_fmask'). By specifying output_crs='EPSG:3577' and resolution=(-30, 30), we request that datacube reproject our data to the Australian Albers coordinate reference system (CRS), with 30 x 30 m pixels. Finally, group_by='solar_day' ensures that overlapping images taken within seconds of each other as the satellite passes over are combined into a single time step in the data.

Note: For a more general discussion of how to load data using the datacube, refer to the Introduction to loading data notebook.

[5]:
ls8_ds = dc.load(product='ga_ls8c_ard_3',
                 measurements=[
                     'nbart_red', 'nbart_green', 'nbart_blue', 'nbart_nir',
                     'nbart_swir_1', 'nbart_swir_2', 'oa_fmask'
                 ],
                 x=(149.05, 149.17),
                 y=(-35.25, -35.32),
                 time=('2018-08-06', '2018-08-06'),
                 output_crs='EPSG:3577',
                 resolution=(-30, 30),
                 group_by='solar_day')

We can now view the data that we loaded. The satellite bands listed under Data variables should match the measurements we requested above.

[6]:
ls8_ds
[6]:
<xarray.Dataset>
Dimensions:       (time: 1, x: 395, y: 307)
Coordinates:
  * time          (time) datetime64[ns] 2018-08-06T23:49:39.234612
  * y             (y) float64 -3.953e+06 -3.953e+06 ... -3.962e+06 -3.962e+06
  * x             (x) float64 1.543e+06 1.543e+06 ... 1.555e+06 1.555e+06
    spatial_ref   int32 3577
Data variables:
    nbart_red     (time, y, x) int16 694 704 621 654 745 ... 206 201 182 168 177
    nbart_green   (time, y, x) int16 679 643 576 577 638 ... 269 264 284 288 297
    nbart_blue    (time, y, x) int16 663 639 545 541 630 ... 239 224 240 248 258
    nbart_nir     (time, y, x) int16 1490 1447 1461 1535 1628 ... 17 58 88 320
    nbart_swir_1  (time, y, x) int16 1072 1071 1055 1112 1243 ... 24 1 45 63 169
    nbart_swir_2  (time, y, x) int16 815 789 706 712 805 857 ... 60 59 31 24 62
    oa_fmask      (time, y, x) uint8 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 5 5 5 5 5 5 5
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Plotting Landsat data

We can plot the data we loaded using the rgb function. By default, the function will plot data as a true colour image using the 'nbart_red', 'nbart_green', 'nbart_blue' bands.

[7]:
rgb(ls8_ds, col='time')
../../_images/notebooks_DEA_datasets_DEA_Landsat_Surface_Reflectance_17_0.png

By plotting the ['nbart_swir_1', 'nbart_nir', 'nbart_green'] bands, we can view the imagery in false colour. This view emphasises growing vegetation in green and water in deep blue or black.

Note: For more information about plotting satellite imagery in true and false colour, refer to the Introduction to Plotting notebook.

[8]:
rgb(ls8_ds, bands=['nbart_swir_1', 'nbart_nir', 'nbart_green'])
../../_images/notebooks_DEA_datasets_DEA_Landsat_Surface_Reflectance_19_0.png

Cloud masking

Many analyses might want to exclude targets that are obscured by cloud or cloud shadow. DEA Landsat Surface Reflectance data contains information about whether each pixel is likely to be free of clouds or shadow using an OA measurement band called oa_fmask. This band is calculated using the Fmask (Function of Mask) algorithm that classifies pixels using the following mutually exclusive numerical flags:

[9]:
# Display available oa_fmask flags
masking.describe_variable_flags(ls8_ds.oa_fmask)['values'][0]
[9]:
{'0': 'nodata',
 '1': 'valid',
 '2': 'cloud',
 '3': 'shadow',
 '4': 'snow',
 '5': 'water'}
[10]:
ls8_ds.oa_fmask.plot()
[10]:
<matplotlib.collections.QuadMesh at 0x7fb5481476a0>
../../_images/notebooks_DEA_datasets_DEA_Landsat_Surface_Reflectance_22_1.png

Of the six oa_fmask flags, 1 (clear), 4 (snow) and 5 (water) represent clear pixels that are not obscured by either cloud or cloud shadow. We can use these to create a mask that can be applied to other Landsat data to remove clouds or shadows before the data is analysed.

Note: Fmask’s “water” mask should typically not be relied to accurately analyse water using satellite data. Instead, consider loading data from the DEA Water Observations product instead.

[11]:
# Identify pixels that are either "valid", "water" or "snow"
cloud_free_mask = (masking.make_mask(ls8_ds.oa_fmask, fmask="valid") |
                   masking.make_mask(ls8_ds.oa_fmask, fmask="water") |
                   masking.make_mask(ls8_ds.oa_fmask, fmask="snow"))

# Apply the mask
ls8_masked = ls8_ds.where(cloud_free_mask)

If we plot our masked data, we can see that the areas of cloud and cloud shadow above have now been masked out (i.e. area of NaN values or white pixels):

[12]:
rgb(ls8_masked)
../../_images/notebooks_DEA_datasets_DEA_Landsat_Surface_Reflectance_26_0.png

Limitations of Fmask for cloud masking Landsat data

Fmask has limitations due to the complex nature of detecting natural phenomena such as cloud. For example, bright targets such as beaches, buildings and salt lakes often get mistaken for clouds. Edges and fringes of clouds also tend to be more transparent and can be missed by the cloud detection algorithm.

Because of these limitations, it can be advisable to treat individual FMask classifications with caution, and use analysis techniques that are robust to any potential classification errors (e.g. temporal median or geomedian composites that will supress the effect of false positive or negative cloud classifications).

Note: For more information about Fmask and other Surface Reflectance Observation Attributes, refer to the Observation Attributes product description listings for Landsat 5, 7 and 8.

Advanced

Native load

DEA Landsat Surface Reflectance data is originally processed from USGS Collection 1 data which is stored on file in different Universal Transverse Mercator (UTM) CRSs depending on the longitude of the data. There are 8 UTM zones (and therefore unique Landsat CRSs) that cover Australia:

UTM zones

Reprojecting these data into a single projection system like Australian Albers (EPSG:3577) will force datacube to resample the data to fit this new spatial grid. This can lead to artefacts in the resulting imagery that can affect its usefulness for precise applications (e.g. using Landsat to accurately map the boundary between land and water using sub-pixel techniques).

Because of this, it can be useful to load Landsat data in its “native” projection and spatial resolution — that is, load the data into the same spatial grid as it is stored on file without applying any reprojection or resampling. To achieve this, we need to first determine which of the UTM zones above apply for a given location, and use this information to load our data into that CRS. We can do this using DEA’s mostcommon_crs function which takes a datacube query, then identifies the most common CRS for all data that matches the query without having to load this data into memory first.

First, we will set up a query covering the same time and location previously used to load data:

[13]:
query = {
    'x': (149.05, 149.17),
    'y': (-35.25, -35.32),
    'time': ('2018-08-06', '2018-08-06')
}

We can now pass this to mostcommon_crs. In this example, the most common CRS is 'epsg:32655' or UTM Zone 55 (this matches the map above).

[14]:
# Determine the most common CRS for the query
native_crs = mostcommon_crs(dc, product='ga_ls8c_ard_3', query=query)
native_crs
[14]:
'epsg:32655'

We can now pass this CRS to dc.load()’s output_crs parameter to load Landsat data in its native spatial grid. The native resolution of all DEA Landsat Surface Reflectance data (with the exception of Landsat 8’s panchromatic band; see the Pansharpening notebook) is 30 m, so we can supply this to resolution directly without having to obtain it from the data.

In addition to output_crs and resolution, one final parameter is required to construct the spatial grid used for natively loading Landsat data: align. By default, datacube assumes that pixel edges are aligned such that x=0 and y=0 lines fall on pixel edges. However, Landsat data is stored on file with coordinates that define the centre - not the edge - of each pixel. When natively loading data, to ensure the spatial pixel grid created by datacube is exactly the same grid as used by Landsat imagery, we need to shift this grid by 15 m in both directions by specifying align=(15, 15). Otherwise, natively loaded pixels will be offset by half a pixel from their true location.

To determine whether you need to use align when loading DEA Landsat Surface Reflectance data, all the following must apply:

  • You are loading Landsat data from the ga_ls5t_ard_3, ga_ls7e_ard_3 and ga_ls8c_ard_3 products that define pixel coordinates by their centers rather than pixel edges

  • You want to load data in its native projection without a half pixel offset

  • You are supplying a native UTM zone CRS generated by mostcommon_crs or copied from a datacube dataset

  • You are supplying a native resolution (e.g. resolution=(-30, 30))

[15]:
# Load Landsat using native CRS, resolution and appropriate alignment
ls8_native = dc.load(product='ga_ls8c_ard_3',
                     measurements=['nbart_red', 'nbart_green', 'nbart_blue'],
                     output_crs=native_crs,
                     resolution=(-30, 30),
                     align=(15, 15),
                     group_by='solar_day',
                     **query)

Note: If a study area extends across the boundary of two UTM zones, it is possible that a single datacube query will return data with multiple CRSs. In this scenario, although mostcommon_crs will return the native CRS that will involve the least overall reprojection or resampling, some data will unavoidably need to be reprojected and resampled away from its native CRS and resolution. If this is the case, mostcommon_crs will print a warning to inform you that multiple CRSs were encounted for the query. To minimise the impacts of this reprojection, you can specify a resampling method that will only be used to transform the subset of satellite observations that do not match the CRS generated by mostcommon_crs (for more information about resampling, refer to the dc.load documentation).

Filtering by metadata

In addition to generic dc.load query parameters (e.g. product, measurements, time, x, y etc), DEA Landsat Surface Reflectance data contains a set of extra metadata fields that can be queried to filter data before it is loaded. Searchable metadata fields for a product can be listed using the code below:

[16]:
dataset = dc.find_datasets(product='ga_ls8c_ard_3', limit=1)[0]
dir(dataset.metadata)
[16]:
['cloud_cover',
 'creation_dt',
 'creation_time',
 'dataset_maturity',
 'eo_gsd',
 'eo_sun_azimuth',
 'eo_sun_elevation',
 'fmask_clear',
 'fmask_cloud_shadow',
 'fmask_snow',
 'fmask_water',
 'format',
 'gqa',
 'gqa_abs_iterative_mean_x',
 'gqa_abs_iterative_mean_xy',
 'gqa_abs_iterative_mean_y',
 'gqa_abs_x',
 'gqa_abs_xy',
 'gqa_abs_y',
 'gqa_cep90',
 'gqa_iterative_mean_x',
 'gqa_iterative_mean_xy',
 'gqa_iterative_mean_y',
 'gqa_iterative_stddev_x',
 'gqa_iterative_stddev_xy',
 'gqa_iterative_stddev_y',
 'gqa_mean_x',
 'gqa_mean_xy',
 'gqa_mean_y',
 'gqa_stddev_x',
 'gqa_stddev_xy',
 'gqa_stddev_y',
 'grid_spatial',
 'id',
 'instrument',
 'label',
 'lat',
 'lon',
 'measurements',
 'platform',
 'product_family',
 'region_code',
 'sources',
 'time']

Some of the most useful metadata fields for filtering Landsat data before loading it include:

  • gqa_iterative_mean_xy: An estimate of how accurately a satellite scene is georeferenced, calculated by comparing hundreds of candidate points against a reference image, then discarding outliers to obtain a more robust estimate. Values are given in pixel units, where a value of less than 1 indicates that satellite pixels are georeferenced to within a single pixel of their true location. This parameter can be used to ensure that all loaded data is aligned spatially through time. For example, to load only imagery with a geometric accuracy of less than 0.5 of a pixel (e.g. 15 m), we can add gqa_iterative_mean_xy=(0, 0.5) to our dc.load query:

[17]:
ls8_geo = dc.load(product='ga_ls8c_ard_3',
                  measurements=['nbart_red', 'nbart_green', 'nbart_blue'],
                  x=(149.05, 149.17),
                  y=(-35.25, -35.32),
                  time=('2018-01-01', '2018-03-01'),
                  gqa_iterative_mean_xy=(0, 0.5),
                  output_crs='EPSG:3577',
                  resolution=(-30, 30),
                  group_by='solar_day')
  • cloud_cover: This gives an estimate of the percentage (i.e. from 0 to 100) of each satellite scene that contains cloud (as measured by Fmask).

For example, to filter imagery to data from scenes with between 0 and 10% cloud, we can add cloud_cover=(0, 10) to our dc.load query:

[18]:
ls8_nocloud = dc.load(product='ga_ls8c_ard_3',
                      measurements=['nbart_red', 'nbart_green', 'nbart_blue'],
                      x=(149.05, 149.17),
                      y=(-35.25, -35.32),
                      time=('2018-01-01', '2018-03-01'),
                      cloud_cover=(0, 10),
                      output_crs='EPSG:3577',
                      resolution=(-30, 30),
                      group_by='solar_day')

# Plot imagery to verify they are mostly cloud-free
rgb(ls8_nocloud, col='time')
../../_images/notebooks_DEA_datasets_DEA_Landsat_Surface_Reflectance_40_0.png

Note: Metadata provides a single value for each entire Landsat satellite scene, not the specific location specified by dc.load. For example, cloud_cover provides an estimate of cloudiness for each full Landsat scene that intersects with your query, which may not reflect the cloudiness of the smaller sub-region you have requested using x and y. For a more accurate method for filtering satellite data by the cloudiness of a specific study area, refer to the Filtering to non-cloudy observations section of the Using load_ard notebook.


Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Last modified: November 2020

Compatible datacube version:

[19]:
print(datacube.__version__)
1.8.3

Tags

Browse all available tags on the DEA User Guide’s Tags Index

Tags: NCI compatible, sandbox compatible, dea datasets, landsat 5, landsat 7, landsat 8, dea_plotting, rgb, dea_datahandling, mostcommon_crs, masking, cloud masking, native load, metadata