# Introduction to Digital Earth Australia <img align="right" src="../Supplementary_data/dea_logo.jpg">

* **[Sign up to the DEA Sandbox](https://app.sandbox.dea.ga.gov.au/)** to run this notebook interactively from a browser
* **Compatibility**: Notebook currently compatible with both the `NCI` and `DEA Sandbox` environments
* **Prerequisites**:  Users of this notebook should have a basic understanding of:
    * How to run a [Jupyter notebook](01_Jupyter_notebooks.ipynb)

## Background
[Digital Earth Australia](https://www.ga.gov.au/dea) (DEA) is a digital platform that catalogues large amounts of Earth observation data covering continental Australia.
It is underpinned by the [Open Data Cube](https://www.opendatacube.org/) (ODC), an open-source software package with an ever-growing number of users, contributors and implementations.

The ODC and DEA platforms are designed to:

* Catalogue large amounts of Earth observation data
* Provide a Python-based API for high-performance querying and data access
* Make it easy for users to perform exploratory data analysis
* Allow scalable continent-scale processing of the stored data
* Track the provenance of data to allow for quality control and updates

The DEA program catalogues data from a range of satellite sensors, and has adopted processes and terminology that users need to understand in order to query and use its datasets efficiently.
This notebook introduces these important concepts and forms the basis of understanding for the remainder of the notebooks in this beginner's guide.
Resources to further explore these concepts are recommended at the end of the notebook.

## Description
This introduction to DEA will briefly introduce the ODC and review the types of data catalogued in the DEA platform.
It will also cover commonly-used terminology for measurements within product datasets.
Topics covered include:

* A brief introduction to the ODC
* A review of the satellite sensors that provide data to DEA
* An introduction to analysis ready data and the processes used to produce it
* DEA's data naming conventions
* Coordinate reference system
* Derived products
    
***

## Open Data Cube

![Open Data Cube logo](../Supplementary_data/02_DEA/odc.png)

The [Open Data Cube](https://www.opendatacube.org/) (ODC) is an open-source software package for organising and analysing large quantities of Earth observation data.
At its core, the Open Data Cube consists of a database where data is stored, along with commands to load, view and analyse that data.
This functionality is delivered by the [datacube-core](https://github.com/opendatacube/datacube-core) open-source Python library.
The library is designed to enable and support:

* Large-scale workflows on high performance computing infrastructures
* Exploratory data analysis
* Cloud-based services
* Standalone applications

There are a number of existing implementations of the ODC, including DEA and [Digital Earth Africa](https://www.digitalearthafrica.org/).
More information can be found in the [Open Data Cube Manual](https://datacube-core.readthedocs.io/en/latest/index.html).


## Satellite datasets in DEA
Digital Earth Australia catalogues data from a range of satellite sensors. 
The earliest datasets of optical satellite imagery in DEA date from 1986.
DEA includes data from:

* [Landsat 5 TM](https://www.usgs.gov/landsat-missions/landsat-5) (LS5 TM), operational between March 1984 and January 2013
* [Landsat 7 ETM+](https://www.usgs.gov/landsat-missions/landsat-7) (LS7 ETM+), operational between April 1999 and April 2022
* [Landsat 8 OLI](https://www.usgs.gov/landsat-missions/landsat-8) (LS8 OLI), operational since February 2013
* [Landsat 9 OLI](https://www.usgs.gov/landsat-missions/landsat-9) (LS9 OLI), operational since September 2021
* [Sentinel 2A MSI](https://sentinel.esa.int/web/sentinel/missions/sentinel-2) (S2A MSI), operational since June 2015
* [Sentinel 2B MSI](https://sentinel.esa.int/web/sentinel/missions/sentinel-2) (S2B MSI), operational since March 2017
* [Sentinel 2C MSI](https://sentinel.esa.int/web/sentinel/missions/sentinel-2) (S2C MSI), operational since January 2025

Landsat missions are jointly operated by the United States Geological Survey (USGS) and National Aeronautics and Space Administration (NASA).
Sentinel missions are operated by the European Space Agency (ESA).
One major difference between the two programs is the spatial resolution: each Landsat pixel represents 30 x 30 m on the ground while each Sentinel-2 pixel represents 10 x 10 m to 60 x 60 m depending on the spectral band.
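
To get a feel for what this difference means in practice, the following sketch counts how many pixels are needed to cover one square kilometre at each resolution. The function name is illustrative only, and the calculation ignores projection and grid alignment effects.

```python
def pixels_per_square_km(pixel_size_m):
    """Number of square pixels of the given side length that tile 1 km x 1 km."""
    pixels_per_side = 1000 / pixel_size_m
    return pixels_per_side ** 2


# Nominal pixel sizes quoted in the text above
examples = [
    ("Landsat", 30),
    ("Sentinel-2 (finest bands)", 10),
    ("Sentinel-2 (coarsest bands)", 60),
]

for sensor, size_m in examples:
    count = pixels_per_square_km(size_m)
    print(f"{sensor}: {size_m} m pixels, about {count:,.0f} pixels per square km")
```

A 10 m band therefore needs nine times as many pixels as a 30 m band to cover the same area, which affects both the detail available and the amount of data that has to be loaded.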

### Spectral bands
All of the datasets listed above are captured by multispectral satellites.
This means that the satellites measure primarily light that is reflected from the Earth's surface in discrete sections of the electromagnetic spectrum, known as *spectral bands*. 
Figure 1 shows the spectral bands for recent Landsat and Sentinel-2 sensors, allowing a direct comparison of how each sensor samples the overall electromagnetic spectrum.
Landsat 5 TM is not displayed in this image; for reference, it measured light in seven bands that covered the same regions as bands 1 to 7 on Landsat 7 ETM+.

![Image](../Supplementary_data/02_DEA/Landsat9_Auto2E.jpeg)

> **Figure 1:** The bands that are detected by each of the satellites are shown in the numbered boxes and the width of each box represents the spectral range that band detects.
The bands are overlaid on the percentage transmission of each wavelength returned to the atmosphere from the Earth relative to the amount of incoming solar radiation. 
The y-axis has no bearing on the comparison of the satellite sensors [[source]](https://directory.eoportal.org/web/eoportal/satellite-missions/l/landsat-9).

Figure 1 highlights that the numbering of the bands relative to the detected wavelengths is inconsistent between sensors.
As an example, in the green region of the electromagnetic spectrum (around 560 nm), Landsat 5 TM and Landsat 7 ETM+ detect a wide green region called band 2, whereas Landsat 8 OLI detects a slightly narrower region and calls it band 3.
Finally, Sentinel-2 MSI detects a narrow green region but also calls it band 3.
Consequently, when working with different sensors, it is important to understand the differences in their bands, and any impact this could have on an analysis.
To promote awareness of these differences, DEA band naming is based on both the spectral band name and sample region.
The naming convention will be covered in more detail in the [DEA band naming conventions section](#DEA-band-naming-conventions).

## Analysis Ready Data

Digital Earth Australia produces Analysis Ready Data (ARD) for each of the sensors listed above.
The [ARD standard](http://ceos.org/ard/) for satellite data requires that data have undergone a number of processing steps, along with the creation of additional attributes for the data.
DEA's ARD datasets include the following characteristics:

* **Geometric correction:** This includes establishing ground position, accounting for terrain (orthorectification) and ground control points, and assessing absolute position accuracy. 
Geometric calibration means that imagery is positioned accurately on the Earth's surface and stacked consistently so that sequential observations can be used to track meaningful change over time.
Adjustments for ground variability typically use a Digital Elevation Model (DEM).
* **Surface reflectance correction:** This includes adjustments for sensor/instrument gains, biases and offsets, as well as adjustments for terrain illumination and sensor viewing angle with respect to the pixel position on the surface.
Once satellite data is processed to surface reflectance, pixel values from the same sensor can be compared consistently both spatially and over time.
* **Observation attributes:** Per-pixel metadata such as quality flags and content attribution that enable users to make informed decisions about the suitability of the products for their use. For example, clouds, cloud shadows, missing data, saturation and water are common pixel level attributes.
* **Metadata:** Dataset metadata including the satellite, instrument, acquisition date and time, spatial boundaries, pixel locations, mode, processing details, spectral or frequency response and grid projection.

### Surface reflectance

Optical sensors, such as those on the Landsat and Sentinel-2 satellites, measure light that has come from the sun and been reflected by the Earth's surface.
The sensor measures the intensity of light in each of its spectral bands (known as "radiance").
The intensity of this light is affected by many factors including the angle of the sun relative to the ground, the angle of the sensor relative to the ground, and how the light interacts with the Earth's atmosphere on its way to the sensor. 
Because radiance can be affected by so many factors, it is typically more valuable to determine how much light was originally reflected at the ground level.
This is known as bottom-of-atmosphere **surface reflectance**.
Surface reflectance can be calculated by using robust physical models to correct the observed radiance values based on atmospheric conditions, the angle of the sun, sensor geometry and local topography or terrain.
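
As a simplified illustration of why raw radiance is hard to compare between scenes, the sketch below applies the standard top-of-atmosphere reflectance conversion, which normalises radiance by the incoming solar irradiance and the solar zenith angle. This is not the physical model DEA uses to produce surface reflectance (it ignores the atmosphere, sensor geometry and terrain), and the input numbers are made up for illustration.

```python
import math


def toa_reflectance(radiance, solar_irradiance, solar_zenith_deg, earth_sun_distance_au=1.0):
    """Convert at-sensor radiance to top-of-atmosphere reflectance.

    radiance and solar_irradiance must use matching spectral units
    (e.g. W m^-2 sr^-1 um^-1 and W m^-2 um^-1 respectively).
    """
    cos_zenith = math.cos(math.radians(solar_zenith_deg))
    return (math.pi * radiance * earth_sun_distance_au ** 2) / (solar_irradiance * cos_zenith)


# Hypothetical values: the same measured radiance under a high sun and a low sun
high_sun = toa_reflectance(radiance=50.0, solar_irradiance=1500.0, solar_zenith_deg=20.0)
low_sun = toa_reflectance(radiance=50.0, solar_irradiance=1500.0, solar_zenith_deg=60.0)

print(f"Sun 20 degrees from zenith: reflectance {high_sun:.3f}")
print(f"Sun 60 degrees from zenith: reflectance {low_sun:.3f}")
```

The same radiance implies a more reflective surface when the sun is low, because less light was arriving in the first place. Surface reflectance processing extends this idea with atmospheric, viewing-angle and terrain corrections.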

There are many approaches to satellite surface reflectance correction and DEA opts to use two: NBAR and NBART.
**Users will choose which of these measurements to load when querying the DEA datacube and so it is important to understand their major similarities and differences.**

#### NBAR
NBAR stands for *Nadir-corrected BRDF Adjusted Reflectance*, where BRDF stands for *Bidirectional reflectance distribution function*.
The approach involves atmospheric correction to compute bottom-of-atmosphere radiance, and bi-directional reflectance modelling to remove the effects of topography and angular variation in reflectance.
NBAR can be useful for analyses in extremely flat areas not affected by terrain shadow, and for producing attractive data visualisations that are not affected by NBART's nodata gaps (see below).

#### NBART
NBART has the same features as NBAR but includes an additional *terrain illumination* reflectance correction. Because it takes surface topography into account, it is considered to be actual surface reflectance.
Terrain affects optical satellite images in a number of ways; for example, slopes facing the sun receive more sunlight and appear brighter compared to those facing away from the sun.
To obtain comparable surface reflectance from satellite images covering hilly areas, it is therefore necessary to process the images to reduce or remove the topographic effect.
This correction is performed with a Digital Surface Model (DSM) that has been resampled to the same resolution as the satellite data being corrected.
NBART is typically the default choice for most analyses as removing terrain illumination and shadows allows changes in the landscape to be compared more consistently across time. 
However, it can be prone to distortions in extremely flat areas if noisy elevation values exist in the DSM.
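
In practice, the choice between NBAR and NBART comes down to which measurement names are requested when loading data. The sketch below builds the keyword arguments for a `datacube.Datacube.load()` call. The product name `ga_ls8c_ard_3`, the location, the dates and the output grid are illustrative assumptions rather than recommendations, and `build_query` and `load_example` are hypothetical helper names used only here. Loading data is covered properly in the [Loading data](04_Loading_data.ipynb) notebook.

```python
def build_query(correction="nbart"):
    """Return keyword arguments for a dc.load() call using the chosen correction."""
    if correction not in ("nbar", "nbart"):
        raise ValueError("correction must be 'nbar' or 'nbart'")
    return {
        "product": "ga_ls8c_ard_3",  # assumed Landsat 8 ARD product name
        "measurements": [f"{correction}_{band}" for band in ("red", "green", "blue")],
        "x": (149.05, 149.20),  # longitude range (illustrative)
        "y": (-35.35, -35.25),  # latitude range (illustrative)
        "time": ("2020-01-01", "2020-01-31"),
        "output_crs": "EPSG:3577",  # assumed output grid for this example
        "resolution": (-30, 30),
        "group_by": "solar_day",
    }


def load_example(correction="nbart"):
    """Load data with the query above; needs an environment with datacube access."""
    import datacube  # imported here so build_query() works without datacube installed

    dc = datacube.Datacube(app="DEA_intro_example")
    return dc.load(**build_query(correction))


print(build_query("nbart")["measurements"])
print(build_query("nbar")["measurements"])
```

Switching between the two corrections only changes the prefix of each measurement name; the rest of the query stays the same.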

![Comparison between NBAR and NBART](../Supplementary_data/02_DEA/nbar_nbart_animation.gif)


**Figure 2:** The animation above demonstrates how the NBART correction results in a significantly more two-dimensional looking image that is less affected by terrain illumination and shadow.
Black pixels in the NBART image represent areas of deep terrain shadow that can't be corrected as they're determined not to be viewable by either the sun or the satellite. 
These are represented by -999 `nodata` values in the data.
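
Because -999 is a sentinel rather than a real reflectance value, it needs to be excluded before computing statistics. The following is a minimal pure-Python sketch with made-up pixel values; real workflows would normally apply the same idea to whole arrays rather than lists.

```python
import math

NODATA = -999  # nodata value described above


def mask_nodata(values, nodata=NODATA):
    """Replace nodata sentinels with NaN so they can be skipped in calculations."""
    return [math.nan if value == nodata else float(value) for value in values]


def nanmean(values):
    """Mean of the values that are not NaN (NaN if no valid values remain)."""
    valid = [value for value in values if not math.isnan(value)]
    return sum(valid) / len(valid) if valid else math.nan


# Hypothetical NBART values for five pixels, two of them in deep terrain shadow
pixels = [1200, -999, 1350, 1100, -999]

naive_mean = sum(pixels) / len(pixels)      # wrongly treats -999 as data
masked_mean = nanmean(mask_nodata(pixels))  # uses the three valid pixels only

print(f"Mean including nodata: {naive_mean:.1f}")
print(f"Mean excluding nodata: {masked_mean:.1f}")
```

Leaving the sentinel in place drags the average far below any of the valid observations, which is why nodata values should be masked before analysis.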


### Observation Attributes

The *Observation Attributes (OA)* are a suite of measurements included in DEA's analysis ready datasets.
They are an assessment of each image pixel to determine if it is an unobscured, unsaturated observation of the Earth's surface, along with whether the pixel is represented in each spectral band. 
The OA product allows users to exclude pixels that do not meet the quality criteria for their analysis.
The capacity to automatically exclude such pixels is essential for analysing any change over time, since poor-quality pixels can significantly alter the perceived change over time.
The most common use of OA is for cloud masking, where users can choose to remove images that have too much cloud, or ignore the clouds within each satellite image.
A demonstration of how to use cloud masking can be found in the [masking data](../How_to_guides/Masking_data.ipynb) notebook.

The OA suite of measurements includes the following pixel-based observation attributes:

* Null pixels
* Clear pixels
* Cloud pixels
* Cloud shadow pixels
* Snow pixels
* Water pixels
* Terrain shaded pixels
* Spectrally contiguous pixels (i.e. whether a pixel contains data in every spectral band)
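
As a rough sketch of how the pixel categories listed above can be used, the example below treats the classification as a small grid of integer codes, computes the fraction of usable pixels and masks a band accordingly. The code values (0 = null, 1 = clear, 2 = cloud, 3 = cloud shadow, 4 = snow, 5 = water) follow the convention commonly used for DEA's Fmask-based classification measurement, but they are stated here as an assumption and should be checked against the product metadata. The grids themselves are made up.

```python
# Assumed category codes; confirm against the product's flag definitions
FMASK_CODES = {"null": 0, "clear": 1, "cloud": 2, "cloud_shadow": 3, "snow": 4, "water": 5}

# Categories treated as usable in this example; adjust to suit the analysis
GOOD = ("clear", "snow", "water")


def good_fraction(classification, good=GOOD):
    """Fraction of non-null pixels whose category is listed in `good`."""
    good_codes = {FMASK_CODES[name] for name in good}
    observed = [code for row in classification for code in row if code != FMASK_CODES["null"]]
    if not observed:
        return 0.0
    return sum(code in good_codes for code in observed) / len(observed)


def apply_mask(band, classification, good=GOOD, fill=None):
    """Keep band values where the pixel category is in `good`; use `fill` elsewhere."""
    good_codes = {FMASK_CODES[name] for name in good}
    return [
        [value if code in good_codes else fill for value, code in zip(band_row, class_row)]
        for band_row, class_row in zip(band, classification)
    ]


# Hypothetical 3 x 4 classification grid and matching red-band values
scene_classes = [
    [1, 1, 2, 2],
    [1, 5, 3, 2],
    [0, 1, 1, 4],
]
scene_red = [
    [900, 950, 4200, 4300],
    [880, 300, 450, 4100],
    [-999, 910, 930, 5000],
]

print(f"Usable fraction: {good_fraction(scene_classes):.2f}")
print(apply_mask(scene_red, scene_classes)[0])
```

A whole image can be discarded when its usable fraction falls below a chosen threshold, or the per-pixel mask can be applied to keep only the unobscured parts of each image.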

Also included is a range of pixel-based attributes related to the satellite, solar and sensing geometries:

* Solar zenith
* Solar azimuth
* Satellite view
* Incident angle
* Exiting angle
* Azimuthal incident
* Azimuthal exiting
* Relative azimuth
* Timedelta


## Data format

### DEA band naming conventions

To account for the various available satellite datasets, DEA uses a band naming convention to help distinguish datasets that come from the different sensors. 
The band names consist of the applied surface reflectance correction (NBAR or NBART) and the spectral region detected by the satellite.
This removes all reference to the sensor band numbering scheme (e.g. [Figure 1](#Spectral-bands)) and assumes that users understand that the spectral region described by a DEA band name is only approximately the same between sensors, not identical.

**Table 1** summarises the DEA band naming terminology for the spectral regions common to both Landsat and Sentinel-2, coupled with the corresponding NBAR and NBART band names for the available sensors:

|Spectral region|DEA measurement name (NBAR)|DEA measurement name (NBART)|Landsat 5<br>TM|Landsat 7<br>ETM+|Landsat 8<br>OLI|Sentinel-2<br>MSI|
|----|----|----|----|----|----|----|
|Coastal aerosol|nbar_coastal_aerosol|nbart_coastal_aerosol|||1|1|
|Blue|nbar_blue|nbart_blue|1|1|2|2|
|Green|nbar_green|nbart_green|2|2|3|3|
|Red|nbar_red|nbart_red|3|3|4|4|
|NIR (Near infra-red)|nbar_nir (Landsat)<br>nbar_nir_1 (Sentinel-2)|nbart_nir (Landsat) <br>nbart_nir_1 (Sentinel-2)|4|4|5|8|
|SWIR 1 (Short wave infra-red 1)|nbar_swir_1 (Landsat) <br>nbar_swir_2 (Sentinel-2) |nbart_swir_1 (Landsat) <br>nbart_swir_2 (Sentinel-2)|5|5|6|11|
|SWIR 2 (Short wave infra-red 2)|nbar_swir_2 (Landsat) <br>nbar_swir_3 (Sentinel-2) |nbart_swir_2 (Landsat) <br>nbart_swir_3 (Sentinel-2)|7|7|7|12|

> **Note:** Be aware that NIR and SWIR band names differ between Landsat and Sentinel-2 due to the different number of these bands available in Sentinel-2. The `nbar_nir` Landsat band corresponds to the spectral region covered by Sentinel-2's `nbar_nir_1` band, the `nbar_swir_1` Landsat band corresponds to Sentinel-2's `nbar_swir_2` band, and the `nbar_swir_2` Landsat band corresponds to Sentinel-2's `nbar_swir_3` band.
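
The naming rules in Table 1 and the note above can be written down as a small lookup. This is only a sketch: the `dea_measurement_name` helper and the `"landsat"` and `"sentinel2"` keys are hypothetical names, and it covers just the spectral regions listed in Table 1.

```python
# Region suffixes from Table 1; only the NIR and SWIR names differ between programs.
# Note that coastal aerosol is not available on Landsat 5 TM or Landsat 7 ETM+.
SUFFIXES = {
    "coastal_aerosol": {"landsat": "coastal_aerosol", "sentinel2": "coastal_aerosol"},
    "blue": {"landsat": "blue", "sentinel2": "blue"},
    "green": {"landsat": "green", "sentinel2": "green"},
    "red": {"landsat": "red", "sentinel2": "red"},
    "nir": {"landsat": "nir", "sentinel2": "nir_1"},
    "swir_1": {"landsat": "swir_1", "sentinel2": "swir_2"},
    "swir_2": {"landsat": "swir_2", "sentinel2": "swir_3"},
}


def dea_measurement_name(region, sensor, correction="nbart"):
    """Build a DEA measurement name such as 'nbart_nir_1' from Table 1."""
    if correction not in ("nbar", "nbart"):
        raise ValueError("correction must be 'nbar' or 'nbart'")
    try:
        suffix = SUFFIXES[region][sensor]
    except KeyError:
        raise ValueError(f"unknown region/sensor combination: {region!r}, {sensor!r}") from None
    return f"{correction}_{suffix}"


# The same spectral region maps to different names on each program
for region in ("nir", "swir_1", "swir_2"):
    print(region, dea_measurement_name(region, "landsat"), dea_measurement_name(region, "sentinel2"))
```

A lookup like this is useful when the same analysis needs to run on both Landsat and Sentinel-2 data, since the spectral region stays fixed while the measurement name changes.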


### DEA satellite data projection and holdings
Keeping with the practices of the Landsat and Sentinel satellite programs, all DEA satellite datasets are projected using the **Universal Transverse Mercator (UTM)** coordinate reference system.
The World Geodetic System 84 (WGS84) ellipsoid is used to model the UTM projection. All data queries default to the WGS84 datum's coordinate reference system unless specified otherwise.
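
UTM divides the globe into 60 zones that are each 6 degrees of longitude wide, so the zone (and the matching WGS84 EPSG code) can be estimated from a point's coordinates. The sketch below uses the standard zone formula and ignores the special-case zones around Norway and Svalbard. The `utm_zone` and `utm_epsg` helpers are illustrative names, not part of any DEA library.

```python
import math


def utm_zone(longitude):
    """Return the UTM zone number (1 to 60) containing the given longitude."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    # Zones are 6 degrees wide, starting at 180 degrees west; 180 east falls in zone 60
    return min(int(math.floor((longitude + 180.0) / 6.0)) + 1, 60)


def utm_epsg(longitude, latitude):
    """Return the EPSG code of the WGS84 UTM zone for a longitude/latitude pair."""
    base = 32600 if latitude >= 0 else 32700  # northern vs southern hemisphere codes
    return base + utm_zone(longitude)


# Approximate coordinates of two Australian cities
for city, lon, lat in [("Canberra", 149.13, -35.28), ("Perth", 115.86, -31.95)]:
    print(f"{city}: UTM zone {utm_zone(lon)}, EPSG:{utm_epsg(lon, lat)}")
```

Because continental Australia spans several UTM zones, datasets from different parts of the country are stored in different projections, which is one reason an output coordinate reference system is often specified when loading data.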

By default, the spatial extent of the DEA data holdings is approximately the Australian coastal shelf. 
The actual extent varies based on the sensor and product. 
The current extents of each DEA product can be viewed using the interactive [DEA Explorer](https://explorer.dea.ga.gov.au).

## Derived products

![](../Supplementary_data/02_DEA/dea_products.jpg)

In addition to ARD satellite data, DEA generates a range of products that are derived from Landsat or Sentinel-2 surface reflectance data.
These products have been developed to characterise and monitor different aspects of Australia's natural and built environment, such as mapping the distribution of water and vegetation across the landscape through time.

For more information about DEA's derived products, refer to the [DEA website](http://www.ga.gov.au/dea/products), the [Data Products page of the DEA Knowledge Hub](https://knowledge.dea.ga.gov.au/), or the "DEA_products" notebooks on the DEA Sandbox (e.g. [Introduction to DEA Surface Reflectance (Landsat, Collection 3)](../DEA_products/DEA_Landsat_Surface_Reflectance.ipynb)).

## Recommended next steps
For more detailed information on the concepts introduced in this notebook, please see the [DEA Knowledge Hub](https://knowledge.dea.ga.gov.au/) and [Open Data Cube Manual](https://datacube-core.readthedocs.io/en/latest/).
For more information on the development of the DEA platform, please see [Dhu et al. 2017](https://doi.org/10.1080/20964471.2017.1402490).

To continue with the beginner's guide, work through the notebooks in the following order:

1. [Jupyter Notebooks](01_Jupyter_notebooks.ipynb)
2. **Digital Earth Australia (this notebook)**
3. [Products and Measurements](03_Products_and_measurements.ipynb)
4. [Loading data](04_Loading_data.ipynb)
5. [Plotting](05_Plotting.ipynb)
6. [Performing a basic analysis](06_Basic_analysis.ipynb)
7. [Introduction to Numpy](07_Intro_to_numpy.ipynb)
8. [Introduction to Xarray](08_Intro_to_xarray.ipynb)
9. [Parallel processing with Dask](09_Parallel_processing_with_Dask.ipynb)

Once you have worked through the beginner's guide, you can join advanced users by exploring:

* The demonstration of how to use cloud masking in the [masking data](../How_to_guides/Masking_data.ipynb) notebook.
* The "DEA_products" directory in the repository, where you can explore DEA products in depth.
* The "How_to_guides" directory, which contains a recipe book of common techniques and methods for analysing DEA data.
* The "Real_world_examples" directory, which provides more complex workflows and analysis case studies.

***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Australia data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Discord chat](https://discord.com/invite/4hhBQVas5U) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [GitHub](https://github.com/GeoscienceAustralia/dea-notebooks).

**Last modified:** February 2025

## Tags
<!-- Browse all available tags on the DEA User Guide's [Tags Index](https://knowledge.dea.ga.gov.au/genindex/) -->