Introduction to plotting ec0436e757384baa81e3a91f243ab9c2

Background

Data visualisation is an important component of working with Earth Observation data. The xarray Python package provides a range of straightforward data plotting options that allow users to quickly generate simple plots from multi-dimensional datasets. To generate more complex and informative plots, the DEA Notebooks repository also provides a custom plotting module with additional easy-to-use functionality.

Description

This introductory notebook demonstrates how to visualise DEA satellite data returned from running a datacube query. The notebook demonstrates commonly used xarray plotting methods, as well as custom functions provided in the dea_tools.plotting module.

Topics covered in this notebook include:

  1. View an area of interest prior to querying the datacube

  2. Querying the datacube and loading data

  3. Plotting single band data (e.g. a single satellite band)

    • Selecting and plotting individual timesteps

    • Plotting multiple timesteps

    • Customising plot appearance

  4. Plotting three-band true or false colour imagery

    • Plotting single timesteps

    • Plotting multiple timesteps

    • Customising plot appearance


Getting started

To run this introduction to plotting data loaded from the datacube, run all the cells in the notebook starting with the “Load packages” cell. For help with running notebook cells, refer back to the Jupyter Notebooks notebook.

Load packages

The first step is to run %matplotlib inline, which ensures figures plot correctly in the Jupyter notebook. Next, load the datacube package to enable loading data, and a selection of custom DEA functions from the dea_tools.plotting module.

[1]:
%matplotlib inline

import datacube
import sys

sys.path.insert(1, '../Tools/')
from dea_tools.plotting import display_map, rgb

Connect to the datacube

The next step is to connect to the datacube database. The resulting dc datacube object can then be used to load data. The app parameter is a unique name used to identify the notebook that does not have any effect on the analysis.

[2]:
dc = datacube.Datacube(app="05_Plotting")

Analysis parameters

The following variables are required to establish a query for this notebook: - lat_range: The latitude range to analyse (e.g. (-27.715, -27.755)). For reasonable load times, keep this to a range of ~0.1 degrees or less. - lon_range: The longitude range to analyse (e.g. (153.42, 153.46)). For reasonable load times, keep this to a range of ~0.1 degrees or less. - time_range: The date range to analyse (e.g. ("2013", "2017")).

[3]:
lat_range = (-27.58, -27.666)
lon_range = (153.3, 153.4)
time_range = ("2013", "2017")

View the queried location

Before running a query and extracting and analysing data, it is useful to double-check that your location is correct. The display_map() function shows your selected area as a red rectangle on an interactive map. Clicking on any point of the map will reveal the latitude and longitude coordinates of that point.

[4]:
display_map(x=lon_range, y=lat_range)
[4]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Query and view data

The variables specified above are used to load data from the DEA datacube through the dc.load(), which was introduced in the Loading data notebook. This introduction will continue to use the ls8_nbart_geomedian_annual product, as used in the Loading data notebook.

[5]:
ds = dc.load(product="ga_ls8c_nbart_gm_cyear_3",
             x=lon_range,
             y=lat_range,
             time=time_range)

ds
[5]:
<xarray.Dataset>
Dimensions:      (time: 5, y: 372, x: 375)
Coordinates:
  * time         (time) datetime64[ns] 2013-07-02T11:59:59.999999 ... 2017-07...
  * y            (y) float64 -3.165e+06 -3.165e+06 ... -3.176e+06 -3.176e+06
  * x            (x) float64 2.066e+06 2.066e+06 ... 2.077e+06 2.077e+06
    spatial_ref  int32 3577
Data variables:
    blue         (time, y, x) int16 236 244 236 231 224 ... 306 285 285 279 293
    green        (time, y, x) int16 353 369 359 351 343 ... 412 389 380 370 395
    red          (time, y, x) int16 284 287 279 274 256 ... 385 350 339 324 346
    nir          (time, y, x) int16 2480 2561 2579 2587 ... 1940 1956 2040 2213
    swir1        (time, y, x) int16 1071 1093 1056 1007 ... 1333 1215 1126 1179
    swir2        (time, y, x) int16 473 474 455 433 412 ... 758 648 575 520 534
    sdev         (time, y, x) float32 0.0006196 0.0001783 ... 0.001541 0.002267
    edev         (time, y, x) float32 116.4 83.31 79.72 ... 207.5 188.5 186.4
    bcdev        (time, y, x) float32 0.02475 0.01706 ... 0.04494 0.03912
    count        (time, y, x) int16 8 8 8 8 8 8 8 8 ... 14 14 14 14 14 15 14 14
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Plotting single band images

The xarray package provides built-in methods for plotting individual data variables or measurements.

To do this, identify the band to plot. In this example, the swir1 satellite band is used. To plot a single band, the data must be an xarray.DataArray (to revise the difference between xarray.Dataset and xarray.DataArray objects, refer back to the Loading data notebook):

[6]:
ds.swir1
[6]:
<xarray.DataArray 'swir1' (time: 5, y: 372, x: 375)>
array([[[1071, 1093, 1056, ...,   74,   72,   66],
        [1084, 1075, 1021, ...,   77,   73,   68],
        [1066, 1037,  971, ...,   77,   75,   68],
        ...,
        [1230, 1288,  900, ..., 1098, 1090, 1136],
        [ 822,  802,  758, ..., 1097, 1064, 1140],
        [1172,  946,  913, ..., 1075, 1014, 1079]],

       [[1130, 1138, 1112, ...,   56,   61,   97],
        [1195, 1144, 1077, ...,   57,   47,   83],
        [1223, 1146, 1055, ...,   48,   55,   55],
        ...,
        [1333, 1339,  915, ..., 1115, 1101, 1173],
        [ 901,  897,  815, ..., 1101, 1066, 1163],
        [1249, 1001,  912, ..., 1089, 1018, 1079]],

       [[1022, 1035, 1011, ...,   42,   38,   51],
        [1051, 1031,  965, ...,   41,   50,   53],
        [1063, 1005,  934, ...,   48,   50,   71],
        ...,
        [1209, 1334,  991, ..., 1080, 1044, 1098],
        [ 827,  818,  846, ..., 1073, 1044, 1110],
        [1285,  941,  895, ..., 1058,  996, 1059]],

       [[1086, 1083, 1049, ...,   38,   43,   38],
        [1126, 1088, 1027, ...,   38,   43,   37],
        [1148, 1075,  995, ...,   42,   43,   36],
        ...,
        [1267, 1344, 1005, ..., 1103, 1088, 1154],
        [ 884,  891,  966, ..., 1098, 1058, 1138],
        [1283,  994,  933, ..., 1104, 1038, 1096]],

       [[1081, 1078, 1031, ...,   56,   57,   57],
        [1076, 1064,  999, ...,   60,   61,   60],
        [1109, 1047,  972, ...,   61,   63,   62],
        ...,
        [1222, 1260,  901, ..., 1219, 1172, 1215],
        [ 886,  849,  904, ..., 1213, 1152, 1223],
        [1271,  996,  953, ..., 1215, 1126, 1179]]], dtype=int16)
Coordinates:
  * time         (time) datetime64[ns] 2013-07-02T11:59:59.999999 ... 2017-07...
  * y            (y) float64 -3.165e+06 -3.165e+06 ... -3.176e+06 -3.176e+06
  * x            (x) float64 2.066e+06 2.066e+06 ... 2.077e+06 2.077e+06
    spatial_ref  int32 3577
Attributes:
    units:         1
    nodata:        -999
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Selecting and plotting a single timestep

The returned object header above specifies that ds.swir1 is a xarray.DataArray with five timesteps (i.e. <xarray.DataArray 'swir1' (time: 5, y: 446, x: 450)>). To make a plot for a single timestep only, select the desired timestep using one of the following options:

  1. .isel(): This stands for “index selection”, which selects individual timesteps from a dataset based on the sequence of loaded timesteps. Counting in Python begins at 0, so to select the first timestep in the xarray.DataArray we can specify .isel(time=0):

[7]:
first_timestep = ds.swir1.isel(time=0)

first_timestep
[7]:
<xarray.DataArray 'swir1' (y: 372, x: 375)>
array([[1071, 1093, 1056, ...,   74,   72,   66],
       [1084, 1075, 1021, ...,   77,   73,   68],
       [1066, 1037,  971, ...,   77,   75,   68],
       ...,
       [1230, 1288,  900, ..., 1098, 1090, 1136],
       [ 822,  802,  758, ..., 1097, 1064, 1140],
       [1172,  946,  913, ..., 1075, 1014, 1079]], dtype=int16)
Coordinates:
    time         datetime64[ns] 2013-07-02T11:59:59.999999
  * y            (y) float64 -3.165e+06 -3.165e+06 ... -3.176e+06 -3.176e+06
  * x            (x) float64 2.066e+06 2.066e+06 ... 2.077e+06 2.077e+06
    spatial_ref  int32 3577
Attributes:
    units:         1
    nodata:        -999
    crs:           EPSG:3577
    grid_mapping:  spatial_ref
  1. .sel(): This selects data using real-world coordinate labels like time. For example, from the Coordinates section, the first timestep (i.e. the year 2013) is selected from the xarray.DataArray by specifying .sel(time='2013'):

[8]:
first_timestep = ds.swir1.sel(time='2013')

first_timestep
[8]:
<xarray.DataArray 'swir1' (time: 1, y: 372, x: 375)>
array([[[1071, 1093, 1056, ...,   74,   72,   66],
        [1084, 1075, 1021, ...,   77,   73,   68],
        [1066, 1037,  971, ...,   77,   75,   68],
        ...,
        [1230, 1288,  900, ..., 1098, 1090, 1136],
        [ 822,  802,  758, ..., 1097, 1064, 1140],
        [1172,  946,  913, ..., 1075, 1014, 1079]]], dtype=int16)
Coordinates:
  * time         (time) datetime64[ns] 2013-07-02T11:59:59.999999
  * y            (y) float64 -3.165e+06 -3.165e+06 ... -3.176e+06 -3.176e+06
  * x            (x) float64 2.066e+06 2.066e+06 ... 2.077e+06 2.077e+06
    spatial_ref  int32 3577
Attributes:
    units:         1
    nodata:        -999
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

After selecting a timestep, the .plot() method can be used to display the chosen swir1 data:

[9]:
first_timestep.plot()
[9]:
<matplotlib.collections.QuadMesh at 0x7f3541caa910>
../../../_images/notebooks_Beginners_guide_05_Plotting_22_1.png

Plotting multiple timesteps

It is often useful to produce plots for a single measurement across time, for example to compare change between satellite observations or summary datasets. To plot multiple images, skip the isel() step above and plot the entire xarray.DataArray directly.

To plot multiple timesteps in one figure, it is necessary to instruct the .plot() function to put each timestep in a different column. This is done by specifying .plot(col="time"):

[10]:
ds.swir1.plot(col="time")
[10]:
<xarray.plot.facetgrid.FacetGrid at 0x7f353ef5fe20>
../../../_images/notebooks_Beginners_guide_05_Plotting_24_1.png

This kind of plotting is called “facetted plotting”. For more information, refer to the xarray documentation

Customising plot appearance

The plots above are dark and difficult to see clearly. To improve the appearance of xarray plots, use the robust=True argument to optimise the plot colours by clipping extreme values or outliers. This will use the 2nd and 98th percentiles of the data to compute the color limits:

[11]:
ds.swir1.plot(col="time", robust=True)
[11]:
<xarray.plot.facetgrid.FacetGrid at 0x7f353ef73be0>
../../../_images/notebooks_Beginners_guide_05_Plotting_27_1.png

Plots can be further customised by adding custom colour maps/styles using the cmap parameter.

When choosing a colour map for a plot, it is important to choose a set of colours that are perceived logically by the human eye. The best colour maps are “perceptually uniform”: these colour maps increase logically from dark to light colours, where equal increases in lightness/darkness correspond to equal changes in data values. Some best-practice perceptually uniform colour maps include:

"viridis", "plasma", "inferno", "magma", "cividis"

For further reading about perceptually uniform colour maps in data visualisation, refer to the matplotlib documentation

It is also important to consider colour blindness when selecting a colour map. xarray supports many colour maps from the “colorbrewer” family of colour maps which are optimised for colour blindness. You can use the interactive online tool to browse all available colour maps, or choose from one of the following commonly used options:

"Greys", "Purples", "Blues", "Greens", "Oranges", "Reds",
"YlOrBr", "YlOrRd", "OrRd", "PuRd", "RdPu", "BuPu",
"GnBu", "PuBu", "YlGnBu", "PuBuGn", "BuGn", "YlGn"

For a full list of available colour maps you can refer to this list.

The example cell below plots the data with the perceptually uniform magma colour map:

[12]:
ds.swir1.plot(col="time", robust=True, cmap="magma")
[12]:
<xarray.plot.facetgrid.FacetGrid at 0x7f3541ed6730>
../../../_images/notebooks_Beginners_guide_05_Plotting_29_1.png

Plotting true or false colour RGB images

Although xarray makes it easy to plot single band images, plotting a three band colour photo-like image is less straightforward.

To make this easier, the dea-notebooks repository provides a custom rgb() function that is designed for plotting three band images. The rgb() function maps three data variables/measurements from the loaded dataset to the red, green and blue channels that are used to make a three-colour image.

Providing the red, green and blue measurements from a dataset will produce a true colour image (akin to how humans view the landscape). Providing nir, red and green measurements or any other set of three satellite bands from a dataset will produce a false colour image.

Hence, the rgb() function can be used to visualise the data returned by a query. It requires the minimum input of:

  • ds: The xarray.Dataset object

  • bands: Three bands for display (these must be measurements found in the dataset)

  • index: The timestep to view, default is 0

Plotting a single timestep

The time dimension of an xarray.Dataset describes how many timesteps exist for the loaded data. In the rgb() function, the index variable is asking for which timesteps to view (similar to the isel() example above). Remember: counting in Python begins at 0 so to view the earliest timesteps set index=0:

[13]:
# View a red, green, blue (true colour) image of the first timestep
rgb(ds, bands=["red", "green", "blue"], index=0)
../../../_images/notebooks_Beginners_guide_05_Plotting_32_0.png

It is possible to change the input bands to plot a false colour image, which can provide different insights in a landscape. The false colour band combination (swir1, nir, green) emphasises growing vegetation in green, and water in deep blue:

[14]:
# View a swir1, nir, green (false colour) image of the first timestep
rgb(ds, bands=['swir1', 'nir', 'green'], index=0)
../../../_images/notebooks_Beginners_guide_05_Plotting_34_0.png

Plotting multiple timesteps

As discussed in the single band example above, it can be useful to visualise multiple timesteps in a single plot (e.g. to compare change over time).

The rgb() function can also do this, as long as a list of timesteps to view is provided to the index argument, e.g. index=[X1, X2, ...]. The example cell below plots the first and fifth image in the dataset using index=[0, 4] (remembering that counting in Python starts at 0):

[15]:
# View a true colour image for the first and fifth timesteps
rgb(ds, bands=['red', 'green', 'blue'], index=[0, 4])
../../../_images/notebooks_Beginners_guide_05_Plotting_36_0.png

It is also possible to use rgb() to plot all timesteps in a dataset using the col="time" syntax demonstrated in the single band example above:

[16]:
# Plot all timesteps in the dataset
rgb(ds, bands=['red', 'green', 'blue'], col="time")
../../../_images/notebooks_Beginners_guide_05_Plotting_38_0.png

Customising plot appearance

By default, rgb() generates plots with robust=True to improve the appearance of the images by clipping out the darkest and brightest 2% of pixels, using the 2nd and 98th percentiles of the data to compute the colour limits

If this default provides poor results, the plot’s colour stretch can be customised using the percentile_stretch parameter. This clips the most extreme minimum and maximum values in the dataset, improving the contrast and appearance of the plot.

For example, specifying percentile_stretch=[0.05, 0.95] will clip out the darkest and brightest 5% of pixels, focusing the colour stretch on the remaining 90% of less extreme values:

[17]:
rgb(ds,
    bands=['red', 'green', 'blue'],
    index=0,
    percentile_stretch=[0.05, 0.95])

../../../_images/notebooks_Beginners_guide_05_Plotting_40_0.png

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on GitHub.

Last modified: December 2023

Compatible datacube version:

[18]:
print(datacube.__version__)
1.8.6

Tags

Tags: sandbox compatible, NCI compatible, landsat 8, annual geomedian, plotting, dea_plotting, rgb, beginner, data visualisation, colour maps, isel, sel