Introduction to plotting

Background

Data visualisation is an important component of working with Earth Observation data. The xarray Python package provides a range of straightforward data plotting options that allow users to quickly generate simple plots from multi-dimensional datasets. To generate more complex and informative plots, the DEA Notebooks repository also provides a custom plotting module with additional easy-to-use functionality.

Description

This introductory notebook demonstrates how to visualise DEA satellite data returned from running a datacube query. The notebook demonstrates commonly used xarray plotting methods, as well as custom functions provided in the dea_plotting script.

Topics covered in this notebook include:

  1. View an area of interest prior to querying the datacube

  2. Querying the datacube and loading data

  3. Plotting single band data (e.g. a single satellite band)

    • Selecting and plotting individual timesteps

    • Plotting multiple timesteps

    • Customising plot appearance

  4. Plotting three-band true or false colour imagery

    • Plotting single timesteps

    • Plotting multiple timesteps

    • Customising plot appearance


Getting started

To run this introduction to plotting data loaded from the datacube, run all the cells in the notebook starting with the “Load packages” cell. For help with running notebook cells, refer back to the Jupyter Notebooks notebook.

Load packages

The first step is to run %matplotlib inline, which ensures figures plot correctly in the Jupyter notebook. Next, load the datacube package, which enables loading data and a selection of custom DEA functions from the dea_plotting script (located inside the dea-notebooks repository).

[1]:
%matplotlib inline

import datacube
import sys

sys.path.append("../Scripts")
from dea_plotting import display_map
from dea_plotting import rgb

Connect to the datacube

The next step is to connect to the datacube database. The resulting dc datacube object can then be used to load data. The app parameter is a unique name used to identify the notebook that does not have any effect on the analysis.

[2]:
dc = datacube.Datacube(app="05_Plotting")

Analysis parameters

The following variables are required to establish a query for this notebook: - lat_range: The latitude range to analyse (e.g. (-27.715, -27.755)). For reasonable load times, keep this to a range of ~0.1 degrees or less. - lon_range: The longitude range to analyse (e.g. (153.42, 153.46)). For reasonable load times, keep this to a range of ~0.1 degrees or less. - time_range: The date range to analyse (e.g. ("2013", "2017")).

[3]:
lat_range = (-27.58, -27.666)
lon_range = (153.3, 153.4)
time_range = ("2013", "2017")

View the queried location

Before running a query and extracting and analysing data, it is useful to double-check that your location is correct. The display_map() function shows your selected area as a red rectangle on an interactive map. Clicking on any point of the map will reveal the latitude and longitude coordinates of that point.

[4]:
display_map(x=lon_range, y=lat_range)
[4]:

Query and view data

The variables specified above are used to load data from the DEA datacube through the dc.load(), which was introduced in the Loading data notebook. This introduction will continue to use the ls8_nbart_geomedian_annual product, as used in the Loading data notebook.

[5]:
ds = dc.load(product="ls8_nbart_geomedian_annual",
             x=lon_range,
             y=lat_range,
             time=time_range)

print(ds)
<xarray.Dataset>
Dimensions:  (time: 5, x: 450, y: 446)
Coordinates:
  * time     (time) datetime64[ns] 2013-01-01 2014-01-01 ... 2017-01-01
  * y        (y) float64 -3.165e+06 -3.165e+06 ... -3.176e+06 -3.176e+06
  * x        (x) float64 2.066e+06 2.066e+06 2.066e+06 ... 2.077e+06 2.077e+06
Data variables:
    blue     (time, y, x) int16 247 258 244 246 231 215 ... 291 287 277 296 324
    green    (time, y, x) int16 365 384 368 363 354 320 ... 394 381 371 397 431
    red      (time, y, x) int16 292 303 288 289 280 237 ... 343 344 315 340 385
    nir      (time, y, x) int16 2492 2614 2572 2559 2479 ... 2049 2140 2226 2351
    swir1    (time, y, x) int16 1064 1085 1040 971 947 ... 1181 1069 1107 1276
    swir2    (time, y, x) int16 468 471 450 419 413 360 ... 587 548 485 494 584
Attributes:
    crs:      EPSG:3577

Plotting single band images

The xarray package provides built-in methods for plotting individual data variables or measurements.

To do this, identify the band to plot. In this example, the swir1 satellite band is used. To plot a single band, the data must be an xarray.DataArray (to revise the difference between xarray.Dataset and xarray.DataArray objects, refer back to the Loading data notebook):

[6]:
print(ds.swir1)
<xarray.DataArray 'swir1' (time: 5, y: 446, x: 450)>
array([[[1064, 1085, 1040, ...,   74,   71,   68],
        [1097, 1071,  998, ...,   77,   73,   69],
        [1110, 1020,  945, ...,   78,   74,   66],
        ...,
        [1122, 1023,  810, ..., 1077, 1038, 1140],
        [ 622,  549,  697, ..., 1039, 1022, 1197],
        [1173, 1071, 1172, ...,  968, 1042, 1153]],

       [[1113, 1139, 1088, ...,   58,   65,   95],
        [1167, 1122, 1050, ...,   58,   59,   59],
        [1241, 1133, 1048, ...,   58,   58,   59],
        ...,
        [1143, 1056,  877, ..., 1057, 1061, 1154],
        [ 673,  618,  808, ..., 1053, 1055, 1195],
        [1226, 1079, 1082, ...,  993, 1021, 1173]],

       [[1004, 1033,  989, ...,   51,   54,   56],
        [1054, 1021,  932, ...,   54,   57,   57],
        [1089, 1018,  908, ...,   50,   55,   72],
        ...,
        [1013,  990,  915, ..., 1057, 1034, 1124],
        [ 677,  629,  808, ..., 1035, 1041, 1160],
        [1316, 1031, 1000, ...,  974, 1010, 1168]],

       [[1063, 1078, 1029, ...,   54,   56,   57],
        [1135, 1082,  985, ...,   54,   55,   60],
        [1149, 1081,  977, ...,   54,   58,   58],
        ...,
        [1092, 1040, 1032, ..., 1084, 1075, 1182],
        [ 676,  625,  873, ..., 1085, 1082, 1199],
        [1378, 1174, 1266, ..., 1042, 1068, 1220]],

       [[1058, 1062,  985, ...,   53,   54,   54],
        [1058, 1053,  965, ...,   57,   56,   55],
        [1085, 1036,  947, ...,   56,   58,   58],
        ...,
        [1091,  998,  945, ..., 1145, 1155, 1177],
        [ 725,  694,  872, ..., 1140, 1118, 1231],
        [1345, 1192, 1196, ..., 1069, 1107, 1276]]], dtype=int16)
Coordinates:
  * time     (time) datetime64[ns] 2013-01-01 2014-01-01 ... 2017-01-01
  * y        (y) float64 -3.165e+06 -3.165e+06 ... -3.176e+06 -3.176e+06
  * x        (x) float64 2.066e+06 2.066e+06 2.066e+06 ... 2.077e+06 2.077e+06
Attributes:
    units:    1
    nodata:   -999
    crs:      EPSG:3577

Selecting and plotting a single timestep

The returned object header above specifies that ds.swir1 is a xarray.DataArray with five timesteps (i.e. <xarray.DataArray 'swir1' (time: 5, y: 446, x: 450)>). To make a plot for a single timestep only, select the desired timestep using one of the following options:

  1. .isel(): This stands for “index selection”, which selects individual timesteps from a dataset based on the sequence of loaded timesteps. Counting in Python begins at 0, so to select the first timestep in the xarray.DataArray we can specify .isel(time=0):

[7]:
first_timestep = ds.swir1.isel(time=0)

print(first_timestep)
<xarray.DataArray 'swir1' (y: 446, x: 450)>
array([[1064, 1085, 1040, ...,   74,   71,   68],
       [1097, 1071,  998, ...,   77,   73,   69],
       [1110, 1020,  945, ...,   78,   74,   66],
       ...,
       [1122, 1023,  810, ..., 1077, 1038, 1140],
       [ 622,  549,  697, ..., 1039, 1022, 1197],
       [1173, 1071, 1172, ...,  968, 1042, 1153]], dtype=int16)
Coordinates:
    time     datetime64[ns] 2013-01-01
  * y        (y) float64 -3.165e+06 -3.165e+06 ... -3.176e+06 -3.176e+06
  * x        (x) float64 2.066e+06 2.066e+06 2.066e+06 ... 2.077e+06 2.077e+06
Attributes:
    units:    1
    nodata:   -999
    crs:      EPSG:3577
  1. .sel(): This selects data using real-world coordinate labels like time. For example, from the Coordinates section, the first timestep (i.e. the year 2013) is selected from the xarray.DataArray by specifying .sel(time='2013'):

[8]:
first_timestep = ds.swir1.sel(time='2013')

print(first_timestep)
<xarray.DataArray 'swir1' (time: 1, y: 446, x: 450)>
array([[[1064, 1085, 1040, ...,   74,   71,   68],
        [1097, 1071,  998, ...,   77,   73,   69],
        [1110, 1020,  945, ...,   78,   74,   66],
        ...,
        [1122, 1023,  810, ..., 1077, 1038, 1140],
        [ 622,  549,  697, ..., 1039, 1022, 1197],
        [1173, 1071, 1172, ...,  968, 1042, 1153]]], dtype=int16)
Coordinates:
  * time     (time) datetime64[ns] 2013-01-01
  * y        (y) float64 -3.165e+06 -3.165e+06 ... -3.176e+06 -3.176e+06
  * x        (x) float64 2.066e+06 2.066e+06 2.066e+06 ... 2.077e+06 2.077e+06
Attributes:
    units:    1
    nodata:   -999
    crs:      EPSG:3577

After selecting a timestep, the .plot() method can be used to display the chosen swir1 data:

[9]:
first_timestep.plot()
[9]:
<matplotlib.collections.QuadMesh at 0x7fd8df844940>
../../_images/notebooks_Beginners_guide_05_Plotting_22_1.png

Plotting multiple timesteps

It is often useful to produce plots for a single measurement across time, for example to compare change between satellite observations or summary datasets. To plot multiple images, skip the isel() step above and plot the entire xarray.DataArray directly.

To plot multiple timesteps in one figure, it is necessary to instruct the .plot() function to put each timestep in a different column. This is done by specifying .plot(col="time"):

[10]:
ds.swir1.plot(col="time")
[10]:
<xarray.plot.facetgrid.FacetGrid at 0x7fd8df9266d8>
../../_images/notebooks_Beginners_guide_05_Plotting_24_1.png

This kind of plotting is called “facetted plotting”. For more information, refer to the xarray documentation

Customising plot appearance

The plots above are dark and difficult to see clearly. To improve the appearance of xarray plots, use the robust=True argument to optimise the plot colours by clipping extreme values or outliers. This will use the 2nd and 98th percentiles of the data to compute the color limits:

[11]:
ds.swir1.plot(col="time", robust=True)
[11]:
<xarray.plot.facetgrid.FacetGrid at 0x7fd8df873b38>
../../_images/notebooks_Beginners_guide_05_Plotting_27_1.png

Plots can be further customised by adding custom colour maps/styles using the cmap parameter.

When choosing a colour map for a plot, it is important to choose a set of colours that are perceived logically by the human eye. The best colour maps are “perceptually uniform”: these colour maps increase logically from dark to light colours, where equal increases in lightness/darkness correspond to equal changes in data values. Some best-practice perceptually uniform colour maps include:

"viridis", "plasma", "inferno", "magma", "cividis"

For further reading about perceptually uniform colour maps in data visualisation, refer to the `matplotlib documentation <https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html>`__

It is also important to consider colour blindness when selecting a colour map. xarray supports many colour maps from the “colorbrewer” family of colour maps which are optimised for colour blindness. You can use the interactive online tool to browse all available colour maps, or choose from one of the following commonly used options:

"Greys", "Purples", "Blues", "Greens", "Oranges", "Reds",
"YlOrBr", "YlOrRd", "OrRd", "PuRd", "RdPu", "BuPu",
"GnBu", "PuBu", "YlGnBu", "PuBuGn", "BuGn", "YlGn"

For a full list of available colour maps you can refer to this list.

The example cell below plots the data with the perceptually uniform magma colour map:

[12]:
ds.swir1.plot(col="time", robust=True, cmap="magma")
[12]:
<xarray.plot.facetgrid.FacetGrid at 0x7fd8c682ee48>
../../_images/notebooks_Beginners_guide_05_Plotting_29_1.png

Plotting true or false colour RGB images

Although xarray makes it easy to plot single band images, plotting a three band colour photo-like image is less straightforward.

To make this easier, the dea-notebooks repository provides a custom rgb() function that is designed for plotting three band images. The rgb() function maps three data variables/measurements from the loaded dataset to the red, green and blue channels that are used to make a three-colour image.

Providing the red, green and blue measurements from a dataset will produce a true colour image (akin to how humans view the landscape). Providing nir, red and green measurements or any other set of three satellite bands from a dataset will produce a false colour image.

Hence, the rgb() function can be used to visualise the data returned by a query. It requires the minimum input of:

  • ds: The xarray.Dataset object

  • bands: Three bands for display (these must be measurements found in the dataset)

  • index: The timestep to view, default is 0

Plotting a single timestep

The time dimension of an xarray.Dataset describes how many timesteps exist for the loaded data. In the rgb() function, the index variable is asking for which timesteps to view (similar to the isel() example above). Remember: counting in Python begins at 0 so to view the earliest timesteps set index=0:

[13]:
# View a red, green, blue (true colour) image of the first timestep
rgb(ds, bands=["red", "green", "blue"], index=0)
../../_images/notebooks_Beginners_guide_05_Plotting_32_0.png

It is possible to change the input bands to plot a false colour image, which can provide different insights in a landscape. The false colour band combination (swir1, nir, green) emphasises growing vegetation in green, and water in deep blue:

[14]:
# View a swir1, nir, green (false colour) image of the first timestep
rgb(ds, bands=['swir1', 'nir', 'green'], index=0)
../../_images/notebooks_Beginners_guide_05_Plotting_34_0.png

Plotting multiple timesteps

As discussed in the single band example above, it can be useful to visualise multiple timesteps in a single plot (e.g. to compare change over time).

The rgb() function can also do this, as long as a list of timesteps to view is provided to the index argument, e.g. index=[X1, X2, ...]. The example cell below plots the first and fifth image in the dataset using index=[0, 4] (remembering that counting in Python starts at 0):

[15]:
# View a true colour image for the first and fifth timesteps
rgb(ds, bands=['red', 'green', 'blue'], index=[0, 4])
../../_images/notebooks_Beginners_guide_05_Plotting_36_0.png

It is also possible to use rgb() to plot all timesteps in a dataset using the col="time" syntax demonstrated in the single band example above:

[16]:
# Plot all timesteps in the dataset
rgb(ds, bands=['red', 'green', 'blue'], col="time")
../../_images/notebooks_Beginners_guide_05_Plotting_38_0.png

Customising plot appearance

By default, rgb() generates plots with robust=True to improve the appearance of the images by clipping out the darkest and brightest 2% of pixels, using the 2nd and 98th percentiles of the data to compute the colour limits

If this default provides poor results, the plot’s colour stretch can be customised using the percentile_stretch parameter. This clips the most extreme minimum and maximum values in the dataset, improving the contrast and appearance of the plot.

For example, specifying percentile_stretch=[0.05, 0.95] will clip out the darkest and brightest 5% of pixels, focusing the colour stretch on the remaining 90% of less extreme values:

[17]:
rgb(ds,
    bands=['red', 'green', 'blue'],
    index=0,
    percentile_stretch=[0.05, 0.95])

../../_images/notebooks_Beginners_guide_05_Plotting_40_0.png

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Last modified: December 2019

Compatible datacube version:

[18]:
print(datacube.__version__)
1.7+142.g7f8581cf

Tags

Browse all available tags on the DEA User Guide’s Tags Index

Tags: sandbox compatible, NCI compatible, landsat 8, annual geomedian, plotting, dea_plotting, rgb, beginner, data visualisation, colour maps, isel, sel