Exporting cloud-optimised GeoTIFF files

Background

At the end of an analysis it can be useful to export data to a GeoTIFF file (e.g. outputname.tif), either to save results or to allow results to be explored in GIS software (e.g. ArcGIS or QGIS).

A Cloud Optimised GeoTIFF (COG) is a regular GeoTIFF file (i.e. one that can be opened by GIS software like QGIS or ArcMap) aimed at being hosted on an HTTP file server, with an internal organisation (tiling and overviews) that enables more efficient workflows in the cloud.
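To make the "internal organisation" concrete, the sketch below lists the kind of GeoTIFF creation options that distinguish a COG from a plain GeoTIFF. These values are illustrative only (they are not necessarily write_cog's exact defaults):

```python
# Illustrative sketch: what makes a GeoTIFF "cloud optimised" is internal
# tiling plus embedded overviews, so a client over HTTP can fetch only the
# byte ranges it needs rather than the whole file.
cog_profile = {
    'driver': 'GTiff',
    'tiled': True,          # pixels stored in square blocks, not full-width strips
    'blockxsize': 512,      # tile width in pixels
    'blockysize': 512,      # tile height in pixels
    'compress': 'deflate',  # lossless per-tile compression
}

# Overviews are progressively downsampled copies embedded in the same file,
# so a zoomed-out view only reads a small fraction of the data.
overview_levels = [2, 4, 8, 16, 32]

print(cog_profile['tiled'], overview_levels)
```

Because tiles and overviews are indexed in the file header, a GIS client or web map can render a preview without downloading the full-resolution pixels.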

Description

This notebook shows a number of ways to export a GeoTIFF file using the datacube.utils.cog function write_cog:

  1. Exporting a single-band, single time-slice GeoTIFF from an xarray object loaded through a dc.load query

  2. Exporting a multi-band, single time-slice GeoTIFF from an xarray object loaded through a dc.load query

  3. Exporting multiple GeoTIFFs, one for each time-slice of an xarray object loaded through a dc.load query

In addition, the notebook demonstrates several more advanced applications of write_cog:

  1. Passing in custom rasterio parameters to override the function’s defaults

  2. Exporting data from lazily loaded dask arrays


Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

Load packages

[1]:
import sys
import rasterio
import datacube
from datacube.utils.cog import write_cog

sys.path.append('../Scripts')
from dea_plotting import rgb

Connect to the datacube

[2]:
dc = datacube.Datacube(app='Exporting_GeoTIFFs')

Load Landsat 8 data from the datacube

Here we load a time series of Landsat 8 satellite images through the datacube API. This will provide us with some data to work with.

[3]:
# Create a query object
query = {
    'x': (153.40, 153.45),
    'y': (-28.85, -28.90),
    'time': ('2018-01', '2018-03'),
    'measurements': ['nbart_red', 'nbart_green', 'nbart_blue'],
    'output_crs': 'EPSG:3577',
    'resolution': (-30, 30),
    'group_by': 'solar_day'
}

# Load available data from Landsat 8
ds = dc.load(product='ga_ls8c_ard_3', **query)

# Print output data
print(ds)

<xarray.Dataset>
Dimensions:      (time: 6, x: 191, y: 212)
Coordinates:
  * time         (time) datetime64[ns] 2018-01-03T23:42:39.393181 ... 2018-03...
  * y            (y) float64 -3.307e+06 -3.307e+06 ... -3.313e+06 -3.313e+06
  * x            (x) float64 2.053e+06 2.053e+06 ... 2.058e+06 2.058e+06
    spatial_ref  int32 3577
Data variables:
    nbart_red    (time, y, x) int16 411 422 404 396 509 ... 244 285 307 251 247
    nbart_green  (time, y, x) int16 625 640 595 580 697 ... 343 407 470 372 371
    nbart_blue   (time, y, x) int16 435 430 421 412 542 ... 273 308 320 274 270
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Plot an rgb image to confirm we have data

The white regions are cloud cover.

[4]:
rgb(ds, index=0, percentile_stretch=(0.1, 0.9))

[Output figure: RGB plot of the first time-slice]

Export GeoTIFFs

Single-band, single time-slice data

This method uses the datacube.utils.cog function write_cog (where COG stands for Cloud Optimised GeoTIFF) to export a simple single-band, single time-slice GeoTIFF file. A few important caveats should be noted when using this function:

  1. It requires an xarray.DataArray; supplying an xarray.Dataset will return an error. To convert an xarray.Dataset to an xarray.DataArray, run the following:

     da = ds.to_array()

  2. It generates a temporary in-memory GeoTIFF file without compression, which means the function will temporarily use about 1.5 to 2 times the memory of the input xarray.DataArray.
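The memory caveat above can be sketched with a back-of-the-envelope calculation. The array shape below is hypothetical (roughly a full single-band Landsat tile), chosen only to illustrate the estimate:

```python
import numpy as np

# Hypothetical input: a 4000 x 4000 single-band int16 array.
arr = np.zeros((4000, 4000), dtype=np.int16)

input_mb = arr.nbytes / 1024**2

# The temporary uncompressed in-memory copy means roughly 1.5-2x the
# input size may be needed in RAM; use 2x as a conservative estimate.
estimated_peak_mb = input_mb * 2

print(f'Input: {input_mb:.0f} MB, estimated peak: {estimated_peak_mb:.0f} MB')
```

Checking `.nbytes` on your own DataArray before exporting is a quick way to judge whether a write will fit in memory.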

[5]:
# Select a single time-slice and a single band from the dataset.
singleband_da = ds.nbart_red.isel(time=0)

# Write GeoTIFF to a location
write_cog(geo_im=singleband_da,
          fname='red_band.tif',
          overwrite=True)
[5]:
PosixPath('red_band.tif')

Multi-band, single time-slice data

Here we select a single time and export all the bands in the dataset using write_cog:

[6]:
# Select a single time-slice
rgb_da = ds.isel(time=0).to_array()

# Write multi-band GeoTIFF to a location
write_cog(geo_im=rgb_da,
          fname='rgb.tif',
          overwrite=True)
[6]:
PosixPath('rgb.tif')

Multi-band, multiple time-slice data

If we want to export all of the time steps in a dataset, we can wrap write_cog in a for-loop and export each time slice as an individual GeoTIFF file:

[7]:
for i in range(len(ds.time)):

    # We will use the date of the satellite image to name the GeoTIFF
    date = ds.isel(time=i).time.dt.strftime('%Y-%m-%d').data
    print(f'Writing {date}')

    # Convert current time step into a `xarray.DataArray`
    singletimestamp_da = ds.isel(time=i).to_array()

    # Write GeoTIFF
    write_cog(geo_im=singletimestamp_da,
              fname=f'{date}.tif',
              overwrite=True)
Writing 2018-01-03
Writing 2018-01-19
Writing 2018-02-04
Writing 2018-02-20
Writing 2018-03-08
Writing 2018-03-24
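The naming convention used in the loop above can be sketched with plain Python datetimes. The timestamps here are hypothetical stand-ins for `ds.time` values; in the loop the same '%Y-%m-%d' pattern is applied via xarray's `.dt.strftime` accessor:

```python
from datetime import datetime

# Hypothetical acquisition timestamps standing in for ds.time values
timestamps = [datetime(2018, 1, 3, 23, 42), datetime(2018, 1, 19, 23, 42)]

# Format each timestamp as a date-stamped GeoTIFF filename
filenames = [f"{ts.strftime('%Y-%m-%d')}.tif" for ts in timestamps]

print(filenames)  # ['2018-01-03.tif', '2018-01-19.tif']
```

Dropping the time-of-day component is safe here because `group_by='solar_day'` ensures at most one observation per date.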

Advanced

Passing custom rasterio parameters

By default, write_cog will read attributes like nodata directly from the attributes of the data itself. However, write_cog also accepts any parameters that could be passed to the underlying rasterio.open function (see https://rasterio.readthedocs.io/en/latest/api/rasterio.io.html), so these can be used to override the default attribute values.

For example, it can be useful to provide a new nodata value if the data has been transformed to a different dtype or scale and the original nodata value is no longer appropriate. By default, write_cog will use a nodata value of -999, as this is what is stored in the dataset's attributes:

[8]:
# Select a single time-slice and a single band from the dataset.
singleband_da = ds.nbart_red.isel(time=0)

# Print nodata attribute value
print(singleband_da.nodata)
-999

To override this value and use a nodata value of 0 instead:

[9]:
# Write GeoTIFF
write_cog(geo_im=singleband_da,
          fname='custom_nodata.tif',
          overwrite=True,
          nodata=0.0)
[9]:
PosixPath('custom_nodata.tif')

We can verify the nodata value is now set to 0.0 by reading the data back in with rasterio:

[10]:
with rasterio.open('custom_nodata.tif') as geotiff:
    print(geotiff.nodata)
0.0

Unsetting nodata for float datatypes containing NaN

A common analysis workflow is to load data from the datacube, then mask out certain values by setting them to NaN. For example, we may use the .where() method to set all -999 nodata values in our data to NaN:

Note: The mask_invalid_data function from datacube.utils.masking can also be used to automatically set nodata values from the data’s attributes to NaN. This function will also automatically drop the nodata attribute from the data, removing the need for the steps below. For more information, see the Masking data notebook.

[11]:
# Select a single time-slice and a single band from the dataset.
singleband_da = ds.nbart_red.isel(time=0)

# Set -999 values to NaN
singleband_masked = singleband_da.where(singleband_da != -999)
print(singleband_masked)
<xarray.DataArray 'nbart_red' (y: 212, x: 191)>
array([[ 411.,  422.,  404., ...,  933.,  670.,  544.],
       [ 533.,  539.,  530., ...,  812.,  666.,  751.],
       [ 567.,  609.,  788., ...,  907.,  840.,  829.],
       ...,
       [ 269.,  308.,  412., ...,  319.,  306.,  314.],
       [ 278.,  286.,  428., ...,  321.,  322.,  326.],
       [ 304.,  560., 1339., ...,  332.,  322.,  312.]])
Coordinates:
    time         datetime64[ns] 2018-01-03T23:42:39.393181
  * y            (y) float64 -3.307e+06 -3.307e+06 ... -3.313e+06 -3.313e+06
  * x            (x) float64 2.053e+06 2.053e+06 ... 2.058e+06 2.058e+06
    spatial_ref  int32 3577
Attributes:
    units:         1
    nodata:        -999
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

In this case, since we have replaced -999 nodata values with NaN, the data’s nodata attribute is no longer valid. Before we write our data to file, we should therefore remove the nodata attribute from our data:

[12]:
# Remove nodata attribute
singleband_masked.attrs.pop('nodata')

# Write GeoTIFF
write_cog(geo_im=singleband_masked,
          fname='masked_data.tif',
          overwrite=True)
[12]:
PosixPath('masked_data.tif')

If we read this GeoTIFF back in with rasterio, we will see that it no longer has a nodata attribute set:

[13]:
with rasterio.open('masked_data.tif') as geotiff:
    print(geotiff.nodata)
None

Exporting GeoTIFFs from a dask array

Note: For more information on using dask, refer to the Parallel processing with Dask notebook.

If you pass a lazily-loaded dask array into the function, write_cog will not immediately output a GeoTIFF, but will instead return a dask.delayed object:
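This deferred behaviour is the standard dask.delayed pattern. A minimal sketch with a toy function (unrelated to write_cog, purely for illustration) shows how a delayed object records work without running it:

```python
import dask

# Toy stand-in for a deferred write: wrapping a function with dask.delayed
# records the call as a task instead of executing it immediately.
@dask.delayed
def expensive_write(value):
    return value * 2

result = expensive_write(21)

# Nothing has run yet; we only hold a task-graph handle
print(type(result).__name__)  # Delayed

# .compute() triggers the actual work, just as it does for the
# deferred GeoTIFF export below
print(result.compute())  # 42
```

This is useful when the export is one step in a larger dask graph, as the write can be scheduled alongside the computations that produce the data.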

[14]:
# Lazily load data using dask
ds_dask = dc.load(product='ga_ls8c_ard_3',
                  dask_chunks={},
                  **query)

# Run `write_cog`
ds_delayed = write_cog(geo_im=ds_dask.isel(time=0).to_array(),
                       fname='dask_geotiff.tif',
                       overwrite=True)

# Print dask.delayed object
print(ds_delayed)
Delayed('_write_cog-6da1c71e-9990-4cf5-aac9-8146108da04e')

To trigger the GeoTIFF to be exported to file, run .compute() on the dask.delayed object. The file will now appear in the file browser to the left.

[15]:
ds_delayed.compute()
[15]:
PosixPath('dask_geotiff.tif')

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on GitHub.

Last modified: September 2020

Compatible datacube version:

[16]:
print(datacube.__version__)
1.8.3

Tags

Browse all available tags on the DEA User Guide’s Tags Index

Tags: sandbox compatible, NCI compatible, GeoTIFF, Cloud Optimised GeoTIFF, dc.load, landsat 8, datacube.utils.cog, write_cog, exporting data