Subpixel waterbodies c558726b5bc34d139ce777227edd874e


DEA Waterbodies uses WOfS, a classification of Landsat pixels into wet and dry, to identify and track waterbodies throughout the Australian continent. One limitation is the size of the Landsat pixels, 25 m x 25 m, which can be quite large compared to the size of some waterbodies. Can we identify the maximum extent of a waterbody better than a whole pixel approach, as in e.g. Bishop-Taylor et al. (2019; or Sall et al. (2020; How would this affect our time series?

This notebook takes an existing DEA Waterbodies polygon, refines it using subpixel extents, and recalculates the time series using the subpixel extents for comparison.

Getting started

Specify the geohash of the polygon you want to refine in the “Configuration” section, and run all cells to do the analysis.

Load packages

%matplotlib inline

import sys
import datacube
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import geopandas as gpd
import shapely
from tqdm.notebook import tqdm
from affine import Affine
from odc.ui import with_ui_cbk

from dea_spatialtools import xr_vectorize, subpixel_contours


Specify the geohash of the waterbody to evaluate:

geohash = "r3dn23tun"  # Point Hut Pond

Connect to the datacube

Connect to the datacube so we can access DEA data. The app parameter is a unique name for the analysis which is based on the notebook file name.

dc = datacube.Datacube(app="Subpixel-waterbodies")

Load the DEA Waterbodies polygons

These are hosted here. Download them and unzip into the same directory as this notebook.

dea_wbs = gpd.read_file("DigitalEarthAustraliaWaterbodies.shp")

Then we can load the polygon associated with the waterbody we selected.

wb = dea_wbs.set_index("UID").loc[geohash]
wb = gpd.GeoDataFrame([wb],

Load the WOfS summary

The WOfS summary is used to define the waterbody maximum extent.

wofs = dc.load(
fig, ax = plt.subplots(1, 1)
<matplotlib.collections.QuadMesh at 0x7ffa877e2fd0>

Reproduce DEA Waterbodies

Find the 5% and 10% polygons, discard any under 5 px, and then do a spatial join.

pc_10 = xr_vectorize(wofs.frequency >= 0.10,
pc_05 = xr_vectorize(wofs.frequency >= 0.05,

pc_10 = pc_10[pc_10.attribute == 1]
pc_05 = pc_05[pc_05.attribute == 1]

# Discard polygons with under 5 px.
pc_05 = pc_05[pc_05.area >= 25 ** 2]
pc_10 = pc_10[pc_10.area >= 25 ** 2]

# Then join.
joined = gpd.sjoin(pc_05, pc_10, lsuffix="05", rsuffix="10", how="right")

# Allow a 5% polygon as long as it intersects with a 10% polygon.
ok_05 = set(joined.index_05)
pc_05 = pc_05.loc[ok_05]

wb_pixels = pc_05.reset_index(drop=True)
/env/lib/python3.6/site-packages/pyproj/crs/ FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes:
  return _prepare_from_string(" ".join(pjargs))
fig, axs = plt.subplots(1, 2)
wb_pixels.plot(ax=axs[1], edgecolor="k", facecolor="None")
wb.plot(ax=axs[0], edgecolor="k", facecolor="None")
axs[0].set_title("DEA Waterbodies")
axs[1].set_title("This notebook")

Subpixel extent

Using subpixel_contours, find the subpixel extents of the waterbody. We can set the contour levels to 5% and 10% as above, but we can also change them to other values. This modifies the eventual error in the boundary quite dramatically for steep-edged lakes.

contours = subpixel_contours(wofs.frequency, z_values=[0.05, 0.10]).set_index(
    "z_value", drop=True

pc_05 = gpd.GeoDataFrame(
    geometry=[shapely.geometry.Polygon(ls) for ls in contours.iloc[0].geometry]
pc_10 = gpd.GeoDataFrame(
    geometry=[shapely.geometry.Polygon(ls) for ls in contours.iloc[1].geometry]

# Determine whether each polygon is an "inner" polygon or not.
pc_10_agg = pc_10.geometry[0]
for g in pc_10.geometry[1:]:
    pc_10_agg = pc_10_agg.symmetric_difference(g)

pc_05_agg = pc_05.geometry[0]
for g in pc_05.geometry[1:]:
    pc_05_agg = pc_05_agg.symmetric_difference(g)

pc_05 = gpd.GeoDataFrame(geometry=list(pc_05_agg))
pc_10 = gpd.GeoDataFrame(geometry=list(pc_10_agg))

# Discard anything under 5 px.
pc_05 = pc_05[pc_05.area >= 25 ** 2]
pc_10 = pc_10[pc_10.area >= 25 ** 2]

# Spatial join to find 5% extents for 10% polygons.
joined = gpd.sjoin(pc_05, pc_10, lsuffix="05", rsuffix="10", how="right")
pc_05 = pc_05.loc[joined.index_05]

wb_subpixel = pc_05.set_crs("EPSG:3577")
Operating in multiple z-value, single array mode

Combine both sets with a spatial join.

wb_joined = gpd.sjoin(wb_pixels, wb_subpixel, how="left")
# Add the geometry back in.
wb_joined = wb_joined.join(wb_pixels, rsuffix="_pixel").join(
    wb_subpixel, on="index_right", rsuffix="_subpixel"

Finally, filter to only include the polygon that maximally overlaps with the DEA Waterbodies polygon.

biggest_overlap = np.argmax(
    [wb.geometry[0].intersection(g).area for g in wb_joined.geometry_pixel]
wb_joined = gpd.GeoDataFrame([wb_joined.iloc[biggest_overlap]])

Shape comparison

Let’s look at the shape of the waterbody.

fig, ax = plt.subplots(figsize=(10, 10))
    ax=ax, edgecolor="blue", facecolor="None"
    ax=ax, edgecolor="red", facecolor="None"
<matplotlib.axes._subplots.AxesSubplot at 0x7ffa876ea588>

Exporting the subpixel polygon

We can export the subpixel polygon as a GeoJSON file for later review, e.g. for loading into QGIS.

gpd.GeoDataFrame(geometry=wb_joined.geometry_subpixel, crs='EPSG:3577').to_file(f'{geohash}_subpixel.geojson', driver='GeoJSON')

Extracting time series

Now we will generate a surface area time series using both extents.

Load WOfS

Load WOfS for all available times from 2015 to 2020.

wofs_daily = dc.load(
    time=("2015-01", "2020-01"),

Rasterise the waterbodies

Rasterise the pixel and subpixel waterbodies. To rasterise the subpixel waterbodies while maintaining their subpixel nature, we’ll use a prototype of a partial-pixel rasterising algorithm that was suggested to be added to rasterio, but wasn’t ever added.

def _rasterize_geom(geom, shape, affinetrans, all_touched):
    from rasterio import features

    indata = [(geom, 1)]
    rv_array = features.rasterize(
        indata, out_shape=shape, transform=affinetrans, fill=0, all_touched=all_touched
    return rv_array

def rasterize_pctcover(geom, atrans, shape):
    import rasterio
    import fiona
    import numpy as np
    from shapely.geometry import box

    alltouched = _rasterize_geom(geom, shape, atrans, all_touched=True)
    exterior = _rasterize_geom(geom.exterior, shape, atrans, all_touched=True)

    # Create percent cover grid as the difference between them
    # at this point all cells are known 100% coverage,
    # we'll update this array for exterior points
    pctcover = 100 * (alltouched - exterior)

    # loop through indicies of all exterior cells
    for r, c in zip(*np.where(exterior == 1)):

        # Find cell bounds, from rasterio DatasetReader.window_bounds
        window = ((r, r + 1), (c, c + 1))
        ((row_min, row_max), (col_min, col_max)) = window
        x_min, y_min = atrans * (col_min, row_max)
        x_max, y_max = atrans * (col_max, row_min)
        bounds = (x_min, y_min, x_max, y_max)

        # Construct shapely geometry of cell
        cell = box(*bounds)

        # Intersect with original shape
        cell_overlap = cell.intersection(geom)

        # update pctcover with percentage based on area proportion
        coverage = cell_overlap.area / cell.area
        pctcover[r, c] = int(coverage * 100)

    return pctcover
raster_pixel = (
        wb_joined.iloc[0].geometry_pixel, wofs.geobox.transform, wofs.frequency.shape
    / 100
raster_subpixel = (
        wb_joined.iloc[0].geometry_subpixel, wofs.geobox.transform, wofs.frequency.shape
    / 100

Count wet pixels

For each WOfS observation, multiply the wet pixel mask by the raster mask and sum the result to obtain the wet surface area.

px_wet_pixel = ((wofs_daily == 128) * raster_pixel).sum(axis=(1, 2)).water
px_wet_subpixel = ((wofs_daily == 128) * raster_subpixel).sum(axis=(1, 2)).water

Invalid days are those with >10% missing observations.

missing = (wofs_daily != 128) & (wofs_daily != 0)
px_missing_pixel = (missing * raster_pixel).sum(axis=(1, 2)).water
px_missing_subpixel = (missing * raster_subpixel).sum(axis=(1, 2)).water
invalid_pixel = px_missing_pixel > 0.1 * raster_pixel.sum()
invalid_subpixel = px_missing_subpixel > 0.1 * raster_subpixel.sum()

Plot the time series

time_series = pd.DataFrame(
        "pixel": px_wet_pixel,
        "subpixel": px_wet_subpixel,
        "invalid_pixel": invalid_pixel,
        "invalid_subpixel": invalid_subpixel,
<matplotlib.axes._subplots.AxesSubplot at 0x7ffa874b8198>
(time_series.pixel - time_series.subpixel)[
    ~(time_series.invalid_pixel | time_series.invalid_subpixel)
].plot(label="residual", legend=True)
<matplotlib.axes._subplots.AxesSubplot at 0x7ffa8716fa20>

Of course, there is not much difference in the time series for any waterbody of more than a few pixels.

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Last modified: October 2020

Compatible datacube version:



Browse all available tags on the DEA User Guide’s Tags Index

Tags: NCI compatible, sandbox compatible, time series, water, waterbodies

[ ]: