# Generating geometric median composites (geomedians)¶

• Compatability: Notebook currently compatible with both the NCI and DEA Sandbox environments

• Products used: s2a_ard_granule, s2b_ard_granule

## Background¶

Individual remote sensing images can be affected by noisy data, including clouds, cloud shadows, and haze. To produce cleaner images that can be compared more easily across time, we can create ‘summary’ images or ‘composites’ that combine multiple images into one image to reveal the median or ‘typical’ appearance of the landscape for a certain time period.

One approach to doing this is to create a geomedian. A geomedian is based on a high-dimensional statistic called the ‘geometric median’ (Small 1990), which effectively trades a temporal stack of poor quality observations for a single high-quality pixel composite with reduced spatial noise (Roberts et al. 2017). In contrast to a standard median, a geomedian maintains the relationship between spectral bands. This allows us to conduct further analysis on the composite images just as we would on the original satellite images (e.g by allowing the calculation of common band indices like NDVI).

## Description¶

In this notebook we will take of time series of noisy satellite images collected over a year and calculate an annual geomedian composite which is largely free of clouds and other noisy data.

## Getting started¶

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

[1]:

%matplotlib inline

import datacube
from datacube_stats.statistics import GeoMedian
import sys

sys.path.append('../Scripts')
from dea_plotting import rgb


### Connect to the datacube¶

[2]:

dc = datacube.Datacube(app='Geomedian_composites')


## Load Sentinel 2 data from the datacube¶

Here we are loading in a timeseries of cloud-masked Sentinel 2 satellite images through the datacube API using the load_ard function. This will provide us with some data to work with.

[3]:

# Create a query object
query = {
'x': (153.35, 153.50),
'y': (-28.80, -28.95),
'time': ('2018-01', '2018-12'),
'measurements': ['nbart_green', 'nbart_red', 'nbart_blue'],
'output_crs': 'EPSG:3577',
'resolution': (-30, 30),
'group_by': 'solar_day'
}

products=['s2a_ard_granule', 's2b_ard_granule'],
**query)

# Print output data
print(ds)


Loading s2a_ard_granule data
Combining and sorting data
Returning 70 observations
<xarray.Dataset>
Dimensions:      (time: 70, x: 570, y: 634)
Coordinates:
* y            (y) float64 -3.3e+06 -3.3e+06 ... -3.319e+06 -3.319e+06
* x            (x) float64 2.047e+06 2.047e+06 ... 2.064e+06 2.064e+06
* time         (time) datetime64[ns] 2018-01-01T23:52:39.027000 ... 2018-12-27T23:52:39.024000
Data variables:
nbart_green  (time, y, x) float32 nan nan nan nan ... 250.0 202.0 246.0
nbart_red    (time, y, x) float32 nan nan nan nan ... 180.0 138.0 85.0 149.0
nbart_blue   (time, y, x) float32 nan nan nan nan ... 327.0 278.0 326.0
Attributes:
crs:      EPSG:3577


## Plot timesteps in true colour¶

To visualise the data, use the pre-loaded rgb utility function to plot a true colour image for a series of timesteps. White areas indicate where clouds or other invalid pixels in the image have been masked.

The code below will plot the first and last 4 timesteps of the time series we just loaded.

[4]:

# Set the timesteps to visualise, this
timesteps = [1, 2, 3, 4, -1, -2, -3, -4]

# Generate RGB plots at each timestep
rgb(ds, index=timesteps)



## Generate a geomedian¶

As you can see above, most satellite images will have some areas masked out due to clouds or some other interference between the satellite and the ground. Generating a geomedian composite will combine all the observations in our xarray.Dataset into a single, complete (or near complete) image.

[5]:

# Compute geomedian using all observations in the dataset
geomedian = GeoMedian().compute(ds)


If we print our result, you will see that the time dimension has now been removed and we are left with a single image that represents the geometric median of all the satellite images in our intial time series.

[6]:

print(geomedian)

<xarray.Dataset>
Dimensions:      (x: 570, y: 634)
Coordinates:
* y            (y) float64 -3.3e+06 -3.3e+06 ... -3.319e+06 -3.319e+06
* x            (x) float64 2.047e+06 2.047e+06 ... 2.064e+06 2.064e+06
Data variables:
nbart_green  (y, x) float32 785.4065 812.5953 ... 325.29797 324.61264
nbart_red    (y, x) float32 757.37256 755.6008 ... 159.50264 158.18187
nbart_blue   (y, x) float32 511.48798 522.9589 ... 367.49963 366.0602
Attributes:
crs:      EPSG:3577


## Plot the geomedian composite¶

Plotting the result, we can see that the geomedian image is much more complete than any of the individual images. We can also use this data in downstream analysis as the relationships between the spectral bands are maintained by the geometric median statistic.

Note: You may notice that there are still some holes in our geomedian composite (white clusters of pixels). This is largely due to Fmask (the current default cloud masking algorithm) being overzealous in removing pixels when masking Sentinel 2 satellite images. This is a known limitation of our current cloud masking approach and a new method for cloud masking of Sentinel 2 data is currently in development.

[7]:

# Plot the result
rgb(geomedian, size=10)



Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Compatible datacube version:

[8]:

print(datacube.__version__)

1.7


## Tags¶

Browse all available tags on the DEA User Guide’s Tags Index

Tags: NCI compatible, sandbox compatible, sentinel 2, dea_plotting, dea_datahandling, load_ard, rgb, geomedian, image compositing