# Image segmentation

## Background

In the last two decades, as the spatial resolution of satellite images has increased, remote sensing has begun to shift from a focus on pixel-based analysis towards Geographic Object-Based Image Analysis (GEOBIA), which aims to group pixels together into meaningful image-objects. There are two advantages to a GEOBIA workflow: first, we can reduce the ‘salt and pepper’ effect typical of classifying pixels; and second, we can increase the computational efficiency of our workflow by grouping pixels into fewer, larger, but meaningful objects. A review of the emerging trends in GEOBIA can be found in Chen et al. (2017).

## Description

This notebook demonstrates two methods for conducting image segmentation, which is a common image analysis technique used to transform a digital satellite image into objects. In brief, image segmentation aims to partition an image into segments, where each segment consists of a group of pixels with similar characteristics. A number of algorithms exist to perform image segmentation, two of which are shown here:

1. Quickshift, implemented through the Python package scikit-image

2. Shepherd Segmentation, implemented through the package rsgislib

Note: Image segmentation at very large scales can be both time and memory intensive, and the examples shown here will become prohibitively time consuming at scale. The notebook Tiled, Parallel Image Segmentation builds upon the image segmentation algorithm developed by Shepherd et al. (2019) to run image segmentation across multiple CPUs.

## Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

[1]:

%matplotlib inline

import sys
import datacube
import xarray as xr
import numpy as np
import scipy.ndimage
import matplotlib.pyplot as plt
from osgeo import gdal
from datacube.helpers import write_geotiff
from skimage.segmentation import quickshift
from rsgislib.segmentation import segutils

sys.path.append('../Scripts')
from dea_datahandling import array_to_geotiff
from dea_plotting import rgb
from dea_bandindices import calculate_indices


/env/lib/python3.6/site-packages/datacube/storage/masking.py:4: DeprecationWarning: datacube.storage.masking has moved to datacube.utils.masking
category=DeprecationWarning)


### Connect to the datacube

[2]:

dc = datacube.Datacube(app='Image_segmentation')



## Load Sentinel 2 data from the datacube

Here we load a time series of Sentinel 2 satellite images through the datacube API using the load_ard function. This will provide us with some data to work with.

[3]:

# Create a query object
query = {
    'x': (153.35, 153.50),
    'y': (-28.80, -28.95),
    'time': ('2018-01', '2018-03'),
    'measurements': ['nbart_red', 'nbart_nir_1'],
    'output_crs': 'EPSG:3577',
    'resolution': (-30, 30),
    'group_by': 'solar_day'
}

# load_ard lives in the dea_datahandling script added to the path above
from dea_datahandling import load_ard

# Load available data from both Sentinel 2 satellites
ds = load_ard(dc=dc,
              products=['s2a_ard_granule', 's2b_ard_granule'],
              **query)

# Print output data
print(ds)


Loading s2a_ard_granule data
Combining and sorting data
Returning 18 observations
<xarray.Dataset>
Dimensions:      (time: 18, x: 570, y: 634)
Coordinates:
spatial_ref  int32 0
* y            (y) float64 -3.3e+06 -3.3e+06 ... -3.319e+06 -3.319e+06
* x            (x) float64 2.047e+06 2.047e+06 ... 2.064e+06 2.064e+06
* time         (time) datetime64[ns] 2018-01-01T23:52:39.027000 ... 2018-03-27T23:52:41.026000
Data variables:
nbart_red    (time, y, x) float32 nan nan nan nan nan ... nan nan nan nan
nbart_nir_1  (time, y, x) float32 nan nan nan nan nan ... nan nan nan nan
Attributes:
crs:           EPSG:3577
grid_mapping:  spatial_ref


## Combine observations into a noise-free statistical summary image

Individual remote sensing images can be affected by noisy and incomplete data (e.g. due to clouds). To produce cleaner images to feed into the image segmentation algorithms, we can create summary images, or composites, that combine multiple images into one to reveal the ‘typical’ appearance of the landscape for a certain time period. In the code below, we take the noisy, incomplete satellite images we just loaded, calculate the Normalised Difference Vegetation Index (NDVI) for each image, and average it through time. The mean NDVI will be our input into the segmentation algorithms. We will later write the NDVI composite to a GeoTIFF, as the Shepherd Segmentation runs on files on disk.
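As a rough illustration of what the compositing step does, the sketch below computes NDVI as (NIR - red) / (NIR + red) for a tiny synthetic time series, then takes the per-pixel mean through time while skipping missing (NaN) observations. The arrays and their values are invented for illustration; the notebook itself relies on calculate_indices and xarray rather than this hand-written version.

```python
import numpy as np

# Synthetic reflectance stacks with shape (time, y, x); values are made up.
red = np.array([[[0.10, 0.20]], [[np.nan, 0.20]], [[0.30, 0.20]]])
nir = np.array([[[0.30, 0.60]], [[np.nan, 0.60]], [[0.50, 0.60]]])

# NDVI for every observation; NaN inputs (e.g. masked cloud) stay NaN.
ndvi = (nir - red) / (nir + red)

# Per-pixel temporal mean that ignores the NaN observations.
mean_ndvi = np.nanmean(ndvi, axis=0)

print(mean_ndvi.shape)
print(mean_ndvi)
```

A pixel that is NaN in every time step would still come out as NaN, which is consistent with the "Mean of empty slice" warning such composites can raise.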

### Calculate mean NDVI

[4]:

# First we calculate NDVI on each image in the timeseries
ndvi = calculate_indices(ds, index='NDVI', collection='ga_s2_1')

# For each pixel, calculate the mean NDVI throughout the whole timeseries
ndvi = ndvi.mean(dim='time', keep_attrs=True)

# Plot the results to inspect
ndvi.NDVI.plot(vmin=0.1, vmax=1.0, cmap='gist_earth_r', figsize=(10, 10))


/env/lib/python3.6/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
return np.nanmean(a, axis=axis, dtype=dtype)

[4]:

<matplotlib.collections.QuadMesh at 0x7f6157d977f0>


## Quickshift Segmentation

Using the function quickshift from the Python package scikit-image, we will conduct an image segmentation on the mean NDVI array. We then calculate a zonal mean across each segment using the input dataset. Our last step is to export our results as a GeoTIFF.

See the scikit-image documentation for the input parameters to quickshift, and for an explanation of quickshift and the other segmentation algorithms in scikit-image.

[5]:

# Convert our mean NDVI xarray into a numpy array, we need
# to be explicit about the datatype to satisfy quickshift
input_array = ndvi.NDVI.values.astype(np.float64)


[6]:

# Calculate the segments
segments = quickshift(input_array,
                      kernel_size=5,
                      convert2lab=False,
                      max_dist=10,
                      ratio=1.0)


[7]:

# Calculate the zonal mean NDVI across the segments
segments_zonal_mean_qs = scipy.ndimage.mean(input=input_array,
                                            labels=segments,
                                            index=segments)
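To make the zonal mean step more concrete, here is a NumPy-only sketch of the same idea on a toy array: average the values within each label, then write each segment's mean back to every pixel carrying that label. It illustrates the concept under the assumption that labels are non-negative integers; it is not a description of how scipy.ndimage.mean is implemented internally.

```python
import numpy as np

# Toy 'image' and a matching label array with two segments (0 and 1).
values = np.array([[1.0, 2.0, 10.0],
                   [3.0, 4.0, 20.0]])
labels = np.array([[0, 0, 1],
                   [0, 0, 1]])

# Sum and count of the pixel values that fall under each label.
sums = np.bincount(labels.ravel(), weights=values.ravel())
counts = np.bincount(labels.ravel())

# Mean per segment, then mapped back onto the image grid by
# indexing the per-segment means with the label array itself.
segment_means = sums / counts
zonal_mean = segment_means[labels]

print(zonal_mean)
```

Passing the label array as both the labels and the index, as the cell above does, plays the same role as the final indexing step here: the result has the shape of the image rather than one value per segment.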


[8]:

# Plot to see result
plt.figure(figsize=(10,10))
plt.imshow(segments_zonal_mean_qs, cmap='gist_earth_r', vmin=0.1, vmax=1.0)
plt.colorbar(shrink=0.9)


[8]:

<matplotlib.colorbar.Colorbar at 0x7f6158002390>


### Export result to GeoTIFF

[9]:

transform = ds.geobox.transform.to_gdal()
projection = ds.geobox.crs.wkt

# Export the array
array_to_geotiff('segmented_meanNDVI_QS.tif',
                 segments_zonal_mean_qs,
                 geo_transform=transform,
                 projection=projection,
                 nodata_val=np.nan)
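The transform passed to the export is a GDAL-style geotransform: six numbers that map pixel (column, row) positions to map coordinates. The sketch below applies that mapping by hand. The helper name and the origin values are hypothetical, chosen only to resemble a 30 m grid in a projected CRS.

```python
def pixel_to_map(geotransform, col, row):
    """Return the map x, y of the top-left corner of pixel (col, row)."""
    x_origin, pixel_width, row_rot, y_origin, col_rot, pixel_height = geotransform
    x = x_origin + col * pixel_width + row * row_rot
    y = y_origin + col * col_rot + row * pixel_height
    return x, y


# Hypothetical north-up geotransform: 30 m pixels, no rotation.
# pixel_height is negative because rows count downwards from the top edge.
example_gt = (2047000.0, 30.0, 0.0, -3300000.0, 0.0, -30.0)

print(pixel_to_map(example_gt, 0, 0))
print(pixel_to_map(example_gt, 10, 5))
```

Writing the same geotransform and projection to the output file is what keeps the segmented array aligned with the original data when it is opened in a GIS.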



## Shepherd Segmentation

Here we conduct an image segmentation on the mean NDVI array using the runShepherdSegmentation function from rsgislib. This scalable segmentation algorithm is seeded using k-means clustering, and can enforce a minimum mapping unit size through an iterative clumping and elimination process (Shepherd et al. 2019). This function will output a .kea file containing the segmented image, along with a second segmented file in which the segments are attributed with the zonal mean of the input image (in this case, NDVI).
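The general idea of seeding with k-means and then clumping can be sketched on a toy array. The following is a simplified illustration only, not the rsgislib implementation: it runs a small hand-written one-dimensional k-means, groups spatially connected pixels of the same cluster into clumps with scipy.ndimage.label, and reports which clumps fall below a minimum pixel count. The iterative step that merges such small clumps into their neighbours is omitted, and all names and values are hypothetical.

```python
import numpy as np
from scipy import ndimage


def kmeans_1d(values, n_clusters, n_iter=10):
    """Tiny 1-D k-means; centres start evenly spaced over the value range."""
    centres = np.linspace(values.min(), values.max(), n_clusters)
    for _ in range(n_iter):
        # Assign each value to its nearest centre.
        assign = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        # Move each centre to the mean of its members (unchanged if empty).
        for k in range(n_clusters):
            if np.any(assign == k):
                centres[k] = values[assign == k].mean()
    return assign


# Toy NDVI-like image: a low region, a high region, and one isolated high pixel.
image = np.array([[0.1, 0.1, 0.1, 0.8, 0.8],
                  [0.1, 0.9, 0.1, 0.8, 0.8],
                  [0.1, 0.1, 0.1, 0.8, 0.8]])

clusters = kmeans_1d(image.ravel(), n_clusters=2).reshape(image.shape)

# Clump: label connected pixels that share a cluster, one cluster at a time.
clumps = np.zeros(image.shape, dtype=int)
next_id = 0
for k in np.unique(clusters):
    labelled, n_found = ndimage.label(clusters == k)
    clumps[labelled > 0] = labelled[labelled > 0] + next_id
    next_id += n_found

# Flag clumps smaller than a minimum mapping unit, expressed in pixels.
min_pixels = 2
sizes = np.bincount(clumps.ravel())
small_clumps = [i for i in range(1, sizes.size) if sizes[i] < min_pixels]

print(clumps)
print(small_clumps)
```

In this toy case the isolated high pixel forms its own single-pixel clump, which is the kind of object a minimum mapping unit is meant to absorb into a neighbouring segment.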

To better understand the parameters of the runShepherdSegmentation algorithm, see the rsgislib documentation.

The cell directly below sets up the inputs to the runShepherdSegmentation function:

[10]:

# Name of the GeoTIFF to export and then segment
tiff_to_segment = 'meanNDVI.tif'

# Name of the .kea file the GeoTIFF will be converted to
kea_file = 'meanNDVI.kea'

# Name of the segmented .kea file that will be output
segmented_kea_file = 'meanNDVI_segmented.kea'

# Name of the segmented .kea file attributed with the zonal mean of input file
segments_zonal_mean = 'segments_zonal_mean_shepherdSeg.kea'



We then write out our mean NDVI dataset to a GeoTIFF, and convert the GeoTIFF to a .kea file using gdal.Translate so it can be read by the runShepherdSegmentation function (the .kea file format provides a full implementation of the GDAL data model and is implemented within an HDF5 file):

[11]:

# Write the mean NDVI dataset to file as a GeoTIFF
write_geotiff(filename=tiff_to_segment, dataset=ndvi[['NDVI']])

# Convert the GeoTIFF into a KEA file format
gdal.Translate(destName=kea_file,
               srcDS=tiff_to_segment,
               format='KEA',
               outputSRS='EPSG:3577')


/env/lib/python3.6/site-packages/datacube/helpers.py:34: DeprecationWarning: Function datacube.helpers.write_geotiff is deprecated,
category=DeprecationWarning)

[11]:

<osgeo.gdal.Dataset; proxy of <Swig Object of type 'GDALDatasetShadow *' at 0x7f61546f1b40> >


We can then perform the segmentation on the .kea file:

[12]:

# Run the image segmentation
segutils.runShepherdSegmentation(inputImg=kea_file,
                                 outputClumps=segmented_kea_file,
                                 outputMeanImg=segments_zonal_mean,
                                 numClusters=20,
                                 minPxls=200)


Stretch Input Image
Add 1 to stretched file to ensure there are no all zeros (i.e., no data) regions created.
Deleting file: ./meanNDVI_stchdonly.kea
Deleting file: ./meanNDVI_stchdonlyOff.kea
Performing KMeans.
Apply KMeans to image.
Eliminate Single Pixels.
Perform clump.
Eliminate small pixels.
Relabel clumps.
Calculate image statistics and build pyramids.
Deleting file: ./meanNDVI_kmeansclusters.gmtxt
Deleting file: ./meanNDVI_kmeans.kea
Deleting file: ./meanNDVI_kmeans.kea.aux.xml
Deleting file: ./meanNDVI_kmeans_nosgl.kea
Deleting file: ./meanNDVI_kmeans_nosglTMP.kea
Deleting file: ./meanNDVI_clumps.kea
Deleting file: ./meanNDVI_clumps_elim.kea
Deleting file: ./meanNDVI_stchd.kea

[13]:

# Open and plot the segments attributed with zonal mean NDVI
result = xr.open_rasterio(segments_zonal_mean)
result.plot(vmin=0.1, vmax=1.0, figsize=(10, 10), cmap='gist_earth_r')


[13]:

<matplotlib.collections.QuadMesh at 0x7f6157eebac8>


Note that we can also open the output segmented_kea_file .kea file using xarray.open_rasterio to view the raw unattributed segments:

[14]:

xr.open_rasterio(segmented_kea_file).plot()

[14]:

<matplotlib.collections.QuadMesh at 0x7f6154060390>


Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on GitHub.

Compatible datacube version:

[15]:

print(datacube.__version__)

1.8.0b7.dev35+g5023dada


## Tags

Browse all available tags on the DEA User Guide’s Tags Index

Tags: sandbox compatible, sentinel 2, dea_datahandling, dea_plotting, dea_bandindices, array_to_geotiff, load_ard, rgb, calculate_indices, image segmentation, image compositing, rsgislib, scikit-image, GEOBIA, quickshift, NDVI, GeoTIFF, exporting data