Tiled, Parallel Image Segmentation

What does this notebook do?

Image segmentation at large scales can be both time and memory intensive. The module ‘tiledSegParallel.py’ builds upon the image segmentation algorithm developed by Shepherd et al. (2019) (implemented in the package RSGISlib) to run image segmentation across multiple cpus. A full description of their approach can be found in Clewey et al. (2014) A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. This notebook implements the ‘tiledSegParallel.py’ module to conduct an image segmentation on a large geotiff (8gb, approximately 50 Albers tiles). The results of the notebook include a segments (‘Clumps’) geotiff, and a geotiff with the segments attributed with the zonal mean of the input geotiff. This script takes ~ 1.5 hours to run on a geotiff of this scale.

Requirements

You need to run the following commands from the command line prior to launching jupyter notebooks from the same terminal so that the required libraries and paths are set:

module use /g/data/v10/public/modules/modulefiles
module load dea

This script also requires the installation of ‘pathos.multiprocessing’, a fork of python’s Multiprocessing package that using Dill instead of Pickle for serializing.

pip install --user pathos

Cautions

There are two major caveats to the use of this script: 1. As the script uses the Multiprocessing library, it cannot be run across multiple nodes. If running on Raijin, using queues that have a large number of cpus per node would be best. A ‘raijinified’ version of the script can be found here. 2. The tiling approach is based on the bounding coordinates of the geotiff. If a geotiff is irregularly shaped such that a tile(s) contains none of the input geotiff, then the segmentation will fail. If this occurs, check the …S1Tiles.shp file output during stage 1 of the algorithm. Overlay this file on top of your input geotiff to check if there are empty tiles. At the moment, the only solution is to change the extent of the geotiff to be more regularly shaped. The ‘validDataTileFraction’ variable will handle tiles that contain a small fraction of the input geotiff, tiles containing less than the specified fraction are merged with a neighboring tile. The image below shows an example of the tiling approach with merged tiles:

Image of tilingapproach

Date 3/6/19

Author Chad Burton

Tags: image segmentation, RSGISlib, parallel, multiprocessing, object-orientated

User Inputs

[ ]:
# Location string of the geotiff you wish to segment
inputTiff = "data/nmdb_Summer2017_18_NDVI_max.tif"

# Location string of the .KEA file the geotiff will be converted too
InputKEA = "data/nmdb_Summer2017_18_NDVI_max.kea"

# Location string of clumps .KEA file that will be output
ClumpsFile = "results/nmdb_Summer2017_18_NDVI_max_OutClumps.kea"

# Location string of the Clump zonal mean of input tiff
meanImageTiff = "results/nmdb_Summer2017_18_NDVI_max_ClumpMean.tif"

# Location to a folder to store temporary files during segmentation
temp = 'tmps/'

# How many cpus will this run on? There are 8 CPUS on a VDI instance,
# 5 cpus leaves enough processing power for other work while this script runs.
ncpus=5

# what fraction of a tile should contain valid data? Below this threshold
# a tile will be merged with its neighbour.
validDataTileFraction = 0.3

# enter the tile size parameters (in number of pixels)
width = 8000
height = 8000

Run the cells below to conduct the image segmentation

[ ]:
from osgeo import gdal
import os
import osr
from rsgislib.segmentation import meanImage
import rsgislib
from pathos.multiprocessing import ProcessingPool as Pool
import dill
import xarray as xr
import numpy as np
#import custom functions
import sys
sys.path.append('../10_Scripts')
import tiledSegParallel
from SpatialTools import geotransform
from SpatialTools import array_to_geotiff
[ ]:
# Change the tiff to a kea file (only run this once!)
gdal.Translate(InputKEA, inputTiff, format='KEA', outputSRS='EPSG:3577')
[ ]:
#run the segmentation
tiledSegParallel.performTiledSegmentation(InputKEA, ClumpsFile, tmpDIR=temp, numClusters=20, validDataThreshold=validDataTileFraction,
                                          tileWidth=width, tileHeight=height, minPxls=100, ncpus=ncpus)
[ ]:
# Attribute segments with zonal mean of input image and output as geotiff
meanImage(inputTiff, ClumpsFile, meanImageTiff, "GTIFF",rsgislib.TYPE_32FLOAT)
[ ]:
#convert segments result into geotiff.
a = xr.open_rasterio(ClumpsFile).squeeze()
transform, projection = geotransform(a, (a.x, a.y), epsg=3577)
width,height = a.shape
array_to_geotiff(ClumpsFile[:-4]+".tif",
      a.values, geo_transform = transform,
      projection = projection,
      nodata_val=-999)

Results

The picture below shows the results of an image segmentation, with the segments coloured by the zonal mean of the input geotiff (NDVI in this case).

imageSeg results