Tiled, Parallel Image Segmentation¶
What does this notebook do?
Image segmentation at large scales can be both time and memory intensive. The module ‘tiledSegParallel.py’ builds upon the image segmentation algorithm developed by Shepherd et al. (2019) (implemented in the package RSGISlib) to run image segmentation across multiple cpus. A full description of their approach can be found in Clewey et al. (2014) A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. This notebook implements the ‘tiledSegParallel.py’ module to conduct an image segmentation on a large geotiff (8gb, approximately 50 Albers tiles). The results of the notebook include a segments (‘Clumps’) geotiff, and a geotiff with the segments attributed with the zonal mean of the input geotiff. This script takes ~ 1.5 hours to run on a geotiff of this scale.
You need to run the following commands from the command line prior to launching jupyter notebooks from the same terminal so that the required libraries and paths are set:
module use /g/data/v10/public/modules/modulefiles module load dea
This script also requires the installation of ‘pathos.multiprocessing’, a fork of python’s Multiprocessing package that using Dill instead of Pickle for serializing.
pip install --user pathos
There are two major caveats to the use of this script: 1. As the script uses the Multiprocessing library, it cannot be run across multiple nodes. If running on Raijin, using queues that have a large number of cpus per node would be best. A ‘raijinified’ version of the script can be found here. 2. The tiling approach is based on the bounding coordinates of the geotiff. If a geotiff is irregularly shaped such that a tile(s) contains none of the input geotiff, then the segmentation will fail. If this occurs, check the …S1Tiles.shp file output during stage 1 of the algorithm. Overlay this file on top of your input geotiff to check if there are empty tiles. At the moment, the only solution is to change the extent of the geotiff to be more regularly shaped. The ‘validDataTileFraction’ variable will handle tiles that contain a small fraction of the input geotiff, tiles containing less than the specified fraction are merged with a neighboring tile. The image below shows an example of the tiling approach with merged tiles:
Author Chad Burton
Tags: image segmentation, RSGISlib, parallel, multiprocessing, object-orientated
# Location string of the geotiff you wish to segment inputTiff = "data/nmdb_Summer2017_18_NDVI_max.tif" # Location string of the .KEA file the geotiff will be converted too InputKEA = "data/nmdb_Summer2017_18_NDVI_max.kea" # Location string of clumps .KEA file that will be output ClumpsFile = "results/nmdb_Summer2017_18_NDVI_max_OutClumps.kea" # Location string of the Clump zonal mean of input tiff meanImageTiff = "results/nmdb_Summer2017_18_NDVI_max_ClumpMean.tif" # Location to a folder to store temporary files during segmentation temp = 'tmps/' # How many cpus will this run on? There are 8 CPUS on a VDI instance, # 5 cpus leaves enough processing power for other work while this script runs. ncpus=5 # what fraction of a tile should contain valid data? Below this threshold # a tile will be merged with its neighbour. validDataTileFraction = 0.3 # enter the tile size parameters (in number of pixels) width = 8000 height = 8000
Run the cells below to conduct the image segmentation¶
from osgeo import gdal import os import osr from rsgislib.segmentation import meanImage import rsgislib from pathos.multiprocessing import ProcessingPool as Pool import dill import xarray as xr import numpy as np #import custom functions import sys sys.path.append('../10_Scripts') import tiledSegParallel from SpatialTools import geotransform from SpatialTools import array_to_geotiff
# Change the tiff to a kea file (only run this once!) gdal.Translate(InputKEA, inputTiff, format='KEA', outputSRS='EPSG:3577')
#run the segmentation tiledSegParallel.performTiledSegmentation(InputKEA, ClumpsFile, tmpDIR=temp, numClusters=20, validDataThreshold=validDataTileFraction, tileWidth=width, tileHeight=height, minPxls=100, ncpus=ncpus)
# Attribute segments with zonal mean of input image and output as geotiff meanImage(inputTiff, ClumpsFile, meanImageTiff, "GTIFF",rsgislib.TYPE_32FLOAT)
#convert segments result into geotiff. a = xr.open_rasterio(ClumpsFile).squeeze() transform, projection = geotransform(a, (a.x, a.y), epsg=3577) width,height = a.shape array_to_geotiff(ClumpsFile[:-4]+".tif", a.values, geo_transform = transform, projection = projection, nodata_val=-999)
The picture below shows the results of an image segmentation, with the segments coloured by the zonal mean of the input geotiff (NDVI in this case).