Open and run analysis on multiple polygons
¶
**Sign up to the DEA Sandbox** to run this notebook interactively from a browser
Compatibility: Notebook currently compatible with both the
NCI
andDEA Sandbox
environmentsProducts used: ga_ls8c_ard_3
Background¶
Many users need to run analyses on their own areas of interest. A common use case involves running the same analysis across multiple polygons in a vector file (e.g. ESRI Shapefile or GeoJSON). This notebook will demonstrate how to use a vector file and the Open Data Cube to extract satellite data from Digital Earth Australia corresponding to individual polygon geometries.
Description¶
If we have a vector file containing multiple polygons, we can use the python package geopandas to open it as a GeoDataFrame
. We can then iterate through each geometry and extract satellite data corresponding with the extent of each geometry. Further anlaysis can then be conducted on each resulting xarray.Dataset
.
We can retrieve data for each polygon, perform an analysis like calculating NDVI and plot the data.
First we open the vector file as a
geopandas.GeoDataFrame
Iterate through each polygon in the
GeoDataFrame
, and extract satellite data from DEACalculate NDVI as an example analysis on one of the extracted satellite timeseries
Plot NDVI for the polygon extent
Getting started¶
To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.
Load packages¶
Please note the use of datacube.utils
package geometry
: this is important for saving the coordinate reference system of the incoming shapefile in a format that the Digital Earth Australia query can understand.
[1]:
%matplotlib inline
import datacube
import rasterio.crs
import geopandas as gpd
import matplotlib.pyplot as plt
from datacube.utils import geometry
import sys
sys.path.append('../Scripts')
from dea_datahandling import load_ard
from dea_bandindices import calculate_indices
from dea_plotting import rgb, map_shapefile
from dea_temporal import time_buffer
from dea_spatialtools import xr_rasterize
Connect to the datacube¶
Connect to the datacube database to enable loading Digital Earth Australia data.
[2]:
dc = datacube.Datacube(app='Analyse_multiple_polygons')
Analysis parameters¶
time_of_interest
: Enter a time, in units YYYY-MM-DD, around which to load satellite data e.g.'2019-01-01'
time_buff
: A buffer of a given duration (e.g. days) around the time_of_interest parameter, e.g.'30 days'
vector_file
: A path to a vector file (ESRI Shapefile or GeoJSON)attribute_col
: A column in the vector file used to label the outputxarray
datasets containing satellite images. Each row of this column should have a unique identifierproducts
: A list of product names to load from the datacube e.g.['ga_ls7e_ard_3', 'ga_ls8c_ard_3']
measurements
: A list of band names to load from the satellite product e.g.['nbart_red', 'nbart_green']
resolution
: The spatial resolution of the loaded satellite data e.g. for Landsat, this is(-30, 30)
output_crs
: The coordinate reference system/map projection to load data into, e.g.'EPSG:3577'
to load data in the Albers Equal Area projectionalign
: How to align the x, y coordinates respect to each pixel. Landsat Collection 3 should be centre alignedalign = (15, 15)
if data is loaded in its native UTM zone projection, e.g.'EPSG:32756'
[3]:
time_of_interest = '2019-02-01'
time_buff = '30 days'
vector_file = '../Supplementary_data/Analyse_multiple_polygons/multiple_polys.shp'
attribute_col = 'id'
products = ['ga_ls8c_ard_3']
measurements = ['nbart_red', 'nbart_green', 'nbart_blue', 'nbart_nir']
resolution = (-30, 30)
output_crs = 'EPSG:3577'
align = (0, 0)
Look at the structure of the vector file¶
Import the file and take a look at how the file is structured so we understand what we are iterating through. There are two polygons in the file:
[4]:
gdf = gpd.read_file(vector_file)
gdf.head()
[4]:
id | geometry | |
---|---|---|
0 | 2 | POLYGON ((980959.746 -3560845.144, 983880.024 ... |
1 | 1 | POLYGON ((974705.494 -3565359.492, 977625.771 ... |
We can then plot the geopandas.GeoDataFrame
using the function map_shapefile
to make sure it covers the area of interest we are concerned with:
[5]:
map_shapefile(gdf, attribute=attribute_col)
Create a datacube query object¶
We then create a dictionary that will contain the parameters that will be used to load data from the DEA data cube:
Note: We do not include the usual
x
andy
spatial query parameters here, as these will be taken directly from each of our vector polygon objects.
[6]:
query = {'time': time_buffer(time_of_interest, buffer=time_buff),
'measurements': measurements,
'resolution': resolution,
'output_crs': output_crs,
'align': align,
}
query
[6]:
{'time': ('2019-01-02', '2019-03-03'),
'measurements': ['nbart_red', 'nbart_green', 'nbart_blue', 'nbart_nir'],
'resolution': (-30, 30),
'output_crs': 'EPSG:3577',
'align': (0, 0)}
Loading satellite data¶
Here we will iterate through each row of the geopandas.GeoDataFrame
and load satellite data. The results will be appended to a dictionary object which we can later index to analyse each dataset.
[7]:
# Dictionary to save results
results = {}
# Loop through polygons in geodataframe and extract satellite data
for index, row in gdf.iterrows():
print(f'Feature: {index + 1}/{len(gdf)}')
# Extract the feature's geometry as a datacube geometry object
geom = geometry.Geometry(geom=row.geometry, crs=gdf.crs)
# Update the query to include our geopolygon
query.update({'geopolygon': geom})
# Load landsat
ds = load_ard(dc=dc,
products=products,
# min_gooddata=0.99, # only take uncloudy scenes
ls7_slc_off = False,
group_by='solar_day',
**query)
# Generate a polygon mask to keep only data within the polygon
mask = xr_rasterize(gdf.iloc[[index]], ds)
# Mask dataset to set pixels outside the polygon to `NaN`
ds = ds.where(mask)
# Append results to a dictionary using the attribute
# column as an key
results.update({str(row[attribute_col]): ds})
Feature: 1/2
Finding datasets
ga_ls8c_ard_3
Applying pixel quality/cloud mask
Loading 3 time steps
Rasterizing to match xarray.DataArray dimensions (92, 105) and projection system/CRS (e.g. PROJCS["GDA94 / Australian Albers",GEOGCS["GDA94",DATUM["Geocentric_Datum_of_Australia_1994",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6283"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4283"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",132],PARAMETER["standard_parallel_1",-18],PARAMETER["standard_parallel_2",-36],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","3577"]])
Feature: 2/2
Finding datasets
ga_ls8c_ard_3
Applying pixel quality/cloud mask
Loading 3 time steps
Rasterizing to match xarray.DataArray dimensions (92, 105) and projection system/CRS (e.g. PROJCS["GDA94 / Australian Albers",GEOGCS["GDA94",DATUM["Geocentric_Datum_of_Australia_1994",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6283"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4283"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",132],PARAMETER["standard_parallel_1",-18],PARAMETER["standard_parallel_2",-36],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","3577"]])
Further analysis¶
Our results
dictionary will contain xarray
objects labelled by the unique attribute column
values we specified in the Analysis parameters
section:
[8]:
results
[8]:
{'2': <xarray.Dataset>
Dimensions: (time: 3, x: 105, y: 92)
Coordinates:
* y (y) float64 -3.561e+06 -3.561e+06 ... -3.564e+06 -3.564e+06
* time (time) datetime64[ns] 2019-01-17T00:20:13.792499 ... 2019-02-18T00:20:07.953681
spatial_ref int32 3577
* x (x) float64 9.807e+05 9.808e+05 ... 9.838e+05 9.839e+05
Data variables:
nbart_red (time, y, x) float32 nan nan nan nan nan ... nan nan nan nan
nbart_green (time, y, x) float32 nan nan nan nan nan ... nan nan nan nan
nbart_blue (time, y, x) float32 nan nan nan nan nan ... nan nan nan nan
nbart_nir (time, y, x) float32 nan nan nan nan nan ... nan nan nan nan
Attributes:
crs: EPSG:3577
grid_mapping: spatial_ref,
'1': <xarray.Dataset>
Dimensions: (time: 3, x: 105, y: 92)
Coordinates:
* y (y) float64 -3.565e+06 -3.565e+06 ... -3.568e+06 -3.568e+06
* time (time) datetime64[ns] 2019-01-17T00:20:13.792499 ... 2019-02-18T00:20:07.953681
spatial_ref int32 3577
* x (x) float64 9.745e+05 9.745e+05 ... 9.776e+05 9.776e+05
Data variables:
nbart_red (time, y, x) float32 nan nan nan nan nan ... nan nan nan nan
nbart_green (time, y, x) float32 nan nan nan nan nan ... nan nan nan nan
nbart_blue (time, y, x) float32 nan nan nan nan nan ... nan nan nan nan
nbart_nir (time, y, x) float32 nan nan nan nan nan ... nan nan nan nan
Attributes:
crs: EPSG:3577
grid_mapping: spatial_ref}
Enter one of those values below to index our dictionary and conduct further analsyis on the satellite timeseries for that polygon.
[9]:
key = '1'
Plot an RGB image¶
We can now use the dea_plotting.rgb
function to plot our loaded data as a three-band RGB plot:
[10]:
rgb(results[key], col='time', size=4)

Calculate NDVI and plot¶
We can also apply analyses to data loaded for each of our polygons. For example, we can calculate the Normalised Difference Vegetation Index (NDVI) to identify areas of growing vegetation:
[11]:
# Calculate band index
ndvi = calculate_indices(results[key], index='NDVI', collection='ga_ls_3')
# Plot NDVI for each polygon for the time query
ndvi.NDVI.plot(col='time', cmap='YlGn', vmin=0, vmax=1, figsize=(18, 4))
plt.show()

Additional information¶
License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube
tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on
Github.
Last modified: June 2020
Compatible datacube version:
[12]:
print(datacube.__version__)
1.8.0
Tags¶
Browse all available tags on the DEA User Guide’s Tags Index
Tags: NCI compatible, sandbox compatible, landsat 8, dea_plotting, dea_datahandling, xr_rasterize, dea_bandindices, dea_spatialtools, dea_temporal, time_buffer, load_ard, rgb, calculate_indices, NDVI, GeoPandas, shapefile