Identify the time a waterbody exceeds X% wet surface area

  • Compatability: Notebook currently compatible with the NCI environment only. You can make this notebook Sandbox compatible by pointing it to the DEA Waterbodies timeseries located in AWS.

  • Products used: None.

  • Prerequisites: This notebook explores the individual waterbody timeseries csvs contained within the DEA Waterbodies dataset. It has been designed with that very specific purpose in mind, and is not intended as a general analysis notebook. This notebook in its current form assumes that the `RemoveRiversfromWaterBodyPolygons.ipynb <RemoveRiversfromWaterBodyPolygons.ipynb>`__ notebook has already been run. You can choose to run this notebook on the non-river filtered polygons, but rivers will produce erroneous results when performing the wet area detection as they are long-standing features in the landscape and a ‘first observed’ analysis will inevitably be limited by observation availability and not reflect the actual feature.

Description

This notebook loops through all of the individual DEA Waterbodies timeseries files and finds the quarter (JFM/AMJ/JAS/OND) that each waterbody is first/last observed at having at least SurfaceAreaThreshold% of the total surface area of the waterbody observed as wet. This attribute is used as a proxy for construction date for built waterbody structures. This date is then appended to the waterbody polygon shapefile as an attribute.

  1. Load in the required modules

  2. Set up the file paths for inputs/outputs

  3. Read in the DEA Waterbodies shapefile

  4. Loop through each timeseries and find the quarter where the waterbody is first/last observed at having at least X% of the total surface area of the waterbody observed as wet

  5. Append this date to the shapefile

  6. Write out an updated shapefile

  7. Load in the shapefile again (done as a separate step so you can choose to start to run the notebook from here without re-doing the analysis)

  8. Groupby the dates and sum the area of all the polygons first/last observed in that quarter

  9. Plot the results


Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

Load packages

Import Python packages that are used for the analysis.

[2]:
%matplotlib inline
import matplotlib.pyplot as plt

import fiona
import geopandas as gp
import numpy as np

import pandas as pd
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

Analysis parameters

  • WaterBodyRiverFiltered: file path to the DEA Waterbodies shapefile to read in

  • CSVFolder: file path to the folder of DEA Waterbodies timeseries

  • SurfaceAreaThreshold: e.g. 50. Select the percentage of the total surface area of the waterbody that must have been observed as wet as a single time period.

  • FirstorLastIndex: e.g. {‘First’: 0} for first, or {‘Last’: -1} for last. Set which index to pull out depending on whether you want to find the first time the surface area wetness threshold was met, or the last time it was.

[3]:
WaterBodyRiverFiltered = '/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/AusAllTime01-005HybridWaterbodies/AusWaterBodiesFINALRiverFiltered.shp'
CSVFolder = '/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/'
SurfaceAreaThreshold = 50
FirstorLastIndex = {'First': 0}

Read in data and loop through each timeseries

[3]:
WaterBodyRiverFilteredShapes = gp.read_file(WaterBodyRiverFiltered)
# Set up an empty attribute to write into
WaterBodyRiverFilteredShapes[f'Time {list(FirstorLastIndex.keys())[0]} Exceeds {SurfaceAreaThreshold}%'] = -999
[4]:
PolygonsThatDidntWork = []

for shapes in fiona.open(WaterBodyRiverFiltered):
    polyName = shapes['properties']['UID']

    # Read in the correct csv file
    FileToLoad = f'{CSVFolder}{polyName[:4]}/{polyName}.csv'
    try:
        Timeseries = pd.read_csv(FileToLoad)
    except:
        print(f'Can\'t open{FileToLoad}')
        continue

    # Change the date column to an actual datetime object
    Timeseries['Observation Date'] = pd.to_datetime(
        Timeseries['Observation Date'])
    Timeseries = Timeseries.set_index(['Observation Date'])

    # Aggregate the timeseries quarterly
    QuartelyAveraged = Timeseries.resample('Q', label='left').agg('max')

    try:
        # Find the first time that the timeseries is greater than or equal to 50%
        IndexExceeds = np.where(
            QuartelyAveraged['Wet pixel percentage'] >=
            SurfaceAreaThreshold)[0][list(FirstorLastIndex.values())[0]]
        TimeExceeds = QuartelyAveraged.iloc[IndexExceeds]
        DateString, BitToThrowOut = str(TimeExceeds.name).split(' ')

    except:
        print('Another one that didn\'t work...')
        PolygonsThatDidntWork.append(polyName)
        continue
    # Put the values back into the dataframe
    WaterBodyRiverFilteredShapes.at[WaterBodyRiverFilteredShapes.loc[
        WaterBodyRiverFilteredShapes['UID'] ==
        polyName].index, f'{list(FirstorLastIndex.keys())[0]}{SurfaceAreaThreshold}%'] = DateString

WaterBodyRiverFilteredShapes.to_file(
    f'/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/AusAllTime01-005HybridWaterbodies/AusWaterBodiesFINALRiverFilteredTimeseries{list(FirstorLastIndex.keys())[0]}{SurfaceAreaThreshold}.shp'
)
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/qv6p/qv6p0jr15.csv
Another one that didn't work...
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/qvut/qvut32ubz.csv
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/qvzf/qvzfjh7uf.csv
Another one that didn't work...
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r1cv/r1cvw9450.csv
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r1k2/r1k2rccu9.csv
Another one that didn't work...
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r1qz/r1qz238cv.csv
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r1r1/r1r1hjuyv.csv
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r1wk/r1wk02791.csv
Another one that didn't work...
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r30n/r30nf45jq.csv
Another one that didn't work...
Another one that didn't work...
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r38p/r38pwsvwh.csv
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r4z9/r4z9h4k55.csv
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r5u3/r5u3juecs.csv
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Another one that didn't work...
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r6up/r6upjhtyq.csv
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r74d/r74dznf97.csv
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r764/r7648hec6.csv
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r7cd/r7cd57z1h.csv
Another one that didn't work...
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r7g5/r7g5z931y.csv
Another one that didn't work...
Another one that didn't work...
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/rhg9/rhg9rbvhc.csv
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/rhrz/rhrzd1f33.csv
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/rhut/rhute581e.csv
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/rhvj/rhvj0znm9.csv
Another one that didn't work...
Another one that didn't work...
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/rj2h/rj2httbdp.csv
Another one that didn't work...
Can't open/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/rk22/rk22bz0n6.csv

Calculate some statistics on the timing of waterbodies wet observations

Load back in the new appended shapefile and add up the total surface area of waterbodies that were first/last seen as at least X% wet in each quarter through time.

Note that this notebook can be run starting from this point, ignoring the processing cell above if it has previously been run.

[9]:
AllTheData = gp.read_file(f'/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/AusAllTime01-005HybridWaterbodies/AusWaterBodiesFINALRiverFilteredTimeseries{list(FirstorLastIndex.keys())[0]}{SurfaceAreaThreshold}.shp')

Group the data by quarter, remove missing values and convert to \(km^2\)

[10]:
# Group all the waterbodies by quarters and sum them up
GroupedData = AllTheData.groupby(by=f'{list(FirstorLastIndex.keys())[0]}{SurfaceAreaThreshold}%').sum()
# Drop the -999 missing values
CleanedData = GroupedData.drop('-999')
[11]:
# Convert the quarter dates to a pandas datetime object
CleanedData.index = pd.to_datetime(CleanedData.index)
# Convert from m2 to km2
CleanedData['area'] = CleanedData['area'] / 1000000

Plot the results

[20]:
plt.figure(figsize=[10, 8])
plt.plot(CleanedData.area.cumsum())
plt.xlim('1992-01-01', '2018-01-01')
plt.ylim(40000, 55000)
plt.ylabel('Total area of water bodies ($km^2$)')
plt.xlabel(f'Time each waterbody {list(FirstorLastIndex.keys())[0]} observed at having at least {SurfaceAreaThreshold}% \n'
           'of the total surface area of the waterbody observed as wet')
plt.title(f'Total area of waterbodies {list(FirstorLastIndex.keys())[0]} observed as {SurfaceAreaThreshold}% wet each quarter')
plt.rcParams["font.size"] = "14"
../../../../_images/notebooks_Scientific_workflows_DEAWaterbodies_DEAWaterbodiesToolkit_IdentifyTimeWaterbodyExceedsXpc_16_0.png

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Last modified: January 2020

Compatible datacube version: N/A

Tags

Browse all available tags on the DEA User Guide’s Tags Index

Tags: DEA Waterbodies, no_testing