Remove rivers from waterbody polygons 7ebc79c4bd594efc8d2ef942324f05b6

  • Compatibility: Notebook currently compatible with the NCI environment only. You can make this notebook Sandbox compatible by pointing it to the DEA Waterbodies timeseries located in AWS.

  • Products used: None.

  • Special requirements

    • River line dataset for filtering out polygons comprised of river segments.

      • Variable name: MajorRiversDataset

      • Here we use the Bureau of Meteorology’s Geofabric v 3.0.5 Beta (Suface Hydrology Network), filtered to only keep features tagged as major rivers.

      • There are some identified issues with this data layer that make the filtering using this data inconsistent (see the discussion below)

      • We therefore turn this off during the production of the water bodies shapefile.

  • Prerequisites: This notebook explores the individual waterbody timeseries csvs contained within the DEA Waterbodies dataset. It has been designed with that very specific purpose in mind, and is not intended as a general analysis notebook.


This notebook applies the FilterRivers filter from the `TurnWaterObservationsIntoWaterbodyPolygons.ipynb <../TurnWaterObservationsIntoWaterbodyPolygons.ipynb>`__ notebook. This allows this filtering step to be applied to the final Waterbody polygon dataset, in order to produce a further refined version in which the polygon UIDs still match up with the all encompassing version.

  1. Load in modules and set up some functions

  2. Load in the river lines dataset

  3. Load in the waterbodies dataset

  4. Find where the two intersect and remove waterbodies that intersect with river lines

  5. Write out the results to a new shapefile

Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

Note that file paths have been hardcoded below. To run this notebook, make sure that these are still correct.

Load packages and functions

Import Python packages that are used for the analysis.

import geopandas as gp
def Filter_shapefile_by_intersection(gpdData,
    Filter out polygons that intersect with another polygon shapefile.


    gpdData: geopandas dataframe
        Polygon data that you wish to filter
    gpdFilter: geopandas dataframe
        Dataset you are using as a filter

    filtertype: default = 'intersects'
        Options = ['intersects', 'contains', 'within']
    invertMask: boolean
        Default = 'True'. This determines whether you want areas that DO ( = 'False') or DON'T ( = 'True')
        intersect with the filter shapefile.
    returnInnverse: boolean
        Default = 'False'. If true, then return both parts of the intersection - those that intersect AND
        those that don't as two dataframes.

    gpdDataFiltered: geopandas dataframe
        Filtered polygon set, with polygons that intersect with gpdFilter removed.
    IntersectIndex: list of indices of gpdData that intersect with gpdFilter

    if 'returnInverse = True'
    gpdDataFiltered, gpdDataInverse: two geopandas dataframes
        Filtered polygon set, with polygons that DON'T intersect with gpdFilter removed.

    # Check that the coordinate reference systems of both dataframes are the same

    #assert ==, 'Make sure the the coordinate reference systems of the two provided dataframes are the same'

    Intersections = gp.sjoin(gpdFilter, gpdData, how="inner", op=filtertype)

    # Find the index of all the polygons that intersect with the filter
    IntersectIndex = sorted(set(Intersections['index_right']))

    # Grab only the polygons NOT in the IntersectIndex
    # i.e. that don't intersect with a river
    if invertMask:
        gpdDataFiltered = gpdData.loc[~gpdData.index.isin(IntersectIndex)]
        gpdDataFiltered = gpdData.loc[gpdData.index.isin(IntersectIndex)]

    if returnInverse:
        # We need to use the indices from IntersectIndex to find the inverse dataset, so we
        # will just swap the '~'.

        if invertMask:
            gpdDataInverse = gpdData.loc[gpdData.index.isin(IntersectIndex)]
            gpdDataInverse = gpdData.loc[~gpdData.index.isin(IntersectIndex)]

        return gpdDataFiltered, IntersectIndex, gpdDataInverse

        return gpdDataFiltered, IntersectIndex

Load in the datasets

We use the Bureau of Meteorology’s Geofabric v 3.0.5 Beta (Suface Hydrology Network) to filter out polygons that intersect with major rivers. This is done to remove river segments from the polygon dataset. We use the SH_Network AHGFNetworkStream any layer within the geodatabase, and filter the dataset to only keep rivers tagged as major. It is this filtered dataset we use here.

Note that we reproject this dataset to epsg 3577, Australian Albers coordinate reference system. If this is not correct for your analysis, you can change this in the cell below.

Note when using the Geofabric to filter out rivers

The option to filter out rivers was switched off for the production of our water bodies dataset. During testing, the Geofabric dataset was shown to lead to inconsistencies in what was removed, and what remained within the dataset.

  • The Geofabric continues the streamline through on-river dams, which means these polygons are filtered out. This may not be the desired result.

Stream and Dam intersection

Read in the river lines dataset

# Where is this file located?
MajorRiversDataset = '/g/data/r78/cek156/ShapeFiles/SH_Network_GDB_National_V3_0_5_Beta/SH_Network_GDB_National_V3_0_5_Beta_MajorFiltered.shp'

# Read in the major rivers dataset (if you are using it)
MajorRivers = gp.GeoDataFrame.from_file(MajorRiversDataset)
MajorRivers = MajorRivers.to_crs({'init':'epsg:3577'})

Read in the waterbodies dataset

WaterPolygons = gp.read_file('/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/AusAllTime01-005HybridWaterbodies/AusWaterBodiesFINAL.shp')
WaterPolygons = WaterPolygons.to_crs({'init':'epsg:3577'})

Filter out polygons that intersect with a major river

WaterBodiesBigRiverFiltered, Index = Filter_shapefile_by_intersection(WaterPolygons,

Write out the amended shapefile

[14]: = {'init': 'epsg:3577'}
    driver='ESRI Shapefile')

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Last modified: January 2020

Compatible datacube version: N/A


Browse all available tags on the DEA User Guide’s Tags Index

Tags: DEA Waterbodies, no_testing