Compute 1 km pixel count summaries for Mangrove Cover

This notebook is designed for creating mangrove cover statistics for multiple cyclones or regions simultaneously.

What does this notebook do?

  1. Gives instructions for creating shapefiles for 1 km mangrove tiles

  2. Imports modules and sets up working folder

  3. Sets up functions for generating plots of yearly mangrove cover statistics

  4. Sets up list of shapefiles that contain 1 km mangrove tiles that 1) experienced certain wind speed categories for specific cyclones, or 2) for certain Natural Resource Management (NRM) regions

  5. Imports all Mangrove Cover datasets as combined virtual rasters for each year

  6. Uses rasterstats to compute pixel counts for each cover class within 1 km tiles

  7. Exports .shp files with pixel count attributes for each tile

  8. Exports pixel counts for all tiles and years as a .pkl.gz format file

  9. Generates mangrove cover statistics plots and .csv files for each cyclone or region

Requirements and data sources:

You need to run the following commands from the command line or a shell script prior to launching Jupyter notebooks from the same terminal so that the requiered libraries and paths are set:

module use /g/data/v10/public/modules/modulefiles

module load dea

The following files were copied to the raw_data directory of this notebook: * CreateZoneMangroveExtentPlots.py was copied from its original location at /g/data/r78/pjb552/MangChangePVFC_V3 * 1kmGridMangrovesfinal.shp was copied from its original location at /g/data/r78/al3504/temp/1kmGridMangrovesfinal.shp * AustraliaSqGrid_MangroveRegionsV1_ExtraV3Info.shp was copied from its original location at /g/data/r78/pjb552/MangChangePVFC_V3/AustraliaSqGrid_MangroveRegionsV1_ExtraV3Info.shp.

Author: Sebastian Wong, Claire Krause, Robbi Bishop Taylor

Date: January 2019

Start by identifying which tiles you want to analyse

Below are instructions for creating the shapefiles required for input into this notebook

Using QGIS, import the 1kmGridMangrovesfinal.shp file. Then select the tiles that are of interest from that shapefile.

Identify tiles in certain wind speed categories

(Note: create and save a duplicate of the 1kmGridMangrovesfinal.shp file before performing this step, so that the original is not altered).

Identifying mangroves within a wind speed raster can be done by using the Zonal Statistics plugin. Once the Zonal Statistics plugin has been installed, select it via Raster>Zonal statistics>Zonal statistics. In the ‘Zonal Statistics’ dialogue box select the wind speed raster for a particular cyclone under the ‘Raster layer:’ drop down menu. Under the ‘Polygon layer containing the zones:’ drop down menu, select the new 1kmGridMangrovesfinal.shp duplicate file that was saved earlier. From the ‘Statistics to calculate:’ menu select ‘Standard deviation’ and ‘Maximum’ and any other desired statistics. Click OK.

The shapefile created above will now have zonal statistics for all mangrove tiles that are contained within the wind speed raster. One way of creating shapefiles for each wind speed category is to firstly duplicate the shapefile created in the previous step. Then, open the attributes table and select the data that falls within a wind speed range for a desired category. This can be done by checking the new Maximum column, which was created during the Zonal Statistics step. The remaining data can be deleted. This will leave you with a shapefile with the 1km mangrove tiles that have experienced the chosen wind speed category. This step can be repeated to create shapefiles for each area that has been subjceted to certain wind speed categories for a given cyclone. Rename new shapefiles according to their wind speed categories.

Below is a summary of the wind speeds and their corresponding categories given by the Bureau of Meteorology (2019) (more details can also be found at http://www.bom.gov.au/cyclone/about/):

Category

Strongest gusts (km/h)

Typical effects

1 Tropical Cyclone

Less than 125 km/h Gales

Minimal house damage. Damage to some crops, trees and caravans. Boats may drag moorings

2 Tropical Cyclone

125 - 164 km/h Destructive winds

Minor house damage. Significant damage to signs, trees and caravans. Heavy damage to some crops. Risk of power failure. Small boats may break moorings

3 Severe tropical Cyclone

165 - 224 km/h Very destructive winds

Some roof and structural damage. Some caravans destroyed. Power failure likely.

4 Severe Tropical Cyclone

225 - 279 km/h Very destructive winds

Significant roofing and structural damage. Many caravans destroyed and blown away. Dangerous airborne debris. Widespread power failures.

5 Severe Tropical Cyclone

More than 280 km/h Extremely destructive winds

Extremely dangerous with widespread destruction

You have now created shapefiles for 1km mangrove tiles that have been subjected to certain wind speed categories during specific cyclones. These shapefiles can now be used to extract mangrove cover statistics.

Import modules

This cell imports the modules required to run the scripts below. The working folder is also identified in this cell.

The working folder directory given in the cell below is in the Supplementary_data folder of the dea-notebooks folder. Shapefiles for 1 km mangrove tiles for certain wind speed categories for specific cyclones, and Natural Resource Management (NRM) regions, have already been created (as per the instructions above) and are located in suitably named folders within the MangroveCoverFiles folder (sw4446/dea-notebooks/Supplementary_data/MangroveCoverFiles). All outputs in this notebook are directed to out_data folders within the folder for the specific cyclone or NRM region. Therefore, the directory in the scripts (that output data to the out_data folders) should be changed to a directroy of your choice (as to create a local copy of the files that does not clutter the Supplementary_data folder). Likewise, the 1 km mangrove tile shapefiles in the folders for specific cyclones or NRM regions could also be copied out to a location of your choice, and the directories in the scripts below changed.

[ ]:
# Import modules
%pylab notebook

import xarray as xr
import os
import pandas as pd
import geopandas as gpd
from rasterstats import zonal_stats
import shutil
import re
import rsgislib
import rsgislib.vectorutils
import osgeo.gdal as gdal
import numpy

os.chdir('/g/data/r78/sw4446/dea-notebooks/Supplementary_data/MangroveCoverFiles')

Set up some functions for plotting

This cell sets up some functions for creating the Mangrove Cover Statistics plots. The code in this cell was copied from CreateZoneMangroveExtentPlots.py, which was origionally located at /g/data/r78/pjb552/MangChangePVFC_V3. The code in CreateZoneMangroveExtentPlots.pythat sets up the plots was copied and pasted into the cell below, with only minor modifications to suit this notebook.

[ ]:
def createPlots(dataFile, tilesOfInterest, outTypePlotFile, baseTitle):

    data = pd.read_pickle(dataFile, compression="gzip")

    print(data)

    outData = data[tilesOfInterest[0]][['total','low','mid','high']]
    if len(tilesOfInterest) > 1:
        for tile in tilesOfInterest[1:]:
            outData = outData + data[tile][['total','low','mid','high']]

    outData = (outData) * 625 / 1000000

    # Setting up plot data, axes, colour

    ax=outData[['low','mid','high']].plot.bar(stacked=True, figsize=(10,7), color=['#9FFF4C', '#5ECC00', '#3B7F00'], legend=False, width=1.0)
    ax.set_xlabel('Time (Years)')
    ax.set_ylabel('Area (km$^{2}$)')
    ax.set_title(baseTitle)
    plt.savefig(outTypePlotFile)


def GenZoneMangroveChangePlots(zonesSHP, zoneNameCol, plotPathDIR, dataFile, gridSHP, gridIDCol, tmpPath, RegionName):
    rsgisUtils = rsgislib.RSGISPyUtils()
    uidStr = rsgisUtils.uidGenerator()
    tmpDIR = os.path.join(tmpPath, uidStr)

    tmpPresent = True
    if not os.path.exists(tmpDIR):
        os.makedirs(tmpDIR)
        tmpPresent = False

    ############ DO STUFF ###############

    zonesDataSet = gdal.OpenEx(zonesSHP, gdal.OF_VECTOR )
    if zonesDataSet is None:
        raise("Failed to open input zones shapefile\n")
    zonesLyr = zonesDataSet.GetLayer()
    zonesSpatRef = zonesLyr.GetSpatialRef()
    zonesSpatRef.AutoIdentifyEPSG()
    epsgCodeZones = zonesSpatRef.GetAuthorityCode(None)

    gridDataSet = gdal.OpenEx(gridSHP, gdal.OF_VECTOR )
    if gridDataSet is None:
        raise("Failed to open input grid shapefile\n")
    gridLyr = gridDataSet.GetLayer()
    gridSpatRef = gridLyr.GetSpatialRef()
    gridSpatRef.AutoIdentifyEPSG()
    epsgCodeGrid = gridSpatRef.GetAuthorityCode(None)

    if epsgCodeZones is not epsgCodeGrid:
        print('Reprojecting zones shapefile')
        zonesDataSet = None
        shpBasename = os.path.splitext(os.path.basename(zonesSHP))[0]
        tmpReprojSHP = os.path.join(tmpDIR, shpBasename+'_reproj.shp')
        rsgislib.vectorutils.reProjVectorLayer(zonesSHP, tmpReprojSHP, gridSpatRef.ExportToWkt())

        origZonesSHP = zonesSHP
        zonesSHP = tmpReprojSHP

        zonesDataSet = gdal.OpenEx(zonesSHP, gdal.OF_VECTOR )
        if zonesDataSet is None:
            raise("Failed to open input zones shapefile\n")
        zonesLyr = zonesDataSet.GetLayer()
        zonesSpatRef = zonesLyr.GetSpatialRef()
        zonesSpatRef.AutoIdentifyEPSG()
        epsgCodeZones = zonesSpatRef.GetAuthorityCode(None)

    zonesFeat = zonesLyr.GetNextFeature()
    while zonesFeat:
        zoneGeom = zonesFeat.GetGeometryRef()
        if zoneGeom is not None:
            gridLyr.SetSpatialFilter(zoneGeom)
            gridIDs = []
            for gridFeat in gridLyr:
                gridIDs.append(int(gridFeat.GetField(gridIDCol)))

            zoneName = str(zonesFeat.GetField(zoneNameCol))
            zoneNameNoSpace = zoneName.replace(' ', '_')
            outTypePlotFile = os.path.join(plotPathDIR, zoneNameNoSpace+'_totalAreaType_' + RegionName + '.png')
            createPlots(dataFile, gridIDs, outTypePlotFile, RegionName)

        zonesFeat = zonesLyr.GetNextFeature()

    zonesDataSet = None
    gridDataSet = None

    ####################################

    if not tmpPresent:
        shutil.rmtree(tmpDIR, ignore_errors=True)

List of cyclones to be processed

Below are lists of shapefile names which were created for specific wind speed categories for specific cyclones, and for NRM regions. The purpose of these lists are for easy repeatability and modifications of the scripts. Any wind speed category for any specific cyclone, or a group of these, can be run simultaneously, by simply copying and pasting the lists, or parts of the lists, into the StormNames cell below. Similarly, some or all of the NRM regions can also be run simultaneously by copying and pasting the lists, or parts of the lists, into the StormNames cell below.

Please note however, that the wind speed categories for specific cyclones and the NRM regions can not be run at the same time as they use different naming conventions, which prevenets the script to run if trying to run both. Also, when changing to or from running cyclones and NRM regions, the script in the ‘Extract raster statistics for each tile polygon / 1 km tiles’ cell below must be modified, with the required code being active and the unnecessary code being commented out.

Cyclone list for GBRMPA

This cyclone list is of wind speed categories for specific cyclones that recently affected the Great Barrier Reef Marine Park Authority areas.

‘Debbie_C2’, ‘Debbie_C3’, ‘Debbie_C4’, ‘Debbie_C5’, ‘Debbie_C4C5’, ‘Ita_C1’, ‘Ita_C2’, ‘Ita_C3’, ‘Ita_C4’, ‘Marcia_C1’, ‘Marcia_C2’, ‘Marcia_C3’, ‘Marcia_C4’, ‘Marcia_C5’, ‘Marcia_C4C5’, ‘NathanLF1_C1’, ‘NathanLF1_C2’, ‘NathanLF1_C3’ ‘NathanLF2_C1’, ‘NathanLF2_C2’

NRM list for GBRMPA

This NRM region list is of all the 1 km mangrove tiles that fall within certain coastal NRM regions.

‘Burdekin_NRM_Tiles’, ‘Burnett_Mary_NRM_Tiles’, ‘Cape_York_NRM_Tiles’, ‘Fitzroy_NRM_Tiles’, ‘Mackay_Whitsunday_NRM_Tiles’, ‘Wet_Tropics_NRM_Tiles’

Other cyclone list

This cyclone list is of wind speed categories for specific cyclones that were significantly damaging between 2005 and 2015.

‘George_C1’, ‘George_C2’, ‘George_C3’, ‘George_C4’, ‘George_C5’, ‘George_C4C5’, ‘IngridLF1_C1’, ‘IngridLF1_C2’, ‘IngridLF1_C3’, ‘IngridLF2_C1’, ‘IngridLF2_C2’, ‘IngridLF2_C3’, ‘IngridLF2_C4’, ‘IngridLF2_C5’, ‘IngridLF2_C4C5’, ‘IngridLF3_C1’, ‘IngridLF3_C2’, ‘IngridLF3_C3’, ‘IngridLF3_C4’, ‘Lam_C1’, ‘Lam_C2’, ‘Lam_C3’, ‘Lam_C4’, ‘Larry_C1’, ‘Larry_C2’, ‘Larry_C3’, ‘Larry_C4’, ‘Larry_C5’, ‘Larry_C4C5’, ‘Laurence_C1’, ‘Laurence_C2’, ‘Laurence_C3’, ‘Laurence_C4’, ‘Laurence_C5’, ‘Laurence_C4C5’, ‘MonicaLF1_C1’, ‘MonicaLF1_C2’, ‘MonicaLF1_C3’, ‘MonicaLF2_C1’, ‘MonicaLF2_C2’, ‘MonicaLF2_C3’, ‘MonicaLF2_C4’, ‘MonicaLF2_C5’, ‘MonicaLF2_C4C5’, ‘Yasi_C2’, ‘Yasi_C3’, ‘Yasi_C4’, ‘Yasi_C5’, ‘Yasi_C4C5’

Enter the wind speed categories for specific cyclones or NRM regions (either from the list above or from shapefiles created/saved elsewhere) in the cell below:
[ ]:
StormNames = [
              'Debbie_C5',
             ]

Extract raster statistics for each tile polygon

Summary of the following cell:

This cell extracts zonal statistics for every 1 km tile using rasterstats.zonal_stats, this is done for each year. This produces a geojson object which can be converted to a dataframe of counts for each class. After some data wrangling to match the format of the original files, a .pkl.gz format file is saved, which is used as an input to create plots using script from CreateZoneMangroveExtentPlots.py. The data is then plotted and a csv file is created.

Detail of steps in the following cell:

Note: There is a large quantity of code in the following cell, as this cell is looped. Below is a description of what each peice of code in the cell does.

Set the location of folders and files for use in the following code

The begining of this cell allows you to set the names of folders (TCName), and the shapefiles within them (TCSimulation) for use in the following code. This is to help with efficiently changing the cyclones or regions to be run without changing the names of folders and shapefiles in each line of code.
Note: If the folders or shapefiles are not in the directory given in the following cell, the directory will have to be updated for these features to work.

Reproject the 1 km mangrove tiles to Australian Albers projection

The next step in this cell is to reproject the 1 km mangrove tiles to Australian Albers projection. This code uses the 1 km mangrove tiles affected by a particular wind speed category during a specific cyclone, or an NRM region, and reprojects them. This uses the shapefiles in the cyclone or NRM region folder (TCName), and outputs the reprojected shapfiles to the out_data folder for that cyclone or region. It is recommended that the directory for the output data be changed from the out_data folders to a more suitable folder of your choice (as to create a copy of the files in a more suitable location that does not clutter the Supplementary_data folder). If the directory for the output data is changed, then the directories for out_data throughout the following cell must be changed accordingly, in order for this script to run.

Calculate zonal statistics

The next step in this cell calculates the zonal statistics from the MANGROVE_COVER_3577_{year}.vrt virtual raster based on the area identified by the 1 km mangrove tiles shapefile used.

Add any missing data

If any columns are missing or have a null value, this code adds an appropriately named column with ‘0’ values so that the following code can run.

Rearrange and reformat data

The script then exports a shapefile of 1 km pixel counts, rearranges the data to match the required file format, and reformats the values so that they are in square km (not pixel counts). The data is then combined into a single pandas dataframe, then written to a pickle format.

Output data

The final step in this cell and notebook is to output data in the form of a mangrove cover plot, and a csv version of these statistics.

This cell will run/loop continuously through the list defined in StormNames until the final data is output, upon which it will print ‘We’re all done now’. The outputs can be found in the out_data folders for the specific cyclones or NRM regions. In the out_data folder for each cyclone or NRM region, you will find: 1. .shp files of the 1 km tiles reprojected to Albers Australian projection 2. .shp files for 1 km pixel counts for each year 3. A mangrove change .pkl.gz file for the 1 km tiles from 1987 to 2017 4. A mangrove change .csv file for the 1 km tiles from 1987 to 2017 5. A .png file plot of the mangrove cover change from 1987 to 2017

[ ]:
for TCSimulation in StormNames:
    print(f'Working on {TCSimulation}')

    # TCName is name of folder conatining files for a certain cyclone or region

    # Code for running cyclones

    TCName, throwout = TCSimulation.split('_')
    TCName = re.sub('LF\d', '', TCName)

    # Code for running NRM Regions

    # TCName = re.sub('_NRM_Tiles', '', TCSimulation)


    ## Reproject 1km tiles to Australian Albers projection
    #Reproject tiles to Australian Albers projection. This cell imports the shapefile with the tiles needed for
    #analysis (which is identified in `{TCSimulation}`), reprojects them in the Australian Albers projection, then
    #outputs them into the out_data folder in the cyclone or region folder (which is identified in `{TCName}`).
    # Import 1km tiles, reproject to Albers then save out to file
    tiles_1km = gpd.read_file(f'./{TCName}/{TCSimulation}.shp')
    tiles_1km_albers = tiles_1km.to_crs({'init': 'epsg:3577'})
    tiles_1km_albers.to_file(driver = 'ESRI Shapefile', filename = f'./{TCName}/out_data/{TCSimulation}_albers.shp')

    out_list = []
    CategoryCountAll = xr.Dataset()

    print('Calculating stats for each year')

    for year in range(1987, 2018):

        # Calculate zonal states for every 1km tile in the virtual raster and export geojson with counts for each class
        out = zonal_stats(f'./{TCName}/out_data/{TCSimulation}_albers.shp', f'raw_data/MANGROVE_COVER_3577_{year}.vrt',
                          categorical=True, category_map={1: 'low', 2: 'mid', 3: 'high'},
                          geojson_out=True)

        # Convert geojson to Geopandas dataframe and set NaN values to zero
        tile_gpd = gpd.GeoDataFrame.from_features(out).fillna(0)

        # If columns are missing data, this script creates columns (e.g. 'total', 'low', 'mid', 'high')
        #so that the following parts of the script can run.
        try:
            tile_gpd['total'] = tile_gpd[['low', 'mid', 'high']].sum(axis=1)
        except KeyError:
            if 'high' not in tile_gpd:
                print('Adding high')
                tile_gpd['high'] = 0
            else:
                continue
        try:
            tile_gpd['total'] = tile_gpd[['low', 'mid', 'high']].sum(axis=1)
        except KeyError:
            if 'mid' not in tile_gpd:
                print('Adding mid')
                tile_gpd['mid'] = 0
            else:
                continue
        try:
            tile_gpd['total'] = tile_gpd[['low', 'mid', 'high']].sum(axis=1)
        except KeyError:
            if 'low' not in tile_gpd:
                print('Adding low')
                tile_gpd['low'] = 0
            else:
                continue
        try:
            tile_gpd['total'] = tile_gpd[['low', 'mid', 'high']].sum(axis=1)
        except KeyError:
            if 'total' not in tile_gpd:
                print('Adding total')
                tile_gpd['total'] = 0
            else:
                continue
#         try:
#             tile_gpd['total'] = tile_gpd['ID', 'total', 'low', 'mid', 'high'].sum(axis=1)
#         except KeyError:
#             if 'ID' not in tile_gpd:
#                 print('Adding ID')
#                 tile_gpd['ID'] = 0
#             else:
#                 continue


        # Export to shapefile
        tile_gpd.crs = {'init': 'epsg:3577'}
        tile_gpd.to_file(driver = 'ESRI Shapefile',
                         filename = f'./{TCName}/out_data/1km_pxlcounts_{TCSimulation}_{year}.shp')

        # Re-rearrange data to match required file format
        tile_reshape_gpd = tile_gpd[['ID', 'total', 'low', 'mid', 'high']].melt(id_vars=['ID'],
                                                                                var_name='PxlCount',
                                                                                value_name=str(year)).sort_values('ID')
        tile_reshape_gpd_noindex = tile_reshape_gpd.rename(columns={'ID': 'GridID'})
        tile_reshape_gpd = tile_reshape_gpd_noindex.set_index(['GridID', 'PxlCount']).reindex(['total', 'low', 'mid', 'high'],
                                                                                      level='PxlCount').T
        out_list.append(tile_reshape_gpd)

        CountCategories = tile_reshape_gpd_noindex.groupby(tile_reshape_gpd_noindex['PxlCount']).sum()

        # Reformat the values so that they are in square km, not pixel counts
        TempCategoryCountAll = xr.Dataset()
        TempCategoryCountAll['Year'] = CountCategories[f'{year}'].name
        TempCategoryCountAll['Low (km2)'] = (CountCategories[f'{year}'].low) * 625 / 1000000
        TempCategoryCountAll['Mid (km2)'] = (CountCategories[f'{year}'].mid) * 625 / 1000000
        TempCategoryCountAll['High (km2)'] = (CountCategories[f'{year}'].high) * 625 / 1000000

        try:
            CategoryCountAll = xr.concat([CategoryCountAll, TempCategoryCountAll], dim = 'concat_dims')
        except ValueError:
            CategoryCountAll = TempCategoryCountAll


    # Combine into a single pandas dataframe, then write to pickle
    tile_allyears_gpd = pd.concat(out_list)
    tile_allyears_gpd.to_pickle(f'./{TCName}/out_data/MangChange_{TCSimulation}_1km_1987_to_2017.pkl.gz',
                                compression="gzip")

    # output a csv of stats
    CategoryCountAll.to_dataframe().to_csv(f'./{TCName}/out_data/MangChange_{TCSimulation}_1km_1987_to_2017.csv',
                                           index = False)

    # plot the results
    print('Plotting up the results')
    GenZoneMangroveChangePlots('raw_data/Mangrove_study_region.shp', 'REG_CODE_7', f'{TCName}/out_data/',
                               f'./{TCName}/out_data/MangChange_{TCSimulation}_1km_1987_to_2017.pkl.gz',
                               f'./{TCName}/out_data/{TCSimulation}_albers.shp',
                               'ID', f'{TCName}/out_data/', f'{TCSimulation}')
    plt.close()

print('We\'re all done now')
[ ]: