Generate metrics with WIT data

  • Compatibility: Notebook currently compatible with the NCI VDI or DEA sandbox

  • Special requirements:

    • If running on the NCI, ensure that module load dea is run prior to launching this notebook

    • If running on the NCI and needing to load data from the database (this only works on the NCI), check that you have the latest version of the wit_tooling package: copy the following code into a cell below and run it. You will need to restart the notebook kernel after running this cell.

      !pip install --user git+git://github.com/GeoscienceAustralia/wit_tooling

  • Products used:

  • Dependencies: This code requires three things to run (see the analysis parameters section for more information; a sketch of the expected inputs follows this list):

    • A pre-calculated WIT csv

    • A shapefile (or equivalent) that contains the area that the WIT result was run over.

    • A csv containing IDs of polygons from the shapefile
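
    A minimal sketch of the expected inputs (hypothetical path and IDs, for illustration only): the ID csv is simply one polygon ID per line with no header, read back with pandas as below; the WIT csv columns are listed under Data definition.

      import pandas as pd

      # hypothetical example: write an ID list, then read it back the way this notebook does
      pd.DataFrame([26327, 26328, 26329]).to_csv('example_fids.csv', index=False, header=False)
      print(pd.read_csv('example_fids.csv', header=None).values)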

Background

The WIT data are generated by DEA for given wetland polygons and stored in a database on the NCI. The data can be dumped into a csv when required, and any statistics can then be generated from the WIT data. This notebook provides a way of computing temporal statistics (metrics).

Description

This notebook uses existing WIT data to compute metrics:

  • First we load the existing WIT csv data from a saved csv location
  • Then we compute the metrics for all polygons and output the results to csvs

Data definition

  • WIT data

TIME: time of observation
BS: percentage of bare soil
NPV: percentage of non-photosynthetic vegetation
PV: percentage of green/photosynthetic vegetation
WET: percentage of wetness
WATER: percentage of water
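
To make the layout concrete, here is a toy WIT dataframe with the columns above (made-up values, for illustration only). Note the data are stored as fractions between 0 and 1, which is why annual_metrics below converts its percent thresholds with t/100.

[ ]:
import pandas as pd

# toy WIT records for one polygon (SYSID 1); the fractions are illustrative only
wit_example = pd.DataFrame({
    'SYSID': [1, 1, 1],
    'TIME':  ['2017-01-08', '2017-01-24', '2017-02-09'],
    'BS':    [0.10, 0.05, 0.02],
    'NPV':   [0.20, 0.15, 0.08],
    'PV':    [0.30, 0.20, 0.10],
    'WET':   [0.25, 0.35, 0.40],
    'WATER': [0.15, 0.25, 0.40],
})
print(wit_example)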

Before running this notebook

  • Follow the instructions under Special requirements above to load the dea module and install wit_tooling

Load packages

Import Python packages that are used for the analysis.

Use standard import commands; some are shown below.

[ ]:
import numpy as np
import pandas as pd
import fiona
import io
from shapely import geometry
[ ]:
# only needed if loading data from database
# and only works on NCI
load_from_db = False
if load_from_db:
    from wit_tooling import query_wit_data
[ ]:
def shape_list(key, values, shapefile):
    """
        Get a generator of shapes from the given shapefile
            key: the key to match in 'properties' in the shape file
            values: a list of property values
            shapefile: the name of your shape file
            e.g. key='ORIGID', values=[1, 2, 3, 4, 5],
            shapefile='/g/data/r78/DEA_Wetlands/shapefiles/MDB_ANAE_Aug2017_modified_2019_SB_3577.shp'
    """
    count = len(values)
    with fiona.open(shapefile) as allshapes:
        for shape in allshapes:
            shape_id = shape['properties'].get(key)
            if shape_id is None:
                continue
            if shape_id in values:
                yield shape_id, shape
                count -= 1
            if count <= 0:
                break

def get_areas(features, pkey='SYSID'):
    """
        Calculate the area of a list/generator of shapes
        input:
            features: a list/generator of (id, shape) tuples
        output:
            a dataframe of areas (in hectares) indexed by the key
    """
    re = pd.DataFrame()
    for f in features:
        # .area is in m^2 for a metre-based CRS such as EPSG:3577; /1e4 converts to hectares
        va = pd.DataFrame([[f[0], geometry.shape(f[1]['geometry']).area/1e4]], columns=[pkey, 'area'])
        re = pd.concat([re, va], sort=False)
    print(re)
    return re.set_index(pkey)

def dump_wit_data(key, feature_list, output):
    """
        dump wit data from the database into a csv file
        input:
            key: name of the property used to id the polygons
            feature_list: a list or generator of (id, feature) tuples
            output: the csv file where all the wit data are saved
    """
    for f_id, f in feature_list:
        _, wit_data = query_wit_data(f)
        csv_buf = io.StringIO()
        wit_df = pd.DataFrame(data=wit_data, columns=['TIME', 'BS', 'NPV', 'PV', 'WET', 'WATER'])
        wit_df.insert(0, key, f_id)
        wit_df.to_csv(csv_buf, index=False, header=False)
        csv_buf.seek(0)
        with open(output, 'a') as out:
            out.write(csv_buf.read())
    # the column names are appended as the last line, hence skipfooter=1 when reading back
    with open(output, 'a') as out:
        out.write(','.join(list(wit_df.columns)))
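
A minimal usage sketch of the helpers above, with a placeholder key, IDs and path (the real values are set in the parameter cell further below). Note that shape_list returns a generator, so its result can only be consumed once.

[ ]:
# hypothetical usage only: replace the key, IDs and shapefile path with real ones
example_features = shape_list('SYSID', [1, 2, 3], '/path/to/polygons.shp')
example_areas = get_areas(example_features, pkey='SYSID')  # hectares, indexed by SYSID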
[ ]:
def annual_metrics(wit_data, members=['PV', 'WET', 'WATER', 'BS', 'NPV', ['NPV', 'PV', 'WET'],
                                          ['PV', 'WET'], ['WATER', 'WET']], threshold=[25, 75], pkey='SYSID'):
    """
        Compute the annual max, min, mean and count with given wit data, members and threshold
        input:
            wit_data: dataframe of WIT
            members: the elements which the metrics are computed against; each can be a column from wit_data, e.g. 'PV',
                         or the sum of wit columns, e.g. ['WATER', 'WET']
            threshold: a list of thresholds in percent, such that (elements >= threshold[i]/100) is counted,
                        where i = 0, 1...len(threshold)-1
        output:
            dataframe of metrics
    """
    wit_df = wit_data.copy(deep=True)
    # append the summed members, e.g. 'WATER+WET', as extra columns
    for m in members:
        if isinstance(m, list):
            wit_df.insert(wit_df.columns.size, '+'.join(m), wit_df[m].sum(axis=1))
    years = pd.DatetimeIndex(wit_df['TIME']).year.unique()
    wit_metrics = [pd.DataFrame()] * 4
    for y in years:
        wit_yearly = wit_df[pd.DatetimeIndex(wit_df['TIME']).year==y].drop(columns=['TIME']).groupby(pkey).max()
        wit_yearly.insert(0, 'YEAR', y)
        wit_yearly = wit_yearly.rename(columns={n: n+'_max' for n in wit_yearly.columns[1:]})
        wit_metrics[0] = pd.concat([wit_metrics[0], wit_yearly], sort=False)
    for y in years:
        wit_yearly = wit_df[pd.DatetimeIndex(wit_df['TIME']).year==y].drop(columns=['TIME']).groupby(pkey).min()
        wit_yearly.insert(0, 'YEAR', y)
        wit_yearly = wit_yearly.rename(columns={n: n+'_min' for n in wit_yearly.columns[1:]})
        wit_metrics[1] = pd.concat([wit_metrics[1], wit_yearly], sort=False)
    for y in years:
        wit_yearly = wit_df[pd.DatetimeIndex(wit_df['TIME']).year==y].drop(columns=['TIME']).groupby(pkey).mean()
        wit_yearly.insert(0, 'YEAR', y)
        wit_yearly = wit_yearly.rename(columns={n: n+'_mean' for n in wit_yearly.columns[1:]})
        wit_metrics[2] = pd.concat([wit_metrics[2], wit_yearly], sort=False)
    for y in years:
        # observation count per polygon per year (any column would do, BS is arbitrary)
        wit_yearly = wit_df[pd.DatetimeIndex(wit_df['TIME']).year==y][[pkey, 'BS']].groupby(pkey).count()
        wit_yearly.insert(0, 'YEAR', y)
        wit_yearly = wit_yearly.rename(columns={n: 'count' for n in wit_yearly.columns[1:]})
        wit_metrics[3] = pd.concat([wit_metrics[3], wit_yearly], sort=False)
    for t in threshold:
        # mask out values below the threshold, then count what survives per year
        wit_df_ts = wit_df.copy(deep=True)
        wit_metrics += [pd.DataFrame()]
        wit_df_ts.loc[:, wit_df_ts.columns[2:]] = wit_df_ts.loc[:, wit_df_ts.columns[2:]].mask((wit_df_ts[wit_df_ts.columns[2:]] < t/100), np.nan)
        for y in years:
            wit_yearly = wit_df_ts[pd.DatetimeIndex(wit_df_ts['TIME']).year==y].drop(columns=['TIME']).groupby(pkey).count()
            wit_yearly.insert(0, 'YEAR', y)
            wit_yearly = wit_yearly.rename(columns={n: n+'_count'+str(t) for n in wit_yearly.columns[1:]})
            wit_metrics[-1] = pd.concat([wit_metrics[-1], wit_yearly], sort=False)
    wit_yearly_metrics = wit_metrics[0]
    for m in wit_metrics[1:]:
        wit_yearly_metrics = pd.merge(wit_yearly_metrics, m, on=[pkey, 'YEAR'], how='inner')
    return wit_yearly_metrics
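
As a quick check, annual_metrics can be run on a toy dataframe (made-up values; WIT fractions sit between 0 and 1, so the default threshold=[25, 75] corresponds to 0.25 and 0.75):

[ ]:
# toy WIT data for one polygon across two years; values are illustrative only
toy_wit = pd.DataFrame({
    'SYSID': [1, 1, 1, 1],
    'TIME':  ['2017-01-08', '2017-06-12', '2018-02-09', '2018-08-20'],
    'BS':    [0.10, 0.20, 0.30, 0.10],
    'NPV':   [0.20, 0.20, 0.10, 0.10],
    'PV':    [0.30, 0.20, 0.20, 0.30],
    'WET':   [0.25, 0.20, 0.20, 0.30],
    'WATER': [0.15, 0.20, 0.20, 0.20],
})
# expect one row per (SYSID, YEAR) with *_max, *_min, *_mean, count and *_count25/*_count75 columns
print(annual_metrics(toy_wit))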
[ ]:
def get_event_time(wit_ww, threshold, pkey='SYSID'):
    """
        Compute inundation event time by given threshold
        input:
            wit_df: wetness computed from wit data
            threshold: a value such that (WATER+WET > threshold) = inundation
        output:
            dateframe of inundation event time
    """
    i_start = wit_ww[wit_ww['WW'] >= threshold]['TIME'].min()
    if pd.isnull(i_start):
        re = pd.DataFrame([[np.nan] * 4], columns=['start_time', 'end_time', 'duration', 'gap'], index=wit_ww.index.unique())
        re.index.name = pkey
        return re
    re_idx = np.searchsorted(wit_ww[(wit_ww['WW'] < threshold)]['TIME'].values,
                             wit_ww[(wit_ww['WW'] >= threshold)]['TIME'].values)
    re_idx, count = np.unique(re_idx, return_counts=True)
    start_idx = np.zeros(len(count)+1, dtype='int')
    start_idx[1:] = np.cumsum(count)
    re_start = wit_ww[(wit_ww['WW'] >= threshold)].iloc[start_idx[:-1]][['TIME']].rename(columns={'TIME': 'start_time'})
    re_end = wit_ww[(wit_ww['WW'] >= threshold)].iloc[start_idx[1:] - 1][['TIME']].rename(columns={'TIME': 'end_time'})
    re = pd.concat([re_start, re_end], axis=1)
    re.insert(2, 'duration',
              (re['end_time'] - re['start_time'] + np.timedelta64(1, 'D')).astype('timedelta64[D]'))
    re.insert(3, 'gap', np.concatenate([[np.timedelta64(0, 'D')],
                                        (re['start_time'][1:].values - re['end_time'][:-1].values - np.timedelta64(1, 'D')).astype('timedelta64[D]')]))
    re.insert(0, pkey, wit_ww.index.unique()[0])
    return re.set_index(pkey)

def get_im_stats(grouped_wit, im_time, wit_area):
    """
        Get inundation stats given wit data and events
        input:
            grouped_wit: wit data
            im_time: inundation events in time
        output:
            the stats of inundation events
    """
    gid = grouped_wit.index.unique()[0]
    if gid not in im_time.indices.keys():
        return pd.DataFrame([[np.nan]*5], columns=['start_time', 'max_wet', 'mean_wet', 'max_wet_area', 'mean_wet_area'],
                           index=[gid])
    re_left = np.searchsorted(grouped_wit['TIME'].values.astype('datetime64'),
                         im_time.get_group(gid)['start_time'].values, side='left')
    re_right = np.searchsorted(grouped_wit['TIME'].values.astype('datetime64'),
                         im_time.get_group(gid)['end_time'].values, side='right')
    re = pd.DataFrame()
    for a, b in zip(re_left, re_right):
        tmp = pd.concat([grouped_wit.iloc[a:a+1]['TIME'].rename('start_time').astype('datetime64'),
                         pd.Series(grouped_wit.iloc[a:b]['WW'].max(),index=[gid], name='max_wet'),
                         pd.Series(grouped_wit.iloc[a:b]['WW'].mean(),index=[gid], name='mean_wet')],
                        axis=1)
        tmp.insert(3, 'max_wet_area', tmp['max_wet'].values * wit_area[wit_area.index==gid].values)
        tmp.insert(4, 'mean_wet_area', tmp['mean_wet'].values * wit_area[wit_area.index==gid].values)
        re = pd.concat([re, tmp], sort=False)
    re.index.name = grouped_wit.index.name
    return re
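
To see how get_event_time splits observations into events, here is a toy illustration of the np.searchsorted trick (made-up dates and values): above-threshold observations that are preceded by the same number of below-threshold observations belong to the same uninterrupted event.

[ ]:
times = pd.to_datetime(['2017-01-01', '2017-01-17', '2017-02-02',
                        '2017-02-18', '2017-03-06', '2017-03-22']).values
ww = np.array([0.0, 0.5, 0.6, 0.0, 0.7, 0.0])   # toy WATER+WET fractions
dry_times = times[ww < 0.25]
wet_times = times[ww >= 0.25]
# prints [1 1 2]: the first two wet dates form one event, the third another
print(np.searchsorted(dry_times, wet_times))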
[ ]:
def event_time(wit_df, threshold=0.01, pkey='SYSID'):
    """
        Compute the inundation events with given wit data and threshold
        input:
            wit_df: wetness computed from wit data
            threshold: a value such that (WATER+WET >= threshold) = inundation
        output:
            dataframe of events
    """
    return wit_df.groupby(pkey).apply(get_event_time, threshold=threshold, pkey=pkey).dropna().droplevel(0)

def event_stats(wit_df, wit_im, wit_area, pkey='SYSID'):
    """
        Compute inundation event stats with given wit wetness, events defined by (start_time, end_time)
        and polygon areas
        input:
            wit_df: wetness computed from wit data
            wit_im: inundation event
            wit_area: polygon areas indexed by the key
        output:
            dataframe of event stats
    """
    grouped_im = wit_im[['start_time', 'end_time']].groupby(pkey)
    return wit_df.groupby(pkey).apply(get_im_stats, im_time=grouped_im, wit_area=wit_area).droplevel(0)

def inundation_metrics(wit_data, wit_area, threshold=0.01, pkey='SYSID'):
    """
        Compute inundation metrics with given wit data, polygon areas and threshold
        input:
            wit_data: a dataframe of wit_data
            wit_area: polygon areas indexed by the key
            threshold: a value such that (WATER+WET >= threshold) = inundation
        output:
            dataframe of inundation metrics
    """
    wit_df = wit_data.copy(deep=True)
    wit_df.insert(2, 'WW', wit_df[['WATER', 'WET']].sum(axis=1))
    wit_df = wit_df.drop(columns=wit_df.columns[3:])
    wit_df['TIME'] = pd.to_datetime(wit_df['TIME'])
    wit_df = wit_df.set_index(pkey)
    wit_im_time = event_time(wit_df, threshold, pkey)
    wit_im_stats = event_stats(wit_df, wit_im_time, wit_area, pkey)
    return pd.merge(wit_im_time, wit_im_stats.dropna(), on=[pkey, 'start_time'], how='inner')
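
A toy end-to-end sketch of inundation_metrics (illustrative values only; assumes the cells above have been run):

[ ]:
# one polygon, four observations; WATER+WET crosses the 0.25 threshold on the
# middle two dates, giving a single event from 2017-01-17 to 2017-02-02
toy_wit = pd.DataFrame({
    'SYSID': [1, 1, 1, 1],
    'TIME':  ['2017-01-01', '2017-01-17', '2017-02-02', '2017-02-18'],
    'BS':    [0.30, 0.20, 0.10, 0.30],
    'NPV':   [0.30, 0.20, 0.00, 0.30],
    'PV':    [0.30, 0.10, 0.30, 0.30],
    'WET':   [0.10, 0.30, 0.30, 0.10],
    'WATER': [0.00, 0.20, 0.30, 0.00],
})
# hypothetical polygon area in hectares, indexed the same way get_areas indexes it
toy_area = pd.DataFrame({'SYSID': [1], 'area': [120.0]}).set_index('SYSID')
print(inundation_metrics(toy_wit, toy_area, threshold=0.25))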

Compute the metrics

[ ]:
# the files needed; upload them to the Sandbox if running there
# and set the paths accordingly
# fid_file: a csv with all the ids needed to locate polygons in the shapefile
# shapefile: the shapefile mentioned above containing the polygons
# wit_data_file: a csv of wit data dumped from the database
fid_file = "/g/data/u46/users/ea6141/wlinsight/Barmah_Ramsar_ANAE_SYSID.csv"
shapefile = '/g/data/r78/DEA_Wetlands/shapefiles/MDB_ANAE_Aug2017_modified_2019_SB_3577.shp'
wit_data_file = '/g/data/u46/users/ea6141/wlinsight/Barmah_Ramsar_wit.csv'
[ ]:
# change pkey to match the property used to id polygons in the shapefile
# e.g. pkey = 'REFCODE' if using the files in example_data
pkey = 'SYSID'
features = shape_list(pkey, pd.read_csv(fid_file, header=None).values, shapefile)
####################################################################################
# This section dumps the WIT data from the database to a csv. DO NOT do this unless:
# * a. you know what it's doing, or
# * b. you were told to, and
# * c. you have access to the database, and
# * d. the notebook runs on the NCI
# load all the wit data required from the database and save them to a csv file
if load_from_db:
    dump_wit_data(pkey, features, wit_data_file)
    # shape_list returns a generator, which dump_wit_data exhausts;
    # recreate it so get_areas below still has features to consume
    features = shape_list(pkey, pd.read_csv(fid_file, header=None).values, shapefile)
####################################################################################
# get the area of polygons
wit_area = get_areas(features, pkey=pkey)
# load wit data into a pandas dataframe; the column names sit on the last line
# of the dumped csv, hence skipfooter (which needs the python parser engine)
wit_data = pd.read_csv(wit_data_file, header=None, skipfooter=1, engine='python',
                       names=[pkey,"TIME","BS","NPV","PV","WET","WATER"])
[39]:
# compute yearly metrics and save the results to a csv
# set the output file to your own path
wit_yearly_metrics = annual_metrics(wit_data, pkey=pkey)
ofn = "/g/data/u46/users/ea6141/wlinsight/Barmah_Ramsar_yearly_metrics.csv"
wit_yearly_metrics.to_csv(ofn)
[40]:
# compute inundation metrics with the given thresholds
# change the elements in threshold_list to define inundation events with different thresholds
threshold_list = [0.01, 0.25, 0.75]
for tr in threshold_list:
    print("compute threshold", tr)
    wit_im = inundation_metrics(wit_data, wit_area, tr, pkey=pkey)
    # save the metrics into a csv file
    # set the file to your own path
    ofn = '/g/data/u46/users/ea6141/wlinsight/Barmah_Ramsar_inundation_metrics_T' + "%d"%(tr*100) + '.csv'
    wit_im.to_csv(ofn)
print("all done!!!")
compute threshold 0.01
compute threshold 0.25
compute threshold 0.75
all done!!!

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Last modified: March 2021

Compatible datacube version:

N/A

Tags

Browse all available tags on the DEA User Guide’s Tags Index

Tags: NCI compatible, sandbox compatible, landsat 8, time series, WIT