Calculate the number of observations that went into detecting each waterbody

  • Compatibility: This notebook is currently compatible with the NCI environment only. You can make it Sandbox compatible by pointing it to the DEA Waterbodies timeseries located in AWS.

  • Products used: None.

  • Prerequisites: This notebook explores the individual waterbody timeseries CSV files contained in the DEA Waterbodies dataset. It was designed for that specific purpose and is not intended as a general analysis notebook.

Description

This notebook loops through all of the individual waterbody timeseries produced by DEA Waterbodies and generates statistics on the number of observations in each record.

  1. Load the required python modules

  2. Set up the directory where the timeseries data are all located

  3. Glob through that directory to get a list of all the files to loop through

  4. Loop through each file and make a note of its length

  5. Calculate length statistics


Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

Load packages

Import Python packages that are used for the analysis.

[1]:
%matplotlib inline
import matplotlib.pyplot as plt

import numpy as np
import xarray as xr
import pandas as pd
import glob
Populating the interactive namespace from numpy and matplotlib

Analysis parameters

  • TimeseriesDir: Folder where the DEA Waterbodies timeseries are saved

  • AnalysisStartDate: e.g. '1985-01-01'. Date to start counting observations from. The dataset begins in 1987, so any earlier start date counts from the beginning of the record. To count observations over a shorter period, set this date to the start of your custom range.

  • AnalysisEndDate: e.g. '2019-01-01'. Date to stop counting observations. The dataset is continually updated. To count observations over a shorter period, set this date to the end of your custom range.

[2]:
TimeseriesDir = '/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid'

AnalysisStartDate = '1985-01-01'
AnalysisEndDate = '2019-01-01'
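
The two date parameters bound the window used for counting. As a minimal sketch of how such a window behaves (using made-up dates rather than a real waterbody record, and the hypothetical names `history`, `strict_count` and `inclusive_count`), the comparison below contrasts strict bounds with an inclusive alternative based on `Series.between`:

```python
import pandas as pd

# Synthetic stand-in for one waterbody timeseries; the dates are made up.
history = pd.DataFrame({
    'Observation Date': pd.to_datetime(
        ['1987-06-01', '1995-03-15', '2019-01-01', '2019-06-30'])
})

start = pd.Timestamp('1985-01-01')
end = pd.Timestamp('2019-01-01')
dates = history['Observation Date']

# Strict bounds: an observation falling exactly on the end date is excluded.
strict_count = int(((dates > start) & (dates < end)).sum())

# Inclusive alternative: Series.between includes both end points by default.
inclusive_count = int(dates.between(start, end).sum())

print(strict_count, inclusive_count)
```

With strict bounds, an observation dated exactly on a boundary is not counted, which matters only if a boundary date coincides with an observation.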

Get a list of all of the csv files

[3]:
CSVFiles = glob.glob(f'{TimeseriesDir}/**/*.csv', recursive=True)
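
With `recursive=True`, the `**` in the pattern matches zero or more directory levels, so CSV files in the top-level folder and in any subfolder are all returned. A small self-contained sketch on a temporary directory (the folder and file names here are invented for illustration):

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny, hypothetical directory tree.
    for rel in ('aaaa/aaaa00001.csv', 'bbbb/bbbb00002.csv', 'top.csv', 'notes.txt'):
        path = os.path.join(root, *rel.split('/'))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, 'w').close()

    # '**' matches zero or more directories when recursive=True.
    matches = glob.glob(f'{root}/**/*.csv', recursive=True)
    found = sorted(os.path.basename(m) for m in matches)

print(found)
```

The non-CSV file is skipped, and the top-level CSV is matched alongside the nested ones.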

Open each file, then count how many observations fall within the analysis period

[4]:
AllObs = []
for FileName in CSVFiles:
    try:
        TimeHistory = pd.read_csv(FileName)
        TimeHistory['Observation Date'] = pd.to_datetime(TimeHistory['Observation Date'])
        NumObs = len(TimeHistory[(TimeHistory['Observation Date'] > AnalysisStartDate) &
                                 (TimeHistory['Observation Date'] < AnalysisEndDate)])
        if NumObs < 50:
            print(FileName)
        AllObs.append(NumObs)
    except Exception:
        # Report files that could not be read or parsed, then carry on
        print(FileName +' did not work')
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/rhvj/rhvj0znm9.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/rhg9/rhg9rbvhc.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r7cd/r7cd57z1h.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r1r1/r1r1hjuyv.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r764/r7648hec6.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r30n/r30nf45jq.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r5mw/r5mwkrvkb.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/rk22/rk22bz0n6.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r6up/r6upjhtyq.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r1cv/r1cvw9450.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r5er/r5er1jbfq.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/rh38/rh38exyjw.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r7g5/r7g5z931y.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r1wk/r1wk02791.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/qvut/qvut32ubz.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r280/r280t15z8.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/qvzf/qvzfjh7uf.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/qv6p/qv6p0jr15.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r5ec/r5eczv3vx.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r6sq/r6sq7bke5.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r1xr/r1xr43j7d.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r4z9/r4z9h4k55.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r74d/r74dznf97.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/rjn4/rjn4r6nh3.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r1k2/r1k2rccu9.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/rj2h/rj2httbdp.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r5u3/r5u3juecs.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/rhut/rhute581e.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r1qz/r1qz238cv.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/rhrz/rhrzd1f33.csv did not work
/g/data/r78/cek156/dea-notebooks/Scientific_workflows/DEAWaterbodies/timeseries_aus_uid/r38p/r38pwsvwh.csv did not work
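
The output above lists the files that failed but not why. One possible variation, sketched here under assumptions, is to record the exception type for each failed file so the causes can be reviewed afterwards. The helper `count_observations`, the `counts` and `failures` dictionaries, and the synthetic files (including the `Value` column) are hypothetical and not part of the original notebook:

```python
import os
import tempfile

import pandas as pd


def count_observations(path, start, end):
    """Count rows whose 'Observation Date' lies strictly between start and end."""
    history = pd.read_csv(path)
    dates = pd.to_datetime(history['Observation Date'])
    return int(((dates > pd.Timestamp(start)) & (dates < pd.Timestamp(end))).sum())


counts, failures = {}, {}
with tempfile.TemporaryDirectory() as root:
    # One readable file with two made-up observations.
    good = os.path.join(root, 'good.csv')
    with open(good, 'w') as f:
        f.write('Observation Date,Value\n1990-01-01,10\n2000-01-01,12\n')
    # One completely empty file, which pandas cannot parse.
    empty = os.path.join(root, 'empty.csv')
    open(empty, 'w').close()

    for path in (good, empty):
        name = os.path.basename(path)
        try:
            counts[name] = count_observations(path, '1985-01-01', '2019-01-01')
        except (pd.errors.EmptyDataError, pd.errors.ParserError,
                KeyError, ValueError) as err:
            # Keep the exception type so the cause can be inspected later.
            failures[name] = type(err).__name__

print(counts, failures)
```

Catching a short list of expected exceptions, rather than everything, lets unexpected problems surface instead of being silently reported as "did not work".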

Calculate some statistics on observation length

You can edit these cells to generate different length statistics.

[5]:
AllObs.sort()
AllObsNP = np.array(AllObs)
[15]:
plt.hist(AllObsNP, bins=20)
plt.xlabel('Number of Observations')
plt.title(f'Number of Observations between {AnalysisStartDate} and {AnalysisEndDate} \n'
          'for individual DEA Waterbodies')
[15]:
Text(0.5, 1.0, 'Number of Observations between 1985-01-01 and 2019-01-01 \nfor individual DEA Waterbodies')

Interrogate the length some more

You can change the statistic here depending on what you’re interested in.

[16]:
AllObsNP.min()
[16]:
402
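
Other statistics can be computed in the same way. The sketch below uses a small made-up array in place of `AllObsNP` (the values, the `summary` dictionary and the threshold of 500 are illustrative assumptions only) to show a few summaries that may be useful:

```python
import numpy as np

# Made-up observation counts standing in for AllObsNP.
obs = np.array([410, 450, 510, 620, 700, 845])

summary = {
    'min': int(obs.min()),
    'median': float(np.median(obs)),
    'max': int(obs.max()),
    'p5': float(np.percentile(obs, 5)),
}

# Number of waterbodies with fewer observations than a chosen threshold.
below_threshold = int((obs < 500).sum())

print(summary, below_threshold)
```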

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on GitHub.

Last modified: January 2020

Compatible datacube version: N/A

Tags

Browse all available tags on the DEA User Guide’s Tags Index

Tags: DEA Waterbodies