# Combining satellite data with tidal modelling using OTPS¶

**Compatability:**Notebook currently compatible with both the`NCI`

and`DEA Sandbox`

environments**Products used:**ga_ls5t_ard_3, ga_ls7e_ard_3, ga_ls8c_ard_3

## Background¶

Ocean tides are the periodic rise and fall of the ocean caused by the gravitational pull of the moon and sun and the earth’s rotation. Tides in coastal areas can greatly influence how these environments appear in satellite imagery as water levels vary by up to 12 metres (e.g. in northern Australia). To be able to study environmental processes along Australia’s coastline, it is vital to obtain data on tidal conditions at the exact moment each satellite image was acquired.

## Description¶

This notebooks demonstrates how to tidally tag remotely sensed imagery using functions from the dea_coastaltools script so that images can be extracted or analysed by tidal stage (e.g. low, high, ebb, flow). These functions use the OTPS TPXO8 tidal model to calculate the height (relative to mean sea level, i.e. approximately equivalent to the Australian Height Datum or AHD) and stage of the tide at the exact moment each satellite image was acquired. This tidal model was previously used to produce DEA datasets including the Intertidal Extents Model (ITEM), High-Low Tide Composites (HLTC), and the National Intertidal Digital Elevation Model (NIDEM).

The notebook demonstrates how to:

Load an example time series of satellite data

Use the

`tidal_tag`

function from`dea_coastaltools`

to model tide heights for each satellite observationUse tide height data to produce median composites of the coast at low and high tide

Swap a dataset’s dimensions to compute a rolling median along the

`tide_height`

dimensionCompute ebb or flow tide phase data to determine whether water levels were rising or falling in each satellite observation

Use the

`tidal_stats`

function to evaluate any biases in the tidal conditions observed by a satellite

## Getting started¶

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

### Load packages¶

```
[1]:
```

```
%matplotlib inline
import sys
import datacube
import xarray as xr
import matplotlib.pyplot as plt
sys.path.append('../Scripts')
from dea_plotting import rgb
from dea_plotting import display_map
from dea_datahandling import load_ard
from dea_coastaltools import tidal_tag
from dea_coastaltools import tidal_stats
```

### Connect to the datacube¶

```
[2]:
```

```
# Temporary solution to account for Collection 3 data being in a different
# database on the NCI
try:
dc = datacube.Datacube(app='Tidal_modelling', env='c3-samples')
except:
dc = datacube.Datacube(app='Tidal_modelling')
```

### Set up data query¶

First we set up a query to define the area, time period and other parameters required for loading data. In this example, we will load 30 years of Landsat 5, 7 and 8 data for an intertidal beach near West Hill Island south of Mackay in Queensland. We load the `'nbart_red', 'nbart_green', 'nbart_blue'`

bands so that we can plot the data as true colour imagery.

The

`dask_chunks`

parameter allows us to use Dask to lazily load data rather than load data directly into memory, which can take a long time and large amounts of memory. Lazy loading can be a very useful approach for when you need to load large amounts of data without crashing your analysis. In coastal applications, it allows us to load (using either`.compute()`

or by plotting our data) only a small subset of observations from our entire time series (e.g. only low or high tide observations) without having to load the entire dataset into memory first, which can greatly decrease processing times.For more information about using Dask, refer to the Parallel processing with Dask notebook.

```
[3]:
```

```
# Set up data load query
query = {'x': (149.4, 149.53),
'y': (-21.71, -21.85),
'time': ('1988', '2018'),
'measurements': ['nbart_red', 'nbart_green', 'nbart_blue'],
'output_crs': 'EPSG:32755',
'resolution': (-30, 30),
'group_by': 'solar_day',
'dask_chunks': {}}
```

We can preview the area that we will load data for:

```
[4]:
```

```
display_map(x=query['x'], y=query['y'])
```

```
/usr/local/lib/python3.6/dist-packages/pyproj/crs.py:77: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method.
return _prepare_from_string(" ".join(pjargs))
/usr/local/lib/python3.6/dist-packages/pyproj/crs.py:77: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method.
return _prepare_from_string(" ".join(pjargs))
```

```
[4]:
```

## Load satellite time-series¶

To obtain some satellite data to analyse, we use the `load_ard`

import a time series of Landsat 5, 7 and 8 observations as an `xarray.Dataset`

. The input data does not need to be from Landsat: any remotely-sensed imagery with timestamps and spatial coordinates provide enough data to run the tidal model.

```
[5]:
```

```
# Load available data from all three Landsat satellites
ds = load_ard(dc=dc,
products=['ga_ls5t_ard_3', 'ga_ls7e_ard_3', 'ga_ls8c_ard_3'],
ls7_slc_off=False,
**query)
# Print output data
print(ds)
```

```
Loading ga_ls5t_ard_3 data
Applying pixel quality/cloud mask
Applying invalid data mask
Applying contiguity mask
Loading ga_ls7e_ard_3 data
Ignoring SLC-off observations for ls7
Applying pixel quality/cloud mask
Applying invalid data mask
Applying contiguity mask
Loading ga_ls8c_ard_3 data
Applying pixel quality/cloud mask
Applying invalid data mask
Applying contiguity mask
Combining and sorting data
Returning 945 observations as a dask array
<xarray.Dataset>
Dimensions: (time: 945, x: 458, y: 525)
Coordinates:
* x (x) float64 7.48e+05 7.481e+05 ... 7.617e+05 7.617e+05
* y (y) float64 7.597e+06 7.597e+06 ... 7.582e+06 7.582e+06
* time (time) datetime64[ns] 1988-01-06T23:27:58.301764 ... 2018-12-26T23:58:56.635504
Data variables:
nbart_red (time, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray>
nbart_green (time, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray>
nbart_blue (time, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray>
Attributes:
crs: EPSG:32755
```

## Model tide heights for each observation¶

We use the `tidal_tag`

function from `dea_coastaltools`

to associate each satellite observation in our timeseries with a tide height relative to mean sea level (i.e. approximately equivalent to the Australian Height Datum or AHD). This function uses the time and date of acquisition and the geographic location of each satellite observation as inputs to the OSU Tidal Prediction Software (OTPS) tidal model. From Sagar et al.
2015:

The

OTPS TPX08tidal model consists of a multi-resolution bathymetric grid solution, with a 1/6° solution in the global open ocean, and a 1/30° local resolution solution to improve modelling in complex shallow water environments. The OTPS model is based on a system of linear partial differential equations, called Laplace’s tidal equations, parametrised with nine harmonic tidal constituents. The model is fitted to track-averaged TOPEX/Poseidon altimeter data collected from 1992 to 2016 and Jason-1 (Poseidon 2) altimeter data from 2002 to 2013, enabling estimation of the tidal height and harmonic constituents at discrete temporal epochs and spatial locations.

```
[6]:
```

```
# Model tide heights
ds_tidal = tidal_tag(ds)
# Print output data
print(ds_tidal)
```

```
Setting tide modelling location from dataset centroid: 149.47, -21.78
<xarray.Dataset>
Dimensions: (time: 945, x: 458, y: 525)
Coordinates:
* x (x) float64 7.48e+05 7.481e+05 ... 7.617e+05 7.617e+05
* y (y) float64 7.597e+06 7.597e+06 ... 7.582e+06 7.582e+06
* time (time) datetime64[ns] 1988-01-06T23:27:58.301764 ... 2018-12-26T23:58:56.635504
Data variables:
nbart_red (time, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray>
nbart_green (time, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray>
nbart_blue (time, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray>
tide_height (time) float64 -0.924 1.373 -2.41 3.192 ... -0.812 1.614 -2.192
Attributes:
crs: EPSG:32755
```

The function will automatically select a tide modelling location based on the dataset centroid. It will then output modelled tide heights as a new `tide_height`

variable in the `xarray.Dataset`

(the variable should appear under `Data variables`

above).

We can easily plot this new variable to inspect the range of tide heights observed by the satellites in our timeseries. In this example, our observed tide heights range from approximately -3.0 to 4.0 m relative to Mean Sea Level:

```
[7]:
```

```
ds_tidal.tide_height.plot(linewidth=0.5)
```

```
[7]:
```

```
[<matplotlib.lines.Line2D at 0x7fe837fe5f98>]
```

### Example tide height analysis¶

To demonstrate how tidally tagged images can be used to produce composites of high and low tide imagery, we can compute the lowest 5% and highest 5% percent of tide heights, and use these to filter our observations. We can then combine and plot these filtered observations to visualise what the landscape looks like at low and high tide:

```
[8]:
```

```
# Calculate the lowest and highest 5% of tides
lowest_5, highest_5 = ds_tidal.tide_height.quantile([0.05, 0.95]).values
# Filter our data to low and high tide observations
filtered_low = ds_tidal.where(ds_tidal.tide_height <= lowest_5, drop=True)
filtered_high = ds_tidal.where(ds_tidal.tide_height >= highest_5, drop=True)
# Take the simple median of each set of low and high tide observations to
# produce a composite (alternatively, observations could be combined
# using a geomedian to keep band relationships consistent)
median_low = filtered_low.median(dim='time', keep_attrs=True)
median_high = filtered_high.median(dim='time', keep_attrs=True)
# Combine low and high tide medians into a single dataset and give
# each layer a meaningful name
ds_highlow = xr.concat([median_low, median_high], dim='tide_height')
ds_highlow['tide_height'] = ['Low tide', 'High tide']
# Plot low and high tide medians side-by-side
rgb(ds_highlow, col='tide_height')
```

### Swapping dimensions¶

The `tidal_tag`

function allows you to use tide heights as the primary dimension in the dataset, rather than time. Setting `swap_dims=True`

will swap the `time`

dimension in the original `xarray.Dataset`

to the new `tide_height`

variable.

```
[9]:
```

```
# Model tide heights
ds_tidal = tidal_tag(ds, swap_dims=True)
# Print output data
print(ds_tidal)
```

```
Setting tide modelling location from dataset centroid: 149.47, -21.78
<xarray.Dataset>
Dimensions: (tide_height: 945, x: 458, y: 525)
Coordinates:
* x (x) float64 7.48e+05 7.481e+05 ... 7.617e+05 7.617e+05
* y (y) float64 7.597e+06 7.597e+06 ... 7.582e+06 7.582e+06
* tide_height (tide_height) float64 -2.932 -2.849 -2.777 ... 4.355 4.394
Data variables:
nbart_red (tide_height, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray>
nbart_green (tide_height, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray>
nbart_blue (tide_height, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray>
Attributes:
crs: EPSG:32755
```

The dataset now contains three dimensions: `tide_height`

, `x`

and `y`

. This can make it easier to analyse the data with respect to tide, e.g. computing a rolling median by tide height (e.g. along the `tide_height`

dimension):

```
[10]:
```

```
# First we need to update the chunks used by Dask to allow us to do a
# rolling median without having to load all our data into memory first
ds_rechunked = ds_tidal.chunk(chunks={'tide_height': 15})
# Compute a rolling median that will go through every satellite
# observation, and take the median of that timestep and its 15 neighbours
ds_rolling = ds_rechunked.rolling(tide_height=15,
center=True,
min_periods=1).median()
# Plot the lowest, 500th and highest tide rolling median image
rgb(ds_rolling, index_dim='tide_height', index=[0, 500, -1])
```

## Modelling ebb and flow tidal phases¶

The `tidal_tag`

function also allows us to determine whether each satellite observation was taken while the tide was rising/incoming (flow tide) or falling/outgoing (ebb tide) by setting `ebb_flow=True`

. This is achieved by comparing tide heights 15 minutes before the before and after the observed satellite observation.

Ebb and flow data can provide valuable contextual information for interpreting satellite imagery, particularly in tidal flat or mangrove forest environments where water may remain in the landscape for considerable time after the tidal peak.

```
[11]:
```

```
# Model tide heights
ds_tidal = tidal_tag(ds, ebb_flow=True)
# Print output data
print(ds_tidal)
```

```
Setting tide modelling location from dataset centroid: 149.47, -21.78
Modelling tidal phase (e.g. ebb or flow)
<xarray.Dataset>
Dimensions: (time: 945, x: 458, y: 525)
Coordinates:
* x (x) float64 7.48e+05 7.481e+05 ... 7.617e+05 7.617e+05
* y (y) float64 7.597e+06 7.597e+06 ... 7.582e+06 7.582e+06
* time (time) datetime64[ns] 1988-01-06T23:27:58.301764 ... 2018-12-26T23:58:56.635504
Data variables:
nbart_red (time, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray>
nbart_green (time, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray>
nbart_blue (time, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray>
tide_height (time) float64 -0.924 1.373 -2.41 3.192 ... -0.812 1.614 -2.192
ebb_flow (time) <U4 'Flow' 'Ebb' 'Flow' 'Ebb' ... 'Flow' 'Ebb' 'Flow'
Attributes:
crs: EPSG:32755
```

We now have data giving us the both the tide height and tidal phase (‘ebb’ or ‘flow’) for every satellite image:

```
[12]:
```

```
ds_tidal[['time', 'tide_height', 'ebb_flow']].to_dataframe()
```

```
[12]:
```

tide_height | ebb_flow | |
---|---|---|

time | ||

1988-01-06 23:27:58.301764 | -0.924 | Flow |

1988-01-13 23:34:17.335745 | 1.373 | Ebb |

1988-01-22 23:28:16.039754 | -2.410 | Flow |

1988-01-29 23:34:35.316447 | 3.192 | Ebb |

1988-02-14 23:34:51.644339 | 4.031 | Flow |

... | ... | ... |

2018-11-24 23:58:59.776266 | 0.743 | Flow |

2018-12-02 00:05:09.575190 | 1.068 | Ebb |

2018-12-10 23:58:56.697928 | -0.812 | Flow |

2018-12-18 00:05:07.146347 | 1.614 | Ebb |

2018-12-26 23:58:56.635504 | -2.192 | Flow |

945 rows × 2 columns

We could for example use this data to filter our observations to keep ebbing phase observations only:

```
[13]:
```

```
ds_tidal.where(ds_tidal.ebb_flow == 'Ebb', drop=True)
```

```
[13]:
```

<xarray.Dataset> Dimensions: (time: 420, x: 458, y: 525) Coordinates: * x (x) float64 7.48e+05 7.481e+05 ... 7.617e+05 7.617e+05 * y (y) float64 7.597e+06 7.597e+06 ... 7.582e+06 7.582e+06 * time (time) datetime64[ns] 1988-01-13T23:34:17.335745 ... 2018-12-18T00:05:07.146347 Data variables: nbart_red (time, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray> nbart_green (time, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray> nbart_blue (time, y, x) float32 dask.array<chunksize=(1, 525, 458), meta=np.ndarray> tide_height (time) float64 1.373 3.192 1.685 2.65 ... -0.458 1.068 1.614 ebb_flow (time) object 'Ebb' 'Ebb' 'Ebb' 'Ebb' ... 'Ebb' 'Ebb' 'Ebb' Attributes: crs: EPSG:32755

## Evaluating observed vs. all modelled tide heights¶

The complex behaviour of tides mean that a sun synchronous sensor like Landsat does not observe the full range of the tidal cycle at all locations. Biases in the proportion of the tidal range observed by satellites can prevent us from obtaining data on areas of the coastline exposed or inundated at the extremes of the tidal range. This can risk gaining misleading insights into the true extent of the area of the coastline affected by tides, and make it difficult to compare high or low tide images fairly in different locations.

The `tidal_stats`

function can assist in evaluating how the range of tides observed by satellites compare to the full tidal range. It works by using the OTPS tidal model to model tide heights at a regular interval (every two hours by default) across the entire time period covered by the input satelliter timeseries dataset. This is then compared against the tide heights in observed by the satellite and used to calculate a range of statistics and a plot that summarises potential biases in the
data.

For a more detailed discussion of the issue of tidal bias in sun-synchronous satellite observations of the coastline, refer to the ‘Limitations and future work’ section in Bishop-Taylor et al. 2018.

```
[14]:
```

```
out_stats = tidal_stats(ds)
```

```
Setting tide modelling location from dataset centroid: 149.47, -21.78
82% of the full 8.98 m modelled tidal range is observed at this location.
The lowest 13% and highest 5% of tides are never observed.
Observed tides do not increase or decrease significantly over the ~32 year period.
All tides do not increase or decrease significantly over the ~32 year period.
```

The function also outputs a `pandas.Series`

object containing a set of statistics that compare the observed vs. full modelled tidal ranges. These statistics include:

**tidepost_lat:**latitude used for modelling tide heights**tidepost_lon:**longitude used for modelling tide heights**observed_min_m:**minimum tide height observed by the satellite (in metre units)**all_min_m:**minimum tide height from full modelled tidal range (in metre units)**observed_max_m:**maximum tide height observed by the satellite (in metre units)**all_max_m:**maximum tide height from full modelled tidal range (in metre units)**observed_range_m:**tidal range observed by the satellite (in metre units)**all_range_m:**full modelled tidal range (in metre units)**spread_m:**proportion of the full modelled tidal range observed by the satellite (see Bishop-Taylor et al. 2018)**low_tide_offset:**proportion of the lowest tides never observed by the satellite (see Bishop-Taylor et al. 2018)**high_tide_offset:**proportion of the highest tides never observed by the satellite (see Bishop-Taylor et al. 2018)**observed_slope:**slope of any relationship between observed tide heights and time**all_slope:**slope of any relationship between all modelled tide heights and time**observed_pval:**significance/p-value of any relationship between observed tide heights and time**all_pval:**significance/p-value of any relationship between all modelled tide heights and time

```
[15]:
```

```
print(out_stats)
```

```
tidepost_lat -21.780
tidepost_lon 149.465
observed_mean_m 0.503
all_mean_m -0.000
observed_min_m -2.932
all_min_m -4.138
observed_max_m 4.394
all_max_m 4.841
observed_range_m 7.326
all_range_m 8.979
spread 0.816
low_tide_offset 0.134
high_tide_offset 0.050
observed_slope 0.003
all_slope -0.000
observed_pval 0.641
all_pval 0.581
dtype: float64
```

## Additional information¶

**License:** The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

**Contact:** If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the `open-data-cube`

tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on
Github.

**Last modified:** April 2020

**Compatible datacube version:**

```
[16]:
```

```
print(datacube.__version__)
```

```
1.7+254.ge8d0a0c4.dirty
```

## Tags¶

Browse all available tags on the DEA User Guide’s Tags Index

**Tags**: sandbox compatible, NCI compatible, landsat 5, landsat 7, landsat 8, dea_coastaltools, dea_datahandling, dea_plotting, display_map, load_ard, rgb, tidal_tag, tidal_stats, tide modelling, intertidal, Dask, lazy loading, rolling window