ARD overpass predictor

Sign up to the DEA Sandbox to run this notebook interactively from a browser
Compatibility: Notebook currently compatible with the DEA Sandbox environment
Special requirements: This notebook loads data from an external .csv file (overpass_input.csv) from the Supplementary_data folder of this repository

Background

Knowing the time of a satellite overpass (OP) at a field site is necessary to plan field work activities. Predicting overpass times for a single field site that receives only one overpass from a given satellite is easy, but it gets more complicated when you need overpass information across multiple field sites. Most sites lie within a single descending (daytime) satellite acquisition. For Landsat 8 this recurs every 16 days, and for Sentinel 2A and 2B it recurs every 10 days for each satellite.
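
As a rough illustration of the repeat-cycle arithmetic described above, the sketch below projects a single site forward by whole cycles. The variable names are hypothetical (they are not used elsewhere in this notebook); the start time is the Landsat 8 value listed for Mullion in the example input file.

```python
from datetime import datetime, timedelta

# Hypothetical names for illustration only.
# Start time: Mullion's Landsat 8 entry from overpass_input.csv (UTC).
first_overpass = datetime(2019, 11, 29, 23, 43)
repeat_cycle = timedelta(days=16)  # Landsat 8 repeat cycle

# The start time plus 0, 1, 2 and 3 repeat cycles
next_overpasses = [first_overpass + repeat_cycle * i for i in range(4)]
for t in next_overpasses:
    print(t)
```

The same pattern applies to Sentinel 2A or 2B by swapping in a 10-day cycle and the matching start time.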

This notebook can be used to output a list of ordered overpasses for given field sites by providing an initial timestamp for each field site of interest. Output for multiple sites is ordered by date so that dual overpasses can be identified.

See “overpass_input.csv” as an example input file - this contains a number of field sites, with some that receive multiple acquisitions.

Description

You must provide a start date + time for the overpass of the site you are interested in - go to https://nationalmap.gov.au/ and https://evdc.esa.int/orbit/ to get an overpass for your location (lat/long) and add this to the input file.

Take care with sites that receive multiple acquisitions, i.e. those that lie in the overlap of two acquisitions.

Specify an output file name
Provide an input file - this can be the example included or your own field site information
Specify which field sites receive extra overpasses in an orbital period, i.e. those which lie in a satellite imaging overlap
Specify output field sites - you can select a subset of sites of interest (if adding your site to the original example file)
Combine field site overpasses

If you wish to add or alter sites to predict for, edit the input file located at:

../Supplementary_data/ARD_overpass_predictor/overpass_input.csv

Then update the input file path in the notebook to match. Leave the 1st site in the input file (Mullion) in place, as the notebook requires it to order multiple overpasses.

Getting started

To run the overpass predictor with the given input file, run all cells in the notebook starting with the “Load packages” cell.

Load packages

[1]:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

Specify output file name

This is the name of the .csv the notebook will output. This .csv will contain your combined predictions.

[2]:

# You can rename the .csv to any desired output filename here. Default is "output_overpass_pred.csv"
output_path = '../Supplementary_data/ARD_overpass_predictor/output_overpass_pred.csv'

Specify input file containing initial overpass data for a given site

Read in base overpass file overpass_input.csv

Input date times must be in UTC
Make sure the necessary dates for your site are imported correctly - should be YYYY-MM-DD HH:MM:SS in 24hr format
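
If you want to check a timestamp before adding it to the input file, something like the following sketch may help. `parse_overpass_time` and `EXPECTED_FORMAT` are hypothetical helpers, not part of this notebook; the sketch assumes blank entries should become NaT, as they do when the input file is read.

```python
import pandas as pd

EXPECTED_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS, 24hr clock

def parse_overpass_time(text):
    """Return a Timestamp for a correctly formatted string, or NaT for a blank entry."""
    if text is None or str(text).strip() == "":
        return pd.NaT
    # Raises ValueError if the string does not match EXPECTED_FORMAT
    return pd.to_datetime(text, format=EXPECTED_FORMAT)

good = parse_overpass_time("2019-11-29 23:43:00")
blank = parse_overpass_time("")

try:
    parse_overpass_time("29/11/2019 11:43 PM")
    bad_rejected = False
except ValueError:
    bad_rejected = True

print(good, blank, bad_rejected)
```

Note that this only checks the format; it cannot tell whether a time was entered in UTC or local time.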

[3]:

# Read in base file 'overpass_input.csv'
# Edit this file with new field sites, or load your own file with field sites as needed.
# You don't need to input Path / Row or Lat / Long. The notebook only requires start times for the relevant satellites.
base_file_path = '../Supplementary_data/ARD_overpass_predictor/overpass_input.csv'
overpass = pd.read_csv(base_file_path, index_col='Site',
                       parse_dates=['landsat_8', 'sentinel_2a', 'sentinel_2b', 'landsat_8_2', 'sentinel_2a_2', 'sentinel_2b_2'],
                       dayfirst=True)
overpass
#"NaT" values are expected - indicates a date was not entered in input file.

[3]:

	Latitude	Longitude	Path	Row	landsat_8	landsat_8_2	sentinel_2a	sentinel_2a_2	sentinel_2b	sentinel_2b_2
Site
Mullion	-35.123	148.862	91	84	2019-11-29 23:43:00	2019-11-20 23:49:00	2019-12-20 23:58:00	2019-12-14 00:07:00	2019-12-15 23:58:00	2019-12-19 00:07:00
Lake_George	-35.094	149.463	90	84	2019-11-29 23:43:00	NaT	2019-12-20 23:58:00	NaT	2019-12-15 23:58:00	NaT
Narrabundah	-35.334	149.145	90	85	2019-11-29 23:43:00	2019-11-20 23:49:00	2019-12-20 23:58:00	NaT	2019-12-15 23:58:00	NaT

[4]:

# Set a list for field sites included in prediction. If adding to the field sites, please leave the 1st entry, "Mullion",
# as this is used to order secondary overpasses.
sites = list(overpass.index)
sites[0] # outputs 1st site. Should be 'Mullion'

[4]:

'Mullion'

[5]:

# Set satellite timesteps for overpasses
# Depending on your application, you may want to add more accurate timesteps - but note that times used as inputs are
# for the beginning of the acquisition, not the actual overpass time at a specific site.
l8_timestep = timedelta(days=16)
s2a_timestep = timedelta(days=10)
s2b_timestep = timedelta(days=10)

[6]:

# Specify start date
l8_startdate = overpass['landsat_8']
l8_startdate_2 = overpass['landsat_8_2']
sentinel2a_startdate = overpass['sentinel_2a']
sentinel2a_2_startdate = overpass['sentinel_2a_2']
sentinel2b_startdate = overpass['sentinel_2b']
sentinel2b_2_startdate = overpass['sentinel_2b_2']
sentinel2a_startdate

[6]:

Site
Mullion       2019-12-20 23:58:00
Lake_George   2019-12-20 23:58:00
Narrabundah   2019-12-20 23:58:00
Name: sentinel_2a, dtype: datetime64[ns]

Project Times

The following cell calculates times for Landsat and Sentinel overpasses for field sites. Each predicted time is the start time plus a whole number of overpass repeat intervals, repeated for the desired number of iterations.

The repeat interval is 16 days for Landsat 8 and 10 days for each Sentinel 2 satellite. Increasing the number X in “for i in range(X):” will increase the number of iterations to predict over per satellite. An initial value of 20 iterations is set for L8, and 32 for Sentinel, so that both cover the same total time period (320 days).

Predictions are also calculated for secondary overpasses for each site.
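
The projection step can also be written as one small helper, sketched here on made-up input. `project_overpasses` and the site name `No_overpass_site` are hypothetical and only illustrate that a site with no start time (NaT) stays NaT in every projected row.

```python
import pandas as pd

def project_overpasses(start_times, step, n_steps):
    """Return a DataFrame with one row per repeat interval and one column per site."""
    rows = [start_times + step * i for i in range(n_steps)]
    return pd.DataFrame(rows).reset_index(drop=True)

# One site with a start time, one without (NaT)
starts = pd.Series(
    pd.to_datetime(["2019-11-29 23:43:00", None]),
    index=["Mullion", "No_overpass_site"],
)

projected = project_overpasses(starts, pd.Timedelta(days=16), 3)
print(projected)
```

With a helper like this, the six loops in the cell below would differ only in the start-time column, the timestep and the number of iterations passed in.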

[7]:

# Landsat 8 overpass prediction for 20*(overpass frequency) - ie 20*16 = 320 days. You can change this as desired,
# to get overpass predictions for n days. n = x*(OP freq). X is arbitrarily set to 20 to give 320 days as a base example
landsat = list()
for i in range(20):
    landsat.append(l8_startdate + l8_timestep*(i))
landsat = pd.DataFrame(landsat)

# Sentinel 2a overpass prediction for 32 * the overpass frequency, this is to give a similar total time to the L8 prediction.
# OP frequency for Sentinel = 10 days, n days = 32*10 = 320
Sentinel_2A = []
for i in range(32):
    Sentinel_2A.append(sentinel2a_startdate + s2a_timestep * (i))
Sentinel_2A = pd.DataFrame(Sentinel_2A)

# Sentinel 2b
Sentinel_2B = []
for i in range(32):
    Sentinel_2B.append(sentinel2b_startdate + s2b_timestep * (i))

Sentinel_2B = pd.DataFrame(Sentinel_2B)

# Landsat_2
# Prediction for L8 overpasses at sites which are covered by more than 1 overpass in a 16-day period.
# - Mullion and Narrabundah in example .csv
landsat_2 = []
for i in range(20):
    landsat_2.append(l8_startdate_2 + l8_timestep*(i))
landsat_2 = pd.DataFrame(landsat_2)

# Sentinel_2A_2
# Sentinel 2a secondary (Mullion)
Sentinel_2A_2 = []
for i in range(32):
    Sentinel_2A_2.append(sentinel2a_2_startdate + s2a_timestep * (i))
Sentinel_2A_2 = pd.DataFrame(Sentinel_2A_2)

# Sentinel_2B_2
# Sentinel 2b secondary (Mullion)
Sentinel_2B_2 = []
for i in range(32):
    Sentinel_2B_2.append(sentinel2b_2_startdate + s2b_timestep * (i))
Sentinel_2B_2 = pd.DataFrame(Sentinel_2B_2)

Combine dataframes for sites that receive more than 1 overpass in an orbital period.

If adding sites to the input file, please leave in “Mullion” as the 1st entry, because its column is used to order the secondary overpasses.
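
A minimal sketch of the combine step on small hand-typed frames (dates taken from the example file, variable names hypothetical). It suggests why the sort column matters: Mullion has both a primary and a secondary start time, so its column has no NaT values, whereas a site without a secondary overpass is NaT in the secondary rows.

```python
import pandas as pd

# Primary Landsat 8 overpasses for two sites
primary = pd.DataFrame({
    "Mullion": pd.to_datetime(["2019-11-29 23:43", "2019-12-15 23:43"]),
    "Lake_George": pd.to_datetime(["2019-11-29 23:43", "2019-12-15 23:43"]),
})

# Secondary overpasses: only Mullion lies in the overlap here
secondary = pd.DataFrame({
    "Mullion": pd.to_datetime(["2019-11-20 23:49", "2019-12-06 23:49"]),
    "Lake_George": pd.to_datetime([None, None]),
})

# Stack both sets of rows, then order them by the fully populated column
combined_example = (
    pd.concat([primary, secondary], ignore_index=True)
    .sort_values(by="Mullion")
    .reset_index(drop=True)
)
combined_example["Sat"] = "L8"
print(combined_example)
```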

[8]:

# Combine Landsat 8 data (base plus extra overpasses)
L8_combined2 = pd.concat([landsat, landsat_2])
drop_label_L8 = L8_combined2.reset_index(drop=True)
L8_combined2 = drop_label_L8.sort_values(by='Mullion')
L8_combined2.index.names = ['Landsat_8']

# Combine Sentinel 2A data (base plus extra overpasses)
S2A_combined = pd.concat([Sentinel_2A, Sentinel_2A_2])
drop_label_S2A = S2A_combined.reset_index(drop=True)
S2A_combined = drop_label_S2A.sort_values(by='Mullion')
S2A_combined.index.names = ['Sentinel_2A']

# Combine Sentinel 2B data (base plus extra overpasses)
S2B_combined = pd.concat([Sentinel_2B, Sentinel_2B_2])
drop_label_S2B = S2B_combined.reset_index(drop=True)
S2B_combined = drop_label_S2B.sort_values(by='Mullion')
S2B_combined.index.names = ['Sentinel_2B']

# Add satellite label to each entry in "Sat" column
S2A_combined['Sat'] = 'S2A'
S2B_combined['Sat'] = 'S2B'
L8_combined2['Sat'] = 'L8'

combined = pd.concat([S2B_combined, S2A_combined, L8_combined2])
df = pd.DataFrame(combined)

[9]:

# Use a "time dummy" to force a re-order of data by date, to standardise across dataframes.
today = datetime.today()
timedummy = []
t0 = datetime(today.year, today.month, 1)
dummystep = timedelta(days=2)

for i in range(300):
    timedummy.append(t0 + dummystep * (i))

timedummy = pd.DataFrame(timedummy)
df['DateStep'] = timedummy

Output field sites

For new field sites, comment out the old entries with “#” and add the new ones in the same format.
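
As an alternative to adding one line per site by hand, the selection and sorting could be done in a loop. This is only a sketch on a made-up frame: `df_example`, `site_names` and `per_site` are hypothetical names, and the 'DateStep' tie-break column used in the cells below is left out for brevity.

```python
import pandas as pd

# Made-up stand-in for the combined dataframe: one column per site plus 'Sat'
df_example = pd.DataFrame({
    "Mullion": pd.to_datetime(["2019-12-14 00:07", "2019-11-20 23:49"]),
    "Lake_George": pd.to_datetime(["2019-12-20 23:58", "2019-11-29 23:43"]),
    "Sat": ["S2A", "L8"],
})
site_names = ["Mullion", "Lake_George"]

# One date-ordered dataframe per site, keeping the satellite label next to each date
per_site = {
    name: df_example[[name, "Sat"]].sort_values(by=name).reset_index(drop=True)
    for name in site_names
}
print(per_site["Mullion"])
```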

[10]:

# Create a new df for each field site of interest. To add a new field site, add in same format ie
# "Field_Site = df[['Field_Site', 'Sat', 'DateStep']].copy()" - if you are appending to the original file simply add in "site4 = sites[3]"
# and "Site4 = df[[(site4), 'Sat', 'DateStep']].copy()" in the appropriate section below.
site1 = sites[0]
site2 = sites[1]
site3 = sites[2]
#site4 = sites[3]

Site1 = df[[(site1), 'Sat', 'DateStep']].copy()
Site2 = df[[(site2), 'Sat', 'DateStep']].copy()
Site3 = df[[(site3), 'Sat', 'DateStep']].copy()
#Site4 = df[[(site4), 'Sat', 'DateStep']].copy()

[11]:

# Reorder by date for each site, include satellite tags for each date.
# Add new lines for new sites as needed.
Site1 = (Site1.sort_values(by=[(site1), 'DateStep'])).reset_index(drop=True)
Site2 = (Site2.sort_values(by=[(site2), 'DateStep'])).reset_index(drop=True)
Site3 = (Site3.sort_values(by=[(site3), 'DateStep'])).reset_index(drop=True)
#Site4 = (Site4.sort_values(by=[(site4), 'DateStep'])).reset_index(drop=True)

Optional - output individual field site.

If you wish to output multiple field sites, skip this step.

[12]:

# Use this cell to output an individual field site
# Write individual site to .csv: take out "#" symbol below, or add new line for new site in this format:

#Site1.to_csv('Site1.csv')
#Site2.to_csv('Site2.csv')

Combine dataset

The overpass predictions are combined and output to the specified “output_path”. Times are in the same timezone as the input file. The example file used is in UTC, but you can also run the notebook on local times.

To convert UTC input times to local time, add 10 hours for AEST or 11 hours for AEDT (use the appropriate offset for any other time zone). The commands for this are commented out in the cell below. Remove the # from the line for each column you want to shift, and the offset will be added to that column.
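
If you would rather not pick between the AEST and AEDT offsets by hand, pandas can apply the daylight saving rules for a named time zone. The sketch below assumes the input times are in UTC and that "Australia/Sydney" is the right zone for your sites; both are assumptions to adjust as needed, and the variable names are hypothetical.

```python
import pandas as pd

# Two UTC times: one during daylight saving (AEDT), one outside it (AEST)
utc_times = pd.Series(pd.to_datetime(["2019-12-20 23:58:00", "2020-06-20 23:58:00"]))

# Mark the times as UTC, then convert to the named local zone
local_times = utc_times.dt.tz_localize("UTC").dt.tz_convert("Australia/Sydney")
print(local_times)

# Optional: drop the UTC offset suffix before writing to .csv
local_naive = local_times.dt.tz_localize(None)
```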

Yay, you have a list of overpasses!

[13]:

combined_sites = ([Site1, Site2, Site3]) ## If you have more sites, ie you have added 1 or more, add here! An example is provided below if you have a total of 5 sites.
#combined_sites = ([Site1, Site2, Site3, Site4, Site5])

merged = pd.concat(combined_sites, axis=1)
merged = merged.rename_axis("Overpass", axis="columns")
output = merged.drop(['DateStep'], axis=1)

#add timestep: (comment out / delete hash to use as applicable - must include column/site name)
#output['Mullion']=output['Mullion']+timedelta(hours=10)
#output['Lake_George']=output['Lake_George']+timedelta(hours=10)
#output['Narrabundah']=output['Narrabundah']+timedelta(hours=10)

output.to_csv(output_path)
output.head(20)

[13]:

Overpass	Mullion	Sat	Lake_George	Sat	Narrabundah	Sat
0	2019-11-20 23:49:00	L8	2019-11-29 23:43:00	L8	2019-11-20 23:49:00	L8
1	2019-11-29 23:43:00	L8	2019-12-15 23:43:00	L8	2019-11-29 23:43:00	L8
2	2019-12-06 23:49:00	L8	2019-12-15 23:58:00	S2B	2019-12-06 23:49:00	L8
3	2019-12-14 00:07:00	S2A	2019-12-20 23:58:00	S2A	2019-12-15 23:43:00	L8
4	2019-12-15 23:43:00	L8	2019-12-25 23:58:00	S2B	2019-12-15 23:58:00	S2B
5	2019-12-15 23:58:00	S2B	2019-12-30 23:58:00	S2A	2019-12-20 23:58:00	S2A
6	2019-12-19 00:07:00	S2B	2019-12-31 23:43:00	L8	2019-12-22 23:49:00	L8
7	2019-12-20 23:58:00	S2A	2020-01-04 23:58:00	S2B	2019-12-25 23:58:00	S2B
8	2019-12-22 23:49:00	L8	2020-01-09 23:58:00	S2A	2019-12-30 23:58:00	S2A
9	2019-12-24 00:07:00	S2A	2020-01-14 23:58:00	S2B	2019-12-31 23:43:00	L8
10	2019-12-25 23:58:00	S2B	2020-01-16 23:43:00	L8	2020-01-04 23:58:00	S2B
11	2019-12-29 00:07:00	S2B	2020-01-19 23:58:00	S2A	2020-01-07 23:49:00	L8
12	2019-12-30 23:58:00	S2A	2020-01-24 23:58:00	S2B	2020-01-09 23:58:00	S2A
13	2019-12-31 23:43:00	L8	2020-01-29 23:58:00	S2A	2020-01-14 23:58:00	S2B
14	2020-01-03 00:07:00	S2A	2020-02-01 23:43:00	L8	2020-01-16 23:43:00	L8
15	2020-01-04 23:58:00	S2B	2020-02-03 23:58:00	S2B	2020-01-19 23:58:00	S2A
16	2020-01-07 23:49:00	L8	2020-02-08 23:58:00	S2A	2020-01-23 23:49:00	L8
17	2020-01-08 00:07:00	S2B	2020-02-13 23:58:00	S2B	2020-01-24 23:58:00	S2B
18	2020-01-09 23:58:00	S2A	2020-02-17 23:43:00	L8	2020-01-29 23:58:00	S2A
19	2020-01-13 00:07:00	S2A	2020-02-18 23:58:00	S2A	2020-02-01 23:43:00	L8

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on GitHub.

Last modified: December 2023