Advanced Querying

This notebook demonstrates more advanced queries, including filtering datasets based on product lineage.

Requirements:

Before launching jupyter notebook, run the following commands in the same terminal so that the required libraries and paths are available:

module use /g/data/v10/public/modules/modulefiles

module load dea

[1]:
%matplotlib inline
import warnings; warnings.simplefilter('ignore')

import datacube
dc = datacube.Datacube(app='advance-query-example')

We can query the metadata of datasets that have been indexed but whose data is not stored in the system, such as the Level 1 product.

Level 1 refers to a sensor radiance product that has not yet been corrected for atmospheric effects.

[2]:
scenes = dc.find_datasets(product='ls5_level1_scene', time=('1995-1-4', '1995-1-5'))
scenes
[2]:
[Dataset <id=4fda1bcf-afcd-4cf4-bb6c-0cceae55d75e type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_089_080_19950104/ga-metadata.yaml>,
 Dataset <id=1cd1ca62-49e7-4b4f-90b8-5f5304b5e5b0 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_089_082_19950104/ga-metadata.yaml>,
 Dataset <id=939e0fbe-9db6-4718-a04f-e571daa1ad30 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_080_19950105/ga-metadata.yaml>,
 Dataset <id=204c71b0-49ed-4860-a1dd-d86d47ab201b type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_081_19950105/ga-metadata.yaml>,
 Dataset <id=75c4a0f1-cff4-4f43-bc18-5b57061cd2d4 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_077_19950105/ga-metadata.yaml>,
 Dataset <id=9e67989a-1240-4ea2-ab83-3ca32be2fc40 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_079_19950105/ga-metadata.yaml>,
 Dataset <id=e7ddda30-a4eb-4b22-88df-2462bee526b5 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_083_19950105/ga-metadata.yaml>,
 Dataset <id=34ffecf0-b4ca-4fe5-a669-e55deed12016 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_082_19950105/ga-metadata.yaml>,
 Dataset <id=fb83f06f-1ae7-4bf4-b89c-deeb6a02962e type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_084_19950105/ga-metadata.yaml>,
 Dataset <id=b9d6536a-1ec4-4710-a6cd-a16e60ebfae1 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_078_19950105/ga-metadata.yaml>,
 Dataset <id=f70e339a-9573-493b-8fca-32780b7048a4 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_076_19950105/ga-metadata.yaml>,
 Dataset <id=61504c2c-97fe-4ffe-bded-5ef6e6878edc type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_064_19950105/ga-metadata.yaml>,
 Dataset <id=22022044-03e5-40f8-813a-af066a415143 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_074_19950105/ga-metadata.yaml>,
 Dataset <id=31983f2e-26ba-47ff-ad57-c906f436c97c type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_065_19950105/ga-metadata.yaml>,
 Dataset <id=0675b7c1-df4e-4a5b-83b3-0ece3f32b32f type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_066_19950105/ga-metadata.yaml>,
 Dataset <id=89502f70-6c18-4253-9e22-9e3ef89aa2cd type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_075_19950105/ga-metadata.yaml>,
 Dataset <id=f9a8827f-5236-4649-a08b-074980b4a1de type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_072_19950105/ga-metadata.yaml>,
 Dataset <id=f120065d-79b6-4ff6-ac72-4916e6281dba type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_071_19950105/ga-metadata.yaml>,
 Dataset <id=0bebcdd8-143d-41b4-b65d-e17e173f5014 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_070_19950105/ga-metadata.yaml>,
 Dataset <id=44653db9-bd4c-4046-910a-25975c5e7c26 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_096_073_19950105/ga-metadata.yaml>,
 Dataset <id=de2b16b6-fa65-46ce-bc47-867d9ee60cce type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_089_078_19950104/ga-metadata.yaml>,
 Dataset <id=b7d5defa-d7d9-44bf-a7dc-5bd09c250795 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_089_081_19950104/ga-metadata.yaml>,
 Dataset <id=a84f0eb3-e65f-44e9-b1f2-88d9b50a8c61 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_089_089_19950104/ga-metadata.yaml>,
 Dataset <id=7440e581-3d31-42b7-96be-c8fcfaef11bd type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_089_090_19950104/ga-metadata.yaml>,
 Dataset <id=5127d289-06ea-4a93-9111-733a540a08d4 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_089_079_19950104/ga-metadata.yaml>,
 Dataset <id=6cff7653-8bbe-4584-b250-dbd8f4e1e0e8 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_114_066_19950104/ga-metadata.yaml>,
 Dataset <id=3aa84abb-e893-4fa0-a97d-f2b01d030a9d type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_114_079_19950104/ga-metadata.yaml>,
 Dataset <id=449bd577-4b54-4b75-b890-3275a4f381ad type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_114_075_19950104/ga-metadata.yaml>,
 Dataset <id=97430ed0-cea1-4011-a137-c14f701c99cc type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_114_074_19950104/ga-metadata.yaml>,
 Dataset <id=0d9a1254-943b-42b1-8c04-d1eb8c333a8f type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_114_078_19950104/ga-metadata.yaml>,
 Dataset <id=f18bbbdc-9f35-4a4e-a96b-a501e65c2ed1 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_114_077_19950104/ga-metadata.yaml>,
 Dataset <id=b10be1db-1e84-4a57-bc5a-e658ef485c59 type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_114_076_19950104/ga-metadata.yaml>,
 Dataset <id=0ceeeb5f-5ce7-4df8-97fa-7131c422966b type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_114_081_19950104/ga-metadata.yaml>,
 Dataset <id=2242455e-83b3-4a54-a551-bc5f5fd3d3fd type=ls5_level1_scene location=/g/data/v10/reprocess/ls5/level1/1995/01/LS5_TM_OTH_P51_GALPGS01-002_114_080_19950104/ga-metadata.yaml>]

There is scene-level metadata that we may want to use.

Landsat acquisitions follow the Worldwide Reference System 2 (WRS-2), which divides the orbit into paths and rows.

GQA stands for Geometric Quality Assessment, a measure of how well the image aligns with known points on the ground. We will use the CEP90 field, the circular error at the 90th percentile: 90% of the measured ground control point residuals fall within this radius.

[3]:
for scene in scenes:
    # sat_path and sat_row are ranges; take the lower bound of each
    path = min(scene.metadata.sat_path)
    row = min(scene.metadata.sat_row)
    gqa = scene.metadata.gqa_cep90
    print("path: {},\t row: {},\t error: {}".format(path, row, gqa))
path: 89,        row: 80,        error: 0.75
path: 89,        row: 82,        error: 0.92
path: 96,        row: 80,        error: 0.28
path: 96,        row: 81,        error: 0.48
path: 96,        row: 77,        error: 0.14
path: 96,        row: 79,        error: 0.44
path: 96,        row: 83,        error: 0.27
path: 96,        row: 82,        error: 0.23
path: 96,        row: 84,        error: 0.64
path: 96,        row: 78,        error: 0.26
path: 96,        row: 76,        error: 0.13
path: 96,        row: 64,        error: nan
path: 96,        row: 74,        error: 0.6
path: 96,        row: 65,        error: nan
path: 96,        row: 66,        error: nan
path: 96,        row: 75,        error: 0.19
path: 96,        row: 72,        error: 0.3
path: 96,        row: 71,        error: 0.75
path: 96,        row: 70,        error: 0.31
path: 96,        row: 73,        error: 0.22
path: 89,        row: 78,        error: 0.93
path: 89,        row: 81,        error: 0.96
path: 89,        row: 89,        error: 0.45
path: 89,        row: 90,        error: 0.59
path: 89,        row: 79,        error: 3.76
path: 114,       row: 66,        error: nan
path: 114,       row: 79,        error: 0.43
path: 114,       row: 75,        error: 4.66
path: 114,       row: 74,        error: 0.43
path: 114,       row: 78,        error: 0.4
path: 114,       row: 77,        error: 0.3
path: 114,       row: 76,        error: 0.29
path: 114,       row: 81,        error: 1.07
path: 114,       row: 80,        error: 0.66

If we only care about well-aligned data, we can filter on the gqa_cep90 field:

[4]:
scenes = dc.find_datasets(product='ls5_level1_scene', time=('1995-1-4', '1995-1-5'), gqa_cep90=(0, 0.75))

for scene in scenes:
    path = min(scene.metadata.sat_path)
    row = min(scene.metadata.sat_row)
    gqa = scene.metadata.gqa_cep90
    print("path: {},\t row: {},\t error: {}".format(path, row, gqa))
path: 96,        row: 80,        error: 0.28
path: 96,        row: 81,        error: 0.48
path: 96,        row: 77,        error: 0.14
path: 96,        row: 79,        error: 0.44
path: 96,        row: 83,        error: 0.27
path: 96,        row: 82,        error: 0.23
path: 96,        row: 84,        error: 0.64
path: 96,        row: 78,        error: 0.26
path: 96,        row: 76,        error: 0.13
path: 96,        row: 74,        error: 0.6
path: 96,        row: 75,        error: 0.19
path: 96,        row: 72,        error: 0.3
path: 96,        row: 70,        error: 0.31
path: 96,        row: 73,        error: 0.22
path: 89,        row: 89,        error: 0.45
path: 89,        row: 90,        error: 0.59
path: 114,       row: 79,        error: 0.43
path: 114,       row: 74,        error: 0.43
path: 114,       row: 78,        error: 0.4
path: 114,       row: 77,        error: 0.3
path: 114,       row: 76,        error: 0.29
path: 114,       row: 80,        error: 0.66
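
Note that the scenes with a NaN gqa_cep90 value have also dropped out. The behaviour of a range filter like this can be sketched in plain Python (the values below are illustrative, not taken from the database):

```python
import math

# Hypothetical sample of (path, row, gqa_cep90) values, mirroring the
# metadata printed above (illustrative only)
records = [
    (96, 80, 0.28),
    (96, 64, float('nan')),
    (89, 79, 3.76),
    (114, 77, 0.3),
]

# A range filter such as gqa_cep90=(0, 0.75) keeps only values within
# the bounds; NaN fails every comparison, so unassessed scenes drop out
lo, hi = 0, 0.75
kept = [r for r in records if not math.isnan(r[2]) and lo <= r[2] <= hi]
print(kept)  # [(96, 80, 0.28), (114, 77, 0.3)]
```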

We can also query based on the path and row:

[5]:
scenes = dc.find_datasets(product='ls5_level1_scene', time=('1995-1-4', '1995-1-5'), sat_path=114, sat_row=(77, 80))

for scene in scenes:
    path = min(scene.metadata.sat_path)
    row = min(scene.metadata.sat_row)
    gqa = scene.metadata.gqa_cep90
    print("path: {},\t row: {},\t error: {}".format(path, row, gqa))
path: 114,       row: 77,        error: 0.3
path: 114,       row: 79,        error: 0.43
path: 114,       row: 78,        error: 0.4

Using lineage

While we can query this metadata directly, what we really want to use is the analysis-ready product derived from it: ls5_nbar_albers.

[6]:
datasets = dc.find_datasets(product='ls5_nbar_albers', time=('1995-1-4', '1995-1-5'))
len(datasets)
[6]:
272
[7]:
d = dc.index.datasets.get(datasets[0].id, include_sources=True)
[8]:
def print_sources(d):
    print(d.type.name, ':', d.id)
    if d.sources is not None:
        for k, v in d.sources.items():
            print_sources(v)

print_sources(d)
ls5_nbar_albers : 5ce94ca0-3911-4e33-8fa3-f9db0c250e04
ls5_nbar_scene : 2e310aad-5c9d-4398-b5ba-978601aab0ed
ls5_level1_scene : 0ceeeb5f-5ce7-4df8-97fa-7131c422966b
ls5_satellite_telemetry_data : c8af077c-8353-11e5-8bc4-ac162d791418

We can see from the lineage tree that the nbar albers product is derived from the nbar scene, which was produced from the Level 1 scene, itself derived from the raw satellite telemetry.
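
The recursive walk performed by print_sources can be sketched on a mock lineage tree; here plain dicts stand in for Dataset objects, and the structure mirrors the chain printed above:

```python
# Mock lineage tree: each node holds a product name and a mapping of
# source classifier -> parent node (a stand-in for Dataset.sources)
tree = {
    'product': 'ls5_nbar_albers',
    'sources': {
        'nbar': {
            'product': 'ls5_nbar_scene',
            'sources': {
                'level1': {'product': 'ls5_level1_scene', 'sources': {}},
            },
        },
    },
}

def walk(node, depth=0):
    """Collect each product in the lineage chain, indented by depth."""
    lines = ['  ' * depth + node['product']]
    for parent in node['sources'].values():
        lines.extend(walk(parent, depth + 1))
    return lines

print('\n'.join(walk(tree)))
```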

To filter on the lineage tree, we define the properties that a dataset's parent must have:

[9]:
scene_filter = dict(product='ls5_level1_scene',
                    time=('1995-1-4', '1995-1-5'),
                    sat_path=114, sat_row=(77, 79))
[10]:
datasets = dc.find_datasets(product='ls5_nbar_albers', time=('1995-1-4', '1995-1-5'), source_filter=scene_filter)
len(datasets)
[10]:
18

The number of returned datasets is much smaller: only those derived from scenes in the given path/rows are included.

The same filter can be passed to a load function to limit which datasets are used as the data source:

[11]:
data = dc.load(product='ls5_nbar_albers', time=('1995-1-4', '1995-1-5'),
               group_by='solar_day',
               source_filter=scene_filter,
               dask_chunks={'time': 1, 'x': 4000, 'y': 4000}
              )
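
The group_by='solar_day' option merges scenes acquired on the same local solar day, which can straddle UTC midnight, into a single time slice. The idea can be sketched in plain Python; a fixed +10 h offset is assumed here as a stand-in for the longitude-derived shift that datacube computes, and the timestamps are illustrative:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical UTC acquisition times for scenes along consecutive passes
times = [
    datetime(1995, 1, 4, 23, 50),
    datetime(1995, 1, 5, 0, 1),    # minutes later, but past UTC midnight
    datetime(1995, 1, 5, 23, 55),  # the next day's pass
]

# Shift each timestamp into local solar time (fixed +10 h assumed here),
# then bucket by calendar date: the first two times share one solar day
groups = defaultdict(list)
for t in times:
    solar_day = (t + timedelta(hours=10)).date()
    groups[solar_day].append(t)

for day, ts in sorted(groups.items()):
    print(day, len(ts))
```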
[12]:
data.isel(time=0).red[::50, ::50].plot();
[Output: plot of the red band for the first time slice]

The loaded data covers only the requested scenes, even though we did not specify any spatial extent in the query.