dea_tools.dask

Tools for simplifying the creation of Dask clusters for parallelised computing.

License: The code in this notebook is licensed under the Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0). Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/).

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel (http://slack.opendatacube.org/) or on the GIS Stack Exchange (https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the open-data-cube tag (you can view previously asked questions here: https://gis.stackexchange.com/questions/tagged/open-data-cube).

If you would like to report an issue with this script, you can file one on GitHub (GeoscienceAustralia/dea-notebooks#new).

Last modified: June 2022

Functions

create_dask_gateway_cluster([profile, workers])

Create a Dask cluster on our internal Dask Gateway.

create_local_dask_cluster([spare_mem, ...])

Generate a local Dask cluster using the datacube utils function start_local_dask.

dea_tools.dask.create_dask_gateway_cluster(profile='r5_L', workers=2)[source]

Create a Dask cluster on our internal Dask Gateway.

Parameters:
  • profile (str) –

    Possible values are:
    • r5_L (2 cores, 15GB memory)

    • r5_XL (4 cores, 31GB memory)

    • r5_2XL (8 cores, 63GB memory)

    • r5_4XL (16 cores, 127GB memory)

  • workers (int) – Number of workers in the cluster.
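As a rough illustration of how the profile table might be used, the sketch below picks the smallest listed profile that meets a resource requirement and passes it to create_dask_gateway_cluster. The names PROFILES, smallest_profile and start_gateway_cluster are hypothetical helpers, not part of dea_tools; only the profile names, their core and memory figures, and the profile and workers parameters come from the documentation above. Whether the figures apply per worker is not stated here, so treat that as an assumption.

```python
# Profile table copied from the parameter list above: name -> (cores, memory in GB).
PROFILES = {
    "r5_L": (2, 15),
    "r5_XL": (4, 31),
    "r5_2XL": (8, 63),
    "r5_4XL": (16, 127),
}


def smallest_profile(min_cores, min_memory_gb):
    """Hypothetical helper: return the smallest listed profile that fits."""
    # Dicts preserve insertion order, so profiles are checked smallest first.
    for name, (cores, memory_gb) in PROFILES.items():
        if cores >= min_cores and memory_gb >= min_memory_gb:
            return name
    raise ValueError("No listed profile satisfies the requested resources")


def start_gateway_cluster(min_cores, min_memory_gb, workers=2):
    """Hypothetical wrapper around create_dask_gateway_cluster."""
    # Imported lazily so the helper above works without dea_tools installed.
    from dea_tools.dask import create_dask_gateway_cluster

    profile = smallest_profile(min_cores, min_memory_gb)
    # The return value is not documented above, so it is passed through as-is.
    return create_dask_gateway_cluster(profile=profile, workers=workers)
```

For example, start_gateway_cluster(4, 40, workers=4) would request four workers using the r5_2XL profile, since r5_XL offers only 31GB of memory.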

dea_tools.dask.create_local_dask_cluster(spare_mem='3Gb', display_client=True, return_client=False)[source]

Generate a local Dask cluster using the datacube utils function start_local_dask. The function automatically detects whether it is running on AWS or NCI.

Example use:

    import sys
    sys.path.append("../Scripts")
    from dea_dask import create_local_dask_cluster

    create_local_dask_cluster(spare_mem='4Gb')

Parameters:
  • spare_mem (String, optional) – The amount of memory, in Gb, to leave for the notebook to run, e.g. '3Gb'. This memory will not be used by the cluster.

  • display_client (Bool, optional) – An optional boolean indicating whether to display a summary of the dask client, including a link to monitor progress of the analysis. Set to False to hide this display.

  • return_client (Bool, optional) – An optional boolean indicating whether to return the dask client object.
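The three parameters can be combined differently for notebook and script use. The sketch below is one possible arrangement, not the library's own recommendation: in a notebook it displays the client summary, while in a script it hides the summary and asks for the client object back. The helpers local_cluster_kwargs and start_local_cluster are hypothetical, and the whole-number 'NGb' string format is an assumption based on the '3Gb' example above.

```python
def local_cluster_kwargs(spare_gb=3, in_notebook=True):
    """Hypothetical helper: build keyword arguments for create_local_dask_cluster."""
    # Assumption: spare_mem follows the 'NGb' pattern shown in the docs ('3Gb').
    if isinstance(spare_gb, bool) or not isinstance(spare_gb, int) or spare_gb <= 0:
        raise ValueError("spare_gb must be a positive integer")
    return {
        "spare_mem": f"{spare_gb}Gb",
        # Show the client summary only where it can be rendered.
        "display_client": in_notebook,
        # In a script, keep a handle on the client instead.
        "return_client": not in_notebook,
    }


def start_local_cluster(spare_gb=3, in_notebook=True):
    """Hypothetical wrapper around create_local_dask_cluster."""
    # Imported lazily so the helper above works without dea_tools installed.
    from dea_tools.dask import create_local_dask_cluster

    return create_local_dask_cluster(**local_cluster_kwargs(spare_gb, in_notebook))
```

In a script, client = start_local_cluster(spare_gb=4, in_notebook=False) would leave 4Gb free and return the client. If the returned object is a dask.distributed Client, it can be shut down with client.close() once the analysis finishes.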