{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Xarray \n", "\n", "* **[Sign up to the DEA Sandbox](https://app.sandbox.dea.ga.gov.au/)** to run this notebook interactively from a browser\n", "* **Compatibility**: Notebook currently compatible with both the `NCI` and `DEA Sandbox` environments\n", "* **Prerequisites**: Users of this notebook should have a basic understanding of:\n", " * How to run a [Jupyter notebook](01_Jupyter_notebooks.ipynb)\n", " * How to work with [Numpy](07_Intro_to_numpy.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Background\n", "`Xarray` is a python library which simplifies working with labelled multi-dimension arrays. \n", "`Xarray` introduces labels in the forms of dimensions, coordinates and attributes on top of raw `numpy` arrays, allowing for more intitutive and concise development. \n", "More information about `xarray` data structures and functions can be found [here](http://xarray.pydata.org/en/stable/).\n", "\n", "Once you've completed this notebook, you may be interested in advancing your `xarray` skills further, this [external notebook](https://rabernat.github.io/research_computing/xarray.html) introduces more uses of `xarray` and may help you advance your skills further. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Description\n", "This notebook is designed to introduce users to `xarray` using Python code in Jupyter Notebooks via JupyterLab.\n", "\n", "Topics covered include:\n", "\n", "* How to use `xarray` functions in a Jupyter Notebook cell\n", "* How to access `xarray` dimensions and metadata\n", "* Using indexing to explore multi-dimensional `xarray` data\n", "* Appliction of built-in `xarray` functions such as sum, std and mean\n", "\n", "***\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting started\n", "To run this notebook, run all the cells in the notebook starting with the \"Load packages\" cell. For help with running notebook cells, refer back to the [Jupyter Notebooks notebook](01_Jupyter_notebooks.ipynb). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load packages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import xarray as xr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Introduction to xarray\n", "DEA uses `xarray` as its core data model. \n", "To better understand what it is, let's first do a simple experiment using a combination of plain `numpy` arrays and Python dictionaries.\n", "\n", "Suppose we have a satellite image with three bands: `Red`, `NIR` and `SWIR`. \n", "These bands are represented as 2-dimensional `numpy` arrays and the latitude and longitude coordinates for each dimension are represented using 1-dimensional arrays. \n", "Finally, we also have some metadata that comes with this image. \n", "The code below creates fake satellite data and structures the data as a `dictionary`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Create fake satellite data\n", "red = np.random.rand(250, 250)\n", "nir = np.random.rand(250, 250)\n", "swir = np.random.rand(250, 250)\n", "\n", "# Create some lats and lons\n", "lats = np.linspace(-23.5, -26.0, num=red.shape[0], endpoint=False)\n", "lons = np.linspace(110.0, 112.5, num=red.shape[1], endpoint=False)\n", "\n", "# Create metadata\n", "title = \"Image of the desert\"\n", "date = \"2019-11-10\"\n", "\n", "# Stack into a dictionary\n", "image = {\n", " \"red\": red,\n", " \"nir\": nir,\n", " \"swir\": swir,\n", " \"latitude\": lats,\n", " \"longitude\": lons,\n", " \"title\": title,\n", " \"date\": date,\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All our data is conveniently packed in a dictionary. Now we can use this dictionary to work with the data it contains:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2019-11-10\n" ] }, { "data": { "text/plain": [ "0.5010298934189398" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Date of satellite image\n", "print(image[\"date\"])\n", "\n", "# Mean of red values\n", "image[\"red\"].mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Still, to select data we have to use `numpy` indexes. \n", "Wouldn't it be convenient to be able to select data from the images using the coordinates of the pixels instead of their relative positions? \n", "This is exactly what `xarray` solves! Let's see how it works:\n", "\n", "To explore `xarray` we have a file containing some surface reflectance data extracted from the DEA platform. \n", "The object that we get `ds` is a `xarray` `Dataset`, which in some ways is very similar to the dictionary that we created before, but with lots of convenient functionality available." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset>\n", "Dimensions: (time: 12, x: 483, y: 601)\n", "Coordinates:\n", " * time (time) datetime64[ns] 2018-01-03T08:31:05 ... 2018-02-27T08:...\n", " * y (y) float64 -2.519e+06 -2.519e+06 ... -2.507e+06 -2.507e+06\n", " * x (x) float64 2.378e+06 2.378e+06 ... 2.388e+06 2.388e+06\n", " spatial_ref int32 6933\n", "Data variables:\n", " red (time, y, x) uint16 ...\n", " green (time, y, x) uint16 ...\n", " blue (time, y, x) uint16 ...\n", "Attributes:\n", " crs: EPSG:6933\n", " grid_mapping: spatial_ref
array(['2018-01-03T08:31:05.000000000', '2018-01-08T08:34:01.000000000',\n", " '2018-01-13T08:30:41.000000000', '2018-01-18T08:30:42.000000000',\n", " '2018-01-23T08:33:58.000000000', '2018-01-28T08:30:20.000000000',\n", " '2018-02-07T08:30:53.000000000', '2018-02-12T08:31:43.000000000',\n", " '2018-02-17T08:23:09.000000000', '2018-02-17T08:35:40.000000000',\n", " '2018-02-22T08:34:52.000000000', '2018-02-27T08:31:36.000000000'],\n", " dtype='datetime64[ns]')
array([-2519270., -2519250., -2519230., ..., -2507310., -2507290., -2507270.])
array([2378390., 2378410., 2378430., ..., 2387990., 2388010., 2388030.])
array(6933, dtype=int32)
[3483396 values with dtype=uint16]
[3483396 values with dtype=uint16]
[3483396 values with dtype=uint16]
<xarray.DataArray 'green' (time: 12, y: 601, x: 483)>\n", "[3483396 values with dtype=uint16]\n", "Coordinates:\n", " * time (time) datetime64[ns] 2018-01-03T08:31:05 ... 2018-02-27T08:...\n", " * y (y) float64 -2.519e+06 -2.519e+06 ... -2.507e+06 -2.507e+06\n", " * x (x) float64 2.378e+06 2.378e+06 ... 2.388e+06 2.388e+06\n", " spatial_ref int32 6933\n", "Attributes:\n", " units: 1\n", " nodata: 0\n", " crs: EPSG:6933\n", " grid_mapping: spatial_ref
[3483396 values with dtype=uint16]
array(['2018-01-03T08:31:05.000000000', '2018-01-08T08:34:01.000000000',\n", " '2018-01-13T08:30:41.000000000', '2018-01-18T08:30:42.000000000',\n", " '2018-01-23T08:33:58.000000000', '2018-01-28T08:30:20.000000000',\n", " '2018-02-07T08:30:53.000000000', '2018-02-12T08:31:43.000000000',\n", " '2018-02-17T08:23:09.000000000', '2018-02-17T08:35:40.000000000',\n", " '2018-02-22T08:34:52.000000000', '2018-02-27T08:31:36.000000000'],\n", " dtype='datetime64[ns]')
array([-2519270., -2519250., -2519230., ..., -2507310., -2507290., -2507270.])
array([2378390., 2378410., 2378430., ..., 2387990., 2388010., 2388030.])
array(6933, dtype=int32)
<xarray.DataArray 'time' (time: 12)>\n", "array(['2018-01-03T08:31:05.000000000', '2018-01-08T08:34:01.000000000',\n", " '2018-01-13T08:30:41.000000000', '2018-01-18T08:30:42.000000000',\n", " '2018-01-23T08:33:58.000000000', '2018-01-28T08:30:20.000000000',\n", " '2018-02-07T08:30:53.000000000', '2018-02-12T08:31:43.000000000',\n", " '2018-02-17T08:23:09.000000000', '2018-02-17T08:35:40.000000000',\n", " '2018-02-22T08:34:52.000000000', '2018-02-27T08:31:36.000000000'],\n", " dtype='datetime64[ns]')\n", "Coordinates:\n", " * time (time) datetime64[ns] 2018-01-03T08:31:05 ... 2018-02-27T08:...\n", " spatial_ref int32 6933
array(['2018-01-03T08:31:05.000000000', '2018-01-08T08:34:01.000000000',\n", " '2018-01-13T08:30:41.000000000', '2018-01-18T08:30:42.000000000',\n", " '2018-01-23T08:33:58.000000000', '2018-01-28T08:30:20.000000000',\n", " '2018-02-07T08:30:53.000000000', '2018-02-12T08:31:43.000000000',\n", " '2018-02-17T08:23:09.000000000', '2018-02-17T08:35:40.000000000',\n", " '2018-02-22T08:34:52.000000000', '2018-02-27T08:31:36.000000000'],\n", " dtype='datetime64[ns]')
array(['2018-01-03T08:31:05.000000000', '2018-01-08T08:34:01.000000000',\n", " '2018-01-13T08:30:41.000000000', '2018-01-18T08:30:42.000000000',\n", " '2018-01-23T08:33:58.000000000', '2018-01-28T08:30:20.000000000',\n", " '2018-02-07T08:30:53.000000000', '2018-02-12T08:31:43.000000000',\n", " '2018-02-17T08:23:09.000000000', '2018-02-17T08:35:40.000000000',\n", " '2018-02-22T08:34:52.000000000', '2018-02-27T08:31:36.000000000'],\n", " dtype='datetime64[ns]')
array(6933, dtype=int32)
<xarray.DataArray 'green' (y: 601, x: 483)>\n", "array([[1214, 1232, 1406, ..., 3436, 4252, 4300],\n", " [1214, 1334, 1378, ..., 2006, 2602, 4184],\n", " [1274, 1340, 1554, ..., 2436, 1858, 1890],\n", " ...,\n", " [1142, 1086, 1202, ..., 1096, 1074, 1092],\n", " [1188, 1258, 1190, ..., 1058, 1138, 1138],\n", " [1152, 1134, 1074, ..., 1086, 1116, 1100]], dtype=uint16)\n", "Coordinates:\n", " time datetime64[ns] 2018-01-03T08:31:05\n", " * y (y) float64 -2.519e+06 -2.519e+06 ... -2.507e+06 -2.507e+06\n", " * x (x) float64 2.378e+06 2.378e+06 ... 2.388e+06 2.388e+06\n", " spatial_ref int32 6933\n", "Attributes:\n", " units: 1\n", " nodata: 0\n", " crs: EPSG:6933\n", " grid_mapping: spatial_ref
array([[1214, 1232, 1406, ..., 3436, 4252, 4300],\n", " [1214, 1334, 1378, ..., 2006, 2602, 4184],\n", " [1274, 1340, 1554, ..., 2436, 1858, 1890],\n", " ...,\n", " [1142, 1086, 1202, ..., 1096, 1074, 1092],\n", " [1188, 1258, 1190, ..., 1058, 1138, 1138],\n", " [1152, 1134, 1074, ..., 1086, 1116, 1100]], dtype=uint16)
array('2018-01-03T08:31:05.000000000', dtype='datetime64[ns]')
array([-2519270., -2519250., -2519230., ..., -2507310., -2507290., -2507270.])
array([2378390., 2378410., 2378430., ..., 2387990., 2388010., 2388030.])
array(6933, dtype=int32)
<xarray.DataArray 'green' (time: 1, y: 601, x: 483)>\n", "array([[[1270, 1280, ..., 4228, 3950],\n", " [1266, 1332, ..., 3880, 4372],\n", " ...,\n", " [1172, 1180, ..., 1154, 1190],\n", " [1242, 1204, ..., 1192, 1170]]], dtype=uint16)\n", "Coordinates:\n", " * time (time) datetime64[ns] 2018-01-08T08:34:01\n", " * y (y) float64 -2.519e+06 -2.519e+06 ... -2.507e+06 -2.507e+06\n", " * x (x) float64 2.378e+06 2.378e+06 ... 2.388e+06 2.388e+06\n", " spatial_ref int32 6933\n", "Attributes:\n", " units: 1\n", " nodata: 0\n", " crs: EPSG:6933\n", " grid_mapping: spatial_ref
array([[[1270, 1280, ..., 4228, 3950],\n", " [1266, 1332, ..., 3880, 4372],\n", " ...,\n", " [1172, 1180, ..., 1154, 1190],\n", " [1242, 1204, ..., 1192, 1170]]], dtype=uint16)
array(['2018-01-08T08:34:01.000000000'], dtype='datetime64[ns]')
array([-2519270., -2519250., -2519230., ..., -2507310., -2507290., -2507270.])
array([2378390., 2378410., 2378430., ..., 2387990., 2388010., 2388030.])
array(6933, dtype=int32)
<xarray.DataArray 'green' (time: 1, y: 601, x: 101)>\n", "array([[[1270, 1280, ..., 1416, 1290],\n", " [1266, 1332, ..., 1368, 1274],\n", " ...,\n", " [1172, 1180, ..., 1086, 991],\n", " [1242, 1204, ..., 1019, 986]]], dtype=uint16)\n", "Coordinates:\n", " * time (time) datetime64[ns] 2018-01-08T08:34:01\n", " * y (y) float64 -2.519e+06 -2.519e+06 ... -2.507e+06 -2.507e+06\n", " * x (x) float64 2.378e+06 2.378e+06 2.378e+06 ... 2.38e+06 2.38e+06\n", " spatial_ref int32 6933\n", "Attributes:\n", " units: 1\n", " nodata: 0\n", " crs: EPSG:6933\n", " grid_mapping: spatial_ref
array([[[1270, 1280, ..., 1416, 1290],\n", " [1266, 1332, ..., 1368, 1274],\n", " ...,\n", " [1172, 1180, ..., 1086, 991],\n", " [1242, 1204, ..., 1019, 986]]], dtype=uint16)
array(['2018-01-08T08:34:01.000000000'], dtype='datetime64[ns]')
array([-2519270., -2519250., -2519230., ..., -2507310., -2507290., -2507270.])
array([2378390., 2378410., 2378430., 2378450., 2378470., 2378490., 2378510.,\n", " 2378530., 2378550., 2378570., 2378590., 2378610., 2378630., 2378650.,\n", " 2378670., 2378690., 2378710., 2378730., 2378750., 2378770., 2378790.,\n", " 2378810., 2378830., 2378850., 2378870., 2378890., 2378910., 2378930.,\n", " 2378950., 2378970., 2378990., 2379010., 2379030., 2379050., 2379070.,\n", " 2379090., 2379110., 2379130., 2379150., 2379170., 2379190., 2379210.,\n", " 2379230., 2379250., 2379270., 2379290., 2379310., 2379330., 2379350.,\n", " 2379370., 2379390., 2379410., 2379430., 2379450., 2379470., 2379490.,\n", " 2379510., 2379530., 2379550., 2379570., 2379590., 2379610., 2379630.,\n", " 2379650., 2379670., 2379690., 2379710., 2379730., 2379750., 2379770.,\n", " 2379790., 2379810., 2379830., 2379850., 2379870., 2379890., 2379910.,\n", " 2379930., 2379950., 2379970., 2379990., 2380010., 2380030., 2380050.,\n", " 2380070., 2380090., 2380110., 2380130., 2380150., 2380170., 2380190.,\n", " 2380210., 2380230., 2380250., 2380270., 2380290., 2380310., 2380330.,\n", " 2380350., 2380370., 2380390.])
array(6933, dtype=int32)