{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Advanced quantile calculation for the MPI-ESM1-2-LR CMIP6 ensemble with **xarray**, **pandas** and **hvplot**\n", "\n", "We will show how to calculate statistics of the MPI-ESM1-2-LR ensemble published within the Coupled Model Intercomparison Project [CMIP6](https://pcmdi.llnl.gov/CMIP6/). Quantiles of a single-model ensemble allow to estimate the simulated climate variability. This notebook will calculate 5 quantiles for a specified region on the globe and compares them with the global quantiles. Interactive plots will be used for data visualization for easy interpretation.\n", "\n", "We will choose one variable of multiple experiments and compare the results of different models. In particular, we analyse the historical experiment in combination with one of the shared socioeconomic pathway (ssp) experiments. \n", "\n", "This Jupyter notebook is meant to run in the [Jupyterhub](https://jupyterhub.dkrz.de/hub/login?next=%2Fhub%2Fhome) server of the German Climate Computing Center [DKRZ](https://www.dkrz.de/). The DKRZ hosts the CMIP data pool including 4 petabytes of CMIP6 data. Please, choose the Python 3 unstable kernel on the Kernel tab above, it contains all the common geoscience packages. See more information on how to run Jupyter notebooks at DKRZ [here](https://www.dkrz.de/up/systems/mistral/programming/jupyter-notebook).\n", "\n", "Running this Jupyter notebook in your premise, which is also known as [client-side](https://en.wikipedia.org/wiki/Client-side) computing, will require that you install the necessary packages and download data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learning Objectives\n", "\n", "- How to access a dataset from the DKRZ CMIP data pool with `intake-esm`\n", "- How to use `xarray` and `pandas` for data analysis\n", "- How to visualize the results with `hvplot`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Requirements\n", "\n", "- About 10GB memory" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import intake\n", "import pandas as pd\n", "import hvplot.pandas\n", "import hvplot.xarray\n", "import numpy as np\n", "import cartopy.crs as ccrs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parameters\n", "\n", "Along with the *variable_id*, we can specify\n", "\n", "- the number of ensemble members `ens_size` that we want to take into account of our analysis\n", "- a latitude and longitude combination for our region for comparing with global climate variability" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Choose one of\n", "# pr, psl, tas, tasmax, tasmin, clt\n", "variable_id = \"tas\"\n", "# 1-30\n", "ens_size=29\n", "# lat and lon in deg\n", "regions=dict(\n", " user_region=dict(\n", " latitudes=[45,55],\n", " longitudes=[0,10]\n", " )\n", ")\n", "%store -r" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "regions[\"global\"]=dict(\n", " latitudes=[-90,90],\n", " longitudes=[0,360]\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# get formating done automatically according to style `black`\n", "#%load_ext lab_black" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialization\n", "\n", "The `intake-esm` software reads *Catalogs* which we use to **find, access and load** the data we are interested in. 
"Daily updated CMIP6 catalogs are provided in DKRZ's cloud [swift](https://swiftbrowser.dkrz.de/public/dkrz_a44962e3ba914c309a7421573a6949a6/intake-esm/)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Path to master catalog on the DKRZ server\n", "#dkrz_catalog=intake.open_catalog([\"https://dkrz.de/s/intake\"])\n", "#\n", "# only for the web page we need to take the original link:\n", "parent_col=intake.open_catalog([\"https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml\"])\n", "list(parent_col)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "col=parent_col[\"dkrz_cmip6_disk\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are searching for the ensemble of the ESM *MPI-ESM1-2-LR* for the time span 1850 to 2100. We use the *historical* and *ssp585* experiments for the analysis. For now, we use variables from the table *Amon*, which contains output of the atmospheric (**A**mon) submodel at monthly (A**mon**) frequency." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "query = dict(\n", "    variable_id=variable_id,\n", "    table_id=\"Amon\",\n", "    experiment_id=[\"historical\", \"ssp585\"],  # we have excluded \"ssp245\" because it would take ~15 min to finish the notebook\n", "    source_id=[\"MPI-ESM1-2-LR\"]#, \"AWI-CM-1-1-MR\"],\n", ")\n", "# adjust the catalog paths to the currently mounted lustre filesystem\n", "col.df[\"uri\"]=col.df[\"uri\"].str.replace(\"lustre/\",\"lustre02/\")\n", "cat = col.search(**query)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We open our selection and hold the data as `xarray` datasets:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Opening the results as xarray dsets...\")\n", "xr_dsets=cat.to_dataset_dict(progressbar=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Post-processing\n", "\n", "The main post-processing step is done in the `spatial_weighted_yearly_mean` function. It consists of three steps:\n", "\n", "1. Select a region from the global model data\n", "1. Calculate a weighted spatial mean. The weights are only valid for a regularly shaped lon-lat grid.\n", "1. Average over years\n", "\n", "These steps are applied to the provided *dataset, variable_id, lats and lons*." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def spatial_weighted_yearly_mean(dset, variable_id, lats, lons):\n", "\n", "    var = dset.get(variable_id)\n", "    # Select the region:\n", "    var = var.sel(lat=slice(lats[0],lats[1]),\n", "                  lon=slice(lons[0],lons[1]))\n", "\n", "    # Latitude weights (cosine of latitude) for the selected region\n", "    weights = np.cos(np.deg2rad(var.lat))\n", "    # Weighted variable\n", "    varwg = var.weighted(weights)\n", "    # Spatial mean over the region:\n", "    vargm = varwg.mean((\"lon\", \"lat\"))\n", "    # Yearly mean:\n", "    vargmym = vargm.groupby(\"time.year\").mean(\"time\")\n", "    return vargmym.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Calculate reference climate for anomalies\n", "\n", "As a reference period, we define the 1971 to 2000 time interval. This period is only available in the *historical* experiment.\n", "We calculate a 30-yr mean of the *spatially weighted yearly mean* arrays of the target variable."
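, "\n", "Written out as a formula (with $N_r$ denoting the number of ensemble members in the historical dataset), the reference value is a single scalar averaged over members and reference years,\n", "\n", "$$v_\\mathrm{ref} = \\frac{1}{30\\,N_r} \\sum_{r=1}^{N_r} \\sum_{y=1971}^{2000} \\bar{v}_{r,y},$$\n", "\n", "where $\\bar{v}_{r,y}$ is the spatially weighted yearly mean of member $r$ in year $y$. The anomalies computed below are $\\bar{v}_{r,y} - v_\\mathrm{ref}$."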
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def calculate_historical_gmym(xr_dsets, variable_id, lats, lons):\n", " historical = [key for key in xr_dsets.keys() if \"historical\" in key][0]\n", " dset_hist = xr_dsets[historical]\n", " dset_hist_thirty = dset_hist.sel(time=dset_hist.time.dt.year.isin(range(1971, 2000)))\n", " # 10member\n", " hist_gmym = spatial_weighted_yearly_mean(dset_hist_thirty, variable_id, lats, lons)\n", " return hist_gmym.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Arrange anomalies results in a DataFrame\n", "\n", "For an area plot, `hvplot` works best on `DataFrame`s - the `xarray`-extension of `hvplot` converts the data array into a data frame for an area plot anyway but takes way longer. So it is decided to create a table with\n", "\n", "- *years* from 1850 to 2100 as index\n", "- *ensembles* as columns\n", "\n", "This table is filled with the anomaly of the *spatial weighted yearly means* of our target variable in comparison with the 30yr historical reference period 1971-2000." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def calculate_vargmym(xr_dsets, hist_gmymref, variable_id, lats, lons):\n", " years = [i for i in range(1850,2101)]\n", " vargmympd = pd.DataFrame(index=years, columns=[\"r\"+str(i) for i in range(ens_size)], dtype=float)\n", " vargmympd.index.name = \"Year\"\n", " for key in xr_dsets:\n", " datatoappend = spatial_weighted_yearly_mean(xr_dsets[key], variable_id, lats,lons) - hist_gmymref\n", " insert_years = list(xr_dsets[key].get(variable_id).groupby(\"time.year\").groups.keys())\n", " for i in range(ens_size):\n", " vargmympd.loc[insert_years, \"r\"+str(i)] = datatoappend[i]\n", " return vargmympd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting\n", "\n", "#### Areas between quantiles and line for median\n", "\n", "The final plot should contain **two areas** and **one line**.\n", "\n", "- One area spans the interval between the first and 99th quantile i.e. it contains 98% of all values of the ensemble\n", "- One area spans the interval between the 25th and 75th quantile i.e. it contains 50% of all values of the ensemble\n", "- The line corresponds to the median i.e. it shows the 50th quantile\n", "\n", "With `hvplot`, we can generate three individual plots and combine them with `*`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_quantiles_area(quantiles,ylabel):\n", " ninetyeight=quantiles.hvplot.area(grid=True,\n", " y=\"0.01\",\n", " y2=\"0.99\",\n", " width=820,\n", " color=\"aliceblue\",\n", " label=\"98% of all values\")\n", " fifty=quantiles.hvplot.area(grid=True,\n", " y=\"0.25\",\n", " y2=\"0.75\",\n", " width=820,\n", " color=\"aqua\",\n", " label=\"50% of all values\")\n", " median=quantiles.hvplot.line(grid=True,\n", " y=\"0.5\",\n", " color=\"cornflowerblue\",\n", " label=\"median\")\n", " comb=(ninetyeight*fifty*median).opts(ylabel=ylabel, legend_position='bottom_right')\n", " return comb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Axis label for plotting\n", "\n", "We can get metadata for the chosen variable from the files so that the script can generate an axis label automatically. The metadata is stored in a `dict`-like `.attrs` variable of the dataset. 
"The function `get_label` returns the fitting label:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_label(xr_dset, variable_id):\n", "    lname = xr_dset.get(variable_id).attrs[\"long_name\"]\n", "    units = xr_dset.get(variable_id).attrs[\"units\"]\n", "    return \"Delta \" + lname + \" [\" + units + \"]\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Running the workflow\n", "\n", "For both regions, the user-specified one and the global one, we run:\n", "1. Calculation of a 30-yr mean historical reference for the anomaly calculation\n", "1. Calculation of the spatially weighted anomalies for each year and realization, averaged over the region\n", "1. Calculation of the quantiles\n", "1. Creation of the plots" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plots=[]\n", "quantiles=[]\n", "for k, region in regions.items():\n", "    lats=region[\"latitudes\"]\n", "    lons=region[\"longitudes\"]\n", "    ylabel=get_label(xr_dsets[list(xr_dsets.keys())[0]], variable_id)\n", "    label=\"lats: \"+str(lats[0])+\"-\"+str(lats[1])\n", "    print(\"1. Calculate historical reference...\")\n", "    hist_gmymref = calculate_historical_gmym(xr_dsets, variable_id, lats, lons)\n", "    print(\"2. Calculate spatial and yearly mean of variable \"+variable_id+\" ...\")\n", "    var_gmym = calculate_vargmym(xr_dsets, hist_gmymref, variable_id, lats, lons)\n", "    print(\"3. Calculate quantiles ...\")\n", "    quantile=var_gmym.transpose().quantile([0.01, 0.25, 0.5, 0.75, 0.99]).transpose()\n", "    quantiles.append(quantile)\n", "    print(\"4. Creating plots and diagnostic ...\")\n", "    plots.append(plot_quantiles_area(quantile, ylabel))\n", "    plots.append((quantile[0.99]-quantile[0.01]).hvplot.line(\n", "        color=\"red\",\n", "        grid=True,\n", "        ylabel=ylabel,\n", "        label=\"0.99-0.01 Quantile, \"+label\n", "    ))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Displaying the result\n", "\n", "We combine all plots into one output with a single column. The first two plots show the results for the region specified by the user, the last two show the results for the global mean." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(plots[0]+plots[1]+plots[2]+plots[3]).cols(1)" ] }, { "cell_type": "markdown", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "The variability of the *regional* average of the yearly mean temperature is about 2-3 °C for mid-latitudes close to Europe, while there is less variability in the global mean.\n",
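 "\n", "Statements like the ones listed below can be read off the anomaly table directly. A minimal sketch (assuming `var_gmym` holds the anomaly table of the region of interest):\n", "\n", "```python\n", "# number of members per year that are still colder than the 1971-2000 reference\n", "below_ref = (var_gmym < 0).sum(axis=1)\n", "# first year in which at most one member is colder than the reference\n", "print(below_ref[below_ref <= 1].index.min())\n", "```\n",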
"28 out of 30 members of the MPI-ESM1-2-LR indicate that, in a *ssp585 world*,\n", "\n", "- Europeans will not experience another colder-than-usual year with a yearly mean temperature smaller than the historical mean **from about 2030 onwards**.\n", "- European climate shifts to a new state, which no longer produces yearly mean temperatures in the range we are used to, **from about 2068 onwards**.\n", "- Europeans will not experience another year with a yearly mean temperature less than 2°C above the historical mean **from 2075 onwards**.\n", "\n", "The variability of the *global* average of the yearly mean temperature stays within a narrow band of about 0.5 °C width:\n", "\n", "- there is no clear change of variability during the climate state transition in the *ssp585* scenario\n", "- a 2°C warmer world is reached in **year 2066**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add-on\n", "\n", "Calculating quantiles with `xarray` is also easy, and the results can be visualized directly with `hvplot`.\n", "\n", "In the following, we show how to do that with the historical dataset. We calculate the 98% interval between the 0.99 and 0.01 quantiles and display it as a *quadmesh* plot." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# pick the historical dataset explicitly\n", "historical_key=[key for key in xr_dsets.keys() if \"historical\" in key][0]\n", "historical_ds=xr_dsets[historical_key]\n", "historical_ds=historical_ds.chunk(dict(member_id=-1)).groupby(\"time.year\").mean(\"time\") #.isel(time=slice(0,120)).mean(dim=\"time\")\n", "xr_quantiles=historical_ds.quantile(q=[0.01,0.99],dim=\"member_id\")\n", "xr_quantiles" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "qs=(xr_quantiles.sel(quantile=0.99)-xr_quantiles.sel(quantile=0.01)).squeeze()\n", "qs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#qs.tas.hvplot.contourf(levels=20, geo=True, coastline=True, width=600)\n", "qs.tas.hvplot.quadmesh(width=600)\n", "#qs.hvplot.quadmesh(\n", "#    'lon', 'lat', 'tas', projection=ccrs.Orthographic(-90, 30),\n", "#    global_extent=True, frame_height=540, cmap='viridis'#, coastline=True\n", "#)\n", "#hvplot.save(testplot, \"testplot.html\")" ] }, { "cell_type": "markdown", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "The variability of the yearly mean temperature depends on the location. The yearly mean temperature can be simulated more precisely, i.e. it varies less,\n", "\n", "- above oceans than above land\n", "- in tropical regions than in polar regions" ] }, { "cell_type": "markdown", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "In a next step, these results should be validated. How well is the climate variability simulated?\n", "\n", "- compare it with reanalysis statistics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Used data\n", "\n", "- https://doi.org/10.22033/ESGF/CMIP6.6595\n", "- https://doi.org/10.22033/ESGF/CMIP6.6705" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We acknowledge the CMIP community for providing the climate model data, retained and globally distributed in the framework of the ESGF.\n",
"The CMIP data of this study were replicated and made available for this study by the DKRZ." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "python3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" } }, "nbformat": 4, "nbformat_minor": 4 }