{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Advanced *Summer Days* calculation with `xarray` using CMIP6 data \n",
    "\n",
    "We will show here how to count the annual summer days for a particular geolocation of your choice using the results of a climate model, in particular, we can chose one of the historical or one of the shared socioeconomic pathway (ssp) experiments of the Coupled Model Intercomparison Project [CMIP6](https://pcmdi.llnl.gov/CMIP6/).\n",
    "\n",
    "Thanks to the data and computer scientists Marco Kulüke, Fabian Wachsmann, Regina Kwee-Hinzmann, Caroline Arnold, Felix Stiehler, Maria Moreno, and Stephan Kindermann at DKRZ for their contribution to this notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this use case you will learn the following:\n",
    "- How to access a dataset from the DKRZ CMIP6 model data archive\n",
    "- How to count the annual number of summer days for a particular geolocation using this model dataset\n",
    "- How to visualize the results\n",
    "\n",
    "\n",
    "You will use:\n",
    "- [Intake](https://github.com/intake/intake) for finding the data in the catalog of the DKRZ archive\n",
    "- [Xarray](http://xarray.pydata.org/en/stable/) for loading and processing the data\n",
    "- [hvPlot](https://hvplot.holoviz.org/index.html) for visualizing the data in the Jupyter notebook and save the plots in your local computer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0. Load Packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np                    # fundamental package for scientific computing\n",
    "import pandas as pd                   # data analysis and manipulation tool\n",
    "import xarray as xr                   # handling labelled multi-dimensional arrays\n",
    "import intake                         # to find data in a catalog, this notebook explains how it works\n",
    "from ipywidgets import widgets        # to use widgets in the Jupyer Notebook\n",
    "from geopy.geocoders import Nominatim # Python client for several popular geocoding web services\n",
    "import folium                         # visualization tool for maps\n",
    "import hvplot.pandas                  # visualization tool for interactive plots"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Which dataset do we need? -> Choose Shared Socioeconomic Pathway, Place, and Year\n",
    "\n",
    "<a id='selection'></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Produce the widget where we can select what experiment we are interested on \n",
    "\n",
    "#experiments = {'historical':range(1850, 2015), 'ssp126':range(2015, 2101), \n",
    "#               'ssp245':range(2015, 2101), 'ssp370':range(2015, 2101), 'ssp585':range(2015, 2101)}\n",
    "#experiment_box = widgets.Dropdown(options=experiments, description=\"Select  experiment: \", disabled=False,)\n",
    "#display(experiment_box)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Produce the widget where we can select what geolocation and year are interested on \n",
    "\n",
    "#print(\"Feel free to change the default values.\")\n",
    "\n",
    "#place_box = widgets.Text(description=\"Enter place:\", value=\"Hamburg\")\n",
    "#display(place_box)\n",
    "\n",
    "#x = experiment_box.value\n",
    "#year_box = widgets.Dropdown(options=x, description=\"Select year: \", disabled=False,)\n",
    "#display(year_box)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pb=\"Hamburg\"\n",
    "yb=2021\n",
    "eb=\"ssp370\"\n",
    "%store -r"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.1 Find Coordinates of chosen Place\n",
    "If ambiguous, the most likely coordinates will be chosen, e.g. \"Hamburg\" results in \"Hamburg, 20095, Deutschland\", (53.55 North, 10.00 East)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We use the module Nominatim gives us the geographical coordinates of the place we selected above\n",
    "\n",
    "geolocator = Nominatim(user_agent=\"any_agent\")\n",
    "location = geolocator.geocode(pb)\n",
    "\n",
    "print(location.address)\n",
    "print((location.latitude, location.longitude))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 Show Place on a Map"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We use the folium package to plot our selected geolocation in a map\n",
    "\n",
    "m = folium.Map(location=[location.latitude, location.longitude])\n",
    "tooltip = location.latitude, location.longitude\n",
    "folium.Marker([location.latitude, location.longitude], tooltip=tooltip).add_to(m)\n",
    "display(m)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have defined the place and time. Now, we can search for the climate model dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Intake Catalog\n",
    "\n",
    "### 2.1 Load the Intake Catalog"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Path to master catalog on the DKRZ server\n",
    "#dkrz_catalog=intake.open_catalog([\"https://dkrz.de/s/intake\"])\n",
    "#\n",
    "#only for the web page we need to take the original link:\n",
    "parent_col=intake.open_catalog([\"https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml\"])\n",
    "list(parent_col)\n",
    "\n",
    "# Open the catalog with the intake package and name it \"col\" as short for \"collection\"\n",
    "col=parent_col[\"dkrz_cmip6_disk\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 Browse the Intake Catalog\n",
    "In this example we chose the Max-Planck Earth System Model in High Resolution Mode (\"MPI-ESM1-2-HR\") and the maximum temperature near surface (\"tasmax\") as variable. We also choose an experiment. CMIP6 comprises several kind of experiments. Each experiment has various simulation members. you can find more information in the [CMIP6 Model and Experiment Documentation](https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html#5-model-and-experiment-documentation)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Store the name of the model we chose in a variable named \"climate_model\"\n",
    "\n",
    "climate_model = \"MPI-ESM1-2-HR\" # here we choose Max-Plack Institute's Earth Sytem Model in high resolution\n",
    "\n",
    "# This is how we tell intake what data we want\n",
    "\n",
    "query = dict(\n",
    "    source_id      = climate_model, # the model \n",
    "    variable_id    = \"tasmax\", # temperature at surface, maximum\n",
    "    table_id       = \"day\", # daily maximum\n",
    "    experiment_id  = eb, # what we selected in the drop down menu,e.g. SSP2.4-5 2015-2100\n",
    "    member_id      = \"r10i1p1f1\", # \"r\" realization, \"i\" initialization, \"p\" physics, \"f\" forcing\n",
    ")\n",
    "\n",
    "# Intake looks for the query we just defined in the catalog of the CMIP6 data pool at DKRZ\n",
    "col.df[\"uri\"]=col.df[\"uri\"].str.replace(\"lustre/\",\"lustre02/\") \n",
    "cat = col.search(**query)\n",
    "\n",
    "del col\n",
    "\n",
    "# Show query results\n",
    "cat.df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3 Create Dictionary and get Data into one Object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# cdf_kwargs are given to xarray.open_dataset\n",
    "# cftime is like datetime but extends to all four digit years and many calendar types\n",
    "\n",
    "dset_dict = cat.to_dataset_dict(cdf_kwargs={\"chunks\": {\"time\": -1}, \"use_cftime\": True})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get all data into one object\n",
    "\n",
    "for key, value in dset_dict.items():\n",
    "    model = key.split(\".\")[2]  # extract model name from key\n",
    "    tasmax_xr = value[\"tasmax\"].squeeze()  # extract variable from dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tasmax_xr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Select Year and Look at (Meta) Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tasmax_year_xr = tasmax_xr.sel(time=str(yb))\n",
    "\n",
    "# Let's have a look at the xarray data array\n",
    "\n",
    "tasmax_year_xr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see not only the numbers, but also information about it, such as long name, units, and the data history. This information is called metadata."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Compare Model Grid Cell with chosen Location"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Find nearest model coordinate by finding the index of the nearest grid point\n",
    "\n",
    "abslat = np.abs(tasmax_year_xr[\"lat\"] - location.latitude)\n",
    "abslon = np.abs(tasmax_year_xr[\"lon\"] - location.longitude)\n",
    "c = np.maximum(abslon, abslat)\n",
    "\n",
    "([xloc], [yloc]) = np.where(c == np.min(c)) # xloc and yloc are the indices of the neares model grid point"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Draw map again\n",
    "\n",
    "m = folium.Map(location=[location.latitude, location.longitude], zoom_start=8)\n",
    "\n",
    "\n",
    "tooltip = location.latitude, location.longitude\n",
    "folium.Marker(\n",
    "    [location.latitude, location.longitude],\n",
    "    tooltip=tooltip,\n",
    "    popup=\"Location selected by You\",\n",
    ").add_to(m)\n",
    "\n",
    "#\n",
    "tooltip = float(tasmax_year_xr[\"lat\"][yloc].values), float(tasmax_year_xr[\"lon\"][xloc].values)\n",
    "folium.Marker(\n",
    "    [tasmax_year_xr[\"lat\"][yloc], tasmax_year_xr[\"lon\"][xloc]],\n",
    "    tooltip=tooltip,\n",
    "    popup=\"Model Grid Cell Center\",\n",
    ").add_to(m)\n",
    "\n",
    "# Define coordinates of model grid cell (just for visualization)\n",
    "rect_lat1_model = (tasmax_year_xr[\"lat\"][yloc - 1] + tasmax_year_xr[\"lat\"][yloc]) / 2\n",
    "rect_lon1_model = (tasmax_year_xr[\"lon\"][xloc - 1] + tasmax_year_xr[\"lon\"][xloc]) / 2\n",
    "rect_lat2_model = (tasmax_year_xr[\"lat\"][yloc + 1] + tasmax_year_xr[\"lat\"][yloc]) / 2\n",
    "rect_lon2_model = (tasmax_year_xr[\"lon\"][xloc + 1] + tasmax_year_xr[\"lon\"][xloc]) / 2\n",
    "\n",
    "# Draw model grid cell\n",
    "folium.Rectangle(\n",
    "    bounds=[[rect_lat1_model, rect_lon1_model], [rect_lat2_model, rect_lon2_model]],\n",
    "    color=\"#ff7800\",\n",
    "    fill=True,\n",
    "    fill_color=\"#ffff00\",\n",
    "    fill_opacity=0.2,\n",
    ").add_to(m)\n",
    "\n",
    "m"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Climate models have a finite resolution. Hence, models do not provide the data of a particular point, but the mean over a model grid cell. Take this in mind when comparing model data with observed data (e.g. weather stations).\n",
    "\n",
    "\n",
    "Now, we will visualize the daily maximum temperature time series of the model grid cell."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Draw Temperature Time Series and Count Summer days"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The definition of a summer day varies from region to region. According to the [German Weather Service](https://www.dwd.de/EN/ourservices/germanclimateatlas/explanations/elements/_functions/faqkarussel/sommertage.html), \"a summer day is a day on which the maximum air temperature is at least 25.0 °C\". Depending on the place you selected, you might want to apply a different threshold to calculate the summer days index. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tasmax_year_place_xr = tasmax_year_xr[:, yloc, xloc] - 273.15 # Convert Kelvin to °C\n",
    "tasmax_year_place_df = pd.DataFrame(index = tasmax_year_place_xr['time'].values, \n",
    "                                    columns = ['Temperature', 'Summer Day Threshold']) # create the dataframe\n",
    "\n",
    "tasmax_year_place_df.loc[:, 'Model Temperature'] = tasmax_year_place_xr.values # insert model data into the dataframe\n",
    "tasmax_year_place_df.loc[:, 'Summer Day Threshold'] = 25 # insert the threshold into the dataframe\n",
    "\n",
    "# Plot data and define title and legend\n",
    "tasmax_year_place_df.hvplot.line(y=['Model Temperature', 'Summer Day Threshold'], \n",
    "                                 value_label='Temperature in °C', legend='bottom', \n",
    "                                 title='Daily maximum Temperature near Surface for '+pb, \n",
    "                                 height=500, width=620)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we can see, the maximum daily temperature is highly variable over the year. As we are using the mean temperature in a model grid cell, the amount of summer days might we different that what you would expect at a single location."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Summer days index calculation\n",
    "no_summer_days_model = tasmax_year_place_xr[tasmax_year_place_xr.load() > 25].size # count the number of summer days\n",
    "\n",
    "# Print results in a sentence\n",
    "print(\"According to the German Weather Service definition, in the \" +eb +\" experiment the \" \n",
    "      +climate_model +\" model shows \" +str(no_summer_days_model) +\" summer days for \" + pb \n",
    "      + \" in \" + str(yb) +\".\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Try another location and year](#selection)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "python3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}