Advanced Summer Days calculation with xarray
using CMIP6 data#
We will show here how to count the annual summer days for a particular geolocation of your choice using the results of a climate model, in particular, we can chose one of the historical or one of the shared socioeconomic pathway (ssp) experiments of the Coupled Model Intercomparison Project CMIP6.
Thanks to the data and computer scientists Marco Kulüke, Fabian Wachsmann, Regina Kwee-Hinzmann, Caroline Arnold, Felix Stiehler, Maria Moreno, and Stephan Kindermann at DKRZ for their contribution to this notebook.
In this use case you will learn the following:
How to access a dataset from the DKRZ CMIP6 model data archive
How to count the annual number of summer days for a particular geolocation using this model dataset
How to visualize the results
You will use:
Intake for finding the data in the catalog of the DKRZ archive
Xarray for loading and processing the data
hvPlot for visualizing the data in the Jupyter notebook and save the plots in your local computer
0. Load Packages#
import numpy as np # fundamental package for scientific computing
import pandas as pd # data analysis and manipulation tool
import xarray as xr # handling labelled multi-dimensional arrays
import intake # to find data in a catalog, this notebook explains how it works
from ipywidgets import widgets # to use widgets in the Jupyer Notebook
from geopy.geocoders import Nominatim # Python client for several popular geocoding web services
import folium # visualization tool for maps
import hvplot.pandas # visualization tool for interactive plots
2. Intake Catalog#
2.1 Load the Intake Catalog#
# Path to master catalog on the DKRZ server
#dkrz_catalog=intake.open_catalog(["https://dkrz.de/s/intake"])
#
#only for the web page we need to take the original link:
parent_col=intake.open_catalog(["https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml"])
list(parent_col)
# Open the catalog with the intake package and name it "col" as short for "collection"
col=parent_col["dkrz_cmip6_disk"]
2.2 Browse the Intake Catalog#
In this example we chose the Max-Planck Earth System Model in High Resolution Mode (“MPI-ESM1-2-HR”) and the maximum temperature near surface (“tasmax”) as variable. We also choose an experiment. CMIP6 comprises several kind of experiments. Each experiment has various simulation members. you can find more information in the CMIP6 Model and Experiment Documentation.
# Store the name of the model we chose in a variable named "climate_model"
climate_model = "MPI-ESM1-2-HR" # here we choose Max-Plack Institute's Earth Sytem Model in high resolution
# This is how we tell intake what data we want
query = dict(
source_id = climate_model, # the model
variable_id = "tasmax", # temperature at surface, maximum
table_id = "day", # daily maximum
experiment_id = eb, # what we selected in the drop down menu,e.g. SSP2.4-5 2015-2100
member_id = "r10i1p1f1", # "r" realization, "i" initialization, "p" physics, "f" forcing
)
# Intake looks for the query we just defined in the catalog of the CMIP6 data pool at DKRZ
col.df["uri"]=col.df["uri"].str.replace("lustre/","lustre02/")
cat = col.search(**query)
del col
# Show query results
cat.df
activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | dcpp_init_year | version | time_range | project | format | uri | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20150101-20191231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
1 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20200101-20241231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
2 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20250101-20291231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
3 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20300101-20341231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
4 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20350101-20391231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
5 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20400101-20441231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
6 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20450101-20491231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
7 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20500101-20541231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
8 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20550101-20591231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
9 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20600101-20641231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
10 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20650101-20691231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
11 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20700101-20741231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
12 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20750101-20791231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
13 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20800101-20841231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
14 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20850101-20891231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
15 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20900101-20941231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
16 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 20950101-20991231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
17 | ScenarioMIP | DKRZ | MPI-ESM1-2-HR | ssp370 | r10i1p1f1 | day | tasmax | gn | NaN | v20190710 | 21000101-21001231 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ... |
2.3 Create Dictionary and get Data into one Object#
# cdf_kwargs are given to xarray.open_dataset
# cftime is like datetime but extends to all four digit years and many calendar types
dset_dict = cat.to_dataset_dict(cdf_kwargs={"chunks": {"time": -1}, "use_cftime": True})
--> The keys in the returned dictionary of datasets are constructed as follows:
'activity_id.source_id.experiment_id.table_id.grid_label'
/tmp/ipykernel_883/1061271266.py:4: DeprecationWarning: cdf_kwargs and zarr_kwargs are deprecated and will be removed in a future version. Please use xarray_open_kwargs instead.
dset_dict = cat.to_dataset_dict(cdf_kwargs={"chunks": {"time": -1}, "use_cftime": True})
# get all data into one object
for key, value in dset_dict.items():
model = key.split(".")[2] # extract model name from key
tasmax_xr = value["tasmax"].squeeze() # extract variable from dataset
tasmax_xr
<xarray.DataArray 'tasmax' (time: 31411, lat: 192, lon: 384)> dask.array<getitem, shape=(31411, 192, 384), dtype=float32, chunksize=(1827, 192, 384), chunktype=numpy.ndarray> Coordinates: * time (time) object 2015-01-01 12:00:00 ... 2100-12-31 12:00:00 * lat (lat) float64 -89.28 -88.36 -87.42 ... 87.42 88.36 89.28 * lon (lon) float64 0.0 0.9375 1.875 2.812 ... 357.2 358.1 359.1 height float64 2.0 member_id <U9 'r10i1p1f1' dcpp_init_year float64 nan Attributes: standard_name: air_temperature long_name: Daily Maximum Near-Surface Air Temperature comment: maximum near-surface (usually, 2 meter) air temperature (... units: K cell_methods: area: mean time: maximum cell_measures: area: areacella
3. Select Year and Look at (Meta) Data#
tasmax_year_xr = tasmax_xr.sel(time=str(yb))
# Let's have a look at the xarray data array
tasmax_year_xr
<xarray.DataArray 'tasmax' (time: 365, lat: 192, lon: 384)> dask.array<getitem, shape=(365, 192, 384), dtype=float32, chunksize=(365, 192, 384), chunktype=numpy.ndarray> Coordinates: * time (time) object 2021-01-01 12:00:00 ... 2021-12-31 12:00:00 * lat (lat) float64 -89.28 -88.36 -87.42 ... 87.42 88.36 89.28 * lon (lon) float64 0.0 0.9375 1.875 2.812 ... 357.2 358.1 359.1 height float64 2.0 member_id <U9 'r10i1p1f1' dcpp_init_year float64 nan Attributes: standard_name: air_temperature long_name: Daily Maximum Near-Surface Air Temperature comment: maximum near-surface (usually, 2 meter) air temperature (... units: K cell_methods: area: mean time: maximum cell_measures: area: areacella
We see not only the numbers, but also information about it, such as long name, units, and the data history. This information is called metadata.
4. Compare Model Grid Cell with chosen Location#
# Find nearest model coordinate by finding the index of the nearest grid point
abslat = np.abs(tasmax_year_xr["lat"] - location.latitude)
abslon = np.abs(tasmax_year_xr["lon"] - location.longitude)
c = np.maximum(abslon, abslat)
([xloc], [yloc]) = np.where(c == np.min(c)) # xloc and yloc are the indices of the neares model grid point
# Draw map again
m = folium.Map(location=[location.latitude, location.longitude], zoom_start=8)
tooltip = location.latitude, location.longitude
folium.Marker(
[location.latitude, location.longitude],
tooltip=tooltip,
popup="Location selected by You",
).add_to(m)
#
tooltip = float(tasmax_year_xr["lat"][yloc].values), float(tasmax_year_xr["lon"][xloc].values)
folium.Marker(
[tasmax_year_xr["lat"][yloc], tasmax_year_xr["lon"][xloc]],
tooltip=tooltip,
popup="Model Grid Cell Center",
).add_to(m)
# Define coordinates of model grid cell (just for visualization)
rect_lat1_model = (tasmax_year_xr["lat"][yloc - 1] + tasmax_year_xr["lat"][yloc]) / 2
rect_lon1_model = (tasmax_year_xr["lon"][xloc - 1] + tasmax_year_xr["lon"][xloc]) / 2
rect_lat2_model = (tasmax_year_xr["lat"][yloc + 1] + tasmax_year_xr["lat"][yloc]) / 2
rect_lon2_model = (tasmax_year_xr["lon"][xloc + 1] + tasmax_year_xr["lon"][xloc]) / 2
# Draw model grid cell
folium.Rectangle(
bounds=[[rect_lat1_model, rect_lon1_model], [rect_lat2_model, rect_lon2_model]],
color="#ff7800",
fill=True,
fill_color="#ffff00",
fill_opacity=0.2,
).add_to(m)
m
Climate models have a finite resolution. Hence, models do not provide the data of a particular point, but the mean over a model grid cell. Take this in mind when comparing model data with observed data (e.g. weather stations).
Now, we will visualize the daily maximum temperature time series of the model grid cell.
5. Draw Temperature Time Series and Count Summer days#
The definition of a summer day varies from region to region. According to the German Weather Service, “a summer day is a day on which the maximum air temperature is at least 25.0 °C”. Depending on the place you selected, you might want to apply a different threshold to calculate the summer days index.
tasmax_year_place_xr = tasmax_year_xr[:, yloc, xloc] - 273.15 # Convert Kelvin to °C
tasmax_year_place_df = pd.DataFrame(index = tasmax_year_place_xr['time'].values,
columns = ['Temperature', 'Summer Day Threshold']) # create the dataframe
tasmax_year_place_df.loc[:, 'Model Temperature'] = tasmax_year_place_xr.values # insert model data into the dataframe
tasmax_year_place_df.loc[:, 'Summer Day Threshold'] = 25 # insert the threshold into the dataframe
# Plot data and define title and legend
tasmax_year_place_df.hvplot.line(y=['Model Temperature', 'Summer Day Threshold'],
value_label='Temperature in °C', legend='bottom',
title='Daily maximum Temperature near Surface for '+pb,
height=500, width=620)
As we can see, the maximum daily temperature is highly variable over the year. As we are using the mean temperature in a model grid cell, the amount of summer days might we different that what you would expect at a single location.
# Summer days index calculation
no_summer_days_model = tasmax_year_place_xr[tasmax_year_place_xr.load() > 25].size # count the number of summer days
# Print results in a sentence
print("According to the German Weather Service definition, in the " +eb +" experiment the "
+climate_model +" model shows " +str(no_summer_days_model) +" summer days for " + pb
+ " in " + str(yb) +".")
According to the German Weather Service definition, in the ssp370 experiment the MPI-ESM1-2-HR model shows 2 summer days for Hamburg in 2021.
Try another location and year