Advanced: Interactive time series plot of a global yearly mean anomaly with xarray, pandas and hvplot using CMIP6 data#
We will show how to combine, analyse and quickly plot data of the Coupled Model Intercomparison Project CMIP6. We will choose one variable of multiple experiments and compare the results of different models. In particular, we analyse the historical experiment in combination with one of the shared socioeconomic pathway (ssp) experiments.
This Jupyter notebook is meant to run in the Jupyterhub server of the German Climate Computing Center DKRZ. The DKRZ hosts the CMIP data pool including 4 petabytes of CMIP6 data. Please choose the Python 3 unstable kernel on the Kernel tab above; it contains all the common geoscience packages. See more information on how to run Jupyter notebooks at DKRZ here.
Running this Jupyter notebook on your own machine, which is also known as client-side computing, requires that you install the necessary packages and download the data.
Learning Objectives#
- How to access a dataset from the DKRZ CMIP data pool with intake-esm
- How to calculate global field means and yearly means with xarray and numpy
- How to visualize the results with hvplot
import intake
import pandas as pd
import hvplot.pandas
import numpy as np
First, we need to set the variable_id which we would like to plot. This is a selection of the most often analysed variables:
- tas is Near-Surface Air Temperature
- pr is Precipitation
- psl is Sea Level Pressure
- tasmax is Near-Surface Maximum Air Temperature
- tasmin is Near-Surface Minimum Air Temperature
- clt is Total Cloud Cover Percentage
Choose the variable:
# Choose one of
# pr, psl, tas, tasmax, tasmin, clt
variable_id = "tas"
%store -r
# get formating done automatically according to style `black`
#%load_ext lab_black
The intake-esm software reads catalogs which we use to find, access and load the data we are interested in. Daily updated CMIP6 catalogs are provided in DKRZ’s Swift cloud storage.
Similar to the shopping catalog of your favorite online bookstore, the intake catalog contains information (e.g. model, variables, and time range) about each dataset that you can access before loading the data. Thanks to the catalog, you can find out where the “book” is just by using some keywords, and you do not need to hold it in your hand to know the number of pages.
We specify the catalog descriptor for the intake package. The catalog descriptor is created by the DKRZ developers who manage the catalog; you do not need to care much about it. Knowing where it is and loading it is enough:
# Path to master catalog on the DKRZ server
#dkrz_catalog=intake.open_catalog(["https://dkrz.de/s/intake"])
#
#only for the web page we need to take the original link:
parent_col=intake.open_catalog(["https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml"])
list(parent_col)
['dkrz_cmip5_archive',
'dkrz_cmip5_disk',
'dkrz_cmip6_cloud',
'dkrz_cmip6_disk',
'dkrz_cordex_disk',
'dkrz_dyamond-winter_disk',
'dkrz_era5_disk',
'dkrz_monsoon_disk',
'dkrz_mpige_disk',
'dkrz_nextgems_disk',
'dkrz_palmod2_disk']
col=parent_col["dkrz_cmip6_disk"]
Browsing through the catalog#
We define a query and specify key-value pairs for search facets in order to search the catalogs. Possible search facets are all columns of the table, identified by their names.
In this example, we compare the MPI-ESM1-2-HR model of the Max Planck Institute and the AWI-CM-1-1-MR from the Alfred Wegener Institute for 3 different experiments. CMIP6 comprises many experiments with lots of simulation members and we will use some of them. You can find more information in the CMIP6 Model and Experiment Documentation.
We will concatenate the historical experiment with two different Shared Socioeconomic Pathway (SSP) scenarios. SSPs are scenarios of projected socioeconomic global changes.
- historical: uses best estimates for anthropogenic and natural forcing to simulate the historical period 1850-2014.
- ssp245: the 45 corresponds to the growth in radiative forcing reached by 2100, in this case 4.5 W/m2 or ~650 ppm CO2 equivalent.
- ssp585: the 85 corresponds to the growth in radiative forcing reached by 2100, in this case 8.5 W/m2.
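Before running the query against the real catalog, it can help to check which values a search facet actually takes. The sketch below is a hypothetical stand-in: it mimics the catalog’s dataframe col.df with a tiny pandas table (the real one has the same columns but far more rows) and shows that a facet search is just row filtering:

```python
import pandas as pd

# Hypothetical miniature of the catalog dataframe `col.df`;
# the real catalog has the same columns but millions of rows.
df = pd.DataFrame(
    {
        "source_id": ["MPI-ESM1-2-HR", "AWI-CM-1-1-MR", "MPI-ESM1-2-HR"],
        "experiment_id": ["historical", "ssp585", "ssp245"],
        "variable_id": ["tas", "tas", "pr"],
    }
)

# A facet search is equivalent to filtering rows on the facet columns
subset = df[df["experiment_id"].isin(["historical", "ssp585"])]
print(sorted(subset["source_id"].unique()))
```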
query = dict(
variable_id=variable_id,
table_id="Amon",
experiment_id=["historical", "ssp585"], # we have excluded "ssp245" from the list because it would take 15min to finish the nb
source_id=["MPI-ESM1-2-HR", "AWI-CM-1-1-MR"],
)
# Adapt the file paths to the current Lustre file system mount
col.df["uri"] = col.df["uri"].str.replace("lustre/", "lustre02/")
cat = col.search(**query)
Let’s have a look into the new catalog subset cat. We use the underlying pandas dataframe object df to display the catalog as a table. Each row refers to one file.
cat.df
 | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | dcpp_init_year | version | time_range | project | format | uri |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | CMIP | AWI | AWI-CM-1-1-MR | historical | r1i1p1f1 | Amon | tas | gn | NaN | v20200720 | 185001-185012 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-... |
1 | CMIP | AWI | AWI-CM-1-1-MR | historical | r1i1p1f1 | Amon | tas | gn | NaN | v20200720 | 185101-185112 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-... |
2 | CMIP | AWI | AWI-CM-1-1-MR | historical | r1i1p1f1 | Amon | tas | gn | NaN | v20200720 | 185201-185212 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-... |
3 | CMIP | AWI | AWI-CM-1-1-MR | historical | r1i1p1f1 | Amon | tas | gn | NaN | v20200720 | 185301-185312 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-... |
4 | CMIP | AWI | AWI-CM-1-1-MR | historical | r1i1p1f1 | Amon | tas | gn | NaN | v20200720 | 185401-185412 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1272 | ScenarioMIP | DWD | MPI-ESM1-2-HR | ssp585 | r2i1p1f1 | Amon | tas | gn | NaN | v20190710 | 208001-208412 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/... |
1273 | ScenarioMIP | DWD | MPI-ESM1-2-HR | ssp585 | r2i1p1f1 | Amon | tas | gn | NaN | v20190710 | 208501-208912 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/... |
1274 | ScenarioMIP | DWD | MPI-ESM1-2-HR | ssp585 | r2i1p1f1 | Amon | tas | gn | NaN | v20190710 | 209001-209412 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/... |
1275 | ScenarioMIP | DWD | MPI-ESM1-2-HR | ssp585 | r2i1p1f1 | Amon | tas | gn | NaN | v20190710 | 209501-209912 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/... |
1276 | ScenarioMIP | DWD | MPI-ESM1-2-HR | ssp585 | r2i1p1f1 | Amon | tas | gn | NaN | v20190710 | 210001-210012 | CMIP6 | netcdf | /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/... |
1277 rows × 14 columns
Loading the data#
We can load the data into memory with only one line of code. The catalog’s to_dataset_dict command aggregates and combines the data from the files into xarray datasets using the specifications from the intake descriptor file. The result is a dict-type object: the keys represent the highest granularity that cannot be combined or aggregated any further, and the values are the datasets.
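How such a dictionary key is assembled can be sketched in plain Python. This is a simplified illustration, assuming that intake-esm joins the columns activity_id, source_id, experiment_id, table_id and grid_label with dots (the row values are taken from the catalog table above):

```python
# Simplified sketch of intake-esm's key construction: one catalog row
# joined over the groupby columns.
row = {
    "activity_id": "ScenarioMIP",
    "source_id": "MPI-ESM1-2-HR",
    "experiment_id": "ssp585",
    "table_id": "Amon",
    "grid_label": "gn",
}
groupby_attrs = ["activity_id", "source_id", "experiment_id", "table_id", "grid_label"]
key = ".".join(row[attr] for attr in groupby_attrs)
print(key)  # ScenarioMIP.MPI-ESM1-2-HR.ssp585.Amon.gn
```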
xr_dset_dict = cat.to_dataset_dict(xarray_open_kwargs={"chunks": {"time": 1}})
print(xr_dset_dict.keys())
--> The keys in the returned dictionary of datasets are constructed as follows:
'activity_id.source_id.experiment_id.table_id.grid_label'
dict_keys(['ScenarioMIP.AWI-CM-1-1-MR.ssp585.Amon.gn', 'ScenarioMIP.MPI-ESM1-2-HR.ssp585.Amon.gn', 'CMIP.MPI-ESM1-2-HR.historical.Amon.gn', 'CMIP.AWI-CM-1-1-MR.historical.Amon.gn'])
# Note: the institution_id is not part of the key
xr_dset_dict["ScenarioMIP.MPI-ESM1-2-HR.ssp585.Amon.gn"]
Global yearly mean calculation#
We define a function for calculating the global mean by weighting grid boxes according to their surface area. Afterwards, we group by year and calculate the yearly mean. All of this is done using xarray.
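The effect of the area weighting can first be seen in a tiny numpy-only example: grid boxes shrink towards the poles, so each latitude is weighted by the cosine of its latitude. The field values here are made up for illustration.

```python
import numpy as np

# Toy 3-latitude x 2-longitude field: value 3 at the equator, 1 at +/-60 deg
lat = np.array([-60.0, 0.0, 60.0])
field = np.array([[1.0, 1.0],
                  [3.0, 3.0],
                  [1.0, 1.0]])

# cos(lat) is proportional to the surface area of a grid box
weights = np.cos(np.deg2rad(lat))

unweighted = field.mean()
# Weighted mean: sum of weight * value over all cells, divided by the sum of weights
weighted = (field * weights[:, None]).sum() / (weights.sum() * field.shape[1])
print(unweighted, weighted)  # the weighting gives the equator more influence
```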
def global_yearly_mean(dset):
    # Latitude weights, proportional to the grid box surface area
    weights = np.cos(np.deg2rad(dset.lat))
    # Apply the weights to the chosen variable
    variable_array = dset.get(variable_id)
    variable_weights = variable_array.weighted(weights)
    # Global mean over longitude and latitude
    variable_globalmean = variable_weights.mean(("lon", "lat"))
    # Yearly mean by grouping over the year
    variable_gmym = variable_globalmean.groupby("time.year").mean("time")
    return variable_gmym.values
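The yearly grouping step can be mimicked with pandas alone; a small sketch, assuming a monthly series over two years:

```python
import numpy as np
import pandas as pd

# 24 monthly values starting in January 1850
time = pd.date_range("1850-01-01", periods=24, freq="MS")
series = pd.Series(np.arange(24, dtype=float), index=time)

# Grouping by calendar year averages the 12 values of each year,
# analogous to groupby("time.year") on the xarray dataset above
yearly = series.groupby(series.index.year).mean()
print(yearly.to_dict())  # {1850: 5.5, 1851: 17.5}
```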
Historical reference period#
We define the period 1851-1880 as our reference period. In the following, we calculate the anomalies of the future simulations relative to that period. But first things first: we need the global yearly mean for that period:
historical = [key for key in xr_dset_dict.keys() if "historical" in key][0]
dshist = xr_dset_dict[historical]
dshist_ref = dshist.sel(time=dshist.time.dt.year.isin(range(1851, 1881)))
# Global yearly mean over the reference period
var_ref = global_yearly_mean(dshist_ref)
# Mean over the reference years
var_refmean = var_ref.mean()
Get Meta Data#
In order to label the plot correctly, we retrieve the attributes long_name
and units
from the chosen variable.
lname = dshist.get(variable_id).attrs["long_name"]
units = dshist.get(variable_id).attrs["units"]
label = "Delta " + lname + " [" + units + "]"
Calculate Anomaly#
We save the result, the anomaly values, in a pandas dataframe var_global_yearly_mean_anomaly. We use this dataframe object because it features the hvplot plot function which we would like to use. We start by creating this dataframe based on the datasets which we got from intake.
lxr = list(xr_dset_dict.keys())
columns = [".".join(elem.split(".")[1:4]) for elem in lxr]
print(columns)
var_global_yearly_mean_anomaly = pd.DataFrame(index=range(1850, 2101), columns=columns)
['AWI-CM-1-1-MR.ssp585.Amon', 'MPI-ESM1-2-HR.ssp585.Amon', 'MPI-ESM1-2-HR.historical.Amon', 'AWI-CM-1-1-MR.historical.Amon']
For all datasets in our dictionary, we calculate the anomaly by subtracting the global mean of the reference period from the global yearly mean.
We add the results to the dataframe. Only years that are present in the dataset can be filled into the dataframe.
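The year-wise fill can be tried in isolation. A hedged sketch with a made-up column name and placeholder zeros instead of real anomalies:

```python
import numpy as np
import pandas as pd

# Frame spanning the full 1850-2100 axis; each experiment only covers
# part of it, the remaining years stay NaN. "MODEL.historical.Amon" is
# a made-up column name for illustration.
anomaly = pd.DataFrame(index=range(1850, 2101), columns=["MODEL.historical.Amon"])

years = list(range(1850, 2015))        # years present in the dataset
values = np.zeros(len(years))          # placeholder anomaly values
anomaly.loc[years, "MODEL.historical.Amon"] = values

print(anomaly.loc[2014, "MODEL.historical.Amon"])  # 0.0 (filled)
print(anomaly.loc[2015, "MODEL.historical.Amon"])  # nan (not covered)
```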
for key in xr_dset_dict.keys():
    print([".".join(key.split(".")[1:4])])
    # [0, :] selects the first ensemble member
    datatoappend = global_yearly_mean(xr_dset_dict[key])[0, :] - var_refmean
    years = list(xr_dset_dict[key].get(variable_id).groupby("time.year").groups.keys())
    var_global_yearly_mean_anomaly.loc[
        years, ".".join(key.split(".")[1:4])
    ] = datatoappend
['AWI-CM-1-1-MR.ssp585.Amon']
['MPI-ESM1-2-HR.ssp585.Amon']
['MPI-ESM1-2-HR.historical.Amon']
['AWI-CM-1-1-MR.historical.Amon']
Plotting the multimodel comparison of the global annual mean anomaly#
plot = var_global_yearly_mean_anomaly.hvplot.line(
xlabel="Year",
ylabel=label,
value_label=label,
legend="top_left",
title="Global and yearly mean anomaly in comparison to 1851-1880",
grid=True,
height=600,
width=820,
)
#hvplot.save(plot, "globalmean-yearlymean-tas.html")
plot
Used data#
We acknowledge the CMIP community for providing the climate model data, retained and globally distributed in the framework of the ESGF. The CMIP data of this study were replicated and made available by the DKRZ.