{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Working with output from many different CMIP6 climate models\n", "\n", "Climate modeling has been a cornerstone of understanding our climate and the [consequences of continued emissions of greenhouse gases](https://www.gfdl.noaa.gov/awards/former-noaa-scientist-suki-manabe-shares-nobel-prize-in-physics-for-pioneering-climate-prediction/) for decades. Much of the community's recent effort has focused on model intercomparison projects (MIPs), which invite modeling groups around the world to run their models (each set up slightly differently) under centralized forcing scenarios. The results can then be analyzed, and the spread between models gives an indication of how certain these predictions are. The recent [Coupled Model Intercomparison Project Phase 6](https://gmd.copernicus.org/articles/9/1937/2016/) (CMIP6) is an international effort to represent the state-of-the-art knowledge about how the climate system might evolve in the future, and it informs much of the [Intergovernmental Panel on Climate Change Report](https://github.com/IPCC-WG1/Chapter-9).\n", "\n", "In this lecture we will learn how to quickly search and analyze CMIP6 data with Pangeo tools in the cloud. With the 'download and analyze' workflow, this process often becomes prohibitively slow and inefficient because of the sheer scale of the data.\n", "\n", "The basis for this workflow is the set of analysis-ready, cloud-optimized repositories of CMIP6 data, which are currently maintained by the Pangeo community and publicly available on both [Google Cloud Storage](https://medium.com/pangeo/cmip6-in-the-cloud-five-ways-96b177abe396) and [Amazon S3](https://www.youtube.com/watch?v=C0UhiiGgbWA&t=3267s) as a collection of [zarr](https://zarr.readthedocs.io/en/stable/) stores.\n", "\n", "The cloud-native approach enables scientific 
results to be fully reproducible, which encourages others to build on them and to collaborate. " ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "from xmip.preprocessing import combined_preprocessing\n", "from xmip.utils import google_cmip_col\n", "from xmip.postprocessing import match_metrics\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first thing we have to do is get an overview of all the data available. In this case we are using [intake-esm](https://intake-esm.readthedocs.io/en/stable/index.html) to load a collection of zarr stores on Google Cloud Storage, but there are [other options](https://pangeo-data.github.io/pangeo-cmip6-cloud/accessing_data.html) to access the data too.\n", "\n", "Let's create a collection object and look at it.\n" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
pangeo-cmip6 catalog with 7674 dataset(s) from 514818 asset(s):

| | unique |
|---|---|
| activity_id | 18 |
| institution_id | 36 |
| source_id | 88 |
| experiment_id | 170 |
| member_id | 657 |
| table_id | 37 |
| variable_id | 700 |
| grid_label | 10 |
| zstore | 514818 |
| dcpp_init_year | 60 |
| version | 736 |
| derived_variable_id | 0 |

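
The facet columns in the summary above (`source_id`, `experiment_id`, `variable_id`, and so on) are what intake-esm's `search` method filters on. The sketch below shows one way to narrow the full collection. The helper names `build_query` and `search_catalog` are hypothetical, and the facet values are assumptions chosen for illustration (monthly sea surface temperature, `tos`, from one model's historical run), not necessarily the query used in this lecture. The xmip import is deferred into the function because opening the catalog needs network access.

```python
def build_query():
    """Return facet filters for a catalog search (illustrative values only)."""
    return {
        "source_id": "GFDL-ESM4",        # assumption: a single model
        "experiment_id": "historical",   # assumption: historical run
        "table_id": "Omon",              # monthly ocean output
        "variable_id": "tos",            # sea surface temperature
        "member_id": ["r1i1p1f1", "r2i1p1f1", "r3i1p1f1"],
    }


def search_catalog(query):
    """Open the Pangeo CMIP6 collection on Google Cloud Storage and filter it.

    Requires xmip (with intake-esm) to be installed and network access.
    """
    from xmip.utils import google_cmip_col

    col = google_cmip_col()      # the full collection
    return col.search(**query)   # a smaller catalog matching all facets
```

Calling `search_catalog(build_query())` should return a much smaller catalog whose summary has the same layout as the table above. Lists of values for a facet are matched as alternatives.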
pangeo-cmip6 catalog with 2 dataset(s) from 6 asset(s):

| | unique |
|---|---|
| activity_id | 1 |
| institution_id | 1 |
| source_id | 1 |
| experiment_id | 1 |
| member_id | 3 |
| table_id | 1 |
| variable_id | 1 |
| grid_label | 2 |
| zstore | 6 |
| dcpp_init_year | 0 |
| version | 2 |
| derived_variable_id | 0 |

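
A filtered catalog like the one summarized above can be loaded lazily into xarray datasets with intake-esm's `to_dataset_dict`. The following is a minimal sketch, assuming a recent intake-esm release in which zarr options are passed through `xarray_open_kwargs`, and assuming `cat` is the result of a catalog search. The helper names `loading_kwargs` and `load_datasets` are hypothetical. Passing xmip's `combined_preprocessing` as `preprocess` is intended to harmonize dimension and coordinate names across models.

```python
def loading_kwargs(preprocess=None):
    """Collect keyword arguments for `to_dataset_dict` (assumed settings)."""
    return {
        # assumption: consolidated zarr metadata and cftime-decoded time axes
        "xarray_open_kwargs": {"consolidated": True, "use_cftime": True},
        # function applied to each dataset before members are combined
        "preprocess": preprocess,
    }


def load_datasets(cat):
    """Load every dataset in a filtered catalog into a dict of xarray Datasets.

    Requires xmip and intake-esm to be installed, plus network access.
    """
    from xmip.preprocessing import combined_preprocessing

    return cat.to_dataset_dict(**loading_kwargs(combined_preprocessing))
```

The returned dictionary is keyed by strings such as `CMIP.IPSL.IPSL-CM6A-LR.historical.Omon.gn`. The arrays stay dask-backed, so no data is transferred until a computation or plot is requested.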
<xarray.Dataset>
Dimensions:         (y: 332, x: 362, member_id: 3, dcpp_init_year: 1,
                     time: 1980, vertex: 4, bnds: 2)
Coordinates: (12/13)
    area            (y, x) float32 dask.array<chunksize=(332, 362), meta=np.ndarray>
    lat_verticies   (y, x, vertex) float32 dask.array<chunksize=(332, 362, 4), meta=np.ndarray>
    lon_verticies   (y, x, vertex) float32 dask.array<chunksize=(332, 362, 4), meta=np.ndarray>
    lat             (y, x) float32 dask.array<chunksize=(332, 362), meta=np.ndarray>
    lon             (y, x) float32 dask.array<chunksize=(332, 362), meta=np.ndarray>
  * time            (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
    ...              ...
  * y               (y) int64 0 1 2 3 4 5 6 7 ... 325 326 327 328 329 330 331
  * x               (x) int64 0 1 2 3 4 5 6 7 ... 355 356 357 358 359 360 361
    lon_bounds      (bnds, y, x) float32 dask.array<chunksize=(1, 332, 362), meta=np.ndarray>
    lat_bounds      (bnds, y, x) float32 dask.array<chunksize=(1, 332, 362), meta=np.ndarray>
  * member_id       (member_id) object 'r1i1p1f1' 'r2i1p1f1' 'r3i1p1f1'
  * dcpp_init_year  (dcpp_init_year) float64 nan
Dimensions without coordinates: vertex, bnds
Data variables:
    tos             (member_id, dcpp_init_year, time, y, x) float32 dask.array<chunksize=(1, 1, 251, 332, 362), meta=np.ndarray>
Attributes: (12/52)
    CMIP6_CV_version:               cv=6.2.3.5-2-g63b123e
    Conventions:                    CF-1.7 CMIP-6.2
    EXPID:                          historical
    NCO:                            "4.6.0"
    activity_id:                    CMIP
    branch_method:                  standard
    ...                             ...
    intake_esm_attrs:variable_id:   tos
    intake_esm_attrs:grid_label:    gn
    intake_esm_attrs:version:       20180803
    intake_esm_attrs:_data_format_: zarr
    variant_info:                   Restart from another point in piControl...
    intake_esm_dataset_key:         CMIP.IPSL.IPSL-CM6A-LR.historical.Omon.gn

<xarray.Dataset>
Dimensions:         (member_id: 3, dcpp_init_year: 1, time: 1980, y: 576,
                     x: 720, vertex: 4, bnds: 2)
Coordinates:
  * x               (x) float64 -299.8 -299.2 -298.8 ... 58.75 59.25 59.75
  * y               (y) float64 -77.91 -77.72 -77.54 ... 89.47 89.68 89.89
    lat             (y, x) float32 dask.array<chunksize=(576, 720), meta=np.ndarray>
    lat_verticies   (y, x, vertex) float32 dask.array<chunksize=(576, 720, 4), meta=np.ndarray>
    lon             (y, x) float32 dask.array<chunksize=(576, 720), meta=np.ndarray>
    lon_verticies   (y, x, vertex) float32 dask.array<chunksize=(576, 720, 4), meta=np.ndarray>
  * time            (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
    time_bounds     (time, bnds) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
    lon_bounds      (bnds, y, x) float32 dask.array<chunksize=(1, 576, 720), meta=np.ndarray>
    lat_bounds      (bnds, y, x) float32 dask.array<chunksize=(1, 576, 720), meta=np.ndarray>
  * member_id       (member_id) object 'r1i1p1f1' 'r2i1p1f1' 'r3i1p1f1'
  * dcpp_init_year  (dcpp_init_year) float64 nan
Dimensions without coordinates: vertex, bnds
Data variables:
    tos             (member_id, dcpp_init_year, time, y, x) float32 dask.array<chunksize=(1, 1, 64, 576, 720), meta=np.ndarray>
Attributes: (12/48)
    Conventions:                    CF-1.7 CMIP-6.0 UGRID-1.0
    activity_id:                    CMIP
    branch_method:                  standard
    branch_time_in_child:           0.0
    comment:                        <null ref>
    contact:                        gfdl.climate.model.info@noaa.gov
    ...                             ...
    intake_esm_attrs:experiment_id: historical
    intake_esm_attrs:table_id:      Omon
    intake_esm_attrs:variable_id:   tos
    intake_esm_attrs:grid_label:    gn
    intake_esm_attrs:_data_format_: zarr
    intake_esm_dataset_key:         CMIP.NOAA-GFDL.GFDL-ESM4.historical.Omo...

<xarray.Dataset>
Dimensions:         (x: 720, y: 576, time: 1980, member_id: 1,
                     dcpp_init_year: 1, vertex: 4, bnds: 2)
Coordinates: (12/13)
  * x               (x) float64 -299.8 -299.2 -298.8 ... 58.75 59.25 59.75
  * y               (y) float64 -77.91 -77.72 -77.54 ... 89.47 89.68 89.89
  * time            (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
  * member_id       (member_id) object 'r1i1p1f1'
  * dcpp_init_year  (dcpp_init_year) float64 nan
    lat             (y, x) float32 dask.array<chunksize=(576, 720), meta=np.ndarray>
    ...              ...
    lon             (y, x) float32 dask.array<chunksize=(576, 720), meta=np.ndarray>
    lon_verticies   (y, x, vertex) float32 dask.array<chunksize=(576, 720, 4), meta=np.ndarray>
    time_bounds     (time, bnds) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
    lon_bounds      (bnds, y, x) float32 dask.array<chunksize=(1, 576, 720), meta=np.ndarray>
    lat_bounds      (bnds, y, x) float32 dask.array<chunksize=(1, 576, 720), meta=np.ndarray>
    areacello       (member_id, dcpp_init_year, y, x) float32 dask.array<chunksize=(1, 1, 576, 720), meta=np.ndarray>
Dimensions without coordinates: vertex, bnds
Data variables:
    tos             (member_id, dcpp_init_year, time, y, x) float32 dask.array<chunksize=(1, 1, 64, 576, 720), meta=np.ndarray>
Attributes: (12/48)
    Conventions:                    CF-1.7 CMIP-6.0 UGRID-1.0
    activity_id:                    CMIP
    branch_method:                  standard
    branch_time_in_child:           0.0
    comment:                        <null ref>
    contact:                        gfdl.climate.model.info@noaa.gov
    ...                             ...
    intake_esm_attrs:experiment_id: historical
    intake_esm_attrs:table_id:      Omon
    intake_esm_attrs:variable_id:   tos
    intake_esm_attrs:grid_label:    gn
    intake_esm_attrs:_data_format_: zarr
    intake_esm_dataset_key:         CMIP.NOAA-GFDL.GFDL-ESM4.historical.Omo...