Xarray for multidimensional gridded data
In the previous set of lectures, we saw how Pandas provided a way to keep track of additional “metadata” surrounding tabular datasets, including “indexes” for each row and labels for each column. These features, together with Pandas’ many useful routines for all kinds of data munging and analysis, have made Pandas one of the most popular Python packages in the world.
However, not all Earth science datasets easily fit into the “tabular” model (i.e. rows and columns) imposed by Pandas. In particular, we often deal with multidimensional data. By multidimensional data (also often called N-dimensional), I mean data with many independent dimensions or axes. For example, we might represent Earth’s surface temperature \(T\) as a three-dimensional variable
\[ T(x, y, t) \]
where \(x\) is longitude, \(y\) is latitude, and \(t\) is time.
The point of xarray is to provide pandas-level convenience for working with this type of data.
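As a minimal sketch of this idea, the temperature field \(T(x, y, t)\) can be stored as a labeled three-dimensional array. The grid sizes, coordinate values, and random temperatures below are all assumptions made for illustration and do not come from a real dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical coarse grid: 4 time steps, 3 latitudes, 5 longitudes
time = pd.date_range("2000-01-01", periods=4, freq="D")
lat = np.array([-45.0, 0.0, 45.0])
lon = np.array([0.0, 72.0, 144.0, 216.0, 288.0])

# Synthetic temperatures in kelvin (random, for illustration only)
rng = np.random.default_rng(0)
values = 280.0 + 10.0 * rng.random((time.size, lat.size, lon.size))

# Each axis of the array gets a name and a set of coordinate labels
T = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="T",
    attrs={"units": "K"},
)

print(T.dims)   # dimension names
print(T.shape)  # one length per dimension
```

Unlike a bare NumPy array, the object carries its axis names and labels with it, so later operations can refer to "time" or "lat" instead of axis numbers.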
Learning Goals for Xarray#
Because of the importance of xarray for data analysis in geoscience, we are going to spend a long time on it. The goals of this section include the following.
Lesson 1: Xarray Fundamentals#
Dataset Creation#
Describe the core xarray data structures, the DataArray and the Dataset, and the components that make them up, including: Data Variables, Dimensions, Coordinates, Indexes, and Attributes
Create xarray DataArrays and Datasets out of raw NumPy arrays
Create xarray objects with and without indexes
Load xarray datasets from netCDF files and OPeNDAP servers
View and set attributes
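The creation goals above can be sketched on a toy example. The variable names, coordinate values, and the file name in the final comment are hypothetical; only the xarray calls themselves are real API.

```python
import numpy as np
import xarray as xr

data = np.arange(6.0).reshape(2, 3)

# A DataArray with named dimensions but no coordinates (so no indexes)
da_plain = xr.DataArray(data, dims=("y", "x"))

# The same data with coordinate labels, which xarray turns into indexes
da_indexed = xr.DataArray(
    data,
    dims=("y", "x"),
    coords={"y": [10.0, 20.0], "x": [100.0, 110.0, 120.0]},
)

# A Dataset groups several data variables that share dimensions
ds = xr.Dataset(
    data_vars={
        "temperature": (("y", "x"), data),
        "pressure": (("y", "x"), data * 100.0),
    },
    coords={"y": [10.0, 20.0], "x": [100.0, 110.0, 120.0]},
)

# Attributes are plain dictionaries on both containers
ds.attrs["title"] = "toy dataset"
ds["temperature"].attrs["units"] = "K"

print(list(ds.data_vars))
print(dict(ds.sizes))
print(ds["temperature"].attrs)

# Loading from disk would go through xr.open_dataset; this path is hypothetical
# ds_from_file = xr.open_dataset("example.nc")
```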
Basic Indexing and Interpolation#
Select data by position using .isel with values or slices
Select data by label using .sel with values or slices
Select timeseries data by date/time with values or slices
Use nearest-neighbor lookups with .sel
Mask data with .where
Interpolate data in one and several dimensions
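A short sketch of the selection goals above, using a small synthetic array whose dates and latitudes are assumptions chosen for illustration. Interpolation with .interp is left out to keep the sketch dependent only on NumPy, pandas, and xarray.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2001-01-01", periods=6, freq="D")
lat = np.array([-30.0, 0.0, 30.0])
da = xr.DataArray(
    np.arange(18.0).reshape(6, 3),
    dims=("time", "lat"),
    coords={"time": time, "lat": lat},
)

# Positional selection: the first two time steps
first_two = da.isel(time=slice(0, 2))

# Label-based selection: the row of data at latitude 0
equator = da.sel(lat=0.0)

# Date slices are label-based, so both endpoints are included
window = da.sel(time=slice("2001-01-02", "2001-01-04"))

# Nearest-neighbor lookup: 12 is not a label, the closest one is 0
nearest = da.sel(lat=12.0, method="nearest")

# Masking: values that fail the condition become NaN
masked = da.where(da > 10)
```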
Basic Computation#
Do basic arithmetic with DataArrays and Datasets
Use NumPy universal functions on DataArrays and Datasets, or use the corresponding built-in xarray methods
Combine multiple xarray objects in arithmetic operations and understand how they are broadcast and aligned
Perform aggregation (reduction) along one or multiple dimensions of a DataArray or Dataset
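The computation goals above can be illustrated with a few lines on a made-up 2 x 3 temperature grid (the values and coordinates are assumptions). The sketch shows arithmetic, a NumPy ufunc, broadcasting by dimension name, alignment by coordinate label, and reductions.

```python
import numpy as np
import xarray as xr

temp_c = xr.DataArray(
    np.array([[10.0, 12.0, 14.0], [20.0, 22.0, 24.0]]),
    dims=("lat", "lon"),
    coords={"lat": [0.0, 10.0], "lon": [0.0, 90.0, 180.0]},
)

# Arithmetic keeps dimensions and coordinates
temp_k = temp_c + 273.15

# NumPy ufuncs applied to a DataArray return a DataArray
log_t = np.log(temp_k)

# Reduction along one dimension, then broadcasting by name:
# (lat, lon) minus (lat,) gives (lat, lon)
lat_mean = temp_c.mean(dim="lon")
anomaly = temp_c - lat_mean

# Alignment: by default, arithmetic keeps only the labels both operands share
subset = temp_c.sel(lon=[90.0, 180.0])
overlap = temp_c + subset

# Reduction over several dimensions at once
total_mean = temp_c.mean(dim=("lat", "lon"))
```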
Basic Plotting#
Use built-in xarray plotting for 1D and 2D DataArrays
Customize plots with options
Lesson 2: Advanced Usage#
Xarray’s groupby, resample, and rolling#
Split xarray objects into groups using groupby
Apply reduction operations to groups (e.g. mean)
Apply non-reducing functions to groups (e.g. standardize)
Use groupby with time coordinates (e.g. to create climatologies)
Use arithmetic between GroupBy objects and regular DataArrays / Datasets
Use groupby_bins to aggregate data in bins
Use resample on time dimensions
Use rolling to apply rolling aggregations
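A minimal sketch of these operations on two years of synthetic daily data. The seasonal signal, the latitude profile, and the helper function standardize are all hypothetical and exist only to make the example self-contained.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of synthetic daily data: a seasonal cycle plus a small trend
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
doy = time.dayofyear.to_numpy()
values = np.sin(2 * np.pi * doy / 365.25) + 0.001 * np.arange(time.size)
da = xr.DataArray(values, dims="time", coords={"time": time}, name="signal")

# Reduction per group: a monthly climatology with a "month" dimension
climatology = da.groupby("time.month").mean()

# Arithmetic between a GroupBy object and a DataArray: monthly anomalies
anomaly = da.groupby("time.month") - climatology

# Non-reducing function applied per group (hypothetical helper)
def standardize(x):
    return (x - x.mean()) / x.std()

standardized = da.groupby("time.month").map(standardize)

# Binning by coordinate value: mean per hemisphere of a toy latitude profile
profile = xr.DataArray(
    np.arange(6.0),
    dims="lat",
    coords={"lat": [-75.0, -45.0, -15.0, 15.0, 45.0, 75.0]},
)
by_band = profile.groupby_bins("lat", bins=[-90.0, 0.0, 90.0]).mean()

# Resampling to yearly means and a 7-day centered rolling mean
yearly = da.resample(time="YS").mean()
smooth = da.rolling(time=7, center=True).mean()
```

The rolling mean leaves NaN at both ends of the series, because a full 7-day window is not available there.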
Merging and Combining Datasets#
Concatenate DataArrays and Datasets along a new or existing dimension
Merge multiple datasets with different variables
Add a new data variable to an existing Dataset
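The combining goals above, sketched with tiny hand-made arrays. The variable names temp, pres, and temp_f and the dimension name member are illustrative assumptions.

```python
import numpy as np
import xarray as xr

a = xr.DataArray([1.0, 2.0], dims="x", coords={"x": [0, 1]}, name="temp")
b = xr.DataArray([3.0, 4.0], dims="x", coords={"x": [2, 3]}, name="temp")

# Concatenate along the existing dimension "x"
along_x = xr.concat([a, b], dim="x")

# Concatenate along a new dimension "member" (both inputs share the same x labels)
c = a + 10.0
members = xr.concat([a, c], dim="member")

# Merge datasets that hold different variables on the same grid
ds_t = a.to_dataset()
ds_p = xr.Dataset({"pres": ("x", [1000.0, 990.0])}, coords={"x": [0, 1]})
merged = xr.merge([ds_t, ds_p])

# Add a new data variable to an existing Dataset by assignment
merged["temp_f"] = merged["temp"] * 9 / 5 + 32

print(merged)
```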
Reshaping Data#
Transpose dimension order
Swap coordinates
Expand and squeeze dimensions
Convert between DataArray and Dataset
Use stack and unstack to transform data
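A sketch of the reshaping goals above on a small 2 x 3 array. The names field, row, and point are hypothetical, and I am assuming that "swap coordinates" refers to swap_dims.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("y", "x"),
    coords={"y": [10, 20], "x": [1, 2, 3], "row": ("y", ["a", "b"])},
    name="field",
)

# Reorder dimensions
transposed = da.transpose("x", "y")

# Make the non-dimension coordinate "row" the dimension in place of "y"
by_row = da.swap_dims({"y": "row"})

# Add a length-1 dimension in front, then remove it again
expanded = da.expand_dims("time")
squeezed = expanded.squeeze("time")

# Convert DataArray -> Dataset -> DataArray
ds = da.to_dataset()
back = ds["field"]

# Combine y and x into one stacked dimension, then recover the 2D layout
flat = da.stack(point=("y", "x"))
restored = flat.unstack("point")
```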
Advanced Computations#
Use differentiate to take derivatives of data
Use apply_ufunc to apply custom or specialized operations to data
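Both goals can be sketched on a simple parabola. The function clip_to_range is a hypothetical stand-in for any routine that only understands plain NumPy arrays.

```python
import numpy as np
import xarray as xr

x = np.linspace(0.0, 4.0, 5)
da = xr.DataArray(x**2, dims="x", coords={"x": x})

# Finite-difference derivative with respect to the "x" coordinate
# (centered in the interior, one-sided at the two edges)
slope = da.differentiate("x")

# Hypothetical custom function that works on raw NumPy arrays
def clip_to_range(arr, lo, hi):
    return np.minimum(np.maximum(arr, lo), hi)

# apply_ufunc unwraps the DataArray, calls the function, and re-wraps the result
clipped = xr.apply_ufunc(clip_to_range, da, 1.0, 9.0)

print(slope.values)
print(clipped.values)
```

The derivative is taken with respect to coordinate values rather than array positions, so unevenly spaced coordinates would be handled as well.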
Plotting#
Show multiple line plots over a dimension using the hue keyword
Create multiple 2D plots using faceting
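A rough sketch of both plotting goals, assuming matplotlib is installed. The random field and its coordinates are made up, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.random((4, 3, 5)),
    dims=("time", "lat", "lon"),
    coords={
        "time": np.arange(4),
        "lat": [-45.0, 0.0, 45.0],
        "lon": np.linspace(0.0, 288.0, 5),
    },
    name="field",
)

# One line per latitude on a single axis, selected with the hue keyword
lines = da.isel(lon=0).plot.line(x="time", hue="lat")

# Faceting: one 2D panel per time step, wrapped into a 2 x 2 grid
grid = da.plot(x="lon", y="lat", col="time", col_wrap=2)
```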