aospy: automated climate data analysis and management

aospy (pronounced A - O - S - py) is an open source Python package for automating computations that use gridded climate and weather data (namely, data stored as netCDF files) and for managing the results of those computations.

aospy enables firing off multiple calculations in parallel, spanning the permutations of an arbitrary number of climate models, simulations, variables to be computed, date ranges, sub-annual sampling periods, and many other parameters. In other words, it is possible using aospy to submit and execute all calculations for a particular project (e.g. paper, class project, or thesis chapter) with a single command!

The results get saved in a highly organized directory tree as netCDF files, making it easy to subsequently find and use the data (e.g. for plotting) and preventing “orphan” files with vague filenames and insufficient metadata to remember what they are and/or how they were computed.

The eventual goal is for aospy to become the community standard for gridded climate data analysis and, in so doing, accelerate progress in climate science and make the results of climate research more easily reproducible and shareable.

Documentation

What’s New

v0.3.1 (unreleased)

v0.3 (15 November 2018)

This release adds a number of new features, fixes many bugs, and improves our documentation. It includes several deprecations, other breaking changes, and modifications to aospy’s list of required dependencies. We are grateful for three new contributors this time around and are eager to grow the user- and developer-base further moving forward! Highlights (full changelog below these):

  • Support for Python 3.7
  • Var objects can now be recursively computed in terms of other Var objects
  • Thanks to xarray.CFTimeIndex, aospy no longer requires special logic/handling of out-of-range datetimes
  • Improved handling of region longitude bounds in Region via the new Longitude class
Breaking Changes
  • Drop support for Python 2.7 and 3.4, since our core upstream dependency xarray is also dropping these soon (PR255, PR280). By Spencer Hill.
  • aospy.Region no longer can be instantiated using lat_bounds and lon_bounds keywords. These have been replaced with the more explicit east_bound, west_bound, south_bound, and north_bound (PR266). By Spencer Hill.
  • Deprecate Constant class and constants.py module. Physical constants used internally by aospy are now stored in _constants.py (fixes GH50 via PR223). By Micah Kim.
  • Deprecate Units class, so now the units attribute of the Var class is a string. (fixes GH50 via PR222). By Micah Kim.
  • Deprecate CalcInterface class. Now, to instantiate a Calc object, pass it directly the parameters that previously would have been passed to CalcInterface (fixes GH249 via PR250). By Spencer Hill.
  • Deprecate utils.times.convert_scalar_to_indexable_coord, since as of the xarray version 0.10.3 release, the functionality is no longer necessary (fixes GH268 via PR269). By Spencer Hill.
  • Deprecate func_input_dtype argument to Var (fixes GH281 via PR282). By Spencer Hill.
Documentation
  • Update the documentation describing the calc_suite_specs argument to submit_mult_calcs in automate.py (fixes GH295 via PR310). By James Doss-Gollin.
  • Corrected link to documentation badge on repository main page (PR213). By DaCoEx.
Enhancements
  • Improve compatibility for data following IRIDL conventions or NOAA data formats. Specifically, several alternate names are defined in GRID_ATTRS, while there is no longer an assumption that BOUNDS_STR is a coordinate of time_weights (fixes GH293 and GH299 via PR309). By James Doss-Gollin.
  • aospy now uses Versioneer to manage its version strings. By Spencer Hill (PR311).
  • Add support for Python 3.7 (closes GH292 via PR306). By Spencer Hill.
  • Use an xarray.CFTimeIndex for dates from non-standard calendars and outside the Timestamp-valid range. This eliminates the need for the prior workaround, which shifted dates to within the range 1678 to 2262 prior to indexing (closes GH98 via PR273). By Spencer Clark.
  • Create utils.longitude module and Longitude class for representing and comparing longitudes. Used internally by aospy.Region to construct masks, but could also be useful for users outside the standard aospy workflow (PR266). By Spencer Hill.
  • Add support for Region methods mask_var, ts, av, and std for data that doesn’t conform to aospy naming conventions, making these methods now useful in more interactive contexts in addition to within the standard main script-based work flow (PR266). By Spencer Hill.
  • Raise an exception with an informative message if submit_mult_calcs (and thus the main script) generates zero calculations, which can happen if one of the parameters is accidentally set to an empty list (closes GH253 via PR254). By Spencer Hill.
  • Suppress warnings from xarray when loading data whose dates extend outside the range supported by the numpy.datetime64 datatype. aospy has its own logic to deal with these cases (closes GH221 via PR239). By Spencer Hill.
  • Add units and description from Var objects to output netCDF files (closes GH201 via PR232). By Micah Kim.
  • Remove potentially confusing attributes from example netCDF files (closes GH214 via PR216). By Micah Kim.
  • Clean up logic for Dataset drop on dimensions with and without coords; use Dataset isel instead (closes GH142 via PR241). By Micah Kim.
  • Expose data_vars and coords options to xr.open_mfdataset in DataLoaders. These options control how variables and coordinates are concatenated when loaded in from multiple files; by default aospy uses data_vars='minimal' and coords='minimal', but there could be use cases where other options are desired. See the xarray documentation for more information (closes GH236 via PR240). By Spencer Clark.
  • Allow for variables to be functions of other computed variables (closes GH3 via PR263). By Spencer Clark.
  • Add a grid_attrs argument to the Model constructor to allow the specification of custom alternative names for grid attributes like time, latitude, or longitude (closes GH182 via PR297). By Spencer Clark.
Bug Fixes
  • Use the new Longitude class to support any longitude numbering convention (e.g. -180 to 180, 0 to 360, or any other) for both defining Region objects and for input data to be masked. Fixes bug wherein a region could be silently partially clipped off when masking input data with longitudes of a different numbering convention. Fixes GH229 via PR266. By Spencer Hill.
  • Cast input DataArrays with datatype np.float32 to np.float64 as a workaround for incorrectly computed means on float32 arrays in bottleneck (see pydata/xarray#1346). If one would like to disable this behavior (i.e. restore the original behavior before this fix), one can set the upcast_float32 keyword argument in their DataLoaders to False. Fixes GH217 via PR218. By Spencer Clark.
  • Switch from using scipy to netcdf4 as the engine when writing to netCDF files to avoid bugs when using libnetcdf version 4.5.0 (PR235). By Spencer Hill.
  • CalcSuite (and thus submit_mult_calcs) now skips calculations that involve time reductions of non-time-defined variables. Calc now raises a ValueError when instantiated with a non-time-defined variable and one or more time-defined reductions (closes GH202 via PR242). By Micah Kim.
Testing
  • Create Travis CI environment that tests against the xarray development branch (closes GH224 via PR226). By Micah Kim.
  • Use nbconvert and nbformat rather than runipy to test the tutorial Jupyter notebook, as runipy is deprecated (PR239). By Spencer Hill.
  • Add flake8 to Travis CI environment to check that new code adheres to pep8 style. Add verbose flag to pytest test suite. (closes GH234 via PR237). By Micah Kim.
Dependencies
  • aospy now requires a minimum version of distributed of 1.17.1 (fixes GH210 via PR211).
  • aospy now requires a minimum version of xarray of 0.10.6. See discussion in GH199, PR240, GH268, PR269, PR273, and PR275 for more details.

v0.2 (26 September 2017)

This release includes some new features plus several bugfixes. The bugfixes include some that previously made using aospy on pressure-interpolated data very problematic. We have also improved support for reading in data from the WRF and CAM atmospheric models.

As of this release, aospy has at least 2(!) confirmed regular users that aren’t the original aospy developers, bringing the worldwide total of users up to at least 4. The first user-generated Github Issues have now also been created. We’re a real thing!

Enhancements
  • Use dask.bag coupled with dask.distributed rather than multiprocess to parallelize computations (closes GH169 via PR172). This enables the optional use of an external distributed.Client to leverage computational resources across multiple nodes of a cluster. By Spencer Clark.
  • Improve support for WRF and NCAR CAM model data by adding the internal names they use for grid attributes to aospy’s lists of potential names to search for. By Spencer Hill.
  • Allow a user to specify a custom preprocessing function in all DataLoaders to prepare data for processing with aospy. This could be used, for example, to add a CF-compliant units attribute to the time coordinate if it is not present in a set of files. Addresses GH177 via PR180. By Spencer Clark.
  • Remove dask.async import in model.py; no longer needed, and also prevents warning message from dask regarding location of get_sync function (PR195). By Spencer Hill.
Dependencies
  • multiprocess is no longer required for submitting aospy calculations in parallel (see discussion in GH169 and pull request PR172).
  • aospy now requires an installation of dask with version greater than or equal to 0.14 (see discussion in pull request PR172).
Bug Fixes
  • Remove faulty logic for calculations with data coming from multiple runs. Eventually this feature will be properly implemented (fixes GH117 via PR178). By Spencer Hill.
  • Only run tests that require optional dependencies if those dependencies are actually installed (fixes GH167 via PR176). By Spencer Hill.
  • Remove obsolete operator.py module (fixes GH174 via PR175). By Spencer Clark.
  • Fix workaround for dates with years less than 1678 to support units attributes with a reference date whose year is not 0001 (fixes GH188 via PR189). By Spencer Clark.
  • Fix bug which would prevent users from analyzing a subset within the Timestamp-valid range from a dataset which included data from outside the Timestamp-valid range (fixed in PR189). By Spencer Clark.
  • Toggle the mask_and_scale option to True when reading in netCDF files to enable missing values encoded as floats to be converted to NaN’s (fixes GH190 via PR192). By Spencer Clark.
  • Force regional calculations to mask gridcell weights where the loaded datapoints were invalid instead of just masking points outside the desired region (fixes GH190 via PR192). By Spencer Clark.
  • Retain original input data’s mask during gridpoint-by-gridpoint temporal averages (fixes GH193 via PR196). By Spencer Hill.
  • Always write output to a tar file in serial to prevent empty header file errors (fixes GH75 via PR197). By Spencer Clark.
  • Allow aospy to use grid attributes that are only defined in Run objects. Previously if a grid attribute were defined only in a Run object and not also in the Run’s corresponding Model, an error would be raised (fixes GH187 via PR199). By Spencer Clark.
  • When input data for a calculation has a time bounds array, overwrite its time array with the average of the start and end times for each timestep. Prevents bug wherein time arrays equal to either the start or end bounds get mistakenly grouped into the wrong time interval, i.e. the wrong month or year (fixes GH185 via PR200). By Spencer Hill.

v0.1.2 (30 March 2017)

This release improves the process of submitting multiple calculations for automatic execution. The user interface, documentation, internal logic, and packaging all received upgrades and/or bugfixes.

We also now have a mailing list. Join it to follow and/or post your own usage questions, bug reports, suggestions, etc.

Enhancements
  • Include an example library of aospy objects that works out-of-the-box with the provided example main script (PR155). By Spencer Clark and Spencer Hill.
  • Improve Examples page of the documentation by using this new example object library (PR164). By Spencer Hill.
  • Improve readability/usability of the included example script aospy_main.py for submitting aospy calculations by moving all internal logic into new automate.py module (PR155). By Spencer Clark and Spencer Hill.
  • Enable user to specify whether or not to write output to .tar files (in addition to the standard output). Also document an error that occurs when writing output to .tar files for sufficiently old versions of tar (including the version that ships standard on MacOS), and print a warning when errors are caught during the ‘tar’ call (PR160). By Spencer Hill.
Bug fixes
  • Update packaging specifications such that the example main script and tutorial notebook actually ship with aospy as intended (fixes GH149 via PR161). By Spencer Hill.
  • Use the ‘scipy’ engine for the xarray.DataArray.to_netcdf call when writing aospy calculation outputs to disk to prevent a bug when trying to re-write to an existing netCDF file (fixes GH157 via PR160). By Spencer Hill.

v0.1.1 (2 March 2017)

This release includes fixes for a number of bugs mistakenly introduced in the refactoring of the variable loading step of calc.py (PR90), as well as support for xarray version 0.9.1.

Enhancements
  • Support for xarray version 0.9.1 and require it or a later xarray version. By Spencer Clark and Spencer Hill.
  • Better support for variable names relating to “bounds” dimension of input data files. “bnds”, “bounds”, and “nv” now all supported (PR140). By Spencer Hill.
  • When coercing dims of input data to aospy’s internal names, for scalars change only the name; for non-scalars change the name, force them to have a coord, and copy over their attrs (PR140). By Spencer Hill.
Bug fixes
  • Fix bug involving loading data that has dims that lack coords (which is possible as of xarray v0.9.0). By Spencer Hill.
  • Fix an instance where the name for pressure half levels was mistakenly replaced with the name for the pressure full levels (PR126). By Spencer Clark.
  • Prevent workaround for dates outside the pd.Timestamp valid range from being applied to dates within the pd.Timestamp valid range (PR128). By Spencer Clark.
  • Ensure that all DataArrays associated with aospy.Var objects have a time weights coordinate with CF-compliant time units. This allows them to be cast as the type np.timedelta64, and be safely converted to have units of days before taking time-weighted averages (PR128). By Spencer Clark.
  • Fix a bug where the time weights were not subset in time prior to taking a time weighted average; this caused computed seasonal averages to be too small. To prevent this from failing silently again, we now raise a ValueError if the time coordinate of the time weights is not identical to the time coordinate of the array associated with the aospy.Var (PR128). By Spencer Clark.
  • Enable calculations to be completed using data saved as a single time-slice on disk (fixes GH132 through PR135). By Spencer Clark.
  • Fix bug where workaround for dates outside the pd.Timestamp valid range caused a mismatch between the data loaded and the data requested (fixes GH138 through PR139). By Spencer Clark.

v0.1 (24 January 2017)

Overview: Why aospy?

Use cases

If you’ve ever found yourself saying or thinking anything along these lines, then aospy may be a great tool for you:

  • “What is this precip01.dat file in my home directory that I vaguely remember creating a few weeks ago? Better just delete it and re-do the calculation to be sure.”
  • “I really need to analyze variable X. But the models/observational products I’m using don’t provide it. They do provide variables Y and Z, from which I can compute quantity X.”
  • “I need to calculate quantity X from 4 different simulations repeated in 5 different models; computing monthly, seasonal, and annual averages and standard deviations; gridpoint-by-gridpoint and averaged over these 10 regions. That’s impractical – I’ll just do a small subset.”

Each of these common problems is easily solved by using aospy.

Design philosophy

aospy’s ability to automate calculations (while properly handling model- and simulation-level idiosyncrasies) relies on separating your code into three distinct categories.

  1. Code characterizing the data you want to work with: “Where is your data and what does it represent?”
  2. Code describing abstract physical quantities and geographical regions you eventually want to examine: “What things are you generally interested in calculating?”
  3. Code specifying the exact parameters of calculations you want to perform right now: “Ok, time to actually crunch some numbers. Exactly what all do you want to compute from your data, and how do you want to slice and dice it?”

How you’ll actually interact with aospy in order to achieve each of these three steps is described in the Using aospy section of this documentation, and explicit examples using included sample data are in the Examples section.

Open Science & Reproducible Research

This separation of your code into three categories promotes open science and reproducible research in multiple ways:

  • By separating code that describes where your data is from the code used to compute particular physical quantities, the latter can be written in a generic form that closely mimics the physical form of the particular expression. The resulting code is easier to read and debug and therefore to share with others.
  • By enabling automation of calculations across an arbitrary number of parameter combinations, aospy facilitates more rigorous analyses and/or analyses encompassing a larger span of input data than would otherwise be possible.
  • By outputting the results of calculations as netCDF files in a highly organized directory structure, with ample metadata embedded within the file path, file name, and the netCDF file itself, aospy facilitates the sharing of results with others (including your future self that has forgotten the myriad details of how you have computed things right now).

Using aospy

This section provides a high-level summary of how to use aospy. See the Overview section of this documentation for more background information, or the Examples section for concrete examples.

Your aospy object library

The first step is writing the code that describes your data and the quantities you eventually want to compute using it. We refer to this code collectively as your “object library”.

Describing your data on disk

aospy needs to know where the data you want to use is located on disk and how it is organized across different projects, models, and model runs (i.e. simulations). This involves a hierarchy of three classes, aospy.Proj, aospy.Model, and aospy.Run.

  1. aospy.Proj: This represents a single project that involves analysis of data from one or more models and simulations.
  2. aospy.Model: This represents a single climate model, other numerical model, observational data source, etc.
  3. aospy.Run: This represents a single simulation, version of observational data, etc.

So each user’s object library will contain one or more aospy.Proj objects, each of which will have one or more child aospy.Model objects, which in turn will each have one or more child aospy.Run objects.
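
For illustration, here is a minimal sketch of such a hierarchy; the object names and file paths are hypothetical:

from aospy import Proj, Model, Run
from aospy.data_loader import DictDataLoader

# One simulation, with its monthly-mean output files (hypothetical paths).
control = Run(
    name='control',
    description='Control simulation',
    data_loader=DictDataLoader({'monthly': '/path/to/data/control/*.monthly.nc'}),
)

# The model in which that simulation was performed.
my_model = Model(name='my_model', runs=[control])

# The project grouping one or more models; aospy output paths are set here.
my_proj = Proj('my_proj', models=[my_model], direc_out='/path/to/output')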

Note

Currently, the Proj-Model-Run hierarchy is rigid, in that each Run has a parent Model, and each Model has a parent Proj. Work is ongoing to relax this to a more generic parent-child framework.

Physical variables

The aospy.Var class is used to represent physical variables, e.g. precipitation rate or potential temperature. This includes both variables which are directly available in netCDF files (e.g. they were directly outputted by your model or gridded data product) as well as those fields that must be computed from other variables (e.g. they weren’t directly outputted but can be computed from other variables that were outputted).

Note

aospy.Var objects with the name 'p' or 'dp' are handled in a special manner. These variables are meant to represent pressure and pressure thicknesses, respectively. When encountering variables named this way, aospy will attempt to compute them in accordance with the available vertical coordinates depending on the dtype_in_vert specified in the main script. Currently supported dtype_in_vert values are:

  • 'sigma': for data output on hybrid pressure coordinates
  • 'pressure': for data interpolated to levels of constant pressure.
Geographical regions

The aospy.Region class is used to define geographical regions over which quantities can be averaged (in addition to gridpoint-by-gridpoint values). Like aospy.Var objects, they are more generic than the objects of the aospy.Proj - aospy.Model - aospy.Run hierarchy, in that they correspond to the generic physical quantities/regions rather than the data of a particular project, model, or simulation.

Object library structure

The officially supported way to submit calculations is the aospy.submit_mult_calcs() function. In order for this to work, your object library must follow one or the other of these structures:

  1. All aospy.Proj and aospy.Var objects are accessible as attributes of your library. This means that my_obj_lib.my_obj works, where my_obj_lib is your object library, and my_obj is the object in question.
  2. All aospy.Proj objects are stored in a container called projs, where projs is an attribute of your library (i.e. my_obj_lib.projs). And likewise for aospy.Var objects in a variables attribute.
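
As a concrete sketch of the second structure, a single-module object library (the object names here are hypothetical) might look like:

# my_obj_lib.py -- hypothetical minimal object library
from aospy import Proj, Var

my_proj = Proj('my_proj')  # plus models, output directories, etc.
precip = Var(name='precip', def_time=True)

# Structure 2: expose all Proj objects in `projs` and all Var objects
# in `variables`.
projs = [my_proj]
variables = [precip]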

Beyond that, you can structure your object library however you wish. In particular, it can be structured as a Python module (i.e. a single “.py” file) or as a package (i.e. multiple “.py” files linked together; see the official documentation on package structuring).

A single module works great for small projects and for initially trying out aospy (this is how the example object library, aospy.examples.example_obj_lib, is structured). But as your object library grows, it can become easier to manage as a package of multiple files. For an example of a large object library that is structured as a formal package, see here.

Accessing your library

If your current working directory is the one containing your library, you can import your library via import my_obj_lib (replacing my_obj_lib with whatever you’ve named yours) in order to pass it to aospy.submit_mult_calcs().

Once you start using aospy a lot, however, this requirement of being in the same directory becomes cumbersome. As a solution, you can add the directory containing your object library to the PYTHONPATH environment variable. E.g. if you’re using the bash shell:

export PYTHONPATH=/path/to/your/object/library:${PYTHONPATH}

Of course, replace /path/to/your/object/library with the actual path to yours. This command places your object library at the front of the PYTHONPATH environment variable, which is essentially the first place where Python looks to find packages and modules to be imported. (For more, see Python’s official documentation on PYTHONPATH).

Note

It’s convenient to copy this command into your shell profile (e.g., for the bash shell on Linux or Mac, ~/.bash_profile) so that you don’t have to call it again in every new terminal session.

To test that this is working, run python -c "import my_obj_lib" from a directory other than the one where the library is located (again replacing my_obj_lib with the name you’ve given to your library). If this runs without error, you should be good to go.

Executing calculations

As noted above, the officially supported way to submit calculations is the aospy.submit_mult_calcs() function.

We provide a template “main” script with aospy that uses this function. We recommend copying it to the location of your choice. In the copy, replace the example object library and associated objects with your own. (If you accidentally change the original, you can always get a fresh copy from Github).

Running the main script

Once the main script parameters are all modified as desired, execute the script from the command line as follows

/path/to/your/aospy_main.py

This should generate a text summary of the specified parameters and a prompt as to whether to proceed or not with the calculations. An affirmative response then triggers the calculations to execute.

Note

You may need to change the permissions on the file to make it executable. E.g. from a Mac or Linux: chmod u+x /path/to/your/aospy_main.py. Alternatively you can call python or IPython from the command line to run it: python /path/to/your/aospy_main.py or ipython /path/to/your/aospy_main.py.

Specifically, the parameters are permuted over all possible combinations. So, for example, if two model names and three variable names were listed and all other parameters had only one element, six calculations would be generated and executed. There is no limit to the number of permutations.
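
Schematically, this permutation is just a Cartesian product of the parameter lists; the following sketch (plain Python, not aospy's actual internals) illustrates the counting:

import itertools

models = ['model1', 'model2']            # two model names
variables = ['precip', 'temp', 'evap']   # three variable names

# With all other parameters having one element, the product yields 2 x 3 = 6.
calcs = list(itertools.product(models, variables))
print(len(calcs))  # 6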

Note

You can also call the main script interactively within an IPython session via %run /path/to/your/main.py or, from the command line, run the script and then start an interactive IPython session via ipython -i /path/to/your/main.py.

Or you can call aospy.submit_mult_calcs() directly within an interactive session.

As the calculations are performed, logging information will be printed to the terminal displaying their progress.

Parallelized calculations

The calculations generated by the main script can be executed in parallel using dask.distributed. aospy will either automatically set up a dask.distributed.LocalCluster to perform the calculations, or one can optionally specify an external distributed.Client to delegate the work. Otherwise, or if the user sets parallelize=False in the calc_exec_options argument of aospy.submit_mult_calcs(), the calculations will be executed one-by-one.

Particularly on institutional clusters with many cores, this parallelization yields an impressive speed-up when multiple calculations are generated.
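
For example, a sketch of delegating the work to an external client (the client key in calc_exec_options is an assumption based on the v0.2 release notes, so check the API Reference for the exact option name; the scheduler address is hypothetical):

from distributed import Client
from aospy import submit_mult_calcs

# Connect to an already-running dask scheduler on the cluster.
client = Client('tcp://scheduler-address:8786')

# calc_suite_specs: dict of calculation specs, as in the Examples section.
calc_exec_options = dict(parallelize=True, client=client)
calcs = submit_mult_calcs(calc_suite_specs, calc_exec_options)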

Note

When calculations are performed in parallel, the logging information from different calculations running simultaneously often ends up interwoven, leading to output that is confusing to follow. Work is ongoing to improve the logging output when the computations are parallelized.

Finding the output

aospy saves the results of all calculations as netCDF files, embedding metadata that describes them within the netCDF files themselves, in their filenames, and in the directory structure within which they are saved.

  • Directory structure: /path/to/aospy-rootdir/projname/modelname/runname/varname
  • File name : varname.intvl_out.dtype_out_time.'from_'intvl_in'_'dtype_in_time.model.run.date_range.nc

See the API Reference on aospy.Calc for explanation of each of these components of the path and file name.

Under the hood

aospy.submit_mult_calcs() creates an aospy.CalcSuite object that permutes over the provided lists of calculation specifications, encoding each permutation into an aospy.Calc object.

The aospy.Calc object, in turn:

  • loads the required netCDF data given its simulation, variable, and date range
  • (if necessary) further truncates the data in time (i.e. to the given subset of the annual cycle, and/or if the requested date range doesn’t exactly align with the time chunking of the input netCDF files)
  • (if the variable is a function of other variables) executes the function that computes the calculation using this loaded and truncated data
  • applies all specified temporal and regional time reductions
  • writes the results (plus additional metadata) to disk as netCDF files and appends them to its own data_out attribute

Note

Actually, when multiple regions and/or output time/regional reductions are specified, these all get passed to each aospy.Calc object rather than being permuted over. They are then looped over during the subsequent calculations. This is to prevent unnecessary re-loading and re-computing, because, for a given simulation/variable/etc., all regions and reduction methods use the same data.

Note

Unlike aospy.Proj, aospy.Model, aospy.Run, aospy.Var, and aospy.Region, these objects are not intended to be saved in .py files for continual re-use. Instead, they are generated as needed, perform their desired tasks, and then go away.

See the API reference documentation for further details.

Examples

Note

The footnotes in this section provide scientific background to help you understand the motivation and physical meaning of these example calculations. They can be skipped if you are familiar already or aren’t interested in those details.

In this section, we use the example data files included with aospy to demonstrate the standard aospy workflow of executing and submitting multiple calculations at once.

These files contain timeseries of monthly averages of two variables generated by an idealized [1] aquaplanet [2] climate model:

  1. Precipitation generated through gridbox-scale condensation
  2. Precipitation generated through the model’s convective parameterization [3]

Using this data that was directly outputted by the model, let’s compute two other useful quantities: (1) the total precipitation rate, and (2) the fraction of the total precipitation rate that comes from the convective parameterization. We’ll compute the time-average over the whole duration of the data, both at each gridpoint and aggregated over specific regions.

Preliminaries

First we’ll save the path to the example data in a local variable, since we’ll be using it in several places below.

In [1]: import os  # Python built-in package for working with the operating system

In [2]: import aospy

In [3]: rootdir = os.path.join(aospy.__path__[0], 'test', 'data', 'netcdf')

Now we’ll use the fantastic xarray package to inspect the data:

In [4]: import xarray as xr

In [5]: xr.open_mfdataset(os.path.join(rootdir, '000[4-6]0101.precip_monthly.nc'),
   ...:                   decode_times=False)
   ...: 
Out[5]: 
<xarray.Dataset>
Dimensions:            (custom_lon: 128, lat: 64, latb: 65, lonb: 129, nv: 2, time: 36)
Coordinates:
  * lonb               (lonb) float64 -1.406 1.406 4.219 ... 353.0 355.8 358.6
  * custom_lon         (custom_lon) float64 0.0 2.812 5.625 ... 354.4 357.2
  * latb               (latb) float64 -90.0 -86.58 -83.76 ... 83.76 86.58 90.0
  * lat                (lat) float64 -87.86 -85.1 -82.31 ... 82.31 85.1 87.86
  * nv                 (nv) float64 1.0 2.0
  * time               (time) float64 1.111e+03 1.139e+03 ... 2.175e+03
Data variables:
    condensation_rain  (time, lat, custom_lon) float32 dask.array<shape=(36, 64, 128), chunksize=(12, 64, 128)>
    convection_rain    (time, lat, custom_lon) float32 dask.array<shape=(36, 64, 128), chunksize=(12, 64, 128)>
    time_bounds        (time, nv) float64 dask.array<shape=(36, 2), chunksize=(12, 2)>
    average_DT         (time) float64 dask.array<shape=(36,), chunksize=(12,)>
    zsurf              (time, lat, custom_lon) float32 dask.array<shape=(36, 64, 128), chunksize=(12, 64, 128)>

We see that, in this particular model, the variable names for these two forms of precipitation are “condensation_rain” and “convection_rain”, respectively. The file also includes the coordinate arrays (“lat”, “time”, etc.) that indicate where in space and time the data refers to.

Now that we know where and what the data is, we’ll proceed through the workflow described in the Using aospy section of this documentation.

Describing your data

Runs and DataLoaders

First we create an aospy.Run object that stores metadata about this simulation. This includes specifying where its files are located via an aospy.data_loader.DataLoader object.

DataLoaders specify where your data is located and organized. Several types of DataLoaders exist, each for a different directory and file structure; see the API Reference for details.

For our simple case, in which all of the data shares a single directory and time frequency, the simplest DataLoader, a DictDataLoader, works well. It maps your data based on the time frequency of its output (e.g. 6 hourly, daily, monthly, annual) to the corresponding netCDF files via a simple dictionary:

In [6]: from aospy.data_loader import DictDataLoader

In [7]: file_map = {'monthly': os.path.join(rootdir, '000[4-6]0101.precip_monthly.nc')}

In [8]: data_loader = DictDataLoader(file_map)

We then pass this to the aospy.Run constructor, along with a name for the run and an optional description.

In [9]: from aospy import Run

In [10]: example_run = Run(
   ....:     name='example_run',
   ....:     description='Control simulation of the idealized moist model',
   ....:     data_loader=data_loader,
   ....:     default_start_date='0004',
   ....:     default_end_date='0006'
   ....: )
   ....: 

Note

Throughout aospy, date slice bounds can be specified with dates of any of the following types:

  • str, for partial-datetime string indexing
  • np.datetime64
  • datetime.datetime
  • cftime.datetime

If possible, aospy will automatically convert the latter three to the appropriate date type used for indexing the data read in; otherwise it will raise an error. Therefore the arguments default_start_date and default_end_date in the Run constructor are calendar-agnostic (as are the date_ranges specified when submitting calculations).

Note

See the API reference for other optional arguments for this and the other core aospy objects used in this tutorial.

Note

An important consideration can be the datatype used to store values in your datasets. In particular, if the float32 datatype is used in storage, it can lead to undesired inaccuracies in the computation of reduction operations (like means) due to upstream issues (see pydata/xarray#1346 for more information). To address this it is recommended to always upcast float32 data to float64. This behavior is turned on by default. If you would like to disable this behavior you can set the upcast_float32 argument in your DataLoader constructors to False.
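
For example, to disable the upcasting in the DictDataLoader created earlier in this tutorial:

data_loader = DictDataLoader(file_map, upcast_float32=False)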

Models

Next, we create the aospy.Model object that describes the model in which the simulation was executed. One important attribute is grid_file_paths: a sequence (e.g. a tuple or list) of paths to netCDF files containing physical attributes of that model that aren’t already embedded in the output netCDF files.

For example, often the land mask that defines which gridpoints are ocean or land is outputted to a single, standalone netCDF file, rather than being included in the other output files. But we often need the land mask, e.g. to define certain land-only or ocean-only regions. [4] This and other grid-related properties shared across all of a Model’s simulations can be found in one or more of the files in grid_file_paths.

The other important attribute is runs, which is a list of the aospy.Run objects that pertain to simulations performed in this particular model.

Lastly, grid attributes (like time, latitude, longitude, etc.) often have different names from model to model. aospy can handle most typical grid attribute names automatically; however, if your model uses unusual grid attribute names, you can add a grid_attrs dictionary to your Model object to define a mapping between aospy internal names and the names for the grid attributes that can be found in your data files. In our example data, the land mask and longitude variables both have non-standard names: ‘custom_land_mask’ and ‘custom_lon’, respectively. To account for this, we can provide a grid_attrs dictionary to the Model constructor, mapping the aospy internal names for these grid attributes to the names used for them in the model.

In [11]: from aospy import Model

In [12]: from aospy.internal_names import LAND_MASK_STR, LON_STR

In [13]: example_model = Model(
   ....:     name='example_model',
   ....:     grid_file_paths=(
   ....:         os.path.join(rootdir, '00040101.precip_monthly.nc'),
   ....:         os.path.join(rootdir, 'im.landmask.nc')
   ....:     ),
   ....:     runs=[example_run],  # only one Run in our case, but could be more
   ....:     grid_attrs={LAND_MASK_STR: 'custom_land_mask', LON_STR: 'custom_lon'}
   ....: )
   ....: 
Built-in alternative names

For reference, here is a table showing the list of grid attributes used in aospy along with their associated built-in alternative names. We are always open to adding more alternative names to this list, as long as their association with particular grid attributes is sufficiently unambiguous. For potentially ambiguous alternative names, e.g. ‘T’ for TIME_STR (since ‘T’ is used for time in some datasets but for temperature in others), we recommend using the grid_attrs argument in the Model constructor as demonstrated above.

Internal name        Built-in alternative names
LAT_STR              ‘lat’, ‘latitude’, ‘LATITUDE’, ‘y’, ‘Y’, ‘yto’, ‘XLAT’
LAT_BOUNDS_STR       ‘latb’, ‘lat_bnds’, ‘lat_bounds’
LON_STR              ‘lon’, ‘longitude’, ‘LONGITUDE’, ‘x’, ‘X’, ‘xto’, ‘XLONG’
LON_BOUNDS_STR       ‘lonb’, ‘lon_bnds’, ‘lon_bounds’
ZSURF_STR            ‘zsurf’, ‘HGT’
SFC_AREA_STR         ‘area’, ‘sfc_area’
LAND_MASK_STR        ‘land_mask’, ‘LANDFRAC’, ‘XLAND’, ‘land’
PK_STR               ‘pk’
BK_STR               ‘bk’
PHALF_STR            ‘phalf’
PFULL_STR            ‘pfull’
PLEVEL_STR           ‘level’, ‘lev’, ‘plev’, ‘P’
TIME_STR             ‘time’, ‘XTIME’
TIME_WEIGHTS_STR     ‘time_weights’, ‘average_DT’
BOUNDS_STR           ‘bounds’, ‘bnds’, ‘nv’, ‘nbnd’, ‘nbnds’
Projects

Finally, we associate the Model object with an aospy.Proj object. This is the level at which we specify the directories to which aospy output gets written.

In [14]: from aospy import Proj

In [15]: example_proj = Proj(
   ....:     'example_proj',
   ....:     direc_out='example-output',  # default, netCDF output (always on)
   ....:     tar_direc_out='example-tar-output', # output to .tar files (optional)
   ....:     models=[example_model]  # only one Model in our case, but could be more
   ....: )
   ....: 

This extra aospy.Proj level of organization may seem like overkill for this simple example, but it really comes in handy once you start using aospy for more than one project.

Defining physical quantities and regions

Having fully specified the particular data of interest, we now define the general physical quantities of interest and any geographic regions over which to aggregate results.

Physical variables

We’ll first define aospy.Var objects for the two variables that we saw are directly available as model output:

In [16]: from aospy import Var

In [17]: precip_largescale = Var(
   ....:     name='precip_largescale',  # name used by aospy
   ....:     alt_names=('condensation_rain',),  # its possible name(s) in your data
   ....:     def_time=True,  # whether or not it is defined in time
   ....:     description='Precipitation generated via grid-scale condensation',
   ....: )
   ....: 

In [18]: precip_convective = Var(
   ....:     name='precip_convective',
   ....:     alt_names=('convection_rain', 'prec_conv'),
   ....:     def_time=True,
   ....:     description='Precipitation generated by convective parameterization',
   ....: )
   ....: 

When it comes time to load data corresponding to either of these from one or more particular netCDF files, aospy will search for variables matching either its name or any of the names in alt_names, stopping at the first successful match. This makes the common problem of model-specific variable names a breeze: if you end up with data with a new name for your variable, just add it to alt_names.
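
Schematically, the lookup behaves like the following sketch (not aospy's actual code; find_data_var is a hypothetical helper):

def find_data_var(dataset, var):
    """Return the first variable matching a Var's name or alt_names."""
    for name in (var.name,) + tuple(var.alt_names or ()):
        if name in dataset:
            return dataset[name]
    raise KeyError('No variable found in data matching {}'.format(var.name))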

Warning

This assumes that the name and all alternate names are unique to that variable, i.e. that in none of your data do those names actually signify something else. If they do signify something else somewhere in your data, aospy can potentially grab the wrong data without issuing an error message or warning.

Next, we’ll create functions that compute the total precipitation and convective precipitation fraction and combine them with the above aospy.Var objects to define the new aospy.Var objects:

In [19]: def total_precip(condensation_rain, convection_rain):
   ....:     """Sum of large-scale and convective precipitation."""
   ....:     return condensation_rain + convection_rain
   ....: 

In [20]: def conv_precip_frac(precip_largescale, precip_convective):
   ....:     """Fraction of total precip that is from convection parameterization."""
   ....:     total = total_precip(precip_largescale, precip_convective)
   ....:     return precip_convective / total.where(total)
   ....: 

In [21]: precip_total = Var(
   ....:     name='precip_total',
   ....:     def_time=True,
   ....:     func=total_precip,
   ....:     variables=(precip_largescale, precip_convective),
   ....: )
   ....: 

In [22]: precip_conv_frac = Var(
   ....:    name='precip_conv_frac',
   ....:    def_time=True,
   ....:    func=conv_precip_frac,
   ....:    variables=(precip_largescale, precip_convective),
   ....: )
   ....: 

Notice the func and variables attributes that weren’t in the prior Var constructors. These signify the function to use and the physical quantities to pass to that function in order to compute the quantity.

As of aospy version 0.3, Var objects are computed recursively; this means that as long as things eventually lead back to model-native quantities, you can express a computed variable (i.e. one with func and variables attributes) in terms of other computed variables. For example, we could equivalently express precip_conv_frac more simply in terms of precip_total, along the lines of the following sketch:
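
def conv_precip_frac(precip_convective, precip_total):
    """Fraction of total precip from the convective parameterization."""
    return precip_convective / precip_total.where(precip_total)

precip_conv_frac = Var(
    name='precip_conv_frac',
    def_time=True,
    func=conv_precip_frac,
    variables=(precip_convective, precip_total),
)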

In this case, aospy will automatically know to load in precip_largescale and precip_convective in order to compute precip_total before passing it along to the function specified in precip_conv_frac. Any depth of recursion is supported.

Note

Although variables is passed a tuple of Var objects corresponding to the physical quantities passed to func, func should be a function whose arguments are the xarray.DataArray objects corresponding to those variables. aospy uses the Var objects to load the DataArrays and then passes them to the function.

This enables you to write simple, expressive functions comprising only the physical operations to perform (since the “data wrangling” part has been handled already).

Warning

Order matters in the tuple of aospy.Var objects passed to the variables attribute: it must match the order of the call signature of the function passed to func.

E.g. in precip_conv_frac above, if we had mistakenly done variables=(precip_convective, precip_largescale), the calculation would execute without error, but all of the results would be physically wrong.

Geographic regions

Last, we define the geographic regions over which to perform aggregations and add them to example_proj. We’ll look at the whole globe and at the Tropics:

In [23]: from aospy import Region

In [24]: globe = Region(
   ....:     name='globe',
   ....:     description='Entire globe',
   ....:     west_bound=0,
   ....:     east_bound=360,
   ....:     south_bound=-90,
   ....:     north_bound=90,
   ....:     do_land_mask=False
   ....: )
   ....: 

In [25]: tropics = Region(
   ....:     name='tropics',
   ....:     description='Global tropics, defined as 30S-30N',
   ....:     west_bound=0,
   ....:     east_bound=360,
   ....:     south_bound=-30,
   ....:     north_bound=30,
   ....:     do_land_mask=False
   ....: )
   ....: 

In [26]: example_proj.regions = [globe, tropics]

We now have all of the needed metadata in place. So let’s start crunching numbers!

Submitting calculations

Using aospy.submit_mult_calcs()

Having put in the legwork above of describing our data and the physical quantities we wish to compute, we can submit our desired calculations for execution using aospy.submit_mult_calcs(). Its sole required argument is a dictionary specifying all of the desired parameter combinations.

In the example below, we import and use the example_obj_lib module that is included with aospy and whose objects are essentially identical to the ones we’ve defined above.

In [27]: from aospy.examples import example_obj_lib as lib

In [28]: calc_suite_specs = dict(
   ....:     library=lib,
   ....:     projects=[lib.example_proj],
   ....:     models=[lib.example_model],
   ....:     runs=[lib.example_run],
   ....:     variables=[lib.precip_largescale, lib.precip_convective,
   ....:                lib.precip_total, lib.precip_conv_frac],
   ....:     regions='all',
   ....:     date_ranges='default',
   ....:     output_time_intervals=['ann'],
   ....:     output_time_regional_reductions=['av', 'reg.av'],
   ....:     output_vertical_reductions=[None],
   ....:     input_time_intervals=['monthly'],
   ....:     input_time_datatypes=['ts'],
   ....:     input_time_offsets=[None],
   ....:     input_vertical_datatypes=[False],
   ....: )
   ....: 

See the API Reference on aospy.submit_mult_calcs() for more on calc_suite_specs, including accepted values for each key.

submit_mult_calcs() also accepts a second dictionary specifying some options regarding how we want aospy to display, execute, and save our calculations. For the sake of this simple demonstration, we’ll suppress the prompt to confirm the calculations, submit them in serial rather than parallel, and suppress writing backup output to .tar files:

In [29]: calc_exec_options = dict(prompt_verify=False, parallelize=False,
   ....:                          write_to_tar=False)
   ....: 

Now let’s submit this for execution:

In [30]: from aospy import submit_mult_calcs

In [31]: calcs = submit_mult_calcs(calc_suite_specs, calc_exec_options)

This permutes over all of the parameter settings in calc_suite_specs, generating and executing the resulting calculation. In this case, it will compute all four variables and perform annual averages, both for each gridpoint and regionally averaged.

Although we do not show it here, this also prints logging information to the terminal at various steps during each calculation, including the filepaths to the netCDF files written to disk of the results.

Warning

For date ranges specified using tuples of datetime-like objects, aospy will check to make sure that datapoints exist for the full extent of the time ranges specified. For date ranges specified as tuples of strings, however, this check is currently not implemented. This is mostly harmless (i.e. it will not change the results of calculations); however, it can result in files whose labels do not accurately represent the actual time bounds of the calculation if you specify string date ranges that span more than the interval of the input data.
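
For instance, a sketch of the two styles of specifying date_ranges, using the example data's years:

import datetime

# Tuple of datetime-like objects: aospy verifies that data exists
# spanning the full range.
date_ranges = [(datetime.datetime(4, 1, 1), datetime.datetime(6, 12, 31))]

# Tuple of strings: no such check is currently performed.
date_ranges = [('0004', '0006')]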

Results

The result is a list of aospy.Calc objects, one per calculation.

In [32]: calcs
Out[32]: 
[<aospy.Calc instance: precip_conv_frac, example_proj, example_model, example_run>,
 <aospy.Calc instance: precip_convective, example_proj, example_model, example_run>,
 <aospy.Calc instance: precip_largescale, example_proj, example_model, example_run>,
 <aospy.Calc instance: precip_total, example_proj, example_model, example_run>]

Each aospy.Calc object includes the paths to the output

In [33]: calcs[0].path_out
Out[33]: 
{'av': 'example-output/example_proj/example_model/example_run/precip_conv_frac/precip_conv_frac.ann.av.from_monthly_ts.example_model.example_run.0004-0006.nc',
 'reg.av': 'example-output/example_proj/example_model/example_run/precip_conv_frac/precip_conv_frac.ann.reg.av.from_monthly_ts.example_model.example_run.0004-0006.nc'}

and the results of each output type

In [34]: calcs[0].data_out
Out[34]: 
{'av': <xarray.DataArray (lat: 64, lon: 128)>
 array([[5.825489e-11, 5.568656e-06, 3.746100e-05, ..., 0.000000e+00,
         1.773260e-11, 2.142449e-11],
        [1.718738e-03, 1.864318e-03, 1.876379e-03, ..., 1.584265e-03,
         1.650416e-03, 1.634984e-03],
        [4.336546e-03, 3.980294e-03, 3.332168e-03, ..., 5.023761e-03,
         4.934178e-03, 4.846593e-03],
        ...,
        [4.826577e-03, 5.116871e-03, 5.306013e-03, ..., 3.952671e-03,
         3.857220e-03, 4.159514e-03],
        [3.559006e-04, 4.071099e-04, 4.688400e-04, ..., 3.086923e-04,
         3.293162e-04, 3.424217e-04],
        [2.896832e-11, 4.326204e-11, 2.680294e-11, ..., 4.254056e-11,
         7.040103e-11, 9.109586e-11]])
 Coordinates:
   * lon                  (lon) float64 0.0 2.812 5.625 ... 351.6 354.4 357.2
   * lat                  (lat) float64 -87.86 -85.1 -82.31 ... 82.31 85.1 87.86
     zsurf                (lat, lon) float32 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
     raw_data_start_date  object 0004-01-01 00:00:00
     raw_data_end_date    object 0007-01-01 00:00:00
     subset_start_date    object 0004-01-01 00:00:00
     subset_end_date      object 0006-12-31 00:00:00
     sfc_area             (lat, lon) float64 3.553e+09 3.553e+09 ... 3.553e+09
     land_mask            (lat, lon) float64 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
 Attributes:
     units:        
     description:  Fraction of total precip that is from convection parameteri...,
 'reg.av': <xarray.Dataset>
 Dimensions:              ()
 Coordinates:
     raw_data_start_date  object 0004-01-01 00:00:00
     raw_data_end_date    object 0007-01-01 00:00:00
     subset_start_date    object 0004-01-01 00:00:00
     subset_end_date      object 0006-12-31 00:00:00
 Data variables:
     tropics              float64 0.808
     globe                float64 0.598}

Note

Notice that the variable’s name and description have been copied to the resulting Dataset (and hence also to the netCDF file saved to disk). This enables you to better understand what the physical quantity is, even if you don’t have the original Var definition on hand.

Let’s plot (using matplotlib) the time average at each gridcell of all four variables. For demonstration purposes, we’ll load the data that was saved to disk using xarray rather than getting it directly from the data_out attribute as above.

In [35]: from matplotlib import pyplot as plt

In [36]: fig = plt.figure()

In [37]: for i, calc in enumerate(calcs):
   ....:     ax = fig.add_subplot(2, 2, i+1)
   ....:     arr = xr.open_dataset(calc.path_out['av']).to_array()
   ....:     if calc.name != precip_conv_frac.name:
   ....:         arr *= 86400  # convert to units mm per day
   ....:     arr.plot(ax=ax)
   ....:     ax.set_title(calc.name)
   ....:     ax.set_xticks(range(0, 361, 60))
   ....:     ax.set_yticks(range(-90, 91, 30))
   ....: 

In [38]: plt.tight_layout()

In [39]: plt.show()
_images/plot_av.png

We see that precipitation maximizes at the equator and has a secondary maximum in the mid-latitudes. [5] Also, the convective precipitation dominates the total in the Tropics, but moving poleward the gridscale condensation plays an increasingly larger fractional role (note different colorscales in each panel). [6]

Now let’s examine the regional averages, converting to units of mm per day where applicable:

In [40]: for calc in calcs:
   ....:     ds = xr.open_dataset(calc.path_out['reg.av'])
   ....:     if calc.name != precip_conv_frac.name:
   ....:         ds *= 86400  # convert to units mm/day
   ....:     print(calc.name, ds, '\n')
   ....: 
precip_conv_frac <xarray.Dataset>
Dimensions:              ()
Coordinates:
    raw_data_start_date  object ...
    raw_data_end_date    object ...
    subset_start_date    object ...
    subset_end_date      object ...
Data variables:
    tropics              float64 ...
    globe                float64 ... 

precip_convective <xarray.Dataset>
Dimensions:              ()
Coordinates:
    raw_data_end_date    object ...
    subset_start_date    object ...
    subset_end_date      object ...
    raw_data_start_date  object ...
Data variables:
    tropics              float64 3.055
    globe                float64 2.11 

precip_largescale <xarray.Dataset>
Dimensions:              ()
Coordinates:
    raw_data_end_date    object ...
    subset_start_date    object ...
    subset_end_date      object ...
    raw_data_start_date  object ...
Data variables:
    tropics              float64 0.4551
    globe                float64 0.9149 

precip_total <xarray.Dataset>
Dimensions:              ()
Coordinates:
    raw_data_end_date    object ...
    subset_start_date    object ...
    subset_end_date      object ...
    raw_data_start_date  object ...
Data variables:
    tropics              float64 3.51
    globe                float64 3.025 

As was evident from the plots, we see that most precipitation (80.8%) in the tropics comes from convective rainfall, but averaged over the globe the large-scale condensation is a more equal player (40.2% for large-scale, 59.8% for convective).

Beyond this simple example

Scaling up

In this case, we computed time averages of four variables, both at each gridpoint (which we’ll call 1 calculation) and averaged over two regions, yielding (4 variables)*(1 gridcell operation + (2 regions)*(1 regional operation)) = 12 total calculations executed. Not bad, but 12 calculations is few enough that we probably could have handled them without aospy.

The power of aospy is that, with the infrastructure we’ve put in place, we can now fire off additional calculations at any time. Some examples:

  • Set output_time_regional_reductions=['ts', 'std', 'reg.ts', 'reg.std'] : calculate the timeseries (‘ts’) and standard deviation (‘std’) of annual mean values at each gridpoint and for the regional averages.
  • Set output_time_intervals=range(1, 13) : average across years for each January (1), each February (2), etc. through December (12). [7]
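
A sketch of the updated specification dict with both of these changes applied, reusing the calc_suite_specs and calc_exec_options defined earlier:

calc_suite_specs.update(
    output_time_regional_reductions=['ts', 'std', 'reg.ts', 'reg.std'],
    output_time_intervals=range(1, 13),  # Jan (1) through Dec (12)
)
calcs = submit_mult_calcs(calc_suite_specs, calc_exec_options)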

With these settings, the number of calculations is now (4 variables)*(2 gridcell operations + (2 regions)*(2 regional operations))*(12 temporal averages) = 288 calculations submitted with a single command.

Modifying your object library

We can also add new objects to our object library at any time. For example, suppose we performed a new simulation in which we modified the formulation of the convective parameterization. All we would have to do is create a corresponding aospy.Run object, and then we can execute calculations for that simulation. And likewise for models, projects, variables, and regions.

As a real-world example, two of aospy’s developers use aospy in their own scientific research, with multiple projects each comprising multiple models, simulations, etc. They routinely fire off thousands of calculations at once. And thanks to the highly organized and metadata-rich directory structure and filenames of the aospy output netCDF files, all of the resulting data is easy to find and use.

Example “main” script

Finally, aospy ships with a “main” script for submitting calculations that is pre-populated with the objects from the example object library. It also comes with in-line instructions on how to use it, whether you want to keep playing with the example library or modify it for use with your own object library.

It is located in the “examples” directory of your aospy installation. Find it by typing python -c "import os, aospy; print(os.path.join(aospy.__path__[0], 'examples', 'aospy_main.py'))" from your terminal.

Footnotes

[1]An “idealized climate model” is a model that, for the sake of computational efficiency and conceptual simplicity, omits and/or simplifies various processes relative to how they are computed in full, production-class models. The particular model used here is described in Frierson et al 2006.
[2]An “aquaplanet” is simply a climate model in which the the surface is entirely ocean, i.e. there is no land. Interactions between atmospheric and land processes are complicated, and so an aquaplanet avoids those complications while still generating a climate (when zonally averaged, i.e. averaged around each latitude circle) that roughly resembles that of the real Earth’s.
[3]Most climate models generate precipitation through two separate pathways: (1) direct saturation of a whole gridbox, which results in condensation and precipitation, and (2) a “convective parameterization.” The latter simulates the precipitation that, due to subgrid-scale variability, can be expected to occur at some fraction of the area within a gridcell, even though the cell as a whole isn’t saturated. The total precipitation is simply the sum of these “large-scale” and “convective” components.
[4]In this case, the model being used is an aquaplanet, so the mask will be simply all ocean. But this is not generally the case – comprehensive climate and weather models include Earth’s full continental geometry and land topography (at least as well as can be resolved at their particular horizontal grid resolution).
[5]This equatorial rainband is called the Intertropical Convergence Zone, or ITCZ. In this simulation, the imposed solar radiation is fixed at Earth’s annual mean value, which is symmetric about the equator. The ITCZ typically follows the solar radiation maximum, hence its position in this case directly on the equator.
[6]This is a very common result. The gridcells of many climate models are several hundred kilometers by several hundred kilometers in area. In Earth’s Tropics, most rainfall is generated by cumulus towers that are much smaller than this. But in the mid-latitudes, a phenomenon known as baroclinic instability generates much larger eddies that can span several hundred kilometers.
[7]In this particular simulation, the boundary conditions are constant in time, so there is no seasonal cycle. But we could use these monthly averages to confirm that’s actually the case, i.e. that we didn’t accidentally use time-varying solar radiation when we ran the model.

Installation

Supported platforms

  • Operating system: Windows, MacOS/Mac OS X, and Linux
  • Python: 3.5, 3.6, and 3.7

Note

We highly recommend installing and using the free Anaconda (or Miniconda) distribution of Python, which works on Mac, Linux, and Windows, both on personal computers and institutional clusters, and doesn’t require root permissions.

See e.g. this blog post for instructions on how to install Miniconda on a large cluster without root permission.

Alternative method #1: pip

aospy is available from the official Python Packaging Index (PyPI) via pip:

pip install aospy

Alternative method #2: clone from Github

You can also clone the Github repo directly:

git clone https://www.github.com/spencerahill/aospy.git
cd aospy
python setup.py install

Verifying proper installation

Once installed via any of these methods, you can run aospy’s suite of tests using py.test. From the top-level directory of the aospy installation:

conda install pytest  # if you don't have it already; or 'pip install pytest'
py.test aospy

If you don’t know the directory where aospy was installed, you can find it via:

python -c "import aospy; print(aospy.__path__[0])"

If the pytest command results in any error messages or test failures, something has gone wrong; please refer to the Troubleshooting information below.

Troubleshooting

Please search through our Issues page on Github and our mailing list to see if anybody else has had the same problem you’re facing. If no one has, please send a message to the list or open a new Issue.

Please don’t hesitate, especially if you’re new to the package and/or Python! We are eager to help people get started. Even if you suspect it’s something simple or obvious, that just means we didn’t make it clear or easy enough!

API Reference

Here we provide the reference documentation for aospy’s public API. If you are new to the package and/or just trying to get a feel for the overall workflow, you are better off starting in the Overview, Using aospy, or Examples sections of this documentation.

Warning

aospy is under active development. While we strive to maintain backwards compatibility, it is likely that some breaking changes to the codebase will occur in the future as aospy is improved.

Core Hierarchy for Input Data

aospy provides three classes for specifying the location and characteristics of data saved on disk as netCDF files that the user wishes to use as input data for aospy calculations: Proj, Model, and Run.

Proj
class aospy.proj.Proj(name, description=None, models=None, default_models=None, regions=None, direc_out='', tar_direc_out='')[source]

An object that describes a single project that will use aospy.

This is the top-level class in the aospy hierarchy of data representations. It is meant to contain all of the Model, Run, and Region objects that are of relevance to a particular research project. (Any of these may be used by multiple Proj objects.)

The Proj class itself provides little functionality, but it is an important means of organizing a user’s work across different projects. In particular, the output of all calculations using aospy.Calc are saved in a directory structure whose root is that of the Proj object specified for that calculation.

Attributes:
name : str

The project’s name

description : str

A description of the project

direc_out, tar_direc_out : str

The paths to the root directories of, respectively, the standard and .tar versions of the output of aospy calculations saved to disk.

models : dict

A dictionary with entries of the form {model_obj.name: model_obj}, for each of this Proj’s child model objects

default_models : dict

The default model objects on which to perform calculations via aospy.Calc if not otherwise specified

regions : dict

A dictionary with entries of the form {region_obj.name: region_obj}, for each of this Proj’s child region objects

__init__(name, description=None, models=None, default_models=None, regions=None, direc_out='', tar_direc_out='')[source]
Parameters:
name : str

The project’s name. This should be unique from that of any other Proj objects being used.

description : str, optional

A description of the project. This is not used internally by aospy; it is solely for the user’s information.

regions : {None, sequence of aospy.Region objects}, optional

The desired regions over which to perform region-average calculations.

models : {None, sequence of aospy.Model objects}, optional

The child Model objects of this project.

default_models : {None, sequence of aospy.Model objects}, optional

The subset of this Proj’s models over which to perform calculations by default.

direc_out, tar_direc_out : str

Path to the root directories of where, respectively, regular output and a .tar-version of the output will be saved to disk.

See also

aospy.Model, aospy.Region, aospy.Run
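
As an illustration, a minimal sketch of instantiating a Proj (example_model and globe stand in for aospy.Model and aospy.Region objects defined elsewhere in one’s object library, and the output paths are placeholders):

>>> from aospy import Proj
>>> example_proj = Proj(
...     'example_proj',
...     direc_out='/path/to/aospy-output',
...     tar_direc_out='/path/to/aospy-output-tar',
...     models=[example_model],
...     default_models=[example_model],
...     regions=[globe],
... )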

Model
class aospy.model.Model(name=None, description=None, proj=None, grid_file_paths=None, default_start_date=None, default_end_date=None, runs=None, default_runs=None, load_grid_data=False, grid_attrs=None)[source]

An object that describes a single climate or weather model.

Each Model object is associated with a parent Proj object and also with one or more child Run objects.

If aospy is being used to work with non climate- or weather-model data, the Model object can be used e.g. to represent a gridded observational product, with its child Run objects representing different released versions of that dataset.

Attributes:
name : str

The model’s name

description : str

A description of the model

proj : {None, aospy.Proj}

The model’s parent aospy.Proj object

runs : list

A list of this model’s child Run objects

default_runs : list

The default subset of child run objects on which to perform calculations via aospy.Calc with this model if not otherwise specified

grid_file_paths : list

The paths to netCDF files stored on disk from which the model’s coordinate data can be taken.

default_start_date, default_end_date : datetime.datetime

The default start and end dates of any calculations using this Model

__init__(name=None, description=None, proj=None, grid_file_paths=None, default_start_date=None, default_end_date=None, runs=None, default_runs=None, load_grid_data=False, grid_attrs=None)[source]
Parameters:
name : str

The model’s name. This must be unique from that of any other Model objects being used by the parent Proj.

description : str, optional

A description of the model. This is not used internally by aospy; it is solely for the user’s information.

proj : {None, aospy.Proj}, optional

The parent Proj object. When the parent Proj object is instantiated with this Model included in its models attribute, this will be overwritten with that Proj object.

grid_file_paths : {None, sequence of strings}, optional

The paths to netCDF files stored on disk from which the model’s coordinate data can be taken.

default_start_date : {None, datetime.datetime}, optional

Default start date of calculations to be performed using this Model.

default_end_date : {None, datetime.datetime}, optional

Default end date of calculations to be performed using this Model.

runs : {None, sequence of aospy.Run objects}, optional

The child run objects of this Model

default_runs : {None, sequence of aospy.Run objects}, optional

The subset of this Model’s runs over which to perform calculations by default.

load_grid_data : bool, optional (default False)

Whether or not to load the grid data specified by ‘grid_file_paths’ upon initialization

grid_attrs : dict, optional (default None)

Dictionary mapping aospy internal names of grid attributes to their corresponding names used in a particular model. E.g. {TIME_STR: 'T'}. While aospy checks for a number of alternative names for grid attributes used by various models, it is not possible to anticipate all possible names. This option allows the user to explicitly tell aospy which variables correspond to which internal names; aospy will attempt to find any internal names not provided in this dictionary in the usual way. For a list of built-in alternative names, see the table in the documentation.

See also

aospy.DataLoader, aospy.Proj, aospy.Run

set_grid_data()[source]

Populate the attrs that hold grid data.
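
As an illustration, a minimal sketch of instantiating a Model (control and modified_conv stand in for aospy.Run objects defined elsewhere; the grid file path and dates are placeholders):

>>> import datetime
>>> from aospy import Model
>>> example_model = Model(
...     name='example_model',
...     grid_file_paths=['/path/to/grid_data.nc'],
...     default_start_date=datetime.datetime(4, 1, 1),
...     default_end_date=datetime.datetime(6, 12, 31),
...     runs=[control, modified_conv],
...     default_runs=[control],
... )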

Run
class aospy.run.Run(name=None, description=None, proj=None, default_start_date=None, default_end_date=None, data_loader=None)[source]

An object that describes a single model ‘run’ (i.e. simulation).

Each Run object is associated with a parent Model object. This parent attribute is not set by Run itself, however; it is set during the instantiation of the parent Model object.

If aospy is being used to work with non climate-model data, the Run object can be used e.g. to represent different versions of a gridded observational data product, with the parent Model representing that data product more generally.

Attributes:
name : str

The run’s name

description : str

A description of the run

proj : {None, aospy.Proj}

The run’s parent aospy.Proj object

default_start_date, default_end_date : datetime.datetime

The default start and end dates of any calculations using this Run

data_loader : aospy.DataLoader

The aospy.DataLoader object used to find data on disk corresponding to this Run object

__init__(name=None, description=None, proj=None, default_start_date=None, default_end_date=None, data_loader=None)[source]

Instantiate a Run object.

Parameters:
name : str

The run’s name. This must be unique from that of any other Run objects being used by the parent Model.

description : str, optional

A description of the run. This is not used internally by aospy; it is solely for the user’s information.

proj : {None, aospy.Proj}, optional

The parent Proj object.

data_loader : aospy.DataLoader

The DataLoader object used to find the data on disk to be used as inputs for aospy calculations for this run.

default_start_date, default_end_date : datetime.datetime, optional

Default start and end dates of calculations to be performed using this Run.

See also

aospy.DataLoader, aospy.Model
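
As an illustration, a minimal sketch of instantiating a Run backed by GFDL-style post-processed output (the paths and dates are placeholders; see the DataLoaders section below for details on GFDLDataLoader):

>>> import datetime
>>> from aospy import Run
>>> from aospy.data_loader import GFDLDataLoader
>>> control = Run(
...     name='control',
...     description='Control simulation',
...     data_loader=GFDLDataLoader(
...         data_direc='/archive/control/pp',
...         data_dur=5,
...         data_start_date=datetime.datetime(2000, 1, 1),
...         data_end_date=datetime.datetime(2010, 12, 31),
...     ),
... )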

DataLoaders

Run objects rely on a helper “data loader” to specify how to find their underlying data that is saved on disk. This mapping of variables, time ranges, and potentially other parameters to the location of the corresponding data on disk can differ among modeling centers or even between different models at the same center.

Currently supported data loader types are DictDataLoader, NestedDictDataLoader, and GFDLDataLoader. Each of these inherits from the abstract base DataLoader class.

Note

An important consideration can be the datatype used to store values in your datasets. In particular, if the float32 datatype is used in storage, it can lead to undesired inaccuracies in the computation of reduction operations (like means) due to upstream issues (see pydata/xarray#1346 for more information). To address this, it is recommended to always upcast float32 data to float64. This behavior is turned on by default. To disable it, set the upcast_float32 argument in your DataLoader constructors to False.
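
For example, assuming a file_map dict like those in the DictDataLoader examples below:

>>> data_loader = DictDataLoader(file_map, upcast_float32=False)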

class aospy.data_loader.DataLoader[source]

A fundamental DataLoader object.

__init__(self, /, *args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

load_variable(var=None, start_date=None, end_date=None, time_offset=None, grid_attrs=None, **DataAttrs)[source]

Load a DataArray for requested variable and time range.

Automatically renames all grid attributes to match aospy conventions.

Parameters:
var : Var

aospy Var object

start_date : datetime.datetime

start date for interval

end_date : datetime.datetime

end date for interval

time_offset : dict

Option to add a time offset to the time coordinate to correct for incorrect metadata.

grid_attrs : dict (optional)

Overriding dictionary of grid attributes mapping aospy internal names to names of grid attributes used in a particular model.

**DataAttrs

Attributes needed to identify a unique set of files to load from

Returns:
da : DataArray

DataArray for the specified variable, date range, and interval.

recursively_compute_variable(var, start_date=None, end_date=None, time_offset=None, model=None, **DataAttrs)[source]

Compute a variable recursively, loading data where needed.

An obvious requirement here is that the variable must eventually be able to be expressed in terms of model-native quantities; otherwise the recursion will never stop.

Parameters:
var : Var

aospy Var object

start_date : datetime.datetime

start date for interval

end_date : datetime.datetime

end date for interval

time_offset : dict

Option to add a time offset to the time coordinate to correct for incorrect metadata.

model : Model

aospy Model object (optional)

**DataAttrs

Attributes needed to identify a unique set of files to load from

Returns:
da : DataArray

DataArray for the specified variable, date range, and interval.

class aospy.data_loader.DictDataLoader(file_map=None, upcast_float32=True, data_vars='minimal', coords='minimal', preprocess_func=<function DictDataLoader.<lambda>>)[source]

A DataLoader that uses a dict mapping lists of files to string tags.

This is the simplest DataLoader; it is useful for instance if one is dealing with raw model history files, which tend to group all variables of a single output interval into single filesets. The intvl_in parameter is a string description of the time frequency of the data one is referencing (e.g. ‘monthly’, ‘daily’, ‘3-hourly’). In principle, one can give it any string value.

Parameters:
file_map : dict

A dict mapping an input interval to a list of files

upcast_float32 : bool (default True)

Whether to cast loaded DataArrays with the float32 datatype to float64 before doing calculations

data_vars : str (default ‘minimal’)

Mode for concatenating data variables in call to xr.open_mfdataset

coords : str (default ‘minimal’)

Mode for concatenating coordinate variables in call to xr.open_mfdataset.

preprocess_func : function (optional)

A function to apply to every Dataset before processing in aospy. Must take a Dataset and **kwargs as its two arguments.

Examples

Case of two sets of files, one with monthly average output, and one with 3-hourly output.

>>> file_map = {'monthly': '000[4-6]0101.atmos_month.nc',
...             '3hr': '000[4-6]0101.atmos_8xday.nc'}
>>> data_loader = DictDataLoader(file_map)

If one wanted to correct a CF-incompliant units attribute on each Dataset read in, which depended on the intvl_in of the fileset, one could define a preprocess_func that took into account the intvl_in keyword argument.

>>> def preprocess(ds, **kwargs):
...     if kwargs['intvl_in'] == 'monthly':
...         ds['time'].attrs['units'] = 'days since 0001-01-01 00:00:00'
...     if kwargs['intvl_in'] == '3hr':
...         ds['time'].attrs['units'] = 'hours since 0001-01-01 00:00:00'
...     return ds
>>> data_loader = DictDataLoader(file_map, preprocess_func=preprocess)
__init__(file_map=None, upcast_float32=True, data_vars='minimal', coords='minimal', preprocess_func=<function DictDataLoader.<lambda>>)[source]

Create a new DictDataLoader.

class aospy.data_loader.NestedDictDataLoader(file_map=None, upcast_float32=True, data_vars='minimal', coords='minimal', preprocess_func=<function NestedDictDataLoader.<lambda>>)[source]

DataLoader that uses a nested dictionary mapping to load files.

This is the most flexible existing type of DataLoader; it allows for the specification of different sets of files for different variables. The intvl_in parameter is a string description of the time frequency of the data one is referencing (e.g. ‘monthly’, ‘daily’, ‘3-hourly’). In principle, one can give it any string value. The variable name can be any variable name in your aospy object library (including alternative names).

Parameters:
file_map : dict

A dict mapping intvl_in to dictionaries mapping Var objects to lists of files

upcast_float32 : bool (default True)

Whether to cast loaded DataArrays with the float32 datatype to float64 before doing calculations

data_vars : str (default ‘minimal’)

Mode for concatenating data variables in call to xr.open_mfdataset

coords : str (default ‘minimal’)

Mode for concatenating coordinate variables in call to xr.open_mfdataset.

preprocess_func : function (optional)

A function to apply to every Dataset before processing in aospy. Must take a Dataset and **kwargs as its two arguments.

Examples

Case of a set of monthly average files for large scale precipitation, and another monthly average set of files for convective precipitation.

>>> file_map = {'monthly': {'precl': '000[4-6]0101.precl.nc',
...                         'precc': '000[4-6]0101.precc.nc'}}
>>> data_loader = NestedDictDataLoader(file_map)

See aospy.data_loader.DictDataLoader for an example of a possible function to pass as a preprocess_func.

__init__(file_map=None, upcast_float32=True, data_vars='minimal', coords='minimal', preprocess_func=<function NestedDictDataLoader.<lambda>>)[source]

Create a new NestedDictDataLoader.

class aospy.data_loader.GFDLDataLoader(template=None, data_direc=None, data_dur=None, data_start_date=None, data_end_date=None, upcast_float32=None, data_vars=None, coords=None, preprocess_func=None)[source]

DataLoader for NOAA GFDL model output.

This is an example of a domain-specific custom DataLoader, designed specifically for finding files output by the Geophysical Fluid Dynamics Laboratory’s model history file post-processing tools.

Parameters:
template : GFDLDataLoader

Optional argument to specify a base GFDLDataLoader to inherit parameters from

data_direc : str

Root directory of data files

data_dur : int

Number of years included per post-processed file

data_start_date : datetime.datetime

Start date of data files

data_end_date : datetime.datetime

End date of data files

upcast_float32 : bool (default True)

Whether to cast loaded DataArrays with the float32 datatype to float64 before doing calculations

data_vars : str (default ‘minimal’)

Mode for concatenating data variables in call to xr.open_mfdataset

coords : str (default ‘minimal’)

Mode for concatenating coordinate variables in call to xr.open_mfdataset.

preprocess_func : function (optional)

A function to apply to every Dataset before processing in aospy. Must take a Dataset and **kwargs as its two arguments.

Examples

Case without a template to start from.

>>> base = GFDLDataLoader(data_direc='/archive/control/pp', data_dur=5,
...                       data_start_date=datetime(2000, 1, 1),
...                       data_end_date=datetime(2010, 12, 31))

Case with a starting template.

>>> data_loader = GFDLDataLoader(base, data_direc='/archive/2xCO2/pp')

See aospy.data_loader.DictDataLoader for an example of a possible function to pass as a preprocess_func.

__init__(template=None, data_direc=None, data_dur=None, data_start_date=None, data_end_date=None, upcast_float32=None, data_vars=None, coords=None, preprocess_func=None)[source]

Create a new GFDLDataLoader.

Variables and Regions

The Var and Region classes are used to represent, respectively, physical quantities the user wishes to be able to compute and geographical regions over which the user wishes to aggregate their calculations.

Whereas the Proj - Model - Run hierarchy is used to describe the data resulting from particular model simulations, Var and Region represent the properties of generic physical entities that do not depend on the underlying data.

Var
class aospy.var.Var(name, alt_names=None, func=None, variables=None, units='', plot_units='', plot_units_conv=1, domain='atmos', description='', def_time=False, def_vert=False, def_lat=False, def_lon=False, math_str=False, colormap='RdBu_r', valid_range=None)[source]

An object representing a physical quantity to be computed.

Attributes:
name : str

The variable’s name

alt_names : tuple of strings

All other names that the variable may be referred to in the input data

names : tuple of strings

The combination of name and alt_names

description : str

A description of the variable

func : function

The function with which to compute the variable

variables : sequence of aospy.Var objects

The variables passed to func to compute it

units : str

The variable’s physical units

domain : str

The physical domain of the variable, e.g. ‘atmos’, ‘ocean’, or ‘land’

def_time, def_vert, def_lat, def_lon : bool

Whether the variable is defined, respectively, in time, vertically, in latitude, and in longitude

math_str : str

The mathematical representation of the variable

colormap : str

The name of the default colormap to be used in plots of this variable

valid_range : length-2 tuple

The range of values outside which to flag as unphysical/erroneous

__init__(name, alt_names=None, func=None, variables=None, units='', plot_units='', plot_units_conv=1, domain='atmos', description='', def_time=False, def_vert=False, def_lat=False, def_lon=False, math_str=False, colormap='RdBu_r', valid_range=None)[source]

Instantiate a Var object.

Parameters:
name : str

The variable’s name

alt_names : tuple of strings

All other names that the variable might be referred to in any input data. Each of these should be unique to this variable in order to avoid loading the wrong quantity.

description : str

A description of the variable

func : function

The function with which to compute the variable

variables : sequence of aospy.Var objects

The variables passed to func to compute it. Order matters: whenever calculations are performed to generate data corresponding to this Var, the data corresponding to the elements of variables will be passed to func in the same order.

units : str

The variable’s physical units

domain : str

The physical domain of the variable, e.g. ‘atmos’, ‘ocean’, or ‘land’. This is only used by aospy by some types of DataLoader, including GFDLDataLoader.

def_time, def_vert, def_lat, def_lon : bool

Whether the variable is defined, respectively, in time, vertically, in latitude, and in longitude

math_str : str

The mathematical representation of the variable. This is typically a raw string of LaTeX math-mode, e.g. r’$T_\mathrm{sfc}$’ for surface temperature.

colormap : str

(Currently not used by aospy) The name of the default colormap to be used in plots of this variable.

valid_range : length-2 tuple

The range of values outside which to flag as unphysical/erroneous

mask_unphysical(data)[source]

Mask data array where values are outside physically valid range.

to_plot_units(data, dtype_vert=False)[source]

Convert the given data to plotting units.

Note

While for the sake of tracking metadata we encourage users to add a units attribute to each of their Var objects, these units attributes provide nothing more than descriptive value. One day we hope DataArrays produced by loading or computing variables will be truly units-aware (e.g. adding or subtracting two DataArrays with different units will lead to an error, or multiplying two DataArrays will result in a DataArray with new units), but we will leave that to upstream libraries to implement (see pydata/xarray#525 for more discussion).
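
To illustrate how one Var can be defined in terms of others, here is a sketch of a total-precipitation Var computed from its large-scale and convective components (the names, units, and function here are hypothetical):

>>> from aospy import Var
>>> precl = Var(name='precl', def_time=True,
...             description='Precipitation from large-scale condensation')
>>> precc = Var(name='precc', def_time=True,
...             description='Precipitation from convective parameterization')
>>> def total_precip(large_scale, convective):
...     """Sum of large-scale and convective precipitation."""
...     return large_scale + convective
>>> precip = Var(name='precip', func=total_precip, variables=(precl, precc),
...              def_time=True, units='kg m^-2 s^-1',
...              description='Total precipitation rate')

When a calculation requests precip, the data corresponding to precl and precc are loaded and passed to total_precip in the order given by variables, recursing further if those Vars are themselves computed from other Vars.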

Region
class aospy.region.Region(name='', description='', west_bound=None, east_bound=None, south_bound=None, north_bound=None, mask_bounds=None, do_land_mask=False)[source]

Geographical region over which to perform averages and other reductions.

Each Proj object includes a list of Region objects, which Calc uses to determine the regions over which to perform time reductions of region-averaged quantities.

Region boundaries are specified as either a single “rectangle” in latitude and longitude or the union of multiple such rectangles. In addition, a land or ocean mask can be applied.

See also

aospy.Calc.region_calcs

Attributes:
name : str

The region’s name

description : str

A description of the region

mask_bounds : tuple

The coordinates defining the lat-lon rectangle(s) that define(s) the region’s boundaries

do_land_mask

Whether values occurring over land, ocean, or neither are excluded from the region, and whether the included points must be strictly land or ocean or if fractional land/ocean values are included.

__init__(name='', description='', west_bound=None, east_bound=None, south_bound=None, north_bound=None, mask_bounds=None, do_land_mask=False)[source]

Instantiate a Region object.

Note that longitudes spanning (-180, 180), (0, 360), or any other range are all supported: -180 to 0, 180 to 360, etc. are interpreted as the western hemisphere, and 0-180, 360-540, etc. are interpreted as the eastern hemisphere. This is true both of the region definition and of any data upon which the region mask is applied.

E.g. suppose some of your data is defined on a -180 to 180 longitude grid, some of it is defined on a 0 to 360 grid, and some of it is defined on a -70 to 290 grid. A single Region object will work with all three of these.

Conversely, latitudes are always treated with -90 as the South Pole, 0 as the Equator, and 90 as the North Pole. Latitudes larger in magnitude than 90 are not physically meaningful.

Parameters:
name : str

The region’s name. This must be unique from that of any other Region objects being used by the overlying Proj.

description : str, optional

A description of the region. This is not used internally by aospy; it is solely for the user’s information.

west_bound, east_bound : { scalar, aospy.Longitude }, optional

The western and eastern boundaries of the region. All input longitudes are cast to aospy.Longitude objects, which essentially maps them to a 180W to 180E grid. The region’s longitudes always start at west_bound and move toward the east until reaching east_bound. This means that there are two distinct cases:

  • If, after this translation, west_bound is less than east_bound, the region includes the points east of west_bound and west of east_bound.
  • If west_bound is greater than east_bound, then the region is treated as wrapping around the dateline, i.e. its westernmost point is east_bound, and it includes all points moving east from there until west_bound.

If the region boundaries are more complicated than a single lat-lon rectangle, use mask_bounds instead.

south_bound, north_bound : scalar, optional

The southern, and northern boundaries, respectively, of the region. If the region boundaries are more complicated than a single lat-lon rectangle, use mask_bounds instead.

mask_bounds : sequence, optional

Each element is a length-4 sequence of the format (west_bound, east_bound, south_bound, north_bound), where each of these _bound arguments is of the form described above.

do_land_mask : { False, True, ‘ocean’, ‘strict_land’, ‘strict_ocean’}, optional

Determines what, if any, land mask is applied in addition to the mask defining the region’s boundaries. Default False.

  • True: apply the data’s full land mask
  • False: apply no mask
  • ‘ocean’: mask out land rather than ocean
  • ‘strict_land’: mask out all points that are not 100% land
  • ‘strict_ocean’: mask out all points that are not 100% ocean

Examples

Define a region spanning the entire globe:

>>> globe = Region(name='globe', west_bound=0, east_bound=360,
...                south_bound=-90, north_bound=90, do_land_mask=False)

Longitudes are handled as cyclic, so this definition could have equivalently used west_bound=-180, east_bound=180 or west_bound=200, east_bound=560, or anything else that spanned 360 degrees total.

Define a region corresponding to land in the mid-latitudes, which we’ll define as land points within 30-60 degrees latitude in both hemispheres. Because this region is the union of multiple lat-lon rectangles, it has to be defined using mask_bounds:

>>> land_mid_lats = Region(name='land_midlat', do_land_mask=True,
...                        mask_bounds=[(-180, 180, 30, 60),
...                                     (-180, 180, -60, -30)])

Define a region spanning the southern Tropical Atlantic ocean, which we’ll take to be all ocean points between 60W and 30E and between the Equator and 30S:

>>> atl_south_trop = Region(name='atl_sh_trop', west_bound=-60,
...                         east_bound=30, south_bound=-30,
...                         north_bound=0, do_land_mask='ocean')

Define the “opposite” region, i.e. all ocean points in the southern Tropics outside of the Atlantic. We simply swap west_bound and east_bound of the previous example:

>>> non_atl_south_trop = Region(name='non_atl_sh_trop', west_bound=30,
...                         east_bound=-60, south_bound=-30,
...                         north_bound=0, do_land_mask='ocean')
av(data, lon_str='lon', lat_str='lat', land_mask_str='land_mask', sfc_area_str='sfc_area')[source]

Time-average of region-averaged data.

Parameters:
data : xarray.DataArray

The array to compute the regional time-average of

lat_str, lon_str, land_mask_str, sfc_area_str : str, optional

The name of the latitude, longitude, land mask, and surface area coordinates, respectively, in data. Defaults are the corresponding values in aospy.internal_names.

Returns:
xarray.DataArray

The region-averaged and time-averaged data.

mask_var(data, lon_cyclic=True, lon_str='lon', lat_str='lat')[source]

Mask the given data outside this region.

Parameters:
data : xarray.DataArray

The array to be regionally masked.

lon_cyclic : bool, optional (default True)

Whether or not the longitudes of data span the whole globe, meaning that they should be wrapped around as necessary to cover the Region’s full width.

lon_str, lat_str : str, optional

The names of the longitude and latitude dimensions, respectively, in the data to be masked. Defaults are aospy.internal_names.LON_STR and aospy.internal_names.LAT_STR, respectively.

Returns:
xarray.DataArray

The original array with points outside of the region masked.

std(data, lon_str='lon', lat_str='lat', land_mask_str='land_mask', sfc_area_str='sfc_area')[source]

Temporal standard deviation of region-averaged data.

Parameters:
data : xarray.DataArray

The array to compute the regional time-average of

lat_str, lon_str, land_mask_str, sfc_area_str : str, optional

The name of the latitude, longitude, land mask, and surface area coordinates, respectively, in data. Defaults are the corresponding values in aospy.internal_names.

Returns:
xarray.DataArray

The temporal standard deviation of the region-averaged data

ts(data, lon_cyclic=True, lon_str='lon', lat_str='lat', land_mask_str='land_mask', sfc_area_str='sfc_area')[source]

Create yearly time-series of region-averaged data.

Parameters:
data : xarray.DataArray

The array to create the regional timeseries of

lon_cyclic : { None, True, False }, optional (default True)

Whether or not the longitudes of data span the whole globe, meaning that they should be wrapped around as necessary to cover the Region’s full width.

lat_str, lon_str, land_mask_str, sfc_area_str : str, optional

The name of the latitude, longitude, land mask, and surface area coordinates, respectively, in data. Defaults are the corresponding values in aospy.internal_names.

Returns:
xarray.DataArray

The timeseries of values averaged within the region and within each year, one value per year.

Calculations

Calc is the engine that combines the user’s specifications of (1) the data on disk via Proj, Model, and Run, (2) the physical quantity to compute and regions to aggregate over via Var and Region, and (3) the desired date range, time reduction method, and other characteristics, in order to actually perform the calculation.

Whereas Proj, Model, Run, Var, and Region are all intended to be saved in .py files for reuse, Calc objects are intended to be generated dynamically by a main script and then not retained after they have written their outputs to disk following the user’s specifications.

Moreover, if the main.py script is used to execute calculations, the user does not need to interface with Calc directly, in which case this section can be skipped entirely.

Also included is the automate module, which enables aospy (e.g. in the main script) to find objects in the user’s object library via their string names, rather than requiring the user to import the objects themselves.

Calc
class aospy.calc.Calc(proj=None, model=None, run=None, var=None, date_range=None, region=None, intvl_in=None, intvl_out=None, dtype_in_time=None, dtype_in_vert=None, dtype_out_time=None, dtype_out_vert=None, level=None, time_offset=None)[source]

Class for executing, saving, and loading a single computation.

__init__(proj=None, model=None, run=None, var=None, date_range=None, region=None, intvl_in=None, intvl_out=None, dtype_in_time=None, dtype_in_vert=None, dtype_out_time=None, dtype_out_vert=None, level=None, time_offset=None)[source]

Instantiate a Calc object.

Parameters:
proj : aospy.Proj object

The project for this calculation.

model : aospy.Model object

The model for this calculation.

run : aospy.Run object

The run for this calculation.

var : aospy.Var object

The variable for this calculation.

region : sequence of aospy.Region objects

The region(s) over which any regional reductions will be performed.

date_range : tuple of datetime.datetime objects

The range of dates over which to perform calculations.

intvl_in : {None, ‘annual’, ‘monthly’, ‘daily’, ‘6hr’, ‘3hr’}, optional

The time resolution of the input data.

dtype_in_time : {None, ‘inst’, ‘ts’, ‘av’, ‘av_ts’}, optional

What the time axis of the input data represents:

  • ‘inst’ : Timeseries of instantaneous values
  • ‘ts’ : Timeseries of averages over the period of each time-index
  • ‘av’ : A single value averaged over a date range
dtype_in_vert : {None, ‘pressure’, ‘sigma’}, optional

The vertical coordinate system used by the input data:

  • None : not defined vertically
  • ‘pressure’ : pressure coordinates
  • ‘sigma’ : hybrid sigma-pressure coordinates
intvl_out : {‘ann’, season-string, month-integer}

The sub-annual time interval over which to compute:

  • ‘ann’ : Annual mean
  • season-string : E.g. ‘JJA’ for June-July-August
  • month-integer : 1 for January, 2 for February, etc.
dtype_out_time : tuple with elements being one or more of:
  • Gridpoint-by-gridpoint output:
    • ‘av’ : Gridpoint-by-gridpoint time-average
    • ‘std’ : Gridpoint-by-gridpoint temporal standard deviation
    • ‘ts’ : Gridpoint-by-gridpoint time-series
  • Averages over each region specified via region:
    • ‘reg.av’, ‘reg.std’, ‘reg.ts’ : analogous to ‘av’, ‘std’, ‘ts’
dtype_out_vert : {None, ‘vert_av’, ‘vert_int’}, optional

How to reduce the data vertically:

  • None : no vertical reduction (i.e. output is defined vertically)
  • ‘vert_av’ : mass-weighted vertical average
  • ‘vert_int’ : mass-weighted vertical integral
time_offset : {None, dict}, optional

How to offset input data in time to correct for metadata errors

ARR_XARRAY_NAME = 'aospy_result'
compute(write_to_tar=True)[source]

Perform all desired calculations on the data and save externally.

load(dtype_out_time, dtype_out_vert=False, region=False, plot_units=False, mask_unphysical=False)[source]

Load the data from the object if possible or from disk.

region_calcs(arr, func)[source]

Perform a calculation for all regions.

save(data, dtype_out_time, dtype_out_vert=False, save_files=True, write_to_tar=False)[source]

Save aospy data to data_out attr and to an external file.
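
Putting the pieces together, a single calculation might be instantiated and executed as follows (example_proj, example_model, control, precip, and globe all stand in for objects from a user’s object library):

>>> import datetime
>>> from aospy import Calc
>>> calc = Calc(proj=example_proj, model=example_model, run=control,
...             var=precip, region=[globe],
...             date_range=(datetime.datetime(4, 1, 1),
...                         datetime.datetime(6, 12, 31)),
...             intvl_in='monthly', dtype_in_time='ts',
...             intvl_out='ann', dtype_out_time=('av', 'reg.av'))
>>> calc.compute()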

automate

Functionality for specifying and cycling through multiple calculations.

exception aospy.automate.AospyException[source]

Base exception class for the aospy package.

class aospy.automate.CalcSuite(calc_suite_specs)[source]

Suite of Calc objects generated from provided specifications.

create_calcs()[source]

Generate a Calc object for each requested parameter combination.

aospy.automate.submit_mult_calcs(calc_suite_specs, exec_options=None)[source]

Generate and execute all specified computations.

Once the calculations are prepped and submitted for execution, any calculation that triggers any exception or error is skipped, and the rest of the calculations proceed unaffected. This prevents an error in a single calculation from crashing a large suite of calculations.

Parameters:
calc_suite_specs : dict

The specifications describing the full set of calculations to be generated and potentially executed. Accepted keys and their values:

library : module or package comprising an aospy object library

The aospy object library for these calculations.

projects : list of aospy.Proj objects

The projects to permute over.

models : ‘all’, ‘default’, or list of aospy.Model objects

The models to permute over. If ‘all’, use all models in the models attribute of each Proj. If ‘default’, use all models in the default_models attribute of each Proj.

runs : ‘all’, ‘default’, or list of aospy.Run objects

The runs to permute over. If ‘all’, use all runs in the runs attribute of each Model. If ‘default’, use all runs in the default_runs attribute of each Model.

variables : list of aospy.Var objects

The variables to be calculated.

regions : ‘all’ or list of aospy.Region objects

The region(s) over which any regional reductions will be performed. If ‘all’, use all regions in the regions attribute of each Proj.

date_ranges : ‘default’ or a list of tuples

The range of dates (inclusive) over which to perform calculations. If ‘default’, use the default_start_date and default_end_date attribute of each Run. Else provide a list of tuples, each containing a pair of start and end dates, such as date_ranges=[(start, end)] where start and end are each datetime.datetime objects, partial datetime strings (e.g. ‘0001’), np.datetime64 objects, or cftime.datetime objects.

output_time_intervals : {‘ann’, season-string, month-integer}

The sub-annual time interval over which to aggregate.

  • ‘ann’ : Annual mean
  • season-string : E.g. ‘JJA’ for June-July-August
  • month-integer : 1 for January, 2 for February, etc. Each one is
    a separate reduction, e.g. [1, 2] would produce averages (or other specified time reduction) over all Januaries, and separately over all Februaries.
output_time_regional_reductions : list of reduction string identifiers

Unlike most other keys, these are not permuted over when creating the aospy.Calc objects that execute the calculations; each aospy.Calc performs all of the specified reductions. Accepted string identifiers are:

  • Gridpoint-by-gridpoint output:
    • ‘av’ : Gridpoint-by-gridpoint time-average
    • ‘std’ : Gridpoint-by-gridpoint temporal standard deviation
    • ‘ts’ : Gridpoint-by-gridpoint time-series
  • Averages over each region specified via region:
    • ‘reg.av’, ‘reg.std’, ‘reg.ts’ : analogous to ‘av’, ‘std’, ‘ts’
output_vertical_reductions : {None, ‘vert_av’, ‘vert_int’}, optional

How to reduce the data vertically:

  • None : no vertical reduction
  • ‘vert_av’ : mass-weighted vertical average
  • ‘vert_int’ : mass-weighted vertical integral
input_time_intervals : {‘annual’, ‘monthly’, ‘daily’, ‘#hr’}

A string specifying the time resolution of the input data. In ‘#hr’ above, the ‘#’ stands for a number, e.g. 3hr or 6hr, for sub-daily output. These are the suggested specifiers, but others may be used if they are also used by the DataLoaders for the given Runs.

input_time_datatypes : {‘inst’, ‘ts’, ‘av’}

What the time axis of the input data represents:

  • ‘inst’ : Timeseries of instantaneous values
  • ‘ts’ : Timeseries of averages over the period of each time-index
  • ‘av’ : A single value averaged over a date range
input_vertical_datatypes : {False, ‘pressure’, ‘sigma’}, optional

The vertical coordinate system used by the input data:

  • False : not defined vertically
  • ‘pressure’ : pressure coordinates
  • ‘sigma’ : hybrid sigma-pressure coordinates
input_time_offsets : {None, dict}, optional

How to offset input data in time to correct for metadata errors

exec_options : dict or None (default None)

Options regarding how the calculations are reported, submitted, and saved. If None, default settings are used for all options. Currently supported options (each should be either True or False):

  • prompt_verify : (default False) If True, print summary of
    calculations to be performed and prompt user to confirm before submitting for execution.
  • parallelize : (default False) If True, submit calculations in
    parallel.
  • client : distributed.Client or None (default None) The
    dask.distributed Client used to schedule computations. If None and parallelize is True, a LocalCluster will be started.
  • write_to_tar : (default True) If True, write results of calculations
    to .tar files, one for each aospy.Run object. These tar files have a
    directory structure identical to that of the standard output, relative
    to their root directory, which is specified via the tar_direc_out
    argument of each Proj object’s instantiation.
Returns:
A list of the return values from each aospy.Calc.compute call

If a calculation ran without error, this value is the aospy.Calc object itself, with the results of its calculations saved in its data_out attribute. data_out is a dictionary, with the keys being the temporal-regional reduction identifiers (e.g. ‘reg.av’), and the values being the corresponding result.

If any error occurred during a calculation, the return value is None.
Raises:
AospyException

If the prompt_verify option is set to True and the user does not respond affirmatively to the prompt.
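
For illustration, a minimal sketch of a call to submit_mult_calcs, with my_obj_lib, example_proj, and precip standing in for a user’s own object library and its contents:

>>> from aospy.automate import submit_mult_calcs
>>> calc_suite_specs = dict(
...     library=my_obj_lib,
...     projects=[example_proj],
...     models='default',
...     runs='default',
...     variables=[precip],
...     regions='all',
...     date_ranges='default',
...     output_time_intervals=['ann'],
...     output_time_regional_reductions=['av', 'reg.av'],
...     output_vertical_reductions=[None],
...     input_time_intervals=['monthly'],
...     input_time_datatypes=['ts'],
...     input_vertical_datatypes=[False],
...     input_time_offsets=[None],
... )
>>> calcs = submit_mult_calcs(calc_suite_specs,
...                           exec_options=dict(prompt_verify=True))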

Utilities

aospy includes a number of utility functions that are used internally and may also be useful to users for their own purposes. These include functions pertaining to input/output (IO), longitudes, time arrays, and vertical coordinates.

utils.io

Utility functions for data input and output.

aospy.utils.io.data_in_label(intvl_in, dtype_in_time, dtype_in_vert=False)[source]

Create string label specifying the input data of a calculation.

aospy.utils.io.data_name_gfdl(name, domain, data_type, intvl_type, data_yr, intvl, data_in_start_yr, data_in_dur)[source]

Determine the filename of GFDL model data output.

aospy.utils.io.data_out_label(time_intvl, dtype_time, dtype_vert=False)[source]
aospy.utils.io.dmget(files_list)[source]

Call GFDL command ‘dmget’ to access archived files.

aospy.utils.io.time_label(intvl, return_val=True)[source]

Create time interval label for aospy data I/O.

aospy.utils.io.yr_label(yr_range)[source]

Create label of start and end years for aospy data I/O.

utils.longitude

Functionality relating to parsing and comparing longitudes.

class aospy.utils.longitude.Longitude(value)[source]

Geographic longitude.

Enables unambiguous comparison of longitudes using the standard comparison operators, regardless of whether they were initially represented with a 0 to 360 convention, -180 to 180 convention, or anything else, and even if the original convention differs between them.

Specifically, the < operator assesses if the first object is to the west of the second object, with the standard convention that longitudes in the Western Hemisphere are always to the west of longitudes in the Eastern Hemisphere. The > operator is defined analogously. ==, >=, and <= are also all defined.

In addition to other Longitude objects, the operators can be used to compare a Longitude object to anything that can be cast to a Longitude object, or to any sequence (e.g. a list or xarray.DataArray) whose elements can be cast to Longitude objects.

hemisphere

{‘W’, ‘E’} The longitude’s hemisphere, either western or eastern.

longitude

(scalar) The unsigned numerical value of the longitude.

Always in the range 0 to 180. Must be combined with the hemisphere attribute to specify the exact longitude.

to_0360()[source]

Convert longitude to its numerical value within [0, 360).

to_pm180()[source]

Convert longitude to its numerical value within [-180, 180).
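
For example, a sketch of the comparison semantics described above:

>>> from aospy.utils.longitude import Longitude
>>> Longitude(-170) < Longitude(170)  # 170W is west of 170E
True
>>> Longitude(190) == Longitude(-170)  # both correspond to 170W
True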

aospy.utils.longitude.lon_to_0360(lon)[source]

Convert longitude(s) to be within [0, 360).

The Eastern hemisphere corresponds to 0 <= lon + (n*360) < 180, and the Western Hemisphere corresponds to 180 <= lon + (n*360) < 360, where ‘n’ is any integer (positive, negative, or zero).

Parameters:
lon : scalar or sequence of scalars

One or more longitude values to be converted to lie in the [0, 360) range

Returns:
If lon is a scalar, then a scalar of the same type in the range [0, 360). If lon is array-like, then an array-like of the same type with each element a scalar in the range [0, 360).

aospy.utils.longitude.lon_to_pm180(lon)[source]

Convert longitude(s) to be within [-180, 180).

The Eastern hemisphere corresponds to 0 <= lon + (n*360) < 180, and the Western Hemisphere corresponds to 180 <= lon + (n*360) < 360, where ‘n’ is any integer (positive, negative, or zero).

Parameters:
lon : scalar or sequence of scalars

One or more longitude values to be converted to lie in the [-180, 180) range

Returns:
If lon is a scalar, then a scalar of the same type in the range [-180, 180). If lon is array-like, then an array-like of the same type with each element a scalar in the range [-180, 180).
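
For example, a sketch of the expected behavior for scalar inputs:

>>> from aospy.utils.longitude import lon_to_0360, lon_to_pm180
>>> lon_to_0360(-90)
270
>>> lon_to_pm180(270)
-90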

utils.times

Utility functions for handling times, dates, etc.

aospy.utils.times.add_uniform_time_weights(ds)[source]

Append uniform time weights to a Dataset.

All DataArrays with a time coordinate require a time weights coordinate. For Datasets read in without a time bounds coordinate or explicit time weights built in, aospy adds uniform time weights at each point in the time coordinate.

Parameters:
ds : Dataset

Input data

Returns:
Dataset
aospy.utils.times.apply_time_offset(time, years=0, months=0, days=0, hours=0)[source]

Apply a specified offset to the given time array.

This is useful for GFDL model output of instantaneous values. For example, 3 hourly data postprocessed to netCDF files spanning 1 year each will actually have time values that are offset by 3 hours, such that the first value is for 1 Jan 03:00 and the last value is 1 Jan 00:00 of the subsequent year. This causes problems in xarray, e.g. when trying to group by month. It is resolved by manually subtracting off those three hours, such that the dates span from 1 Jan 00:00 to 31 Dec 21:00 as desired.

Parameters:
time : xarray.DataArray representing a timeseries
years, months, days, hours : int, optional

The number of years, months, days, and hours, respectively, to offset the time array by. Positive values move the times later.

Returns:
pandas.DatetimeIndex

Examples

Case of a length-1 input time array:

>>> times = xr.DataArray(datetime.datetime(1899, 12, 31, 21))
>>> apply_time_offset(times, hours=3)
Timestamp('1900-01-01 00:00:00')

Case of input time array with length greater than one:

>>> times = xr.DataArray([datetime.datetime(1899, 12, 31, 21),
...                       datetime.datetime(1899, 1, 31, 21)])
>>> apply_time_offset(times, hours=3) # doctest: +NORMALIZE_WHITESPACE
DatetimeIndex(['1900-01-01', '1899-02-01'], dtype='datetime64[ns]',
              freq=None)
aospy.utils.times.assert_matching_time_coord(arr1, arr2)[source]

Check to see if two DataArrays have the same time coordinate.

Parameters:
arr1 : DataArray or Dataset

First DataArray or Dataset

arr2 : DataArray or Dataset

Second DataArray or Dataset

Raises:
ValueError

If the time coordinates are not identical between the two Datasets

aospy.utils.times.average_time_bounds(ds)[source]

Return the average of each set of time bounds in the Dataset.

Useful for creating a new time array to replace the Dataset’s native time array, in the case that the latter matches either the start or end bounds. This can cause errors in grouping (akin to an off-by-one error) if the timesteps span e.g. one full month each. Note that the Dataset’s times must not have already undergone “CF decoding”, wherein they are converted from floats using the ‘units’ attribute into datetime objects.

Parameters:
ds : xarray.Dataset

A Dataset containing a time bounds array with name matching internal_names.TIME_BOUNDS_STR. This time bounds array must have two dimensions, one of whose coordinates is the Dataset’s time array, while the other is length-2.

Returns:
xarray.DataArray

The mean of the start and end times of each timestep in the original Dataset.

Raises:
ValueError

If the time bounds array doesn’t match the shape specified above.

aospy.utils.times.datetime_or_default(date, default)[source]

Return a datetime-like object or a default.

Parameters:
date : None or datetime-like object or str
default : The value to return if date is None
Returns:
`default` if `date` is `None`, otherwise returns the result of
`utils.times.ensure_datetime(date)`
aospy.utils.times.ensure_datetime(obj)[source]

Return the object if it is a datetime-like object

Parameters:
obj : Object to be tested.
Returns:
The original object if it is a datetime-like object
Raises:
TypeError if `obj` is not datetime-like
aospy.utils.times.ensure_time_as_index(ds)[source]

Ensures that time is an indexed coordinate on relevant quantities.

Sometimes when the data we load from disk has only one timestep, the indexing of time-defined quantities in the resulting xarray.Dataset gets messed up, in that the time bounds array and data variables don’t get indexed by time, even though they should. Therefore, we need this helper function to (possibly) correct this.

Note that this must be applied before CF-conventions are decoded; otherwise it casts np.datetime64[ns] as int values.

Parameters:
ds : Dataset

Dataset with a time coordinate

Returns:
Dataset
aospy.utils.times.ensure_time_avg_has_cf_metadata(ds)[source]

Add time interval length and bounds coordinates for time avg data.

If the Dataset or DataArray contains time average data, enforce that there are coordinates that track the lower and upper bounds of the time intervals, and that there is a coordinate that tracks the amount of time per time average interval.

CF conventions require that a quantity stored as time averages over time intervals must have time and time_bounds coordinates [1]. aospy further requires AVERAGE_DT for time average data, for accurate time-weighted averages, which can be inferred from the CF-required time_bounds coordinate if needed. This step should be done prior to decoding CF metadata with xarray to ensure proper computed timedeltas for different calendar types.

[1]http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_data_representative_of_cells
Parameters:
ds : Dataset or DataArray

Input data

Returns:
Dataset or DataArray

Time average metadata attributes added if needed.

aospy.utils.times.extract_months(time, months)[source]

Extract times within specified months of the year.

Parameters:
time : xarray.DataArray

Array of times that can be represented by numpy.datetime64 objects (i.e. the year is between 1678 and 2262).

months : Desired months of the year to include
Returns:
xarray.DataArray of the desired times
aospy.utils.times.infer_year(date)[source]

Given a datetime-like object or string infer the year.

Parameters:
date : datetime-like object or str

Input date

Returns:
int

Examples

>>> infer_year('2000')
2000
>>> infer_year('2000-01')
2000
>>> infer_year('2000-01-31')
2000
>>> infer_year(datetime.datetime(2000, 1, 1))
2000
>>> infer_year(np.datetime64('2000-01-01'))
2000
>>> infer_year(DatetimeNoLeap(2000, 1, 1))
2000
aospy.utils.times.maybe_convert_to_index_date_type(index, date)[source]

Convert a datetime-like object to the index’s date type.

Datetime indexing in xarray can be done using either a pandas DatetimeIndex or a CFTimeIndex. Both support partial-datetime string indexing regardless of the calendar type of the underlying data; therefore if a string is passed as a date, we return it unchanged. If a datetime-like object is provided, it will be converted to the underlying date type of the index. For a DatetimeIndex that is np.datetime64; for a CFTimeIndex that is an object of type cftime.datetime specific to the calendar used.

Parameters:
index : pd.Index

Input time index

date : datetime-like object or str

Input datetime

Returns:
date of the type appropriate for the time index of the Dataset
aospy.utils.times.month_indices(months)[source]

Convert string labels for months to integer indices.

Parameters:
months : str, int

If int, number of the desired month, where January=1, February=2, etc. If str, must match either ‘ann’ or some subset of ‘jfmamjjasond’. If ‘ann’, use all months. Otherwise, use the specified months.

Returns:
np.ndarray of integers corresponding to desired month indices
Raises:
TypeError : If months is not an int or str

See also

_month_conditional
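
For example, a sketch of the expected behavior for the June-July-August season:

>>> from aospy.utils.times import month_indices
>>> month_indices('jja')
array([6, 7, 8])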

aospy.utils.times.monthly_mean_at_each_ind(monthly_means, sub_monthly_timeseries)[source]

Copy monthly mean over each time index in that month.

Parameters:
monthly_means : xarray.DataArray

array of monthly means

sub_monthly_timeseries : xarray.DataArray

array of a timeseries at sub-monthly time resolution

Returns:
xarray.DataArray with each monthly mean value from `monthly_means` repeated at each time within that month from `sub_monthly_timeseries`

See also

monthly_mean_ts
Create timeseries of monthly mean values
aospy.utils.times.monthly_mean_ts(arr)[source]

Convert a sub-monthly time-series into one of monthly means.

Also drops any months with no data in the original DataArray.

Parameters:
arr : xarray.DataArray

Timeseries of sub-monthly temporal resolution data

Returns:
xarray.DataArray

Array resampled to comprise monthly means

See also

monthly_mean_at_each_ind
Copy monthly means to each submonthly time
aospy.utils.times.sel_time(da, start_date, end_date)[source]

Subset a DataArray or Dataset for a given date range.

Ensures that data are present for full extent of requested range. Appends start and end date of the subset to the DataArray.

Parameters:
da : DataArray or Dataset

data to subset

start_date : np.datetime64

start of date interval

end_date : np.datetime64

end of date interval

Returns:
da : DataArray or Dataset

subsetted data

Raises:
AssertionError

If data do not exist for part or all of the requested range.

aospy.utils.times.yearly_average(arr, dt)[source]

Average a sub-yearly time-series over each year.

Resulting timeseries comprises one value for each year in which the original array had valid data. Accounts for (i.e. ignores) masked values in original data when computing the annual averages.

Parameters:
arr : xarray.DataArray

The array to be averaged

dt : xarray.DataArray

Array of the duration of each timestep

Returns:
xarray.DataArray

Has the same shape and mask as the original arr, except for the time dimension, which is truncated to one value for each year that arr spanned

utils.vertcoord

Utility functions for dealing with vertical coordinates.

aospy.utils.vertcoord.d_deta_from_pfull(arr)[source]

Compute $\partial/\partial\eta$ of the array on full hybrid levels.

$\eta$ is the model vertical coordinate, and its value is assumed to simply increment by 1 from 0 at the surface upwards. The data to be differenced is assumed to be defined at full pressure levels.

Parameters:
arr : xarray.DataArray containing the ‘pfull’ dim
Returns:
deriv : xarray.DataArray with the derivative along ‘pfull’ computed via

2nd order centered differencing.

aospy.utils.vertcoord.d_deta_from_phalf(arr, pfull_coord)[source]

Compute pressure level thickness from half level pressures.

aospy.utils.vertcoord.does_coord_increase_w_index(arr)[source]

Determine if the array values increase with the index.

Useful, e.g., for pressure, which sometimes is indexed surface to TOA and sometimes the opposite.

aospy.utils.vertcoord.dp_from_p(p, ps, p_top=0.0, p_bot=110000.0)[source]

Get level thickness of pressure data, incorporating surface pressure.

Level edges are defined as halfway between the levels, as well as the user-specified uppermost and lowermost values. The dp of levels whose bottom pressure is less than the surface pressure is not changed by ps, since they don’t intersect the surface. If ps is in between a level’s top and bottom pressures, then its dp becomes the pressure difference between its top and ps. If ps is less than a level’s top and bottom pressures, then that level is underground and its values are masked.

Note that postprocessing routines (e.g. at GFDL) typically mask out data wherever the surface pressure is less than the level’s given value, not the level’s upper edge. This masks out more levels than the method used here.

aospy.utils.vertcoord.dp_from_ps(bk, pk, ps, pfull_coord)[source]

Compute pressure level thickness from surface pressure.

aospy.utils.vertcoord.get_dim_name(arr, names)[source]

Determine which of the given names, if any, the object has as an attribute (e.g. a dimension name).

aospy.utils.vertcoord.int_dp_g(arr, dp)[source]

Mass-weighted vertical integral: the integral of the array with respect to the pressure thicknesses dp, divided by gravity.
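
A hedged sketch of a column integral, e.g. column water vapor from specific humidity; the inputs are hypothetical, with the vertical dimension named ‘level’ and thicknesses in Pa:

    import xarray as xr

    from aospy.utils.vertcoord import int_dp_g

    lev = [85000., 50000., 20000.]  # level-center pressures [Pa]
    q = xr.DataArray([3e-3, 1e-3, 1e-4], dims=["level"],
                     coords={"level": lev})  # specific humidity [kg/kg]
    dp = xr.DataArray([30000., 30000., 20000.], dims=["level"],
                      coords={"level": lev})  # level thicknesses [Pa]

    cwv = int_dp_g(q, dp)  # roughly sum(q * dp) / g, in kg m**-2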

aospy.utils.vertcoord.integrate(arr, ddim, dim=False, is_pressure=False)[source]

Integrate along the given dimension.

aospy.utils.vertcoord.level_thickness(p, p_top=0.0, p_bot=101325.0)[source]

Calculate the thickness, in Pa, of each pressure level.

Assumes that the pressure values given are at the center of each model level, except for the lowest value (typically 1000 hPa), which is the bottom boundary. The uppermost level extends to 0 hPa.

Unlike dp_from_p, this does not incorporate the surface pressure.
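
A minimal sketch with a hypothetical 1-D array of level-center pressures, in Pa, ordered bottom to top:

    import xarray as xr

    from aospy.utils.vertcoord import level_thickness

    lev = [100000., 85000., 50000., 20000., 5000.]  # Pa
    p = xr.DataArray(lev, dims=["level"], coords={"level": lev})
    dp = level_thickness(p)  # edges halfway between levels; top extends to 0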

aospy.utils.vertcoord.pfull_from_ps(bk, pk, ps, pfull_coord)[source]

Compute pressure at full levels from surface pressure.

aospy.utils.vertcoord.phalf_from_ps(bk, pk, ps)[source]

Compute pressure of half levels of hybrid sigma-pressure coordinates.

aospy.utils.vertcoord.replace_coord(arr, old_dim, new_dim, new_coord)[source]

Replace a coordinate with a new one; the new and old coordinates must have the same shape.

aospy.utils.vertcoord.to_hpa(arr)[source]

Convert pressure array from Pa to hPa (if needed).

aospy.utils.vertcoord.to_pascal(arr, is_dp=False)[source]

Force data with units either hPa or Pa to be in Pa.

aospy.utils.vertcoord.to_pfull_from_phalf(arr, pfull_coord)[source]

Compute data at full pressure levels from values at half levels.

aospy.utils.vertcoord.to_phalf_from_pfull(arr, val_toa=0, val_sfc=0)[source]

Compute data at half pressure levels from values at full levels.

The input could be the pressure array itself, but it could also be any other data defined at full pressure levels. Requires specification of the values at the surface and at the top of the atmosphere.

aospy.utils.vertcoord.to_radians(arr, is_delta=False)[source]

Force data with units either degrees or radians to be radians.
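
A brief sketch of the unit-coercion helpers, with hypothetical inputs:

    import numpy as np
    import xarray as xr

    from aospy.utils.vertcoord import to_pascal, to_radians

    p = xr.DataArray(np.array([1000., 850., 500.]), dims=["level"])
    p_pa = to_pascal(p)  # values look like hPa, so converted to Pa

    lat = xr.DataArray(np.array([-90., 0., 90.]), dims=["lat"])
    lat_rad = to_radians(lat)  # values look like degrees, so converted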

aospy.utils.vertcoord.vert_coord_name(arr)[source]

Return the name of the vertical coordinate of the given array, if one is found.

See also

  • Spencer Hill’s talk on aospy (slides, recorded talk) at the Seventh Symposium on Advances in Modeling and Analysis Using Python, recorded 24 January 2017 as part of the 2017 American Meteorological Society Annual Meeting.
  • Our guest post, “What’s needed for the future of AOS python? Tools for automating AOS data analysis and management” on the PyAOS blog.
  • Our shout-out in Katy Huff’s PyCon 2017 keynote talk (scroll to 23:35 for the bit about us)
  • The xarray package, upon which aospy relies heavily.

Get in touch

  • Troubleshooting: We are actively seeking new users and are eager to help you get started with aospy! Usage questions, bug reports, and any other correspondence are all welcome and best placed as Issues on our GitHub repo.
  • Our mailing list: join it! Questions, bug reports, and comments are welcome there also.
  • Contributing: We are also actively seeking new developers! Please get in touch by opening an Issue or submitting a Pull Request.

License

aospy is freely available under the open source Apache License.

History

aospy was originally created by Spencer Hill as a means of automating calculations required for his Ph.D. work across many climate models and simulations. Starting in 2014, Spencer Clark became aospy’s second user and developer. The first official release on PyPI was v0.1 on January 24, 2017.