dimarray documentation¶
Welcome to dimarray documentation.
dimarray provides a numpy array with labelled axes and dimensions, in the spirit of pandas but generalized to N dimensions. It supports metadata and netCDF4 I/O.
Introduction¶
Numpy array with dimensions¶
dimarray is a package to handle numpy arrays with labelled dimensions and axes. Inspired by pandas, it includes advanced alignment and reshaping features as well as missing-value (NaN) handling.
The main difference with pandas is that it is generalized to N dimensions and behaves more like a numpy array. The axes do not have fixed names (‘index’, ‘columns’, etc.) but are given meaningful names by the user (e.g. ‘time’, ‘items’, ‘lon’ …). This is especially useful for high-dimensional problems such as sensitivity analyses.
A natural I/O format for such an array is netCDF, common in geophysics, which relies on the netCDF4 package, and supports metadata.
License¶
dimarray is distributed under a 3-clause (“Simplified” or “New”) BSD license. Parts of basemap which have BSD compatible licenses are included. See the LICENSE file, which is distributed with the dimarray package, for details.
Getting started¶
A DimArray can be defined just like a numpy array, with additional information about its dimensions, which can be provided via its axes and dims parameters:
>>> from dimarray import DimArray
>>> a = DimArray([[1.,2,3], [4,5,6]], axes=[['a', 'b'], [1950, 1960, 1970]], dims=['variable', 'time'])
>>> a
dimarray: 6 non-null elements (0 null)
0 / variable (2): 'a' to 'b'
1 / time (3): 1950 to 1970
array([[1., 2., 3.],
[4., 5., 6.]])
Indexing now works on axes:
>>> a['b', 1970]
6.0
Or it can be done à la numpy, via integer position:
>>> a.ix[0, -1]
3.0
Basic numpy transformations are also included:
>>> a.mean(axis='time')
dimarray: 2 non-null elements (0 null)
0 / variable (2): 'a' to 'b'
array([2., 5.])
Arrays can be exported to pandas for pretty printing:
>>> a.to_pandas()
time 1950 1960 1970
variable
a 1.0 2.0 3.0
b 4.0 5.0 6.0
Useful links¶
- Documentation: http://dimarray.readthedocs.org
- Code on github (bleeding edge): https://github.com/perrette/dimarray
- Code on pypi (releases): https://pypi.python.org/pypi/dimarray
- Mailing list: http://groups.google.com/group/dimarray
- Issues tracker: https://github.com/perrette/dimarray/issues
Install¶
Requirements:
- python >= 2.7 or 3.x
- numpy (tested with 1.7, 1.8, 1.9, 1.10.1, 1.15)
Optional:
- netCDF4 (tested with 1.0.8, 1.2.1) (netCDF archiving) (see notes below)
- matplotlib 1.1 (plotting)
- pandas 0.11 (interface with pandas)
Download the latest version from github and extract it from the archive. Then, from the dimarray repository, type (possibly preceded by sudo):
python setup.py install
Alternatively, you can use pip to download and install the version from pypi (which could be slightly out of date):
pip install dimarray
Notes on installing netCDF4¶
- On Ubuntu, using apt-get is the easiest way (as indicated at https://github.com/Unidata/netcdf4-python/blob/master/.travis.yml):
sudo apt-get install libhdf5-serial-dev netcdf-bin libnetcdf-dev
- On windows binaries are available: http://www.unidata.ucar.edu/software/netcdf/docs/winbin.html
- From source. Installing the netCDF4 python module from source can be cumbersome, because it depends on the netCDF4 and (especially) HDF5 C libraries, which need to be compiled with specific flags (http://unidata.github.io/netcdf4-python). Detailed information for Ubuntu: https://code.google.com/p/netcdf4-python/wiki/UbuntuInstall
Contributions¶
All suggestions for improvement and direct contributions are very welcome. You can ask a question or start a discussion on the mailing list, or open an issue on github for precise requests. See the links above.
Packages you might also be interested in¶
dimarray is built on top of numpy, as an alternative to larry and pandas.
dimarray’s default indexing method is on labels, which makes it very useful as a data structure to store high-dimensional problems with few labels, such as sensitivity analyses (or e.g. climate scenarios…).
If your focus is on large geoscientific data however, xarray is a more appropriate package, with useful methods to load large datasets and a data model closely aligned with the netCDF format. Moreover, standard, numpy-like integer indexing is more appropriate for geographic maps.
pandas is an excellent package for tabular data analysis, supporting many I/O formats and axis alignment (or “reindexing”) in binary operations. It is mostly limited to 2 dimensions (DataFrame), or up to 4 dimensions (Panel, Panel4D).
Tutorial¶
define a dimarray¶
A DimArray can be defined just like a numpy array, with additional information about its axes, which can be given via the axes and dims parameters.
>>> from dimarray import DimArray, Dataset
>>> a = DimArray([[1.,2,3], [4,5,6]], axes=[['a', 'b'], [1950, 1960, 1970]], dims=['variable', 'time'])
>>> a
dimarray: 6 non-null elements (0 null)
0 / variable (2): 'a' to 'b'
1 / time (3): 1950 to 1970
array([[1., 2., 3.],
[4., 5., 6.]])
data structure¶
Array data are stored in a values attribute:
>>> a.values
array([[1., 2., 3.],
[4., 5., 6.]])
while its axes are stored in axes
>>> a.axes
0 / variable (2): 'a' to 'b'
1 / time (3): 1950 to 1970
As a convenience, axis labels can be accessed directly by name, as an alias for a.axes[‘time’].values:
>>> a.time
array([1950, 1960, 1970])
For more information refer to the section on the DimArray class (as well as dimarray.Axis and dimarray.Axes).
numpy-like attributes¶
Numpy-like attributes dtype, shape, size and ndim are defined, and are augmented with dims and labels:
>>> a.shape
(2, 3)
>>> a.dims # grab axis names (the dimensions)
('variable', 'time')
>>> a.labels # grab axis values
(array(['a', 'b'], dtype=object), array([1950, 1960, 1970]))
indexing¶
Indexing works on labels just as expected, including slices and boolean arrays.
>>> a['b', 1970]
6.0
but integer (position) indexing is always possible via the ix attribute, which toggles between label-based and position-based indexing:
>>> a.ix[0, -1]
3.0
transformation¶
Standard numpy transformations are defined, and now accept axis name:
>>> a.mean(axis='time')
dimarray: 2 non-null elements (0 null)
0 / variable (2): 'a' to 'b'
array([2., 5.])
and can ignore missing values (nans) if asked to:
>>> import numpy as np
>>> a['a',1950] = np.nan
>>> a.mean(axis='time', skipna=True)
dimarray: 2 non-null elements (0 null)
0 / variable (2): 'a' to 'b'
array([2.5, 5. ])
alignment in operations¶
During an operation, arrays are automatically re-indexed to span the same axis domain, with nan filling if needed. This is quite useful when working with partly-overlapping time series or with incomplete sets of items.
>>> yearly_data = DimArray([0, 1, 2], axes=[[1950, 1960, 1970]], dims=['year'])
>>> incomplete_yearly_data = DimArray([10, 100], axes=[[1950, 1960]], dims=['year']) # last year 1970 is missing
>>> yearly_data + incomplete_yearly_data
dimarray: 2 non-null elements (1 null)
0 / year (3): 1950 to 1970
array([ 10., 101., nan])
A check is also performed on the dimensions to ensure consistency of the data. If dimensions do not match, this is not interpreted as an error but rather as a combination of dimensions. For example, you may want to combine some fixed spatial pattern (such as an EOF) with a time-varying time series (the principal component). Or you may want to combine results from a sensitivity analysis where several parameters have been varied (one dimension per parameter). Here is a minimal example where the above-defined yearly variable is combined with seasonally-varying data (camping summer and winter prices).
Arrays are said to be broadcast:
>>> seasonal_data = DimArray([10, 100], axes=[['winter','summer']], dims=['season'])
>>> combined_data = yearly_data * seasonal_data
>>> combined_data
dimarray: 6 non-null elements (0 null)
0 / year (3): 1950 to 1970
1 / season (2): 'winter' to 'summer'
array([[ 0, 0],
[ 10, 100],
[ 20, 200]])
dataset¶
Changed in version 0.1.9.
As a convenience, the Dataset class is an ordered dictionary of DimArray instances which all share a common set of axes.
>>> dataset = Dataset({'combined_data':combined_data, 'yearly_data':yearly_data,'seasonal_data':seasonal_data})
>>> dataset
Dataset of 3 variables
0 / season (2): 'winter' to 'summer'
1 / year (3): 1950 to 1970
seasonal_data: ('season',)
combined_data: ('year', 'season')
yearly_data: ('year',)
At initialization, the arrays are aligned on-the-fly. Later on, it is up to the user to reindex the arrays to match the Dataset axes.
Note
Since Dataset elements share the same axes, any axis modification will also impact all contained DimArray instances. If this behaviour is not desired, a copy should be made.
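For illustration, here is a minimal sketch of how to detach an array from the shared axes before modifying them (the axis-sharing checks below are an assumption about the implementation, hence the skipped doctests):
>>> combined_copy = dataset['combined_data'].copy()  # independent copy with its own axes
>>> dataset.axes['year'] is dataset['combined_data'].axes['year']  # axes are shared  # doctest: +SKIP
True
>>> dataset.axes['year'] is combined_copy.axes['year']  # the copy is not affected by later axis changes  # doctest: +SKIP
False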
netCDF reading and writing¶
A natural I/O format for such an array is netCDF, common in geophysics, which relies on the netCDF4 package. If netCDF4 is installed (highly recommended), a dataset can easily be read from and written to the netCDF format:
>>> dataset.write_nc('/tmp/test.nc', mode='w')
>>> import dimarray as da
>>> da.read_nc('/tmp/test.nc', 'combined_data')
dimarray: 6 non-null elements (0 null)
0 / year (3): 1950 to 1970
1 / season (2): u'winter' to u'summer'
array([[ 0, 0],
[ 10, 100],
[ 20, 200]])
metadata¶
It is possible to define and access metadata via the standard . syntax to access an object attribute:
>>> a = DimArray([1, 2])
>>> a.name = 'myarray'
>>> a.units = 'meters'
Any non-private attribute is automatically added to a.attrs ordered dictionary:
>>> a.attrs
OrderedDict([('name', 'myarray'), ('units', 'meters')])
Metadata can also be defined for dimarray.Dataset and dimarray.Axis instances, and will be written to / read from netCDF files.
Note
Metadata that start with an underscore _ or use any protected class attribute as name (e.g. values, axes, dims and so on) must be set directly in attrs.
See also: Metadata for more information.
join arrays¶
DimArrays can be joined along an existing dimension; this is called concatenation (dimarray.concatenate()):
>>> a = DimArray([11, 12, 13], axes=[[1950, 1951, 1952]], dims=['time'])
>>> b = DimArray([14, 15, 16], axes=[[1953, 1954, 1955]], dims=['time'])
>>> da.concatenate((a, b), axis='time')
dimarray: 6 non-null elements (0 null)
0 / time (6): 1950 to 1955
array([11, 12, 13, 14, 15, 16])
or they can be stacked on top of each other, thereby creating a new dimension (dimarray.stack()):
>>> a = DimArray([11, 12, 13], axes=[[1950, 1951, 1952]], dims=['time'])
>>> b = DimArray([21, 22, 23], axes=[[1950, 1951, 1952]], dims=['time'])
>>> da.stack((a, b), axis='items', keys=['a','b'])
dimarray: 6 non-null elements (0 null)
0 / items (2): 'a' to 'b'
1 / time (3): 1950 to 1952
array([[11, 12, 13],
[21, 22, 23]])
Note that in the above, the new axis values were provided via the keys= parameter. If the common “time” dimension is not fully overlapping, arrays can be aligned prior to stacking via the align=True parameter.
>>> a = DimArray([11, 12, 13], axes=[[1950, 1951, 1952]], dims=['time'])
>>> b = DimArray([21, 23], axes=[[1950, 1952]], dims=['time'])
>>> c = da.stack((a, b), axis='items', keys=['a','b'], align=True)
>>> c
dimarray: 5 non-null elements (1 null)
0 / items (2): 'a' to 'b'
1 / time (3): 1950 to 1952
array([[11., 12., 13.],
[21., nan, 23.]])
drop missing data¶
Say you have data with NaNs:
>>> a = DimArray([[11, np.nan, np.nan],[21,np.nan,23]], axes=[['a','b'],[1950, 1951, 1952]], dims=['items','time'])
>>> a
dimarray: 3 non-null elements (3 null)
0 / items (2): 'a' to 'b'
1 / time (3): 1950 to 1952
array([[11., nan, nan],
[21., nan, 23.]])
You can drop every column that contains a NaN:
>>> a.dropna(axis=1) # drop along columns
dimarray: 2 non-null elements (0 null)
0 / items (2): 'a' to 'b'
1 / time (1): 1950 to 1950
array([[11.],
[21.]])
or you can decide to retain only those columns with a minimum number of valid data points, here one:
>>> a.dropna(axis=1, minvalid=1) # drop every column with less than one valid data
dimarray: 3 non-null elements (1 null)
0 / items (2): 'a' to 'b'
1 / time (2): 1950 to 1952
array([[11., nan],
[21., 23.]])
reshaping arrays¶
dimarray also provides methods to reshape an array in easy ways, which is very useful for high-dimensional data analysis.
>>> large_array = DimArray(np.arange(2*2*5*2).reshape(2,2,5,2), dims=('A','B','C','D'))
>>> small_array = large_array.reshape('A,D','B,C')
>>> small_array
dimarray: 40 non-null elements (0 null)
0 / A,D (4): (0, 0) to (1, 1)
1 / B,C (10): (0, 0) to (1, 4)
array([[ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18],
[ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19],
[20, 22, 24, 26, 28, 30, 32, 34, 36, 38],
[21, 23, 25, 27, 29, 31, 33, 35, 37, 39]])
interfacing with pandas¶
For things that pandas does better, such as pretty printing, I/O to many formats, and 2-D data analysis, just use the dimarray.DimArray.to_pandas() method. In the ipython notebook it also has a nice html rendering.
>>> small_array.to_pandas()
B 0 1
C 0 1 2 3 4 0 1 2 3 4
A D
0 0 0 2 4 6 8 10 12 14 16 18
1 1 3 5 7 9 11 13 15 17 19
1 0 20 22 24 26 28 30 32 34 36 38
1 21 23 25 27 29 31 33 35 37 39
And dimarray.DimArray.from_pandas() works to convert pandas objects to DimArray (it also supports MultiIndex):
>>> import pandas as pd
>>> s = pd.DataFrame([[1,2],[3,4]], index=['a','b'], columns=[1950, 1960])
>>> da.from_pandas(s)
dimarray: 4 non-null elements (0 null)
0 / x0 (2): 'a' to 'b'
1 / x1 (2): 1950 to 1960
array([[1, 2],
[3, 4]])
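The MultiIndex case can be sketched as follows (a hypothetical example; the exact axis names and shape of the result are not shown here, hence the skipped doctest):
>>> index = pd.MultiIndex.from_tuples([('a', 1950), ('a', 1960), ('b', 1950), ('b', 1960)], names=['variable', 'time'])
>>> s = pd.Series([1., 2., 3., 4.], index=index)
>>> da.from_pandas(s)  # doctest: +SKIP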
plotting¶
dimarray comes with a basic plotting facility. For 1-D and 2-D data, it simply interfaces pandas’ plot command (therefore pandas needs to be installed to use it). From the example above:
>>> %pylab # doctest: +SKIP
>>> %matplotlib inline # doctest: +SKIP
>>> a = dataset['combined_data']
>>> a.plot() # doctest: +SKIP
Using matplotlib backend: TkAgg
Populating the interactive namespace from numpy and matplotlib
[<matplotlib.lines.Line2D at 0x7f729cffdd50>,
<matplotlib.lines.Line2D at 0x7f729cffded0>]
In addition, 2-D data can be displayed via the contour, contourf and pcolor methods, wrapped from matplotlib.
>>> # create some data
>>> lon = np.linspace(-180, 180, 10)
>>> lat = np.linspace(-90, 90, 10)
>>> LON, LAT = np.meshgrid(lon, lat)
>>> DATA = np.cos(np.radians(LON)) + np.cos(np.radians(LAT))
>>> # define dimarray
>>> a = DimArray(DATA, axes=[lat, lon], dims=['lat','lon'])
>>> # plot the data
>>> c = a.contourf()
>>> colorbar(c) # explicit colorbar creation # doctest: +SKIP
>>> a.contour(colors='k') # doctest: +SKIP
>>> # plot the data
>>> a.pcolor(colorbar=True) # colorbar as keyword argument # doctest: +SKIP
<matplotlib.collections.QuadMesh at 0x7f729cd3aa10>
For more information, you can use the inline help (help() or ?) or refer to the Advanced topics and Reference API sections.
Advanced topics¶
Under construction…
Create a dimarray¶
There are various ways of defining a DimArray instance.
Standard definition¶
Provide a list of axis values (axes= parameter) and a list of axis names (dims= parameter).
>>> from dimarray import DimArray
>>> a = DimArray([[1.,2,3], [4,5,6]], axes=[['a', 'b'], [1950, 1960, 1970]], dims=['variable', 'time'])
>>> a
dimarray: 6 non-null elements (0 null)
0 / variable (2): 'a' to 'b'
1 / time (3): 1950 to 1970
array([[1., 2., 3.],
[4., 5., 6.]])
List of tuples¶
DimArray axes can also be initialized via a list of tuples (axis name, axis values):
>>> a = DimArray([[1.,2,3], [4,5,6]], axes=[('variable', ['a', 'b']), ('time', [1950, 1960, 1970])])
>>> a
dimarray: 6 non-null elements (0 null)
0 / variable (2): 'a' to 'b'
1 / time (3): 1950 to 1970
array([[1., 2., 3.],
[4., 5., 6.]])
Recursive definition : dict of dict¶
New in version 0.1.8.
It is possible to define a dimarray as a dictionary of dictionaries. The only additional parameter needed is a list of dimension names, which should correspond to the dictionary’s depth.
>>> dict_ = {'a': {1:11,
... 2:22,
... 3:33},
... 'b': {1:111,
... 2:222,
... 3:333} }
>>> a = DimArray(dict_, dims=['dim1','dim2'])
>>> a.sort_axis(axis=0).sort_axis(axis=1) # dict keys are not sorted in python !
dimarray: 6 non-null elements (0 null)
0 / dim1 (2): 'a' to 'b'
1 / dim2 (3): 1 to 3
array([[ 11, 22, 33],
[111, 222, 333]])
Advanced Indexing¶
Let’s first define an array to test indexing:
>>> from dimarray import DimArray
>>> v = DimArray([[1,2],[3,4],[5,6],[7,8]], axes=[["a","b","c","d"], [10.,20.]], dims=['x0','x1'], dtype=float)
>>> v
dimarray: 8 non-null elements (0 null)
0 / x0 (4): 'a' to 'd'
1 / x1 (2): 10.0 to 20.0
array([[1., 2.],
[3., 4.],
[5., 6.],
[7., 8.]])
Basics: integer, array, slice¶
There are various ways of indexing a DimArray, and all follow numpy’s rules, except that in the default behaviour indices refer to axis values and not to position on the axis, in contrast to numpy.
>>> v['a',20] # extract a single item
2.0
The ix attribute is the counterpart for position (integer) indexing (and exclusively so!). It is therefore similar to indexing on the values attribute, except that it returns a new DimArray, whereas v.values[…] would return a numpy ndarray.
>>> v.ix[0,:]
dimarray: 2 non-null elements (0 null)
0 / x1 (2): 10.0 to 20.0
array([1., 2.])
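For comparison, indexing the values attribute directly returns a plain numpy ndarray, without any axis information:
>>> v.values[0, :]
array([1., 2.])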
Note that the last element of slices is INCLUDED, contrary to numpy’s position indexing. The step argument is always interpreted as an integer.
>>> v['a':'c',10] # 'c' is INCLUDED
dimarray: 3 non-null elements (0 null)
0 / x0 (3): 'a' to 'c'
array([1., 3., 5.])
>>> v[['a','c'],10] # it is possible to provide a list
dimarray: 2 non-null elements (0 null)
0 / x0 (2): 'a' to 'c'
array([1., 5.])
>>> v[v.x0 != 'b',10] # boolean indexing is also fine
dimarray: 3 non-null elements (0 null)
0 / x0 (3): 'a' to 'd'
array([1., 5., 7.])
If several array-like indices are provided, “orthogonal” indexing is performed, along each dimension independently:
>>> v[['a','c'],[10,20]] # it is possible to provide a list
dimarray: 4 non-null elements (0 null)
0 / x0 (2): 'a' to 'c'
1 / x1 (2): 10.0 to 20.0
array([[1., 2.],
[5., 6.]])
See below for the cases where you do need numpy-like index broadcasting, using the take method.
Modify array values¶
All the above can be used to change array values, consistently with what you would expect.
>>> v['a':'c',10] = 11
>>> v.ix[2, -1] = 22 # same as v.values[2, -1] = 22
>>> v[v == 2] = 33
>>> v[v.x0 == 'b', v.x1 == 20] = 44
>>> v
dimarray: 8 non-null elements (0 null)
0 / x0 (4): 'a' to 'd'
1 / x1 (2): 10.0 to 20.0
array([[11., 33.],
[11., 44.],
[11., 22.],
[ 7., 8.]])
take and put methods¶
The two methods dimarray.DimArray.put() and dimarray.DimArray.take() are the machinery for accessing and modifying items in the examples above. They may be useful to use directly for generic programming. They are similar to the numpy methods of the same name, but also work in multiple dimensions. In particular, they both accept dictionaries, tuples and boolean arrays as the indices argument.
>>> v = DimArray([[1,2],[3,4],[5,6],[7,8]], labels=[["a","b","c","d"], [10.,20.]], dims=['x0','x1'], dtype=float)
>>> import numpy as np
>>> v[:,10] # doctest: +SKIP
>>> v.take(10, axis=1) # doctest: +SKIP
>>> v.take(10, axis='x1') # doctest: +SKIP
>>> v.take({'x1':10}) # dict # doctest: +SKIP
>>> v.take((slice(None),10)) # tuple # doctest: +SKIP
dimarray: 4 non-null elements (0 null)
0 / x0 (4): 'a' to 'd'
array([1., 3., 5., 7.])
The two latter forms, tuple or dict, allow performing multi-dimensional indexing. Array broadcasting is controlled by the “broadcast” parameter.
>>> v.take({'x0':['a','b'], 'x1':[10, 20]}, broadcast=True)
dimarray: 2 non-null elements (0 null)
0 / x0,x1 (2): ('a', '10.0') to ('b', '20.0')
array([1., 4.])
>>> v.take({'x0':['a','b'], 'x1':[10, 20]}, broadcast=False) # same as v.box[['a','b'],[10, 20]]
dimarray: 4 non-null elements (0 null)
0 / x0 (2): 'a' to 'b'
1 / x1 (2): 10.0 to 20.0
array([[1., 2.],
[3., 4.]])
The ‘indexing’ parameter can be set to ‘position’ (same as ix) instead of ‘values’:
>>> v.take(0, axis=1, indexing='position')
dimarray: 4 non-null elements (0 null)
0 / x0 (4): 'a' to 'd'
array([1., 3., 5., 7.])
Note that the put command modifies values in place by default, unless inplace=False is passed:
>>> v.put(indices=10, values=-99, axis='x1', inplace=False)
dimarray: 8 non-null elements (0 null)
0 / x0 (4): 'a' to 'd'
1 / x1 (2): 10.0 to 20.0
array([[-99., 2.],
[-99., 4.],
[-99., 6.],
[-99., 8.]])
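With the default inplace=True, the same call modifies v directly instead of returning a new array (a quick sketch, doctests skipped):
>>> v.put(indices=10, values=-99, axis='x1')  # modifies v in place  # doctest: +SKIP
>>> v[:, 10]  # doctest: +SKIP
dimarray: 4 non-null elements (0 null)
0 / x0 (4): 'a' to 'd'
array([-99., -99., -99., -99.])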
Along-axis transformations¶
Basics¶
Most numpy transformations are built in. Let’s create some data to try it out:
>>> from dimarray import DimArray
>>> a = DimArray([[1,2,3],[4,5,6]], axes=[['a','b'], [2000,2001,2002]], dims=['time', 'items'])
>>> a
dimarray: 6 non-null elements (0 null)
0 / time (2): 'a' to 'b'
1 / items (3): 2000 to 2002
array([[1, 2, 3],
[4, 5, 6]])
The classical numpy method names (sum, mean, max…) are used. For transformations that reduce an axis, the default behaviour is to flatten the array prior to the transformation, consistently with numpy:
>>> a.mean() # mean over all axes
3.5
But the axis= parameter can also be passed explicitly to reduce only a specific axis:
>>> a.mean(axis=0) # mean over first axis
dimarray: 3 non-null elements (0 null)
0 / items (3): 2000 to 2002
array([2.5, 3.5, 4.5])
but it is also possible to indicate the axis by name:
>>> a.mean(axis='time') # named axis
dimarray: 3 non-null elements (0 null)
0 / items (3): 2000 to 2002
array([2.5, 3.5, 4.5])
In addition, one can now provide a tuple to the axis= parameter, to reduce several axes at once
>>> a.mean(axis=('time','items')) # several axes at once
3.5
Of course, the above example makes more sense when there are more than two axes. To perform an operation on the flattened array, the convention is to pass None to axis=, which is the default behaviour of all reduction operators.
Note
Transformations that accumulate along an axis (cumsum, cumprod) default to the last axis (axis=-1) instead of flattening the array. This is also true of the diff operator.
While most methods directly call numpy’s, some subtle differences may exist, having to do with the need to define axis values consistent with the operation. For example, the diff method proposes several values for a scheme parameter (“centered”, “backward”, “forward”). Of interest also, the argmin and argmax methods return the value of the axis at the extremum instead of the integer position:
>>> a.argmin()
('a', 2000)
…which is consistent with dimarray indexing:
>>> a[a.argmin()]
1
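To illustrate the scheme parameter mentioned above, here is a sketch on a simple, hypothetical time series (doctests skipped; the commented axis values follow from the scheme definitions):
>>> ts = DimArray([1., 4., 9., 16.], axes=[[2000, 2001, 2002, 2003]], dims=['time'])
>>> ts.diff(axis='time', scheme='backward')  # values on 2001..2003  # doctest: +SKIP
>>> ts.diff(axis='time', scheme='forward')   # values on 2000..2002  # doctest: +SKIP
>>> ts.diff(axis='time', scheme='centered')  # values on mid-points 2000.5..2002.5  # doctest: +SKIP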
Missing values¶
dimarray treats NaN as missing values, which can be skipped in transformations by passing skipna=True. In the example below we use a float-typed array because there is no NaN type in integer arrays.
>>> import numpy as np
>>> a = DimArray([[1,2,3],[4,5,6]], dtype=float)
>>> a[1,2] = np.nan
>>> a
dimarray: 5 non-null elements (1 null)
0 / x0 (2): 0 to 1
1 / x1 (3): 0 to 2
array([[ 1., 2., 3.],
[ 4., 5., nan]])
>>> a.sum(axis=0) # here the nans are not skipped
dimarray: 2 non-null elements (1 null)
0 / x1 (3): 0 to 2
array([ 5., 7., nan])
>>> a.sum(axis=0, skipna=True)
dimarray: 3 non-null elements (0 null)
0 / x1 (3): 0 to 2
array([5., 7., 3.])
Metadata¶
DimArray, Dataset and Axis objects all support metadata. The straightforward way to define them is via the standard . syntax used to access an object attribute:
>>> from dimarray import DimArray
>>> a = DimArray([1,2,3])
>>> a.name = 'distance'
>>> a.units = 'meters'
Although they are nothing more than usual python attributes, the attrs attribute gives an overview of all metadata:
>>> a.attrs
OrderedDict([('name', 'distance'), ('units', 'meters')])
Metadata are conserved by slicing and along-axis transformation, but are lost with more ambiguous operations.
>>> a[:].attrs
OrderedDict([('name', 'distance'), ('units', 'meters')])
>>> (a**2).attrs
OrderedDict()
Warning
The attrs attribute has been added in version 0.2, thereby deprecating the former _metadata.
A summary() method is also defined that provides an overview of both the data and its metadata.
>>> a.axes[0].units = 'axis units'
>>> a.summary()
dimarray: 3 non-null elements (0 null)
0 / x0 (3): 0 to 2
units: 'axis units'
attributes:
name: 'distance'
units: 'meters'
array([1, 2, 3])
Note
Metadata that start with an underscore _ or use any protected class attribute as name (e.g. values, axes, dims and so on) can be set and accessed by manipulating attrs directly.
>>> a.attrs['dims'] = 'this is a bad name'
>>> a.attrs
OrderedDict([('name', 'distance'), ('units', 'meters'), ('dims', 'this is a bad name')])
>>> a.dims
('x0',)
It is easy to clear metadata:
>>> a.attrs = {} # clean all metadata
NetCDF reading and writing¶
Read from one netCDF file¶
>>> from dimarray import read_nc, get_datadir
>>> import os
>>> ncfile = os.path.join(get_datadir(), 'cmip5.CSIRO-Mk3-6-0.nc') # get one netCDF file
>>> data = read_nc(ncfile) # load full file
>>> data
Dataset of 2 variables
0 / time (451): 1850 to 2300
1 / scenario (5): 'historical' to 'rcp85'
tsl: ('time', 'scenario')
temp: ('time', 'scenario')
Then access the variable of choice:
>>> %pylab # doctest: +SKIP
>>> %matplotlib inline # doctest: +SKIP
>>> _ = data['temp'].plot()
>>> _ = plt.legend(loc='upper left') # doctest: +SKIP
Using matplotlib backend: TkAgg
Populating the interactive namespace from numpy and matplotlib
Load only one variable:
>>> data = read_nc(ncfile,'temp') # only one variable
>>> data = read_nc(ncfile,'temp', indices={"time":slice(2000,2100), "scenario":"rcp45"}) # load only a chunk of the data
>>> data = read_nc(ncfile,'temp', indices={"time":1950.3}, tol=0.5) # approximate matching, adjust tolerance
>>> data = read_nc(ncfile,'temp', indices={"time":-1}, indexing='position') # integer position indexing
Read from multiple files¶
Read the variable ‘temp’ across multiple files (representing various climate models). In this case the variable is a time series, whose length may vary across experiments (thus align=True is passed to reindex the axes before stacking). Under the hood the function dimarray.stack() is called:
>>> direc = get_datadir()
>>> temp = read_nc(direc+'/cmip5.*.nc', 'temp', align=True, axis='model')
A new ‘model’ axis is created, labelled with the file names. It is then possible to rename it more appropriately, e.g. keeping only the part directly relevant to identifying the experiment:
>>> getmodel = lambda x: os.path.basename(x).split('.')[1] # extract model name from path
>>> temp.set_axis(getmodel, axis='model', inplace=True) # would return a copy if inplace is not specified
>>> temp
dimarray: 9114 non-null elements (6671 null)
0 / model (7): 'CSIRO-Mk3-6-0' to 'MPI-ESM-MR'
1 / time (451): 1850 to 2300
2 / scenario (5): 'historical' to 'rcp85'
array(...)
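As noted above, dimarray.stack() is used under the hood; a roughly equivalent manual version would glob the files and stack the arrays explicitly (a sketch only; the resulting ‘model’ labels and their ordering may differ slightly from read_nc’s):
>>> import glob
>>> from dimarray import stack
>>> files = sorted(glob.glob(direc + '/cmip5.*.nc'))
>>> arrays = [read_nc(f, 'temp') for f in files]
>>> temp_manual = stack(arrays, axis='model', keys=files, align=True)  # doctest: +SKIP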
This works on datasets as well:
>>> ds = read_nc(direc+'/cmip5.*.nc', align=True, axis='model')
>>> ds.set_axis(getmodel, axis='model', inplace=True)
>>> ds
Dataset of 2 variables
0 / model (7): 'CSIRO-Mk3-6-0' to 'MPI-ESM-MR'
1 / time (451): 1850 to 2300
2 / scenario (5): 'historical' to 'rcp85'
tsl: ('model', 'time', 'scenario')
temp: ('model', 'time', 'scenario')
Write to netCDF¶
Let’s define some dummy arrays representing temperature in northern and southern hemisphere for three years.
>>> from dimarray import DimArray
>>> temperature = DimArray([[1.,2,3], [4,5,6]], axes=[['north','south'], [1951, 1952, 1953]], dims=['lat', 'time'])
>>> global_mean = temperature.mean(axis='lat')
>>> climatology = temperature.mean(axis='time')
Let’s define a new dataset
>>> from dimarray import Dataset
>>> ds = Dataset(temperature=temperature, global_mean=global_mean)
>>> ds
Dataset of 2 variables
0 / time (3): 1951 to 1953
1 / lat (2): 'north' to 'south'
global_mean: ('time',)
temperature: ('lat', 'time')
Saving the dataset to file is pretty simple:
>>> ds.write_nc('/tmp/test.nc', mode='w')
It is possible to append more variables:
>>> climatology.write_nc('/tmp/test.nc', 'climatology', mode='a') # by default mode='w'
Just as a check, all three variables seem to be there:
>>> read_nc('/tmp/test.nc')
Dataset of 3 variables
0 / time (3): 1951 to 1953
1 / lat (2): 'north' to 'south'
global_mean: ('time',)
temperature: ('lat', 'time')
climatology: ('lat',)
Note that when appending a variable to a netCDF file or to a dataset, its axes must match, otherwise an error will be raised. In that case it may be necessary to reindex an axis (see Reindexing: align axes). When initializing a dataset with a bunch of DimArrays, however, reindexing is performed automatically.
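When the axes do not match, the reindexing step could look like the following sketch (reindex_axis is the method referred to in the Reindexing section; doctests skipped since the exact call is not demonstrated here):
>>> partial = DimArray([7., 8], axes=[[1951, 1952]], dims=['time'])  # year 1953 missing
>>> partial_aligned = partial.reindex_axis([1951, 1952, 1953], axis='time')  # nan-filled to match  # doctest: +SKIP
>>> partial_aligned.write_nc('/tmp/test.nc', 'partial', mode='a')  # now the time axes match  # doctest: +SKIP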
New NetCDF4 storage¶
New in version 0.2.
Since version 0.2, the methods above are wrappers around the dimarray.DatasetOnDisk class, which allows lower-level access with a DimArray feel.
>>> import dimarray as da
>>> import numpy as np
>>> dima = da.DimArray([[1,2,3],[4,5,6]], axes=[('time',[2000,2045.5]),('scenario',['a','b','c'])])
>>> dima.units = 'myunits' # metadata
>>> dima.axes['time'].units = 'metadata-dim-in-memory'
>>>
>>> ds = da.open_nc('/tmp/test.nc', mode='w')
>>> ds['myvar'] = dima
>>> ds['myvar'].bla = 'bla'
>>> ds['myvar'].axes['time'].yo = 'metadata-dim-on-disk'
>>> ds.axes['scenario'].ya = 'metadata-var-on-disk'
>>> ds.yi = 'metadata-dataset-on-disk'
>>> ds.close()
Let’s check the result:
>>> ds2 = da.open_nc("/tmp/test.nc", mode="a")
>>> ds2
DatasetOnDisk of 1 variable (NETCDF4)
0 / time (2): 2000.0 to 2045.5
1 / scenario (3): 'a' to 'c'
myvar: ('time', 'scenario')
>>> ds2.summary()
DatasetOnDisk of 1 variable (NETCDF4)
<BLANKLINE>
//dimensions:
0 / time (2): 2000.0 to 2045.5
units: 'metadata-dim-in-memory'
yo: 'metadata-dim-on-disk'
1 / scenario (3): 'a' to 'c'
ya: 'metadata-var-on-disk'
<BLANKLINE>
//variables:
myvar: ('time', 'scenario')
units: 'myunits'
bla: 'bla'
<BLANKLINE>
//global attributes:
yi: 'metadata-dataset-on-disk'
>>> ds2['myvar']
DimArrayOnDisk: 'myvar' (6)
0 / time (2): 2000.0 to 2045.5
1 / scenario (3): 'a' to 'c'
>>> ds2['myvar'].values # doctest: +SKIP
<type 'netCDF4._netCDF4.Variable'>
int64 myvar(time, scenario)
units: myunits
bla: bla
unlimited dimensions:
current shape = (2, 3)
filling on, default _FillValue of -9223372036854775806 used
>>> ds2['myvar'][:]
dimarray: 6 non-null elements (0 null)
0 / time (2): 2000.0 to 2045.5
1 / scenario (3): 'a' to 'c'
array([[1, 2, 3],
[4, 5, 6]])
>>> ds2['myvar'][2000, 'b'] = 77
>>> ds2['myvar'][:]
dimarray: 6 non-null elements (0 null)
0 / time (2): 2000.0 to 2045.5
1 / scenario (3): 'a' to 'c'
array([[ 1, 77, 3],
[ 4, 5, 6]])
>>> ds2['myvar'].ix[0, -1] = -1
>>> ds2['myvar'][:]
dimarray: 6 non-null elements (0 null)
0 / time (2): 2000.0 to 2045.5
1 / scenario (3): 'a' to 'c'
array([[ 1, 77, -1],
[ 4, 5, 6]])
>>> ds2.close()
Create a variable with an unlimited dimension¶
>>> import dimarray as da
>>>
>>> ds = da.open_nc('/tmp/test.nc', 'w')
>>> ds.axes.append('time', None)
>>> ds.nc.dimensions['time'] # underlying netCDF4 object # doctest: +SKIP
<type 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0
Fill up the variable:
>>> ds['bla'] = da.DimArray([1,2,3,4,5], dims=['time'], axes=[list('abcde')])
>>> ds.nc.dimensions['time'] # underlying netCDF4 object # doctest: +SKIP
<type 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 5
Append some new slices:
>>> ds['bla'].ix[5] = da.DimArray([66], dims=['time'], axes=[['f']])
>>> ds.nc.dimensions['time'] # underlying netCDF4 object # doctest: +SKIP
<type 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 6
>>> ds['bla'].read()
dimarray: 6 non-null elements (0 null)
0 / time (6): 'a' to 'f'
array([ 1, 2, 3, 4, 5, 66])
>>> ds.close()
Cookbook¶
Create a generic time-mean function¶
This function applies to any kind of input array, as long as the “time” dimension is present.
>>> def time_mean(a, t1=None, t2=None):
... """ compute time mean between two instants
...
... Parameters:
... -----------
... a : DimArray
... must include a "time" dimension
... t1, t2 : same type as a.time (typically int or float)
... start and end times
...
... Returns:
... --------
... ma : DimArray
... time-average between t1 and t2
... """
... assert 'time' in a.dims, 'dimarray must have the "time" dimension'
... return a.swapaxes(0, 'time')[t1:t2].mean(axis='time')
>>> from dimarray import DimArray
>>> import numpy as np
>>> a = DimArray([1,2,3,4], axes=[[2000,2001,2002,2003]], dims=['time'])
>>> time_mean(a, 2001, 2003) # average over 2001, 2002, 2003
3.0
>>> a = DimArray([[1,2,3,4],[5,6,7,8]], axes=[['a','b'],[2000,2001,2002,2003]], dims=['items','time'])
>>> time_mean(a) # average over the full time axis
dimarray: 2 non-null elements (0 null)
0 / items (2): 'a' to 'b'
array([2.5, 6.5])
Talks¶
Selected talks involving dimarray:
- Modelling strategy seminar at PIK (May 2014): Brief introduction to python, numpy and dimarray
Reference API¶
Under construction…
Classical functions have been organized into categories. For reference documentation, please see the inline help and the next section.
>>> help(DimArray.diff) # doctest: +SKIP
or with ipython:
>>> DimArray.diff? # doctest: +SKIP
DimArray API¶
DimArray methods are listed below by topic, along with examples. Functions are provided on a separate page, the functions reference API.
Create a DimArray¶
DimArray.__init__(values=None, axes=None, dims=None, labels=None, copy=False, dtype=None, _indexing=None, _indexing_broadcast=None, **kwargs)¶ Initialize a DimArray instance
Parameters: - values : numpy-like array, or DimArray instance, or dict
If values is not provided, will initialize an empty array with dimensions inferred from axes (in that case axes= must be provided).
- axes : list or tuple, optional
axis values as ndarrays, whose order matches the axis names (the dimensions) provided via the dims= parameter. Each axis can also be provided as a tuple (str, array-like) which contains both the axis name and axis values, in which case dims= becomes superfluous. axes= can also be provided as a list of Axis objects. If axes= is omitted, a standard axis np.arange(shape[i]) is created for each axis i.
- dims : list or tuple, optional
dimensions (or axis names). This parameter can be omitted if dimensions are already provided by other means, such as passing a list of tuples to axes=. If axes are passed as keyword arguments (via **kwargs), dims= is used to determine the order of dimensions. If dims is not provided by any of the means mentioned above, default dimension names x0, x1, …, xn are given, where n is the number of dimensions.
- dtype : numpy data type, optional
passed to np.array()
- copy : bool, optional
passed to np.array()
- **kwargs : keyword arguments
metadata
Notes
Metadata passed this way cannot have a name already taken by other parameters, such as “values”, “axes”, “dims”, “dtype” or “copy”.
Examples
Basic:
>>> DimArray([[1,2,3],[4,5,6]]) # automatic labelling
dimarray: 6 non-null elements (0 null)
0 / x0 (2): 0 to 1
1 / x1 (3): 0 to 2
array([[1, 2, 3],
       [4, 5, 6]])
>>> DimArray([[1,2,3],[4,5,6]], dims=['items','time']) # axis names only
dimarray: 6 non-null elements (0 null)
0 / items (2): 0 to 1
1 / time (3): 0 to 2
array([[1, 2, 3],
       [4, 5, 6]])
>>> DimArray([[1,2,3],[4,5,6]], axes=[list("ab"), np.arange(1950,1953)]) # axis values only
dimarray: 6 non-null elements (0 null)
0 / x0 (2): 'a' to 'b'
1 / x1 (3): 1950 to 1952
array([[1, 2, 3],
       [4, 5, 6]])
More general case:
>>> a = DimArray([[1,2,3],[4,5,6]], axes=[list("ab"), np.arange(1950,1953)], dims=['items','time'])
>>> b = DimArray([[1,2,3],[4,5,6]], axes=[('items',list("ab")), ('time',np.arange(1950,1953))])
>>> c = DimArray([[1,2,3],[4,5,6]], {'items':list("ab"), 'time':np.arange(1950,1953)}) # here dims can be omitted because shape = (2, 3)
>>> np.all(a == b) and np.all(a == c)
True
>>> a
dimarray: 6 non-null elements (0 null)
0 / items (2): 'a' to 'b'
1 / time (3): 1950 to 1952
array([[1, 2, 3],
       [4, 5, 6]])
Empty data
>>> a = DimArray(axes=[('items',list("ab")), ('time',np.arange(1950,1953))])
Metadata
>>> a = DimArray([[1,2,3],[4,5,6]], name='test', units='none')
Modify shape¶
DimArray.transpose(*dims)¶ Permute dimensions
Analogous to numpy, but also allows axis names
Parameters: - *dims : int or str
variable list of dimensions
Returns: - transposed_array : DimArray
Examples
>>> import dimarray as da
>>> a = da.DimArray(np.zeros((2,3)), ['x0','x1'])
>>> a
dimarray: 6 non-null elements (0 null)
0 / x0 (2): 0 to 1
1 / x1 (3): 0 to 2
array([[0., 0., 0.],
       [0., 0., 0.]])
>>> a.T
dimarray: 6 non-null elements (0 null)
0 / x1 (3): 0 to 2
1 / x0 (2): 0 to 1
array([[0., 0.],
       [0., 0.],
       [0., 0.]])
>>> (a.T == a.transpose(1,0)).all() and (a.T == a.transpose('x1','x0')).all()
True
DimArray.swapaxes(axis1, axis2)¶ Swap two axes
Analogous to numpy’s swapaxes, but axes can be provided by name
Parameters: - axis1, axis2 : int or str
axes to swap (transpose)
Returns: - transposed_array : DimArray
Examples
>>> from dimarray import DimArray
>>> a = DimArray(np.arange(2*3*4).reshape(2,3,4))
>>> a.dims
('x0', 'x1', 'x2')
>>> b = a.swapaxes('x2',0) # put 'x2' at the first position
>>> b.dims
('x2', 'x1', 'x0')
>>> b.shape
(4, 3, 2)
DimArray.reshape(*newdims, **kwargs)¶ Add/remove/flatten dimensions to conform array to new dimensions
Parameters: - newdims : tuple or list or variable list of dimension names {str}
Any dimension not present in the array is added as a singleton dimension. Any dimension name containing a comma is interpreted as a flattening command. All dimensions to flatten have to exist already.
- transpose : bool
if True, transpose dimensions to match the new order (default True); otherwise, raise an Error if a transpose is needed (closer to numpy’s original behaviour)
Returns: - reshaped_array : DimArray
with reshaped_array.dims == tuple(newdims)
Examples
>>> from dimarray import DimArray
>>> a = DimArray([7,8])
>>> a
dimarray: 2 non-null elements (0 null)
0 / x0 (2): 0 to 1
array([7, 8])
>>> a.reshape(('x0','new'))
dimarray: 2 non-null elements (0 null)
0 / x0 (2): 0 to 1
1 / new (1): None to None
array([[7],
       [8]])
>>> b = DimArray(np.arange(2*2*2).reshape(2,2,2))
>>> b
dimarray: 8 non-null elements (0 null)
0 / x0 (2): 0 to 1
1 / x1 (2): 0 to 1
2 / x2 (2): 0 to 1
array([[[0, 1],
        [2, 3]],
<BLANKLINE>
       [[4, 5],
        [6, 7]]])
>>> c = b.reshape('x0','x1,x2')
>>> c
dimarray: 8 non-null elements (0 null)
0 / x0 (2): 0 to 1
1 / x1,x2 (4): (0, 0) to (1, 1)
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
>>> c.reshape('x0,x1','x2')
dimarray: 8 non-null elements (0 null)
0 / x0,x1 (4): (0, 0) to (1, 1)
1 / x2 (2): 0 to 1
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])
DimArray.flatten(*dims, **kwargs)¶ Flatten all or a subset of dimensions
Parameters: - dims : list or tuple of axis names, optional
by default, all dimensions
- reverse : bool, optional
if True, reverse behaviour: dims are interpreted as the dimensions to keep, and all the other dimensions are flattened. Default is False.
- insert : int, optional
position where to insert the flattened axis (by default, any flattened dimension is inserted at the position of the first axis involved in flattening)
Returns: - flattened_array : DimArray
appropriately reshaped, with the collapsed dimensions as the first axis (tuples). This is useful e.g. to do a regional mean with missing values.
Notes
A tuple of axis names can be passed via the “axis” parameter of the transformation to trigger flattening prior to reducing an axis.
Examples
Flatten all dimensions
>>> from dimarray import DimArray
>>> a = DimArray([[1,2,3],[4,5,6]])
>>> a
dimarray: 6 non-null elements (0 null)
0 / x0 (2): 0 to 1
1 / x1 (3): 0 to 2
array([[1, 2, 3],
       [4, 5, 6]])
>>> b = a.flatten()
>>> b
dimarray: 6 non-null elements (0 null)
0 / x0,x1 (6): (0, 0) to (1, 2)
array([1, 2, 3, 4, 5, 6])
>>> b.labels
(array([(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)], dtype=object),)
Flatten a subset of dimensions only
>>> from dimarray import DimArray
>>> np.random.seed(0)
>>> values = np.arange(2*3*4).reshape(2,3,4)
>>> v = DimArray(values, axes=[('time', [1950,1955]), ('lat', np.linspace(-90,90,3)), ('lon', np.linspace(-180,180,4))])
>>> v
dimarray: 24 non-null elements (0 null)
0 / time (2): 1950 to 1955
1 / lat (3): -90.0 to 90.0
2 / lon (4): -180.0 to 180.0
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],
<BLANKLINE>
       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
>>> w = v.flatten(('lat','lon'), insert=1)
>>> w
dimarray: 24 non-null elements (0 null)
0 / time (2): 1950 to 1955
1 / lat,lon (12): (-90.0, -180.0) to (90.0, 180.0)
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])
>>> np.all( w.unflatten() == v )
True
But be careful, the order matters!
>>> v.flatten(('lon','lat'), insert=1)
dimarray: 24 non-null elements (0 null)
0 / time (2): 1950 to 1955
1 / lon,lat (12): (-180.0, -90.0) to (180.0, 90.0)
array([[ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11],
       [12, 16, 20, 13, 17, 21, 14, 18, 22, 15, 19, 23]])
Useful to average over a group of dimensions:
>>> v.flatten(('lon','lat'), insert=0).mean(axis=0)
dimarray: 2 non-null elements (0 null)
0 / time (2): 1950 to 1955
array([ 5.5, 17.5])
is equivalent to:
>>> v.mean(axis=('lon','lat'))
dimarray: 2 non-null elements (0 null)
0 / time (2): 1950 to 1955
array([ 5.5, 17.5])
DimArray.unflatten(axis=None)¶ Undo flatten (inflate array)
Parameters: - axis : int or str or None, optional
axis to unflatten; default is None, to unflatten all
Returns: - DimArray
DimArray.squeeze(axis=None)¶ Squeeze singleton axes
Analogous to numpy, but also allows axis name
Parameters: - axis : int or str or None
axis to squeeze; default is None, to remove all singleton axes
Returns: - squeezed_array : DimArray
Examples
>>> import dimarray as da
>>> a = da.DimArray([[[1,2,3]]])
>>> a
dimarray: 3 non-null elements (0 null)
0 / x0 (1): 0 to 0
1 / x1 (1): 0 to 0
2 / x2 (3): 0 to 2
array([[[1, 2, 3]]])
>>> a.squeeze()
dimarray: 3 non-null elements (0 null)
0 / x2 (3): 0 to 2
array([1, 2, 3])
>>> a.squeeze(axis='x1')
dimarray: 3 non-null elements (0 null)
0 / x0 (1): 0 to 0
1 / x2 (3): 0 to 2
array([[1, 2, 3]])
DimArray.repeat(values, axis=None)¶ Expand the array along an existing axis
Parameters: - values : int or ndarray or Axis instance
int: size of the new axis; ndarray: values of the new axis
- axis : int or str
refer to the dimension along which to repeat
- **kwaxes : key-word arguments
alternatively, axes may be passed as keyword arguments
Returns: - DimArray
See also
newaxis
Examples
>>> import dimarray as da
>>> a = da.DimArray(np.arange(3), labels = [[1950., 1951., 1952.]], dims=('time',))
>>> a2d = a.newaxis('lon', pos=1) # lon is now singleton dimension
>>> a2d.repeat(2, axis="lon")
dimarray: 6 non-null elements (0 null)
0 / time (3): 1950.0 to 1952.0
1 / lon (2): 0 to 1
array([[0, 0],
       [1, 1],
       [2, 2]])
>>> a2d.repeat([30., 50.], axis="lon")
dimarray: 6 non-null elements (0 null)
0 / time (3): 1950.0 to 1952.0
1 / lon (2): 30.0 to 50.0
array([[0, 0],
       [1, 1],
       [2, 2]])
DimArray.broadcast(other)¶ Repeat array to match target dimensions
Parameters: - other : DimArray or Axes objects or ordered Dictionary of axis values
Returns: - DimArray
Examples
Create some dummy data:
>>> import dimarray as da
>>> lon = np.linspace(10, 30, 2)
>>> lat = np.linspace(10, 50, 3)
>>> time = np.arange(1950,1955)
>>> ts = da.DimArray(np.arange(5), axes=[time], dims=['time'])
>>> cube = da.DimArray(np.zeros((3,2,5)), axes=[('lat',lat), ('lon',lon), ('time',time)]) # lat x lon x time
>>> cube.axes
0 / lat (3): 10.0 to 50.0
1 / lon (2): 10.0 to 30.0
2 / time (5): 1950 to 1954
Broadcast the timeseries to 3-D data:
>>> ts3D = ts.broadcast(cube) # lat x lon x time
>>> ts3D
dimarray: 30 non-null elements (0 null)
0 / lat (3): 10.0 to 50.0
1 / lon (2): 10.0 to 30.0
2 / time (5): 1950 to 1954
array([[[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],
<BLANKLINE>
       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],
<BLANKLINE>
       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]])
Reduce, accumulate¶
DimArray.max(skipna=False, args=(), **kwargs)¶ Analogous to numpy’s max
max(…, axis=None, skipna=False, …)
Accepts the same parameters as the equivalent numpy function, with modified behaviour of the axis parameter and an additional skipna parameter to handle NaNs (by default considered missing values).
Parameters:
- axis : int or str or tuple
  axis along which to apply the transform. Can be given as an axis position (int), an axis name (str), or a list or tuple of axes (positions or names) to collapse into one axis before applying the transform. If axis is None, the transform is applied to the flattened array, consistently with numpy (in this case a scalar is returned). Default is None.
- skipna : bool
  If True, treat NaN as missing values (either using MaskedArray or, when available, the specific numpy function).
“…” stands for any other parameters required by the function, and depends on the particular function being called.
Returns:
- DimArray, or numpy array or scalar (e.g. in some cases if `axis` is None). See help on numpy.max or numpy.ma.max for other parameters and more information.
See also
apply_along_axis : is called by this method
to_MaskedArray : is used if skipna is True
DimArray.min(skipna=False, args=(), **kwargs)¶ Analogous to numpy’s min
min(…, axis=None, skipna=False, …)
Accepts the same parameters as the equivalent numpy function, with modified behaviour of the axis parameter and an additional skipna parameter to handle NaNs (by default considered missing values).
Parameters:
- axis : int or str or tuple
  axis along which to apply the transform. Can be given as an axis position (int), an axis name (str), or a list or tuple of axes (positions or names) to collapse into one axis before applying the transform. If axis is None, the transform is applied to the flattened array, consistently with numpy (in this case a scalar is returned). Default is None.
- skipna : bool
  If True, treat NaN as missing values (either using MaskedArray or, when available, the specific numpy function).
“…” stands for any other parameters required by the function, and depends on the particular function being called.
Returns:
- DimArray, or numpy array or scalar (e.g. in some cases if `axis` is None). See help on numpy.min or numpy.ma.min for other parameters and more information.
See also
apply_along_axis : is called by this method
to_MaskedArray : is used if skipna is True
DimArray.ptp(skipna=False, args=(), **kwargs)¶ Analogous to numpy’s ptp
ptp(…, axis=None, skipna=False, …)
Accepts the same parameters as the equivalent numpy function, with modified behaviour of the axis parameter and an additional skipna parameter to handle NaNs (by default considered missing values).
Parameters:
- axis : int or str or tuple
  axis along which to apply the transform. Can be given as an axis position (int), an axis name (str), or a list or tuple of axes (positions or names) to collapse into one axis before applying the transform. If axis is None, the transform is applied to the flattened array, consistently with numpy (in this case a scalar is returned). Default is None.
- skipna : bool
  If True, treat NaN as missing values (either using MaskedArray or, when available, the specific numpy function).
“…” stands for any other parameters required by the function, and depends on the particular function being called.
Returns:
- DimArray, or numpy array or scalar (e.g. in some cases if `axis` is None). See help on numpy.ptp or numpy.ma.ptp for other parameters and more information.
See also
apply_along_axis : is called by this method
to_MaskedArray : is used if skipna is True
DimArray.median(skipna=False, args=(), **kwargs)¶ Analogous to numpy’s median
median(…, axis=None, skipna=False, …)
Accepts the same parameters as the equivalent numpy function, with modified behaviour of the axis parameter and an additional skipna parameter to handle NaNs (by default considered missing values).
Parameters:
- axis : int or str or tuple
  axis along which to apply the transform. Can be given as an axis position (int), an axis name (str), or a list or tuple of axes (positions or names) to collapse into one axis before applying the transform. If axis is None, the transform is applied to the flattened array, consistently with numpy (in this case a scalar is returned). Default is None.
- skipna : bool
  If True, treat NaN as missing values (either using MaskedArray or, when available, the specific numpy function).
“…” stands for any other parameters required by the function, and depends on the particular function being called.
Returns:
- DimArray, or numpy array or scalar (e.g. in some cases if `axis` is None). See help on numpy.median or numpy.ma.median for other parameters and more information.
See also
apply_along_axis : is called by this method
to_MaskedArray : is used if skipna is True
DimArray.all(skipna=False, args=(), **kwargs)¶ Analogous to numpy’s all
all(…, axis=None, skipna=False, …)
Accepts the same parameters as the equivalent numpy function, with modified behaviour of the axis parameter and an additional skipna parameter to handle NaNs (by default considered missing values).
Parameters:
- axis : int or str or tuple
  axis along which to apply the transform. Can be given as an axis position (int), an axis name (str), or a list or tuple of axes (positions or names) to collapse into one axis before applying the transform. If axis is None, the transform is applied to the flattened array, consistently with numpy (in this case a scalar is returned). Default is None.
- skipna : bool
  If True, treat NaN as missing values (either using MaskedArray or, when available, the specific numpy function).
“…” stands for any other parameters required by the function, and depends on the particular function being called.
Returns:
- DimArray, or numpy array or scalar (e.g. in some cases if `axis` is None). See help on numpy.all or numpy.ma.all for other parameters and more information.
See also
apply_along_axis : is called by this method
to_MaskedArray : is used if skipna is True
DimArray.any(skipna=False, args=(), **kwargs)¶ Analogous to numpy’s any
any(…, axis=None, skipna=False, …)
Accepts the same parameters as the equivalent numpy function, with modified behaviour of the axis parameter and an additional skipna parameter to handle NaNs (by default considered missing values).
Parameters:
- axis : int or str or tuple
  axis along which to apply the transform. Can be given as an axis position (int), an axis name (str), or a list or tuple of axes (positions or names) to collapse into one axis before applying the transform. If axis is None, the transform is applied to the flattened array, consistently with numpy (in this case a scalar is returned). Default is None.
- skipna : bool
  If True, treat NaN as missing values (either using MaskedArray or, when available, the specific numpy function).
“…” stands for any other parameters required by the function, and depends on the particular function being called.
Returns:
- DimArray, or numpy array or scalar (e.g. in some cases if `axis` is None). See help on numpy.any or numpy.ma.any for other parameters and more information.
See also
apply_along_axis : is called by this method
to_MaskedArray : is used if skipna is True
DimArray.prod(skipna=False, args=(), **kwargs)¶ Analogous to numpy’s prod
prod(…, axis=None, skipna=False, …)
Accepts the same parameters as the equivalent numpy function, with modified behaviour of the axis parameter and an additional skipna parameter to handle NaNs (by default considered missing values).
Parameters:
- axis : int or str or tuple
  axis along which to apply the transform. Can be given as an axis position (int), an axis name (str), or a list or tuple of axes (positions or names) to collapse into one axis before applying the transform. If axis is None, the transform is applied to the flattened array, consistently with numpy (in this case a scalar is returned). Default is None.
- skipna : bool
  If True, treat NaN as missing values (either using MaskedArray or, when available, the specific numpy function).
“…” stands for any other parameters required by the function, and depends on the particular function being called.
Returns:
- DimArray, or numpy array or scalar (e.g. in some cases if `axis` is None). See help on numpy.prod or numpy.ma.prod for other parameters and more information.
See also
apply_along_axis : is called by this method
to_MaskedArray : is used if skipna is True
DimArray.sum(skipna=False, args=(), **kwargs)¶ Analogous to numpy’s sum
sum(…, axis=None, skipna=False, …)
Accepts the same parameters as the equivalent numpy function, with modified behaviour of the axis parameter and an additional skipna parameter to handle NaNs (by default considered missing values).
Parameters:
- axis : int or str or tuple
  axis along which to apply the transform. Can be given as an axis position (int), an axis name (str), or a list or tuple of axes (positions or names) to collapse into one axis before applying the transform. If axis is None, the transform is applied to the flattened array, consistently with numpy (in this case a scalar is returned). Default is None.
- skipna : bool
  If True, treat NaN as missing values (either using MaskedArray or, when available, the specific numpy function).
“…” stands for any other parameters required by the function, and depends on the particular function being called.
Returns:
- DimArray, or numpy array or scalar (e.g. in some cases if `axis` is None). See help on numpy.sum or numpy.ma.sum for other parameters and more information.
See also
apply_along_axis : is called by this method
to_MaskedArray : is used if skipna is True
DimArray.mean(skipna=False, args=(), **kwargs)¶ Analogous to numpy’s mean
mean(…, axis=None, skipna=False, …)
Accepts the same parameters as the equivalent numpy function, with modified behaviour of the axis parameter and an additional skipna parameter to handle NaNs (by default considered missing values).
Parameters:
- axis : int or str or tuple
  axis along which to apply the transform. Can be given as an axis position (int), an axis name (str), or a list or tuple of axes (positions or names) to collapse into one axis before applying the transform. If axis is None, the transform is applied to the flattened array, consistently with numpy (in this case a scalar is returned). Default is None.
- skipna : bool
  If True, treat NaN as missing values (either using MaskedArray or, when available, the specific numpy function).
“…” stands for any other parameters required by the function, and depends on the particular function being called.
Returns:
- DimArray, or numpy array or scalar (e.g. in some cases if `axis` is None). See help on numpy.mean or numpy.ma.mean for other parameters and more information.
See also
apply_along_axis : is called by this method
to_MaskedArray : is used if skipna is True
DimArray.std(skipna=False, args=(), **kwargs)¶ Analogous to numpy’s std
std(…, axis=None, skipna=False, …)
Accepts the same parameters as the equivalent numpy function, with modified behaviour of the axis parameter and an additional skipna parameter to handle NaNs (by default considered missing values).
Parameters:
- axis : int or str or tuple
  axis along which to apply the transform. Can be given as an axis position (int), an axis name (str), or a list or tuple of axes (positions or names) to collapse into one axis before applying the transform. If axis is None, the transform is applied to the flattened array, consistently with numpy (in this case a scalar is returned). Default is None.
- skipna : bool
  If True, treat NaN as missing values (either using MaskedArray or, when available, the specific numpy function).
“…” stands for any other parameters required by the function, and depends on the particular function being called.
Returns:
- DimArray, or numpy array or scalar (e.g. in some cases if `axis` is None). See help on numpy.std or numpy.ma.std for other parameters and more information.
See also
apply_along_axis : is called by this method
to_MaskedArray : is used if skipna is True
DimArray.var(skipna=False, args=(), **kwargs)¶ Analogous to numpy’s var
var(…, axis=None, skipna=False, …)
Accepts the same parameters as the equivalent numpy function, with modified behaviour of the axis parameter and an additional skipna parameter to handle NaNs (by default considered missing values).
Parameters:
- axis : int or str or tuple
  axis along which to apply the transform. Can be given as an axis position (int), an axis name (str), or a list or tuple of axes (positions or names) to collapse into one axis before applying the transform. If axis is None, the transform is applied to the flattened array, consistently with numpy (in this case a scalar is returned). Default is None.
- skipna : bool
  If True, treat NaN as missing values (either using MaskedArray or, when available, the specific numpy function).
“…” stands for any other parameters required by the function, and depends on the particular function being called.
Returns:
- DimArray, or numpy array or scalar (e.g. in some cases if `axis` is None). See help on numpy.var or numpy.ma.var for other parameters and more information.
See also
apply_along_axis : is called by this method
to_MaskedArray : is used if skipna is True
DimArray.argmax(axis=None, skipna=False)¶ Similar to numpy’s argmax, but returns axis values instead of the integer position
Parameters: - axis : int or str or tuple
axis along which to apply the transform. Can be given as axis position (int), as axis name (str), or as a list or tuple of axes (positions or names) to collapse into one axis before applying the transform. If axis is None, the transform is applied to the flattened array, consistently with numpy. Default is None.
- skipna : bool
If True, treat NaN as missing values (either using MaskedArray or, when available, a NaN-aware numpy function)
DimArray.argmin(axis=None, skipna=False)¶ Similar to numpy’s argmin, but returns axis values instead of the integer position
Parameters: - axis : int or str or tuple
axis along which to apply the transform. Can be given as axis position (int), as axis name (str), or as a list or tuple of axes (positions or names) to collapse into one axis before applying the transform. If axis is None, the transform is applied to the flattened array, consistently with numpy. Default is None.
- skipna : bool
If True, treat NaN as missing values (either using MaskedArray or, when available, a NaN-aware numpy function)
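As a hedged illustration of the label-returning behaviour (the array and the expected results are made up for the example):
>>> from dimarray import DimArray
>>> a = DimArray([2., 7., 3.], axes=[[1950, 1960, 1970]], dims=['time'])
>>> a.argmax()   # expected to return the 'time' label 1960 rather than the integer position 1  # doctest: +SKIP
>>> a.argmin()   # expected to return 1950                                                      # doctest: +SKIP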
DimArray.cumsum(axis=-1, skipna=False)¶
DimArray.cumprod(axis=-1, skipna=False)¶
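cumsum and cumprod follow the same axis and skipna conventions as the other transforms; a brief hedged sketch (the array is invented for the example):
>>> from dimarray import DimArray
>>> a = DimArray([[1., 2.], [3., 4.]], dims=['time', 'items'])
>>> a.cumsum(axis='time')   # cumulative sum along the named axis  # doctest: +SKIP
>>> a.cumprod(axis=-1)      # default: along the last axis         # doctest: +SKIP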
DimArray.diff(axis=-1, scheme='backward', keepaxis=False, n=1)¶ Analogous to numpy’s diff
Calculate the n-th order discrete difference along the given axis.
The first order difference is given by
out[n] = a[n+1] - a[n]
along the given axis; higher order differences are calculated by using diff recursively.
Parameters: - axis : int or str or tuple
axis along which to apply the transform. Can be given as axis position (int), as axis name (str), or as a list or tuple of axes (positions or names) to collapse into one axis before applying the transform. If axis is None, the transform is applied to the flattened array, consistently with numpy. Default is -1.
- scheme : str, optional
determines the values of the resulting axis:
- “forward” : diff[i] = x[i+1] - x[i]
- “backward”: diff[i] = x[i] - x[i-1]
- “centered”: diff[i] = x[i+1/2] - x[i-1/2]
Default is “backward”
- keepaxis : bool, optional
if True, keep the initial axis by padding with NaNs. Only compatible with “forward” or “backward” differences. Default is False.
- n : int, optional
The number of times values are differenced. Default is one.
Returns: - diff : DimArray
The n-th order differences. The shape of the output is the same as a, except along axis where the dimension is smaller by n.
Examples
Create some example data
>>> import dimarray as da
>>> import numpy as np
>>> v = da.DimArray([1,2,3,4], ('time', np.arange(1950,1954)), dtype=float)
>>> s = v.cumsum()
>>> s
dimarray: 4 non-null elements (0 null)
0 / time (4): 1950 to 1953
array([ 1., 3., 6., 10.])
diff reduces the axis size by one, by default
>>> s.diff()
dimarray: 3 non-null elements (0 null)
0 / time (3): 1951 to 1953
array([2., 3., 4.])
The keepaxis= parameter fills the array with NaN where necessary to keep the axis unchanged. Default is backward differencing: diff[i] = v[i] - v[i-1].
>>> s.diff(keepaxis=True)
dimarray: 3 non-null elements (1 null)
0 / time (4): 1950 to 1953
array([nan, 2., 3., 4.])
But other schemes are available to control how the new axis is defined: backward (default), forward and even centered
>>> s.diff(keepaxis=True, scheme="forward")  # diff[i] = v[i+1] - v[i]
dimarray: 3 non-null elements (1 null)
0 / time (4): 1950 to 1953
array([ 2., 3., 4., nan])
The keepaxis=True option is invalid with the centered scheme, since every axis value is modified by definition:
>>> s.diff(axis='time', scheme='centered')
dimarray: 3 non-null elements (0 null)
0 / time (3): 1950.5 to 1952.5
array([2., 3., 4.])
Indexing¶
DimArray.__getitem__(indices=None, axis=0, indexing=None, tol=None, broadcast=None, keepdims=False, broadcast_arrays=None)¶
DimArray.ix()¶
DimArray.box()¶ property to allow indexing without array broadcasting (matlab-like)
DimArray.take()¶ Retrieve values from a DimArray
Parameters: - indices : int or list or slice (single-dimensional indices)
or a tuple of those (multi-dimensional) or dict of { axis name : axis values }
- axis : None or int or str, optional
if specified and indices is a slice, scalar or an array, assumes indexing is along this axis.
- indexing : {‘label’, ‘position’}, optional
Indexing mode. - “label”: indexing on axis labels (default) - “position”: use numpy-like position index Default value can be changed in dimarray.rcParams[‘indexing.by’]
- tol : None or float or tuple or dict, optional
tolerance when looking for numerical values, e.g. to use nearest neighbor search, default None.
- keepdims : bool, optional
keep singleton dimensions (default False)
- broadcast : bool, optional
if True, use numpy-like fancy indexing and broadcast any indexing array to a common shape, useful for example to sample points along a path. Default to False.
Returns: - indexed_array : DimArray instance or scalar
See also: DimArray.put, DimArrayOnDisk.read, DimArray.take_axis
Examples
>>> from dimarray import DimArray
>>> import numpy as np
>>> v = DimArray([[1,2,3],[4,5,6]], axes=[["a","b"], [10.,20.,30.]], dims=['d0','d1'], dtype=float)
>>> v
dimarray: 6 non-null elements (0 null)
0 / d0 (2): 'a' to 'b'
1 / d1 (3): 10.0 to 30.0
array([[1., 2., 3.],
[4., 5., 6.]])
Indexing via axis values (default)
>>> a = v[:,10]  # python slicing method
>>> a
dimarray: 2 non-null elements (0 null)
0 / d0 (2): 'a' to 'b'
array([1., 4.])
>>> b = v.take(10, axis=1)  # take, by axis position
>>> c = v.take(10, axis='d1')  # take, by axis name
>>> d = v.take({'d1':10})  # take, by dict {axis name : axis values}
>>> (a==b).all() and (a==c).all() and (a==d).all()
True
Indexing via integer index (indexing="position" or the ix property)
>>> np.all(v.ix[:,0] == v[:,10])
True
>>> np.all(v.take(0, axis="d1", indexing="position") == v.take(10, axis="d1"))
True
Multi-dimensional indexing
>>> v["a", 10]  # also works with a string axis
1.0
>>> v.take(('a',10))  # multi-dimensional, tuple
1.0
>>> v.take({'d0':'a', 'd1':10})  # dict-like arguments
1.0
Take a list of indices
>>> a = v[:,[10,20]]  # also works with a list of indices
>>> a
dimarray: 4 non-null elements (0 null)
0 / d0 (2): 'a' to 'b'
1 / d1 (2): 10.0 to 20.0
array([[1., 2.],
[4., 5.]])
>>> b = v.take([10,20], axis='d1')
>>> np.all(a == b)
True
Take a slice:
>>> c = v[:,10:20]  # axis values: the slice includes the last element
>>> c
dimarray: 4 non-null elements (0 null)
0 / d0 (2): 'a' to 'b'
1 / d1 (2): 10.0 to 20.0
array([[1., 2.],
[4., 5.]])
>>> d = v.take(slice(10,20), axis='d1')  # `take` accepts `slice` objects
>>> np.all(c == d)
True
>>> v.ix[:,0:1]  # integer position: does *not* include the last element
dimarray: 2 non-null elements (0 null)
0 / d0 (2): 'a' to 'b'
1 / d1 (1): 10.0 to 10.0
array([[1.],
[4.]])
Keep dimensions
>>> a = v[["a"]]
>>> b = v.take("a",keepdims=True)
>>> np.all(a == b)
True
tolerance parameter to achieve “nearest neighbour” search
>>> v.take(12, axis="d1", tol=5)
dimarray: 2 non-null elements (0 null)
0 / d0 (2): 'a' to 'b'
array([1., 4.])
Matlab-like multi-indexing
>>> v = DimArray(np.arange(2*3*4).reshape(2,3,4))
>>> v[[0,1],:,[0,0,0]].shape
(2, 3, 3)
>>> v[[0,1],:,[0,0]].shape  # here broadcast = False
(2, 3, 2)
>>> v.take(([0,1],slice(None),[0,0]), broadcast=True).shape  # traditional numpy behaviour, broadcasting the index arrays to a common shape
(2, 3)
>>> v.values[[0,1],:,[0,0]].shape  # the same on the underlying numpy array
(2, 3)
>>> a = DimArray(np.arange(2*3).reshape(2,3))
>>> a[a > 3]  # FULL ARRAY: return a numpy array in n-d case (at least for now)
dimarray: 2 non-null elements (0 null)
0 / x0,x1 (2): (1, 1) to (1, 2)
array([4, 5])
>>> a[a.x0 > 0]  # SINGLE AXIS: only first axis
dimarray: 3 non-null elements (0 null)
0 / x0 (1): 1 to 1
1 / x1 (3): 0 to 2
array([[3, 4, 5]])
>>> a[:, a.x1 > 0]  # only second axis
dimarray: 4 non-null elements (0 null)
0 / x0 (2): 0 to 1
1 / x1 (2): 1 to 2
array([[1, 2],
[4, 5]])
>>> a[a.x0 > 0, a.x1 > 0]
dimarray: 2 non-null elements (0 null)
0 / x0 (1): 1 to 1
1 / x1 (2): 1 to 2
array([[4, 5]])
Sample points along a path, a la numpy, with broadcast=True
>>> a.take(([0,0,1],[1,2,2]), broadcast=True)
dimarray: 3 non-null elements (0 null)
0 / x0,x1 (3): (0, 1) to (1, 2)
array([1, 2, 5])
Ellipsis (only one supported)
>>> a = DimArray(np.arange(2*3*4*5).reshape(2,3,4,5))
>>> a[0,...,0].shape
(3, 4)
>>> a[...,0,0].shape
(2, 3)
DimArray.put()¶ Modify values of a DimArray
Parameters: - indices : int or list or slice (single-dimensional indices)
or a tuple of those (multi-dimensional) or dict of { axis name : axis values }
- axis : None or int or str, optional
if specified and indices is a slice, scalar or an array, assumes indexing is along this axis.
- indexing : {‘label’, ‘position’}, optional
Indexing mode. - “label”: indexing on axis labels (default) - “position”: use numpy-like position index Default value can be changed in dimarray.rcParams[‘indexing.by’]
- tol : None or float or tuple or dict, optional
tolerance when looking for numerical values, e.g. to use nearest neighbor search, default None.
- broadcast : bool, optional
if True, use numpy-like fancy indexing and broadcast any indexing array to a common shape, useful for example to sample points along a path. Default to False.
Returns: - None (inplace=True) or DimArray instance or scalar (inplace=False)
See also: DimArray.take, DimArrayOnDisk.write
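No example is given here for put; as a hedged sketch, label- and position-based assignment mirror the indexing rules shown above for take (the ix assignment form is the same as the one used in the dropna example further down):
>>> from dimarray import DimArray
>>> v = DimArray([[1.,2.,3.],[4.,5.,6.]], axes=[['a','b'], [10.,20.,30.]], dims=['d0','d1'])
>>> v['a', 20.] = -1.   # label-based assignment     # doctest: +SKIP
>>> v.ix[1, 0] = -2.    # position-based assignment  # doctest: +SKIP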
Re-indexing¶
DimArray.reindex_axis(values, axis=0, fill_value=nan, raise_error=False, method=None)¶ reindex an array along an axis
Parameters: - values : array-like or Axis
new axis values
- axis : int or str, optional
axis number or name
- fill_value, optional
fill value to use for missing axis values, if raise_error is False (default: NaN)
- raise_error : bool, optional
if True, raise an error when an axis value is not present; otherwise just replace with fill_value. Default is False
- method : {None, ‘left’, ‘right’}
method to fill the gaps (default None) If ‘left’ or ‘right’, just pass along to numpy.searchsorted.
Returns: - dimarray: DimArray instance
Examples
Basic reindexing: fill missing values with NaN
>>> import dimarray as da
>>> a = da.DimArray([1,2,3],axes=[('x0', [1,2,3])])
>>> b = da.DimArray([3,4],axes=[('x0',[1,3])])
>>> b.reindex_axis([1,2,3])
dimarray: 2 non-null elements (1 null)
0 / x0 (3): 1 to 3
array([ 3., nan, 4.])
Or replace with anything else, like -9999
>>> b.reindex_axis([1,2,3], fill_value=-9999)
dimarray: 3 non-null elements (0 null)
0 / x0 (3): 1 to 3
array([ 3, -9999, 4])
DimArray.reindex_like(other, **kwargs)¶ reindex_like: re-index like another dimarray / axes instance
Applies reindex_axis on each axis to match another DimArray
Parameters: - other : DimArray or Axes instance
- **kwargs :
Returns: - DimArray
Notes
only reindex axes which are present in other
Examples
>>> import dimarray as da
>>> b = da.DimArray([3,4],('x0',[1,3]))
>>> c = da.DimArray([[1,2,3], [1,2,3]],[('x1',["a","b"]),('x0',[1, 2, 3])])
>>> b.reindex_like(c)
dimarray: 2 non-null elements (1 null)
0 / x0 (3): 1 to 3
array([ 3., nan, 4.])
DimArray.sort_axis(axis=0, key=None, kind='quicksort')¶ sort an axis
Parameters: - a : DimArray (this argument is pre-assigned when using as bound method)
- axis : int or str, optional
axis by position (int) or name (str) (default: 0)
- key : callable or dict-like, optional
function that is called on each axis label and whose return value is used for sorting instead of axis label. Any other object with __getitem__ attribute may also be used as key, such as a dictionary. If None (the default), axis label is used for sorting.
- kind : str, optional
sort algorithm (see numpy.sort for more info)
Returns: - sorted : new DimArray with sorted axis
Examples
Basic
>>> from dimarray import DimArray
>>> a = DimArray([10,20,30], labels=[2, 0, 1])
>>> a
dimarray: 3 non-null elements (0 null)
0 / x0 (3): 2 to 1
array([10, 20, 30])
>>> a.sort_axis()
dimarray: 3 non-null elements (0 null)
0 / x0 (3): 0 to 2
array([20, 30, 10])
>>> a.sort_axis(key=lambda x: -x)
dimarray: 3 non-null elements (0 null)
0 / x0 (3): 2 to 0
array([10, 30, 20])
Multi-dimensional
>>> a = DimArray([[10,20,30],[40,50,60]], labels=[[0, 1], ['a','c','b']])
>>> a.sort_axis(axis=1)
dimarray: 6 non-null elements (0 null)
0 / x0 (2): 0 to 1
1 / x1 (3): 'a' to 'c'
array([[10, 30, 20],
[40, 60, 50]])
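The key parameter also accepts any object with a __getitem__ method, as mentioned above; a hedged sketch with a dictionary as lookup table (labels and ordering invented for the example):
>>> a = DimArray([10, 20, 30], labels=['c', 'a', 'b'])
>>> a.sort_axis(key={'a': 1, 'b': 2, 'c': 3})   # sort labels via a lookup table  # doctest: +SKIP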
Missing values¶
DimArray.dropna(axis=0, minvalid=None, na=nan)¶ drop NaNs along an axis
Parameters: - axis : axis position or name or list of names
- minvalid : int, optional
minimum number of valid points in each slice along the axis; by default all points must be valid
Returns: - DimArray
Examples
1-Dimension
>>> from dimarray import DimArray
>>> import numpy as np
>>> a = DimArray([1.,2,3],('time',[1950, 1955, 1960]))
>>> a.ix[1] = np.nan
>>> a
dimarray: 2 non-null elements (1 null)
0 / time (3): 1950 to 1960
array([ 1., nan, 3.])
>>> a.dropna()
dimarray: 2 non-null elements (0 null)
0 / time (2): 1950 to 1960
array([1., 3.])
Multi-dimensional
>>> a = DimArray([[ np.nan, 2., 3.],[ np.nan, 5., np.nan]])
>>> a
dimarray: 3 non-null elements (3 null)
0 / x0 (2): 0 to 1
1 / x1 (3): 0 to 2
array([[nan, 2., 3.],
[nan, 5., nan]])
>>> a.dropna(axis=1)
dimarray: 2 non-null elements (0 null)
0 / x0 (2): 0 to 1
1 / x1 (1): 1 to 1
array([[2.],
[5.]])
>>> a.dropna(axis=1, minvalid=1)  # minimum number of valid values, equivalent to `how="all"` in pandas
dimarray: 3 non-null elements (1 null)
0 / x0 (2): 0 to 1
1 / x1 (2): 1 to 2
array([[ 2., 3.],
[ 5., nan]])
DimArray.fillna(value, inplace=False, na=nan)¶ Fill NaN with a replacement value
Examples
>>> from dimarray import DimArray
>>> import numpy as np
>>> a = DimArray([1,2,np.nan])
>>> a.fillna(-99)
dimarray: 3 non-null elements (0 null)
0 / x0 (3): 0 to 2
array([ 1., 2., -99.])
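The inplace flag from the signature can be used to modify the array in place rather than return a copy; a hedged sketch:
>>> a.fillna(-99, inplace=True)   # assumed to modify `a` directly instead of returning a new array  # doctest: +SKIP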
DimArray.setna(value, na=nan, inplace=False)¶ set a value as missing
Parameters: - value : the values to set to na
- na : the replacement value (default np.nan)
Examples
>>> from dimarray import DimArray
>>> a = DimArray([1,2,-99])
>>> a.setna(-99)
dimarray: 2 non-null elements (1 null)
0 / x0 (3): 0 to 2
array([ 1., 2., nan])
>>> a.setna([-99, 2])  # sequence
dimarray: 1 non-null elements (2 null)
0 / x0 (3): 0 to 2
array([ 1., nan, nan])
>>> a.setna(a > 1)  # boolean
dimarray: 2 non-null elements (1 null)
0 / x0 (3): 0 to 2
array([ 1., nan, -99.])
>>> a = DimArray([[1,2,-99]])  # multi-dim
>>> a.setna([-99, a>1])  # boolean
dimarray: 1 non-null elements (2 null)
0 / x0 (1): 0 to 0
1 / x1 (3): 0 to 2
array([[ 1., nan, nan]])
To / From other objects¶
classmethod DimArray.from_pandas(data, dims=None)[source]¶ Initialize a DimArray from pandas
Parameters: - data : pandas object (Series, DataFrame, Panel, Panel4D)
- dims, optional : dimension (axis) names, otherwise look at ax.name for ax in data.axes
Returns: - a : DimArray instance
Examples
>>> import pandas as pd
>>> s = pd.Series([3,5,6], index=['a','b','c'])
>>> s.index.name = 'dim0'
>>> DimArray.from_pandas(s)
dimarray: 3 non-null elements (0 null)
0 / dim0 (3): 'a' to 'c'
array([3, 5, 6])
Also works with a MultiIndex
>>> panel = pd.Panel(np.arange(2*3*4).reshape(2,3,4))
>>> b = panel.to_frame()  # pandas' method to convert a Panel to a DataFrame via MultiIndex
>>> DimArray.from_pandas(b)  # doctest: +SKIP
dimarray: 24 non-null elements (0 null)
0 / major,minor (12): (0, 0) to (2, 3)
1 / x1 (2): 0 to 1
...
I/O¶
DimArray.write_nc(f, name=None, mode='w', clobber=None, format=None, *args, **kwargs)[source]¶ Write to netCDF
Parameters: - f : file name
- name : variable name, optional
must be provided if no attribute “name” is defined
- mode, clobber, format : see netCDF4.Dataset
- **kwargs : passed to netCDF4.Dataset.createVariable (compression)
See also: DatasetOnDisk
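No example is given here for write_nc; a minimal hedged sketch using the parameters above (the file and variable names are invented for the example):
>>> from dimarray import DimArray, read_nc
>>> a = DimArray([1., 2., 3.], dims=['time'])
>>> a.write_nc('example.nc', name='myvar', mode='w')   # doctest: +SKIP
>>> read_nc('example.nc', 'myvar')                     # read the variable back  # doctest: +SKIP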
Plotting¶
DimArray.plot(*args, **kwargs)¶ Plot 1-D or 2-D data.
Wraps matplotlib’s plot()
Parameters: - *args, **kwargs : passed to matplotlib.pyplot.plot
- legend : True (default) or False
Display legend for 2-D data.
- ax : matplotlib.Axis, optional
Provide axis on which to show the plot.
Returns: - lines : list of matplotlib’s Lines2D instances
Examples
>>> from dimarray import DimArray
>>> import numpy as np
>>> data = DimArray(np.random.rand(4,3), axes=[np.arange(4), ['a','b','c']], dims=['distance', 'label'])
>>> data.axes[0].units = 'meters'
>>> h = data.plot(linewidth=2)
>>> h = data.T.plot(linestyle='-.')
>>> h = data.plot(linestyle='-.', legend=False)
DimArray.pcolor(*args, **kwargs)¶ Plot a quadrilateral mesh.
Wraps matplotlib pcolormesh(). See pcolormesh documentation in matplotlib for accepted keyword arguments.
Examples
>>> from dimarray import DimArray
>>> import numpy as np
>>> x = DimArray(np.zeros([100,40]))
>>> x.pcolor()  # doctest: +SKIP
>>> x.T.pcolor()  # to flip horizontal/vertical axes  # doctest: +SKIP
DimArray.contourf(*args, **kwargs)¶ Plot filled 2-D contours.
Wraps matplotlib contourf(). See contourf documentation in matplotlib for accepted keyword arguments.
Examples
>>> from dimarray import DimArray
>>> import numpy as np
>>> x = DimArray(np.zeros([100,40]))
>>> x[:50,:20] = 1.
>>> x.contourf()  # doctest: +SKIP
>>> x.T.contourf()  # to flip horizontal/vertical axes  # doctest: +SKIP
DimArray.contour(*args, **kwargs)¶ Plot 2-D contours.
Wraps matplotlib contour(). See contour documentation in matplotlib for accepted keyword arguments.
Examples
>>> from dimarray import DimArray
>>> import numpy as np
>>> x = DimArray(np.zeros([100,40]))
>>> x[:50,:20] = 1.
>>> x.contour()  # doctest: +SKIP
>>> x.T.contour()  # to flip horizontal/vertical axes  # doctest: +SKIP
Dataset API¶
Under construction…
Axis and Axes API¶
functions reference API¶
dimarray functions are listed below by topic, along with examples. DimArray Methods are provided in a separate page DimArray API.
Join¶
dimarray.stack(arrays, axis=None, keys=None, align=False, **kwargs)[source]¶ stack arrays along a new dimension (raises an error if the dimension already exists)
Parameters: - arrays : sequence or dict of arrays
- axis : str, optional
new dimension along which to stack the array
- keys : array-like, optional
stack axis values, useful if array is a sequence, or a non-ordered dictionary
- align : bool, optional
if True, align axes prior to stacking (Default to False)
- **kwargs : optional key-word arguments passed to align, if align is True
Returns: - DimArray : joint array
See also: concatenate (join arrays along an existing dimension), swapaxes (to modify the position of the newly inserted axis)
Examples
>>> from dimarray import DimArray, stack
>>> a = DimArray([1,2,3])
>>> b = DimArray([11,22,33])
>>> stack([a, b], axis='stackdim', keys=['a','b'])
dimarray: 6 non-null elements (0 null)
0 / stackdim (2): 'a' to 'b'
1 / x0 (3): 0 to 2
array([[ 1, 2, 3],
[11, 22, 33]])
dimarray.concatenate(arrays, axis=0, _no_check=False, align=False, **kwargs)[source]¶ concatenate several DimArrays
Parameters: - arrays : list of DimArrays
arrays to concatenate
- axis : int or str
axis along which to concatenate (must exist)
- align : bool, optional
align secondary axes before joining on the primary axis. Default is False.
- **kwargs : optional key-word arguments passed to align, if align is True
Returns: - concatenated DimArray
See also: stack (join arrays along a new dimension), align (align arrays)
Examples
1-D
>>> from dimarray import DimArray, concatenate
>>> a = DimArray([1,2,3], axes=[['a','b','c']])
>>> b = DimArray([4,5,6], axes=[['d','e','f']])
>>> concatenate((a, b))
dimarray: 6 non-null elements (0 null)
0 / x0 (6): 'a' to 'f'
array([1, 2, 3, 4, 5, 6])
2-D
>>> a = DimArray([[1,2,3],[11,22,33]])
>>> b = DimArray([[4,5,6],[44,55,66]])
>>> concatenate((a, b), axis=0)
dimarray: 12 non-null elements (0 null)
0 / x0 (4): 0 to 1
1 / x1 (3): 0 to 2
array([[ 1, 2, 3],
[11, 22, 33],
[ 4, 5, 6],
[44, 55, 66]])
>>> concatenate((a, b), axis='x1')
dimarray: 12 non-null elements (0 null)
0 / x0 (2): 0 to 1
1 / x1 (6): 0 to 2
array([[ 1, 2, 3, 4, 5, 6],
[11, 22, 33, 44, 55, 66]])
dimarray.stack_ds(datasets, axis, keys=None, align=False, **kwargs)[source]¶ stack datasets along a new dimension
Parameters: - datasets: sequence or dict of datasets
- axis: str, new dimension along which to stack the dataset
- keys, optional: stack axis values, useful if dataset is a sequence, or a non-ordered dictionary
- align, optional: if True, align axes (via reindexing) *prior* to stacking
- **kwargs : optional key-word arguments passed to align, if align is True
Returns: - stacked dataset
See also: concatenate_ds, stack, sort_axis
Examples
>>> from dimarray import DimArray, Dataset, stack_ds
>>> a = DimArray([1,2,3], dims=('dima',))
>>> b = DimArray([11,22], dims=('dimb',))
>>> ds = Dataset(a=a,b=b)  # dataset of 2 variables from an experiment
>>> ds2 = Dataset(a=a*2,b=b*2)  # dataset of 2 variables from a second experiment
>>> stack_ds([ds, ds2], axis='stackdim', keys=['exp1','exp2'])
Dataset of 2 variables
0 / stackdim (2): 'exp1' to 'exp2'
1 / dima (3): 0 to 2
2 / dimb (2): 0 to 1
a: ('stackdim', 'dima')
b: ('stackdim', 'dimb')
dimarray.concatenate_ds(datasets, axis=0, align=False, **kwargs)[source]¶ concatenate two datasets along an existing dimension
Parameters: - datasets: sequence of datasets
- axis: axis along which to concatenate
- align, optional: if True, align secondary axes (via reindexing) prior to concatenating
- **kwargs : optional key-word arguments passed to align, if align is True
Returns: - joint Dataset along axis
- NOTE: an error will be raised if a variable does not contain the required dimension
See also: stack_ds, concatenate, sort_axis
Examples
>>> import dimarray as da
>>> from dimarray import Dataset, concatenate_ds
>>> a = da.zeros(axes=[list('abc')], dims=('x0',))  # 1-D DimArray
>>> b = da.zeros(axes=[list('abc'), [1,2]], dims=('x0','x1'))  # 2-D DimArray
>>> ds = Dataset(a=a,b=b)  # dataset of 2 variables from an experiment
>>> a2 = da.ones(axes=[list('def')], dims=('x0',))
>>> b2 = da.ones(axes=[list('def'), [1,2]], dims=('x0','x1'))  # 2-D DimArray
>>> ds2 = Dataset(a=a2,b=b2)  # dataset of 2 variables from a second experiment
>>> concatenate_ds([ds, ds2])
Dataset of 2 variables
0 / x0 (6): 'a' to 'f'
1 / x1 (2): 1 to 2
a: ('x0',)
b: ('x0', 'x1')
Align¶
dimarray.align_axes(*args, **kwargs)¶ Deprecated. Now renamed to align
dimarray.align_dims(*arrays)[source]¶ Align dimensions of a list of arrays so that they are ready for broadcast.
Method: insert singleton axes at the right place and transpose where needed. Note: this is not part of the public API, but it is used in other dimarray modules.
Examples
>>> import dimarray as da
>>> import numpy as np
>>> from dimarray import align_dims
>>> x = da.DimArray(np.arange(2), dims=('x0',))
>>> y = da.DimArray(np.arange(3), dims=('x1',))
>>> align_dims(x, y)
[dimarray: 2 non-null elements (0 null)
0 / x0 (2): 0 to 1
1 / x1 (1): None to None
array([[0],
[1]]), dimarray: 3 non-null elements (0 null)
0 / x0 (1): None to None
1 / x1 (3): 0 to 2
array([[0, 1, 2]])]
dimarray.broadcast_arrays(*arrays)[source]¶ Analogous to numpy.broadcast_arrays, but with looser requirements on input shape; returns copies instead of views.
Parameters: - arrays : variable list of DimArrays
Returns: - list of DimArrays
Examples
Just as numpy’s broadcast_arrays
>>> import dimarray as da
>>> x = da.DimArray([[1,2,3]])
>>> y = da.DimArray([[1],[2],[3]])
>>> da.broadcast_arrays(x, y)
[dimarray: 9 non-null elements (0 null)
0 / x0 (3): 0 to 2
1 / x1 (3): 0 to 2
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]]), dimarray: 9 non-null elements (0 null)
0 / x0 (3): 0 to 2
1 / x1 (3): 0 to 2
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])]
Interpolate¶
dimarray.interp2d(dim_array, newaxes, dims=(-2, -1), **kwargs)[source]¶ Two-dimensional interpolation
Parameters: - dim_array : DimArray instance
- newaxes : sequence of two array-like, or dict.
axes on which to interpolate
- dims : sequence of two axis names or integer rank, optional
Indicate dimensions which match newaxes. By default (-2, -1) (last two dimensions).
- **kwargs : passed to scipy.interpolate.RegularGridInterpolator
method : ‘nearest’ or ‘linear’ (default)
bounds_error : True by default
fill_value : np.nan by default, but set to None to extrapolate outside bounds.
Returns: - dim_array_int : DimArray instance
interpolated array
Examples
>>> from dimarray import DimArray, interp2d
>>> import numpy as np
>>> x = np.array([0, 1, 2])
>>> y = np.array([0, 10])
>>> a = DimArray([[0,0,1],[1,0.,0.]], [('y',y),('x',x)])
>>> a
dimarray: 6 non-null elements (0 null)
0 / y (2): 0 to 10
1 / x (3): 0 to 2
array([[0., 0., 1.],
[1., 0., 0.]])
>>> newx = [0.5, 1.5]
>>> newy = np.linspace(0,10,5)
>>> ai = interp2d(a, [newy, newx])
>>> ai
dimarray: 10 non-null elements (0 null)
0 / y (5): 0.0 to 10.0
1 / x (2): 0.5 to 1.5
array([[0. , 0.5 ],
[0.125, 0.375],
[0.25 , 0.25 ],
[0.375, 0.125],
[0.5 , 0. ]])
Use the dims keyword argument if the order of the new axes does not match the array dimensions:
>>> (ai == interp2d(a, [newx, newy], dims=('x','y'))).all()
True
Out-of-bounds values are filled with NaN:
>>> newx = [-1, 1]
>>> newy = [-5, 0, 10]
>>> interp2d(a, [newy, newx], bounds_error=False)
dimarray: 2 non-null elements (4 null)
0 / y (3): -5 to 10
1 / x (2): -1 to 1
array([[nan, nan],
[nan, 0.],
[nan, 0.]])
Nearest neighbor interpolation and out-of-bounds extrapolation:
>>> interp2d(a, [newy, newx], method='nearest', bounds_error=False, fill_value=None)
dimarray: 6 non-null elements (0 null)
0 / y (3): -5 to 10
1 / x (2): -1 to 1
array([[0., 0.],
[0., 0.],
[1., 0.]])
Stats¶
dimarray.percentile(a, pct, axis=0, newaxis=None, out=None, overwrite_input=False)[source]¶ calculate percentile along an axis
Parameters: - pct: float, percentile or sequence of percentiles (0< <100)
- axis, optional, default 0: axis along which to compute percentiles
- newaxis, optional: name of the new percentile axis, if more than one pct.
By default, append “_percentile” to the axis name on which the transformation is applied.
- out, overwrite_input: passed to numpy’s percentile method (see documentation)
Returns: - pctiles: DimArray or scalar whose required axis has been reduced or replaced by percentiles
Examples
>>> from dimarray import DimArray, percentile
>>> import numpy as np
>>> np.random.seed(0)  # for reproducibility of the results
>>> a = DimArray(np.random.randn(1000), dims=['sample'])
>>> percentile(a, 50)
-0.058028034799627745
>>> percentile(a, [50, 95])
dimarray: 2 non-null elements (0 null)
0 / sample_percentile (2): 50 to 95
array([-0.05802803, 1.66012041])
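The newaxis parameter from the signature can be used to give the percentile axis a different name; a hedged sketch reusing the array above:
>>> percentile(a, [5, 50, 95], newaxis='pct')   # name the new percentile axis 'pct'  # doctest: +SKIP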
Read netCDF data¶
dimarray.read_nc(f, names=None, *args, **kwargs)[source]¶ Wrapper around DatasetOnDisk.read
Read one or several variables from one or several netCDF files
Parameters: - f : str or netCDF handle
netCDF file to read from or regular expression
- names : None or list or str, optional
variable name(s) to read default is None
- indices : int or list or slice (single-dimensional indices)
or a tuple of those (multi-dimensional) or dict of { axis name : axis indices }
Indices refer to Dataset axes. Any item that does not possess one of the dimensions will not be indexed along that dimension. For example, scalar items will be left unchanged whatever indices are provided.
- indexing : {‘label’, ‘position’}, optional
Indexing mode. - “label”: indexing on axis labels (default) - “position”: use numpy-like position index Default value can be changed in dimarray.rcParams[‘indexing.by’]
- tol : float, optional
tolerance when looking for numerical values, e.g. to use nearest neighbor search, default None.
- keepdims : bool, optional
keep singleton dimensions (default False)
- axis : str, optional
When reading multiple files, axis along which to join the dimarrays or datasets. If the axis already exists, the resulting arrays will be concatenated, otherwise they will be stacked along a new axis (in the sense of the numpy functions concatenate and stack)
- keys : sequence, optional
When reading multiple files, keys for the join axis. If the axis already exists in the dataset, the concatenated dataset/dimarray will be re-indexed along the provided key, otherwise the keys will be used to create a new axis for stacking. In the latter case, keys’ length needs to exactly match the number of input files, and if not provided, file names will be taken instead. Note you may manually rename the axes later, or use the set_axis method.
- align : bool, optional
When reading multiple files, passed to stack (new axis) or concatenate (existing axis) to reindex all arrays onto common axes. (in concatenate mode, the concatenation axis is not re-indexed of course, only the secondary axes) Default to False.
- **kwargs : optional key-word arguments passed to align, if align is True
When reading multiple files, these are passed on to align. This includes: sort (False by default) and join (‘outer’ by default)
Returns: - obj : DimArray or Dataset
depending on whether a (single) variable name is passed as argument (names) or not
See also: DatasetOnDisk.read, stack, concatenate, stack_ds, concatenate_ds, align, DimArray.write_nc, Dataset.write_nc
Examples
>>> import os
>>> import dimarray as da
>>> from dimarray import read_nc, get_datadir
Single netCDF file
>>> ncfile = os.path.join(get_datadir(), 'cmip5.CSIRO-Mk3-6-0.nc')
>>> data = read_nc(ncfile)  # load the full file
>>> data
Dataset of 2 variables
0 / time (451): 1850 to 2300
1 / scenario (5): u'historical' to u'rcp85'
tsl: (u'time', u'scenario')
temp: (u'time', u'scenario')
>>> data = read_nc(ncfile,'temp')  # only one variable
>>> data = read_nc(ncfile,'temp', indices={"time":slice(2000,2100), "scenario":"rcp45"})  # load only a chunk of the data
>>> data = read_nc(ncfile,'temp', indices={"time":1950.3}, tol=0.5)  # approximate matching, adjust tolerance
>>> data = read_nc(ncfile,'temp', indices={"time":-1}, indexing='position')  # integer position indexing
Multiple files Read variable ‘temp’ across multiple files (representing various climate models) In this case the variable is a time series, whose length may vary across experiments (thus align=True is passed to reindex axes before stacking)
>>> direc = get_datadir()
>>> temp = da.read_nc(direc+'/cmip5.*.nc', 'temp', align=True, axis='model')
A new ‘model’ axis is created labeled with file names. It is then possible to rename it more appropriately, e.g. keeping only the part directly relevant to identify the experiment:
>>> getmodel = lambda x: os.path.basename(x).split('.')[1]  # extract model name from path
>>> temp.set_axis(getmodel, axis='model')  # would return a copy if inplace is not specified
>>> temp
dimarray: 9114 non-null elements (6671 null)
0 / model (7): 'CSIRO-Mk3-6-0' to 'MPI-ESM-MR'
1 / time (451): 1850 to 2300
2 / scenario (5): u'historical' to u'rcp85'
array(...)
This works on datasets as well:
>>> ds = da.read_nc(direc+'/cmip5.*.nc', align=True, axis='model')
>>> ds.set_axis(getmodel, axis='model')
>>> ds
Dataset of 2 variables
0 / model (7): 'CSIRO-Mk3-6-0' to 'MPI-ESM-MR'
1 / time (451): 1850 to 2300
2 / scenario (5): u'historical' to u'rcp85'
tsl: ('model', u'time', u'scenario')
temp: ('model', u'time', u'scenario')
dimarray.summary_nc(fname, name=None, metadata=False)[source]¶ Print summary information about the content of a netCDF file. Deprecated, see dimarray.open_nc
Table numpy vs dimarray¶
Table of correspondence between numpy ndarray and dimarray’s functions and methods
array creation
numpy | dimarray | comments |
---|---|---|
array | DimArray | In dimarray you also need to provide axes information in addition to values. |
| DimArray.from_kw | Same as DimArray() but provide axes as key-words. |
| array | same as DimArray() |
| array_kw | same as DimArray.from_kw() |
zeros | zeros | These functions are similar to numpy except that they require axes parameter, or a shape= parameter for automatic labelling. |
ones | ones | |
empty | empty | |
zeros_like | zeros_like | |
ones_like | ones_like | |
empty_like | empty_like |
Note
The array and array_kw forms (to be used as da.array()) are attempts to make the array definition less verbose. They are experimental and may change in the future.
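A hedged sketch of the axes= and shape= alternatives mentioned in the table above (the shape= call is an assumption based on the comment in the table):
>>> import dimarray as da
>>> da.zeros(axes=[['a','b','c']], dims=('x0',))   # labelled axes             # doctest: +SKIP
>>> da.zeros(shape=(2, 3))                         # automatic axis labelling  # doctest: +SKIP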
reshaping
numpy | dimarray | comments |
---|---|---|
a.T | a.T | Transpose a 2-dimensional array |
a.transpose() | a.transpose() | Transpose or permute array. In dimarray also accept axis names |
a.swapaxes() | a.swapaxes() | Swap two axes. In dimarray also accept axis names. e.g. a.swapaxes(‘time’, 0) to bring the ‘time’ dimension as first axis to ease indexing. |
a.reshape() | a.reshape() | Change array shape without changing the size. There are a few differences in dimarray compared to numpy: - a dimension cannot be broken down (e.g. 4 => 2x2) - the full shape of the array is given via axis names e.g. a.reshape(‘time,percentile’,’scenario’) will flatten (group) the dimensions time and percentile to end up with a 2-D array, and transpose the array as necessary to get to the desired shape. If only transposing (permutation) is needed, the use of transpose is preferred for clarity. |
| a.group() | Flatten two axes into one: it is to reshape what swapaxes is to transpose. |
| a.ungroup() | Inflate two or more “grouped” axes (undo a.group()). |
a.flatten() | a.flatten() | Flatten array. In dimarray the axes are transformed into tuples (GroupedAxis). |
a[np.newaxis] | a.newaxis() | In numpy, add a singleton dimension, useful for broadcasting in an operation. In dimarray, broadcasting is based on dimension names and is therefore streamlined, without the need to provide this extra information, which makes this option less relevant in the public API. In dimarray this is a method since it requires the name of the new axis; by extension, if the new axis’ values are also provided, it can also combine the functionality of repeat. |
a.squeeze() | a.squeeze() | idem, but also accept axis names (opposite of newaxis) |
a.repeat() | a.repeat() | In dimarray it’s mostly an internal method that only works on singleton dimensions, and is of little practical use. Use newaxis instead. |
broadcast() | a.broadcast() | Dimarray’s method similar to numpy’s function. Add or remove singleton axes to make it match another array’s dimensions, but without repeating (so that the shapes do not necessarily match, but it is ready for binary operations in a numpy sense) In dimarray, the broadcast method can also transpose axes to match dimension ordering. |
broadcast_arrays() | broadcast_arrays() | Functions. Like the above, but also repeat the arrays if necessary to match the shape. |
Note
The names group and ungroup may be confusing and could change in the future (e.g. to flatten and inflate, or unflatten)
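A hedged sketch of the name-based reshaping calls described in the table above (the dimension names and axis values are invented for the example):
>>> import dimarray as da
>>> a = da.zeros(axes=[[10,20],[1,2],[100,200]], dims=('percentile','time','scenario'))
>>> a.swapaxes('time', 0)                        # bring 'time' to the front       # doctest: +SKIP
>>> a.reshape('time,percentile', 'scenario')     # flatten (group) two dimensions  # doctest: +SKIP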
The methods below are mostly similar across the two packages, but dimarray also accepts an axis name instead of an axis rank as axis=. An optional skipna= parameter can be provided to ignore NaNs (default is False). Note also that in many cases, when a tuple of axis names is provided, the array is first partially flattened (grouped axis) before the dimension is reduced.
reduce, accumulate (along-axis transformation)
numpy | dimarray | comments |
---|---|---|
a.max() | a.max() | |
a.min() | a.min() | |
a.ptp() | a.ptp() | |
a.median() | a.median() | |
a.all() | a.all() | |
a.any() | a.any() | |
a.prod() | a.prod() | |
a.sum() | a.sum() | |
a.mean() | a.mean() | |
a.std() | a.std() | |
a.var() | a.var() | |
a.argmax() | a.argmax() | in dimarray, returns axis value of max instead of integer position on the axis |
a.argmin() | a.argmin() | idem |
a.cumsum() | a.cumsum() | |
a.cumprod() | a.cumprod() | |
diff(a,…) | a.diff() | as method, and with scheme= parameter (“forward”, “centered”, “backward”) |
Maintenance of the documentation¶
The documentation is generated with Sphinx from reStructuredText files (‘.rst’).
Many sections, however, are also maintained as notebooks, using basic formatting.
The conversion from notebook to rst is done via a script, which has been included as a Makefile command (make rst).
The workflow is as follows:
1. cd docs
2. ... # edit notebooks in notebooks/
3. ... # edit rst files
4. make rst # convert every notebook in docs/notebooks to rst in docs/_notebooks_rst
5. make html # this could also be combined with the previous step as make rst html
6. ... # check the result in docs/_build/html/index.html
7. ... # iterate until you are happy with the result
8. git add / rm / ci # commit the change
9. git push # push to github
Pushing to github will update the doc at readthedocs automatically.
Note
Step 4 will only work on Unix systems because bash is involved in one of the scripts (this could actually easily be rewritten in python). There might also be other dependencies involved, possibly including the IPython version (this was last done with version 3.0.0).
Note
readthedocs will re-compile the rst files to html, so that steps 5-6 using your local sphinx installation are only for you to check the results before pushing.
Note
To compile locally with Sphinx, you need to install Sphinx itself, but also numpydoc (which parses numpy-style docstrings), e.g. pip install -r docs/readthedocs-pip-requirements.txt