The easiest way to install HyperSpy is to use the
HyperSpy Bundle, which is available on Windows, MacOS
and Linux.
Alternatively, HyperSpy can be installed in an existing Python distribution;
read the conda installation and
pip installation sections for instructions.
Note
To enable the context-menu (right-click) shortcut in a chosen folder, use
the start_jupyter_cm tool.
Conda is a package manager for Anaconda-like
distributions, such as the Miniforge
or the HyperSpy-bundle.
Since HyperSpy is packaged in the conda-forge channel,
it can easily be installed using conda.
To install HyperSpy run the following from the Anaconda Prompt on Windows or
from a Terminal on Linux and Mac.
$ conda install hyperspy -c conda-forge
This will also install the optional GUI packages hyperspy_gui_ipywidgets
and hyperspy_gui_traitsui. To install HyperSpy without the GUI packages, use:
$ conda install hyperspy-base -c conda-forge
Note
Depending on how Anaconda has been installed, it is possible that the
conda command is not available from the Terminal; read the
Anaconda User Guide for details.
Note
Using -c conda-forge is only necessary when the conda-forge channel
is not already added to the conda configuration; read the
conda-forge documentation
for more details.
Note
Depending on the packages installed in Anaconda, conda can be slow; in this
case mamba can be used as an alternative to conda, since the latter is
significantly faster. Read the
mamba documentation for instructions.
When installing packages, conda will verify that the requirements of all
packages installed in an environment are met. This can lead to situations where
the dependencies cannot be resolved, or where the solution includes installing
old or undesired versions of libraries. The requirements depend on which
libraries are already present in the environment, and satisfying their
respective dependencies may be problematic. In such a situation, possible
solutions are:
use Miniconda instead of Anaconda, if you are installing a python
distribution from scratch: Miniconda only installs very few packages so satisfying
all dependencies is simple.
install HyperSpy in a new environment.
The following example illustrates how to create a new environment named hspy_environment,
activate it and install HyperSpy in the new environment.
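A minimal sketch of these steps (the environment name hspy_environment is just an example):
$ conda create -n hspy_environment
$ conda activate hspy_environment
$ conda install hyperspy -c conda-forge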
A consequence of installing HyperSpy in a new environment is that you need
to activate this environment using conda activate environment_name, where
environment_name is the name of the environment. However, shortcuts can
be created using different approaches.
HyperSpy is listed in the Python Package Index. Therefore, it can be automatically downloaded
and installed with pip. You may need to
install pip for the following commands to run.
To install all of HyperSpy’s functionalities, run:
$ pip install hyperspy[all]
To install only the strictly required dependencies and limited functionalities,
use:
$ pip install hyperspy
See the following list of selectors to select the installation of optional
dependencies required by specific functionalities:
ipython for integration with the ipython terminal and parallel processing using ipyparallel,
gui-traitsui to use the GUI elements based on traitsui,
speed to install numba and numexpr to speed up some functionalities,
tests to install required libraries to run HyperSpy’s unit tests,
coverage to compute coverage statistics when running the tests,
doc to install required libraries to build HyperSpy’s documentation,
dev to install all the above,
all to install all the above except the development requirements
(tests, doc and dev).
For example:
$ pip install hyperspy[learning,gui-jupyter]
Finally, be aware that HyperSpy depends on a number of libraries that usually
need to be compiled and therefore installing HyperSpy may require development
tools installed in the system. If the above does not work for you remember that
the easiest way to install HyperSpy is
using the HyperSpy bundle.
Due to the requirement of up-to-date versions of dependencies such as numpy,
scipy, etc., binary packages of HyperSpy are not provided for most Linux
distributions, and installation via Anaconda/Miniconda
or pip is recommended.
However, packages of the latest HyperSpy release and the related
GUI packages are maintained for the rolling release distributions
Arch Linux (in the Arch User Repository (AUR)) and
openSUSE (Community Package)
as python-hyperspy, with python-hyperspy-gui-traitsui and
python-hyperspy-gui-ipywidgets for the GUI packages.
A more up-to-date package that contains all updates to be included
in the next minor version release (likely including new features compared to
the stable release) is also available in the AUR as python-hyperspy-git.
To get the development version from our git repository you need to install git. Then just do:
$ git clone https://github.com/hyperspy/hyperspy.git
Warning
When running HyperSpy from a development version, the dependency requirements
can change, in which case you will need to keep them up to date (check the
dependency requirements in setup.py) or run the installation in development
mode using pip again, as explained below.
Installation in an Anaconda/Miniconda distribution#
Optionally, create an environment to separate your hyperspy installation from
other anaconda environments (read more about environments here):
$ conda create -n hspy_dev python    # create an empty environment with latest python
$ conda activate hspy_dev            # activate environment
Install the runtime and development dependency requirements using conda:
The package hyperspy-dev will install the development dependencies required
for testing and building the documentation.
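A sketch of such a command, assuming the hyperspy-base and hyperspy-dev packages from the conda-forge channel:
$ conda install hyperspy-base hyperspy-dev -c conda-forge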
From the root folder of your hyperspy repository (folder containing the
setup.py file) run pip in development mode:
$ pip install -e . --no-deps    # install the currently checked-out branch of hyperspy
Installation in other (non-system) Python distribution#
From the root folder of your hyperspy repository (folder containing the
setup.py file) run pip in development mode:
$ pip install -e .[dev]
All required dependencies are automatically installed by pip. If you don’t want
to install all dependencies and only install some of the optional dependencies,
use the corresponding selector as explained in the Installation using pip section.
If you used the bundle installation you should be able to use the context menus
to get started. Right-click on the folder containing the data you wish to
analyse and select “Jupyter notebook here” or “Jupyter qtconsole here”. We
recommend the former, since notebooks have many advantages over conventional
consoles, as will be illustrated in later sections. The examples in some later
sections assume Notebook operation. A new tab should appear in your default
browser listing the files in the selected folder. To start a python notebook
choose “Python 3” in the “New” drop-down menu at the top right of the page.
Another new tab will open which is your Notebook.
You can start IPython by opening a system terminal and executing ipython,
(optionally followed by the “frontend”: “qtconsole” for example). However, in
most cases, the most agreeable way to work with HyperSpy interactively
is using the Jupyter Notebook (previously known as
the IPython Notebook), which can be started as follows:
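$ jupyter notebook
(This assumes the Jupyter Notebook is installed in your environment.)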
Typically you will need to set up IPython for interactive plotting with
matplotlib using
%matplotlib (which is known as a ‘Jupyter magic’)
before executing any plotting command. So, typically, after starting
IPython, you can import HyperSpy and set up interactive matplotlib plotting by
executing the following two lines in the IPython terminal (In these docs we
normally use the general Python prompt symbol >>> but you will probably
see In[1]: etc.):
>>> %matplotlib qt
>>> import hyperspy.api as hs
Note that to execute lines of code in the notebook you must press
Shift+Return. (For details about notebooks and their functionality try
the help menu in the notebook). Next, import two useful modules: numpy and
matplotlib.pyplot, as follows:
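>>> import numpy as np
>>> import matplotlib.pyplot as plt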
The rest of the documentation will assume you have done this. It also assumes
that you have installed at least one of HyperSpy’s GUI packages:
jupyter widgets GUI
and the
traitsui GUI.
HyperSpy supports different GUIs and
matplotlib backends
which in specific cases can lead to warnings when importing HyperSpy. Most of the time
there is nothing to worry about — the warnings simply inform you of several choices you have.
There may be several causes for a warning, for example:
not all the GUI packages are installed. If none is installed, we recommend you install
at least the hyperspy-gui-ipywidgets package if you are planning to perform interactive
data analysis in the Jupyter Notebook. Otherwise, you can simply disable the warning in
preferences as explained below.
the hyperspy-gui-traitsui package is installed and you are using an incompatible matplotlib
backend (e.g. notebook, nbagg or widget).
If you want to use the traitsui GUI, use the qt matplotlib backend instead.
Alternatively, if you prefer to use the notebook or widget matplotlib backend,
and if you don’t want to see the (harmless) warning, make sure that you have the
hyperspy-gui-ipywidgets installed and disable the traitsui
GUI in the preferences.
Changed in version v1.3: HyperSpy works with all matplotlib backends, including the notebook
(also called nbAgg) backend that enables interactive plotting embedded
in the jupyter notebook.
Note
When running in a headless system it is necessary to set the matplotlib
backend appropriately to avoid a cannot connect to X server error, for
example as follows:
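>>> import matplotlib
>>> matplotlib.use('Agg')
>>> import hyperspy.api as hs
(This is only a sketch; any non-interactive backend, such as Agg, can be selected before importing HyperSpy.)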
When using IPython, the documentation (docstring in Python jargon) can be
accessed by adding a question mark to the name of a function. e.g.:
In [1]: import hyperspy.api as hs
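For instance, appending a question mark to hs.load displays its docstring (a sketch of typical usage):
In [2]: hs.load?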
This syntax is a shortcut to the standard way of displaying the help
associated with a given function (docstring in Python jargon) and it is one of
the many features of IPython, which is the
interactive python shell that HyperSpy uses under the hood.
Another useful IPython feature is the
autocompletion
of commands and filenames using the tab and arrow keys. It is highly recommended
to read the IPython introduction for many more useful features that will
boost your efficiency when working with HyperSpy/Python interactively.
HyperSpy can operate on any numpy array by assigning it to a BaseSignal class.
This is useful e.g. for loading data stored in a format that is not yet
supported by HyperSpy—supposing that they can be read with another Python
library—or to explore numpy arrays generated by other Python
libraries. Simply select the most appropriate signal from the
signals module and create a new instance by passing a numpy array
to the constructor e.g.
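>>> s = hs.signals.Signal1D(np.random.random((10, 20, 100)))
>>> s
<Signal1D, title: , dimensions: (20, 10|100)>
(A sketch using random data; any array-like data of a suitable shape can be used.)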
In HyperSpy the data is interpreted as a signal array and, therefore, the data
axes are not equivalent. HyperSpy distinguishes between signal and
navigation axes and most functions operate on the signal axes and
iterate on the navigation axes. For example, an EELS spectrum image (i.e.
a 2D array of spectra) has three dimensions X, Y and energy-loss. In
HyperSpy, X and Y are the navigation dimensions and the energy-loss is the
signal dimension. To make this distinction more explicit the
representation of the object includes a separator | between the
navigation and signal dimensions e.g.
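For example, a 64 × 64 spectrum image with 1024 energy-loss channels would be displayed as (sizes are illustrative):
<Signal1D, title: , dimensions: (64, 64|1024)>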
In HyperSpy a spectrum image has signal dimension 1 and navigation dimension 2
and is stored in the Signal1D subclass.
Note that HyperSpy rearranges the axes when compared to the array order. The
following few paragraphs explain how and why it does it.
Depending how the array is arranged, some axes are faster to iterate than
others. Consider an example of a book as the dataset in question. It is
trivially simple to look at letters in a line, and then lines down the page,
and finally pages in the whole book. However if your words are written
vertically, it can be inconvenient to read top-down (the lines are still
horizontal, it’s just the meaning that’s vertical!). It’s very time-consuming
if every letter is on a different page, and for every word you have to turn 5-6
pages. Exactly the same idea applies here - in order to iterate through the
data (most often for plotting, but applies for any other operation too), you
want to keep it ordered for “fast access”.
In Python (more explicitly numpy) the “fast axes order” is C order (also
called row-major order). This means that the last axis of a numpy array is
fastest to iterate over (i.e. the lines in the book). An alternative ordering
convention is F order (column-major), where it is the reverse - the first axis
of an array is the fastest to iterate over. In both cases, the further an axis
is from the fast axis the slower it is to iterate over it. In the book
analogy you could, for example, think about reading the first lines of
all pages, then the second lines, and so on.
When data is acquired sequentially it is usually stored in acquisition order.
When a dataset is loaded, HyperSpy generally stores it in memory in the same
order, which is good for the computer. However, HyperSpy will reorder and
classify the axes to make it easier for humans. Let’s imagine a single numpy
array that contains pictures of a scene acquired with different exposure times
on different days. In numpy the array dimensions are (D,E,Y,X). This
order makes it fast to iterate over the images in the order in which they were
acquired. From a human point of view, this dataset is just a collection of
images, so HyperSpy first classifies the image axes (X and Y) as
signal axes and the remaining axes as the navigation axes. Then it reverses
the order of each set of axes because many humans are used to getting the X
axis first and, more generally, the axes in acquisition order from left to
right. So, the same axes in HyperSpy are displayed like this: (E, D | X, Y).
Extending this to arbitrary dimensions, by default, we reverse the numpy axes,
chop it into two chunks (signal and navigation), and then swap those chunks, at
least when printing. As an example:
(a1, a2, a3, a4, a5, a6)          # original (numpy)
(a6, a5, a4, a3, a2, a1)          # reverse
(a6, a5) (a4, a3, a2, a1)         # chop
(a4, a3, a2, a1) (a6, a5)         # swap (HyperSpy)
In the background, HyperSpy also takes care of storing the data in memory in
a “machine-friendly” way, so that iterating over the navigation axes is always
fast.
The data can be saved to several file formats. The format is specified by
the extension of the filename.
>>> # load the data
>>> d = hs.load("example.tif")
>>> # save the data as a tiff
>>> d.save("example_processed.tif")
>>> # save the data as a png
>>> d.save("example_processed.png")
>>> # save the data as an hspy file
>>> d.save("example_processed.hspy")
Some file formats are much better at maintaining the information about
how you processed your data. The preferred formats are
hspy and zspy,
because they are open formats and keep most information possible.
There are optional flags that may be passed to the save function. See
Saving for more details.
When loading a file HyperSpy stores all metadata in the BaseSignal
original_metadata attribute. In addition,
some of those metadata and any new metadata generated by HyperSpy are stored in
the metadata attribute.
Added in version 1.3: Possibility to enable/disable GUIs in the preferences.
It is also possible to set the preferences programmatically. For example,
to disable the traitsui GUI elements and save the changes to disk:
>>> hs.preferences.GUIs.enable_traitsui_gui = False
>>> hs.preferences.save()
>>> # if not saved, this setting will be used until the next jupyter kernel shutdown
Changed in version 1.3: The following items were removed from preferences:
General.default_export_format, General.lazy,
Model.default_fitter, Machine_learning.multiple_files,
Machine_learning.same_window, Plot.default_style_to_compare_spectra,
Plot.plot_on_load, Plot.pylab_inline, EELS.fine_structure_width,
EELS.fine_structure_active, EELS.fine_structure_smoothing,
EELS.synchronize_cl_with_ll, EELS.preedge_safe_window_width,
EELS.min_distance_between_edges_for_fine_structure.
HyperSpy writes messages to the Python logger. The
default log level is “WARNING”, meaning that only warnings and more severe
event messages will be displayed. The default can be set in the
preferences. Alternatively, it can be set
using set_log_level() e.g.:
>>> import hyperspy.api as hs
>>> hs.set_log_level('INFO')
>>> hs.load('my_file.dm3')
INFO:hyperspy.io_plugins.digital_micrograph:DM version: 3
INFO:hyperspy.io_plugins.digital_micrograph:size 4796607 B
INFO:hyperspy.io_plugins.digital_micrograph:Is file Little endian? True
INFO:hyperspy.io_plugins.digital_micrograph:Total tags in root group: 15
<Signal2D, title: My file, dimensions: (|1024, 1024)>
Changed in version 2.0: The IO plugins formerly developed within HyperSpy have been moved to
the separate package RosettaSciIO
in order to facilitate a wider use also by other packages. Plugins supporting
additional formats or corrections/enhancements to existing plugins should now
be contributed to the RosettaSciIO repository
and file format specific issues should be reported to the RosettaSciIO issue
tracker.
HyperSpy can read and write to multiple formats (see Supported formats).
To load data use the load() command. For example, to load the
image spam.jpg, you can type:
>>> s=hs.load("spam.jpg")
If loading was successful, the variable s contains a HyperSpy signal or any
type of signal defined in one of the HyperSpy extensions,
see Specifying signal type for more details.
Note
When the file contains several datasets, the load() function
will return a list of HyperSpy signals, instead of a single HyperSpy signal.
Each signal can then be accessed using list indexation.
The load function returns an object that contains data read from the file.
We assign this object to the variable s, but you can choose any (valid)
variable name you like. For the filename, don't forget to include the
quotation marks and the file extension.
If no argument is passed to the load function, a window will be raised that
allows you to select a single file through your OS file manager, e.g.:
>>> # This raises the load user interface
>>> s = hs.load()
It is also possible to load multiple files at once or even stack multiple
files. For more details read Loading multiple files.
HyperSpy will attempt to infer the appropriate file reader to use based on
the file extension (for example .hspy, .emd and so on). You can
override this using the reader keyword:
# Load a .hspy file with an unknown extension
>>> s = hs.load("filename.some_extension", reader="hspy")  # doctest: +SKIP
HyperSpy will attempt to infer the most suitable signal type for the data being
loaded. Domain specific signal types are provided by extension libraries. To list the signal types
available on your local installation use:
>>> hs.print_known_signal_types()
When loading data, the signal type can be specified by providing the signal_type
keyword, which has to correspond to one of the available subclasses of signal:
>>> s=hs.load("filename",signal_type="EELS")
If the loaded file contains several datasets, the load()
function will return a list of the corresponding signals:
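>>> s = hs.load("spameggsandham.hspy")
>>> s
[<Signal1D, title: spam, dimensions: (32, 32|1024)>,
 <Signal1D, title: eggs, dimensions: (32, 32|1024)>,
 <Signal1D, title: ham, dimensions: (32, 32|1024)>]
(The filename and the signals shown here are hypothetical.)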
Most scientific file formats store some extra information about the data and the
conditions under which it was acquired (metadata). HyperSpy reads most of them and
stores them in the original_metadata attribute.
Also, depending on the file format, a part of this information will be mapped by
HyperSpy to the metadata attribute, where it can
for example be used by routines operating on the signal. See the metadata structure for details.
Note
Extensive metadata can slow down loading and processing, and
loading the original_metadata can be disabled
using the load_original_metadata argument of the load()
function. If this argument is set to False, the
metadata will still be populated.
To print the content of the attributes simply use:
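>>> s.metadata
>>> s.original_metadata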
Almost all file readers support lazy loading, which means accessing the data
without loading it to memory (see Supported formats for a
list). This feature can be useful when analysing large files. To use this feature,
set lazy to True e.g.:
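>>> s = hs.load("filename.hspy", lazy=True)
(The filename is hypothetical.)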
The units of the navigation and signal axes can be converted automatically
during loading using the convert_units parameter. If True, the
convert_to_units method of the axes_manager will be used for the conversion
and if set to False, the units will not be converted (default).
Rather than loading files individually, several files can be loaded with a
single command. This can be done by passing a list of filenames to the load
function, e.g.:
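>>> s = hs.load(["file1.hspy", "file2.hspy"])
(Hypothetical filenames; the files are returned as separate signals unless stack=True is used, as described below.)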
Alternatively, regular expression type character classes can be used such as
[a-z] for lowercase letters or [0-9] for one digit integers:
>>> s = hs.load('file[0-9].hspy')
Note
Wildcards are implemented using glob.glob(), which treats *, [
and ] as special characters for pattern matching. If your filename or
path contains square brackets, you may want to set
escape_square_brackets=True:
>>> # Say there are two files like this:
>>> # /home/data/afile[1x1].hspy
>>> # /home/data/afile[1x2].hspy
>>> s = hs.load("/home/data/afile[*].hspy", escape_square_brackets=True)
>>> import hyperspy.api as hs
>>> from pathlib import Path
>>> # Use pathlib.Path
>>> p = Path("/path/to/a/file.hspy")
>>> s = hs.load(p)
>>> # Use pathlib.Path.glob
>>> p = Path("/path/to/some/files/").glob("*.hspy")
>>> s = hs.load(p)
By default HyperSpy will return a list of all the files loaded. Alternatively,
by setting stack=True, HyperSpy can be instructed to stack the data - given
that the files contain data with exactly the same
dimensions. If this is not the case, an error is raised. If each file contains
multiple (N) signals, N stacks will be created. Here, the number of signals
per file must also match, or an error will be raised.
To save data to a file use the save() method. The
first argument is the filename and the format is defined by the filename
extension. If the filename does not contain the extension, the default format
(HSpy-HDF5) is used. For example, if the s variable
contains the BaseSignal that you want to write to a file,
the following will write the data to a file called spectrum.hspy in the
default HSpy-HDF5 format:
>>> s.save('spectrum')
If you want to save to the ripple format instead, write:
>>> s.save('spectrum.rpl')
Some formats take extra arguments. See the corresponding pages at
Supported formats for more information.
This subsection can be a bit confusing for beginners.
Do not worry if you do not understand it all.
HyperSpy stores the data in the BaseSignal class, that is
the object that you get when e.g. you load a single file using
load(). Most of the data analysis functions are also contained in
this class or its specialized subclasses. The BaseSignal
class contains general functionality that is available to all the subclasses.
The subclasses provide functionality that is normally specific to a particular
type of data, e.g. the Signal1D class provides
common functionality to deal with one-dimensional (e.g. spectral) data and
exspy.signals.EELSSpectrum (which is a subclass of
Signal1D) adds extra functionality to the
Signal1D class for electron energy-loss
spectroscopy data analysis.
A signal stores other objects in what are called attributes. For
example, the data is stored in a numpy array in the
data attribute, the original parameters in the
original_metadata attribute, the mapped parameters
in the metadata attribute and the axes
information (including calibration) can be accessed (and modified) in the
AxesManager attribute.
>>> metadata_dict = {'General': {'name': 'A BaseSignal'}}
>>> metadata_dict['General']['title'] = 'A BaseSignal title'
>>> s = hs.signals.BaseSignal(np.arange(10), metadata=metadata_dict)
>>> s.metadata
├── General
│   ├── name = A BaseSignal
│   └── title = A BaseSignal title
└── Signal
    └── signal_type =
Instead of using a list of axes dictionaries [dict0, dict1] during signal
initialization, you can also pass a list of axes objects: [axis0, axis1].
HyperSpy can deal with data of arbitrary dimensions. Each dimension is
internally classified as either “navigation” or “signal” and the way this
classification is done determines the behaviour of the signal.
The concept is probably best understood with an example: let’s imagine a three
dimensional dataset e.g. a numpy array with dimensions (10, 20, 30). This
dataset could be an spectrum image acquired by scanning over a sample in two
dimensions. As in this case the signal is one-dimensional we use a
Signal1D subclass for this data e.g.:
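>>> s = hs.signals.Signal1D(np.random.random((10, 20, 30)))
>>> s
<Signal1D, title: , dimensions: (20, 10|30)>
(A sketch using random data of the shape discussed here.)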
In HyperSpy’s terminology, the signal dimension of this dataset is 30 and
the navigation dimensions (20, 10). Notice the separator | between the
navigation and signal dimensions.
However, the same dataset could also be interpreted as an image
stack instead. Actually it could have been acquired by capturing two
dimensional images at different wavelengths. Then it would be natural to
identify the two spatial dimensions as the signal dimensions and the wavelength
dimension as the navigation dimension. To view the data in this way we could
have used a Signal2D instead e.g.:
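>>> im = hs.signals.Signal2D(np.random.random((10, 20, 30)))
>>> im
<Signal2D, title: , dimensions: (10|30, 20)>
(Again a sketch with random data.)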
Indeed, for data analysis purposes,
one may like to operate with an image stack as if it was a set of spectra or
vice versa. One can easily switch between these two alternative ways of
classifying the dimensions of a three-dimensional dataset by
transforming between BaseSignal subclasses.
The same dataset could be seen as a three-dimensional signal:
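>>> td = hs.signals.BaseSignal(np.random.random((10, 20, 30)))
>>> td
<BaseSignal, title: , dimensions: (|30, 20, 10)>
(A sketch continuing the same example.)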
Notice that we use BaseSignal because there is
no specialised subclass for three-dimensional data. Also note that by default
BaseSignal interprets all dimensions as signal dimensions.
We could also configure it to operate on the dataset as a three-dimensional
array of scalars by changing the default view of
BaseSignal by taking the transpose of it:
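>>> scalar = td.T
>>> scalar
<BaseSignal, title: , dimensions: (30, 20, 10|)>
(A sketch; .T is the shortcut for the default transpose.)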
Although each dimension can be arbitrarily classified as “navigation
dimension” or “signal dimension”, for most common tasks there is no need to
modify HyperSpy’s default choice.
The signals module, which contains all available signal subclasses,
is imported in the user namespace when loading HyperSpy. In the following
example we create a Signal2D instance from a 2D numpy array:
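>>> im = hs.signals.Signal2D(np.random.random((64, 64)))
>>> im
<Signal2D, title: , dimensions: (|64, 64)>
(A sketch with random data; the image size is arbitrary.)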
The table below summarises all the
BaseSignal subclasses currently distributed
with HyperSpy. From HyperSpy 2.0, all domain specific signal
subclasses, characterized by the signal_type metadata attribute, are
provided by dedicated extension packages.
The generic subclasses provided by HyperSpy are characterized by the data
dtype and the signal dimension. In particular, there are specialised signal
subclasses to handle complex data. See the table and diagram below. Where
appropriate, functionalities are restricted to certain
BaseSignal subclasses.
Diagram showing the inheritance structure of the different subclasses. The
upper part contains the generic classes shipped with HyperSpy. The lower
part contains examples of domain specific subclasses provided by some of the
HyperSpy extensions.#
Changed in version 1.0: The subclasses Simulation, SpectrumSimulation and ImageSimulation
were removed.
Added in version 1.5: External packages can register extra BaseSignal
subclasses.
Changed in version 2.0: The subclasses EELS, EDS_SEM, EDS_TEM and
DielectricFunction have been moved to the extension package
EleXSpy and the subclass hologram has been
moved to the extension package HoloSpy.
Domain specific functionalities for specific types of data are provided through
a number of dedicated python packages that qualify as HyperSpy extensions. These
packages provide subclasses of the generic signal classes listed above, depending
on the dimensionality and type of the data. Some examples are included in the
diagram above.
If an extension package is installed on your system, the provided signal
subclasses are registered with HyperSpy and these classes are directly
available when loading the hyperspy.api into the namespace. A list of packages
that extend HyperSpy
is curated in a dedicated repository.
The metadata attribute signal_type describes the nature of the signal. It can
be any string, normally the acronym associated with a particular signal. To print
all BaseSignal subclasses available in your system call
the function print_known_signal_types() as in the following
example:
When loading data, the signal_type will be
set automatically by the file reader, as defined in rosettasciio. If the
extension providing the corresponding signal subclass is installed,
load() will return the subclass from the hyperspy extension,
otherwise a warning will be raised to explain that
no registered signal class can be assigned to the given signal_type.
Since load() can return domain-specific signal objects (e.g.
EDSSEMSpectrum from EleXSpy) provided by extensions, the corresponding
functionalities (so-called methods of an object in object-oriented programming,
e.g. EDSSEMSpectrum.get_lines_intensity()) implemented in the signal classes of
the extension can be accessed directly. To use additional functionalities
implemented in extensions, but not as methods of the signal class, the extensions
need to be imported explicitly (e.g. import elexspy). Check the user guides
of the respective HyperSpy extensions for details on the
provided methods and functions.
A ragged array (also called jagged array) is an array created with
sequences-of-sequences, where the nested sequences don’t have the same length.
For example, a numpy ragged array can be created as follows:
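>>> arr = np.array([[1, 2, 3], [1]], dtype=object)
>>> arr
array([list([1, 2, 3]), list([1])], dtype=object)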
A numpy ragged array must have the python object dtype to allow the variable length of
the nested sequences - here [1, 2, 3] and [1]. As explained in
NEP-34,
dtype=object needs to be specified when creating the array to avoid ambiguity
about the shape of the array.
HyperSpy supports the use of ragged array with the following conditions:
The signal must be explicitly defined as being ragged, either when creating
the signal or by changing the ragged attribute of the signal
The signal dimension is the variable length dimension of the array
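For contrast, consider nested sequences of equal length (a sketch):
>>> arr = np.array([[1, 2], [3, 4]])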
Unlike in the previous example, here the array is not ragged, because
the lengths of the nested sequences are equal (2) and numpy will create
an array of shape (2, 2) instead of (2,) as in the previous example of a
ragged array:
>>> arr.shape
(2, 2)
In addition to the use of the keyword ragged when creating a hyperspy
signal, the ragged attribute can also
be set to specify whether the signal contains a ragged array or not.
In the following example, a hyperspy signal is created without specifying that
the array is ragged. In this case, the signal dimension is 2, which can be
misleading, because each item contains a list of numbers. To provide an unambiguous
representation of the fact that the signal contains a ragged array, the
ragged attribute can be set to True.
By doing so, the signal space will be described as “ragged” and the navigation shape
will become the same as the shape of the ragged array:
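A sketch of what this might look like (the exact representation may differ slightly between versions):
>>> s = hs.signals.BaseSignal(np.array([[1, 2, 3], [1]], dtype=object))
>>> s
<BaseSignal, title: , dimensions: (|2)>
>>> s.ragged = True
>>> s
<BaseSignal, title: , dimensions: (2|ragged)>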
Signals that are a histogram of a probability density function (pdf) should
have the is_binned attribute of the signal axis set to True. The reason
is that some methods operate differently on signals that are binned. An
example of binned signals are EDS spectra, where the multichannel analyzer
integrates the signal counts in every channel (=bin).
Note that for 2D signals each signal axis has an is_binned
attribute that can be set independently. For example, for the first signal
axis: signal.axes_manager.signal_axes[0].is_binned.
The default value of the is_binned attribute is shown in the
following table:
Binned default values for the different subclasses.#
For binned axes, the detector already provides the per-channel integration of
the signal. Therefore, in this case, integrate1D()
performs a simple summation along the given axis. In contrast, for unbinned
axes, integrate1D() calls the
integrate_simpson() method.
Indexing a BaseSignal provides a powerful, convenient and
Pythonic way to access and modify its data. In HyperSpy indexing is achieved
using isig and inav,
which allow the navigation and signal dimensions to be indexed independently.
The idea is essentially to specify a subset of the data based on its position
in the array and it is therefore essential to know the convention adopted for
specifying that position, which is described here.
Those new to Python may find indexing a somewhat esoteric concept but once
mastered it is one of the most powerful features of Python based code and
greatly simplifies many common tasks. HyperSpy’s Signal indexing is similar
to numpy array indexing and those new to Python are encouraged to read the
associated numpy documentation on the subject.
Key features of indexing in HyperSpy are as follows (note that some of these
features differ from numpy):
HyperSpy indexing does:
Allow independent indexing of signal and navigation dimensions
Support indexing with decimal numbers.
Support indexing with units.
Support indexing with relative coordinates i.e. ‘rel0.5’
Use the image order for indexing i.e. [x, y, z,…] (HyperSpy) vs
[…, z, y, x] (numpy)
HyperSpy indexing does not:
Support indexing using arrays.
Allow the addition of new axes using the newaxis object.
The examples below illustrate a range of common indexing tasks.
First consider indexing a single spectrum, which has only one signal dimension
(and no navigation dimensions) so we use isig:
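>>> s = hs.signals.Signal1D(np.arange(10))
>>> s
<Signal1D, title: , dimensions: (|10)>
>>> s.isig[0:5]
<Signal1D, title: , dimensions: (|5)>
>>> s.isig[0:5].data
array([0, 1, 2, 3, 4])
(A sketch with a ten-point spectrum.)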
Unlike numpy, HyperSpy supports indexing using decimal numbers or strings
(containing a decimal number and units), in which case
HyperSpy indexes using the axis scales instead of the indices. Additionally,
one can index using relative coordinates, for example 'rel0.5' to index the
middle of the axis.
Importantly the original BaseSignal and its “indexed self”
share their data and, therefore, modifying the value of the data in one
modifies the same value in the other. Note also that in the example below
s.data is used to access the data as a numpy array directly and this array is
then indexed using numpy indexing.
Below we briefly introduce some of the most commonly used tools (methods). For
more details about a particular method click on its name. For a detailed list
of all the methods available see the BaseSignal documentation.
The methods of this section are available to all the signals. In other chapters
methods that are only available in specialized subclasses are listed.
Note that by default all these methods perform the operation over all
navigation axes.
Example:
>>> s = hs.signals.BaseSignal(np.random.random((2, 4, 6)))
>>> s.axes_manager[0].name = 'E'
>>> s
<BaseSignal, title: , dimensions: (|6, 4, 2)>
>>> # by default perform operation over all navigation axes
>>> s.sum()
<BaseSignal, title: , dimensions: (|6, 4, 2)>
>>> # can also pass axes individually
>>> s.sum('E')
<Signal2D, title: , dimensions: (|4, 2)>
>>> # or a tuple of axes to operate on, with duplicates, by index or directly
>>> ans = s.sum((-1, s.axes_manager[1], 'E', 0))
>>> ans
<BaseSignal, title: , dimensions: (|1)>
>>> ans.axes_manager[0]
<Scalar axis, size: 1>
The following methods operate only on one axis at a time:
Notice that the title is automatically updated. When the signal has no title
a new title is automatically generated:
>>> np.add(hs.signals.Signal1D([0, 1]), hs.signals.Signal1D([0, 1]))
<Signal1D, title: add(Untitled Signal 1, Untitled Signal 2), dimensions: (|2)>
Functions (other than ufuncs) that operate on numpy arrays can also operate
on BaseSignal instances, however they return a numpy
array instead of a BaseSignal instance e.g.:
>>> np.angle(s)
array([0., 0.])
Note
For numerical differentiation and integration, use the proper
methods derivative() and
integrate1D(). In certain cases, particularly
when operating on a non-uniform axis, the approximations using the
diff() and sum()
methods will lead to erroneous results.
These operations are performed element-wise. When the dimensions of the signals
are not equal numpy broadcasting rules apply independently
for the navigation and signal axes.
Warning
HyperSpy does not check if the calibration of the signals matches.
In the following example s2 has only one navigation axis while s has two.
However, because the size of their first navigation axis is the same, their
dimensions are compatible and s2 is
broadcasted to match s’s dimensions.
In-place operators also support broadcasting, but only when broadcasting would
not change the left most signal dimensions:
>>> s += s2
>>> s
<Signal2D, title: , dimensions: (2, 3|4, 5)>
>>> s2 += s
Traceback (most recent call last):
  File "<ipython-input-64-fdb9d3a69771>", line 1, in <module>
    s2 += s
  File "<string>", line 2, in __iadd__
  File "/home/fjd29/Python/hyperspy/hyperspy/signal.py", line 2737, in _binary_operator_ruler
    self.data = getattr(sdata, op_name)(odata)
ValueError: non-broadcastable output operand with shape (3,2,1,4) doesn't match the broadcast shape (3,2,5,4)
BaseSignal instances are iterables over the navigation axes. For example, the
following code creates a stack of 10 images and saves them in separate “png”
files by iterating over the signal instance:
>>> image_stack = hs.signals.Signal2D(np.random.randint(10, size=(2, 5, 64, 64)))
>>> for single_image in image_stack:
...     single_image.save("image %s.png" % str(image_stack.axes_manager.indices))
The "image (0, 0).png" file was created.
The "image (1, 0).png" file was created.
The "image (2, 0).png" file was created.
The "image (3, 0).png" file was created.
The "image (4, 0).png" file was created.
The "image (0, 1).png" file was created.
The "image (1, 1).png" file was created.
The "image (2, 1).png" file was created.
The "image (3, 1).png" file was created.
The "image (4, 1).png" file was created.
The data of the signal instance that is returned at each iteration is a view of
the original data, a property that we can use to perform operations on the
data. For example, the following code rotates the image at each coordinate by
a given angle and uses the stack() function in combination
with list comprehensions
to make a horizontal “collage” of the image stack:
>>> import scipy.ndimage
>>> image_stack = hs.signals.Signal2D(np.array([scipy.datasets.ascent()] * 5))
>>> image_stack.axes_manager[1].name = "x"
>>> image_stack.axes_manager[2].name = "y"
>>> for image, angle in zip(image_stack, (0, 45, 90, 135, 180)):
...     image.data[:] = scipy.ndimage.rotate(image.data, angle=angle,
...                                          reshape=False)
>>> # clip data to integer range:
>>> image_stack.data = np.clip(image_stack.data, 0, 255)
>>> collage = hs.stack([image for image in image_stack], axis=0)
>>> collage.plot(scalebar=False)
Performing an operation on the data at each coordinate, as in the previous example,
using an external function can be more easily accomplished using the
map() method:
>>> import scipy.ndimage
>>> image_stack = hs.signals.Signal2D(np.array([scipy.datasets.ascent()] * 4))
>>> image_stack.axes_manager[1].name = "x"
>>> image_stack.axes_manager[2].name = "y"
>>> image_stack.map(scipy.ndimage.rotate, angle=45, reshape=False)
>>> # clip data to integer range
>>> image_stack.data = np.clip(image_stack.data, 0, 255)
>>> collage = hs.stack([image for image in image_stack], axis=0)
>>> collage.plot()
Rotation of images by the same amount using map().#
The map() method can also take variable
arguments as in the following example.
Rotation of images using map() with different
arguments for each image in the stack.#
Added in version 1.2.0: inplace keyword and non-preserved output shapes
If all function calls do not return identically-shaped results, only navigation
information is preserved, and the final result is an array where
each element corresponds to the result of the function (or arbitrary object
type). These are ragged arrays and have the dtype object.
As such, most HyperSpy functions cannot operate on such signals, and the
data should be accessed directly.
The inplace keyword (by default True) of the
map() method allows either overwriting the current
data (default, True) or storing it to a new signal (False).
Added in version 1.4: Iterating over signal using a parameter with no navigation dimension.
In this case, the parameter is cyclically iterated over the navigation
dimension of the input signal. In the example below, signal s is
multiplied by a cosine parameter d, which is repeated over the
navigation dimension of s.
Especially when working with very large datasets, it can be useful to
not do the computation immediately. For example if it would make you run
out of memory. In that case, the lazy_output parameter can be used.
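A sketch of how such an s_out could be produced, using scipy.ndimage.gaussian_filter as an arbitrary example function:
>>> import scipy.ndimage
>>> s = hs.signals.Signal2D(np.random.random((10, 64, 64)))
>>> s_out = s.map(scipy.ndimage.gaussian_filter, sigma=5, inplace=False, lazy_output=True)
>>> s_out
<LazySignal2D, title: , dimensions: (10|64, 64)>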
s_out can then be saved to a hard drive, to avoid it being loaded into memory.
Alternatively, it can be computed and loaded into memory using s_out.compute().
>>> s_out.save("gaussian_filter_file.hspy")
Another advantage of using lazy_output=True is the ability to “chain” operations,
by running map() on the output from a previous
map() operation.
For example, first running a Gaussian filter, followed by peak finding. This can
improve the computation time, and reduce the memory need.
Cropping can be performed in a very compact and powerful way using
Indexing. In addition, it can be performed using the following
method or GUIs if cropping Signal1D or Signal2D. There is also a general crop()
method that operates in place.
Added in version 1.3: rebin() generalized to remove the constraint
of the new_shape needing to be a divisor of data.shape.
The rebin() method supports rebinning the data to
arbitrary new shapes as long as the number of dimensions stays the same.
However, internally, it uses two different algorithms to perform the task. Only
when the new shape dimensions are divisors of the old shape's does the operation
support lazy evaluation, and it is usually faster in that case.
Otherwise, the operation requires linear interpolation.
For example, the following two equivalent rebinning operations can be performed
lazily:
>>> s = hs.data.two_gaussians().as_lazy()
>>> print(s)
<LazySignal1D, title: Two Gaussians, dimensions: (32, 32|1024)>
>>> print(s.rebin(scale=[1, 1, 2]))
<LazySignal1D, title: Two Gaussians, dimensions: (32, 32|512)>
>>> s = hs.data.two_gaussians().as_lazy()
>>> print(s.rebin(new_shape=[32, 32, 512]))
<LazySignal1D, title: Two Gaussians, dimensions: (32, 32|512)>
On the other hand, the following rebinning operation requires interpolation and
cannot be performed lazily:
>>> s = hs.signals.Signal1D(np.ones([4, 4, 10]))
>>> s.data[1, 2, 9] = 5
>>> print(s)
<Signal1D, title: , dimensions: (4, 4|10)>
>>> print('Sum = ', s.data.sum())
Sum =  164.0
>>> scale = [0.5, 0.5, 5]
>>> test = s.rebin(scale=scale)
>>> test2 = s.rebin(new_shape=(8, 8, 2))  # Equivalent to the above
>>> print(test)
<Signal1D, title: , dimensions: (8, 8|2)>
>>> print(test2)
<Signal1D, title: , dimensions: (8, 8|2)>
>>> print('Sum =', test.data.sum())
Sum = 164.0
>>> print('Sum =', test2.data.sum())
Sum = 164.0
>>> s.as_lazy().rebin(scale=scale)
Traceback (most recent call last):
  File "<ipython-input-26-49bca19ebf34>", line 1, in <module>
    spectrum.as_lazy().rebin(scale=scale)
  File "/home/fjd29/Python/hyperspy3/hyperspy/_signals/eds.py", line 184, in rebin
    m = super().rebin(new_shape=new_shape, scale=scale, crop=crop, out=out)
  File "/home/fjd29/Python/hyperspy3/hyperspy/_signals/lazy.py", line 246, in rebin
    "Lazy rebin requires scale to be integer and divisor of the "
NotImplementedError: Lazy rebin requires scale to be integer and divisor of the original signal shape
The dtype argument can be used to specify the dtype of the returned
signal:
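>>> s = hs.signals.Signal1D(np.ones((4, 4, 10), dtype=np.uint8))
>>> s.rebin(scale=(2, 2, 1), dtype=np.uint16)
<Signal1D, title: , dimensions: (2, 2|10)>
(A sketch; the shapes and dtypes are only illustrative.)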
By default (dtype=None), the dtype is determined by the behaviour of
numpy.sum; in this case, an unsigned integer of the same precision as the
platform integer.
The interpolate_on_axis() method makes it possible to
exchange any existing axis of a signal with a new axis,
regardless of the signal's dimension or the axes types.
This is achieved by interpolating the data using scipy.interpolate.make_interp_spline()
from the old axis to the new axis. Replacing multiple axes can be done iteratively.
The squeeze() method removes any zero-dimensional
axes, i.e. axes of size 1, and the associated data dimensions from a signal.
The method returns a reduced copy of the signal and does not operate in place.
When dealing with multidimensional datasets it is sometimes useful to transform
the data into a two dimensional dataset. This can be accomplished using the
following two methods:
When stacking signals with a large amount of
original_metadata, these metadata will be
stacked, and this can lead to a very large amount of metadata which can in
turn slow down processing. The stack_original_metadata argument can be
used to disable stacking original_metadata.
An object can be split into several objects
with the split() method. This function can be used
to reverse the stack() function:
The fast Fourier transform
of a signal can be computed using the fft() method. By default,
the FFT is calculated with the origin at (0, 0), which will be displayed at the
bottom left and not in the centre of the FFT. Conveniently, the shift argument of
the fft() method can be used to center the output of the FFT.
In the following example, the FFT of a hologram is computed using shift=True and its
output signal is displayed, which shows that the FFT results in a complex signal with
real and imaginary parts:
The strong features in the real and imaginary parts correspond to the lattice fringes of the
hologram.
For visual inspection of the FFT it is convenient to display its power spectrum
(i.e. the square of the absolute value of the FFT) rather than FFT itself as it is done
in the example above by using the power_spectrum argument, where
power_spectrum is set to True since it is the first argument of the
plot() method for complex signals.
When power_spectrum=True, the plot will be displayed on a log scale by default.
The visualisation can be further improved by setting the minimum value to display to the 30th
percentile; this can be done by using vmin="30th" in the plot function.
The streaks visible in the FFT come from the edge of the image and can be removed by
applying an apodization function to the original
signal before the computation of the FFT. This can be done using the apodization argument of
the fft() method and it is usually used for visualising FFT patterns
rather than for quantitative analyses. By default, the so-called Hann window is
used, but different types of windows, such as the Hamming and Tukey windows, are also available.
The inverse fast Fourier transform can be calculated from a complex signal by using the
ifft() method. Similarly to the fft() method,
the shift argument can be provided to shift the origin of the iFFT when necessary:
Even if the original data is recorded with a limited dynamic range, it is often
desirable to perform the analysis operations with a higher precision.
Conversely, if space is limited, storing in a shorter data type can decrease
the file size. The change_dtype() changes the data
type in place, e.g.:
>>> s = hs.load('EELS Signal1D Signal2D (high-loss).dm3')
Title: EELS Signal1D Signal2D (high-loss).dm3
Signal type: EELS
Data dimensions: (21, 42, 2048)
Data representation: spectrum
Data type: float32
>>> s.change_dtype('float64')
>>> print(s)
Title: EELS Signal1D Signal2D (high-loss).dm3
Signal type: EELS
Data dimensions: (21, 42, 2048)
Data representation: spectrum
Data type: float64
In addition to all standard numpy dtypes, HyperSpy supports four extra dtypes
for RGB images for visualization purposes only: rgb8, rgba8,
rgb16 and rgba16. This includes of course multi-dimensional RGB images.
The requirements for changing from and to any rgbx dtype are stricter
than for most other dtype conversions. To change to an rgbx dtype, the
signal_dimension must be 1 and its size 3 (4) for rgb (rgba) dtypes,
and the dtype must be uint8 (uint16) for
rgbx8 (rgbx16). After conversion the signal_dimension becomes 2.
Most operations on signals with RGB dtypes will fail. For processing, simply
change their dtype to uint8 (uint16). The dtype of images of
dtype rgbx8 (rgbx16) can only be changed to uint8 (uint16), and
the signal_dimension becomes 1.
In the following example we create a 1D signal with signal size 3 and with
dtype uint16 and change its dtype to rgb16 for plotting.
The transpose() method changes how the dataset
dimensions are interpreted (as signal or navigation axes). By default it
swaps the signal and navigation axes. For example:
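>>> s = hs.signals.Signal1D(np.zeros((4, 5, 6)))
>>> s
<Signal1D, title: , dimensions: (5, 4|6)>
>>> s.transpose()
<Signal2D, title: , dimensions: (6|5, 4)>
(A sketch; note that the returned signal is assigned the subclass matching its new signal dimension, Signal2D here.)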
The method accepts either explicit axes to keep in either space, or just a number
of axes required in one space (only one number can be specified, as the other
is defined as "all other axes"). When the axes order is not explicitly defined,
the axes are "rolled" from one space to the other as if the <navigation axes | signal axes> wrapped around a circle. The example below should help clarify this.
>>> # just create a signal with many distinct dimensions
>>> s = hs.signals.BaseSignal(np.random.rand(1, 2, 3, 4, 5, 6, 7, 8, 9))
>>> s
<BaseSignal, title: , dimensions: (|9, 8, 7, 6, 5, 4, 3, 2, 1)>
>>> s.transpose(signal_axes=5)  # roll to leave 5 axes in signal space
<BaseSignal, title: , dimensions: (4, 3, 2, 1|9, 8, 7, 6, 5)>
>>> s.transpose(navigation_axes=3)  # roll to leave 3 axes in navigation space
<BaseSignal, title: , dimensions: (3, 2, 1|9, 8, 7, 6, 5, 4)>
>>> # 3 explicitly defined axes in signal space
>>> s.transpose(signal_axes=[0, 2, 6])
<BaseSignal, title: , dimensions: (8, 6, 5, 4, 2, 1|9, 7, 3)>
>>> # A mix of two lists, but specifying all axes explicitly
>>> # The order of axes is preserved in both lists
>>> s.transpose(navigation_axes=[1, 2, 3, 4, 5, 8], signal_axes=[0, 6, 7])
<BaseSignal, title: , dimensions: (8, 7, 6, 5, 4, 1|9, 3, 2)>
A convenience function transpose() is available to operate on
many signals at once, for example enabling plotting signals of any dimension
trivially:
The transpose() method accepts keyword argument
optimize, which is False by default, meaning modifying the output
signal data always modifies the original data i.e. the data is just a view
of the original data. If True, the method ensures the data in memory is
stored in the most efficient manner for iterating by making a copy of the data
if required, hence modifying the output signal data does not always modify the
original data.
The convenience methods as_signal1D() and
as_signal2D() internally use
transpose(), but always optimize the data
for iteration over the navigation axes if required. Hence, these methods do not
always return a view of the original data. If a copy of the data is required
use
deepcopy() on the output of any of these
methods e.g.:
An apodization window (also known as an apodization function) can be applied to a signal
using the apply_apodization() method. By default the standard
Hann window is used:
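>>> im = hs.signals.Signal2D(np.random.random((64, 64)))
>>> im_apodized = im.apply_apodization()
(A minimal sketch; by default a new signal is returned and the original is left unchanged.)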
A higher-order Hann window can be used in order to keep a larger fraction of the intensity of the original signal.
This can be done by providing an integer for the order of the window through the
keyword argument hann_order. (This works only together with the default value of the window argument
or with window='hann'.)
In addition to the Hann window, Hamming or Tukey windows can be applied using the window argument,
selecting 'hamming' or 'tukey' respectively.
The shape of the Tukey window can be adjusted using the parameter alpha,
provided through the tukey_alpha keyword argument (only used when window='tukey').
The parameter represents the fraction of the window inside the cosine-tapered region,
i.e. the smaller alpha is, the larger the middle flat region where the original signal
is preserved. If alpha is one, the Tukey window is equivalent to a Hann window.
(The default value is 0.5.)
Apodization can be applied in place by setting the keyword argument inplace to True.
In this case the method will not return anything.
get_histogram() computes the histogram and
conveniently returns it as signal instance. It provides methods to
calculate the bins. print_summary_statistics()
prints the five-number summary statistics of the data.
These two methods can be combined with
get_current_signal() to compute the histogram or
print the summary statistics of the signal at the current coordinates, e.g:
Histograms of different objects can be compared with the function
plot_histograms() (see
visualisation for the plotting options). For example,
with histograms of several random chi-square distributions:
Some data operations require the data variance. Those methods use the
metadata.Signal.Noise_properties.variance attribute if it exists. You can
set this attribute as in the following example where we set the variance to be
10:
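>>> s = hs.signals.Signal1D(np.random.random(100))
>>> s.metadata.set_item("Signal.Noise_properties.variance", 10)
>>> s.metadata.Signal.Noise_properties.variance
10
(A sketch using a random spectrum; any signal could be used.)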
For heteroscedastic noise the variance attribute must be a
BaseSignal. Poissonian noise is a common case of
heteroscedastic noise where the variance is equal to the expected value. The
estimate_poissonian_noise_variance()
method can help setting the variance of data with
semi-Poissonian noise. With the default arguments, this method simply sets the
variance attribute to the given expected_value. However, more generally
(although the noise is not strictly Poissonian), the variance may be
proportional to the expected value. Moreover, when the noise is a mixture of
white (Gaussian) and Poissonian noise, the variance is described by the
following linear model:
\[\mathrm{Var}[X] = (a * \mathrm{E}[X] + b) * c\]
Where a is the gain_factor, b is the gain_offset (the Gaussian
noise variance) and c the correlation_factor. The correlation
factor accounts for correlation of adjacent signal elements that can
be modelled as a convolution with a Gaussian point spread function.
estimate_poissonian_noise_variance() can be used to
set the noise properties when the variance can be described by this linear
model, for example:
>>> s = hs.signals.Signal1D(np.ones(100))
>>> s.add_poissonian_noise()
>>> s.metadata
├── General
│   └── title =
└── Signal
    └── signal_type =
>>> s.estimate_poissonian_noise_variance()
>>> s.metadata
├── General
│   └── title =
└── Signal
    ├── Noise_properties
    │   ├── Variance_linear_model
    │   │   ├── correlation_factor = 1
    │   │   ├── gain_factor = 1
    │   │   └── gain_offset = 0
    │   └── variance = <BaseSignal, title: Variance of , dimensions: (|100)>
    └── signal_type =
Many signal methods create and return a new signal. For fast operations, the
new signal creation time is non-negligible. Also, when the operation is
repeated many times, for example in a loop, the cumulative creation time can
become significant. Therefore, many operations on
BaseSignal accept an optional argument out. If an
existing signal is passed to out, the function output will be placed into
that signal, instead of being returned in a new signal. The following example
shows how to use this feature to slice a BaseSignal. It is
important to know that the BaseSignal instance passed in
the out argument must be well-suited for the purpose. Often this means that
it must have the same axes and data shape as the
BaseSignal that would normally be returned by the
operation.
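A sketch of the idea, reusing a compatible signal as the output container (the values shown assume this exact ten-point signal):
>>> s = hs.signals.Signal1D(np.arange(10))
>>> s_sum = s.sum(axis=0)
>>> s_sum.data
array([45])
>>> s.isig[:5].sum(axis=0, out=s_sum)
>>> s_sum.data
array([10])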
The HyperSpy ComplexSignal signal class
and its subclasses for 1-dimensional and 2-dimensional data allow the user to
access complex properties like the real and imag parts of the data or the
amplitude (also known as the modulus) and phase (also known as angle or
argument) directly. Getting and setting those properties can be done as
follows:
>>> s = hs.signals.ComplexSignal1D(np.arange(100) + 1j * np.arange(100))
>>> real = s.real                    # real is a new HS signal accessing the same data
>>> s.real = np.random.random(100)   # new_real can be an array or signal
>>> imag = s.imag                    # imag is a new HS signal accessing the same data
>>> s.imag = np.random.random(100)   # new_imag can be an array or signal
It is important to note that data passed to the constructor of a
ComplexSignal (or to a subclass), which
is not already complex, will be converted to the numpy standard of
np.complex/np.complex128. Data which is already complex will be passed
as is.
To transform a real signal into a complex one use:
>>> s.change_dtype(complex)
Changing the dtype of a complex signal to something real is not clearly
defined and thus not directly possible. Use the real, imag,
amplitude or phase properties instead to extract the real data that is
desired.
The angle() function
can be used to calculate the angle, which is equivalent to using the phase
property if no argument is used. If the data is real, the angle will be 0 for
positive values and π for negative values. If the deg parameter is set
to True, the result will be given in degrees, otherwise in rad (default).
The underlying function is the numpy.angle() function.
angle() will return
an appropriate HyperSpy signal.
Sometimes it is convenient to visualize a complex signal as a plot of its
imaginary part versus its real part. In this case, so-called Argand diagrams can
be calculated using the argand_diagram()
method, which returns the plot as a Signal2D.
Optional arguments size and display_range can be used to change the
size (and therefore resolution) of the plot and to change the range for the
display of the plot, respectively. The latter is especially useful in order to
zoom into specific regions of the plot or to limit the plot in case of noisy
data points.
An example of the calculation of an Argand diagram is shown for electron
holography data in the holospy documentation.
For 2-dimensional complex images, a linear phase ramp can be added to the
signal via the
add_phase_ramp() method.
The parameters ramp_x and ramp_y dictate the slope of the ramp in x-
and y direction, while the offset is determined by the offset parameter.
The fulcrum of the linear ramp is at the origin and the slopes are given in
units of the axis with the according scale taken into account. Both are
available via the AxesManager of the signal.
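A minimal sketch (the parameter names follow the description above; the values are arbitrary):

>>> im = hs.signals.ComplexSignal2D(np.ones((64, 64), dtype=complex))
>>> # slopes are given per axis unit; the fulcrum of the ramp is at the origin
>>> im.add_phase_ramp(ramp_x=0.1, ramp_y=0.2, offset=0.5)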
GPU processing is supported thanks to the numpy dispatch mechanism of array functions
- read NEP-18
and NEP-35
for more information. It means that most HyperSpy functions will work on a GPU
if the data is a cupy.ndarray and the required functions are
implemented in cupy.
Note
GPU processing with HyperSpy requires numpy >= 1.20 and dask >= 2021.3.0 in
order to use NEP-18 and NEP-35.
>>> import cupy as cp
>>> # Create a cupy array (on GPU device)
>>> data = cp.random.random(size=(20, 20, 100, 100))
>>> s = hs.signals.Signal2D(data)
>>> type(s.data)
cupy._core.core.ndarray
Two convenience methods are available to transfer data between the host and
the (GPU) device memory:
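A minimal sketch, assuming the two methods are to_device() and to_host() (check the BaseSignal API for the exact names):

>>> s = hs.signals.Signal2D(np.random.random((64, 64)))
>>> s.to_device()   # transfer s.data to the GPU device memory (cupy array)
>>> s.to_host()     # transfer s.data back to the host memory (numpy array)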
HyperSpy distinguishes between signal and navigation axes and most
functions operate on the signal axes and iterate over the navigation axes.
Take an EELS spectrum image as a specific example. It is a 2D array of spectra
and has three dimensions: X, Y and energy-loss. In HyperSpy, X and Y are the
navigation dimensions and the energy-loss is the signal dimension. To make
this distinction more explicit, the representation of the object includes a
separator | between the navigation and signal dimensions. In analogy, the
signal dimension in EDX would be the X-ray energy, in optical spectra the
wavelength axis, etc. However, HyperSpy can also handle data with more than one
signal dimension, such as a stack or even map of diffraction images or
electron-holograms in TEM.
For example: A spectrum image has signal dimension 1 and navigation dimension 2
and is stored in the Signal1D subclass.
The axes are managed and stored by the AxesManager class
that is stored in the axes_manager attribute of
the signal class. The individual axes can be accessed by indexing the
AxesManager, e.g.:
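For example (the axis name "y" is set here only for illustration; unnamed axes are indexed by position):

>>> s = hs.signals.Signal1D(np.zeros((2, 3, 100)))
>>> s.axes_manager[0]            # first axis, here a navigation axis
>>> s.axes_manager[-1]           # last axis, here the signal axis
>>> s.axes_manager[0].name = "y"
>>> s.axes_manager["y"]          # a named axis can also be accessed by its name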
Alternatively, the “current position” can be changed programmatically by
directly accessing the indices attribute of a signal’s
AxesManager or the index attribute of an individual
axis. This is particularly useful when trying to set
a specific location at which to initialize a model’s parameters to
sensible values before performing a fit over an entire spectrum image. The
indices must be provided as a tuple, with the same length as the number of
navigation dimensions:
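For example, for a signal with two navigation dimensions:

>>> s = hs.signals.Signal1D(np.zeros((10, 20, 100)))
>>> # two navigation dimensions, so indices is a length-2 tuple
>>> s.axes_manager.indices = (5, 2)
>>> # the index of an individual axis can also be set directly
>>> s.axes_manager[0].index = 7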
name (str) and units (str) are basic parameters describing an axis
used in plotting. The latter enables the conversion of units.
navigate (bool) determines whether it is a navigation axis.
size (int) gives the number of elements in an axis.
index (int) determines the "current position" for a navigation axis and
value (float) returns the value at this position.
low_index (int) and high_index (int) are the first and last index.
low_value (float) and high_value (float) are the smallest and largest
value.
The axis array stores the values of the axis points. However,
depending on the type of axis, this array may be updated from the defining
attributes as discussed in the following section.
For example, a UniformDataAxis is defined by the initial value offset
and the spacing scale.
The main distinction is whether the
axis is uniform, where the data points are equidistantly spaced, or
non-uniform, where the spacing may vary. The latter can become important
when, e.g., a spectrum recorded over a wavelength axis is converted to a
wavenumber or energy scale, where the conversion is based on a 1/x
dependence so that the axis spacing of the new axis varies along the length
of the axis. Whether an axis is uniform or not can be queried through the
property is_uniform (bool) of the axis.
Every axis of a signal object may be of a different type. For example, it is
common that the navigation axes would be uniform, while the signal axes
are non-uniform.
When an axis is created, the type is automatically determined by the attributes
passed to the generator. The three different axis types are summarized in the
following table.
Not all features are implemented for non-uniform axes.
Warning
Non-uniform axes are in beta state and their API may change in a minor release.
Not all HyperSpy features are compatible with non-uniform axes; support
will be added in future releases.
The most common case is the UniformDataAxis. Here, the axis
is defined by the offset, scale and size parameters, which determine
the initial value, spacing and length, respectively. The actual axis
array is automatically calculated from these three values. The UniformDataAxis
is a special case of the FunctionalDataAxis defined by the function
scale*x+offset.
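A minimal sketch of creating such an axis directly (assuming the class can be imported from hyperspy.axes):

>>> from hyperspy.axes import UniformDataAxis
>>> axis = UniformDataAxis(offset=10, scale=0.5, size=5)
>>> axis.axis   # the axis array is computed from offset, scale and size
array([10. , 10.5, 11. , 11.5, 12. ])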
Alternatively, a FunctionalDataAxis is defined based on an
expression that is evaluated to yield the axis points. The expression
is a function defined as a string using the
SymPy text expression
format. An example would be expression=a/x+b. Any variables in the
expression, in this case a and b, must be defined as additional
attributes of the axis. The property is_uniform is automatically set to
False.
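A minimal sketch (assuming the class can be imported from hyperspy.axes; the expression is shifted to avoid division by zero at x = 0):

>>> from hyperspy.axes import FunctionalDataAxis
>>> axis = FunctionalDataAxis(expression="a / (x + 1) + b", a=100, b=10, size=5)
>>> axis.is_uniform
False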
x itself is an instance of BaseDataAxis. By default,
it will be a UniformDataAxis with offset=0 and
scale=1 of the given size. However, it can also be initialized with
custom offset and scale values. Alternatively, it can be a non-uniform
DataAxis.
A DataAxis is the most flexible type of axis. The axis
points are directly given by an array named axis. As this can be any
array, the property is_uniform is automatically set to False.
Usually, the axes are directly added to a signal during signal
initialization. However, you may wish to add/remove
axes from the AxesManager of a signal.
Note that there is currently no consistency check whether a signal object has
the right number of axes of the right dimensions. Most functions will however
fail if you pass a signal object where the axes do not match the data
dimensions and shape.
You can add a set of axes to the AxesManager by passing a list of
axes dictionaries to axes_manager.create_axes():
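A minimal sketch, assuming the usual axis-dictionary keys (name, offset, scale, size, units, navigate); note that, as mentioned above, no consistency check against the data shape is performed:

>>> s = hs.signals.BaseSignal(np.random.random((10, 20)))
>>> axes_list = [
...     {'name': 'y', 'offset': 0, 'scale': 0.1, 'size': 10, 'units': 'nm', 'navigate': True},
...     {'name': 'x', 'offset': 0, 'scale': 0.1, 'size': 20, 'units': 'nm', 'navigate': False},
... ]
>>> s.axes_manager.create_axes(axes_list)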
Internally, HyperSpy uses the pint library to
manage the scale and offset quantities. The scale_as_quantity and
offset_as_quantity attributes return a pint object:
>>> q = s.axes_manager[0].offset_as_quantity
>>> type(q)  # q is a pint quantity object
<class 'pint.Quantity'>
>>> q
<Quantity(2.5, 'nanometer')>
The convert_units method of the AxesManager converts
units, which by default (no parameters provided) converts all axis units to an
optimal unit to avoid using too large or small numbers.
Each axis can also be converted individually using the convert_to_units
method of the UniformDataAxis:
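For example (the unit strings are parsed by pint):

>>> # convert a single axis to micrometres
>>> s.axes_manager[0].convert_to_units('µm')
>>> # or let HyperSpy pick suitable units for all axes at once
>>> s.axes_manager.convert_units()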
Note that HyperSpy rearranges the axes when compared to the array order. The
following few paragraphs explain how and why.
Depending on how the array is arranged, some axes are faster to iterate than
others. Consider an example of a book as the dataset in question. It is
trivially simple to look at letters in a line, and then lines down the page,
and finally pages in the whole book. However, if your words are written
vertically, it can be inconvenient to read top-down (the lines are still
horizontal, it’s just the meaning that’s vertical!). It is very time-consuming
if every letter is on a different page, and for every word you have to turn 5-6
pages. Exactly the same idea applies here - in order to iterate through the
data (most often for plotting, but for any other operation as well), you
want to keep it ordered for “fast access”.
In Python (more explicitly numpy), the “fast axes order” is C order (also
called row-major order). This means that the last axis of a numpy array is
fastest to iterate over (i.e. the lines in the book). An alternative ordering
convention is F order (column-major), where it is the other way round: the
first axis of an array is the fastest to iterate over. In both cases, the
further an axis is from the fast axis the slower it is to iterate over this
axis. In the book analogy, you could think about reading the first lines of
all pages, then the second and so on.
When data is acquired sequentially, it is usually stored in acquisition order.
When a dataset is loaded, HyperSpy generally stores it in memory in the same
order, which is good for the computer. However, HyperSpy will reorder and
classify the axes to make it easier for humans. Let’s imagine a single numpy
array that contains pictures of a scene acquired with different exposure times
on different days. In numpy, the array dimensions are (D,E,Y,X). This
order makes it fast to iterate over the images in the order in which they were
acquired. From a human point of view, this dataset is just a collection of
images, so HyperSpy first classifies the image axes (X and Y) as
signal axes and the remaining axes the navigation axes. Then it reverses
the order of each set of axes because many humans are used to getting the X
axis first and, more generally, the axes in acquisition order from left to
right. So, the same axes in HyperSpy are displayed like this: (E,D|X,Y).
Extending this to arbitrary dimensions, by default, we reverse the numpy axes,
chop them into two chunks (signal and navigation), and then swap those chunks,
at least when printing. As an example:
(a1, a2, a3, a4, a5, a6)       # original (numpy)
(a6, a5, a4, a3, a2, a1)       # reverse
(a6, a5) (a4, a3, a2, a1)      # chop
(a4, a3, a2, a1) (a6, a5)      # swap (HyperSpy)
In the background, HyperSpy also takes care of storing the data in memory in
a “machine-friendly” way, so that iterating over the navigation axes is always
fast.
One can iterate over the AxesManager to produce indices to
the navigation axes. Each iteration will yield a new tuple of indices, sorted
according to the iteration path specified in iterpath.
Setting the indices property to a new index will
update the accompanying signal so that signal methods that operate at a specific
navigation index will now use that index, like s.plot().
The iterpath attribute specifies the strategy that
the AxesManager should use to iterate over the navigation axes.
Two built-in strategies exist:
'serpentine' (default): starts at (0, 0), but when it reaches the final column
(of index N), it continues from (1, N) along the next row, in the same way
that a snake might slither, left and right.
'flyback': starts at (0, 0), continues down the row until the final
column, “flies back” to the first column, and continues from (1, 0).
The iterpath can also be set to be a specific list of indices, like [(0,0), (0,1)],
but can also be any generator of indices. Storing a high-dimensional set of
indices as a list or array can take a significant amount of memory. By using a
generator instead, one almost entirely removes such a memory footprint:
Since generators do not have a defined length and do not need to include all
navigation indices, a progressbar will be unable to determine how long it needs
to be. To resolve this, a helper class can be imported that takes both a generator
and a manually specified length as inputs:
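A minimal sketch; the helper class is assumed here to be GeneratorLen from hyperspy.axes (check the API reference for the exact name):

>>> # a generator visiting a 10 x 10 navigation grid row by row
>>> gen = ((x, y) for y in range(10) for x in range(10))
>>> from hyperspy.axes import GeneratorLen
>>> # wrap the generator together with its length so progressbars work
>>> s.axes_manager.iterpath = GeneratorLen(gen, 100)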
The crop_signal() method crops the signal object along the signal axis
(e.g. the spectral energy range) in-place. If no parameter is passed, a user
interface appears in which to crop the one-dimensional signal. For example:
>>> s = hs.data.two_gaussians()
>>> s.crop_signal(5, 15)  # s is cropped in place
Additionally, cropping in HyperSpy can be performed using the Signal
indexing syntax. For example, the following crops a signal
to the 5.0-15.0 region:
>>> s = hs.data.two_gaussians()
>>> sc = s.isig[5.:15.]  # s is not cropped, sc is a "cropped view" of s
Added in version 1.4: zero_fill and plot_remainder keyword arguments and big speed
improvements.
The remove_background() method provides
background removal capabilities through both a CLI and a GUI. The GUI displays
an interactive preview of the remainder after background subtraction. Currently,
the following background types are supported: Doniach, Exponential, Gaussian,
Lorentzian, Polynomial, Power law (default), Offset, Skew normal, Split Voigt
and Voigt. By default, the background parameters are estimated using analytical
approximations (keyword argument fast=True). The fast option is not accurate
for most background types - except Gaussian, Offset and Power law -
but it is useful to estimate the initial fitting parameters before performing a
full fit. For better accuracy, but higher processing time, the parameters can
be estimated using curve fitting by setting fast=False.
Interactive background removal. In order to select the region
used to estimate the background parameters (red area in the
figure) click inside the axes of the figure and drag to the right
without releasing the button.
The following methods use sub-pixel cross-correlation or user-provided shifts
to align spectra. They support applying the same transformation to multiple
files.
To integrate signals, use the integrate1D() method, possibly in combination
with a Region Of Interest (ROI) if interactivity is required. Otherwise, a
signal subrange for integration can also be selected with isig indexing.
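For example, a minimal sketch combining isig slicing with integrate1D():

>>> s = hs.data.two_gaussians()
>>> # integrate the signal between 5.0 and 15.0 (values in axis units)
>>> area = s.isig[5.0:15.0].integrate1D(axis=-1)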
spikes_removal_tool() provides a user interface to remove spikes from
spectra. The derivative histogram helps to identify an appropriate threshold.
It is possible to use this tool on a specific interval of the data by slicing
the data. For example, to use this tool on the signal between indices 8 and 17:
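A minimal sketch (note that isig[8:17] selects indices 8 up to, but not including, 17):

>>> s.isig[8:17].spikes_removal_tool()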
The options navigation_mask or signal_mask provide more flexibility in the
selection of the data, but these require a mask (boolean array) as a parameter,
which needs to be created manually:
>>> s = hs.signals.Signal1D(np.arange(5 * 10 * 20).reshape((5, 10, 20)))
>>> # To get a signal mask, get the mean over the navigation space
>>> s_mean = s.mean()
>>> mask = s_mean > 495
>>> s.spikes_removal_tool(signal_mask=mask)
For asymmetric peaks, fitted functions may not provide
an accurate description of the peak, in particular the peak width. The function
estimate_peak_width()
determines the width of a peak at a certain fraction of its maximum value.
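For example:

>>> s = hs.data.two_gaussians()
>>> # width of the peak at half of its maximum (factor=0.5 is the default)
>>> width = s.estimate_peak_width(factor=0.5)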
# Estimate shifts, then align the images
>>> shifts = s.estimate_shift2D()
>>> s.align2D(shifts=shifts)

# Estimate and align in a single step
>>> s.align2D()
Warning
s.align2D() will modify the data in-place. If you don’t want
to modify your original data, first take a copy before aligning.
Sub-pixel accuracy can be achieved in two ways:
scikit-image’s upsampled matrix-multiplication DFT method
[Guizar2008], by setting the sub_pixel_factor
keyword argument
for multi-dimensional datasets only, using the statistical
method [Schaffer2004], by setting the reference
keyword argument to "stat"
# skimage upsampling method
>>> shifts = s.estimate_shift2D(sub_pixel_factor=20)

# stat method
>>> shifts = s.estimate_shift2D(reference="stat")

# combined upsampling and statistical method
>>> shifts = s.estimate_shift2D(reference="stat", sub_pixel_factor=20)
If you have a large stack of images, the image alignment is automatically done in
parallel.
You can control the number of threads used with the num_workers argument, or
by adjusting the scheduler of the dask backend.
# Estimate shifts
>>> shifts = s.estimate_shift2D()

# Align images in parallel using 4 threads
>>> s.align2D(shifts=shifts, num_workers=4)
A linear ramp can be added to the signal via the
add_ramp() method. The parameters
ramp_x and ramp_y dictate the slope of the ramp in x- and y direction,
while the offset is determined by the offset parameter. The fulcrum of the
linear ramp is at the origin and the slopes are given in units of the axis
with the according scale taken into account. Both are available via the
AxesManager of the signal.
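For example (the values are arbitrary):

>>> im = hs.signals.Signal2D(np.ones((64, 64)))
>>> # slope of 0.1 per x-axis unit and 0.2 per y-axis unit, plus a constant offset of 1
>>> im.add_ramp(ramp_x=0.1, ramp_y=0.2, offset=1)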
These methods search for peaks using maximum (and minimum) values in the
image. They all have a distance parameter to set the minimum distance
between the peaks.
the 'local_max' method uses the skimage.feature.peak_local_max()
function (distance and threshold parameters are mapped to
min_distance and threshold_abs, respectively).
the 'max' method uses the
find_peaks_max() function to search
for peaks higher than alpha*sigma, where alpha is a parameter and
sigma is the standard deviation of the image. It also has a distance
parameter to set the minimum distance between peaks.
the 'minmax' method uses the
find_peaks_minmax() function to locate
the positive peaks in an image by comparing maximum and minimum filtered
images. Its threshold parameter defines the minimum difference between
the maximum and minimum filtered images.
This algorithm was developed by Zaefferer [Zaefferer2000].
It is based on a gradient threshold followed by a local maximum search within a square window,
which is moved until it is centered on the brightest point, which is taken as a
peak if it is within a certain distance of the starting point. It uses the
find_peaks_zaefferer() function, which can take
grad_threshold, window_size and distance_cutoff as parameters. See
the find_peaks_zaefferer() function documentation
for more details.
Described by White [White2009], this method is based on finding points that
have a statistically higher value than the surrounding areas, then iterating
between smoothing and binarising until the number of peaks has converged. This
method can be slower than the others, but is very robust to a variety of image types.
It uses the find_peaks_stat() function, which can take
alpha, window_radius and convergence_ratio as parameters. See the
find_peaks_stat() function documentation for more
details.
These methods are essentially wrappers around the
Laplacian of Gaussian (skimage.feature.blob_log()) or the difference
of Gaussian (skimage.feature.blob_dog()) methods, based on stacking
the Laplacian/difference of images convolved with Gaussian kernels of various
standard deviations. For more information, see the example in the
scikit-image documentation.
This method locates peaks in the cross correlation between the image and a
template using the find_peaks_xc() function. See
the find_peaks_xc() function documentation for
more details.
Many of the peak finding algorithms implemented here have a number of tunable
parameters that significantly affect their accuracy and speed. The GUIs can be
used to select the method and set the parameters interactively:
>>> s.find_peaks(interactive=True)
Several widgets are available:
The method selector is used to compare different methods. The last-set
parameters are maintained.
The parameter adjusters will update the parameters of the method and re-plot
the new peaks.
Note
Some methods take significantly longer than others, particularly
where there are a large number of peaks to be found. The plotting window
may be inactive during this time.
The object returned by load(), a BaseSignal
instance, has a plot() method that is powerful and
flexible to visualize n-dimensional data. In this chapter, the
visualisation of multidimensional data is exemplified with two experimental
datasets: an EELS spectrum image and an EDX dataset consisting of a secondary
electron emission image stack and a 3D hyperspectral image, both simultaneously
acquired by recording two signals in parallel in a FIB/SEM.
>>> s = hs.load('YourDataFilenameHere')
>>> s.plot()
If the object is a single spectrum or an image, one window will appear when
calling the plot method.
If the object is a 1D or 2D spectrum-image (i.e. with 2 or 3 dimensions when
including energy) two figures will appear, one containing a plot of the
spectrum at the current coordinates and the other an image of the data summed
over its spectral dimension if 2D or an image with the spectral dimension in
the x-axis if 1D:
Added in version 1.4: Customizable keyboard shortcuts to navigate multi-dimensional datasets.
To change the current coordinates, click on the pointer (which will be a line
or a square depending on the dimensions of the data) and drag it around. It is
also possible to move the pointer by using the numpad arrows when numlock is
on and the spectrum or navigator figure is selected. When using the numpad
arrows the PageUp and PageDown keys change the size of the step.
The current coordinates can be either set by navigating the
plot(), or specified by pixel indices
in s.axes_manager.indices or as calibrated coordinates in
s.axes_manager.coordinates.
An extra cursor can be added by pressing the e key. Pressing e once
more will disable the extra cursor:
In matplotlib, left and right arrow keys are by default set to navigate the
“zoom” history. To avoid the problem of changing zoom while navigating,
Ctrl + arrows can be used instead. Navigating without using the modifier keys
will be deprecated in version 2.0.
To navigate navigation dimensions larger than 2, modifier keys can be used.
The defaults are Shift + left/right and Shift + up/down,
(Alt + left/right and Alt + up/down)
for navigating dimensions 2 and 3 (4 and 5) respectively. Modifier keys do not work with the numpad.
Hotkeys and modifier keys for navigating the plot can be set in the
HyperSpy plot preferences.
Note that some combinations will not work for all platforms, as some systems reserve them for
other purposes.
To jump to a specific point in the dataset, hold the Shift key and click the
point you are interested in; the plot will automatically move to that point in
the data. This also helps with lazy data, as you don't have to load every chunk
in between.
Visualisation of a 2D spectrum image using two pointers.
Sometimes the default size of the rectangular cursors used to navigate images
can be too small to be dragged or even seen. It
is possible to change the size of the cursors by pressing the + and -
keys when the navigator window is selected.
The following keyboard shortcuts are available when the 1D signal figure is in focus:
Keyboard shortcuts available on the signal figure of 1D signal data:

e: switch second pointer on/off
Ctrl + Arrows: change coordinates for dimensions 0 and 1 (typically x and y)
Shift + Arrows: change coordinates for dimensions 2 and 3
Alt + Arrows: change coordinates for dimensions 4 and 5
PageUp: increase step size
PageDown: decrease step size
+: increase pointer size when the navigator is an image
-: decrease pointer size when the navigator is an image
l: switch the scale of the y-axis between logarithmic and linear
To close all the figures run the following command:
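>>> import matplotlib.pyplot as plt
>>> plt.close('all')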
plt.close('all') is a matplotlib command.
Matplotlib is the library that HyperSpy uses to produce the plots. You can
learn how to pan/zoom and more in the matplotlib documentation.
Note
Plotting float16 images is currently not supported by matplotlib; however, it is
possible to convert the type of the data by using the
change_dtype() method, e.g. s.change_dtype('float32').
Equivalently, if the object is a 1D or 2D image stack two figures will appear,
one containing a plot of the image at the current coordinates and the other
a spectrum or an image obtained by summing over the image dimensions:
The image plot can be customised by passing additional arguments when plotting.
Colorbar, scalebar and contrast controls are HyperSpy-specific, however
matplotlib.axes.Axes.imshow() arguments are supported as well:
Custom colormap and switched off scalebar in an image.
Added in version 1.4: norm keyword argument
The norm keyword argument can be used to select between linear, logarithmic
or custom (using a matplotlib norm) intensity scale. The default, “auto”,
automatically selects a logarithmic scale when plotting a power spectrum.
Added in version 1.6: autoscale keyword argument
The autoscale keyword argument can be used to specify which axis limits are
reset when the data or navigation indices change. It can take any combinations
of the following characters:
'x': to reset the horizontal axes
'y': to reset the vertical axes
'v': to reset the contrast of the image according to vmin and vmax
By default (autoscale='v'), only the contrast of the image will be reset
automatically. For example, to reset the extent of the image (x and y) to their
maxima but not the contrast, use autoscale='xy'; To reset all limits,
including the contrast of the image, use autoscale='xyv':
When plotting using divergent colormaps, if centre_colormap is True
(default) the contrast is automatically adjusted so that zero corresponds to
the center of the colormap (usually white). This can be useful e.g. when
displaying images that contain pixels with both positive and negative values.
The following example shows the effect of centring the color map:
Divergent color map with centre_colormap disabled.
Added in version 2.0.0: plot_style keyword argument to allow for “horizontal” or “vertical” alignment of subplots (e.g. navigator
and signal) when using the ipympl or widget backends. A default value can also be set using the
HyperSpy plot preferences.
Added in version 1.1.2: Passing keyword arguments to the navigator plot.
The navigator can be customised by using the navigator_kwds argument. For
example, in case of a image navigator, all image plot arguments mentioned in
the section Customising image plot can be passed as a dictionary to the
navigator_kwds argument:
An external signal (e.g. a spectrum) can be used as a navigator, for example
the “maximum spectrum” for which each channel is the maximum of all pixels.
plot_images() is used to plot several images in the
same figure. It supports many configurations and has many options available
to customize the resulting output. The function returns a list of
matplotlib.axes.Axes,
which can be used to further customize the figure. Some examples are given
below. Plots generated from another installation may look slightly different
due to matplotlib GUI backends and default font sizes. To change the
font size globally, use the command matplotlib.rcParams.update({'font.size':8}).
Added in version 1.5: Add support for plotting BaseSignal with navigation
dimension 2 and signal dimension 0.
A common usage for plot_images() is to view the
different slices of a multidimensional image (a hyperimage):
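A minimal sketch using random data as a stand-in for a real hyperimage:

>>> image = hs.signals.Signal2D(np.random.random((2, 3, 64, 64)))
>>> hs.plot.plot_images(image, tight_layout=True)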
By default, plot_images() will attempt to auto-label
the images based on the Signal titles. The labels (and title) can be
customized with the suptitle and label arguments. In this example, the
axes labels and the ticks are also disabled with axes_decor:
Figure generated with plot_images() with customised
labels.
plot_images() can also be used to easily plot a list
of Images, comparing different Signals, including RGB images. This
example also demonstrates how to wrap labels using labelwrap (for preventing
overlap) and using a single colorbar for all the Images, as opposed to
multiple individual ones:
>>> import scipy
>>> import numpy as np

Load red channel of raccoon as an image

>>> image0 = hs.signals.Signal2D(scipy.datasets.face()[:, :, 0])
>>> image0.metadata.General.title = 'Rocky Raccoon - R'

Load ascent into a length 6 hyper-image

>>> image1 = hs.signals.Signal2D([scipy.datasets.ascent()] * 6)
>>> angles = hs.signals.BaseSignal(np.arange(10, 70, 10)).T
>>> image1.map(scipy.ndimage.rotate, angle=angles, reshape=False)
>>> image1.data = np.clip(image1.data, 0, 255)  # clip data to int range

Load green channel of raccoon as an image

>>> image2 = hs.signals.Signal2D(scipy.datasets.face()[:, :, 1])
>>> image2.metadata.General.title = 'Rocky Raccoon - G'

Load rgb image of the raccoon

>>> rgb = hs.signals.Signal1D(scipy.datasets.face())
>>> rgb.change_dtype("rgb8")
>>> rgb.metadata.General.title = 'Raccoon - RGB'

>>> images = [image0, image1, image2, rgb]
>>> for im in images:
...     ax = im.axes_manager.signal_axes
...     ax[0].name, ax[1].name = 'x', 'y'
...     ax[0].units, ax[1].units = 'mm', 'mm'
>>> hs.plot.plot_images(images, tight_layout=True,
...                     colorbar='single', labelwrap=20)
Figure generated with plot_images() from a list of
images.
Data files used in the following example can be downloaded using the code
below (these data are described in [Rossouw2015]).
>>> # Download the data (1MB)
>>> from urllib.request import urlretrieve, urlopen
>>> from zipfile import ZipFile
>>> files = urlretrieve("https://www.dropbox.com/s/ecdlgwxjq04m5mx/"
...                     "HyperSpy_demos_EDS_TEM_files.zip?raw=1",
...                     "./HyperSpy_demos_EDX_TEM_files.zip")
>>> with ZipFile("HyperSpy_demos_EDX_TEM_files.zip") as z:
...     z.extractall()
Another example for this function is plotting EDS line intensities; see the
EDS chapter. One can use the following commands
to get a representative figure of the X-ray line intensities of an EDS
spectrum image. This example also demonstrates changing the colormap (with
cmap), adding scalebars to the plots (with scalebar), and changing the
padding between the images. The padding is specified as a dictionary,
which is passed to matplotlib.figure.Figure.subplots_adjust().
This padding can also be changed interactively by clicking on the
button in the GUI (button may be different when using
different graphical backends).
Finally, the cmap option of plot_images()
supports iterable types, allowing the user to specify different colormaps
for the different images that are plotted by providing a list or other
generator:
The cmap argument can also be given as 'mpl_colors', and as a result,
the images will be plotted with colormaps generated from the default
matplotlib colors, which is very helpful when plotting multiple spectral
signals and their relative intensities (such as the results of a
decomposition() analysis). This example uses
plot_spectra(), which is explained in the
next section.
>>> si_EDS=hs.load("core_shell.hdf5")>>> si_EDS.change_dtype('float')>>> si_EDS.decomposition(True,algorithm='NMF',output_dimension=3)>>> factors=si_EDS.get_decomposition_factors()>>>>>> # the first factor is a very strong carbon background component, so we>>> # normalize factor intensities for easier qualitative comparison>>> forfinfactors:... f.data/=f.data.max()>>>>>> loadings=si_EDS.get_decomposition_loadings()>>>>>> hs.plot.plot_spectra(factors.isig[:14.0],style='cascade',padding=-1)>>>>>> # add some lines to nicely label the peak positions>>> plt.axvline(6.403,c='C2',ls=':',lw=0.5)>>> plt.text(x=6.503,y=0.85,s='Fe-K$_\\alpha$',color='C2')>>> plt.axvline(9.441,c='C1',ls=':',lw=0.5)>>> plt.text(x=9.541,y=0.85,s='Pt-L$_\\alpha$',color='C1')>>> plt.axvline(2.046,c='C1',ls=':',lw=0.5)>>> plt.text(x=2.146,y=0.85,s='Pt-M',color='C1')>>> plt.axvline(8.040,ymax=0.8,c='k',ls=':',lw=0.5)>>> plt.text(x=8.14,y=0.35,s='Cu-K$_\\alpha$',color='k')>>>>>> hs.plot.plot_images(loadings,cmap='mpl_colors',... axes_decor='off',per_row=1,... label=['Background','Pt core','Fe shell'],... scalebar=[0],scalebar_color='white',... padding={'top':0.95,'bottom':0.05,... 'left':0.05,'right':0.78})
Using plot_images() with cmap='mpl_colors'
together with plot_spectra() to visualize the
output of a non-negative matrix factorization of the EDS data.
Note
It is not allowed to use a list or other iterable type for the cmap
argument together with 'single' for the colorbar argument, as this
combination does not make sense. Such an input will cause a warning and
instead set the colorbar argument to None.
It is also possible to plot multiple images overlayed on the same figure by
passing the argument overlay=True to the
plot_images() function. This should only be done when
images have the same scale (e.g. for elemental maps from the same dataset).
Using the same data as above, the Fe and Pt signals can be plotted using
different colours. Any color can be input via matplotlib color characters or
hex values.
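A minimal sketch with random stand-in maps (the colors keyword name is an assumption here; check the plot_images() documentation for the exact argument):

>>> fe_map = hs.signals.Signal2D(np.random.random((64, 64)))
>>> pt_map = hs.signals.Signal2D(np.random.random((64, 64)))
>>> hs.plot.plot_images([fe_map, pt_map], overlay=True, colors=['r', 'g'],
...                     axes_decor='off')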
plot_spectra() is used to plot several spectra in the
same figure. It supports different styles, the default
being “overlap”.
Added in version 1.5: Add support for plotting BaseSignal with navigation
dimension 1 and signal dimension 0.
In the following example we create a list of 9 single spectra (gaussian
functions with different sigma values) and plot them in the same figure using
plot_spectra(). Note that, in this case, the legend
labels are taken from the individual spectrum titles. By clicking on the
legended line, a spectrum can be toggled on and off.
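A minimal sketch of such a figure (legend='auto' picks up the spectrum titles):

>>> x = np.linspace(-10, 10, 500)
>>> spectra = []
>>> for sigma in np.arange(1, 10):
...     g = hs.signals.Signal1D(np.exp(-x**2 / (2 * sigma**2)))
...     g.metadata.General.title = 'sigma = %d' % sigma
...     spectra.append(g)
>>> ax = hs.plot.plot_spectra(spectra, legend='auto')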
Figure generated by plot_spectra() using the
overlap style.
Another style, “cascade”, can be useful when “overlap” results in a plot that
is too cluttered e.g. to visualize
changes in EELS fine structure over a line scan. The following example
shows how to plot a cascade style figure from a spectrum, and save it in
a file:
Figure generated by plot_spectra() using the
cascade style.
The “cascade” style has a padding option. The default value, 1, keeps the
individual plots from overlapping. However in most cases a lower
padding value can be used, to get tighter plots.
Using the color argument one can assign a color to all the spectra, or specific
colors for each spectrum. In the same way, one can also assign the line style
and provide the legend labels:
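A minimal sketch (the color, linestyle and legend keyword names are assumptions; check the plot_spectra() signature for the exact spelling):

>>> hs.plot.plot_spectra(spectra[:3], style='cascade',
...                      color=['red', 'green', 'blue'],
...                      linestyle=['-', '--', ':'],
...                      legend=['one', 'two', 'three'])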
A simple extension of this functionality is to customize the colormap that
is used to generate the list of colors. Using a list comprehension, one can
generate a list of colors that follows a certain colormap:
Figure generated by plot_spectra() using the
heatmap style showing how to customise the color map.
Any parameter that can be passed to matplotlib.pyplot.figure can also be used
with plot_spectra() to allow further customization (when using the
“overlap”, “cascade”, or “mosaic” styles). In the following example, dpi,
facecolor, frameon, and num are all parameters that are passed
directly to matplotlib.pyplot.figure as keyword arguments:
Customising the figure by setting the matplotlib axes properties.
A matplotlib ax and fig object can also be specified, which can be used to
put several subplots in the same figure. This will only work for “cascade”
and “overlap” styles:
plot_signals() is used to plot several signals at the
same time. By default the navigation position of the signals will be synced,
and the signals must have the same dimensions. To plot two spectra at the
same time:
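For example, with two synthetic spectrum images of the same navigation shape:

>>> s1 = hs.signals.Signal1D(np.random.random((10, 10, 100)))
>>> s2 = hs.signals.Signal1D(np.random.random((10, 10, 100)))
>>> hs.plot.plot_signals([s1, s2])   # navigation is synced by default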
The plot_signals() plots several signals with
optional synchronized navigation.
The navigator can be specified by using the navigator argument, where the
different options are “auto”, None, “spectrum”, “slider” or Signal.
For more details about the different navigators,
see the navigator options.
To specify the navigator:
Navigators can also be set differently for different plots using the
navigator_list argument. The navigator_list must be the same length
as the number of signals plotted, and can only contain valid navigator options.
For example:
Customising the navigator in plot_signals() by
providing a navigator list.
Several signals can also be plotted without syncing the navigation by using
sync=False. The navigator_list can still be used to specify a navigator for
each plot:
HyperSpy provides easy access to matplotlib's collections classes. These markers
provide powerful ways to annotate high-dimensional datasets easily.
By providing an array of positions, the marker can also change position when
navigating the signal. In the following example, the local maxima are displayed
for each R, G and B channel of a colour image.
Markers can be added to the navigator as well. In the following example,
each slice of a 2D spectrum is tagged with a text marker on the signal plot.
Each slice is indicated with the same text on the navigator.
If the signal has a navigation dimension, the markers can be made to change
as a function of the navigation index by passing in kwargs with dtype=object.
For a signal with 1 navigation axis:
The optional parameters (**kwargs, keyword arguments) can be used for extra parameters used for
each matplotlib collection. Any parameter which can be set using the matplotlib.collections.Collection.set()
method can be used as an iterating parameter with respect to the navigation index by passing in a numpy array
with dtype=object. Otherwise to set the parameter globally the kwarg can directly be passed.
Additionally, if some **kwargs are shorter in length than other parameters,
they will be cycled so that prop[i % len(prop)] is used.
Many times when annotating 1-D Plots, you want to add markers which are relative to the data. For example,
you may want to add a line which goes from [0, y], where y is the value at x. To do this, you can set the
offset_transform and/or transform to "relative".
Adding new types of markers to hyperspy is relatively simple. Currently, hyperspy supports any
matplotlib.collections.Collection object. For most common cases this should be sufficient,
as matplotlib has a large number of built-in collections beyond what is available directly in hyperspy.
In the event that you want a specific shape that is not supported, you can define a custom
matplotlib.path.Path object and then use the matplotlib.collections.PathCollection
to add the markers to the plot. Currently, there is no support for saving Path based markers but that can
be added if there are specific use cases.
Added in version 2.0: Marker Collections for faster plotting of many markers
HyperSpy's Markers class and its subclasses extend the capabilities of the
matplotlib.collections.Collection class and its subclasses.
Primarily, they allow dynamic markers to be initialized by passing keyword arguments
with dtype=object. Those attributes are then updated with the plot as you navigate
through the plot.
In most cases the offsets kwarg is used to map some marker to multiple positions in the plot. For example we can
define a plot of Ellipses using:
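A minimal sketch (static markers; the offsets and sizes are arbitrary and given in axis units):

>>> offsets = np.array([[10, 10], [20, 30], [40, 20]])
>>> m = hs.plot.markers.Ellipses(offsets=offsets, widths=(4,), heights=(2,), angles=(0,))
>>> s.plot()
>>> s.add_marker(m)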
Alternatively, if we want to make ellipses with different heights and widths we can pass multiple values to
heights, widths and angles. In general these properties are applied as
prop[i % len(prop)], so passing heights=(.1, .2, .3) will result in the ellipse at
offsets[0] having a height of 0.1, the ellipse at offsets[1] a height of 0.2, the
ellipse at offsets[2] a height of 0.3, the ellipse at offsets[3] a height of 0.1,
and so on.
For attributes which we want to be dynamic and change with the navigation
coordinates, we can pass those values as an array with dtype=object.
Markers operate in a similar way to signals when the data is retrieved: the
current index of the axes manager is used to retrieve the current array of
markers at that index. Additionally, lazy markers are treated similarly, in
that the current chunk for a marker is cached.
If we want to plot a series of points, we can use the following code, in this case
both the sizes and offsets kwargs are dynamic and change with each index.
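A minimal sketch, assuming a signal s with a non-trivial navigation space:

>>> offsets = np.empty(s.axes_manager.navigation_shape, dtype=object)
>>> sizes = np.empty(s.axes_manager.navigation_shape, dtype=object)
>>> for ind in np.ndindex(offsets.shape):
...     n = np.random.randint(3, 10)           # a different number of points per index
...     offsets[ind] = np.random.random((n, 2)) * 100
...     sizes[ind] = np.random.random((n,)) * 5
>>> m = hs.plot.markers.Points(offsets=offsets, sizes=sizes)
>>> s.plot()
>>> s.add_marker(m)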
The Markers also has a class method from_signal() which can
be used to create a set of markers from the output of some map function. In this case signal.data is mapped
to some key and used to initialize a Markers object. If the signal has the attribute
signal.metadata.Peaks.signal_axes and convert_units = True then the values will be converted to the proper units
before creating the Markers object.
Note
For kwargs like size, height, etc. the scale and the units of the x axis are used to plot.
Let’s consider how plotting a bunch of different collections might look:
HyperSpy provides easy access to several “machine learning” algorithms that
can be useful when analysing multi-dimensional data. In particular,
decomposition algorithms, such as principal component analysis (PCA), or
blind source separation (BSS) algorithms, such as independent component
analysis (ICA), are available through the methods described in this section.
Hint
HyperSpy will decompose a dataset, \(X\), into two new datasets:
one with the dimension of the signal space known as factors (\(A\)),
and the other with the dimension of the navigation space known as loadings
(\(B\)), such that \(X = A B^T\).
For some of the algorithms listed below, the decomposition results in
an approximation of the dataset, i.e. \(X \approx A B^T\).
Decomposition techniques are most commonly used as a means of noise
reduction (or denoising) and dimensionality reduction. To apply a
decomposition to your dataset, run the decomposition()
method, for example:
>>> s = hs.signals.Signal1D(np.random.randn(10, 10, 200))
>>> s.decomposition()
Decomposition info:
  normalize_poissonian_noise=False
  algorithm=SVD
  output_dimension=None
  centre=None

>>> # Load data from a file, then decompose
>>> s = hs.load("my_file.hspy")
>>> s.decomposition()
Note
The signal s must be multi-dimensional, i.e. s.axes_manager.navigation_size > 1
One of the most popular uses of decomposition()
is data denoising. This is achieved by using a limited set of components
to make a model of the original dataset, omitting the less significant components that
ideally contain only noise.
To reconstruct your denoised or reduced model, run the
get_decomposition_model() method. For example:
>>> # Use all components to reconstruct the model
>>> sc = s.get_decomposition_model()

>>> # Use first 3 components to reconstruct the model
>>> sc = s.get_decomposition_model(3)

>>> # Use components [0, 2] to reconstruct the model
>>> sc = s.get_decomposition_model([0, 2])
Sometimes, it is useful to examine the residuals between your original data and
the decomposition model. You can easily calculate and display the residuals,
since get_decomposition_model() returns a new
object, which in the example above we have called sc:
>>> (s-sc).plot()
You can perform operations on this new object sc later.
It is a copy of the original s object, except that the data has
been replaced by the model constructed using the chosen components.
If you provide the output_dimension argument, which takes an integer value,
the decomposition algorithm attempts to find the best approximation for the
dataset \(X\) with only a limited set of factors \(A\) and loadings \(B\),
such that \(X \approx A B^T\).
>>> s.decomposition(output_dimension=3)
Some of the algorithms described below require output_dimension to be provided.
HyperSpy implements a number of decomposition algorithms via the algorithm argument.
The table below lists the algorithms that are currently available, and includes
links to the appropriate documentation for more information on each one.
Note
Choosing which algorithm to use is likely to depend heavily on the nature of your
dataset and the type of analysis you are trying to perform. We discuss some of the
reasons for choosing one algorithm over another below, but would encourage you to
do your own research as well. The scikit-learn documentation is a
very good starting point.
The default algorithm in HyperSpy is "SVD", which uses an approach called
“singular value decomposition” to decompose the data in the form
\(X = U \Sigma V^T\). The factors are given by \(U \Sigma\), and the
loadings are given by \(V^T\). For more information, please read the method
documentation for svd_pca().
In some fields, including electron microscopy, this approach of applying an SVD
directly to the data \(X\) is often called PCA (see below).
However, in the classical definition of PCA, the SVD should be applied to data that has
first been “centered” by subtracting the mean, i.e. \(\mathrm{SVD}(X - \bar X)\).
The "SVD" algorithm in HyperSpy does not apply this
centering step by default. As a result, you may observe differences between
the output of the "SVD" algorithm and, for example,
sklearn.decomposition.PCA, which does apply centering.
One of the most popular decomposition methods is principal component analysis (PCA).
To perform PCA on your dataset, run the decomposition()
method with any of the following arguments.
You can also turn on centering with the default "SVD" algorithm via
the "centre" argument:
# Subtract the mean along the navigation axis
>>> s.decomposition(algorithm="SVD", centre="navigation")
Decomposition info:
  normalize_poissonian_noise=False
  algorithm=SVD
  output_dimension=None
  centre=navigation

# Subtract the mean along the signal axis
>>> s.decomposition(algorithm="SVD", centre="signal")
Decomposition info:
  normalize_poissonian_noise=False
  algorithm=SVD
  output_dimension=None
  centre=signal
Most of the standard decomposition algorithms assume that the noise of the data
follows a Gaussian distribution (also known as “homoskedastic noise”).
In cases where your data is instead corrupted by Poisson noise, HyperSpy
can “normalize” the data by performing a scaling operation, which can greatly
enhance the result. More details about the normalization procedure can be
found in [Keenan2004].
To apply Poissonian noise normalization to your data:
>>> s.decomposition(normalize_poissonian_noise=True)
>>> # Because it is the first argument we could have simply written:
>>> s.decomposition(True)
Warning
Poisson noise normalization cannot be used in combination with data
centering using the 'centre' argument. Attempting to do so will
raise an error.
Maximum likelihood principal component analysis (MLPCA)
Instead of applying Poisson noise normalization to your data, you can instead
use an approach known as Maximum Likelihood PCA (MLPCA), which provides a more
robust statistical treatment of non-Gaussian “heteroskedastic noise”.
>>> s.decomposition(algorithm="MLPCA")
For more information, please read the method documentation for mlpca().
Note
You must set the output_dimension when using MLPCA.
PCA is known to be very sensitive to the presence of outliers in data. These
outliers can be the result of missing or dead pixels, X-ray spikes, or very
low count data. If one assumes a dataset, \(X\), to consist of a low-rank
component \(L\) corrupted by a sparse error component \(S\), such that
\(X=L+S\), then Robust PCA (RPCA) can be used to recover the low-rank
component for subsequent processing [Candes2011].
Schematic diagram of the robust PCA problem, which combines a low-rank matrix
with sparse errors. Robust PCA aims to decompose the matrix back into these two
components.
Note
You must set the output_dimension when using Robust PCA.
The default RPCA algorithm is GoDec [Zhou2011]. In HyperSpy
it returns the factors and loadings of \(L\). RPCA solvers work by using
regularization, in a similar manner to lasso or ridge regression, to enforce
the low-rank constraint on the data. The low-rank regularization parameter,
lambda1, defaults to 1/sqrt(n_features), but it is strongly recommended
that you explore the behaviour of different values.
HyperSpy also implements an online algorithm for RPCA developed by Feng et
al. [Feng2013]. This minimizes memory usage, making it
suitable for large datasets, and can often be faster than the default
algorithm.
The online RPCA implementation sets several default parameters that are
usually suitable for most datasets, including the regularization parameter
highlighted above. Again, it is strongly recommended that you explore the
behaviour of these parameters. To further improve the convergence, you can
“train” the algorithm with the first few samples of your dataset. For example,
the following code will train ORPCA using the first 32 samples of the data.
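A minimal sketch (the algorithm name "ORPCA" and the training_samples keyword are assumptions; check the decomposition() documentation):

>>> s.decomposition(algorithm="ORPCA", output_dimension=3, training_samples=32)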
Finally, online RPCA includes two alternative methods to the default
block-coordinate descent solver, which can again improve both the convergence
and speed of the algorithm. These are particularly useful for very large datasets.
The methods are based on stochastic gradient descent (SGD), and take an
additional parameter to set the learning rate. The learning rate dictates
the size of the steps taken by the gradient descent algorithm, and setting
it too large can lead to oscillations that prevent the algorithm from
finding the correct minima. Usually a value between 1 and 2 works well:
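A minimal sketch (the method and learning_rate keyword names are assumptions):

>>> s.decomposition(algorithm="ORPCA", output_dimension=3,
...                 method="SGD", learning_rate=1.1)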
You can also use Momentum Stochastic Gradient Descent (MomentumSGD),
which typically improves the convergence properties of stochastic gradient
descent. This takes the further parameter subspace_momentum, which should
be a fraction between 0 and 1.
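A minimal sketch (keyword names as above, plus the subspace_momentum parameter described here):

>>> s.decomposition(algorithm="ORPCA", output_dimension=3,
...                 method="MomentumSGD", learning_rate=1.1,
...                 subspace_momentum=0.5)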
Using the "SGD" or "MomentumSGD" methods enables the subspace,
i.e. the underlying low-rank component, to be tracked as it changes
with each sample update. The default method instead assumes a fixed,
static subspace.
Another popular decomposition method is non-negative matrix factorization
(NMF), which can be accessed in HyperSpy with:
>>> s.decomposition(algorithm="NMF")
Unlike PCA, NMF forces the components to be strictly non-negative, which can
aid the physical interpretation of components for count data such as images,
EELS or EDS. For an example of NMF in EELS processing, see
[Nicoletti2013].
NMF takes the optional argument output_dimension, which determines the number
of components to keep. Setting this to a small number is recommended to keep
the computation time small. Often it is useful to run a PCA decomposition first
and use the scree plot to determine a suitable value
for output_dimension.
In a similar manner to the online, robust methods that complement PCA
above, HyperSpy includes an online robust NMF method.
This is based on the OPGD (Online Proximal Gradient Descent) algorithm
of [Zhao2016].
Note
You must set the output_dimension when using Robust NMF.
As before, you can control the regularization applied via the parameter “lambda1”:
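A minimal sketch ("ORNMF" is assumed here to be the algorithm name of the online robust NMF implementation):

>>> s.decomposition(algorithm="ORNMF", output_dimension=3, lambda1=0.1)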
Both the default and MomentumSGD solvers assume an l2-norm minimization problem,
which can still be sensitive to very heavily corrupted data. A more robust
alternative is available, although it is typically much slower.
In some cases it is possible to obtain a more physically interpretable set of
components using a process called Blind Source Separation (BSS). This largely
depends on the particular application. For more information about blind source
separation please see [Hyvarinen2000], and for an
example application to EELS analysis, see [Pena2010].
Warning
The BSS algorithms operate on the result of a previous
decomposition analysis. It is therefore necessary to perform a
decomposition first before calling
blind_source_separation(), otherwise it
will raise an error.
You must provide an integer number_of_components argument,
or a list of components as the comp_list argument. This performs
BSS on the chosen number/list of components from the previous
decomposition.
To perform blind source separation on the result of a previous decomposition,
run the blind_source_separation() method, for example:
>>> s = hs.signals.Signal1D(np.random.randn(10, 10, 200))
>>> s.decomposition(output_dimension=3)
Decomposition info:
  normalize_poissonian_noise=False
  algorithm=SVD
  output_dimension=3
  centre=None
>>> s.blind_source_separation(number_of_components=3)
Blind source separation info:
  number_of_components=3
  algorithm=sklearn_fastica
  diff_order=1
  reverse_component_criterion=factors
  whiten_method=PCA
scikit-learn estimator:
FastICA(tol=1e-10, whiten=False)

# Perform only on the first and third components
>>> s.blind_source_separation(comp_list=[0, 2])
Blind source separation info:
  number_of_components=2
  algorithm=sklearn_fastica
  diff_order=1
  reverse_component_criterion=factors
  whiten_method=PCA
scikit-learn estimator:
FastICA(tol=1e-10, whiten=False)
HyperSpy implements a number of BSS algorithms via the algorithm argument.
The table below lists the algorithms that are currently available, and includes
links to the appropriate documentation for more information on each one.
Available blind source separation algorithms in HyperSpy (the algorithm
argument also accepts a custom object implementing fit() and transform()
methods).
Note
Except orthomax(), all of the implemented BSS algorithms listed above
rely on external packages being available on your system. sklearn_fastica requires
scikit-learn, while FastICA, JADE, CuBICA and TDSEP
require the Modular toolkit for Data Processing (MDP).
Orthomax rotations are a statistical technique used to clarify and highlight the relationship among factors,
by adjusting the coordinates of PCA results. The most common approach is known as
"varimax", which is intended to maximize the variance shared
among the components while preserving orthogonality. The results of an orthomax rotation following PCA are
often "simpler" to interpret than just PCA, since each component has a more discrete contribution to the data.
One of the most common approaches for blind source separation is
Independent Component Analysis (ICA).
This separates a signal into subcomponents by assuming that the subcomponents are (a) non-Gaussian,
and (b) that they are statistically independent from each other.
Cluster analysis or clustering
is the task of grouping a set of measurements such that measurements in the same
group (called a cluster) are more similar (in some sense) to each other than to
those in other groups (clusters).
A HyperSpy signal can represent a number of large arrays of different measurements
which can represent spectra, images or sets of parameters.
Identifying and extracting trends from large datasets is often difficult and
decomposition methods, blind source separation and cluster analysis play an important role in this process.
Cluster analysis, in essence, compares the “distances” (or similar metric)
between different sets of measurements and groups those that are closest together.
The features it groups can be raw data points, for example, comparing for
every navigation dimension all points of a spectrum. However, if the
dataset is large, the process of clustering can be computationally intensive so
clustering is more commonly used on an extracted set of features or parameters.
For example, extraction of two peak positions of interest via a fitting process
rather than clustering all spectra points.
In favourable cases, matrix decomposition and related methods can decompose the
data into a (ideally small) set of significant loadings and factors.
The factors capture a core representation of the features in the data and the loadings
provide the mixing ratios of these factors that best describe the original data.
Overall, this usually represents a much smaller data volume compared to the original data
and can help to identify correlations.
A detailed description of the application of cluster analysis in x-ray
spectro-microscopy and further details on the theory and implementation can
be found in [Lerotic2004].
Taking the example of a 1D Signal of dimensions (20,10|4) containing the
dataset, we say there are 200 samples. The four measured parameters are the
features. If we choose to search for 3 clusters within this dataset, we
derive three main values:
The labels, of dimensions (3|20,10). Each navigation position is
assigned to a cluster. The labels of each cluster are boolean arrays
that mark the data that has been assigned to the cluster with True.
The cluster_distances, of dimensions (3|20,10), which are the
distances of all the data points to the centroid of each cluster.
The "cluster signals", which are signals that are representative of
their clusters. In HyperSpy two are computed:
cluster_sum_signals and cluster_centroid_signals,
of dimensions (3|4), which are the sum of all the cluster signals
that belong to each cluster or the signal closest to each cluster
centroid respectively.
The cluster_analysis() method can perform cluster
analysis using any scikit-learn clustering
algorithms or any other object with a compatible API. This involves importing
the relevant algorithm class from scikit-learn.
For convenience, the default algorithm is the kmeans algorithm and is imported
internally. All extra keyword arguments are passed to the algorithm when
present. Therefore the following code is equivalent to the previous one:
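A minimal sketch of that call (the cluster_source and n_clusters keyword names follow the surrounding text; n_init is an example of an extra keyword passed through to kmeans):

>>> s.cluster_analysis(cluster_source="signal", n_clusters=3, n_init=8)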
cluster_analysis() computes the cluster labels. The
cluster areas with identical labels are averaged to create a set of cluster
centres. This averaging can be performed on the signal itself, the
BSS or decomposition results
or a user supplied signal.
Cluster analysis measures the distances between features and groups them. It
is often necessary to pre-process the features in order to obtain meaningful
results.
For example, pre-processing can be useful to reveal clusters when
performing cluster analysis of decomposition results. Decomposition methods
decompose data into a set of factors and a set of loadings defining the
mixing needed to represent the data. If signal 1 is reduced to three
components with mixing 0.1 0.5 2.0, and signal 2 is reduced to a mixing of 0.2
1.0 4.0, it should be clear that these represent the same signal but with a
scaling difference. Normalization of the data can again be used to remove
scaling effects.
Therefore, the pre-processing step
will highly influence the results and should be evaluated for the problem
under investigation.
All pre-processing methods from (or compatible with) the
scikit-learn pre-processing module can be passed
to the scaling keyword of the cluster_analysis()
method. For convenience, the following methods from scikit-learn are
available as standard: standard, minmax and norm. Briefly, norm treats the
features as a vector and normalizes the
vector length. standard re-scales each feature by removing the mean and
scaling to unit variance. minmax normalizes each feature between the
minimum and maximum range of that feature.
In HyperSpy cluster signals are signals that somehow represent their clusters.
The concept is ill-defined, since cluster algorithms only assign data points to
clusters. HyperSpy computes 2 cluster signals,
cluster_sum_signals, which are the sum of all the cluster signals
that belong to each cluster.
cluster_centroid_signals, which is the signal closest to each cluster
centroid.
When plotting the “cluster signals” we can select any of those
above using the signal keyword argument:
>>> s.plot_cluster_signals(signal="centroid")
In addition, it is possible to plot the mean signal over the different
clusters:
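A minimal sketch of such a call (assuming "mean" is an accepted value of the signal keyword):

>>> s.plot_cluster_signals(signal="mean")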
User developed preprocessing or cluster algorithms can be
used in place of the sklearn methods.
A preprocessing object needs a fit_transform method which
appropriately scales the data.
The example below defines a preprocessing class which normalizes
the data and then applies a square root to enhance weaker features.
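The sketch below is one possible implementation of such a preprocessing object (the class name and the exact scaling are illustrative assumptions; only the fit_transform method is required):

import numpy as np

class NormSqrtPreprocessing:
    """Normalise each sample to unit length, then take the square root
    to enhance weaker features (illustrative example)."""

    def fit_transform(self, X):
        # X has shape (n_samples, n_features)
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        norms[norms == 0] = 1  # avoid division by zero
        return np.sqrt(X / norms)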
For user-defined clustering algorithms, the class must implement
fit and have a labels_ attribute that contains the clustering labels.
An example template would be:
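A minimal sketch of such a template (the class and its thresholding rule are purely illustrative; it follows the scikit-learn convention of exposing the assigned labels through a labels_ attribute after calling fit):

import numpy as np

class MedianSplitClustering:
    """Toy clustering algorithm: assigns each sample to one of two clusters
    depending on whether its mean is above or below the global median."""

    def __init__(self, n_clusters=2):
        self.n_clusters = n_clusters
        self.labels_ = None

    def fit(self, X):
        # X has shape (n_samples, n_features)
        sample_means = X.mean(axis=1)
        self.labels_ = (sample_means > np.median(sample_means)).astype(int)
        return self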
To see how cluster analysis works it’s best to first examine the signal.
Moving around the image you should be able to see 3 distinct regions in which
the 1D signal modulates slightly.
>>> s.plot()
Let’s perform SVD to reduce the dimensionality of the dataset by exploiting
redundancies:
In the SVD loading we can identify 3 regions, but they are mixed in the components.
Let’s perform cluster analysis of decomposition results, to find similar regions
and the representative features in those regions. Notice that this dataset does
not require any pre-processing for cluster analysis.
In this case we know there are 3 clusters, but for real examples the number of
clusters is not known a priori. A number of metrics, such as elbow,
Silhouette and Gap can be used to estimate the optimal number of clusters.
The elbow method measures the sum-of-squares of the distances within a
cluster and, as for the PCA decomposition, an “elbow” or point where the gains
diminish with increasing number of clusters indicates the ideal number of
clusters. Silhouette analysis measures how well separated clusters are and
can be used to determine the most likely number of clusters. As the scoring
is a measure of separation of clusters a number of solutions may occur and
maxima in the scores are used to indicate possible solutions. Gap analysis
is similar but compares the “gap” between the clustered data results and
those from a randomly distributed data set of the same size. The largest gap indicates
the best clustering. The metric results can be plotted to check how
well-defined the clustering is.
>>> s.estimate_number_of_clusters(cluster_source="decomposition", metric="gap")
3
>>> s.plot_cluster_metric()
<Axes: xlabel='number of clusters', ylabel='gap_metric'>
The optimal number of clusters can be set or accessed from the learning
results
In this example we will perform clustering analysis on the position of two
peaks. The signals containing the position of the peaks can be computed for
example using curve fitting. Given an existing fitted
model, the parameters can be extracted as signals and stacked. Clustering can
then be applied as described previously to identify trends in the fitted
results.
Let’s start by creating a suitable synthetic dataset.
Let’s now perform cluster analysis on the stack and calculate the centres using
the spectrum image. Notice that we don’t need to fit the model to the data
because this is a synthetic dataset. When analysing experimental data you will
need to fit the model first. Also notice that here we need to pre-process the
dataset by normalization in order to reveal the clusters due to the
proportionality relationship between the position of the peaks.
Notice that in this case averaging or summing the signals of
each cluster is not appropriate, since the clustering criterion
is the ratio between the peaks positions. A better alternative
is to plot the signals closest to the centroids:
HyperSpy includes a number of plotting methods for visualizing the results
of decomposition and blind source separation analyses. All the methods
begin with plot_.
Scree plots are only available for the "SVD" and "PCA" algorithms.
PCA will sort the components in the dataset in order of decreasing
variance. It is often useful to estimate the dimensionality of the data by
plotting the explained variance against the component index. This plot is
sometimes called a scree plot. For most datasets, the values in a scree plot
will decay rapidly, eventually becoming a slowly descending line.
The point at which the scree plot becomes linear (often referred to as
the “elbow”) is generally judged to be a good estimation of the dimensionality
of the data (or equivalently, the number of components that should be retained
- see below). Components to the left of the elbow are considered part of the “signal”,
while components to the right are considered to be “noise”, and thus do not explain
any significant features of the data.
By specifying a threshold value, a cutoff line will be drawn at the total variance
specified, and the components above this value will be styled distinctly from the
remaining components to show which are considered signal, as opposed to noise.
Alternatively, by providing an integer value for threshold, the line will
be drawn at the specified component (see below).
Note that in the above scree plot, the first component has index 0. This is because
Python uses zero-based indexing. To switch to a “number-based” (rather than
“index-based”) notation, specify the xaxis_type parameter:
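A minimal sketch of such a call (the number of components n=20 and the threshold value are illustrative assumptions):

>>> s.plot_explained_variance_ratio(n=20, threshold=4, xaxis_type="number")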
PCA scree plot with number-based axis labeling and a threshold value
specified
The number of significant components can be estimated and a vertical line
drawn to represent this by specifying vline=True. In this case, the “elbow”
is found in the variance plot by estimating the distance from each point in the
variance plot to a line joining the first and last points of the plot, and then
selecting the point where this distance is largest.
If multiple maxima are found, the index corresponding to the first occurrence
is returned. As the index of the first component is zero, the number of
significant PCA components is the elbow index position + 1. More details
about the elbow-finding technique can be found in
[Satopää2011], and in the documentation for
estimate_elbow_position().
PCA scree plot with number-based axis labeling and an estimate of the number of significant
components based on the “elbow” position
These options (together with many others), can be customized to
develop a figure of your liking. See the documentation of
plot_explained_variance_ratio() for more details.
Sometimes it can be useful to get the explained variance ratio as a spectrum.
For example, to plot several scree plots obtained with
different data pre-treatments in the same figure, you can combine
plot_spectra() with
get_explained_variance_ratio().
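A minimal sketch of combining the two (s and s_denoised are hypothetical signals that have already been decomposed with two different pre-treatments):

>>> vr_raw = s.get_explained_variance_ratio()
>>> vr_denoised = s_denoised.get_explained_variance_ratio()
>>> hs.plot.plot_spectra(
...     [vr_raw.isig[:20], vr_denoised.isig[:20]],
...     legend=["raw", "denoised"],
... )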
HyperSpy provides a number of methods for visualizing the factors and loadings
found by a decomposition analysis. To plot everything in a compact form,
use plot_decomposition_results().
You can also plot the factors and loadings separately using the following
methods. It is recommended that you provide the number of factors or loadings
you wish to visualise, since the default is to plot all of them.
The decomposition and BSS results are internally stored as numpy arrays in the
BaseSignal class. Frequently it is useful to obtain the
decomposition/BSS factors and loadings as HyperSpy signals, and HyperSpy
provides the following methods for that purpose:
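For example, the factors and loadings can be retrieved as signals as sketched below (assuming decomposition and BSS have already been performed on s):

>>> factors = s.get_decomposition_factors()
>>> loadings = s.get_decomposition_loadings()
>>> bss_factors = s.get_bss_factors()
>>> bss_loadings = s.get_bss_loadings()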
If you save the dataset on which you’ve performed machine learning analysis in
the HSpy-HDF5 format (the default in HyperSpy, see
Saving), the result of the analysis is also saved in the same
file automatically, and it is loaded along with the rest of the data when you
next open the file.
Note
This approach currently supports storing one decomposition and one BSS
result, which may not be enough for your purposes.
Alternatively, you can save the results of the current machine learning
analysis to a separate file with the
save() method:
>>> # Save the result of the analysis
>>> s.learning_results.save('my_results.npz')
>>> # Load back the results
>>> s.learning_results.load('my_results.npz')
These methods accept many arguments to customise the way in which the
data is exported, so please consult the method documentation. The options
include the choice of file format, the prefixes for loadings and factors,
saving figures instead of data and more.
Warning
Data exported in this way cannot be easily loaded into HyperSpy’s
machine learning structure.
HyperSpy can perform curve fitting of one-dimensional signals (spectra) and
two-dimensional signals (images) in n-dimensional data sets.
Models are defined by adding individual functions (components in HyperSpy’s
terminology) to a BaseModel instance. Those individual
components are then summed to create the final model function that can be
fitted to the data, by adjusting the free parameters of the individual
components.
Models can be created and fit to experimental data in both one and two
dimensions i.e. spectra and images respectively. Most of the syntax is
identical in either case. A one-dimensional model is created when a model
is created for a Signal1D whereas a two-
dimensional model is created for a Signal2D.
Note
Plotting and analytical gradient-based fitting methods are not yet
implemented for the Model2D class.
>>> im = hs.signals.Signal2D(np.arange(300).reshape(3, 10, 10))
>>> mod = im.create_model()  # Create the 2D-Model and assign it to mod
The syntax for creating both one-dimensional and two-dimensional models is thus
identical for the user in practice. When a model is created you may be
prompted to provide important information not already included in the
datafile, e.g. if s is EELS data, you may be asked for the accelerating
voltage, convergence and collection semi-angles etc.
Note
Before creating a model verify that the
is_binned attribute
of the signal axis is set to the correct value because the resulting
model depends on this parameter. See Binned and unbinned signals for more details.
When importing data that has been binned using other software, in
particular Gatan’s DM, the stored values may be the averages of the
binned channels or pixels, instead of their sum, as would be required
for proper statistical analysis. We therefore cannot guarantee that
the statistics will be valid, and so strongly recommend that all
pre-fitting binning is performed using HyperSpy.
In HyperSpy a model consists of a sum of individual components. For convenience,
HyperSpy provides a number of pre-defined model components as well as mechanisms
to create your own components.
However, this doesn’t mean that you have to limit yourself to this meagre
list of functions. As discussed below, it is very easy to turn a
mathematical, fixed-pattern or Python function into a component.
The easiest way to turn a mathematical expression into a component is using the
Expression component. For example, the
following is all you need to create a
Gaussian component with more sensible
parameters for spectroscopy than the one that ships with HyperSpy:
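A minimal sketch of such a component, parametrised by height and FWHM rather than amplitude and sigma (the parameter names and default values are illustrative assumptions):

>>> gaussian = hs.model.components1D.Expression(
...     expression="height * exp(-(x - x0) ** 2 * 4 * log(2) / fwhm ** 2)",
...     name="Gaussian in terms of height and FWHM",
...     position="x0",
...     height=1,
...     fwhm=1,
...     x0=0,
...     module="numpy",
... )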
If the expression is inconvenient to write out in full (e.g. it’s long and/or
complicated), multiple substitutions can be given, separated by semicolons.
Both symbolic and numerical substitutions are allowed:
>>> expression = "h / sqrt(p2) ; p2 = 2 * m0 * e1 * x * brackets ;"
>>> expression += "brackets = 1 + (e1 * x) / (2 * m0 * c * c) ;"
>>> expression += "m0 = 9.1e-31 ; c = 3e8; e1 = 1.6e-19 ; h = 6.6e-34"
>>> wavelength = hs.model.components1D.Expression(
...     expression=expression,
...     name="Electron wavelength with voltage")
Expression uses Sympy internally to turn the string into
a function. By default it “translates” the expression using
numpy, but often it is possible to boost performance by using
numexpr instead.
It can also create 2D components with optional rotation. In the following
example we create a 2D Gaussian that rotates around its center:
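A minimal sketch of such a component (the expression and parameter names are illustrative; add_rotation adds a rotation angle parameter around the position given by position):

>>> g2d = hs.model.components2D.Expression(
...     expression="k * exp(-((x - x0) ** 2 / (2 * sx ** 2)"
...                " + (y - y0) ** 2 / (2 * sy ** 2)))",
...     name="Rotated Gaussian2D",
...     add_rotation=True,
...     position=("x0", "y0"),
...     module="numpy",
... )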
Of course Expression is only useful for
analytical functions. You can define more general components modifying the
following template to suit your needs:
from hyperspy.component import Component

class MyComponent(Component):

    """
    """

    def __init__(self, parameter_1=1, parameter_2=2):
        # Define the parameters
        Component.__init__(self, ('parameter_1', 'parameter_2'))

        # Optionally we can set the initial values
        self.parameter_1.value = parameter_1
        self.parameter_2.value = parameter_2

        # The units (optional)
        self.parameter_1.units = 'Tesla'
        self.parameter_2.units = 'Kociak'

        # Once defined we can give default values to the attribute
        # For example we fix attribute_1 (optional)
        self.parameter_1.attribute_1.free = False

        # And we set the boundaries (optional)
        self.parameter_1.bmin = 0.
        self.parameter_1.bmax = None

        # Optionally, to boost the optimization speed we can also define
        # the gradients of the function with the syntax:
        # self.parameter.grad = function
        self.parameter_1.grad = self.grad_parameter_1
        self.parameter_2.grad = self.grad_parameter_2

    # Define the function as a function of the already defined parameters,
    # x being the independent variable value
    def function(self, x):
        p1 = self.parameter_1.value
        p2 = self.parameter_2.value
        return p1 + x * p2

    # Optionally define the gradients of each parameter
    def grad_parameter_1(self, x):
        """
        Returns d(function)/d(parameter_1)
        """
        return 0

    def grad_parameter_2(self, x):
        """
        Returns d(function)/d(parameter_2)
        """
        return x
The ScalableFixedPattern component enables fitting a pattern (in the form of
a Signal1D instance) to data by shifting it (shift) and scaling it in the
x and y directions using the xscale and yscale parameters respectively.
To print the current components in a model use
components. A table with component number,
attribute name, component name and component type will be printed:
Sometimes components may be created automatically. For example, if
the Signal1D is recognised as EELS data, a
power-law background component may automatically be added to the model.
Therefore, the table printed above may not be empty on model creation.
To add a component to the model, first we have to create an instance of the
component.
Once the instance has been created we can add the component to the model
using the append() and
extend() methods for one or more components
respectively.
As an example, let’s add several Gaussian
components to the model:
>>> gaussian = hs.model.components1D.Gaussian()  # Create a Gaussian comp.
>>> m.append(gaussian)  # Add it to the model
>>> m.components  # Print the model components
   # |      Attribute Name |      Component Name |      Component Type
---- | ------------------- | ------------------- | -------------------
   0 |            Gaussian |            Gaussian |            Gaussian
>>> gaussian2 = hs.model.components1D.Gaussian()  # Create another gaussian
>>> gaussian3 = hs.model.components1D.Gaussian()  # Create a third gaussian
We could use the append() method twice to add the
two Gaussians, but when adding multiple components it is handier to use the
extend method that enables adding a list of components at once.
Notice that two components cannot have the same name:
>>> gaussian2.name = 'Carbon'
Traceback (most recent call last):
  File "<ipython-input-5-2b5669fae54a>", line 1, in <module>
    g2.name = "Carbon"
  File "/home/fjd29/Python/hyperspy/hyperspy/component.py", line 466, in name
    "the name " + str(value))
ValueError: Another component already has the name Carbon
It is possible to access the components in the model by their name or by the
index in the model.
In addition, the components can be accessed in the
components Model attribute. This is specially
useful when working in interactive data analysis with IPython because it
enables tab completion.
>>> m.components
   # |      Attribute Name |      Component Name |      Component Type
---- | ------------------- | ------------------- | -------------------
   0 |              Carbon |              Carbon |            Gaussian
   1 |  Long_Hydrogen_name |  Long Hydrogen name |            Gaussian
   2 |            Nitrogen |            Nitrogen |            Gaussian
>>> m.components.Long_Hydrogen_name
<Long Hydrogen name (Gaussian component)>
It is possible to “switch off” a component by setting its
active attribute to False. When a component is
switched off, to all effects it is as if it was not part of the model. To
switch it back on simply set the active attribute back to True.
In multi-dimensional signals it is possible to store the value of the
active attribute at each navigation index.
To enable this feature for a given component set the
active_is_multidimensional attribute to
True.
Often it is useful to consider only part of the model - for example at
a particular location (i.e. a slice in the navigation space) or energy range
(i.e. a slice in the signal space). This can be done using exactly the same
syntax that we use for signal indexing.
red_chisq and dof
are automatically recomputed for the resulting slices.
>>> s = hs.signals.Signal1D(np.arange(100).reshape(10, 10))
>>> m = s.create_model()
>>> m.append(hs.model.components1D.Gaussian())
>>> # select first three navigation pixels and last five signal channels
>>> m1 = m.inav[:3].isig[-5:]
>>> m1.signal
<Signal1D, title: , dimensions: (3|5)>
Getting and setting parameter values and attributes
print_current_values() prints the properties of the
parameters of the components in the current coordinates. In the Jupyter Notebook,
the default view is HTML-formatted, which allows for formatted copying
into other software, such as Excel. One can also filter to show only active
components and only components with free parameters, using the arguments
only_active and only_free, respectively.
The current values of a particular component can be printed using the
print_current_values() method.
The current coordinates can be either set by navigating the
plot(), or specified by pixel indices in
m.axes_manager.indices or as calibrated coordinates in
m.axes_manager.coordinates.
parameters contains a list of the parameters
of a component and free_parameters lists only
the free parameters.
The value of a particular parameter in the current coordinates can be
accessed by component.Parameter.value (e.g. Gaussian.A.value).
To access an array of the value of the parameter across all navigation
pixels, component.Parameter.map['values'] (e.g.
Gaussian.A.map["values"]) can be used. On its own,
component.Parameter.map returns a NumPy array with three elements:
'values', 'std' and 'is_set'. The first two give the value and
standard error for each index. The last element shows whether the value has
been set in a given index, either by a fitting procedure or manually.
If a model contains several components with the same parameters, it is possible
to change them all by using set_parameters_value():
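A minimal sketch of such a call (assuming the components g1 and g2 both have an A parameter; the component_list keyword restricts the change to the listed components):

>>> m.set_parameters_value('A', 20)
>>> m.set_parameters_value('A', 20, component_list=[g1, g2])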
To set the free state of a parameter change the
free attribute. To change the free state
of all parameters in a component to True use
set_parameters_free(), and
set_parameters_not_free() for setting them to
False. Specific parameter-names can also be specified by using
parameter_name_list, shown in the example:
>>> g = hs.model.components1D.Gaussian()
>>> g.free_parameters
(<Parameter A of Gaussian component>, <Parameter centre of Gaussian component>, <Parameter sigma of Gaussian component>)
>>> g.set_parameters_not_free()
>>> g.set_parameters_free(parameter_name_list=['A', 'centre'])
>>> g.free_parameters
(<Parameter A of Gaussian component>, <Parameter centre of Gaussian component>)
Similar functions exist for BaseModel:
set_parameters_free() and
set_parameters_not_free(), which set the
free state of the parameters of the components in a model. Specific
components and parameter-names can also be specified. For example:
>>> g1 = hs.model.components1D.Gaussian()
>>> g2 = hs.model.components1D.Gaussian()
>>> m.extend([g1, g2])
>>> m.set_parameters_not_free()
>>> g1.free_parameters
()
>>> g2.free_parameters
()
>>> m.set_parameters_free(parameter_name_list=['A'])
>>> g1.free_parameters
(<Parameter A of Gaussian_1 component>,)
>>> g2.free_parameters
(<Parameter A of Gaussian_2 component>,)
>>> m.set_parameters_free([g1], parameter_name_list=['sigma'])
>>> g1.free_parameters
(<Parameter A of Gaussian_1 component>, <Parameter sigma of Gaussian_1 component>)
>>> g2.free_parameters
(<Parameter A of Gaussian_2 component>,)
By default the coupling function is the identity function. However it is
possible to set a different coupling function by setting the
twin_function_expr and
twin_inverse_function_expr attributes. For
example:
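A minimal sketch of a quadratic coupling between two parameters (reusing the gaussian2 and gaussian3 components created earlier; the expressions are illustrative):

>>> gaussian2.A.twin_function_expr = "x**2"
>>> gaussian2.A.twin_inverse_function_expr = "sqrt(abs(x))"
>>> gaussian2.A.twin = gaussian3.A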
The following model methods can be used to ease the task of setting some important
parameter attributes. These can also be used on a per-component basis, by calling them
on individual components.
To fit the model to the data at the current coordinates (e.g. to fit one
spectrum at a particular point in a spectrum-image), use
fit(). HyperSpy implements a number of
different optimization approaches, each of which can have particular
benefits and/or drawbacks depending on your specific application.
A good approach to choosing an optimization approach is to ask yourself
the question “Do you want to…”:
Apply bounds to your model parameter values?
Use gradient-based fitting algorithms to accelerate your fit?
Estimate the standard deviations of the parameter values found by the fit?
Fit your data in the least-squares sense, or use another loss function?
Find the global optima for your parameters, or is a local optima acceptable?
The following table summarizes the features of some of the optimizers
currently available in HyperSpy, including whether they support parameter
bounds, gradients and parameter error estimation. The “Type” column indicates
whether the optimizers find a local or global optima.
The default optimizer in HyperSpy is "lm", which stands for the Levenberg-Marquardt
algorithm. In
earlier versions of HyperSpy (< 1.6) this was known as "leastsq".
HyperSpy supports a number of loss functions. The default is "ls",
i.e. the least-squares loss. For the vast majority of cases, this loss
function is appropriate, and has the additional benefit of supporting
parameter error estimation and goodness-of-fit
testing. However, if your data contains very low counts per pixel, or
is corrupted by outliers, the "ML-poisson" and "huber" loss
functions may be worth investigating.
The following example shows how to perform least squares optimization with
error estimation. First we create data consisting of a line
y=a*x+b with a=1 and b=100, and we then add Gaussian
noise to it:
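A minimal sketch of such a setup (the data values and noise level are illustrative assumptions; a first-order polynomial component is used as the affine model, matching the a0 and a1 parameters used further below):

>>> s = hs.signals.Signal1D(np.arange(100, 300))
>>> s.add_gaussian_noise(std=100)
>>> m = s.create_model()
>>> line = hs.model.components1D.Polynomial(order=1)
>>> m.append(line)
>>> m.fit()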
When the noise is heteroscedastic, the parameter standard deviations can
be estimated accurately only if the
metadata.Signal.Noise_properties.variance attribute of the
Signal1D instance is defined.
If the variance is not defined, the standard deviations are still
computed, by setting variance equal to 1. However, this calculation
will not be correct unless an accurate value of the variance is
provided. See Setting the noise properties for more information.
Because the noise is heteroscedastic, the least squares optimizer estimation is
biased. A more accurate result can be obtained with weighted least squares,
where the weights are proportional to the inverse of the noise variance.
Although this is still biased for Poisson noise, it is a good approximation
in most cases where there are a sufficient number of counts per pixel.
When the attribute metadata.Signal.Noise_properties.variance
is defined, the behaviour is to perform a weighted least-squares
fit using the inverse of the noise variance as the weights.
In this scenario, to then disable weighting, you will need to unset
the attribute. You can achieve this with
set_noise_variance():
>>> m.signal.set_noise_variance(None)  # This will now be an unweighted fit
>>> m.fit()
>>> line.a0.value
-1.9711403542163477
>>> line.a1.value
1.0258716193502546
To avoid biased estimation in the case of data corrupted by Poisson noise
with very few counts, we can use Poisson maximum likelihood estimation (MLE) instead.
This is an unbiased estimator for Poisson noise. To perform MLE, we must
use a general, non-linear optimizer from the table above,
such as Nelder-Mead or L-BFGS-B:
HyperSpy also implements the
Huber loss function,
which is typically less sensitive to outliers in the data compared
to the least-squares loss. Again, we need to use one of the general
non-linear optimization algorithms:
As well as the built-in loss functions described above,
a custom loss function can be passed to the model:
>>> def my_custom_function(model, values, data, weights=None):
...     """
...     Parameters
...     ----------
...     model : Model instance
...         the model that is fitted.
...     values : np.ndarray
...         A one-dimensional array with free parameter values suggested by the
...         optimizer (that are not yet stored in the model).
...     data : np.ndarray
...         A one-dimensional array with current data that is being fitted.
...     weights : {np.ndarray, None}
...         An optional one-dimensional array with parameter weights.
...
...     Returns
...     -------
...     score : float
...         A single float value, representing a score of the fit, with
...         lower values corresponding to better fits.
...     """
...     # Almost any operation can be performed, for example:
...     # First we store the suggested values in the model
...     model.fetch_values_from_array(values)
...
...     # Evaluate the current model
...     cur_value = model(onlyactive=True)
...
...     # Calculate the weighted difference with data
...     if weights is None:
...         weights = 1
...     difference = (data - cur_value) * weights
...
...     # Return squared and summed weighted difference
...     return (difference ** 2).sum()

>>> # We must use a general non-linear optimizer
>>> m.fit(optimizer='Nelder-Mead', loss_function=my_custom_function)
If the optimizer requires an analytical gradient function, it can be similarly
passed, using the following signature:
>>> def my_custom_gradient_function(model, values, data, weights=None):
...     """
...     Parameters
...     ----------
...     model : Model instance
...         the model that is fitted.
...     values : np.ndarray
...         A one-dimensional array with free parameter values suggested by the
...         optimizer (that are not yet stored in the model).
...     data : np.ndarray
...         A one-dimensional array with current data that is being fitted.
...     weights : {np.ndarray, None}
...         An optional one-dimensional array with parameter weights.
...
...     Returns
...     -------
...     gradients : np.ndarray
...         a one-dimensional array of gradients, the size of `values`,
...         containing each parameter gradient with the given values
...     """
...     # As an example, estimate maximum likelihood gradient:
...     model.fetch_values_from_array(values)
...     cur_value = model(onlyactive=True)
...
...     # We use in-built jacobian estimation
...     jac = model._jacobian(values, data)
...
...     return -(jac * (data / cur_value - 1)).sum(1)

>>> # We must use a general non-linear optimizer again
>>> m.fit(optimizer='L-BFGS-B',
...       loss_function=my_custom_function,
...       grad=my_custom_gradient_function)
Added in version 1.6: grad="analytical" and grad="fd" keyword arguments
Optimization algorithms that take into account the gradient of
the loss function will often out-perform so-called “derivative-free”
optimization algorithms in terms of how rapidly they converge to a
solution. HyperSpy can use analytical gradients for model-fitting,
as well as numerical estimates of the gradient based on finite differences.
If all the components in the model support analytical gradients,
you can pass grad="analytical" in order to use this information
when fitting. The results are typically more accurate than an
estimated gradient, and the optimization often runs faster since
fewer function evaluations are required to calculate the gradient.
Following the above examples:
>>> m = s.create_model()
>>> line = hs.model.components1D.Polynomial(order=1)
>>> m.append(line)

>>> # Use a 2-point finite-difference scheme to estimate the gradient
>>> m.fit(grad="fd", fd_scheme="2-point")

>>> # Use the analytical gradient
>>> m.fit(grad="analytical")

>>> # Huber loss and Poisson MLE functions
>>> # also support analytical gradients
>>> m.fit(grad="analytical", loss_function="huber")
>>> m.fit(grad="analytical", loss_function="ML-poisson")
Note
Analytical gradients are not yet implemented for the
Model2D class.
Non-linear optimization can sometimes fail to converge to a good optimum,
especially if poor starting values are provided. Problems of ill-conditioning
and non-convergence can be improved by using bounded optimization.
All components’ parameters have the attributes parameter.bmin and
parameter.bmax (“bounded min” and “bounded max”). When fitting using the
bounded=True argument by m.fit(bounded=True) or m.multifit(bounded=True),
these attributes set the minimum and maximum values allowed for parameter.value.
Currently, not all optimizers support bounds - see the
table above. In the following example, a Gaussian
histogram is fitted using a Gaussian
component using the Levenberg-Marquardt (“lm”) optimizer and bounds
on the centre parameter.
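A minimal sketch of such a bounded fit (the synthetic histogram data and the bound values are illustrative assumptions):

>>> s = hs.signals.Signal1D(
...     np.random.normal(loc=10, scale=0.1, size=10000)
... ).get_histogram()
>>> s.axes_manager[-1].is_binned = True
>>> m = s.create_model()
>>> g = hs.model.components1D.Gaussian()
>>> m.append(g)
>>> g.centre.value = 7    # starting value
>>> g.centre.bmin = 7     # lower bound
>>> g.centre.bmax = 14    # upper bound
>>> m.fit(optimizer="lm", bounded=True)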
Linear fitting can be used to address some of the drawbacks of non-linear optimization:
it doesn’t suffer from the starting parameters issue, which can sometimes be problematic
with nonlinear fitting. Since linear fitting uses linear algebra to find the
solution (find the parameter values of the model), the solution is a unique solution,
while nonlinear optimization uses an iterative approach and therefore relies
on the initial values of the parameters.
it is fast, because i) in favorable situations, the signal can be fitted in a vectorized
fashion, i.e. the signal is fitted in a single run instead of iterating over
the navigation dimension; ii) it is not iterative, i.e. it does the
calculation only one time instead of 10-100 iterations, depending on how
quickly the non-linear optimizer will converge.
However, linear fitting can only fit linear models and will not be able to fit
parameters which vary non-linearly.
A component is considered linear when its free parameters scale the component only
in the y-axis. For the exemplary function y=a*x**b, a is a linear parameter, whilst b
is not. If b.free=False, then the component is linear.
Components can also be made up of several linear parts. For instance,
the 2D-polynomial y=a*x**2+b*y**2+c*x+d*y+e is entirely linear.
Note
After creating a model with values for the nonlinear parameters, a quick way to set
all nonlinear parameters to be free=False is to use m.set_parameters_not_free(only_nonlinear=True)
To check if a parameter is linear, use the model or component method
print_current_values(). For a component to be
considered linear, it can hold only one free parameter, and that parameter
must be linear.
If all components in a model are linear, then a linear optimizer can be used to
solve the problem as a linear regression problem! This can be done using two approaches:
the standard pixel-by-pixel approach as used by the nonlinear optimizers
fit the entire dataset in one vectorised operation, which will be much faster (up to 1000 times).
However, there is a caveat: all fixed parameters must have the same value across the dataset in
order to avoid creating a very large array whose size will scale with the number of different
values of the non-free parameters.
Note
A good example of a linear model in the electron-microscopy field is an Energy-Dispersive
X-ray Spectroscopy (EDS) dataset, which typically consists of a polynomial background and
Gaussian peaks with well-defined energy (Gaussian.centre) and peak widths
(Gaussian.sigma). This dataset can be fit extremely fast with a linear optimizer.
There are two implementations of linear least squares fitting in HyperSpy:
the 'lstsq' optimizer, which also supports lazy signals, and
the 'ridge_regression' optimizer, which supports regularization
(see sklearn.linear_model.Ridge for arguments to pass to
fit()), but does not support lazy signals.
In the following example, we first generate a 300x300 navigation signal of varying total intensity,
and then populate it with an EDS spectrum at each point. The signal can be fitted with a polynomial
background and a Gaussian for each peak. Hyperspy automatically adds these to the model, and fixes
the centre and sigma parameters to known values. Fitting this model with a non-linear optimizer
can take about half an hour on a decent workstation. With a linear optimizer, it takes seconds.
Standard errors for the parameters are by default not calculated when the dataset
is fitted in a vectorized fashion, because it has a large memory requirement.
If errors are required, either pass calculate_errors=True as an argument
to multifit(), or rerun
multifit() with a nonlinear optimizer,
which should run fast since the parameters are already optimized.
None of the linear optimizers currently support bounds.
After fitting the model, details about the optimization
procedure, including whether it finished successfully,
are returned as a scipy.optimize.OptimizeResult object
when the keyword argument return_info=True is used.
These details are often useful for diagnosing problems such
as a poorly-fitted model or a convergence failure.
You can also access the object as the fit_output attribute:
You can also print this information using the
print_info keyword argument:
# Print the info to stdout
>>> m.fit(optimizer="L-BFGS-B", print_info=True)  # doctest: +SKIP
Fit info:
  optimizer=L-BFGS-B
  loss_function=ls
  bounded=False
  grad="fd"
Fit result:
  hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
   message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
      nfev: 168
       nit: 32
      njev: 42
    status: 0
   success: True
         x: array([ 9.97614503e+03, -1.10610734e-01,  1.98380701e+00])
The chi-squared, reduced chi-squared and the degrees of freedom are
computed automatically when fitting a (weighted) least-squares model
(i.e. only when loss_function="ls"). They are stored as signals, in the
chisq, red_chisq and
dof attributes of the model respectively.
Warning
Unless metadata.Signal.Noise_properties.variance contains
an accurate estimation of the variance of the data, the chi-squared and
reduced chi-squared will not be computed correctly. This is true for both
homoscedastic and heteroscedastic noise.
By default only the full model line is displayed in the plot. In addition, it
is possible to display the individual components by calling
enable_plot_components() or directly using
plot():
>>> m.plot(plot_components=True)  # Visualise the results
All extra keyword arguments are passed to the plot()
method of the corresponding signal object. The following example plots the
model signal figure but not its navigator:
>>> m.plot(navigator=False)
By default the model plot is automatically updated when any parameter value
changes. It is possible to suspend this feature with
suspend_update().
Non-linear optimization often requires setting sensible starting parameters.
This can be done by plotting the model and adjusting the parameters by hand.
Changed in version 1.3: All notebook_interaction methods renamed to gui().
The notebook_interaction methods were removed in 2.0.
If running in a Jupyter Notebook, interactive widgets can be used to
conveniently adjust the parameter values by running
gui() for BaseModel,
Component and
Parameter.
Interactive widgets for the full model in a Jupyter notebook. Drag the
sliders to adjust current parameter values. Typing different minimum and
maximum values changes the boundaries of the slider.
The example below shows how a boolean array can be easily created from the
signal and how the isig syntax can be used to define the signal range.
>>> # Create a sample 2D gaussian dataset
>>> g = hs.model.components2D.Gaussian2D(
...     A=1, centre_x=-5.0, centre_y=-5.0, sigma_x=1.0, sigma_y=2.0,)
>>> scale = 0.1
>>> x = np.arange(-10, 10, scale)
>>> y = np.arange(-10, 10, scale)
>>> X, Y = np.meshgrid(x, y)
>>> im = hs.signals.Signal2D(g.function(X, Y))
>>> im.axes_manager[0].scale = scale
>>> im.axes_manager[0].offset = -10
>>> im.axes_manager[1].scale = scale
>>> im.axes_manager[1].offset = -10
>>> m = im.create_model()  # Model initialisation
>>> gt = hs.model.components2D.Gaussian2D()
>>> m.append(gt)
>>> m.set_signal_range(-7, -3, -9, -1)  # Set signal range
>>> m.fit()

Alternatively, create a boolean signal of the same shape
as the signal space of im

>>> signal_mask = im > 0.01
>>> m.set_signal_range_from_mask(signal_mask.data)  # Set signal range
>>> m.fit()
To fit the model to all the elements of a multidimensional dataset, use
multifit():
>>> m.multifit()  # warning: this can be a lengthy process on large datasets
multifit() fits the model at the first position,
stores the result of the fit internally and moves to the next position until
reaching the end of the dataset.
Note
Sometimes this method can fail, especially in the case of a TEM spectrum
image of a particle surrounded by vacuum (since in that case the
top-left pixel will typically be an empty signal).
To get sensible starting parameters, you can do a single
fit() after changing the active position
within the spectrum image (either using the plotting GUI or by directly
modifying s.axes_manager.indices as in Setting axis properties).
After doing this, you can initialize the model at every pixel to the
values from the single pixel fit using m.assign_current_values_to_all(),
and then use multifit() to perform the fit over
the entire spectrum image.
Added in version 1.6: New optional fitting iteration path “serpentine”
Added in version 2.0: New default iteration path for fitting is “serpentine”
In HyperSpy, curve fitting on a multidimensional dataset happens in the following
manner: Pixels are fit along the row from the first index in the first row, and
once the last pixel in the row is reached, one proceeds in reverse order from the
last index in the second row. This procedure leads to a serpentine pattern, as
seen on the image below. The serpentine pattern supports n-dimensional
navigation space, so the first index in the second frame of a three-dimensional
navigation space will be at the last position of the previous frame.
An alternative scan pattern would be the 'flyback' scheme, where the map is
iterated through row by row, always starting from the first index. This pattern
can be explicitly set using the iterpath='flyback' argument of multifit(). However, the 'serpentine' strategy is
usually more robust, as it always moves on to a neighbouring pixel and the fitting
procedure uses the fit result of the previous pixel as the starting point for the
next. A common problem in the 'flyback' pattern is that the fitting fails
going from the end of one row to the beginning of the next, as the spectrum can
change abruptly.
Comparing the scan patterns generated by the 'flyback' and 'serpentine'
iterpath options for a 2D navigation space. The pixel intensity and number
refer to the order in which the signal is fitted.
In addition to 'serpentine' and 'flyback', iterpath can take as
argument any list or array of indices, or a generator of such, as explained in
the Iterating AxesManager section.
Sometimes one may like to store and fetch the value of the parameters at a
given position manually. This is possible using
store_current_values() and
fetch_stored_values().
Multiple models can be stored in the same signal. In particular, when
store() is called, a full “frozen” copy of the model
is stored in the signal’s ModelManager,
which can be accessed in the models attribute (i.e. s.models)
The stored models can be recreated at any time by calling
restore() with the stored
model name as an argument. To remove a model from storage, simply call
remove().
The stored models can be either given a name, or assigned one automatically.
The automatic naming follows an alphabetical scheme, with the sequence being (a,
b, …, z, aa, ab, …, az, ba, …).
Note
If you want to slice a model, you have to perform the operation on the
model itself, not its stored version
Warning
Modifying a signal in-place (e.g. map(),
crop(),
align1D(),
align2D() and similar)
will invalidate all stored models. This is done intentionally.
Current stored models can be listed by calling s.models:
To save a model, a convenience function save() is
provided, which stores the current model into its signal and saves the
signal. As described in Storing models, more than just one
model can be saved with one signal.
For older versions of HyperSpy (before 0.9), the instructions were as follows:
Note that this method is known to be brittle i.e. there is no
guarantee that a version of HyperSpy different from the one used to save
the model will be able to load it successfully. Also, it is
advisable not to use this method in combination with functions that
alter the value of the parameters interactively (e.g.
enable_adjust_position) as the modifications made by these functions
are normally not stored in the IPython notebook or Python script.
SAMFire (Smart Adaptive Multi-dimensional Fitting) is an algorithm created to
reduce the starting value (or local / false minima) problem, which often arises
when fitting multi-dimensional datasets.
The algorithm is described in Tomas Ostasevicius’ PhD thesis, entitled “Multi-dimensional Data Analysis in Electron Microscopy”.
During operation SAMFire uses a list of strategies to determine how to select
the next pixel and estimate its starting parameters. Only one strategy is used
at a time. The next strategy is chosen when no new pixels can be fitted with
the current strategy. Once either the strategy list is exhausted or the full
dataset fitted, the algorithm terminates.
There are two families of strategies. In each family there may be many
strategies, using different statistical or significance measures.
As a rule of thumb, the first strategy in the list should always be from the
local family, followed by a strategy from the global family.
These strategies assume that locally neighbouring pixels are similar. As a
result, the pixel fitting order seems to follow data-suggested order, and the
starting values are computed from the surrounding already fitted pixels.
More information about the exact procedure will be available once the
accompanying paper is published.
Global strategies assume that the navigation coordinates of each pixel bear no
relation to it’s signal (i.e. the location of pixels is meaningless). As a
result, the pixels are selected at random to ensure uniform sampling of the
navigation space.
A number of candidate starting values are computed from global statistical
measures. These values are all attempted in order until a satisfactory result
is found (not necessarily testing all available starting guesses). As a result,
on average each pixel requires significantly more computations when compared to
a local strategy.
More information about the exact procedure will be available once the
accompanying paper is published.
Due to the strategies using already fitted pixels to estimate the starting
values, at least one pixel has to be fitted beforehand by the user.
The seed pixel(s) should be selected to require the most complex model present
in the dataset, however in-built goodness of fit checks ensure that only
sufficiently well fitted values are allowed to propagate.
If the dataset consists of regions (in the navigation space) of highly
dissimilar pixels, often called “domain structures”, at least one seed pixel
should be given for each unique region.
If the starting pixels were not optimal, only part of the dataset will be
fitted. In such cases it is best to allow the algorithm terminate, then provide
new (better) seed pixels by hand, and restart SAMFire. It will use the
new seed together with the already computed parts of the data.
After creating a model and fitting suitable seed pixels, to fit the rest of
the multi-dimensional dataset using SAMFire we must create a SAMFire instance
as follows:
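A minimal sketch of creating such an instance from an existing model m (the keyword arguments shown here, such as workers and ipyparallel, are assumptions and may differ in your HyperSpy version):

>>> samf = m.create_samfire(workers=4, ipyparallel=False)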
By default SAMFire will look for an ipyparallel cluster for the
workers for around 30 seconds. If none is available, it will use
multiprocessing instead. However, if you are not planning to use ipyparallel,
it’s recommended specify it explicitly via the ipyparallel=False argument,
to use the fall-back option of multiprocessing.
By default a new SAMFire object already has two (and currently only) strategies
added to its strategies list:
>>> samf.strategies
  A |    # | Strategy
 -- | ---- | -------------------------
  x |    0 | Reduced chi squared strategy
    |    1 | Histogram global strategy
The currently active strategy is marked by an ‘x’ in the first column.
If a new datapoint (i.e. pixel) is added manually, the “database” of the
currently active strategy has to be refreshed using the
refresh_database() call.
The current strategy “database” can be plotted using the
plot() method.
Whilst SAMFire is running, each pixel is checked by a goodness_test,
which is by default
red_chisq_test,
checking the reduced chi-squared to be in the bounds of [0, 2].
This tolerance can (and most likely should!) be changed appropriately for the
data as follows:
>>> # use a sensible value
>>> samf.metadata.goodness_test.tolerance = 0.3
The SAMFire managed multi-dimensional fit can be started using the
start() method. All keyword arguments are passed to
the underlying (i.e. usual) fit() call:
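A minimal sketch of starting the SAMFire run (the keyword arguments are illustrative and are simply forwarded to fit()):

>>> samf.start(optimizer="lm", bounded=True)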
HyperSpy makes it possible to analyse data larger than the available memory by
providing “lazy” versions of most of its signals and functions. In most cases
the syntax remains the same. This chapter describes how to work with data
larger than memory using the LazySignal class and
its derivatives.
If the data is large and not loaded by HyperSpy (for example a hdf5.Dataset
or similar), first wrap it in dask.array.Array as shown here and then pass it
as normal and call as_lazy():
>>> import h5py
>>> f = h5py.File("myfile.hdf5")
>>> data = f['/data/path']

Wrap the data in dask and chunk as appropriate

>>> import dask.array as da
>>> x = da.from_array(data, chunks=(1000, 100))

Create the lazy signal

>>> s = hs.signals.Signal1D(x).as_lazy()
Loading the dataset in the original unsigned integer format would require
around 35GB of memory. To store it in a floating-point format one would need
almost 280GB of memory. However, with the lazy processing both of these steps
are near-instantaneous and require very little computational resources.
Currently when loading an hdf5 file lazily the file remains open at
least while the signal exists. In order to close it explicitly, use the
close_file() method. Alternatively,
you could close it on calling compute()
by passing the keyword argument close_file=True e.g.:
>>> s = hs.load("file.hspy", lazy=True)
>>> ssum = s.sum(axis=0)

Close the file

>>> ssum.compute(close_file=True)
Occasionally the full dataset consists of many smaller files. To combine them
into a one large LazySignal, we can stack them
lazily (both when loading or afterwards):
>>> siglist = hs.load("*.hdf5")
>>> s = hs.stack(siglist, lazy=True)

Or load lazily and stack afterwards:

>>> siglist = hs.load("*.hdf5", lazy=True)

Make a stack, no need to pass 'lazy', as signals are already lazy

>>> s = hs.stack(siglist)

Or do everything in one go:

>>> s = hs.load("*.hdf5", lazy=True, stack=True)
Although most operations on lazy signals work as described, their operation may not always
be optimal, well-documented and/or consistent with their in-memory counterparts.
Decomposition algorithms for machine learning often perform
large matrix manipulations, requiring significantly more memory than the data size.
To perform decomposition operation lazily, HyperSpy provides access to several “online”
algorithms as well as dask’s lazy SVD algorithm.
Online algorithms perform the decomposition by operating serially on chunks of
data, enabling the lazy decomposition of large datasets. In line with the
standard HyperSpy signals, lazy decomposition()
offers the following online algorithms:
Available lazy decomposition algorithms in HyperSpy
The default signal navigator is the sum of the signal across all signal
dimensions and all but 1 or 2 navigation dimensions. If the dataset is large,
this can take a significant amount of time to perform with every plot.
By default, a navigator is computed using the minimally required approach to obtain
a good signal-to-noise ratio image: the sum is taken over a single chunk of the
signal space, in order to avoid computing the navigator for the whole dataset.
In the following example, the signal space is divided into 25 chunks (5 along
each axis), and therefore computing the navigator will only be performed over
a small subset of the whole dataset by taking the sum on only 1 chunk out
of 25:
In the example above, the calculation of the navigator is fast but the actual
visualisation of the dataset is slow: for each navigation index change,
25 chunks of the dataset need to be fetched from the hard drive. In the
following example, the signal space contains a single chunk (instead of 25, in
the previous example) and calculating the navigator will then be slower (~20x)
because the whole dataset will need to be processed. However, in this case, the
visualisation will be faster, because only a single chunk will be fetched from the
hard drive when changing navigation indices:
This approach depends heavily on the chunking of the data and may not be
always suitable. The compute_navigator()
can be used to calculate the navigator efficiently and store it, so
that it can be used when plotting and is saved for later loading of the dataset.
The compute_navigator() method has optional
arguments to specify the index where the sum needs to be calculated and how to
rechunk the dataset when calculating the navigator. This allows calculating the
navigator efficiently without changing the actual chunking of the
dataset, since the rechunking only takes place during the computation of the navigator:
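A minimal sketch of such a call (the chunks_number keyword shown here is an assumption about the optional arguments mentioned above and may differ in your HyperSpy version):

>>> s.compute_navigator(chunks_number=5)
>>> s.plot()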
Lazy data processing on GPUs requires explicitly transferring the data to the
GPU.
On Linux, it is recommended to use the
dask_cuda library
(not supported on Windows) to manage the dask scheduler. As for CPU lazy
processing, if the dask scheduler is not specified, the default scheduler
will be used.
Most curve-fitting functionality will automatically work on models created from
lazily loaded signals. HyperSpy extracts the relevant chunk from the signal and fits to that.
The linear 'lstsq' optimizer supports fitting the entire dataset in a vectorised manner
using dask.array.linalg.lstsq(). This can give potentially enormous performance benefits over fitting
with a nonlinear optimizer, but comes with the restrictions explained in the linear fitting section.
Data saved in the HDF5 format is typically divided into smaller chunks which can be loaded separately into memory,
allowing lazy loading. Chunk size can dramatically affect the speed of various HyperSpy algorithms, so chunk size is
worth careful consideration when saving a signal. HyperSpy’s default chunking sizes are probably not optimal
for a given data analysis technique. For more comprehensible documentation on chunking,
see the dask array chunks and best practices docs. The chunks saved into HDF5 will
match the dask array chunks in s.data.chunks when lazy loading.
Chunk shape should follow the axes order of the numpy shape (s.data.shape), not the hyperspy shape.
The following example shows how to chunk one of the two navigation dimensions into smaller chunks:
>>> import dask.array as da
>>> data = da.random.random((10, 200, 300))
>>> data.chunksize
(10, 200, 300)
>>> s = hs.signals.Signal1D(data).as_lazy()

Note the reversed order of navigation dimensions

>>> s
<LazySignal1D, title: , dimensions: (200, 10|300)>

Save data with chunking first hyperspy dimension (second array dimension)

>>> s.save('chunked_signal.zspy', chunks=(10, 100, 300))
>>> s2 = hs.load('chunked_signal.zspy', lazy=True)
>>> s2.data.chunksize
(10, 100, 300)
To get the chunk size of given axes, the get_chunk_size()
method can be used:
>>> import dask.array as da
>>> data = da.random.random((10, 200, 300))
>>> data.chunksize
(10, 200, 300)
>>> s = hs.signals.Signal1D(data).as_lazy()
>>> s.get_chunk_size()  # All navigation axes
((10,), (200,))
>>> s.get_chunk_size(0)  # The first navigation axis
((200,),)
Added in version 2.0.0.
Starting in version 2.0.0 HyperSpy does not automatically rechunk datasets as
this can lead to reduced performance. The rechunk or optimize keyword argument
can be set to True to let HyperSpy automatically change the chunking which
could potentially speed up operations.
Added in version 1.7.0.
For more recent versions of dask (dask>2021.11) when using hyperspy in a jupyter
notebook a helpful html representation is available.
>>> import dask.array as da
>>> data = da.zeros((20, 20, 10, 10, 10))
>>> s = hs.signals.Signal2D(data).as_lazy()
>>> s
This helps to visualize the chunk structure and identify axes where the chunk spans the entire
axis (bolded axes).
When using lazy signals the computation of the data is delayed until
requested. However, the changes to the axes properties are performed
when running a given function that modifies them, i.e. they are not
performed lazily. This can lead to hard to debug issues when the result
of a given function that is computed lazily depends on the value of the
axes parameters that may have changed before the computation is requested.
Therefore, in order to avoid such issues, it is recommended to explicitly
compute the result of all functions that are affected by the axes
parameters. This is the reason why e.g. the result of
shift1D() is not lazy.
Dask is a flexible library for parallel computing in Python. All of the lazy operations in
hyperspy run through dask. Dask can be used to run computations on a single machine or
scaled to a cluster. The following example shows how to use dask to run computations on a
variety of different hardware:
The single threaded scheduler in dask is useful for debugging and testing. It is not
recommended for general use.
>>> import dask
>>> import hyperspy.api as hs
>>> import numpy as np
>>> import dask.array as da

Set the scheduler to single-threaded globally

>>> dask.config.set(scheduler='single-threaded')
Alternatively, you can set the scheduler to single-threaded for a single function call by
setting the scheduler keyword argument to 'single-threaded'.
Or for something like plotting you can set the scheduler to single-threaded for the
duration of the plotting call by using the with dask.config.set(...) context manager.
Dask has two schedulers available for single machines.
Threaded Scheduler:
Fastest to set up but only provides parallelism through threads so only non python functions will be parallelized.
This is good if you have largely numpy code and not too many cores.
Processes Scheduler:
Each task (and all of its necessary dependencies) is shipped to a different process. As such it has a larger set-up
time. This performs well for Python-dominated code.
>>> import dask
>>> dask.config.set(scheduler='processes')

Any hyperspy code will now use the multiprocessing scheduler

>>> s.compute()

Change to threaded scheduler, overwrite default

>>> dask.config.set(scheduler='threads')
>>> s.compute()
Distributed computing is not supported for all file formats: it is limited
to a few file formats, see the list of supported file formats in the
RosettaSciIO documentation.
The recommended way to use dask is with the distributed scheduler. This allows you to scale your computations
to a cluster of machines. The distributed scheduler can be used on a single machine as well. dask-distributed
also gives you access to the dask dashboard which allows you to monitor your computations.
Some operations such as the matrix decomposition algorithms in hyperspy don’t currently work with
the distributed scheduler.
>>> from dask.distributed import Client
>>> from dask.distributed import LocalCluster
>>> import dask.array as da
>>> import hyperspy.api as hs

>>> cluster = LocalCluster()
>>> client = Client(cluster)
>>> client

Any calculation will now use the distributed scheduler

>>> s
>>> s.plot()
>>> s.compute()
Running computations on a remote cluster can be done easily using dask_jobqueue.
An important limitation when using LazySignal is the inability to modify
existing data (immutability). This is a logical consequence of the DAG (tree
structure, explained in Behind the scenes – technical details), where a complete history of the
processing has to be stored to traverse later.
In fact, lazy evaluation removes the need for such operation, since only
additional tree branches are added, requiring very little resources. In
practical terms the following fails with lazy signals:
>>> s = hs.signals.BaseSignal([0]).as_lazy()
>>> s += 1
Traceback (most recent call last):
  File "<ipython-input-6-1bd1db4187be>", line 1, in <module>
    s += 1
  File "<string>", line 2, in __iadd__
  File "/home/fjd29/Python/hyperspy3/hyperspy/signal.py", line 1591, in _binary_operator_ruler
    getattr(self.data, op_name)(other)
AttributeError: 'Array' object has no attribute '__iadd__'
However, when operating lazily there is no clear benefit to using in-place
operations. So, the operation above could be rewritten as follows:
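A minimal sketch of the out-of-place equivalent, which simply adds a new branch to the computation graph instead of modifying the data:

>>> s = hs.signals.BaseSignal([0]).as_lazy()
>>> s = s + 1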
Histograms for a LazySignal do not support knuth and blocks
binning algorithms.
CircleROI sets the elements outside the ROI to np.nan instead of
using a masked array, because dask does not support masking. As a
convenience, nansum, nanmean and other nan* signal methods were
added to mimic the workflow as closely as possible.
The most efficient format supported by HyperSpy to write data is the
ZSpy format,
mainly because it supports writing concurrently from multiple threads or processes.
This also allows for smooth interaction with dask-distributed for efficient scaling.
Standard HyperSpy signals load the data into memory for fast access and
processing. While this behaviour gives good performance in terms of speed, it
obviously requires at least as much computer memory as the dataset, and often
twice that to store the results of subsequent computations. This can become a
significant problem when processing very large datasets on consumer-oriented
hardware.
HyperSpy offers a solution for this problem by including
LazySignal and its derivatives. The main idea of
these classes is to perform any operation (as the name suggests)
lazily, i.e. delaying the execution until the result is requested (e.g. saved or
plotted), and in a blocked fashion. This is
achieved by building a “history tree” (formally called a Directed Acyclic Graph
(DAG)) of the computations, where the original data is at the root, and any
further operations branch from it. Only when a certain branch result is
requested, the way to the root is found and evaluated in the correct sequence
on the correct blocks.
The “magic” is performed by (for the sake of simplicity) storing the data not
as numpy.ndarray, but dask.array.Array (see the
dask documentation). dask
offers a couple of advantages:
Arbitrary-sized data processing is possible. By only loading a couple of
chunks at a time, theoretically any signal can be processed, albeit slower.
In practice, this may be limited: (i) some operations may require a certain
chunking pattern, which may still saturate memory; (ii) many chunks should still
fit into the computer memory comfortably at the same time.
Loading only the required data. If a certain part (chunk) of the data is
not required for the final result, it will not be loaded at all, saving time
and resources.
Able to extend to a distributed computing environment (clusters).
dask.distributed (see
the dask documentation) offers
a straightforward way to expand the effective memory for computations to that
of a cluster, which allows performing the operations significantly faster
than on a single machine.
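As a minimal sketch of this idea, a lazy signal can be created directly from a chunked
dask array (the shape and chunk sizes below are illustrative):

>>> import dask.array as da
>>> import hyperspy.api as hs
>>> data = da.random.random((64, 64, 1024), chunks=(8, 8, 1024))
>>> s = hs.signals.Signal1D(data).as_lazy()
>>> s_sum = s.sum()   # builds the task graph, nothing is computed yet
>>> s_sum.compute()   # evaluates the graph and loads the result into memory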
ROIs can be defined to select part of any compatible signal and may be applied
either to the navigation or to the signal axes. A number of different ROIs are
available:
ROIs can also be used interactively with widgets.
The following example shows how to interactively apply ROIs to an image. Note
that it is necessary to plot the signal onto which the widgets will be
added before calling interactive().
Depending on your screen and display settings, it can be difficult to pick
or manipulate widgets and you can try to change the pick tolerance in
the HyperSpy plot preferences.
Typically, using a 4K resolution with a small scaling factor (<150 %), setting
the pick tolerance to 15 instead of 7.5 makes the widgets easier to manipulate.
If instantiated without arguments (i.e. rect = RectangularROI()), the ROI
will automatically determine sensible values to center it when
interactively adding it to a signal. This provides a convenient starting point
to further manipulate the ROI, either by hand or using the gui (i.e. rect.gui()).
Notably, since ROIs are independent from the signals they sub-select, the widget
can be plotted on a different signal altogether.
>>> import scipy
>>> im = hs.signals.Signal2D(scipy.datasets.ascent())
>>> s = hs.signals.Signal1D(np.random.rand(512, 512, 512))
>>> roi = hs.roi.RectangularROI(left=30, right=77, top=20, bottom=50)
>>> s.plot()  # plot signal to have where to display the widget
>>> imr = roi.interactive(im, navigation_signal=s, color="red")
>>> roi(im).plot()
ROIs are implemented in terms of physical coordinates and not pixels, so with
proper calibration will always point to the same region.
And of course, as with all interactive operations, interactive ROIs are chainable.
The following example shows how to display interactively the histogram of a
rectangular ROI. Notice how we customise the default event connections in
order to increase responsiveness.
>>> import scipy
>>> im = hs.signals.Signal2D(scipy.datasets.ascent())
>>> im.plot()
>>> roi = hs.roi.RectangularROI(left=30, right=500, top=200, bottom=400)
>>> im_roi = roi.interactive(im, color="red")
>>> roi_hist = hs.interactive(im_roi.get_histogram,
...                           event=roi.events.changed,
...                           bins=150,  # Set number of bins for `get_histogram`
...                           recompute_out_event=None)
>>> roi_hist.plot()
Added in version 1.3: ROIs can be used in place of slices when indexing and to define a
signal range in functions taking a signal_range argument.
All ROIs have a gui method that displays a user interface if
a hyperspy GUI is installed (currently only works with the
hyperspy_gui_ipywidgets GUI), enabling precise control of the ROI
parameters:
>>> # continuing from above:
>>> roi.gui()
Added in version 1.4: angle() can be used to calculate the angle between the
ROI line and one of the axes, by providing its name through the optional argument axis:
Added in version 1.3: gui method added, for example gui().
Added in version 1.6: New __getitem__ method for all ROIs.
In addition, all ROIs have a __getitem__ method that enables
using them in place of tuples.
For example, the method align2D() takes a roi
argument with the left, right, top, bottom coordinates of the ROI.
Handily, we can pass a RectangularROI instead, as shown in the sketch below.
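A minimal sketch, assuming s is an image stack to be aligned:

>>> rect = hs.roi.RectangularROI(left=5, right=50, top=5, bottom=50)
>>> s.align2D(roi=rect)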
plot_roi_map() is a function that allows you to
interactively visualize the spatial variation of intensity in a Signal
within a ROI of its signal axes. In other words, it shows maps of
the integrated signal for custom ranges along the signal axis.
To allow selection of the signal ROIs, a plot of the mean signal over all
spatial positions is generated. Interactive ROIs can then be adjusted to the
desired regions within this plot.
For each ROI, a plot reflecting how the intensity of signal within this ROI
varies over the spatial dimensions of the Signal object is also plotted.
For Signal objects with 1 signal dimension SpanROIs are used
and for 2 signal dimensions, RectangularROIs are used.
In the example below, for a hyperspectral map with 2 navigation dimensions and
1 signal dimension (i.e. a spectrum at each position in a 2D map),
SpanROIs are used to select spectral regions of interest.
For each spectral region of interest a plot is generated displaying the
intensity within this region at each position in the map.
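A minimal sketch of this workflow using a random hyperspectral map (the shape is illustrative):

>>> s = hs.signals.Signal1D(np.random.random((30, 30, 512)))
>>> s.plot()
>>> roi_maps = hs.plot.plot_roi_map(s, rois=2)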
Events are a mechanism to send notifications. HyperSpy events are
decentralised, meaning that there is not a central events dispatcher.
Instead, each object that can emit events has an events
attribute that is an instance of Events and that contains
instances of Event as attributes. When triggered the
first keyword argument, obj contains the object that the events belongs to.
Different events may be triggered by other keyword arguments too.
It is possible to select the keyword arguments that are passed to the
connected functions. For example, in the following only the index keyword argument is
passed to on_index_changed2 and none to on_index_changed3:
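A sketch of this selection, connecting three callbacks to the index_changed event of a
navigation axis (the signal and callback bodies are illustrative):

>>> s = hs.signals.Signal1D(np.random.random((10, 100)))
>>> def on_index_changed1(obj, index):
...     print("axis:", obj, "index:", index)
>>> def on_index_changed2(index):
...     print("index:", index)
>>> def on_index_changed3():
...     print("index changed")
>>> axis = s.axes_manager.navigation_axes[0]
>>> axis.events.index_changed.connect(on_index_changed1)             # all keywords
>>> axis.events.index_changed.connect(on_index_changed2, ["index"])  # only index
>>> axis.events.index_changed.connect(on_index_changed3, [])         # no keywords
>>> axis.index = 3  # triggers the three callbacks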
Although usually there is no need to trigger events manually, there are
cases where it is required. When triggering events manually it is important
to pass the right keywords as specified in the event docstring. In the
following example we change the data attribute of a
BaseSignal manually and we then trigger the data_changed
event.
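A minimal sketch of triggering the event manually after modifying the data attribute:

>>> s = hs.signals.BaseSignal(np.arange(10))
>>> s.data = s.data + 1
>>> s.events.data_changed.trigger(obj=s)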
The function interactive() simplifies the definition of
operations that are automatically updated when an event is triggered. By
default the operation is recomputed when the data or the axes of the original
signal is changed.
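A small sketch of the typical usage, where recomputation is triggered by the data_changed event:

>>> s = hs.signals.Signal1D(np.arange(10.))
>>> ssum = hs.interactive(s.sum, axis=0)
>>> ssum.data
array(45.)
>>> s.data += 10
>>> s.events.data_changed.trigger(obj=s)
>>> ssum.data
array(145.)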
As with ragged signals, the number of markers at each navigation position can vary; this
is done by passing a ragged array to the constructor of the markers.
Create a signal
import hyperspy.api as hs
import numpy as np

# Create a Signal2D with 1 navigation dimension
rng = np.random.default_rng(0)
data = np.arange(25 * 100 * 100).reshape((25, 100, 100))
s = hs.signals.Signal2D(data)
Create the ragged array with varying number of markers for each navigation
position
import hyperspy.api as hs
import numpy as np

# Create a Signal2D with 1 navigation dimension
rng = np.random.default_rng(0)
data = np.ones((50, 100, 100))
s = hs.signals.Signal2D(data)

for axis in s.axes_manager.signal_axes:
    axis.scale = 2 * np.pi / 100
This example shows how to draw arrows
# Define the position of the arrows
X, Y = np.meshgrid(np.arange(0, 2 * np.pi, .2), np.arange(0, 2 * np.pi, .2))
offsets = np.column_stack((X.ravel(), Y.ravel()))
U = np.cos(X).ravel() / 7.5
V = np.sin(Y).ravel() / 7.5
C = np.hypot(U, V)

m = hs.plot.markers.Arrows(offsets, U, V, C=C)
s.plot()
s.add_marker(m)
import hyperspy.api as hs
import matplotlib.pyplot as plt
import numpy as np

# Create a Signal1D with 2 navigation dimensions
rng = np.random.default_rng(0)
data = rng.random((25, 25, 100))
s = hs.signals.Signal1D(data)
This first example shows how to draw 3 static vertical lines (same position for all
navigation coordinates), as sketched below.
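A minimal sketch of such static vertical lines (the positions and colors are illustrative):

m = hs.plot.markers.VerticalLines(
    offsets=[10, 50, 90],   # x positions in signal units
    linewidth=3,
    colors=['r', 'g', 'b'],
)
s.plot()
s.add_marker(m)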
This example shows how to draw dynamic line markers, whose positions and
numbers depend on the navigation coordinates
offsets = np.empty(s.axes_manager.navigation_shape, dtype=object)
for ind in np.ndindex(offsets.shape):
    offsets[ind] = rng.random(rng.integers(10)) * 100

# Get list of colors
colors = list(plt.rcParams['axes.prop_cycle'].by_key()['color'])

m = hs.plot.markers.VerticalLines(
    offsets=offsets,
    linewidth=5,
    colors=colors,
)
s.plot()
s.add_marker(m)
This example shows how to draw circles, with the color of each circle scaling with
its radius
Create a signal
import hyperspy.api as hs
import matplotlib.pyplot as plt
import numpy as np

# Create a Signal2D
rng = np.random.default_rng(0)
s = hs.signals.Signal2D(np.ones((25, 100, 100)))
This first example shows how to draw the circles
# Define the size of the circles
sizes = rng.random((10, )) * 20 + 5
# Define the position of the circles
offsets = rng.random((10, 2)) * 100

m = hs.plot.markers.Circles(
    sizes=sizes,
    offsets=offsets,
    linewidth=2,
)
s.plot()
s.add_marker(m)
Note
Any changes to the marker made by setting matplotlib.collections.Collection
attributes will not be saved when saving as hspy/zspy file.
# Set the color of the circles
m.set_ScalarMappable_array(sizes.ravel() / 2)
# Add corresponding colorbar
cbar = m.plot_colorbar()
cbar.set_label('Circle radius')
# Set animated state of colorbar to support blitting
animated = plt.gcf().canvas.supports_blit
cbar.ax.yaxis.set_animated(animated)
cbar.solids.set_animated(animated)
import hyperspy.api as hs
import numpy as np

# Create a Signal2D with 1 navigation dimension
rng = np.random.default_rng(0)
data = np.ones((10, 100, 100))
s = hs.signals.Signal2D(data)
This first example shows how to draw static Text markers
# Define the position of the texts
offsets = np.stack([np.arange(0, 100, 10)] * 2).T + np.array([5, ] * 2)
texts = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'f', 'h', 'i'])

m = hs.plot.markers.Texts(
    offsets=offsets,
    texts=texts,
    sizes=3,
    facecolor="black",
)
s.plot()
s.add_marker(m)
import hyperspy.api as hs
import matplotlib.pyplot as plt
import numpy as np

# Create a Signal2D with 2 navigation dimensions
rng = np.random.default_rng(0)
data = np.ones((25, 25, 100, 100))
s = hs.signals.Signal2D(data)
This first example shows how to draw static line markers
# Define the position of the lines
# line0: (x0, y0), (x1, y1)
# line1: (x0, y0), (x1, y1)
# ...
segments = rng.random((10, 2, 2)) * 100

m = hs.plot.markers.Lines(
    segments=segments,
    linewidth=3,
    colors='g',
)
s.plot()
s.add_marker(m)
This second example shows how to draw dynamic line markers, whose positions
depend on the navigation coordinates
segments = np.empty(s.axes_manager.navigation_shape, dtype=object)
for ind in np.ndindex(segments.shape):
    segments[ind] = rng.random((10, 2, 2)) * 100

# Get list of colors
colors = list(plt.rcParams['axes.prop_cycle'].by_key()['color'])

m = hs.plot.markers.Lines(
    segments=segments,
    colors=colors,
    linewidth=5,
)
s.plot()
s.add_marker(m)
import hyperspy.api as hs
import numpy as np

# Create a Signal2D with 1 navigation dimension
rng = np.random.default_rng(0)
data = np.ones((10, 100, 100))
s = hs.signals.Signal2D(data)

for axis in s.axes_manager.signal_axes:
    axis.scale = 2 * np.pi / 100

# Select navigation position 5
s.axes_manager.indices = (5, )
This example shows how to use the Arrows marker with a varying number of
arrows per navigation position
# Define the position of the arrows, use a ragged array to enable the
# navigation-position dependence
offsets = np.empty(s.axes_manager.navigation_shape, dtype=object)
U = np.empty(s.axes_manager.navigation_shape, dtype=object)
V = np.empty(s.axes_manager.navigation_shape, dtype=object)
for ind in np.ndindex(U.shape):
    offsets[ind] = rng.random((ind[0] + 1, 2)) * 6
    U[ind] = rng.random(ind[0] + 1) * 2
    V[ind] = rng.random(ind[0] + 1) * 2

m = hs.plot.markers.Arrows(
    offsets,
    U,
    V,
)
s.plot()
s.add_marker(m)
import hyperspy.api as hs
import numpy as np

# Create a Signal2D with 1 navigation dimension
rng = np.random.default_rng(0)
data = np.ones((50, 100, 100))
s = hs.signals.Signal2D(data)
This first example shows how to draw static circles
# Define the position of the circles (start at (0, 0) and increment by 10)
offsets = np.array([np.arange(0, 100, 10)] * 2).T

m = hs.plot.markers.Circles(
    sizes=10,
    offsets=offsets,
    edgecolor='r',
    linewidth=5,
)
s.plot()
s.add_marker(m)
import hyperspy.api as hs
import matplotlib as mpl
import numpy as np

# Create a Signal2D with 2 navigation dimensions
rng = np.random.default_rng(0)
data = np.ones((25, 25, 100, 100))
s = hs.signals.Signal2D(data)
This first example shows how to draw static filled circles
# Define the position of the circles
offsets = rng.random((10, 2)) * 100

m = hs.plot.markers.Points(
    sizes=20,
    offsets=offsets,
    facecolors="red",
)
s.plot()
s.add_marker(m)
import hyperspy.api as hs
import numpy as np

# Create a Signal2D
data = np.ones([100, 100])
s = hs.signals.Signal2D(data)

num = 2
angle = 25
color = ["tab:orange", "tab:blue"]
Create the markers; the first and second elements are rotated by 0 and 25 degrees, respectively
# Define the position of the markers
offsets = np.array([20 * np.ones(num)] * 2).T
angles = np.arange(0, angle * num, angle)

m1 = hs.plot.markers.Rectangles(
    offsets=offsets,
    widths=np.ones(num) * 20,
    heights=np.ones(num) * 10,
    angles=angles,
    facecolor='none',
    edgecolor=color,
)

m2 = hs.plot.markers.Ellipses(
    offsets=offsets + np.array([0, 20]),
    widths=np.ones(num) * 20,
    heights=np.ones(num) * 10,
    angles=angles,
    facecolor='none',
    edgecolor=color,
)

m3 = hs.plot.markers.Squares(
    offsets=offsets + np.array([0, 50]),
    widths=np.ones(num) * 20,
    angles=angles,
    facecolor='none',
    edgecolor=color,
)
Plot the signal and add all the markers
s.plot()
s.add_marker([m1, m2, m3])
This example shows how to add or remove markers from an existing collection.
This is done by setting the parameters (offsets, sizes, etc.) of the collection.
Create a signal
import hyperspy.api as hs
import numpy as np

# Create a Signal2D with 1 navigation dimension
rng = np.random.default_rng(0)
data = np.arange(15 * 100 * 100).reshape((15, 100, 100))
s = hs.signals.Signal2D(data)
Create text marker
# Define the position of the texts
offsets = np.stack([np.arange(0, 100, 10)] * 2).T + np.array([5, ] * 2)
texts = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'f', 'h', 'i'])

m = hs.plot.markers.Texts(
    offsets=offsets,
    texts=texts,
    sizes=3,
)
print(f'Number of markers is {len(m)}.')

s.plot()
s.add_marker(m)
# Set new texts and offsets parameters with one less item
m.remove_items(indices=-1)
print(f'Number of markers is {len(m)} after removing one marker.')

s.plot()
s.add_marker(m)
# Define the position in the middle of the axes
m.add_items(offsets=np.array([[50, 50]]), texts=np.array(["new text"]))
print(f'Number of markers is {len(m)} after adding the text {texts[-1]}.')

s.plot()
s.add_marker(m)
import hyperspy.api as hs
import matplotlib.pyplot as plt
import numpy as np

# Create a Signal2D with 2 navigation dimensions
rng = np.random.default_rng(0)
data = np.ones((25, 25, 100, 100))
s = hs.signals.Signal2D(data)
This first example shows how to draw static polygon markers using the matplotlib
PolygonCollection
# Define the vertices of the polygons
# polygon1: [[x0, y0], [x1, y1], [x2, y2], [x3, y3]]
# polygon2: [[x0, y0], [x1, y1], [x2, y2], [x3, y3]]
# ...
polygon1 = [[1, 1], [20, 20], [1, 20], [25, 5]]
polygon2 = [[50, 60], [90, 40], [60, 40], [23, 60]]

verts = [polygon1, polygon2]

m = hs.plot.markers.Polygons(
    verts=verts,
    linewidth=3,
    facecolors=('g',),
)
s.plot()
s.add_marker(m)
This example shows how to draw dynamic polygon markers, whose positions
depend on the navigation coordinates
verts = np.empty(s.axes_manager.navigation_shape, dtype=object)
for ind in np.ndindex(verts.shape):
    verts[ind] = rng.random((10, 4, 2)) * 100

# Get list of colors
colors = list(plt.rcParams['axes.prop_cycle'].by_key()['color'])

m = hs.plot.markers.Polygons(
    verts=verts,
    facecolors=colors,
    linewidth=3,
    alpha=0.6,
)
s.plot()
s.add_marker(m)
import hyperspy.api as hs
import numpy as np

# Create a Signal2D with 2 navigation dimensions
rng = np.random.default_rng(0)
data = np.ones((25, 25, 100, 100))
s = hs.signals.Signal2D(data)
This first example shows how to draw static ellipses
# Define the position of the ellipses
offsets = rng.random((10, 2)) * 100

m = hs.plot.markers.Ellipses(
    widths=(8,),
    heights=(10,),
    angles=(45,),
    offsets=offsets,
    facecolor="red",
)
s.plot()
s.add_marker(m)
import hyperspy.api as hs
import numpy as np

# Create a Signal2D with 2 navigation dimensions
rng = np.random.default_rng(0)
data = np.ones((25, 25, 100, 100))
s = hs.signals.Signal2D(data)
This first example shows how to draw static square markers
# Define the position of the squares (start at (0, 0) and increment by 10)
offsets = np.array([np.arange(0, 100, 10)] * 2).T

m = hs.plot.markers.Squares(
    offsets=offsets,
    widths=(5,),
    angles=(0,),
    color="orange",
)
s.plot()
s.add_marker(m)
import hyperspy.api as hs
import matplotlib as mpl
import numpy as np

# Create a Signal2D with 2 navigation dimensions
rng = np.random.default_rng(0)
data = np.ones((25, 25, 100, 100))
s = hs.signals.Signal2D(data)
This first example shows how to draw static star markers using the matplotlib
StarPolygonCollection
# Define the position of the stars
offsets = rng.random((10, 2)) * 100

# every other star has a size of 50/100
m = hs.plot.markers.Markers(
    collection=mpl.collections.StarPolygonCollection,
    offsets=offsets,
    numsides=5,
    color="orange",
    sizes=(50, 100),
)
s.plot()
s.add_marker(m)
This second example shows how to draw dynamic star markers, whose positions
depend on the navigation coordinates
# Create a Signal2D with 2 navigation dimensions
s2 = hs.signals.Signal2D(data)

# Create a ragged array of offsets
offsets = np.empty(s.axes_manager.navigation_shape, dtype=object)
for ind in np.ndindex(offsets.shape):
    offsets[ind] = rng.random((10, 2)) * 100

m2 = hs.plot.markers.Markers(
    collection=mpl.collections.StarPolygonCollection,
    offsets=offsets,
    numsides=5,
    color="blue",
    sizes=(50, 100),
)
s2.plot()
s2.add_marker(m2)
import hyperspy.api as hs
import numpy as np

# Create a Signal2D with 2 navigation dimensions
rng = np.random.default_rng(0)
data = np.ones((25, 25, 100, 100))
s = hs.signals.Signal2D(data)
This first example shows how to draw static rectangle markers
# Define the position of the rectangles
offsets = np.array([np.arange(0, 100, 10)] * 2).T

m = hs.plot.markers.Rectangles(
    offsets=offsets,
    widths=(5,),
    heights=(7,),
    angles=(0,),
    color="red",
)
s.plot()
s.add_marker(m)
import hyperspy.api as hs
import numpy as np

# Create a Signal2D with 1 navigation dimension
rng = np.random.default_rng(0)
data = np.ones((50, 100, 100))
s = hs.signals.Signal2D(data)

for axis in s.axes_manager.signal_axes:
    axis.scale = 2 * np.pi / 100
This example shows how to create markers from a signal. This is useful for creating lazy
markers from some operation such as peak finding on a signal. Here we show how to create
markers from a simple map function which finds the maximum value and plots a marker at
that position.
import numpy as np
import hyperspy.api as hs


# Making some artificial data
def find_maxima(data, scale, offset):
    ind = np.array(np.unravel_index(np.argmax(data, axis=None), data.shape)).astype(int)
    d = data[ind]
    ind = ind * scale + offset  # convert to physical units
    print(ind)
    print(d)
    return np.array([[ind[0], d[0]], ])


def find_maxima_lines(data, scale, offset):
    ind = np.array(np.unravel_index(np.argmax(data, axis=None), data.shape)).astype(int)
    ind = ind * scale + offset  # convert to physical units
    return ind


def gaussian(x, mu, sig):
    return (
        1.0 / (np.sqrt(2.0 * np.pi) * sig) * np.exp(-np.power((x - mu) / sig, 2.0) / 2)
    )


data = np.empty((4, 120))
for i in range(4):
    x_values = np.linspace(-3 + i * 0.1, 3 + i * 0.1, 120)
    data[i] = gaussian(x_values, mu=0, sig=10)

s = hs.signals.Signal1D(data)
s.axes_manager.signal_axes[0].scale = 6 / 120
s.axes_manager.signal_axes[0].offset = -3

scale = s.axes_manager.signal_axes[0].scale
offset = s.axes_manager.signal_axes[0].offset
max_values = s.map(find_maxima, scale=scale, offset=offset, inplace=False, ragged=True)
max_values_lines = s.map(find_maxima_lines, scale=scale, offset=offset, inplace=False, ragged=True)

point_markers = hs.plot.markers.Points.from_signal(max_values, signal_axes=None)
line_markers = hs.plot.markers.VerticalLines.from_signal(max_values_lines, signal_axes=None)

s.plot()
s.add_marker(point_markers)
s.add_marker(line_markers)
The first example shows how to draw markers which are relative to some
1D signal. This is how the EDS and EELS Lines are implemented in the
exspy package.
segments = np.zeros((10, 2, 2))  # line segments for relative markers
segments[:, 1, 1] = 1  # set y values end (1 means to the signal curve)
segments[:, 0, 0] = np.arange(10).reshape(10)  # set x for line start
segments[:, 1, 0] = np.arange(10).reshape(10)  # set x for line stop

offsets = np.zeros((10, 2))  # offsets for text positions
offsets[:, 1] = 1  # set y value for text position (1 means to the signal curve)
offsets[:, 0] = np.arange(10).reshape(10)  # set x for text position

markers = hs.plot.markers.Lines(segments=segments, transform="relative")
texts = hs.plot.markers.Texts(
    offsets=offsets,
    texts=["a", "b", "c", "d", "e", "f", "g", "h", "i"],
    sizes=10,
    offset_transform="relative",
    shift=0.005,  # shift in axes units for some constant displacement
)

signal.plot()
signal.add_marker(markers)
signal.add_marker(texts)
The second example shows how to draw markers which extend to the edges of the
axes. This is how the VerticalLines and HorizontalLines markers are implemented.
This example creates a signal from tabular data imported from a txt file using
numpy.loadtxt(). The signal axis and the EELS intensity values are
given by the first and second columns, respectively.
Define the axes of the signal and then create the signal:
axes = [
    # use values from first column to define non-uniform signal axis
    dict(axis=x, name="Energy", units="eV"),
]

s = hs.signals.Signal1D(y, axes=axes)
Convert the non-uniform axis to a uniform axis, because non-uniform axes do not
support all functionalities of HyperSpy.
In this case, the error introduced during conversion to a uniform scale is negligible.
This example creates a line spectrum and plots it.
import numpy as np
import hyperspy.api as hs

# Create a line spectrum with random data
s = hs.signals.Signal1D(np.random.random((100, 1024)))

# Define the axis properties
s.axes_manager.signal_axes[0].name = 'Energy'
s.axes_manager.signal_axes[0].units = 'eV'
s.axes_manager.signal_axes[0].scale = 0.3
s.axes_manager.signal_axes[0].offset = 100

s.axes_manager.navigation_axes[0].name = 'time'
s.axes_manager.navigation_axes[0].units = 'fs'
s.axes_manager.navigation_axes[0].scale = 0.3
s.axes_manager.navigation_axes[0].offset = 100

# Give a title
s.metadata.General.title = 'Random line spectrum'

# Plot it
s.plot()
This example creates a signal from tabular data, where the signal axis is given by an array
of data values (the x column) and the tabular data are ordered in columns with 5 columns
containing each 20 values and each column corresponding to a position in the
navigation space (linescan).
Define the axes of the signal and then create the signal:
axes = [
    # length of the navigation axis
    dict(size=y.shape[1], scale=0.1, name="Position", units="nm"),
    # use values to define non-uniform axis for the signal axis
    dict(axis=x, name="Energy", units="eV"),
]

s = hs.signals.Signal1D(y.T, axes=axes)
Convert the non-uniform signal axis to a uniform axis, because non-uniform axes do not
support all functionalities of HyperSpy.
In this case, the error introduced during conversion to a uniform scale is negligible.
import numpy as np
import hyperspy.api as hs

# Create an image stack with random data
im = hs.signals.Signal2D(np.random.random((16, 32, 32)))

# Define the axis properties
im.axes_manager.signal_axes[0].name = 'X'
im.axes_manager.signal_axes[0].units = 'nm'
im.axes_manager.signal_axes[0].scale = 0.1
im.axes_manager.signal_axes[0].offset = 0

im.axes_manager.signal_axes[1].name = 'Y'
im.axes_manager.signal_axes[1].units = 'nm'
im.axes_manager.signal_axes[1].scale = 0.1
im.axes_manager.signal_axes[1].offset = 0

im.axes_manager.navigation_axes[0].name = 'time'
im.axes_manager.navigation_axes[0].units = 'fs'
im.axes_manager.navigation_axes[0].scale = 0.3
im.axes_manager.navigation_axes[0].offset = 100

# Give a title
im.metadata.General.title = 'Random image stack'

# Plot it
im.plot()
This example creates a spectrum image, i.e. navigation dimension 2 and
signal dimension 1, and plots it.
import numpy as np
import hyperspy.api as hs
import matplotlib.pyplot as plt

# Create a spectrum image with random data
s = hs.signals.Signal1D(np.random.random((64, 64, 1024)))

# Define the axis properties
s.axes_manager.signal_axes[0].name = 'Energy'
s.axes_manager.signal_axes[0].units = 'eV'
s.axes_manager.signal_axes[0].scale = 0.3
s.axes_manager.signal_axes[0].offset = 100

s.axes_manager.navigation_axes[0].name = 'X'
s.axes_manager.navigation_axes[0].units = 'nm'
s.axes_manager.navigation_axes[0].scale = 0.1
s.axes_manager.navigation_axes[0].offset = 100

s.axes_manager.navigation_axes[1].name = 'Y'
s.axes_manager.navigation_axes[1].units = 'nm'
s.axes_manager.navigation_axes[1].scale = 0.1
s.axes_manager.navigation_axes[1].offset = 100

# Give a title
s.metadata.General.title = 'Random spectrum image'

# Plot it
s.plot()

plt.show()  # Not necessary when running in HyperSpy's IPython profile
This example creates a 4D dataset, i.e. 2 navigation dimensions and
2 signal dimensions, and plots it.
import numpy as np
import hyperspy.api as hs

# Create a 2D image stack with random data
im = hs.signals.Signal2D(np.random.random((16, 16, 32, 32)))

# Define the axis properties
im.axes_manager.signal_axes[0].name = ''
im.axes_manager.signal_axes[0].units = '1/nm'
im.axes_manager.signal_axes[0].scale = 0.1
im.axes_manager.signal_axes[0].offset = 0

im.axes_manager.signal_axes[1].name = ''
im.axes_manager.signal_axes[1].units = '1/nm'
im.axes_manager.signal_axes[1].scale = 0.1
im.axes_manager.signal_axes[1].offset = 0

im.axes_manager.navigation_axes[0].name = 'X'
im.axes_manager.navigation_axes[0].units = 'nm'
im.axes_manager.navigation_axes[0].scale = 0.3
im.axes_manager.navigation_axes[0].offset = 100

im.axes_manager.navigation_axes[1].name = 'Y'
im.axes_manager.navigation_axes[1].units = 'nm'
im.axes_manager.navigation_axes[1].scale = 0.3
im.axes_manager.navigation_axes[1].offset = 100

# Give a title
im.metadata.General.title = 'Random 2D image stack'

im.plot()
Creates a single spectrum, saves it and plots it:
Create a single spectrum using the Signal1D class.
Save the signal as an msa file
Plot the signal using the plot method
Save the figure as a png file
# Set the matplotlib backend of your choice, for example
# %matplotlib qt
import hyperspy.api as hs
import numpy as np

s = hs.signals.Signal1D(np.random.rand(1024))

# Export as msa file, very similar to a csv file but containing standardised
# metadata
s.save('testSpectrum.msa', overwrite=True)

# Plot it
s.plot()
This example shows how to adjust the contrast and intensities using
scikit-image and save it as an RGB image.
When saving an RGB image to jpg, only 8 bits are supported and the image
intensity needs to be rescaled to 0-255 before converting to 8 bits,
otherwise the intensities will be clipped at 255.
In HyperSpy, color images are defined as Signal1D with the signal dimension
corresponding to the color channels (red, green and blue).
import hyperspy.api as hs
import skimage as ski

# The dtype can be changed to a custom dtype, which is convenient to visualise
# the color image
s = hs.signals.Signal1D(ski.data.astronaut())
s.change_dtype("rgb8")
print(s)
<Signal2D, title: , dimensions: (|512, 512)>
Display the color image
s.plot()
Processing is usually performed on standard dtype (e.g. uint8, uint16), because
most functions from scikit-image, numpy, scipy, etc. only support standard dtype.
Convert from RGB to unsigned integer 16 bits
To save a color image to jpg, the signal needs to be converted to rgb8, because
jpg only supports 8-bit RGB.
Rescale intensity to fit the unsigned integer 8 bits (2**8 = 256 intensity level)
Now that the values have been rescaled to the 0-255 range, we can convert the data type
to unsigned integer 8 bit and then rgb8 to be able to save the RGB image in jpg format
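A minimal sketch of these steps, assuming s holds the image as standard unsigned integers
after the conversion above (the output file name is illustrative):

import skimage as ski

# Rescale the intensities to the 0-255 range
s.data = ski.exposure.rescale_intensity(s.data, out_range=(0, 255))
# Convert to 8-bit unsigned integers and interpret the colour channels as RGB
s.change_dtype("uint8")
s.change_dtype("rgb8")
# Save as jpg (8-bit RGB)
s.save("astronaut_rescaled.jpg", overwrite=True)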
Model1D: Simple arctan fit
CurrentComponentValues: Arctan
Active: True
Parameter Name | Free | Value | Std | Min | Max | Linear
============== | ======= | ========== | ========== | ========== | ========== | ======
A | True | 1.00316968 | 0.00208975 | None | None | True
k | True | 1.04134983 | 0.08808298 | None | None | False
x0 | True | -0.0798035 | 0.07592704 | None | None | False
import numpy as np
import hyperspy.api as hs

# Generate the data and make the spectrum
data = np.arctan(np.arange(-500, 500))
s = hs.signals.Signal1D(data)
s.axes_manager[0].offset = -500
s.axes_manager[0].units = ""
s.axes_manager[0].name = "x"
s.metadata.General.title = "Simple arctan fit"
s.set_signal_origin("simulation")
s.add_gaussian_noise(0.1)

# Make the arctan component for use in the model
arctan_component = hs.model.components1D.Arctan()

# Create the model and add the arctan component
m = s.create_model()
m.append(arctan_component)

# Fit the arctan component to the spectrum
m.fit()

# Print the result of the fit
m.print_current_values()

# Plot the spectrum and the model fitting
m.plot()
Create the ROI, here a SpanROI for a one-dimensional ROI:
roi = hs.roi.SpanROI(left=10, right=20)
Slice the signal with the ROI. By using the interactive function, the
output signal sliced_signal will update automatically.
The ROI will be added automatically to the signal figure.
Specify the axes to add the ROI on either the navigation or signal dimension:
s.plot()
sliced_signal = roi.interactive(s, axes=s.axes_manager.signal_axes)
Plot the signal sliced by the ROI and use autoscale='xv' to update the
limits of the plot automatically:
sliced_signal.plot(autoscale='xv')
Use a RectangularROI to take the sum of an area of the navigation space.
import hyperspy.api as hs
Create a signal:
s = hs.data.two_gaussians()
Create the ROI, here a RectangularROI for the two-dimensional navigation space:
roi = hs.roi.RectangularROI()
Slice the signal with the ROI. By using the interactive function, the
output signal s_roi will update automatically.
The ROI will be added automatically to the signal figure.
By default, the ROI will be added to either the navigation or the signal axes. We specify
recompute_out_event=None to avoid redundant computation when changing the ROI.
s.plot()
s_roi = roi.interactive(s, recompute_out_event=None, color='C1')

# Use the hs.interactive function to compute the sum over the ROI interactively
roi_sum = hs.interactive(s_roi.sum, recompute_out_event=None)
Plot the signal sliced by the ROI:
roi_sum.plot()
Add 2 ROIs in signal space and map the corresponding signal using plot_roi_map().
The ROIs are added to the plot of the signal and by default a
RectangularROI is used
s.plot()
roi = hs.plot.plot_roi_map(s, rois=2)
Same as above but using CircleROI with predefined positions:
roi1 = hs.roi.CircleROI(cx=25, cy=25, r=5)
roi2 = hs.roi.CircleROI(cx=25, cy=25, r=15, r_inner=10)

s.plot()
roi = hs.plot.plot_roi_map(s, rois=[roi1, roi2])
Interactively extract a line profile (with a certain width) from an image using
Line2DROI. Use plot_spectra() to plot several
line profiles on the same figure. Save the profile data as an msa file.
Extracting line profiles and interactive plotting
Initialize a Line-ROI from position (400, 250) to position (220, 600) with a width of 5
in calibrated axes units (in the current example equal to the image pixels):
line_roi = hs.roi.Line2DROI(400, 250, 220, 600, 5)
Extract data along the ROI as a new signal by “slicing” the signal and plot the
profile:
profile = line_roi(im0)
profile.plot()
Slicing of the signal is not interactive. If you want to modify the line along
which the profile is extracted, you can plot the image and display the ROI
interactively (creates a new signal object). You can even display the same ROI
on a second image to make sure that a profile is well placed on both images:
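A minimal sketch, assuming a second image im1 is available alongside im0:

im0.plot()
profile0 = line_roi.interactive(im0, color='green')
im1.plot()
profile1 = line_roi.interactive(im1, color='green')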
Interactive integration of one dimensional signal
This example shows how to integrate a signal using an interactive ROI.
import hyperspy.api as hs
Create a signal:
s = hs.data.two_gaussians()
Create SpanROI:
roi = hs.roi.SpanROI(left=10, right=20)
Slice the signal with the ROI. By using the interactive function, the
output signal sliced_signal will update automatically.
The ROI will be added automatically to the signal figure:
s.plot()
sliced_signal = roi.interactive(s, axes=s.axes_manager.signal_axes)
Create a placeholder signal for the integrated signal and set metadata:
Create the interactive computation, which will update when the ROI roi is
changed. We use the out argument to place the results of the integration
in the placeholder signal defined in the previous step:
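A minimal sketch of these two steps (the names and the transpose are illustrative):

integrated = sliced_signal.integrate1D(axis=-1).T  # placeholder signal
integrated.metadata.General.title = "Integrated intensity"

# Recompute the integration and write it into ``integrated`` whenever the ROI changes
hs.interactive(
    sliced_signal.integrate1D,
    axis=-1,
    event=roi.events.changed,
    recompute_out_event=None,
    out=integrated,
)
integrated.plot()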
Creates a 2D hyperspectrum consisting of two Gaussians and plots it.
This example can serve as starting point to test other functionalities on the
simulated hyperspectrum.
import numpy as np
import hyperspy.api as hs
import matplotlib.pyplot as plt

# Create an empty spectrum
s = hs.signals.Signal1D(np.zeros((32, 32, 1024)))

# Generate some simple data: two Gaussians with random centers and area

# First we create a model
m = s.create_model()

# Define the first gaussian
gs1 = hs.model.components1D.Gaussian()
# Add it to the model
m.append(gs1)

# Set the parameters
gs1.sigma.value = 10
# Make the center vary in the -5,5 range around 256
gs1.centre.map['values'][:] = 256 + (np.random.random((32, 32)) - 0.5) * 10
gs1.centre.map['is_set'][:] = True

# Make the area vary between 0 and 10000
gs1.A.map['values'][:] = 10000 * np.random.random((32, 32))
gs1.A.map['is_set'][:] = True

# Second gaussian
gs2 = hs.model.components1D.Gaussian()
# Add it to the model
m.append(gs2)

# Set the parameters
gs2.sigma.value = 20
# Make the center vary in the -10,10 range around 768
gs2.centre.map['values'][:] = 768 + (np.random.random((32, 32)) - 0.5) * 20
gs2.centre.map['is_set'][:] = True

# Make the area vary between 0 and 20000
gs2.A.map['values'][:] = 20000 * np.random.random((32, 32))
gs2.A.map['is_set'][:] = True

# Create the dataset
s_model = m.as_signal()

# Add noise
s_model.set_signal_origin("simulation")
s_model.add_poissonian_noise()

# Plot the result
s_model.plot()

plt.show()
The BaseSignal class stores metadata in the
metadata attribute, which has a tree structure. By
convention, the node labels are capitalized and the leaves are not
capitalized.
When a leaf contains a quantity that is not dimensionless, the units can be
given in an extra leaf with the same label followed by the “_units” suffix.
For example, an “energy” leaf should be accompanied by an “energy_units” leaf.
The metadata structure is represented in the following tree diagram. The
default units are given in parentheses. Details about the leaves can be found
in the following sections of this chapter.
Contains information about the software packages and versions used any time the
Signal was created by reading the original data format (added in HyperSpy
v1.7) or saved by one of HyperSpy’s IO tools. If the signal is saved to one
of the hspy, zspy or nxs formats, the metadata within the FileIO
node will represent a history of the software configurations used when the
conversion was made from the proprietary/original format to HyperSpy’s
format, as well as any time the signal was subsequently loaded from and saved
to disk. Under the FileIO node will be one or more nodes named 0,
1, 2, etc., each with the following structure:
operation
type: Str
This value will be either "load" or "save" to indicate whether
this node represents a load from, or save to disk operation, respectively.
hyperspy_version
type: Str
The version number of the HyperSpy software used to extract a Signal from
this data file or save this Signal to disk
io_plugin
type: Str
The specific input/output plugin used to originally extract this data file
into a HyperSpy Signal or save it to disk – will be of the form
rsciio.<plugin_name>.
timestamp
type: Str
The timestamp of the computer running the data loading/saving process (in a
timezone-aware format). The timestamp will be in ISO 8601 format, as
produced by datetime.datetime.isoformat().
A term that describes the signal type, e.g. EDS, PES… This information
can be used by HyperSpy to load the file as a specific signal class and
therefore the naming should be standardised. Currently, HyperSpy provides
special signal classes for photoemission spectroscopy, electron energy
loss spectroscopy and energy dispersive spectroscopy. The signal_type in
these cases should be, respectively, PES, EELS and EDS_TEM (EDS_SEM).
signal_origin
type: Str
Describes the origin of the signal e.g. ‘simulation’ or ‘experiment’.
record_by
Deprecated since version 1.2.
type: Str
One of ‘spectrum’ or ‘image’. It describes how the data is stored in memory.
If ‘spectrum’, the spectral data is stored in the faster index.
quantity
type: Str
The name of the quantity of the “intensity axis” with the units in round
brackets if required, for example Temperature (K).
The variance of the data. It can be a float when the noise is Gaussian or a
BaseSignal instance if the noise is heteroscedastic,
in which case it must have the same dimensions as
data.
Given an item_path, returns True if the item exists anywhere
in the metadata tree.
Using the option full_path=False, the functions
has_item() and
get_item() can also find items by
their key in the metadata when the exact path is not known. By default, only
an exact match of the search string with the item key counts. The additional
setting wild=True allows to search for a case-insensitive substring of the
item key. The search functionality also accepts item keys preceded by one or
several nodes of the path (separated by the usual full stop).
For full_path=False, returns the value or list of values for any
matching item(s). Setting return_path=True, a tuple (value, path) is
returned, or a list of tuples for multiple occurrences.
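A short sketch of this search functionality (the metadata item is illustrative):

>>> s = hs.signals.BaseSignal([0])
>>> s.metadata.set_item('Acquisition_instrument.TEM.beam_energy', 300)
>>> s.metadata.has_item('Acquisition_instrument.TEM.beam_energy')
True
>>> s.metadata.has_item('beam_energy', full_path=False)
True
>>> s.metadata.get_item('beam_energy', full_path=False)
300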
Transposes all passed signals according to the specified options.
All public packages, functions and classes are available in this module.
When starting HyperSpy using the hyperspy script (e.g. by executing
hyperspy in a console, using the context menu entries or using the links in
the Start Menu), the api package is imported in the user
namespace as hs, i.e. by executing the following:
>>> import hyperspy.api as hs
(Note that code snippets are indicated by three greater-than signs)
We recommend importing the HyperSpy API as above also when doing it manually.
The docstring examples assume that hyperspy.api has been imported as hs,
numpy as np and matplotlib.pyplot as plt.
Load data into BaseSignal instances from supported files.
preferences
Preferences class instance to configure the default values of different
parameters. It has a CLI and a GUI that can be started by executing its
gui method, i.e. preferences.gui().
Signal classes, which are the core of HyperSpy. Use this module to
create Signal instances manually from numpy arrays. Note that to
load data from supported file formats it is more convenient to use the
load function.
Update the result of the operation when the event is triggered.
If "auto" and f is a method of a Signal class instance its
data_changed event is selected if the function takes an out
argument. If None, update is not connected to any event. The
default is "auto". It is also possible to pass an iterable of
events, in which case all the events are connected.
Optional argument. If supplied, this event causes a full
recomputation of a new object. Both the data and axes of the new
object are then copied over to the existing out object. Only
useful for signals or other objects that have an attribute
axes_manager. If "auto" and f is a method of a Signal class
instance, the any_axis_changed event of its AxesManager is selected.
Otherwise, the signal's data_changed event is selected.
If None, recompute_out is not connected to any event.
The default is "auto". It is also possible to pass an iterable of
events, in which case all the events are connected.
Load potentially multiple supported files into HyperSpy.
Supported formats: hspy (HDF5), msa, Gatan dm3, Ripple (rpl+raw),
Bruker bcf and spx, FEI ser and emi, SEMPER unf, EMD, EDAX spd/spc, CEOS prz,
tif, and a number of image formats.
Depending on the number of datasets to load in the file, this function will
return a HyperSpy signal instance or list of HyperSpy signal instances.
Any extra keywords are passed to the corresponding reader. For
available options, see their individual documentation.
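For example (the file names below are purely illustrative):

>>> s = hs.load("spectrum.dm3")                    # a single file
>>> signals = hs.load(["map1.hspy", "map2.hspy"])  # a list of files
>>> stack = hs.load("series_*.msa", stack=True)    # wildcard matching, stacked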
The filename to be loaded. If None, a window will open to select
a file to load. If a valid filename is passed, that single
file is loaded. If multiple file names are passed in
a list, a list of objects or a single object containing multiple
datasets, a list of signals or a stack of signals is returned. This
behaviour is controlled by the stack parameter (see below). Multiple
files can be loaded by using simple shell-style wildcards,
e.g. ‘my_file*.msa’ loads all the files that start
by ‘my_file’ and have the ‘.msa’ extension. Alternatively, regular
expression type character classes can be used (e.g. [a-z] matches
lowercase letters). See also the escape_square_brackets parameter.
The acronym that identifies the signal type. May be any signal type
provided by HyperSpy or by installed extensions as listed by
hs.print_known_signal_types(). The value provided may determine the
Signal subclass assigned to the data.
If None (default), the value is read/guessed from the file.
Any other value would override the value potentially stored in the file.
For example, for electron energy-loss spectroscopy use ‘EELS’.
If ‘’ (empty string) the value is not read from the file and is
considered undefined.
Default False. If True and multiple filenames are passed, stacking all
the data into a single object is attempted. All files must match
in shape. If each file contains multiple (N) signals, N stacks will be
created, with the requirement that each file contains the same number
of signals.
If None (default), the signals are stacked over a new axis. The data
must have the same dimensions. Otherwise, the signals are stacked over
the axis given by its integer index or its name. The data must have the
same shape, except in the dimension corresponding to axis.
The name of the new axis (default ‘stack_element’), when axis is None.
If an axis with this name already exists, it automatically appends ‘-i’,
where i are integers, until it finds a name that is not yet in use.
If True, and filenames is a str containing square brackets,
then square brackets are escaped before wildcard matching with
glob.glob(). If False, square brackets are used to represent
character classes (e.g. [a-z] matches lowercase letters).
If integer, this value defines the index of the signal in the signal
list, from which the metadata and original_metadata are taken.
If True, the original_metadata and metadata of each signals
are stacked and saved in original_metadata.stack_elements of the
returned signal. In this case, the metadata are copied from the
first signal in the list.
If False, the metadata and original_metadata are not copied.
Specify the file reader to use when loading the file(s). If None
(default), will use the file extension to infer the file type and
appropriate reader. If str, will select the appropriate file reader
from the list of available readers in HyperSpy. If module, it must
implement the file_reader function, which returns
a dictionary containing the data and metadata for conversion to
a HyperSpy signal.
print_info: bool, optional
For SEMPER unf- and EMD (Berkeley)-files. If True, additional
information read during loading is printed for a quick overview.
Default False.
For Bruker bcf files, if set to an integer (>=2) (default 1),
the bcf is parsed into a down-sampled array by the given integer factor;
multiple values from the original bcf pixels are summed to form each
downsampled pixel. This improves the signal and conserves memory at the
cost of lower resolution.
For Bruker bcf and JEOL files, if set to a numerical value (default is None),
the hypermap is parsed into an array with a depth cutoff at the set energy value.
This conserves memory by cutting off unused spectral tails, or forces
enlargement of the spectra size.
The Bruker bcf reader accepts additional values for semi-automatic cutoff.
The “zealous” value truncates to the last non-zero channel (this option
should not be used for stacks, as low-beam-current EDS can have a different
last non-zero channel per slice).
The “auto” value truncates channels to the SEM/TEM acceleration voltage or
the energy at the last channel, depending on which is smaller.
In case the HV information is not present or the HV is off (0 kV), it falls back
to the full channel range.
If None (default), all data are loaded.
For Bruker bcf and Velox emd files: if one of ‘spectrum_image’, ‘image’
or ‘single_spectrum’, the loader returns either only the spectrum image,
only the images (including EDS map for Velox emd files), or only
the single spectra (for Velox emd files).
Only for Velox emd files: if True (default), the signals from the
different detectors are summed. If False, a distinct signal is returned
for each EDS detector.
Only for Velox emd files: rebin the energy axis by the integer provided
during loading in order to save memory space. Needs to be a multiple of
the length of the energy dimension (default 1).
Only for Velox emd files: set the dtype of the spectrum image data in
order to save memory space. If None, the default dtype from the Velox emd
file is used.
For filetypes which support several datasets in the same file, this
will only load the specified dataset. Several datasets can be loaded
by using a list of strings. Only for EMD (NCEM) and hdf5 (USID) files.
Only for EMD NCEM. Stack datasets of groups with common name. Relevant
for emd file version >= 0.5 where groups can be named ‘group0000’,
‘group0001’, etc.
Only for HDF5 USID files: if True (default), parameters that were varied
non-linearly in the desired dataset will result in Exceptions.
Else, all such non-linearly varied parameters will be treated as
linearly varied parameters and a Signal object will be generated.
Only for FEI emi/ser files in case of series or linescan with the
acquisition stopped before the end: if True, load only the acquired
data. If False, fill empty data with zeros. Default is False and this
default value will change to True in version 2.0.
If None, the signals are stacked over a new axis. The data must
have the same dimensions. Otherwise the signals are stacked over the
axis given by its integer index or its name. The data must have the
same shape, except in the dimension corresponding to axis. If the
stacking axis of the first signal is uniform, it is extended up to the
new length; if it is non-uniform, the axes vectors of all signals are
concatenated along this direction; if it is a FunctionalDataAxis,
it is extended based on the expression of the first signal (and its sub
axis x is handled as above depending on whether it is uniform or not).
The name of the new axis when axis is None.
If an axis with this name already
exists, it automatically appends ‘-i’, where i are integers,
until it finds a name that is not yet in use.
If integer, this value defines the index of the signal in the signal
list, from which the metadata and original_metadata are taken.
If True, the original_metadata and metadata of each signals
are stacked and saved in original_metadata.stack_elements of the
returned signal. In this case, the metadata are copied from the
first signal in the list.
If False, the metadata and original_metadata are not copied.
Get an artificial luminescence signal in wavelength scale (nm, uniform) or
energy scale (eV, non-uniform), simulating luminescence data recorded with a
diffracting spectrometer. Some random noise is also added to the spectrum,
to simulate experimental noise.
Extra keyword arguments are passed to the
Expression component.
Notes
This is an asymmetric lineshape, originally designed for XPS but generally
useful for fitting peaks with low-side tails.
See Doniach S. and Sunjic M., J. Phys. 4C31, 285 (1970)
or http://www.casaxps.com/help_manual/line_shapes.htm for a more detailed
description.
Component function in SymPy text expression format with
substitutions separated by ;. See examples and the SymPy
documentation for details. In order to vary the components along the
signal dimensions, the variables x and y must be included for 1D
or 2D components. Also, if module is “numexpr” the
functions are limited to those that numexpr supports. See its
documentation for details.
The parameter name that defines the position of the component if
applicable. It enables interactive adjustment of the position of the
component in the model. For 2D components, a tuple must be passed
with the name of the two parameters e.g. (“x0”, “y0”).
module : None or str {"numpy" | "numexpr" | "scipy"}, default "numpy"
Module used to evaluate the function. numexpr is often faster but
it supports fewer functions and requires installing numexpr.
If None, "numexpr" will be used if installed.
If None, the rotation center is the center i.e. (0, 0) if position
is not defined, otherwise the center is the coordinates specified
by position. Alternatively a tuple with the (x, y) coordinates
of the center can be provided.
The desired name of a parameter may sometimes coincide with e.g.
the name of a scientific function, which prevents using it in the
expression. rename_parameters is a dictionary to map the name
of the parameter in the expression to the desired name of the
parameter in the Component. For example: {"_gamma": "gamma"}.
If True, compute the gradient automatically using sympy. If sympy
does not support the calculation of the partial derivatives, for
example in case of expression containing a “where” condition,
it can be disabled by using compute_gradients=False.
If True, automatically check if each parameter is linear and set
its corresponding attribute accordingly. If False, the default is to
set all parameters, except for those that are specified in
linear_parameter_list.
Keyword arguments can be used to initialise the value of the
parameters.
Notes
As of version 1.4, Sympy’s lambdify function, that the
Expression
components uses internally, does not support the differentiation of
some expressions, for example those containing a “where” condition.
In such cases, the gradients can be set manually if required.
Examples
The following creates a Gaussian component and sets the initial value
of its parameters:
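A sketch of such a component (the expression and initial values are illustrative):

>>> gaussian = hs.model.components1D.Expression(
...     expression="height * exp(-(x - x0) ** 2 * 4 * log(2) / fwhm ** 2)",
...     name="Gaussian",
...     position="x0",
...     height=1,
...     fwhm=1,
...     x0=0,
...     module="numpy",
... )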
Substitutions for long or complicated expressions are separated by
semicolons:
>>> expr = 'A*B/(A+B) ; A = sin(x)+one; B = cos(y) - two; y = tan(x)'
>>> comp = hs.model.components1D.Expression(
...     expression=expr,
...     name='my function'
... )
>>> comp.parameters
(<Parameter one of my function component>, <Parameter two of my function component>)
Returns a numpy array containing the value of the component for all
indices. If enough memory is available, this is useful to quickly
obtain the fitted component without iterating over the navigation axes.
Area, equals height scaled by \(\sigma\sqrt{(2\pi)}\).
GaussianHF implements the Gaussian function with a height parameter
corresponding to the peak height.
Normalized gaussian function component, with a fwhm parameter
instead of the sigma parameter, and a height parameter instead of
the area parameter A (scaling difference of
\(\sigma \sqrt{\left(2\pi\right)}\)).
This makes the parameter vs. peak maximum independent of \(\sigma\),
and thereby makes locking of the parameter more viable. As long as there
is no binning, the height parameter corresponds directly to the peak
maximum; if not, the value is scaled by a linear constant
(signal_axis.scale).
Returns a numpy array containing the value of the component for all
indices. If enough memory is available, this is useful to quickly
obtain the fitted component without iterating over the navigation axes.
Order of the polynomial, must be different from 0.
**kwargs
Keyword arguments can be used to initialise the value of the
parameters, i.e. a2=2, a1=3, a0=1. Extra keyword arguments are passed
to the Expression component.
Returns a numpy array containing the value of the component for all
indices. If enough memory is available, this is useful to quickly
obtain the fitted component without iterating over the navigation axes.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
Skewness (asymmetry) parameter. For shape=0, the normal
distribution (Gaussian) is obtained. The distribution is
right skewed (longer tail to the right) if shape>0 and is
left skewed if shape<0.
**kwargs
Extra keyword arguments are passed to the
Expression component.
Notes
The properties mean (position), variance, skewness and mode
(position of maximum) are defined for convenience.
\[
pV(x,centre,\sigma) = (1 - \eta) G(x,centre,\sigma)
+ \eta L(x,centre,\sigma)
\]
\[
f(x) =
\begin{cases}
pV(x,centre,\sigma_1), & x \leq centre\\
pV(x,centre,\sigma_2), & x > centre
\end{cases}
\]
Variable      Parameter
\(A\)         A
\(\eta\)      fraction
\(\sigma_1\)  sigma1
\(\sigma_2\)  sigma2
\(centre\)    centre
Notes
This is a Voigt function in which the upstream and downstream variance or
sigma is allowed to vary to create an asymmetric profile.
In this case the Voigt is a pseudo-Voigt consisting of a
mixed Gaussian and Lorentzian sum.
Returns a numpy array containing the value of the component for all
indices. If enough memory is available, this is useful to quickly
obtain the fitted component without iterating over the navigation axes.
Symmetric peak shape based on the convolution of a Lorentzian and Normal
(Gaussian) distribution:
\[f(x) = G(x) \ast L(x)\]
where \(G(x)\) is the Gaussian function and \(L(x)\) is the
Lorentzian function. In this case using an approximate formula by David
(see Notes). This approximation improves on the pseudo-Voigt function
(linear combination instead of convolution of the distributions) and is,
to a very good approximation, equivalent to a Voigt function:
\[\begin{split}z(x) &= \frac{x + i \gamma}{\sqrt{2} \sigma} \\
w(z) &= \frac{e^{-z^2} \text{erfc}(-i z)}{\sqrt{2 \pi} \sigma} \\
f(x) &= A \cdot \Re\left\{ w \left[ z(x - x_0) \right] \right\}\end{split}\]
\(2 \sigma \sqrt{(2 \log(2))}\) = FWHM of the Gaussian distribution.
**kwargs
Extra keyword arguments are passed to the
Expression component.
Notes
For convenience the gwidth and lwidth attributes can also be used to
set and get the FWHM of the Gaussian and Lorentzian parts of the
distribution, respectively. For backwards compatibility, FWHM is another
alias for the Gaussian width.
W.I.F. David, J. Appl. Cryst. (1986). 19, 63-64,
doi:10.1107/S0021889886089999
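A minimal sketch, assuming the centre, area, sigma and gamma parameters and the gwidth/lwidth convenience attributes described above:
>>> import hyperspy.api as hs
>>> v = hs.model.components1D.Voigt(centre=5.0, area=1.0, sigma=0.2, gamma=0.1)
>>> v.gwidth = 0.4  # set the FWHM of the Gaussian part; updates sigma
>>> v.lwidth = 0.3  # set the FWHM of the Lorentzian part; updates gamma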
Set the facecolor(s) of the markers. It can be a color
(all patches have same color), or a sequence of colors;
if it is a sequence the patches will cycle through the sequence.
If c is ‘none’, the patch will not be filled.
The units in which majors and minors are given; "width" and
"height" refer to the dimensions of the axes, while "x" and "y"
refer to the offsets data units. "xy" differs from all others in
that the angle as plotted varies with the aspect ratio, and equals
the specified angle only when the aspect ratio is unity. Hence
it behaves the same as the matplotlib.patches.Ellipse with
axes.transData as its transform.
Keyword arguments passed to the underlying marker collection. Any argument
that is array-like and has dtype=object is assumed to be an iterating
argument and is treated as such.
Must be an array with shape (n, 2, 2), or a ragged array holding such an
array at every navigation position.
Defines the line segments [[[x1, y1], [x2, y2]], …].
Unlike markers using the offsets argument, the positions of the segments
are defined by the segments argument and the transform specifying the
coordinate system of the segments is transform.
The markers are defined by a set of arguments required by the collection;
typically, offsets, verts or segments will define their
positions.
To define a non-static marker, any argument that can be set with the
matplotlib.collections.Collection.set() method can be passed
to the constructor as an array with dtype=object and the same size as
the navigation axes of the signal the markers will be added to.
offset_transform defines the transformation used for the
offsets values and transform defines the transformation for
other arguments, typically to scale the size of the Path.
It can be one of the following:
"data": the offsets are defined in data coordinates and the ax.transData transformation is used.
"relative": The offsets are defined in data coordinates in x and coordinates in y relative to the
data plotted. Only for 1D figure.
"axes": the offsets are defined in axes coordinates and the ax.transAxes transformation is used.
(0, 0) is bottom left of the axes, and (1, 1) is top right of the axes.
"display": the offsets are not transformed, i.e. are defined in the display coordinate system.
(0, 0) is the bottom left of the window, and (width, height) is top right of the output in “display units”
matplotlib.transforms.IdentityTransform.
Only for offset_transform="relative". This applies a systematic
shift in the y component of the offsets values. The shift is
defined in the matplotlib "axes" coordinate system.
This provides a constant shift from the data for labeling
a Signal1D.
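As a sketch of a navigation-dependent marker built this way (assuming the Points marker and the dtype=object convention described above):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> rng = np.random.default_rng(0)
>>> s = hs.signals.Signal2D(np.ones((5, 100, 100)))  # one navigation axis of size 5
>>> # One (n, 2) offsets array per navigation position, wrapped in a dtype=object array
>>> offsets = np.empty(s.axes_manager.navigation_shape, dtype=object)
>>> for index in np.ndindex(offsets.shape):
...     offsets[index] = rng.random((10, 2)) * 100
>>> m = hs.plot.markers.Points(offsets=offsets, sizes=10)
>>> s.plot()
>>> s.add_marker(m)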
Set the array of the matplotlib.cm.ScalarMappable of the
matplotlib collection.
The ScalarMappable array will overwrite facecolor and
edgecolor. Default is None.
Keyword arguments passed to the underlying marker collection. Any argument
that is array-like and has dtype=object is assumed to be an iterating
argument and is treated as such.
If True, the figure is rendered after removing the marker.
If False, the figure is not rendered after removing the marker.
This is useful when many markers are removed from a figure,
since rendering the figure after removing each marker will slow
things down.
Initialize a marker collection from a hyperspy Signal.
Parameters:
signal: :class:`~.api.signals.BaseSignal`
A value passed to the Collection as {key:signal.data}
key: str or None
The key used to create a key value pair to create the subclass of
matplotlib.collections.Collection. If None (default)
the key is set to "offsets".
signal_axes: str, tuple of :class:`~.axes.UniformDataAxis` or None
If "metadata" look for signal_axes saved in metadata under
s.metadata.Peaks.signal_axes and convert from pixel positions
to real units before creating the collection. If a tuple of
signal axes, those axes will be used; otherwise (None)
no transformation will happen.
If True, will render the figure after adding the marker.
If False, the marker will be added to the plot, but the figure
will not be rendered. This is useful when plotting many markers,
since rendering the figure after adding each marker will slow
things down.
>>> rng = np.random.default_rng(0)
>>> s = hs.signals.Signal2D(np.ones((100, 100)))
>>> # Define the size of the circles
>>> sizes = rng.random((10,)) * 10 + 20
>>> # Define the position of the circles
>>> offsets = rng.random((10, 2)) * 100
>>> m = hs.plot.markers.Circles(
...     sizes=sizes,
...     offsets=offsets,
...     linewidth=2,
... )
>>> s.plot()
>>> s.add_marker(m)
>>> m.set_ScalarMappable_array(sizes.ravel() / 2)
>>> cbar = m.plot_colorbar()
>>> cbar.set_label('Circle radius')
The units in which majors and minors are given; "width" and
"height" refer to the dimensions of the axes, while "x" and "y"
refer to the offsets data units. "xy" differs from all others in
that the angle as plotted varies with the aspect ratio, and equals
the specified angle only when the aspect ratio is unity. Hence
it behaves the same as the matplotlib.patches.Ellipse with
axes.transData as its transform.
The verts define the vertices of the polygons. Note that this can be
a ragged list and as such it is not automatically cast to a numpy
array, as that would result in an array of objects.
In the form [[[x1, y1], [x2, y2], … [xn, yn]], [[x1, y1], [x2, y2], … [xm, ym]], …].
Unlike markers using the offsets argument, the positions of the polygons
are defined by the verts argument and the transform specifying the
coordinate system of the verts is transform.
Examples
>>> import hyperspy.api as hs
>>> import numpy as np
>>> # Create a Signal2D with 2 navigation dimensions
>>> data = np.ones((25, 25, 100, 100))
>>> s = hs.signals.Signal2D(data)
>>> polygon1 = [[1, 1], [20, 20], [1, 20], [25, 5]]
>>> polygon2 = [[50, 60], [90, 40], [60, 40], [23, 60]]
>>> verts = [polygon1, polygon2]
>>> # Create the markers
>>> m = hs.plot.markers.Polygons(
...     verts=verts,
...     linewidth=3,
...     facecolors=('g',),
... )
>>> # Add the marker to the signal
>>> s.plot()
>>> s.add_marker(m)
class hyperspy.api.plot.markers.Rectangles(offsets, widths, heights, angles=0, offset_transform='data', units='xy', **kwargs)
The units in which majors and minors are given; "width" and
"height" refer to the dimensions of the axes, while "x" and "y"
refer to the offsets data units. "xy" differs from all others in
that the angle as plotted varies with the aspect ratio, and equals
the specified angle only when the aspect ratio is unity. Hence
it behaves the same as the matplotlib.patches.Ellipse with
axes.transData as its transform.
kwargs:
Additional keyword arguments are passed to
hyperspy.external.matplotlib.collections.RectangleCollection.
The units in which majors and minors are given; "width" and
"height" refer to the dimensions of the axes, while "x" and "y"
refer to the offsets data units. "xy" differs from all others in
that the angle as plotted varies with the aspect ratio, and equals
the specified angle only when the aspect ratio is unity. Hence
it behaves the same as the matplotlib.patches.Ellipse with
axes.transData as its transform.
kwargs:
Additional keyword arguments are passed to
hyperspy.external.matplotlib.collections.SquareCollection.
Set the facecolor(s) of the markers. It can be a color
(all patches have same color), or a sequence of colors;
if it is a sequence the patches will cycle through the sequence.
If c is ‘none’, the patch will not be filled.
Keyword arguments passed to the underlying marker collection. Any argument
that is array-like and has dtype=object is assumed to be an iterating
argument and is treated as such.
Examples
>>> import hyperspy.api as hs
>>> import numpy as np
>>> # Create a Signal1D with 2 navigation dimensions
>>> rng = np.random.default_rng(0)
>>> data = rng.random((25, 25, 100))
>>> s = hs.signals.Signal1D(data)
>>> offsets = np.array([10, 20, 40])
>>> # Create the markers
>>> m = hs.plot.markers.VerticalLines(
...     offsets=offsets,
...     linewidth=3,
...     colors=['r', 'g', 'b'],
... )
>>> # Add the marker to the signal
>>> s.plot()
>>> s.add_marker(m)
Sets the color of the lines of the plots. For a list, if its length is
less than the number of spectra to plot, the colors will be cycled.
If None, use default matplotlib color cycle.
linestyle : None, (list of) matplotlib line style, optional
The main line styles are '-', '--', '-.', ':'.
For a list, if its length is less than the number of
spectra to plot, linestyle will be cycled.
If None, use continuous lines (same as '-').
images should be a list of Signals to plot. For
BaseSignal with navigation dimensions 2 and
signal dimension 0, the signal will be transposed to form a Signal2D.
Multi-dimensional images will have each plane plotted as a separate
image. If any of the signal shapes is not suitable, a ValueError will be
raised.
The colormap used for the images; by default uses the color map signal
setting from the plot preferences. A list of colormaps can
also be provided, and the images will cycle through them. Optionally,
the value 'mpl_colors' will cause the cmap to loop through the
default matplotlib colors (to match with the default output of the
plot_spectra() method).
Note: if using more than one colormap, using the 'single'
option for colorbar is disallowed.
Control the title labeling of the plotted images.
If None, no titles will be shown.
If ‘auto’ (default), function will try to determine suitable titles
using Signal2D titles, falling back to the ‘titles’ option if no good
short titles are detected.
Works best if all images to be plotted have the same beginning
to their titles.
If ‘titles’, the title from each image’s metadata.General.title
will be used.
If any other single str, images will be labeled in sequence using
that str as a prefix.
If a list of str, the list elements will be used to determine the
labels (repeated, if necessary).
Integer specifying the number of characters that will be used on
one line.
If the function returns an unexpected blank figure, lower this
value to reduce overlap of the labels between figures.
Controls the type of colorbars that are plotted, incompatible with
overlay=True.
If ‘default’, same as ‘multi’ when overlay=False, otherwise same
as None.
If ‘multi’, individual colorbars are plotted for each (non-RGB) image.
If ‘single’, all (non-RGB) images are plotted on the same scale,
and one colorbar is shown for all.
If None, no colorbar is plotted.
If True, the centre of the color scheme is set to zero. This is
particularly useful when using diverging color schemes. If ‘auto’
(default), diverging color schemes are automatically centred.
If None (or False), no scalebars will be added to the images.
If ‘all’, scalebars will be added to all images.
If list of ints, scalebars will be added to each image specified.
Controls how the axes are displayed on each image; default is ‘all’.
If ‘all’, both ticks and axis labels will be shown.
If ‘ticks’, no axis labels will be shown, but ticks/labels will.
If ‘off’, all decorations and frame will be disabled.
If None, no axis decorations will be shown, but ticks/frame will.
This parameter controls the spacing between images.
If None, default options will be used.
Otherwise, supply a dictionary with the spacing options as
keywords and desired values as values.
Values should be supplied as used in
matplotlib.pyplot.subplots_adjust(),
and can be ‘left’, ‘bottom’, ‘right’, ‘top’, ‘wspace’ (width) and
‘hspace’ (height).
If True, hyperspy will attempt to improve image placement in
the figure using matplotlib’s tight_layout.
If False, repositioning images inside the figure will be left as
an exercise for the user.
If ‘auto’, aspect ratio is auto determined, subject to min_asp.
If ‘square’, image will be forced onto square display.
If ‘equal’, aspect ratio of 1 will be enforced.
If float or int, the given value will be used.
Threshold to use for auto-labeling. This parameter controls how
much of the titles must be the same for the auto-shortening of
labels to activate. Can vary from 0 to 1. Smaller values
encourage shortening of titles by auto-labeling, while larger
values will require more overlap in titles before activating the
auto-label code.
If set, the images will be plotted to an existing matplotlib figure.
vmin, vmax: scalar, str, None
If str, formatted as ‘xth’, use this value to calculate the percentage
of pixels that are left out of the lower and upper bounds.
For example, for a vmin of ‘1th’, 1% of the lowest values will be ignored to
estimate the minimum value. Similarly, for a vmax value of ‘1th’, 1%
of the highest values will be ignored in the estimation of the maximum
value. It must be in the range [0, 100].
See numpy.percentile() for more explanation.
If None, use the percentiles value set in the preferences.
If float or integer, keep this value as bounds.
Note: vmin is ignored when overlaying images.
If None (default), the size of the figure is taken from the
matplotlib rcParams. Otherwise sets the size of the figure when
plotting an overlay image. The higher the number the larger the figure
and therefore a greater number of pixels are used. This value will be
ignored if a Figure is provided.
interpolation is a useful parameter to provide as a keyword
argument to control how the space between pixels is interpolated. A
value of 'nearest' will cause no interpolation between pixels.
tight_layout is known to be quite brittle, so an option is provided
to disable it. Turn this option off if output is not as expected,
or try adjusting label, labelwrap, or per_row.
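Putting several of the options above together, a sketch of a typical call might look like the following (random test images, parameter names as documented above):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> images = [hs.signals.Signal2D(np.random.random((64, 64))) for _ in range(4)]
>>> hs.plot.plot_images(
...     images,
...     per_row=2,
...     label='auto',
...     colorbar='single',
...     axes_decor='off',
...     tight_layout=True,
... )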
Color of the ROIs. Any string supported by matplotlib to define a color
can be used. The length of the list must be equal to the number ROIs.
If None (default), the default matplotlib colors are used.
Only for signals with navigation dimension of 2. Define the colormap of the map(s).
If string, any string supported by matplotlib to define a colormap can be used
and a colored frame matching the ROI color will be added to the map.
If list of str, it must be the same length as the ROIs.
If None (default), the colors from the color argument are used and no colored
frame is added.
Whether to plot on a single figure or several figures.
If True, plot_images() or plot_spectra()
will be used, depending on the navigation dimension of the signal.
Only when single_figure=True. Keyword arguments are passed to
plot_images() or plot_spectra(),
depending on the navigation dimension of the signal.
If None, default kwargs are used with the following changes:
scalebar=[0], axes_decor="off" and suptitle="".
If the data sliced by the ROI contains numpy.nan, numpy.nansum()
will be used instead of numpy.sum() at the cost of a speed penalty (more than
2 times slower).
Plotting ROI maps on a single figure is slower than on separate figures.
Examples
3D hyperspectral data
For 3D hyperspectral data, the ROIs used will be instances of
SpanROI. Therefore, these ROIs can be used to select
particular spectral ranges, e.g. a particular peak.
The map generated for a given ROI is therefore the sum of this spectral
region at each point in the hyperspectral map. Therefore, regions of the
sample where this peak is bright will be bright in this map.
4D STEM
For 4D STEM data, by default, the ROIs used will be instances of
RectangularROI. Other hyperspy ROIs, such as
CircleROI can be used. These ROIs can be used
to select particular regions in reciprocal space, e.g. a particular
diffraction spot.
The map generated for a given ROI is the intensity of this
region at each point in the scan. Therefore, regions of the
scan where a particular spot is intense will appear bright.
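As a sketch for the 3D hyperspectral case (assuming the hs.plot.plot_roi_map signature with a rois argument giving the number of ROIs):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> # 10 x 10 navigation positions, 50-channel spectra
>>> s = hs.signals.Signal1D(np.random.random((10, 10, 50)))
>>> hs.plot.plot_roi_map(s, rois=2)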
If True (default), the signals will share navigation. All the signals
must have the same navigation shape for this to work, but not
necessarily the same signal shape.
Set different navigator options for the signals. Must use valid
navigator arguments: ‘auto’, None, ‘spectrum’, ‘slider’, or a
HyperSpy Signal. The list must have the same size as signal_list.
If None, the argument specified in navigator will be used.
Ordered spectra list of signal to plot. If style is “cascade” or
“mosaic”, the spectra can have different size and axes.
For BaseSignal with navigation dimensions 1
and signal dimension 0, the signal will be transposed to form a
Signal1D.
The style of the plot: ‘overlap’ (default), ‘cascade’, ‘mosaic’, or
‘heatmap’.
color : None or (list of) matplotlib color, default None
Sets the color of the lines of the plots (no action on ‘heatmap’).
For a list, if its length is less than the number of spectra to plot,
the colors will be cycled. If None (default), use default matplotlib
color cycle.
linestyle : None or (list of) matplotlib line style, default None
Sets the line style of the plots (no action on ‘heatmap’).
The main line styles are '-', '--', '-.', ':'.
For a list, if its length is less than the number of
spectra to plot, linestyle will be cycled.
If None, use continuous lines (same as '-').
drawstyle : str, default ‘default’
The drawstyle determines how the points are connected, no action with
style='heatmap'. See
matplotlib.lines.Line2D.set_drawstyle() for more information.
The 'default' value is defined by matplotlib.
Option for “cascade”. 1 guarantees that there is no overlapping.
However, in many cases, a value between 0 and 1 can produce a tighter
plot without overlapping. Negative values have the same effect but
reverse the order of the spectra without reversing the order of the
colors.
If a list of strings, a legend for ‘cascade’ or titles for ‘mosaic’ are
displayed. If ‘auto’, the title of each spectrum (metadata.General.title)
is used. Default None.
If True, the plot will update when the data are changed. Only supported
with style=’overlap’ and a list of signals with navigation dimension 0.
If None (default), update the plot only for style=’overlap’.
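A short sketch combining a few of these options (three random spectra, cascade style with a legend):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> spectra = [hs.signals.Signal1D(np.random.random(100)) for _ in range(3)]
>>> ax = hs.plot.plot_spectra(
...     spectra,
...     style='cascade',
...     padding=1,
...     legend='auto',
... )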
Selects a circular or annular region in a 2D space. The coordinates of
the center of the circle are stored in the ‘cx’ and ‘cy’ attributes and the
radius in the ‘r’ attribute. If an internal radius is defined using the
r_inner attribute, then an annular region is selected instead.
CircleROI can be used in place of a tuple containing (cx, cy, r), or (cx,
cy, r, r_inner) when r_inner is not None.
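A minimal sketch of using such an ROI interactively on a Signal2D (coordinates in calibrated axis units, here the default of 1 unit per pixel):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal2D(np.random.random((128, 128)))
>>> roi = hs.roi.CircleROI(cx=60, cy=60, r=20)
>>> s.plot()
>>> s_circle = roi.interactive(s)  # adds a widget to the plot and returns the sliced signal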
Sets up events.changed event, and inits HasTraits.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
Selects a line of a given width in 2D space. The coordinates of the end points of the line are stored in the x1, y1, x2, y2 parameters.
The length is available in the length parameter and the method angle computes the angle of the line with the axes.
Line2DROI can be used in place of a tuple containing the coordinates of the two end-points of the line and the linewidth (x1, y1, x2, y2, linewidth).
Sets up events.changed event, and inits HasTraits.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
Selects a range in a 2D space. The coordinates of the range in
the 2D space are stored in the traits 'left', 'right', 'top' and 'bottom'.
Convenience properties 'x', 'y', 'width' and 'height' are also available,
but cannot be used for initialization.
RectangularROI can be used in place of a tuple containing (left, right, top, bottom).
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
A SAMFire strategy that operates in “parameter space” - i.e. the pixel
positions are not important, and only parameter value distributions are
segmented to be used as starting point estimators.
A SAMFire strategy that operates in “pixel space” - i.e. it calculates the
starting point estimates based on the local averages of the pixels.
Requires some weighting method (e.g. reduced chi-squared).
If True, all but the given_pixels will be recalculated. Used when
part of already calculated results has to be refreshed.
If False, only use pixels with marker == -scale (by default -1) to
propagate to pixels with marker >= 0. This allows “ignoring” pixels
with marker < -scale (e.g. -2).
Returns the current starting value estimates for the given pixel.
Calculated as the weighted local average. Only returns components that
are active, and parameters that are free.
If True, all but the given_pixels will be recalculated. Used when
part of already calculated results has to be refreshed.
If False, only use pixels with marker == -scale (by default -1) to
propagate to pixels with marker >= 0. This allows “ignoring” pixels
with marker < -scale (e.g. -2).
Returns the current starting value estimates for the given pixel.
Calculated as the weighted local average. Only returns components that
are active, and parameters that are free.
Creates and manages a pool of SAMFire workers. Based on
ParallelPool - it either creates processes using multiprocessing, or connects to
and sets up an ipyparallel load_balanced_view.
ipyparallel is managed directly, but the multiprocessing pool is managed via
three queues:
Shared by all (master and workers) for distributing “load-balanced”
work.
Shared by all (master and workers) for sending results back to the
master
Individual queues from master to each worker. For setting up and
addressing individual workers in general. This one is checked with
higher priority in workers.
The timestep between “ticks” that the result queues are checked. Higher
timestep means less frequent checking, which may reduce CPU load for
difficult fits that take a long time to finish.
Keyword currently can be one of [‘pong’, ‘Error’, ‘result’]. For
each of the keywords, “the_rest” is a tuple of different elements,
but generally the first one is always the worker_id that the result
came from. In particular:
Run the full process of adding jobs to the processing queue,
listening to the results and updating SAMFire as needed. Stops when
timed out or no pixels are left to run.
Run the full procedure until no more pixels are left to run in the
SAMFire.
For generic data with arbitrary signal_dimension. All other signal
classes inherit from this one. It should only be used when none of
the others is appropriate.
Signal1D
For generic data with signal_dimension equal 1, i.e. spectral data of
n-dimensions. The signal is unbinned by default.
Signal2D
For generic data with signal_dimension equal 2, i.e. image data of
n-dimensions. The signal is unbinned by default.
ComplexSignal
For generic complex data with arbitrary signal_dimension.
ComplexSignal1D
For generic complex data with signal_dimension equal 1, i.e. spectral
data of n-dimensions. The signal is unbinned by default.
ComplexSignal2D
For generic complex data with signal_dimension equal 2, i.e. image
data of n-dimensions. The signal is unbinned by default.
A dictionary containing a set of parameters
that will be stored in the original_metadata attribute. It
typically contains all the parameters that have been
imported from the original data file.
The transpose of the signal, with signal and navigation spaces
swapped. Enables calling
transpose() with the default
parameters as a property of a Signal.
The operation is performed in-place (i.e. the data of the signal
is modified). This method requires the signal to have a float data type,
otherwise it will raise a TypeError.
The marker or iterable (list, tuple, …) of markers to add.
See the Markers section in the User Guide if you want
to add a large number of markers as an iterable, since this will
be much faster. For signals with navigation dimensions,
the markers can be made to change for different navigation
indices. See the examples for info.
If False, the marker will only appear in the current
plot. If True, the marker will be added to the
metadata.Markers list, and be plotted with
plot(plot_markers=True). If the signal is saved as a HyperSpy
HDF5 file, the markers will be stored in the HDF5 signal and be
restored when the file is loaded.
If True, keep the original data type of the signal data. For
example, if the data type was initially 'float64', the result of
the operation (usually 'int64') will be converted to
'float64'.
Only used if window='hann'
If integer n is provided, a Hann window of n-th order will be
used. If None, a first order Hann window is used.
Higher orders result in more homogeneous intensity distribution.
Shape parameter of the Tukey window, representing the
fraction of the window inside the cosine tapered region. If
zero, the Tukey window is equivalent to a rectangular window.
If one, the Tukey window is equivalent to a Hann window.
The chosen spectral axis is moved to the last index in the
array and the data is made contiguous for efficient iteration over
spectra. By default, the method ensures the data is stored optimally,
hence often making a copy of the data. See
transpose() for a more general
method with more options.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
If True, the location of the data in memory is optimised for the
fastest iteration over the navigation axes. This operation can
cause a peak of memory usage and requires considerable processing
times for large datasets and/or low specification hardware.
See the Transposing (changing signal spaces) section of the HyperSpy user guide
for more information. When operating on lazy signals, if True,
the chunks are optimised for the new axes configuration.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
If True, the location of the data in memory is optimised for the
fastest iteration over the navigation axes. This operation can
cause a peak of memory usage and requires considerable processing
times for large datasets and/or low specification hardware.
See the Transposing (changing signal spaces) section of the HyperSpy user guide
for more information. When operating on lazy signals, if True,
the chunks are optimised for the new axes configuration.
algorithm : {… | "CuBICA" | "TDSEP"} or object, default “sklearn_fastica”
The BSS algorithm to use. If algorithm is an object,
it must implement a fit_transform() method or fit() and
transform() methods, in the same manner as a scikit-learn estimator.
If None and on_loadings is False, when diff_order is greater than 1
and signal_dimension is greater than 1, the differences are calculated
across all signal axes.
If None and on_loadings is True, when diff_order is greater than 1
and navigation_dimension is greater than 1, the differences are calculated
across all navigation axes.
Factors to decompose. If None, the BSS is performed on the
factors of a previous decomposition. If a Signal instance, the
navigation dimension must be 1 and the size greater than 1.
If not None, the signal locations marked as True are masked. The
mask shape must be equal to the signal shape
(navigation shape) when on_loadings is False (True).
Use either the factors or the loadings to determine if the
component needs to be reversed.
whiten_method : {"PCA" | "ZCA"} or None, default “PCA”
How to whiten the data prior to blind source separation.
If None, no whitening is applied. See whiten_data()
for more details.
return_info: bool, default False
The result of the decomposition is stored internally. However,
some algorithms generate some extra information that is not
stored. If True, return any extra information if available.
In the case of sklearn.decomposition objects, this includes the
sklearn Estimator object.
If True, print information about the decomposition being performed.
In the case of sklearn.decomposition objects, this includes the
values of all arguments of the chosen sklearn algorithm.
Typecode string or data-type to which the Signal’s data array is
cast. In addition to all the standard numpy Data type objects (dtype),
HyperSpy supports four extra dtypes for RGB images: 'rgb8',
'rgba8', 'rgb16', and 'rgba16'. Changing from and to
any rgb(a) dtype is more constrained than most other dtype
conversions. To change to an rgb(a) dtype,
the signal_dimension must be 1, and its size should be 3 (for
rgb) or 4 (for rgba) dtypes. The original dtype
should be uint8 or uint16 if converting to rgb(a)8
or rgb(a)16, and the navigation_dimension should be at
least 2. After conversion, the signal_dimension becomes 2. The
dtype of images with original dtype rgb(a)8 or rgb(a)16
can only be changed to uint8 or uint16, and the
signal_dimension becomes 1.
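A short sketch of the rgb conversion rules described above (uint8 data, signal dimension 1 with size 3):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> data = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
>>> s = hs.signals.Signal1D(data)  # navigation (32, 32), signal size 3
>>> s.change_dtype('rgb8')  # the signal_dimension becomes 2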
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Cluster analysis of a signal or decomposition results of a signal.
Results are stored in learning_results.
Parameters:
cluster_source : str {"bss" | "decomposition" | "signal"} or BaseSignal
If “bss” the blind source separation results are used.
If “decomposition” the decomposition results are used.
If “signal” the signal data is used.
Note that using the signal or a BaseSignal can be memory intensive
and is only recommended if the signal dimension is small.
A BaseSignal must have the same navigation dimensions as the signal.
source_for_centers : None, str {"decomposition" | "bss" | "signal"} or BaseSignal, default None
If None the cluster_source is used.
If “bss” the blind source separation results are used.
If “decomposition” the decomposition results are used.
If “signal” the signal data is used.
A BaseSignal must have the same navigation dimensions as the signal.
preprocessing : str {"standard" | "norm" | "minmax"}, None or object, default ‘norm’
Cluster analysis requires scaling the data to be clustered to
similar ranges. Standard preprocessing
adjusts each feature to have uniform variation. Norm preprocessing
treats the set of features like a vector and
each measurement is scaled to length 1.
You can also pass a scikit-learn preprocessing object, for example:
scale_method = sklearn.preprocessing.StandardScaler()
preprocessing = scale_method
See preprocessing methods in scikit-learn preprocessing for further
details. If object, must be sklearn.preprocessing-like.
If you are getting the cluster centers using the decomposition
results (cluster_source_for_centers=”decomposition”) you can define how
many components to use. If set to None the method uses the
estimate of significant components found in the decomposition step
using the elbow method and stored in the
learning_results.number_significant_components attribute.
This applies to both bss and decomposition results.
The signal locations marked as True are not used in the
clustering for “signal” or Signals supplied as cluster source.
This is not applied to decomposition results or source_for_centers
(as it may be a different shape to the cluster source)
The result of the cluster analysis is stored internally. However,
the cluster class used contains a number of attributes.
If True (the default is False),
return the cluster object so the attributes can be accessed.
Additional parameters passed to the clustering class for initialization.
For example, in case of the “kmeans” algorithm, n_init can be
used to define the number of times the algorithm is restarted to
optimize results.
If return_info is True, returns the scikit-learn cluster object
used for clustering. Useful if you wish to examine inertia or other outputs.
Other Parameters:
int
Number of clusters to find using one of the pre-defined methods
“kmeans”, “agglomerative”, “minibatchkmeans”, “spectralclustering”.
See sklearn.cluster for details
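Assembled from the parameters above, a sketch of a typical workflow (decomposition followed by clustering on the decomposition results; n_clusters is passed through to the scikit-learn estimator):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.random.random((20, 20, 100)))
>>> s.decomposition(output_dimension=5)
>>> s.cluster_analysis(
...     cluster_source='decomposition',
...     preprocessing='norm',
...     algorithm='kmeans',
...     n_clusters=3,
... )
>>> labels = s.get_cluster_labels()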
Return a “shallow copy” of this Signal using the
standard library’s copy() function. Note: this will
return a copy of the signal, but it will not duplicate the underlying
data in memory, and both Signals will reference the same data.
Specify the data axis in which to perform the cropping
operation. The axis can be specified using the index of the
axis in axes_manager or the axis name.
The beginning of the cropping interval. If type is int,
the value is taken as the axis index. If type is float the index
is calculated using the axis calibration. If start/end is
None the method crops from/to the low/high end of the axis.
The end of the cropping interval. If type is int,
the value is taken as the axis index. If type is float the index
is calculated using the axis calibration. If start/end is
None the method crops from/to the low/high end of the axis.
algorithm : {… | "mini_batch_sparse_pca" | "RPCA" | "ORPCA" | "ORNMF"} or object, default “SVD”
The decomposition algorithm to use. If algorithm is an object,
it must implement a fit_transform() method or fit() and
transform() methods, in the same manner as a scikit-learn estimator.
For cupy arrays, only “SVD” is supported.
If None, ignored
If callable, applies the function to the data to obtain var_array.
Only used by the “MLPCA” algorithm.
If numpy array, creates var_array by applying a polynomial function
defined by the array of coefficients to the data. Only used by
the “MLPCA” algorithm.
reproject : None or str {“signal”, “navigation”, “both”}, default None
If not None, the results of the decomposition will be projected in
the selected masked area.
return_info: bool, default False
The result of the decomposition is stored internally. However,
some algorithms generate some extra information that is not
stored. If True, return any extra information if available.
In the case of sklearn.decomposition objects, this includes the
sklearn Estimator object.
If True, print information about the decomposition being performed.
In the case of sklearn.decomposition objects, this includes the
values of all arguments of the chosen sklearn algorithm.
If "auto": the solver is selected by a default policy based on data.shape and
output_dimension: if the input data is larger than 500x500 and the
number of components to extract is lower than 80% of the smallest
dimension of the data, then the more efficient "randomized"
method is enabled. Otherwise the exact full SVD is computed and
optionally truncated afterwards.
If "full": run exact SVD, calling the standard LAPACK solver via
scipy.linalg.svd(), and select the components by postprocessing
If "arpack": use truncated SVD, calling ARPACK solver via
scipy.sparse.linalg.svds(). It strictly requires
0<output_dimension<min(data.shape)
If True, stores a copy of the data before any pre-treatments
such as normalization in s._data_before_treatments. The original
data can then be restored by calling s.undo_treatments().
If False, no copy is made. This can be beneficial for memory
usage, but care must be taken since data will be overwritten.
Return a “deep copy” of this Signal using the
standard library’s deepcopy() function. Note: this means
the underlying data structure will be duplicated in memory.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Note that the size of the data on the given axis decreases by
the given order, i.e. if axis is "x", order is
2 and the x dimension is N, then the derivative’s x dimension is N - 2.
Returns a signal with the n-th order discrete difference along
given axis. i.e. it calculates the difference between consecutive
values in the given axis: out[n]=a[n+1]-a[n]. See
numpy.diff() for more details.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Note that the size of the data on the given axis decreases by
the given order, i.e. if axis is "x", order is
2 and the x dimension is N, the difference’s x dimension is N - 2.
If you intend to calculate the numerical derivative, please use the
proper derivative() function
instead. To avoid erroneous misuse of the diff function as a derivative,
it raises an error when working with a non-uniform axis.
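For instance, on a uniform axis the two operations compare as follows (a sketch):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.arange(100, dtype=float).reshape(10, 10))
>>> s.diff(axis=-1)        # discrete difference: signal size decreases by the order
>>> s.derivative(axis=-1)  # numerical derivative, expressed in the calibrated axis units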
Estimate the elbow position of a scree plot curve.
Used to estimate the number of significant components in
a PCA variance ratio plot or other “elbow” type curves.
Find a line between first and last point on the scree plot.
With a classic elbow scree plot, this line more or less
defines a triangle. The elbow should be the point which
is the furthest distance from this line. For more details,
see [1].
Explained variance ratio values that form the scree plot.
If None, uses the explained_variance_ratio array stored
in s.learning_results, so a decomposition must have
been performed first.
V. Satopää, J. Albrecht, D. Irwin, and B. Raghavan,
“Finding a ‘Kneedle’ in a Haystack: Detecting Knee Points in
System Behavior”, 31st International Conference on Distributed
Computing Systems Workshops, pp. 166-171, June 2011.
Performs cluster analysis of a signal for cluster sizes ranging from
n_clusters = 2 to max_clusters (default 12).
Note that this can be a slow process for large datasets so please
consider reducing max_clusters in this case.
For each cluster size it evaluates the silhouette score, which is a metric of
how well separated the clusters are. Maxima or peaks in the scores
indicate good choices for cluster sizes.
Parameters:
cluster_source : str {“bss”, “decomposition”, “signal”} or BaseSignal
If “bss” the blind source separation results are used.
If “decomposition” the decomposition results are used.
If “signal” the signal data is used.
Note that using the signal can be memory intensive
and is only recommended if the signal dimension is small.
An input Signal must have the same navigation dimensions as the
signal instance.
Max number of clusters to use. The method will scan from 2 to
max_clusters.
preprocessing : str {“standard”, “norm”, “minmax”} or object, default ‘norm’
Cluster analysis requires scaling the data to be clustered to
similar ranges. Standard preprocessing
adjusts each feature to have uniform variation. Norm preprocessing
treats the set of features like a vector and
each measurement is scaled to length 1.
You can also pass an instance of a sklearn preprocessing module.
See preprocessing methods in scikit-learn preprocessing for further
details. If object, must be sklearn.preprocessing-like.
If you are getting the cluster centers using the decomposition
results (cluster_source_for_centers=”decomposition”) you can define how
many PCA components to use. If set to None the method uses the
estimate of significant components found in the decomposition step
using the elbow method and stored in the
learning_results.number_significant_components attribute.
Use distance (elbow), silhouette analysis or gap statistics to estimate
the optimal number of clusters.
Gap is believed to be, overall, the best metric but it’s also
the slowest. Elbow measures the distances between points in
each cluster as an estimate of how well grouped they are and
is the fastest metric.
For elbow the optimal k is the knee or elbow point.
For gap the optimal k is the first k for which gap(k) >= gap(k+1) - std_error.
For silhouette the optimal k will be one of the “maxima” found with
this method.
Number of references to use in the gap statistics method.
Gap statistics compares the results from clustering the data to
clustering uniformly distributed data. As clustering has
a random variation it is typically averaged n_ref times
to get a statistical average.
Number of clusters to find using one of the pre-defined methods
“kmeans”, “agglomerative”, “minibatchkmeans”, “spectralclustering”.
See sklearn.cluster for details
Estimate the Poissonian noise variance of the signal.
The variance is stored in the
metadata.Signal.Noise_properties.variance attribute.
The Poissonian noise variance is equal to the expected value. With the
default arguments, this method simply sets the variance attribute to
the given expected_value. However, more generally (although then the
noise is not strictly Poissonian), the variance may be proportional to
the expected value. Moreover, when the noise is a mixture of white
(Gaussian) and Poissonian noise, the variance is described by the
following linear model:
\[\mathrm{Var}[X] = (a * \mathrm{E}[X] + b) * c\]
Where a is the gain_factor, b is the gain_offset (the Gaussian
noise variance) and c the correlation_factor. The correlation
factor accounts for correlation of adjacent signal elements that can
be modeled as a convolution with a Gaussian point spread function.
a in the above equation. Must be positive. If None, take the
value from metadata.Signal.Noise_properties.Variance_linear_model
if defined. Otherwise, suppose pure Poissonian noise (i.e. gain_factor=1). If not None, the value is stored in
metadata.Signal.Noise_properties.Variance_linear_model.
b in the above equation. Must be positive. If None, take the
value from metadata.Signal.Noise_properties.Variance_linear_model
if defined. Otherwise, suppose pure Poissonian noise (i.e. gain_offset=0). If not None, the value is stored in
metadata.Signal.Noise_properties.Variance_linear_model.
c in the above equation. Must be positive. If None, take the
value from metadata.Signal.Noise_properties.Variance_linear_model
if defined. Otherwise, suppose pure Poissonian noise (i.e. correlation_factor=1). If not None, the value is stored in
metadata.Signal.Noise_properties.Variance_linear_model.
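A short sketch of the default (pure Poissonian) use of the linear model above:
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.random.poisson(lam=50, size=(10, 100)).astype(float))
>>> # Var[X] = (a * E[X] + b) * c with a = 1, b = 0, c = 1
>>> s.estimate_poissonian_noise_variance()
>>> variance = s.metadata.Signal.Noise_properties.variance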
If None, returns all components/loadings.
If an int, returns components/loadings with ids from 0 to the
given value.
If a list of ints, returns components/loadings with ids provided in
the given list.
The extension of the format that you wish to save to. default
is 'hspy'. The format determines the kind of output:
For image formats ('tif', 'png', 'jpg', etc.),
plots are created using the plotting flags as below, and saved
at 600 dpi. One plot is saved per loading.
For multidimensional formats ('rpl', 'hspy'), arrays
are saved in single files. All loadings are contained in the
one file.
For spectral formats ('msa'), each loading is saved to a
separate file.
If True, one file will be created for each factor and loading.
Otherwise, only two files will be created, one for
the factors and another for the loadings. The default value can
be chosen in the preferences.
If None, returns all clusters/centers.
If int, returns clusters/centers with ids from 0 to the
given int.
If list of ints, returns clusters/centers with ids in the
given list.
If True, one file per center will be created
on export. Otherwise only two files will be created, one for
the centers and another for the membership. The default value can
be chosen in the preferences.
If None, returns all components/loadings.
If an int, returns components/loadings with ids from 0 to the
given value.
If a list of ints, returns components/loadings with ids provided in
the given list.
The extension of the format that you wish to save to. default
is 'hspy'. The format determines the kind of output:
For image formats ('tif', 'png', 'jpg', etc.),
plots are created using the plotting flags as below, and saved
at 600 dpi. One plot is saved per loading.
For multidimensional formats ('rpl', 'hspy'), arrays
are saved in single files. All loadings are contained in the
one file.
For spectral formats ('msa'), each loading is saved to a
separate file.
If True, one file will be created for each factor and loading.
Otherwise, only two files will be created, one for
the factors and another for the loadings. The default value can
be chosen in the preferences.
Apply an
apodization window
before calculating the FFT in order to suppress streaks.
Valid string values are {'hann' or 'hamming' or 'tukey'}
If True or 'hann', applies a Hann window.
If 'hamming' or 'tukey', applies Hamming or Tukey
windows, respectively (default is False).
If None, rebuilds the signal instance from all components.
If int, rebuilds the signal instance from the components in the range 0 to the given int.
If list of ints, rebuilds the signal instance from only the components in the given list.
If False the cluster label signal has a navigation axis of length
number_of_clusters and the signal along the navigation
direction is binary - 0 the point is not in the cluster, 1 it is
included. If True, the cluster labels are merged (no navigation
axes). The value of the signal at any point will be between -1 and
the number of clusters. -1 represents the points that
were masked for cluster analysis, if any.
If True and tmp_parameters.filename is defined
(which is always the case when the Signal has been read from a
file), the filename stored in the metadata is modified by
appending an underscore and the current indices in parentheses.
Get the dimension parameters from the Signal’s underlying data.
Useful when the data structure was externally modified, or when the
spectrum image was not loaded from a file.
More sophisticated algorithms for determining the bins can be used
by passing a string as the bins argument. Other than the 'blocks'
and 'knuth' methods, the available algorithms are the same as
numpy.histogram().
Note: The lazy version of the algorithm only supports "scott"
and "fd" as a string argument for bins.
If bins is an int, it defines the number of equal-width
bins in the given range. If bins is a
sequence, it defines the bin edges, including the rightmost
edge, allowing for non-uniform bin widths.
If bins is a string from the list below, will use
the method chosen to calculate the optimal bin width and
consequently the number of bins (see Notes for more detail on
the estimators) from the data that falls within the requested
range. While the bin width will be optimal for the actual data
in the range, the number of bins will be computed to fill the
entire range, including the empty portions. For visualisation,
using the 'auto' option is suggested. Weighted data is not
supported for automated bin size selection.
‘auto’
Maximum of the ‘sturges’ and ‘fd’ estimators. Provides good
all around performance.
‘fd’ (Freedman Diaconis Estimator)
Robust (resilient to outliers) estimator that takes into
account data variability and data size.
‘doane’
An improved version of Sturges’ estimator that works better
with non-normal datasets.
‘scott’
Less robust estimator that takes into account data
variability and data size.
‘stone’
Estimator based on leave-one-out cross-validation estimate of
the integrated squared error. Can be regarded as a generalization
of Scott’s rule.
‘rice’
Estimator does not take variability into account, only data
size. Commonly overestimates number of bins required.
‘sturges’
R’s default method, only accounts for data size. Only
optimal for gaussian data and underestimates number of bins
for large non-gaussian datasets.
‘sqrt’
Square root (of data size) estimator, used by Excel and
other programs for its speed and simplicity.
‘knuth’
Knuth’s rule is a fixed-width, Bayesian approach to determining
the optimal bin width of a histogram.
‘blocks’
Determination of optimal adaptive-width histogram bins using
the Bayesian Blocks algorithm.
When estimating the bins using one of the str methods, the
number of bins is capped by this number to avoid a MemoryError
being raised by numpy.histogram().
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
**kwargs
other keyword arguments (weight and density) are described in
numpy.histogram().
>>> s = hs.signals.Signal1D(np.random.normal(size=(10, 100)))
>>> # Plot the data histogram
>>> s.get_histogram().plot()
>>> # Plot the histogram of the signal at the current coordinates
>>> s.get_current_signal().get_histogram().plot()
This function computes the real part of the inverse of the discrete
Fourier Transform over the signal axes by means of the Fast Fourier
Transform (FFT) as implemented in numpy.
If None, the shift option will be set to the original status
of the FFT using the value in metadata. If no FFT entry is
present in metadata, the parameter will be set to False.
If True, the origin of the FFT will be shifted to the centre.
If False, the origin will be kept at (0, 0)
(default is None).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
The integration is performed using
Simpson’s rule if
axis.is_binned is False and simple summation over the given axis
if True (along binned axes, the detector already provides
integrated counts per bin).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
:class:`hyperspy.axes.DataAxis` or :class:`hyperspy.axes.FunctionalDataAxis`
Axis which replaces the one specified by the axis argument.
If this new axis exceeds the range of the old axis,
a warning is raised that the data will be extrapolated.
Specifies the axis which will be replaced. The axis can be specified
using the index of the axis in axes_manager or the axis name.
If True the data of self is replaced by the result and
the axis is changed inplace. Otherwise self is not changed
and a new signal with the changes incorporated is returned.
degree: int, default 1
Specifies the B-Spline degree of the used interpolator.
Apply a function to the signal data at all the navigation
coordinates.
The function must operate on numpy arrays. It is applied to the data at
each navigation coordinate pixel-by-pixel. Any extra keyword arguments
are passed to the function. The keywords can take different values at
different coordinates. If the function takes an axis or axes
argument, the function is assumed to be vectorized and the signal axes
are assigned to axis or axes. Otherwise, the signal is iterated
over the navigation axes and a progress bar is displayed to monitor the
progress.
In general, only navigation axes (order, calibration, and number) are
guaranteed to be preserved.
Any function that can be applied to the signal. This function should
not alter any mutable input arguments or input data. So do not do
operations which alter the input, without copying it first.
For example, instead of doing image *= mask, rather do
image = image * mask. Likewise, do not do image[5, 5] = 10
directly on the input data or arguments, but make a copy of it
first. For example via image = copy.deepcopy(image).
If True, the output will be returned as a lazy signal. This means
the calculation itself will be delayed until either compute() is used,
or the signal is stored as a file.
If False, the output will be returned as a non-lazy signal, this
means the outputs will be calculated directly, and loaded into memory.
If None the output will be lazy if the input signal is lazy, and
non-lazy if the input signal is non-lazy.
Indicates if the results for each navigation pixel are of identical
shape (and/or numpy arrays to begin with). If None,
the output signal will be ragged only if the original signal is ragged.
Set the navigation_chunks argument to a tuple of integers to split
the navigation axes into chunks. This can be useful to enable
using multiple cores with signals which are less than 100 MB.
This argument is passed to rechunk().
Since the size and dtype of the signal dimension of the output
signal can be different from the input signal, this output signal
size must be calculated somehow. If both output_signal_size
and output_dtype are None, this is automatically determined.
However, if for some reason this is not working correctly, this
can be specified via output_signal_size and output_dtype.
The most common reason for this failing is due to the signal size
being different for different navigation positions. If this is the
case, use ragged=True. None is default.
All extra keyword arguments are passed to the provided function
Notes
If the function results do not have identical shapes, the result is an
array of navigation shape, where each element corresponds to the result
of the function (of arbitrary object type), called a “ragged array”. As
such, most functions are not able to operate on the result and the data
should be used directly.
This method is similar to Python’s map() that can
also be utilized with a BaseSignal
instance for similar purposes. However, this method has the advantage of
being faster because it iterates the underlying numpy data array
instead of the BaseSignal.
Currently requires a uniform axis.
Examples
Apply a Gaussian filter to all the images in the dataset. The sigma
parameter is constant:
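A minimal sketch of such a call (random example data; the sigma value is arbitrary):
>>> import numpy as np
>>> import scipy.ndimage
>>> import hyperspy.api as hs
>>> im = hs.signals.Signal2D(np.random.random((10, 64, 64)))
>>> # apply the same Gaussian filter to every image in the stack, in place
>>> im.map(scipy.ndimage.gaussian_filter, sigma=2.5)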
Rotate the two signal dimensions, with different amount as a function
of navigation index. Delay the calculation by getting the output
lazily. The calculation is then done using the compute method.
Rotate the two signal dimensions, with different amount as a function
of navigation index. In addition, the output is returned as a new
signal, instead of replacing the old signal.
If you want some more control over computing a signal that isn’t lazy
you can always set lazy_output to True and then compute the signal setting
the scheduler to ‘threading’, ‘processes’, ‘single-threaded’ or ‘distributed’.
Additionally, you can set the navigation_chunks argument to a tuple of
integers to split the navigation axes into chunks. This can be useful if your
signal is less than 100 MB but you still want to use multiple cores.
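A hedged sketch of that pattern, assuming keyword arguments passed to compute() are forwarded to the dask scheduler (chunk size, rotation angle and scheduler choice are arbitrary):
>>> import numpy as np
>>> import scipy.ndimage
>>> import hyperspy.api as hs
>>> im = hs.signals.Signal2D(np.random.random((10, 64, 64)))
>>> # delay the calculation and split the navigation axis into chunks of 2
>>> out = im.map(scipy.ndimage.rotate, angle=45, reshape=False,
...              inplace=False, lazy_output=True, navigation_chunks=(2,))
>>> out.compute(scheduler='threads')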
Either one on its own, or many axes in a tuple can be passed. In
both cases the axes can be passed directly, or specified using the
index in axes_manager or the name of the axis. Any duplicates are
removed. If None, the operation is performed over all navigation
axes (default).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Either one on its own, or many axes in a tuple can be passed. In
both cases the axes can be passed directly, or specified using the
index in axes_manager or the name of the axis. Any duplicates are
removed. If None, the operation is performed over all navigation
axes (default).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Either one on its own, or many axes in a tuple can be passed. In
both cases the axes can be passed directly, or specified using the
index in axes_manager or the name of the axis. Any duplicates are
removed. If None, the operation is performed over all navigation
axes (default).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Each target component is divided by the output of function(target).
The function must return a scalar when operating on numpy arrays and
must have an axis argument.
Each target component is divided by the output of function(target).
The function must return a scalar when operating on numpy arrays and
must have an axis argument.
For multidimensional datasets an optional figure,
the “navigator”, with a cursor to navigate that data is
raised. In any case it is possible to navigate the data using
the sliders. Currently only signals with signal_dimension equal to
0, 1 and 2 can be plotted.
Allowed string values are ``'auto'``, ``'slider'``, and ``'spectrum'``.
If 'auto':
If navigation_dimension > 0, a navigator is
provided to explore the data.
If navigation_dimension is 1 and the signal is an image
the navigator is a sum spectrum obtained by integrating
over the signal axes (the image).
If navigation_dimension is 1 and the signal is a spectrum
the navigator is an image obtained by stacking all the
spectra in the dataset horizontally.
If navigation_dimension is > 1, the navigator is a sum
image obtained by integrating the data over the signal axes.
Additionally, if navigation_dimension > 2, a window
with one slider per axis is raised to navigate the data.
For example, if the dataset consists of 3 navigation axes “X”,
“Y”, “Z” and one signal axis, “E”, the default navigator will
be an image obtained by integrating the data over “E” at the
current “Z” index and a window with sliders for the “X”, “Y”,
and “Z” axes will be raised. Notice that changing the “Z”-axis
index changes the navigator in this case.
For lazy signals, the navigator will be calculated using the
compute_navigator()
method.
If 'slider':
If navigation_dimension > 0 a window with one slider per
axis is raised to navigate the data.
If 'spectrum':
If navigation_dimension > 0 the navigator is always a
spectrum obtained by integrating the data over all other axes.
Not supported for lazy signals, the 'auto' option will
be used instead.
If None, no navigator will be provided.
Alternatively a BaseSignal (or subclass)
instance can be provided. The navigation or signal shape must
match the navigation shape of the signal to plot or the
navigation_shape + signal_shape must be equal to the
navigator_shape of the current object (for a dynamic navigator).
If the signal dtype is RGB or RGBA this parameter has no effect and
the value is always set to 'slider'.
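As an illustration of the navigator options above, a brief sketch (random data; the choice of values is arbitrary):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.random.random((8, 8, 1024)))
>>> s.plot()                    # 'auto': sum-image navigator
>>> s.plot(navigator='slider')  # one slider per navigation axis
>>> s.plot(navigator=None)      # no navigator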
The function used to normalize the data prior to plotting.
Allowable strings are: 'auto', 'linear', 'log'.
If 'auto', intensity is plotted on a linear scale except when
power_spectrum=True (only for complex signals).
The string must contain any combination of the 'x' and 'v'
characters. If 'x' or 'v' (for values) are in the string, the
corresponding horizontal or vertical axis limits are set to their
maxima and the axis limits will reset when the data or the
navigation indices are changed. Default is 'v'.
Plot factors from blind source separation results. In case of 1D
signal axis, each factors line can be toggled on and off by clicking
on their corresponding line in the legend.
If comp_ids is None, maps of all components will be
returned. If it is an int, maps of components with ids from 0 to
the given value will be returned. If comp_ids is a list of
ints, maps of components with ids contained in the list will be
returned.
Plot loadings from blind source separation results. In case of 1D
navigation axis, each loading line can be toggled on and off by
clicking on their corresponding line in the legend.
If comp_ids=None, maps of all components will be
returned. If it is an int, maps of components with ids from 0 to
the given value will be returned. If comp_ids is a list of
ints, maps of components with ids contained in the list will be
returned.
One of: 'all', 'ticks', 'off', or None
Controls how the axes are displayed on each image;
default is 'all'
If 'all', both ticks and axis labels will be shown
If 'ticks', no axis labels will be shown, but ticks/labels will
If 'off', all decorations and frame will be disabled
If None, no axis decorations will be shown, but ticks/frame will
Plot the blind source separation factors and loadings.
Unlike plot_bss_factors() and
plot_bss_loadings(),
this method displays one component at a time. Therefore it provides a
more compact visualization than the other two methods.
The loadings and factors are displayed in different windows and each
has its own navigator/sliders to navigate them if they are
multidimensional. The component index axis is synchronized between
the two.
One of: 'smart_auto', 'auto', None, 'spectrum' or a
BaseSignal object.
'smart_auto' (default) displays sliders if the navigation
dimension is less than 3. For a description of the other options
see the plot() documentation
for details.
Currently HyperSpy cannot plot a signal when the signal dimension is
higher than two. Therefore, to visualize the BSS results when the
factors or the loadings have signal dimension greater than 2,
the data can be viewed as spectra (or images) by setting this
parameter to 1 (or 2). (The default is 2)
if None (default), returns maps of all components, provided
number_of_cluster was defined when
executing cluster. Otherwise it raises a ValueError.
if int, returns maps of cluster labels with ids from 0 to
given int.
if list of ints, returns maps of cluster labels with ids in
given list.
the number of plots in each row, when the same_window
parameter is True.
axes_decor : None or str {'all', 'ticks', 'off'}, optional
Controls how the axes are displayed on each image; default is 'all'.
If 'all', both ticks and axis labels will be shown.
If 'ticks', no axis labels will be shown, but ticks/labels will.
If 'off', all decorations and frame will be disabled.
If None, no axis decorations will be shown, but ticks/frame will.
Plot cluster labels from a cluster analysis. In case of 1D navigation axis,
each loading line can be toggled on and off by clicking on the legended
line.
if None (default), returns maps of all components, provided
number_of_cluster was defined when
executing cluster. Otherwise it raises a ValueError.
if int, returns maps of cluster labels with ids from 0 to
given int.
if list of ints, returns maps of cluster labels with ids in
given list.
the number of plots in each row, when the same_window
parameter is True.
axes_decor : None or str {'all', 'ticks', 'off'}, default 'all'
Controls how the axes are displayed on each image; default is 'all'.
If 'all', both ticks and axis labels will be shown.
If 'ticks', no axis labels will be shown, but ticks/labels will.
If 'off', all decorations and frame will be disabled.
If None, no axis decorations will be shown, but ticks/frame will.
Unlike plot_cluster_labels() and
plot_cluster_signals(), this
method displays one component at a time.
Therefore it provides a more compact visualization than the other
two methods. The labels and centers are displayed in different
windows and each has its own navigator/sliders to navigate them if
they are multidimensional. The component index axis is synchronized
between the two.
Currently HyperSpy cannot plot signals of dimension higher than
two. Therefore, to visualize the clustering results when the
centers or the labels have signal dimension greater than 2,
the data can be viewed as spectra (or images) by setting this
parameter to 1 (or 2).
If None, returns maps of all clusters.
If int, returns maps of clusters with ids from 0 to given
int.
If list of ints, returns maps of clusters with ids in
given list.
Plot factors from a decomposition. In case of 1D signal axis, each
factors line can be toggled on and off by clicking on their
corresponding line in the legend.
If comp_ids is None, maps of all components will be
returned if the output_dimension was defined when executing
decomposition(). Otherwise it
raises a ValueError.
If comp_ids is an int, maps of components with ids from 0 to
the given value will be returned. If comp_ids is a list of
ints, maps of components with ids contained in the list will be
returned.
If comp_ids is None, maps of all components will be
returned if the output_dimension was defined when executing
decomposition().
Otherwise it raises a ValueError.
If comp_ids is an int, maps of components with ids from 0 to
the given value will be returned. If comp_ids is a list of
ints, maps of components with ids contained in the list will be
returned.
One of: 'all', 'ticks', 'off', or None
Controls how the axes are displayed on each image; default is
'all'
If 'all', both ticks and axis labels will be shown.
If 'ticks', no axis labels will be shown, but ticks/labels will.
If 'off', all decorations and frame will be disabled.
If None, no axis decorations will be shown, but ticks/frame
will.
Unlike plot_decomposition_factors()
and plot_decomposition_loadings(),
this method displays one component at a time. Therefore it provides a
more compact visualization than the other two methods. The loadings
and factors are displayed in different windows and each has its own
navigator/sliders to navigate them if they are multidimensional. The
component index axis is synchronized between the two.
One of: 'smart_auto', 'auto', None, 'spectrum' or a
BaseSignal object.
'smart_auto' (default) displays sliders if the navigation
dimension is less than 3. For a description of the other options
see the plot() documentation
for details.
Currently HyperSpy cannot plot a signal when the signal dimension is
higher than two. Therefore, to visualize the BSS results when the
factors or the loadings have signal dimension greater than 2,
the data can be viewed as spectra (or images) by setting this
parameter to 1 (or 2). (The default is 2)
Threshold used to determine how many components should be
highlighted as signal (as opposed to noise).
If a float (between 0 and 1), threshold will be
interpreted as a cutoff value, defining the variance at which to
draw a line showing the cutoff between signal and noise;
the number of signal components will be automatically determined
by the cutoff value.
If an int, threshold is interpreted as the number of
components to highlight as signal (and no cutoff line will be
drawn)
hline : {'auto', True, False}
Whether or not to draw a horizontal line illustrating the variance
cutoff for signal/noise determination. Default is to draw the line
at the value given in threshold (if it is a float) and not
draw in the case threshold is an int, or not given.
If True, (and threshold is an int), the line will be drawn
through the last component defined as signal.
If False, the line will not be drawn in any circumstance.
vline: bool, default False
Whether or not to draw a vertical line illustrating an estimate of
the number of significant components. If True, the line will be
drawn at the knee or elbow position of the curve indicating the
number of significant components.
If False, the line will not be drawn in any circumstance.
xaxis_type : {'index', 'number'}
Determines the type of labeling applied to the x-axis.
If 'index', axis will be labeled starting at 0 (i.e.
“pythonic index” labeling); if 'number', it will start at 1
(number labeling).
Determines the format of the x-axis tick labels. If 'ordinal',
“1st, 2nd, …” will be used; if 'cardinal', “1, 2,
…” will be used. If None, an appropriate default will be
selected.
Prints the five-number summary statistics of the data, the mean, and
the standard deviation.
Prints the mean, standard deviation (std), maximum (max), minimum
(min), first quartile (Q1), median, and third quartile. NaNs are
removed from the calculations.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Rebin the signal into a smaller or larger shape, based on linear
interpolation. Specify either new_shape or scale. A scale of 1
means no binning; a scale of less than one results in up-sampling.
For each dimension, specify the new:old pixel ratio, e.g. a ratio
of 1 is no binning and a ratio of 2 means that each pixel in the new
spectrum is twice the size of the pixels in the old spectrum.
The length of the list should match the dimension of the
Signal’s underlying data array.
Note : Only one of ``scale`` or ``new_shape`` should be specified,
otherwise the function will not run
Whether or not to crop the resulting rebinned data (default is
True). When binning by a non-integer number of
pixels it is likely that the final row in each dimension will
contain fewer than the full quota to fill one pixel. For example,
a 5*5 array binned by 2.1 will produce two rows containing
2.1 pixels and one row containing only 0.8 pixels. Selection of
crop=True or crop=False determines whether this
“black” line is cropped from the final binned array.
Please note that if ``crop=False`` is used, the final row in each
dimension may appear black if a fractional number of pixels are left
over. It can be removed but has been left to preserve total counts
before and after binning.
Specify the dtype of the output. If None, the dtype will be
determined by the behaviour of numpy.sum(), if "same",
the dtype will be kept the same. Default is None.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
By default (dtype=None), the dtype is determined by the behaviour of
numpy.sum; in this case it is an unsigned integer of the same precision as
the platform integer.
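A minimal sketch of the two equivalent ways of calling rebin described above (the shapes are arbitrary):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.ones((4, 4, 10)))
>>> s_binned = s.rebin(scale=(2, 2, 1))       # new:old pixel ratio per dimension
>>> s_binned = s.rebin(new_shape=(2, 2, 10))  # same result expressed as a target shape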
The axis can be passed directly, or specified using the index of
the axis in the Signal’s axes_manager or the axis name. The axis to roll backwards.
The positions of the other axes do not change relative to one
another.
The axis can be passed directly, or specified using the index of
the axis in the Signal’s axes_manager or the axis name. The axis is rolled until it lies before this other axis.
If True, the location of the data in memory is optimised for the
fastest iteration over the navigation axes. This operation can
cause a peak of memory usage and requires considerable processing
times for large datasets and/or low specification hardware.
See the Transposing (changing signal spaces) section of the HyperSpy user guide
for more information. When operating on lazy signals, if True,
the chunks are optimised for the new axes configuration.
The function gets the format from the specified extension (see
Supported formats in the User Guide for more information):
'hspy' for HyperSpy’s HDF5 specification
'rpl' for Ripple (useful to export to Digital Micrograph)
'msa' for EMSA/MSA single spectrum saving.
'unf' for SEMPER unf binary format.
'blo' for Blockfile diffraction stack saving.
Many image formats such as 'png', 'tiff', 'jpeg'…
If no extension is provided the default file format as defined
in the preferences is used.
Please note that not all formats support saving datasets of
arbitrary dimensions, e.g. 'msa' only supports 1D data, and
blockfiles only support image stacks with a navigation_dimension < 2.
Each format accepts a different set of parameters. For details
see the specific format documentation.
If None (default) and tmp_parameters.filename and
tmp_parameters.folder are defined, the
filename and path will be taken from there. A valid
extension can be provided e.g. 'my_file.rpl'
(see extension parameter).
The extension of the file that defines the file format.
Allowable string values are: {'hspy', 'hdf5', 'rpl',
'msa', 'unf', 'blo', 'emd', and common image
extensions e.g. 'tiff', 'png', etc.}
'hspy' and 'hdf5' are equivalent. Use 'hdf5' if
compatibility with HyperSpy versions older than 1.2 is required.
If None, the extension is determined from the following list in
this order:
HyperSpy, Nexus and EMD NCEM format only. Define chunks used when
saving. The chunk shape should follow the order of the array
(s.data.shape), not the shape of the axes_manager.
If None and lazy signal, the dask array chunking is used.
If None and non-lazy signal, the chunks are estimated automatically
to have at least one chunk per signal space.
If True, the chunking is determined by the h5py guess_chunk
function.
Nexus file only. Option to save hyperspy.original_metadata with
the signal. A loaded Nexus file may contain a large amount of
original metadata, which you may wish to omit on saving.
Nexus file only. Define the default dataset in the file.
If set to True the signal or first signal in the list of signals
will be defined as the default (following Nexus v3 data rules).
Only for hspy files. If True, write the dataset, otherwise, don’t
write it. Useful to save attributes without having to write the
whole dataset. Default is True.
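A short sketch of typical calls (filenames are placeholders; overwrite is assumed to be an accepted keyword):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.arange(10))
>>> s.save('my_file')                      # extension taken from the preferences (typically .hspy)
>>> s.save('my_file.rpl', overwrite=True)  # format inferred from the extension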
Set the signal type and convert the current signal accordingly.
The signal_type attribute specifies the type of data that the signal
contains e.g. electron energy-loss spectroscopy data,
photoemission spectroscopy data, etc.
When setting signal_type to a “known” type, HyperSpy converts the
current signal to the most appropriate
BaseSignal subclass. Known signal types are
signal types that have a specialized
BaseSignal subclass associated, usually
providing specific features for the analysis of that type of signal.
HyperSpy ships with a minimal set of known signal types. External
packages can register extra signal types. To print a list of
registered signal types in the current installation, call
print_known_signal_types(), and see
the developer guide for details on how to add new signal_types.
A non-exhaustive list of HyperSpy extensions is also maintained
here: hyperspy/hyperspy-extensions-list.
If no arguments are passed, the signal_type is set to undefined
and the current signal converted to a generic signal subclass.
Otherwise, set the signal_type to the given signal
type or to the signal type corresponding to the given signal type
alias. Setting the signal_type to a known signal type (if one exists)
is highly advisable. If none exists, it is good practice
to set signal_type to a value that best describes the data signal
type.
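A brief sketch; note that 'EELS' is only available if an extension registering that signal type (e.g. eXSpy) is installed:
>>> import hyperspy.api as hs
>>> hs.print_known_signal_types()  # list signal types registered in this installation
>>> s = hs.signals.Signal1D([0, 1, 2])
>>> s.set_signal_type('EELS')      # convert to a specialized subclass (requires an extension)
>>> s.set_signal_type()            # back to an undefined, generic signal type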
The split can be defined by giving the number_of_parts, a homogeneous
step size, or a list of customized step sizes. By default ('auto'),
the function is the reverse of stack().
The axis can be passed directly, or specified using the index of
the axis in the Signal’s axes_manager or the axis name.
If 'auto' and if the object has been created with
stack() (and stack_metadata=True),
this method will return the former list of signals (information
stored in metadata._HyperSpy.Stacking_history).
If it was not created with stack(),
the last navigation axis will be used.
Number of parts in which the spectrum image will be split. The
splitting is homogeneous. When the axis size is not divisible
by the number_of_parts the remainder data is lost without
warning. If number_of_parts and step_sizes are 'auto',
number_of_parts equals the length of the axis,
step_sizes equals one, and the axis is suppressed from each
sub-spectrum.
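A minimal sketch of the ways of defining the split described above (axis and sizes are arbitrary):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.random.random((6, 100)))
>>> parts = s.split()                              # 'auto': reverse of stack(), or last navigation axis
>>> parts = s.split(axis=0, number_of_parts=3)     # three equal parts along the first axis
>>> parts = s.split(axis=0, step_sizes=[1, 2, 3])  # customised step sizes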
Either one on its own, or many axes in a tuple can be passed. In
both cases the axes can be passed directly, or specified using the
index in axes_manager or the name of the axis. Any duplicates are
removed. If None, the operation is performed over all navigation
axes (default).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Either one on its own, or many axes in a tuple can be passed. In
both cases the axes can be passed directly, or specified using the
index in axes_manager or the name of the axis. Any duplicates are
removed. If None, the operation is performed over all navigation
axes (default).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
If you intend to calculate the numerical integral of an unbinned signal,
please use the integrate1D() function
instead. To avoid misuse of the sum function as an integral,
it raises a warning when working with an unbinned, non-uniform axis.
If True, the location of the data in memory is optimised for the
fastest iteration over the navigation axes. This operation can
cause a peak of memory usage and requires considerable processing
times for large datasets and/or low specification hardware.
See the Transposing (changing signal spaces) section of the HyperSpy user guide
for more information. When operating on lazy signals, if True,
the chunks are optimised for the new axes configuration.
Transfer data array from host to GPU device memory using cupy.asarray.
Lazy signals are not supported by this method, see user guide for
information on how to process data lazily using the GPU.
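A hedged sketch, assuming a CUDA-capable GPU with cupy installed and that the companion method for moving data back is to_host:
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.random.random((10, 100)))
>>> s.to_device()  # requires cupy; s.data becomes a cupy array on the GPU
>>> s.to_host()    # assumed companion method to move the data back to host memory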
If True, the location of the data in memory is optimised for the
fastest iteration over the navigation axes. This operation can
cause a peak of memory usage and requires considerable processing
times for large datasets and/or low specification hardware.
See the Transposing (changing signal spaces) section of the HyperSpy user guide
for more information. When operating on lazy signals, if True,
the chunks are optimised for the new axes configuration.
With the exception of both axes parameters (signal_axes and
navigation_axes) getting iterables, generally one has to be None
(i.e. “floating”). The other one specifies either the required number
or explicitly the indices of axes to move to the corresponding space.
If both are iterables, full control is given as long as all axes
are assigned to one space only.
Examples
>>> # just create a signal with many distinct dimensions
>>> s = hs.signals.BaseSignal(np.random.rand(1, 2, 3, 4, 5, 6, 7, 8, 9))
>>> s
<BaseSignal, title: , dimensions: (|9, 8, 7, 6, 5, 4, 3, 2, 1)>
>>> s.transpose()  # swap signal and navigation spaces
<BaseSignal, title: , dimensions: (9, 8, 7, 6, 5, 4, 3, 2, 1|)>
>>> s.T  # a shortcut for no arguments
<BaseSignal, title: , dimensions: (9, 8, 7, 6, 5, 4, 3, 2, 1|)>
>>> # roll to leave 5 axes in navigation space
>>> s.transpose(signal_axes=5)
<BaseSignal, title: , dimensions: (4, 3, 2, 1|9, 8, 7, 6, 5)>
>>> # 3 explicitly defined axes in signal space
>>> s.transpose(signal_axes=[0, 2, 6])
<BaseSignal, title: , dimensions: (8, 6, 5, 4, 2, 1|9, 7, 3)>
>>> # A mix of two lists, but specifying all axes explicitly
>>> # The order of axes is preserved in both lists
>>> s.transpose(navigation_axes=[1, 2, 3, 4, 5, 8], signal_axes=[0, 6, 7])
<BaseSignal, title: , dimensions: (8, 7, 6, 5, 4, 1|9, 3, 2)>
Use this function together with a with statement to have the
signal be unfolded for the scope of the with block, before
automatically refolding when passing out of scope.
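A small sketch of that pattern:
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.random.random((4, 5, 100)))
>>> with s.unfolded():
...     print(s.data.shape)  # navigation axes flattened inside the block
(20, 100)
>>> s.data.shape             # automatically refolded on leaving the block
(4, 5, 100)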
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Either one on its own, or many axes in a tuple can be passed. In
both cases the axes can be passed directly, or specified using the
index in axes_manager or the name of the axis. Any duplicates are
removed. If None, the operation is performed over all navigation
axes (default).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
A dictionary containing a set of parameters
that will be stored in the original_metadata attribute. It
typically contains all the parameters that have been
imported from the original data file.
The position of the edges of the diagram with shape (2, 2) or (2,).
All values outside of this range will be considered outliers and not
tallied in the histogram. If None, use the minimum and maximum values.
Typecode or data-type to which the array is cast. For complex signals only other
complex dtypes are allowed. If real valued properties are required use real,
imag, amplitude and phase instead.
For multidimensional datasets an optional figure,
the “navigator”, with a cursor to navigate that data is
raised. In any case it is possible to navigate the data using
the sliders. Currently only signals with signal_dimension equal to
0, 1 and 2 can be plotted.
If True, plot the power spectrum instead of the actual signal, if
False, plot the real and imaginary parts of the complex signal.
representation : {'cartesian', 'polar'}
Determines whether the real and imaginary parts of the complex data are plotted
('cartesian', default), or the amplitude and phase ('polar').
Allowed string values are ``'auto'``, ``'slider'``, and ``'spectrum'``.
If 'auto':
If navigation_dimension > 0, a navigator is
provided to explore the data.
If navigation_dimension is 1 and the signal is an image
the navigator is a sum spectrum obtained by integrating
over the signal axes (the image).
If navigation_dimension is 1 and the signal is a spectrum
the navigator is an image obtained by stacking all the
spectra in the dataset horizontally.
If navigation_dimension is > 1, the navigator is a sum
image obtained by integrating the data over the signal axes.
Additionally, if navigation_dimension > 2, a window
with one slider per axis is raised to navigate the data.
For example, if the dataset consists of 3 navigation axes “X”,
“Y”, “Z” and one signal axis, “E”, the default navigator will
be an image obtained by integrating the data over “E” at the
current “Z” index and a window with sliders for the “X”, “Y”,
and “Z” axes will be raised. Notice that changing the “Z”-axis
index changes the navigator in this case.
For lazy signals, the navigator will be calculated using the
compute_navigator()
method.
If 'slider':
If navigation_dimension > 0 a window with one slider per
axis is raised to navigate the data.
If 'spectrum':
If navigation_dimension > 0 the navigator is always a
spectrum obtained by integrating the data over all other axes.
Not supported for lazy signals, the 'auto' option will
be used instead.
If None, no navigator will be provided.
Alternatively a BaseSignal (or subclass)
instance can be provided. The navigation or signal shape must
match the navigation shape of the signal to plot or the
navigation_shape + signal_shape must be equal to the
navigator_shape of the current object (for a dynamic navigator).
If the signal dtype is RGB or RGBA this parameter has no effect and
the value is always set to 'slider'.
When an element of the sequence is True, the unwrapping process
will regard the edges along the corresponding axis of the image to be
connected and use this connectivity to guide the phase unwrapping
process. If only a single boolean is given, it will apply to all axes.
Wrap around is not supported for 1D arrays.
Pass to the rng argument of the unwrap_phase()
function. Unwrapping 2D or 3D images uses random initialization.
This sets the seed of the PRNG to achieve deterministic behavior.
Uses the unwrap_phase() function from skimage.
The algorithm is based on Miguel Arevallilo Herraez, David R. Burton, Michael J. Lalor,
and Munther A. Gdeisat, “Fast two-dimensional phase-unwrapping algorithm based on sorting
by reliability following a noncontinuous path”, Journal Applied Optics,
Vol. 41, No. 35, pp. 7437, 2002
A dictionary containing a set of parameters
that will be stored in the original_metadata attribute. It
typically contains all the parameters that have been
imported from the original data file.
A dictionary containing a set of parameters
that will be stored in the original_metadata attribute. It
typically contains all the parameters that have been
imported from the original data file.
For multidimensional datasets an optional figure,
the “navigator”, with a cursor to navigate that data is
raised. In any case it is possible to navigate the data using
the sliders. Currently only signals with signal_dimension equal to
0, 1 and 2 can be plotted.
If True, plot the power spectrum instead of the actual signal, if
False, plot the real and imaginary parts of the complex signal.
representation : {'cartesian', 'polar'}
Determines whether the real and imaginary parts of the complex data are plotted
('cartesian', default), or the amplitude and phase ('polar').
Allowed string values are ``'auto'``, ``'slider'``, and ``'spectrum'``.
If 'auto':
If navigation_dimension > 0, a navigator is
provided to explore the data.
If navigation_dimension is 1 and the signal is an image
the navigator is a sum spectrum obtained by integrating
over the signal axes (the image).
If navigation_dimension is 1 and the signal is a spectrum
the navigator is an image obtained by stacking all the
spectra in the dataset horizontally.
If navigation_dimension is > 1, the navigator is a sum
image obtained by integrating the data over the signal axes.
Additionally, if navigation_dimension > 2, a window
with one slider per axis is raised to navigate the data.
For example, if the dataset consists of 3 navigation axes “X”,
“Y”, “Z” and one signal axis, “E”, the default navigator will
be an image obtained by integrating the data over “E” at the
current “Z” index and a window with sliders for the “X”, “Y”,
and “Z” axes will be raised. Notice that changing the “Z”-axis
index changes the navigator in this case.
For lazy signals, the navigator will be calculated using the
compute_navigator()
method.
If 'slider':
If navigation_dimension > 0 a window with one slider per
axis is raised to navigate the data.
If 'spectrum':
If navigation_dimension > 0 the navigator is always a
spectrum obtained by integrating the data over all other axes.
Not supported for lazy signals, the 'auto' option will
be used instead.
If None, no navigator will be provided.
Alternatively a BaseSignal (or subclass)
instance can be provided. The navigation or signal shape must
match the navigation shape of the signal to plot or the
navigation_shape + signal_shape must be equal to the
navigator_shape of the current object (for a dynamic navigator).
If the signal dtype is RGB or RGBA this parameter has no effect and
the value is always set to 'slider'.
The string must contain any combination of the 'x', 'y' and 'v'
characters. If 'x' or 'y' are in the string, the corresponding
axis limits are set to cover the full range of the data at a given
position. If 'v' (for values) is in the string, the contrast of the
image will be set automatically according to vmin and vmax when
the data or navigation indices change. Default is 'v'.
Set the norm of the image to display. If "auto", a linear scale is
used except when power_spectrum=True for complex data
types. "symlog" can be used to display negative values on a log
scale - read matplotlib.colors.SymLogNorm and the
linthresh and linscale parameter for more details.
vmin and vmax are used to normalise the displayed data. It can
be a float or a string. If a string, it should be formatted as 'xth',
where 'x' must be a float in the [0, 100] range. 'x' is used to
compute the x-th percentile of the data. See
numpy.percentile() for more information.
Parameter used in the power-law normalisation when the parameter
norm="power". Read matplotlib.colors.PowerNorm for more
details. Default value is 1.0.
When used with norm="symlog", define the range within which the
plot is linear (to avoid having the plot go to infinity around
zero). Default value is 0.01.
This allows the linear range (-linthresh to linthresh) to be
stretched relative to the logarithmic range. Its value is the
number of powers of base to use for each half of the linear range.
See matplotlib.colors.SymLogNorm for more details.
Default value is 0.1.
If True the centre of the color scheme is set to zero. This is
especially useful when using diverging color schemes. If “auto”
(default), diverging color schemes are automatically centred.
A dictionary containing a set of parameters
that will to stores in the original_metadata attribute. It
typically contains all the parameters that has been
imported from the original data file.
Estimate the shifts in the signal axis using
cross-correlation and use the estimation to align the data in place.
This method can only estimate the shift by comparing
unidimensional
features that should not change the position.
To decrease memory usage, time of computation and improve
accuracy it is convenient to select the feature of interest
setting the start and end keywords. By default interpolation is
used to obtain subpixel precision.
Specifies the kind of interpolation as a string ('linear',
'nearest', 'zero', 'slinear', 'quadratic', 'cubic') or as an
integer specifying the order of the spline interpolator to
use.
A list of BaseSignal instances that have exactly the same
dimensions as this one and that will be aligned using the shift map
estimated using this signal.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
If int the values are taken as indices. If float they are
converted to indices using the spectral axis calibration.
If left_value is None crops from the beginning of the axis.
If right_value is None crops up to the end of the axis. If
both are None the interactive cropping interface is activated
enabling cropping the spectrum using a span selector in the
signal plot.
Estimate the width of the highest-intensity peak
of the spectra at a given fraction of its maximum.
It can be used with asymmetric peaks. For accurate results any
background must be previously subtracted.
The estimation is performed by interpolation using cubic splines.
The size of the window centred at the peak maximum
used to perform the estimation.
The window size must be chosen with care: if it is narrower
than the width of the peak at some positions or if it is
so wide that it includes other more intense peaks this
method cannot compute the width and a NaN is stored instead.
Estimate the shifts in the current signal axis using
cross-correlation.
This method can only estimate the shift by comparing
unidimensional features that should not change the position in
the signal axis. To decrease the memory usage and the time of
computation, and to improve the accuracy of the results, it is convenient to
select the feature of interest providing sensible values for
start and end. By default interpolation is used to obtain
subpixel precision.
An array with the result of the estimation in the axis units.
Although the computation is performed in batches if the signal is
lazy, the result is computed in memory because it depends on the
current state of the axes that could change later on in the workflow.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
Number of points around the “top part” of the peak
that is taken to estimate the peak height.
For spikes or very narrow peaks, set peakgroup to 1 or 2;
for broad or noisy peaks, make peakgroup larger to
reduce the effect of noise.
The windows around the (start, end) to use for interpolation. If
int, they are taken as index steps. If float, they are taken in
units of the axis value.
For multidimensional datasets an optional figure,
the “navigator”, with a cursor to navigate that data is
raised. In any case it is possible to navigate the data using
the sliders. Currently only signals with signal_dimension equal to
0, 1 and 2 can be plotted.
Allowed string values are ``’auto’``, ``’slider’``, and ``’spectrum’``.
If 'auto':
If navigation_dimension > 0, a navigator is
provided to explore the data.
If navigation_dimension is 1 and the signal is an image
the navigator is a sum spectrum obtained by integrating
over the signal axes (the image).
If navigation_dimension is 1 and the signal is a spectrum
the navigator is an image obtained by stacking all the
spectra in the dataset horizontally.
If navigation_dimension is > 1, the navigator is a sum
image obtained by integrating the data over the signal axes.
Additionally, if navigation_dimension > 2, a window
with one slider per axis is raised to navigate the data.
For example, if the dataset consists of 3 navigation axes “X”,
“Y”, “Z” and one signal axis, “E”, the default navigator will
be an image obtained by integrating the data over “E” at the
current “Z” index and a window with sliders for the “X”, “Y”,
and “Z” axes will be raised. Notice that changing the “Z”-axis
index changes the navigator in this case.
For lazy signals, the navigator will be calculated using the
compute_navigator()
method.
If 'slider':
If navigationdimension > 0 a window with one slider per
axis is raised to navigate the data.
If 'spectrum':
If navigation_dimension > 0 the navigator is always a
spectrum obtained by integrating the data over all other axes.
Not supported for lazy signals, the 'auto' option will
be used instead.
If None, no navigator will be provided.
Alternatively a BaseSignal (or subclass)
instance can be provided. The navigation or signal shape must
match the navigation shape of the signal to plot or the
navigation_shape + signal_shape must be equal to the
navigator_shape of the current object (for a dynamic navigator).
If the signal dtype is RGB or RGBA this parameter has no effect and
the value is always set to 'slider'.
The function used to normalize the data prior to plotting.
Allowable strings are: 'auto', 'linear', 'log'.
If 'auto', intensity is plotted on a linear scale except when
power_spectrum=True (only for complex signals).
The string must contain any combination of the 'x' and 'v'
characters. If 'x' or 'v' (for values) are in the string, the
corresponding horizontal or vertical axis limits are set to their
maxima and the axis limits will reset when the data or the
navigation indices are changed. Default is 'v'.
Remove the background, either in place using a GUI or returned as a new
spectrum using the command line. The fast option is not accurate for
most background types - except Gaussian, Offset and
Power law - but it is useful to estimate the initial fitting parameters
before performing a full fit.
Parameters:
signal_range : 'interactive', tuple of int or float, optional
If this argument is not specified, the signal range has to be
selected using a GUI and the original spectrum will be replaced.
If a tuple is given, a new spectrum will be returned.
The type of component which should be used to fit the background.
Possible components: Doniach, Gaussian, Lorentzian, Offset,
Polynomial, PowerLaw, Exponential, SkewNormal, SplitVoigt, Voigt.
If Polynomial is used, the polynomial order can be specified
If True, perform an approximate estimation of the parameters.
If False, the signal is fitted using non-linear least squares
afterwards. This is slower compared to the estimation but
often more accurate.
If True, all spectral channels lower than the lower bound of the
fitting range will be set to zero (this is the default behavior
of Gatan’s DigitalMicrograph). Setting this value to False
allows for inspection of the quality of background fit throughout
the pre-fitting region.
If True, add a (green) line previewing the remainder signal after
background removal. This preview is obtained from a Fast calculation
so the result may be different if a NLLS calculation is finally
performed.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
If signal_range is not 'interactive', the signal with background
subtracted is returned. If return_model=True, returns the
background model, otherwise, the GUI widget dictionary is returned
if display=False - see the display parameter documentation.
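A non-interactive sketch (signal range and background type are arbitrary; with a tuple signal_range the background-subtracted spectrum is returned):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.arange(1000, 1, -1, dtype=float))
>>> s2 = s.remove_background(signal_range=(400., 450.),
...                          background_type='PowerLaw', fast=True)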
Specifies the kind of interpolation as a string ('linear',
'nearest', 'zero', 'slinear', 'quadratic',
'cubic') or as an integer specifying the order of the spline
interpolator to use.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
Apply a Savitzky-Golay filter to the data in place.
If polynomial_order or window_length or differential_order are
None the method is run in interactive mode.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
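A minimal non-interactive sketch (window length and polynomial order are arbitrary):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.random.random(500))
>>> s.smooth_savitzky_golay(polynomial_order=2, window_length=7)  # smooths the data in place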
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
If int, set the threshold value used for detecting the spikes.
If "auto", determine the threshold value as being the first zero
value in the histogram obtained from the
spikes_diagnosis()
method.
If True, remove the spikes using the graphical user interface.
If False, remove all the spikes automatically, which can
introduce artefacts if used with signals containing peak-like
features. However, this can be mitigated by using the
signal_mask argument to mask the signal of interest.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
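A brief sketch of both modes (random data; the threshold handling is as described above):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.random.random((4, 500)))
>>> s.spikes_removal_tool()                                     # interactive GUI
>>> s.spikes_removal_tool(interactive=False, threshold='auto')  # automatic removal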
A dictionary containing a set of parameters
that will be stored in the original_metadata attribute. It
typically contains all the parameters that have been
imported from the original data file.
The fulcrum of the linear ramp is at the origin and the slopes are
given in units of the axis with the corresponding scale taken into
account. Both are available via the axes_manager of the signal.
The array of shifts must be in pixel units. The shape must be
the navigation shape using numpy order convention. If None
the shifts are estimated using
estimate_shift2D().
If interactive is False, these must be set. If given in floats
the input will be in scaled axis values. If given in integers,
the input will be in non-scaled pixel values. Similar to how
integer and float input works when slicing using isig and inav.
Estimate the shifts in an image using phase correlation.
This method can only estimate the shift by comparing
bi-dimensional features that should not change position
between frames. To decrease the memory usage, the time of
computation and the accuracy of the results it is convenient
to select a region of interest by setting the roi argument.
Parameters:
reference : {'current', 'cascade', 'stat'}
If ‘current’ (default) the image at the current
coordinates is taken as reference. If ‘cascade’ each image
is aligned with the previous one. If ‘stat’ the translation
of every image with respect to all the others is estimated, and
the final translation is obtained by performing statistical
analysis on the result.
This parameter is only relevant when reference='stat'.
If float, the shift estimations with a maximum correlation
value lower than the given value are not used to compute
the estimated shifts. If ‘auto’ the threshold is calculated
automatically as the minimum maximum correlation value
of the automatically selected reference image.
Define the region of interest (left, right, top, bottom).
If int (float), the position is given by axis index (value).
Note that ROIs can be used in place of a tuple.
If True plots the images after applying the filters and
the phase correlation. If ‘reuse’, it will also plot the images,
but it will only use one figure, and continuously update the images
in that figure as it progresses through the stack.
The statistical analysis approach to the translation estimation
when using reference='stat' roughly follows [*].
If you use it please cite their article.
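A hedged sketch of the estimation/alignment workflow (assuming the estimated shift map can be passed back via the shifts argument of align2D; sub_pixel_factor is arbitrary):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> im = hs.signals.Signal2D(np.random.random((5, 64, 64)))
>>> shifts = im.estimate_shift2D(reference='current', sub_pixel_factor=10)
>>> im.align2D(shifts=shifts)  # apply the estimated shifts in place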
‘minmax’ - finds peaks by comparing maximum filter results
with minimum filter, calculates centers of mass. See the
find_peaks_minmax()
function.
‘zaefferer’ - based on gradient thresholding and refinement
by local region of interest optimisation. See the
find_peaks_zaefferer()
function.
‘stat’ - based on statistical refinement and difference with
respect to mean intensity. See the
find_peaks_stat()
function.
‘laplacian_of_gaussian’ - a blob finder using the laplacian of
Gaussian matrices approach. See the
find_peaks_log()
function.
‘difference_of_gaussian’ - a blob finder using the difference
of Gaussian matrices approach. See the
find_peaks_dog()
function.
‘template_matching’ - A cross correlation peakfinder. This
method requires providing a template with the template
parameter, which is used as reference pattern to perform the
template matching to the signal. It uses the
skimage.feature.match_template() function and the peak
positions are obtained by applying the minmax method to the
template matching result.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
numpy.ndarray if current_index=True.
Ragged signal with shape (npeaks, 2) that contains the x, y
pixel coordinates of peaks found in each image sorted
first along y and then along x.
Notes
As a convenience, the ‘local_max’ method accepts the ‘distance’ and
‘threshold’ arguments, which are mapped to the ‘min_distance’ and
‘threshold_abs’ arguments of the skimage.feature.peak_local_max()
function.
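A non-interactive sketch using the convenience arguments mentioned in the note above (random data; the values are arbitrary):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> im = hs.signals.Signal2D(np.random.random((2, 64, 64)))
>>> peaks = im.find_peaks(method='local_max', distance=5, threshold=0.6,
...                       interactive=False)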
For multidimensional datasets an optional figure,
the “navigator”, with a cursor to navigate that data is
raised. In any case it is possible to navigate the data using
the sliders. Currently only signals with signal_dimension equal to
0, 1 and 2 can be plotted.
Allowed string values are ``'auto'``, ``'slider'``, and ``'spectrum'``.
If 'auto':
If navigation_dimension > 0, a navigator is
provided to explore the data.
If navigation_dimension is 1 and the signal is an image
the navigator is a sum spectrum obtained by integrating
over the signal axes (the image).
If navigation_dimension is 1 and the signal is a spectrum
the navigator is an image obtained by stacking all the
spectra in the dataset horizontally.
If navigation_dimension is > 1, the navigator is a sum
image obtained by integrating the data over the signal axes.
Additionally, if navigation_dimension > 2, a window
with one slider per axis is raised to navigate the data.
For example, if the dataset consists of 3 navigation axes “X”,
“Y”, “Z” and one signal axis, “E”, the default navigator will
be an image obtained by integrating the data over “E” at the
current “Z” index and a window with sliders for the “X”, “Y”,
and “Z” axes will be raised. Notice that changing the “Z”-axis
index changes the navigator in this case.
For lazy signals, the navigator will be calculated using the
compute_navigator()
method.
If 'slider':
If navigation_dimension > 0 a window with one slider per
axis is raised to navigate the data.
If 'spectrum':
If navigation_dimension > 0 the navigator is always a
spectrum obtained by integrating the data over all other axes.
Not supported for lazy signals, the 'auto' option will
be used instead.
If None, no navigator will be provided.
Alternatively a BaseSignal (or subclass)
instance can be provided. The navigation or signal shape must
match the navigation shape of the signal to plot or the
navigation_shape + signal_shape must be equal to the
navigator_shape of the current object (for a dynamic navigator).
If the signal dtype is RGB or RGBA this parameter has no effect and
the value is always set to 'slider'.
The string must contain any combination of the 'x', 'y' and 'v'
characters. If 'x' or 'y' are in the string, the corresponding
axis limits are set to cover the full range of the data at a given
position. If 'v' (for values) is in the string, the contrast of the
image will be set automatically according to vmin and vmax when
the data or navigation indices change. Default is 'v'.
Set the norm of the image to display. If "auto", a linear scale is
used except when power_spectrum=True for complex data
types. "symlog" can be used to display negative values on a log
scale - read matplotlib.colors.SymLogNorm and the
linthresh and linscale parameter for more details.
vmin and vmax are used to normalise the displayed data. It can
be a float or a string. If a string, it should be formatted as 'xth',
where 'x' must be a float in the [0, 100] range. 'x' is used to
compute the x-th percentile of the data. See
numpy.percentile() for more information.
Parameter used in the power-law normalisation when the parameter
norm="power". Read matplotlib.colors.PowerNorm for more
details. Default value is 1.0.
When used with norm="symlog", define the range within which the
plot is linear (to avoid having the plot go to infinity around
zero). Default value is 0.01.
This allows the linear range (-linthresh to linthresh) to be
stretched relative to the logarithmic range. Its value is the
number of powers of base to use for each half of the linear range.
See matplotlib.colors.SymLogNorm for more details.
Default value is 0.1.
If True the centre of the color scheme is set to zero. This is
especially useful when using diverging color schemes. If “auto”
(default), diverging color schemes are automatically centred.
API of classes, which are not part of the hyperspy.api namespace but are inherited in
HyperSpy classes. The signal classes are not expected to be instantiated by users but their methods,
which are used by other classes, are documented here.
It supports indexing, slicing, subscripting and iteration. As an iterator,
iterate over the navigation coordinates returning the current indices.
It can only be indexed and sliced to access the DataAxis objects that it
contains. Standard indexing and slicing follows the “natural order” as in
Signal, i.e. [nX, nY, …,sX, sY,…] where n indicates a navigation axis
and s a signal axis. In addition, AxesManager supports indexing using
complex numbers a + bj, where b can be one of 0, 1, 2 or 3 and a is a valid
index. If b is 3, AxesManager is indexed using the order of the axes in the
array. If b is 1 (or 2), only the navigation (or signal) axes are indexed, in the
natural order. In addition AxesManager supports subscription using
axis name.
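A few common indexing patterns as a sketch (the complex-number form described above is not shown here):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.random.random((3, 4, 100)))
>>> s.axes_manager[0]                   # natural order: navigation axes first, then signal axes
>>> s.axes_manager[-1].name = 'E'
>>> s.axes_manager['E']                 # subscription by axis name
>>> s.axes_manager.signal_axes[0].scale = 0.1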
Convert the scale and the units of the selected axes. If the unit
of measure is not supported by the pint library, the scale and units
are not changed.
Convert to a convenient scale and units on the specified axis.
If int, the axis can be specified using the index of the
axis in axes_manager.
If string, argument can be "navigation" or "signal" to
select the navigation or signal axes. The axis name can also be
provided. If None, convert all axes.
If list, the selected axes will be converted to the provided units.
If string, the navigation or signal axes will be converted to the
provided units.
If None, the scale and the units are converted to the appropriate
scale and units to avoid displaying scalebar with >3 digits or too
small number. This can be tweaked by the factor argument.
If True, force to keep the same units if the units of
the axes differs. It only applies for the same kind of axis,
"navigation" or "signal". By default the converted unit
of the first axis is used for all axes. If False, convert all
axes individually.
‘factor’ is an adjustable value used to determine the prefix of
the units. The product factor * scale * size is passed to the
pint to_compact method to determine the prefix.
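A hedged usage sketch of the conversion described above, assuming an existing
signal s (the units chosen here are arbitrary examples):
>>> s.axes_manager.convert_units(axes="navigation", units="nm", same_units=True)
>>> s.axes_manager.convert_units()  # let pint pick convenient scales and units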
Get and set the current coordinates, if the navigation dimension
is not 0. If the navigation dimension is 0, it raises
AttributeError when attempting to set its value.
Given a list of axes dictionaries, these are
added to the AxesManager. When dictionaries defining the axes
properties are passed, the
DataAxis,
UniformDataAxis,
FunctionalDataAxis instances are first
created.
The index of the axis in the array and in the _axes lists
can be defined by the index_in_array keyword if given
for all axes. Otherwise, it is defined by their index in the
list.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
Get and set the current indices, if the navigation dimension
is not 0. If the navigation dimension is 0, it raises
AttributeError when attempting to set its value.
Sets the order of iterating through the indices in the navigation
dimension. Can be either “flyback” or “serpentine”, or an iterable
of navigation indices.
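A minimal sketch of setting the iteration order (assuming a signal s):
>>> s.axes_manager.iterpath = "serpentine"
>>> for indices in s.axes_manager:
...     pass  # `indices` holds the current navigation indices at each step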
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
DataAxis class for a non-uniform axis defined through an axis array.
The most flexible type of axis, where the axis points are directly given by
an array named axis. As this can be any array, the property
is_uniform is automatically set to False.
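A minimal sketch (import path assumed) of a non-uniform axis created from an
explicit array:
>>> import numpy as np
>>> from hyperspy.axes import DataAxis
>>> axis = DataAxis(axis=np.arange(10) ** 2, name="x", units="nm")
>>> axis.is_uniform
False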
The beginning of the cropping interval. If type is int,
the value is taken as the axis index. If type is float the index
is calculated using the axis calibration. If start/end is
None the method crops from/to the low/high end of the axis.
The end of the cropping interval. If type is int,
the value is taken as the axis index. If type is float the index
is calculated using the axis calibration. If start/end is
None the method crops from/to the low/high end of the axis.
The name of the attribute to update. If the attribute does not
exist in either of the AxesManagers, an AttributeError will be
raised. If None, units will be updated.
DataAxis class for a non-uniform axis defined through an expression.
A FunctionalDataAxis is defined based on an expression that is
evaluated to yield the axis points. The expression is a function defined
as a string using the SymPy
text expression format. An example would be expression=a/x+b.
Any variables in the expression, in this case a and b must be
defined as additional attributes of the axis. The property is_uniform
is automatically set to False.
x itself is an instance of BaseDataAxis. By default, it will be a
UniformDataAxis with offset=0 and scale=1 of the given
size. However, it can also be initialized with custom offset and
scale values. Alternatively, it can be a non-uniform DataAxis.
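A minimal sketch (import path assumed) of an axis defined through an
expression with two extra attributes, a and b (the expression is an arbitrary
example; x + 1 is used to avoid dividing by zero at the default offset of 0):
>>> from hyperspy.axes import FunctionalDataAxis
>>> axis = FunctionalDataAxis(size=10, expression="a / (x + 1) + b", a=100, b=10)
>>> axis.is_uniform
False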
The beginning of the cropping interval. If type is int,
the value is taken as the axis index. If type is float the index
is calculated using the axis calibration. If start/end is
None the method crops from/to the low/high end of the axis.
The end of the cropping interval. If type is int,
the value is taken as the axis index. If type is float the index
is calculated using the axis calibration. If start/end is
None the method crops from/to the low/high end of the axis.
A list of the name of the attribute to update. If an attribute does not
exist in either of the AxesManagers, an AttributeError will be
raised. If None, the parameters of expression are updated.
Returns
——-
A boolean indicating whether any changes were made.
DataAxis class for a uniform axis defined through a scale, an
offset and a size.
The most common type of axis. It is defined by the offset, scale
and size parameters, which determine the initial value, spacing and
length of the axis, respectively. The actual axis array is
automatically calculated from these three values. The UniformDataAxis
is a special case of the FunctionalDataAxis defined by the function
scale*x+offset.
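A minimal sketch (import path assumed):
>>> from hyperspy.axes import UniformDataAxis
>>> axis = UniformDataAxis(offset=0.5, scale=0.1, size=5)
>>> axis.axis  # array([0.5, 0.6, 0.7, 0.8, 0.9]), i.e. scale * x + offset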
The beginning of the cropping interval. If type is int,
the value is taken as the axis index. If type is float the index
is calculated using the axis calibration. If start/end is
None the method crops from/to the low/high end of the axis.
The end of the cropping interval. If type is int,
the value is taken as the axis index. If type is float the index
is calculated using the axis calibration. If start/end is
None the method crops from/to the low/high end of the axis.
The name of the attribute to update. If the attribute does not
exist in either of the AxesManagers, an AttributeError will be
raised. If None, scale, offset and units are updated.
Returns
——-
A boolean indicating whether any changes were made.
Handling of values intermediate between two axis points:
If rounding=round, use python’s standard round-half-to-even strategy to find closest value.
If rounding=math.floor, round to the next lower value.
If rounding=math.ceil, round to the next higher value.
If value is out of bounds or contains out of bounds values (array).
If value is NaN or contains NaN values (array).
If value is incorrectly formatted str or contains incorrectly
formatted str (array).
Convert the scale and the units of the current axis. If the unit
of measure is not supported by the pint library, the scale and units
are not modified.
Default = None
If str, the axis will be converted to the provided units.
If “auto”, automatically determine the optimal units to avoid
using too large or too small numbers. This can be tweaked by the
factor argument.
‘factor’ is an adjustable value used to determine the prefix of
the units. The product factor * scale * size is passed to the
pint to_compact method to determine the prefix.
Pass to define the arguments of the trigger() function. Each
element must either be an argument name, or a tuple containing
the argument name and the argument’s default value.
Examples
>>> from hyperspy.events import Event
>>> Event()
<hyperspy.events.Event: set()>
>>> Event(doc="This event has a docstring!").__doc__
'This event has a docstring!'
>>> e1 = Event()
>>> e2 = Event(arguments=('arg1', ('arg2', None)))
>>> e1.trigger(arg1=12, arg2=43, arg3='str', arg4=4.3)  # Can trigger with whatever
>>> e2.trigger(arg1=11, arg2=22, arg3=3.4)
Traceback (most recent call last):
    ...
TypeError: trigger() got an unexpected keyword argument 'arg3'
If "all", all the trigger keyword arguments are passed to the
function. If a list or tuple of strings, only those keyword
arguments that are in the tuple or list are passed. If empty,
no keyword argument is passed. If dictionary, the keyword arguments
of trigger are mapped as indicated in the dictionary. For example,
{“a” : “b”} maps the trigger argument “a” to the function argument
“b”.
Disconnects a function from the event. The passed function will be
disconnected regardless of which ‘nargs’ argument was passed to
connect().
If you only need to temporarily prevent a function from being called,
single callback suppression is supported by the suppress_callback
context manager.
Parameters:
function: function
return_connection_kwargs: bool, default False
If True, returns the kwargs that would reconnect the function as
it was.
Use this function with a ‘with’ statement to temporarily suppress
all events in the container. When the ‘with’ lock completes, the old
suppression values will be restored.
Use this function with a ‘with’ statement to temporarily suppress
a single callback from being called. All other connected callbacks
will trigger. When the ‘with’ lock completes, the old suppression value
will be restored.
>>> with obj.events.myevent.suppress_callback(f):
...     # Events will trigger as normal, but `f` will not be called
...     obj.val_a = a
...     obj.val_b = b
>>> # Here, `f` will be called as before:
>>> obj.events.myevent.trigger()
Triggers the event. If the event is suppressed, this does nothing.
Otherwise it calls all the connected functions with the arguments as
specified when connected.
Use this function with a ‘with’ statement to temporarily suppress
all events added. When the ‘with’ lock completes, the old suppression
values will be restored.
Use this function with a ‘with’ statement to temporarily suppress
all callbacks of all events in the container. When the ‘with’ lock
completes, the old suppression values will be restored.
>>> with obj.events.suppress():
...     # Any events triggered by assignments are prevented:
...     obj.val_a = a
...     obj.val_b = b
>>> # Trigger one event instead:
>>> obj.events.values_changed.trigger()
Performs maximum likelihood PCA with missing data and/or heteroskedastic noise.
Standard PCA based on a singular value decomposition (SVD) approach assumes
that the data is corrupted with Gaussian, or homoskedastic noise. For many
applications, this assumption does not hold. For example, count data from
EDS-TEM experiments is corrupted by Poisson noise, where the noise variance
depends on the underlying pixel value. Rather than scaling or transforming
the data to approximately “normalize” the noise, MLPCA instead uses estimates
of the data variance to perform the decomposition.
This function is a transcription of a MATLAB code obtained from [Andrews1997].
The solver is selected by a default policy based on data.shape and
output_dimension: if the input data is larger than 500x500 and the
number of components to extract is lower than 80% of the smallest
dimension of the data, then the more efficient “randomized”
method is enabled. Otherwise the exact full SVD is computed and
optionally truncated afterwards.
If full:
run exact SVD, calling the standard LAPACK solver via
scipy.linalg.svd(), and select the components by postprocessing
If arpack:
use truncated SVD, calling ARPACK solver via
scipy.sparse.linalg.svds(). It requires strictly
0 < output_dimension < min(data.shape)
Darren T. Andrews and Peter D. Wentzell, “Applications
of maximum likelihood principal component analysis: incomplete
data sets and calibration transfer”, Analytica Chimica Acta 350,
no. 3 (September 19, 1997): 341-352.
Performs Online Robust NMF with missing or corrupted data.
The ORNMF code is based on a transcription of the online proximal gradient
descent (PGD) algorithm MATLAB code obtained from the authors of [Zhao2016].
It has been updated to also include an L2-normalization cost function that
is able to deal with sparse corruptions and/or outliers slightly faster
(please see ORPCA implementation for details). A further modification
has been made to allow for a changing subspace W, where X ~= WH^T + E
in the ORNMF framework.
Zhao, Renbo, and Vincent YF Tan. “Online nonnegative matrix
factorization with outliers.” Acoustics, Speech and Signal Processing
(ICASSP), 2016 IEEE International Conference on. IEEE, 2016.
Creates Online Robust NMF instance that can learn a representation.
Calculate orthogonal rotations for a matrix of factors or loadings from PCA.
When gamma=1.0, this is known as varimax rotation, which finds a
rotation matrix W that maximizes the variance of the squared
components of A @ W. The rotation matrix preserves orthogonality of
the components.
Performs Online Robust PCA with missing or corrupted data.
The ORPCA code is based on a transcription of MATLAB code from [Feng2013].
It has been updated to include a new initialization method based
on a QR decomposition of the first n “training” samples of the data.
A stochastic gradient descent (SGD) solver is also implemented,
along with a MomentumSGD solver for improved convergence and robustness
with respect to local minima. More information about the gradient descent
methods and choosing appropriate parameters can be found in [Ruder2016].
Jiashi Feng, Huan Xu and Shuicheng Yan, “Online Robust PCA
via Stochastic Optimization”, Advances in Neural Information Processing
Systems 26, (2013), pp. 404-412.
If project is True, returns the low-rank factors and loadings only.
Otherwise, returns the low-rank and sparse error matrices, as well
as the results of a singular value decomposition (SVD) applied to
the low-rank matrix.
If True, use the columns of u as the basis for sign flipping.
Otherwise, use the rows of v. The choice of which variable to base the
decision on is generally algorithm dependent.
The solver is selected by a default policy based on data.shape and
output_dimension: if the input data is larger than 500x500 and the
number of components to extract is lower than 80% of the smallest
dimension of the data, then the more efficient “randomized”
method is enabled. Otherwise the exact full SVD is computed and
optionally truncated afterwards.
If full:
run exact SVD, calling the standard LAPACK solver via
scipy.linalg.svd(), and select the components by postprocessing
If arpack:
use truncated SVD, calling ARPACK solver via
scipy.sparse.linalg.svds(). It requires strictly
0 < output_dimension < min(data.shape)
If True, adjusts the signs of the loadings and factors such that
the loadings that are largest in absolute value are always positive.
See svd_flip_signs() for more details.
If "auto":
The solver is selected by a default policy based on data.shape and
output_dimension: if the input data is larger than 500x500 and the
number of components to extract is lower than 80% of the smallest
dimension of the data, then the more efficient “randomized”
method is enabled. Otherwise the exact full SVD is computed and
optionally truncated afterwards.
If "full":
Run exact SVD, calling the standard LAPACK solver via
scipy.linalg.svd(), and select the components by postprocessing
If "arpack":
Use truncated SVD, calling ARPACK solver via
scipy.sparse.linalg.svds(). It requires strictly
0 < output_dimension < min(data.shape)
If True, adjusts the signs of the loadings and factors such that
the loadings that are largest in absolute value are always positive.
See svd_flip_signs() for more details.
If True, and svd_flip is True, use the columns of u as the basis for sign-flipping.
Otherwise, use the rows of v. The choice of which variable to base the
decision on is generally algorithm dependent.
A whitening transformation is used to decorrelate
the variables, such that the new covariance matrix
of the whitened data is the identity matrix.
If X is a random vector with non-singular covariance
matrix C, and W is a whitening matrix satisfying
W^T W = C^-1, then the transformation Y = W X will
yield a whitened random vector Y with unit diagonal
covariance. In ZCA whitening, the matrix W = C^-1/2,
while in PCA whitening, the matrix W is the
eigensystem of C. More details can be found in [Kessy2015].
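The following NumPy sketch illustrates the relationship described above; it is
not part of the HyperSpy API and the variable names are illustrative only:
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))  # correlated data
>>> Xc = X - X.mean(axis=0)                   # centre the variables
>>> C = np.cov(Xc, rowvar=False)              # covariance matrix C
>>> evals, evecs = np.linalg.eigh(C)          # eigendecomposition of C
>>> W_pca = np.diag(evals ** -0.5) @ evecs.T  # PCA whitening matrix
>>> W_zca = evecs @ W_pca                     # ZCA whitening matrix, C^(-1/2)
>>> Y = Xc @ W_zca.T                          # whitened data
>>> np.allclose(np.cov(Y, rowvar=False), np.eye(3))  # unit diagonal covariance
True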
Model and data fitting tools applicable to signals of both one and two
dimensions.
Models of one-dimensional signals should use the
Model1D and models of two-dimensional signals
should use the Model2D.
A model is constructed as a linear combination of
components1D or components2D
that are added to the model using the append() or extend().
If needed, new components can easily be created using the
Expression component or by using the code of existing components
as a template.
Once defined, the model can be fitted to the data using fit() or
multifit(). Once the optimizer reaches
the convergence criteria or the maximum number of iterations, the new values
of the component parameters are stored in the components.
It is possible to access the components in the model by their name or by
the index in the model. An example is given at the end of this docstring.
If a list of components is given, only the components given in the
list are used in making the returned spectrum. The components can
be specified by name, index or themselves.
The signal where to put the result into. Convenient for parallel
processing. If None (default), creates a new one. If passed, it is
assumed to be of correct shape and dtype and not checked.
If a list of components is given, the operation will be performed
only in the value of the parameters of the given components.
The components can be specified by name, index or themselves.
If None (default), the active components will be considered.
The number of workers to initialise.
If zero, all computations will be done serially.
If None (default), will attempt to use (number-of-cores - 1); however,
if just one core is available, it will use one worker.
If True, only the value of the active parameters will be exported.
Notes
The names of the files are determined by the name attributes of each
Component and each Parameter. Therefore, it is possible to customise
the file names by modifying the name attributes.
Fetch the value of the parameters that have been previously stored
in parameter.map[‘values’] if parameter.map[‘is_set’] is True for
those indices.
If it is not previously stored, the current values from parameter.value
are used, which are typically from the fit in the previous pixel of a
multidimensional signal.
Fetch the parameter values from the given array, optionally also
fetching the standard deviations.
Places the parameter values into both m.p0 (the initial values
for the optimizer routine) and component.parameter.value and
…std, for parameters in active components ordered by their
position in the model and component.
The optimization algorithm used to perform the fitting.
"lm" performs least-squares optimization using the
Levenberg-Marquardt algorithm, and supports bounds
on parameters.
"trf" performs least-squares optimization using the
Trust Region Reflective algorithm, and supports
bounds on parameters.
"dogbox" performs least-squares optimization using the
dogleg algorithm with rectangular trust regions, and
supports bounds on parameters.
"odr" performs the optimization using the orthogonal
distance regression (ODR) algorithm. It does not support
bounds on parameters. See scipy.odr for more details.
"DifferentialEvolution" is a global optimization method.
It does support bounds on parameters. See
scipy.optimize.differential_evolution() for more
details on available options.
"DualAnnealing" is a global optimization method.
It does support bounds on parameters. See
scipy.optimize.dual_annealing() for more
details on available options. Requires scipy>=1.2.0.
"SHGO" (simplicial homology global optimization) is a global
optimization method. It does support bounds on parameters. See
scipy.optimize.shgo() for more details on available
options. Requires scipy>=1.2.0.
The loss function to use for minimization. Only "ls" is available
if optimizer is one of "lm", "trf", "dogbox" or "odr".
"ls" minimizes the least-squares loss function.
"ML-poisson" minimizes the negative log-likelihood for
Poisson-distributed data. Also known as Poisson maximum
likelihood estimation (MLE).
"huber" minimize the Huber loss function. The delta value
of the Huber function is controlled by the huber_delta
keyword argument (the default value is 1.0).
callable supports passing your own minimization function.
Whether to use information about the gradient of the loss function
as part of the optimization. This parameter has no effect if
optimizer is a derivative-free or global optimization method.
"fd" uses a finite difference scheme (if available) for numerical
estimation of the gradient. The scheme can be further controlled
with the fd_scheme keyword argument.
"analytical" uses the analytical gradient (if available) to speed
up the optimization, since the gradient does not need to be estimated.
callable should be a function that returns the gradient vector.
None means that no gradient information is used or estimated. Not
available if optimizer is one of "lm", "trf" or "dogbox".
If True, the plot is updated during the optimization
process. It slows down the optimization, but it enables
visualization of the optimization progress.
If grad='fd', selects the finite difference scheme to use.
See scipy.optimize.minimize() for details. Ignored if
optimizer is one of "lm", "trf" or "dogbox".
Any extra keyword argument will be passed to the chosen
optimizer. For more information, read the docstring of the
optimizer of your choice in scipy.optimize.
The chi-squared and reduced chi-squared statistics, and the
degrees of freedom, are computed automatically when fitting,
but only when loss_function="ls". They are stored as signals:
chisq, red_chisq and dof.
If the attribute metadata.Signal.Noise_properties.variance
is defined as a Signal instance with the same
navigation_dimension as the signal, and loss_function
is "ls" or "huber", then a weighted fit is performed,
using the inverse of the noise variance as the weights.
Note that for both homoscedastic and heteroscedastic noise, if
metadata.Signal.Noise_properties.variance does not contain
an accurate estimation of the variance of the data, then the
chi-squared and reduced chi-squared statistics will not be
computed correctly. See Setting the noise properties in the User Guide for more details.
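A hedged sketch of a weighted fit following the description above; the
variance signal is an assumption and must match the navigation dimension of s:
>>> s.metadata.set_item("Signal.Noise_properties.variance", variance)
>>> m = s.create_model()
>>> m.fit(optimizer="lm", loss_function="ls")  # weights are 1 / variance
>>> m.red_chisq.data  # reduced chi-squared at the current position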
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
In combination with save_parameters2file(),
this method can be used to recreate a model stored in a file. Actually,
before HyperSpy 0.8 this was the only way to do so. However, this is known
to be brittle. For example see hyperspy/hyperspy#341.
To mask (i.e. do not fit) at certain position, pass a boolean
numpy.array, where True indicates that the data will NOT be
fitted at the given position.
If True, update the plot for every position as they are processed.
Note that this slows down the fitting by a lot, but it allows for
interactive monitoring of the fitting (if in interactive mode).
At each new row the index begins at the first column,
in accordance with the way numpy.ndindex generates indices.
If "serpentine":
Iterate through the signal in a serpentine, “snake-game”-like
manner instead of beginning each new row at the first index.
Works for n-dimensional navigation space, not just 2D.
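A minimal sketch of the two iteration paths described above (assuming an
existing model m):
>>> m.multifit(iterpath="serpentine")  # snake-like traversal of the navigation space
>>> m.multifit(iterpath="flyback")     # each row restarts at the first column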
If True, only the value of the active parameters will be plotted.
Notes
The names of the files are determined by the name attributes of each
Component and each Parameter. Therefore, it is possible to customise
the file names by modifying the name attributes.
This method can be used to save the current state of the model in a way
that can be loaded back to recreate it using
load_parameters_from_file().
Actually, as of HyperSpy 0.8 this is the only way to do so.
However, this is known to be brittle. For example see
hyperspy/hyperspy#341.
If None, will apply the function to all components in the model.
If list of components, will apply the functions to the components
in the list. The components can be specified by name, index or
themselves.
If None, will set all the parameters to not free.
If list of strings, will set all the parameters with the same name
as the strings in parameter_name_list to not free.
If None, will apply the function to all components in the model.
If list of components, will apply the functions to the components
in the list. The components can be specified by name, index or
themselves.
If None, will set all the parameters to not free.
If list of strings, will set all the parameters with the same name
as the strings in parameter_name_list to not free.
A list of components whose parameters will be changed. The components
can be specified by name, index or themselves. If None, use all
components of the model.
A boolean array defining the signal range. Must be the same
shape as the reversed signal_shape, i.e. signal_shape[::-1].
Where array values are True, the signal will be fitted, otherwise not.
Model and data fitting for one dimensional signals.
A model is constructed as a linear combination of
components1D that are added to the model using
append() or extend().
There are many predefined components available in the
components1D module. If needed, new
components can be created easily using the
Expression component or by
using the code of existing components as a template.
Once defined, the model can be fitted to the data using
fit() or
multifit(). Once the optimizer reaches
the convergence criteria or the maximum number of iterations, the new values
of the component parameters are stored in the components.
It is possible to access the components in the model by their name or by
the index in the model. An example is given at the end of this docstring.
In the following example we create a histogram from a normal distribution
and fit it with a gaussian component. It demonstrates how to create
a model from a Signal1D instance, add
components to it, adjust the value of the parameters of the components,
fit the model to the data and access the components in the model.
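A hedged reconstruction of such a workflow (outputs are omitted and exact
values depend on the random data):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.random.normal(scale=2, size=10000)).get_histogram()
>>> m = s.create_model()
>>> g = hs.model.components1D.Gaussian()
>>> m.append(g)
>>> m.fit()
>>> centre = m.components.Gaussian.centre.value  # access a component by name
>>> sigma = m[0].sigma.value                     # or by its index in the model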
If None, the position of all the active components of the
model that have a well-defined x position with a value
in the axis range will get a position adjustment line.
Otherwise the feature is added only to the given components.
The components can be specified by name, index or themselves.
If True the position parameter of the components will be
temporarily fixed until adjust position is disabled.
This can
be useful to iteratively adjust the component positions and
fit the model.
If 'interactive' the signal range is selected using the span
selector on the spectrum plot. The signal range can also
be manually specified by passing a tuple of floats (left, right).
If None the current signal range is used. Note that ROIs can be used
in place of a tuple.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
All extra keyword arguments are passed to the
fit() or
multifit()
method, depending on whether only_current is True or False.
Model and data fitting for two dimensional signals.
A model is constructed as a linear combination of
components2D that are added to the model using
append() or
extend(). There are predefined components
available in the components2D module
and custom components can be made using the Expression.
If needed, new components can be created easily using the code of existing
components as a template.
Once defined, the model can be fitted to the data using
fit() or multifit().
Once the optimizer reaches the convergence criteria or the maximum number
of iterations, the new values of the component parameters are stored in the
components.
It is possible to access the components in the model by their name or by
the index in the model. An example is given at the end of this docstring.
If True, only the value of the parameters that are free will be
exported.
Notes
The names of the files are determined by the name attributes of each
Component and each Parameter. Therefore, it is possible to
customise the file names by modifying the name attributes.
Fetch the parameter values from an array p and optionally standard
deviation from p_std. Places them in component.parameter.value and
…std, according to their position in the component.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
If None, will set all the parameters to free.
If list of strings, will set all the parameters with the same name
as the strings in parameter_name_list to free.
If None, will set all the parameters to not free.
If list of strings, will set all the parameters with the same name
as the strings in parameter_name_list to not free.
Returns the parameter as a dictionary, saving all attributes from
self._whitelist.keys(). For more information see
export_to_dictionary().
The name of the file. If None the Components name followed
by the Parameter name attributes will be used by default.
If a file with the same name exists the name will be
modified by appending a number to the file path.
Similar to ext_force_positive, but in this case the bounds
are defined by bmin and bmax. It is a better idea to use
an optimizer that supports bounding though.
If True, the parameter value is set to be the absolute value
of the input value i.e. if we set Parameter.value = -3, the
value stored is 3 instead. This is useful to bound a value
to be positive in an optimization without actually using an
optimizer that supports bounding.
Fetch the stored value and std attributes from the
parameter.map[‘values’] and …[‘std’] if
parameter.map[‘is_set’] is True for that index. Updates
parameter.value and parameter.std.
If not stored, then .value and .std will remain from their
previous values, i.e. from a fit in a previous pixel.
If None (default), all available widgets are displayed or returned.
If string, only the widgets of the selected toolkit are displayed
if available. If an iterable of toolkit strings, the widgets of
all listed toolkits are displayed or returned.
If it is not None, the value of the current parameter is
a function of the given Parameter. The function is by default
the identity function, but it can be defined by
twin_function_expr.
Expression of the function that enables setting a functional
relationship between the parameter and its twin. If twin is not
None, the parameter value is calculated as the output of calling the
twin function with the value of the twin parameter. The string is
parsed using sympy, so permitted values are any valid sympy expressions
of one variable. If the function is invertible the twin inverse function
is set automatically.
Expression of the function that enables setting the
value of the twin parameter. If twin is not
None, its value is set to the output of calling the
twin inverse function with the value provided. The string is
parsed using sympy, so permitted values are any valid sympy expressions
of one variable.
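A hedged sketch tying one parameter to another via an expression (the
component types and the expression are arbitrary examples):
>>> g1 = hs.model.components1D.Gaussian()
>>> g2 = hs.model.components1D.Gaussian()
>>> g2.A.twin_function_expr = "x / 2"  # g2.A follows half of its twin's value
>>> g2.A.twin = g1.A
>>> g1.A.value = 10
>>> g2.A.value
5.0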
SAMFire is a more robust way of fitting multidimensional datasets. By
extracting starting values for each pixel from already fitted pixels,
SAMFire stops the fitting algorithm from getting lost in the parameter
space by always starting close to the optimal solution.
SAMFire only picks the starting parameters and the order in which the pixels
(in the navigation space) are fitted, and does not provide any new minimisation
algorithms.
A list of strategies that will be used to select pixel fitting order
and calculate required starting parameters. Strategies come in two
“flavours” - local and global. Local strategies spread the starting
values to the nearest pixels and force a certain pixel fitting order.
Global strategies look for clusters in parameter values and suggest the
most frequent values. Global strategies do not depend on the pixel fitting
order, hence it is randomised.
Changes current strategy to a new one. Certain rules apply:
diffusion -> diffusion : resets all “ignored” pixels
diffusion -> segmenter : saves already calculated pixels to be ignored
when (if) the diffusion strategy is subsequently run
If possible, plot the current strategy. Local strategies plot a
grayscale navigation signal with brightness representing order of the
pixel selection. Global strategies plot a collection of histograms,
one per parameter.
If True, only tries to set up the ipyparallel pool. If False, only
the multiprocessing pool. If None, first tries ipyparallel, and if that
does not succeed, then multiprocessing.
API of signal classes, which are not part of the user-facing hyperspy.api namespace but are
inherited in HyperSpy signals classes or used as attributes of signals. These classes are not
expected to be instantiated by users but their methods, which are used by other classes,
are documented here.
Returns the one dimensional signal as a two dimensional signal.
By default ensures the data is stored optimally, hence often making a
copy of the data. See transpose for a more general method with more
options.
optimize : bool
If True, the location of the data in memory is optimised for the
fastest iteration over the navigation axes. This operation can
cause a peak of memory usage and requires considerable processing
times for large datasets and/or low specification hardware.
See the Transposing (changing signal spaces) section of the HyperSpy user guide
for more information. When operating on lazy signals, if True,
the chunks are optimised for the new axes configuration.
If True, the location of the data in memory is optimised for the
fastest iteration over the navigation axes. This operation can
cause a peak of memory usage and requires considerable processing
times for large datasets and/or low specification hardware.
See the Transposing (changing signal spaces) section of the HyperSpy user guide
for more information. When operating on lazy signals, if True,
the chunks are optimised for the new axes configuration.
A dictionary containing a set of parameters
that will be stored in the original_metadata attribute. It
typically contains all the parameters that have been
imported from the original data file.
A dictionary containing a set of parameters
that will be stored in the original_metadata attribute. It
typically contains all the parameters that have been
imported from the original data file.
A dictionary containing a set of parameters
that will be stored in the original_metadata attribute. It
typically contains all the parameters that have been
imported from the original data file.
A dictionary containing a set of parameters
that will be stored in the original_metadata attribute. It
typically contains all the parameters that have been
imported from the original data file.
Typecode string or data-type to which the Signal’s data array is
cast. In addition to all the standard numpy Data type objects (dtype),
HyperSpy supports four extra dtypes for RGB images: 'rgb8',
'rgba8', 'rgb16', and 'rgba16'. Changing from and to
any rgb(a) dtype is more constrained than most other dtype
conversions. To change to an rgb(a) dtype,
the signal_dimension must be 1, and its size should be 3 (for
rgb) or 4 (for rgba) dtypes. The original dtype
should be uint8 or uint16 if converting to rgb(a)8
or rgb(a)16, and the navigation_dimension should be at
least 2. After conversion, the signal_dimension becomes 2. The
dtype of images with original dtype rgb(a)8 or rgb(a)16
can only be changed to uint8 or uint16, and the
signal_dimension becomes 1.
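A hedged sketch of the rgb conversion rules described above (the data shape is
an arbitrary example):
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.zeros((32, 32, 3), dtype=np.uint8))
>>> s.change_dtype("rgb8")   # signal_dimension becomes 2 (an RGB image)
>>> s.change_dtype("uint8")  # back to uint8, signal_dimension becomes 1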
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
If True, attempt to close the file associated with the dask
array data if any. Note that closing the file will make all other
associated lazy signals inoperative.
Using 2 workers, which can reduce the memory usage (depending on
the data and your computer hardware). Note that num_workers only
works for the ‘threads’ and ‘processes’ schedulers.
>>> s2 = s.deepcopy()
>>> s2.compute(num_workers=2)
Using a single threaded scheduler, which is useful for debugging
Compute the navigator by taking the sum over a single chunk containing
the specified coordinate. Taking the sum over a single chunk is a
computationally efficient approach to compute the navigator. The data
can be rechunked by specifying the chunks_number argument.
Define the number of chunks in the signal space used to rechunk the
data when calculating the navigator. Useful to define the range
over which the sum is calculated.
If None, the existing chunking will be considered when picking the
chunk used in the navigator calculation.
If True, display a progress bar. If None, the default from
the preferences settings is used.
Returns:
None.
Notes
The number of chunks will affect where the sum is taken. If the sum
needs to be taken in the centre of the signal space (for example, in
the case of a diffraction pattern), the number of chunks needs to be an
odd number, so that the middle is centered.
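A minimal sketch for a lazy signal s (the chunk count is an arbitrary example):
>>> s.compute_navigator(chunks_number=3)  # odd number: sum taken around the centre
>>> s.plot()  # the computed navigator can then be used when plotting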
The dask scheduler to use for computations. If None,
dask.threaded.get will be used if possible, otherwise dask.get
will be used, for example in the pyodide interpreter.
The number of dask chunks to pass to the decomposition model.
More chunks require more memory, but should run faster. Will be
increased to contain at least output_dimension signals.
If True, print information about the decomposition being performed.
In the case of sklearn.decomposition objects, this includes the
values of all arguments of the chosen sklearn algorithm.
M. Keenan and P. Kotula, “Accounting for Poisson noise
in the multivariate analysis of ToF-SIMS spectrum images”, Surf.
Interface Anal 36(3) (2004): 203-212.
Returns a signal with the n-th order discrete difference along the
given axis, i.e. it calculates the difference between consecutive
values in the given axis: out[n]=a[n+1]-a[n]. See
numpy.diff() for more details.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Note that the size of the data on the given axis decreases by
the given order. i.e. if axis is "x" and order is
2, the x dimension is N, der’s x dimension is N - 2.
If you intend to calculate the numerical derivative, please use the
proper derivative() function
instead. To avoid erroneous misuse of the diff function as derivative,
it raises an error when working with a non-uniform axis.
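A minimal sketch contrasting the two methods mentioned above (assuming a
signal s with a uniform signal axis):
>>> d = s.diff(axis=-1)          # out[n] = a[n+1] - a[n]; the axis shrinks by 1
>>> der = s.derivative(axis=-1)  # numerical derivative, takes the axis scale into account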
Either one on its own, or many axes in a tuple can be passed. In
both cases the axes can be passed directly, or specified using the
index in axes_manager or the name of the axis. Any duplicates are
removed. If None, the operation is performed over all navigation
axes (default).
Examples
>>> import dask.array as da
>>> data = da.random.random((10, 200, 300))
>>> data.chunksize
(10, 200, 300)
>>> s = hs.signals.Signal1D(data).as_lazy()
>>> s.get_chunk_size()  # All navigation axes
((10,), (200,))
>>> s.get_chunk_size(0)  # The first navigation axis
((200,),)
More sophisticated algorithms for determining the bins can be used
by passing a string as the bins argument. Other than the 'blocks'
and 'knuth' methods, the available algorithms are the same as
numpy.histogram().
Note: The lazy version of the algorithm only supports "scott"
and "fd" as a string argument for bins.
If bins is an int, it defines the number of equal-width
bins in the given range. If bins is a
sequence, it defines the bin edges, including the rightmost
edge, allowing for non-uniform bin widths.
If bins is a string from the list below, will use
the method chosen to calculate the optimal bin width and
consequently the number of bins (see Notes for more detail on
the estimators) from the data that falls within the requested
range. While the bin width will be optimal for the actual data
in the range, the number of bins will be computed to fill the
entire range, including the empty portions. For visualisation,
using the 'auto' option is suggested. Weighted data is not
supported for automated bin size selection.
‘auto’
Maximum of the ‘sturges’ and ‘fd’ estimators. Provides good
all around performance.
‘fd’ (Freedman Diaconis Estimator)
Robust (resilient to outliers) estimator that takes into
account data variability and data size.
‘doane’
An improved version of Sturges’ estimator that works better
with non-normal datasets.
‘scott’
Less robust estimator that takes into account data
variability and data size.
‘stone’
Estimator based on leave-one-out cross-validation estimate of
the integrated squared error. Can be regarded as a generalization
of Scott’s rule.
‘rice’
Estimator does not take variability into account, only data
size. Commonly overestimates number of bins required.
‘sturges’
R’s default method, only accounts for data size. Only
optimal for gaussian data and underestimates number of bins
for large non-gaussian datasets.
‘sqrt’
Square root (of data size) estimator, used by Excel and
other programs for its speed and simplicity.
‘knuth’
Knuth’s rule is a fixed-width, Bayesian approach to determining
the optimal bin width of a histogram.
‘blocks’
Determination of optimal adaptive-width histogram bins using
the Bayesian Blocks algorithm.
When estimating the bins using one of the str methods, the
number of bins is capped by this number to avoid a MemoryError
being raised by numpy.histogram().
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
**kwargs
other keyword arguments (weight and density) are described in
numpy.histogram().
>>> s = hs.signals.Signal1D(np.random.normal(size=(10, 100)))
>>> # Plot the data histogram
>>> s.get_histogram().plot()
>>> # Plot the histogram of the signal at the current coordinates
>>> s.get_current_signal().get_histogram().plot()
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
For multidimensional datasets an optional figure,
the “navigator”, with a cursor to navigate that data is
raised. In any case it is possible to navigate the data using
the sliders. Currently only signals with signal_dimension equal to
0, 1 and 2 can be plotted.
Allowed string values are 'auto', 'slider', and 'spectrum'.
If 'auto':
If navigation_dimension > 0, a navigator is
provided to explore the data.
If navigation_dimension is 1 and the signal is an image
the navigator is a sum spectrum obtained by integrating
over the signal axes (the image).
If navigation_dimension is 1 and the signal is a spectrum
the navigator is an image obtained by stacking all the
spectra in the dataset horizontally.
If navigation_dimension is > 1, the navigator is a sum
image obtained by integrating the data over the signal axes.
Additionally, if navigation_dimension > 2, a window
with one slider per axis is raised to navigate the data.
For example, if the dataset consists of 3 navigation axes “X”,
“Y”, “Z” and one signal axis, “E”, the default navigator will
be an image obtained by integrating the data over “E” at the
current “Z” index and a window with sliders for the “X”, “Y”,
and “Z” axes will be raised. Notice that changing the “Z”-axis
index changes the navigator in this case.
For lazy signals, the navigator will be calculated using the
compute_navigator()
method.
If 'slider':
If navigation_dimension > 0, a window with one slider per
axis is raised to navigate the data.
If 'spectrum':
If navigation_dimension > 0 the navigator is always a
spectrum obtained by integrating the data over all other axes.
Not supported for lazy signals, the 'auto' option will
be used instead.
If None, no navigator will be provided.
Alternatively a BaseSignal (or subclass)
instance can be provided. The navigation or signal shape must
match the navigation shape of the signal to plot or the
navigation_shape + signal_shape must be equal to the
navigator_shape of the current object (for a dynamic navigator).
If the signal dtype is RGB or RGBA this parameter has no effect and
the value is always set to 'slider'.
The function used to normalize the data prior to plotting.
Allowable strings are: 'auto', 'linear', 'log'.
If 'auto', intensity is plotted on a linear scale except when
power_spectrum=True (only for complex signals).
The string must contain any combination of the 'x' and 'v'
characters. If 'x' or 'v' (for values) are in the string, the
corresponding horizontal or vertical axis limits are set to their
maxima and the axis limits will reset when the data or the
navigation indices are changed. Default is 'v'.
Rebin the signal into a smaller or larger shape, based on linear
interpolation. Specify either new_shape or scale. A scale of 1
means no binning and a scale of less than one results in up-sampling.
For each dimension, specify the new:old pixel ratio, e.g. a ratio
of 1 is no binning and a ratio of 2 means that each pixel in the new
spectrum is twice the size of the pixels in the old spectrum.
The length of the list should match the dimension of the
Signal’s underlying data array.
Note: Only one of scale or new_shape should be specified,
otherwise the function will not run.
Whether or not to crop the resulting rebinned data (default is
True). When binning by a non-integer number of
pixels it is likely that the final row in each dimension will
contain fewer than the full quota to fill one pixel. For example,
a 5*5 array binned by 2.1 will produce two rows containing
2.1 pixels and one row containing only 0.8 pixels. Selection of
crop=True or crop=False determines whether or not this
“black” line is cropped from the final binned array or not.
Please note that if crop=False is used, the final row in each
dimension may appear black if a fractional number of pixels are left
over. It can be removed but has been left to preserve total counts
before and after binning.
Specify the dtype of the output. If None, the dtype will be
determined by the behaviour of numpy.sum(), if "same",
the dtype will be kept the same. Default is None.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
By default dtype=None, the dtype is determined by the behaviour of
numpy.sum, in this case, unsigned integer of the same precision as
the platform integer
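A hedged sketch of the two equivalent ways of specifying the binning described
above (the data shape is an arbitrary example):
>>> s = hs.signals.Signal1D(np.ones((4, 4, 10)))
>>> s.rebin(scale=(2, 2, 1)).data.shape      # bin the navigation axes by 2
(2, 2, 10)
>>> s.rebin(new_shape=(2, 2, 5)).data.shape  # or give the target shape directly
(2, 2, 5)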
Rechunks the data using the same rechunking formula from Dask
except that the navigation and signal chunks are defined separately.
Note, for most functions sig_chunks should remain None so that it
spans the entire signal axes.
The navigation block dimensions to create.
-1 indicates the full size of the corresponding dimension.
Default is “auto” which automatically determines chunk sizes.
The signal block dimensions to create.
-1 indicates the full size of the corresponding dimension.
Default is -1, which automatically spans the full signal dimension.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has effect when operating on lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
A dictionary containing a set of parameters
that will be stored in the original_metadata attribute. It
typically contains all the parameters that have been
imported from the original data file.
A dictionary containing a set of parameters
that will be stored in the original_metadata attribute. It
typically contains all the parameters that have been
imported from the original data file.
Base class for interactive ROIs, i.e. ROIs with widget interaction.
The base class defines a lot of the common code for interacting with
widgets, but inheritors need to implement the following functions:
Add a widget to visually represent the ROI, and connect it so any
changes in either are reflected in the other. Note that only one
widget can be added per signal/axes combination.
If None, it will check whether the widget can be added to the
navigator, i.e. if dimensionality matches, and use it if
possible, otherwise it will try the signal space. If neither
attempt works, an error message will be raised.
If specified, this is the widget that will be added. If None, the
default widget will be used.
color : matplotlib color, default 'green'
The color for the widget. Any format that matplotlib uses should be
ok. This will not change the color for any widget passed with the
'widget' argument.
The signal the ROI will be added to, for navigation purposes
only. Only the source signal will be sliced.
If not None, it will automatically create a widget on
navigation_signal. Passing "same" is identical to passing the
same signal to "signal" and "navigation_signal", but is less
ambiguous, and allows “same” to be the default value.
If not None, it will use ‘out’ as the output instead of
returning a new Signal.
color : matplotlib color, default 'green'
The color for the widget. Any format that matplotlib uses should be
ok. This will not change the color for any widget passed with the
‘widget’ argument.
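A hedged usage sketch of the parameters described above (the ROI type and
coordinates are arbitrary examples):
>>> roi = hs.roi.RectangularROI(left=0, top=0, right=5, bottom=5)
>>> s.plot()
>>> s_roi = roi.interactive(s, navigation_signal="same", color="green")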
Function responsible for updating anything that depends on the ROI.
It should be called by implementors whenever the ROI changes.
This implementation updates the widgets associated with it, and
triggers the changed event.
Mapping of signal:(widget, axes) to keep track of the signals
(and corresponding widget/signal axes) on which the ROI has been added.
This dictionary is populated in BaseInteractiveROI.add_widget()
Function responsible for updating anything that depends on the ROI.
It should be called by implementors whenever the ROI changes.
The base implementation simply triggers the changed event.
A class to comfortably browse a dictionary using a CLI.
In addition to accessing the values using dictionary syntax
the class enables navigating a dictionary that contains
nested dictionaries as attributes of nested classes.
It is also an iterator over the (key, value) items. The
__repr__ method provides pretty tree printing. Private
keys, i.e. keys that start with an underscore, are not
printed, counted when calling len, nor iterated.
Examples
>>> tree = DictionaryTreeBrowser()
>>> tree.set_item("Branch.Leaf1.color", "green")
>>> tree.set_item("Branch.Leaf2.color", "brown")
>>> tree.set_item("Branch.Leaf2.caterpillar", True)
>>> tree.set_item("Branch.Leaf1.caterpillar", False)
>>> tree
└── Branch
    ├── Leaf1
    │   ├── caterpillar = False
    │   └── color = green
    └── Leaf2
        ├── caterpillar = True
        └── color = brown
>>> tree.Branch
├── Leaf1
│   ├── caterpillar = False
│   └── color = green
└── Leaf2
    ├── caterpillar = True
    └── color = brown
>>> for label, leaf in tree.Branch:
...     print("%s is %s" % (label, leaf.color))
Leaf1 is green
Leaf2 is brown
>>> tree.Branch.Leaf2.caterpillar
True
>>> "Leaf1" in tree.Branch
True
>>> "Leaf3" in tree.Branch
False
Adds all the nodes in the given path if they don’t exist.
Parameters:
node_path: str
The nodes must be separated by full stops (periods).
Examples
>>> dict_browser = DictionaryTreeBrowser({})
>>> dict_browser.add_node('First.Second')
>>> dict_browser.First.Second = 3
>>> dict_browser
└── First
    └── Second = 3
Given a path, return its value if it exists, or the default value if
missing. May also perform a search whether an item key exists and then
return the value or a list of values for multiple occurrences of the
key – optionally returns the full path(s) in addition to its value(s).
The nodes of the path are separated using periods.
If True, the full path to the item has to be given. If
False, a search for the item key is performed (can include
additional nodes preceding the key separated by full stops).
Given a path, return True if it exists. May also perform a search
whether an item exists and optionally returns the full path instead of
boolean value.
The nodes of the path are separated using periods.
If True, the full path to the item has to be given. If
False, a search for the item key is performed (can include
additional nodes preceding the key separated by full stops).
Only applies if full_path=False. If False, a boolean
value is returned. If True, the full path to the item is returned,
a list of paths for multiple matches, or default value if it does
not exist.
default
The value to return for path if the item does not exist (default is None).
A dictionary, keys of which are used as attributes for exporting.
Key ‘self’ is only available with tag ‘id’, when the id of the
target is saved. The values are either None, or a tuple, where:
the first item is a string, which contains flags, separated by
commas.
the second item is None if no ‘init’ flag is given, otherwise
the object required for the initialization.
The flag conventions are as follows:
‘init’: object used for initialization of the target. The object is
saved in the tuple in whitelist
‘fn’: the targeted attribute is a function, and may be pickled. A
tuple of (thing, value) will be exported to the dictionary,
where thing is None if the function is passed as-is, and True if
cloudpickle package is used to pickle the function, with the value as
the result of the pickle.
‘id’: the id of the targeted attribute is exported (e.g. id(target.name))
‘sig’: The targeted attribute is a signal, and will be converted to a
dictionary if fullcopy=True
dict
A dictionary where the object will be exported.
bool
Copies of objects are stored, not references. If any are found,
functions will be pickled and signals converted to dictionaries.
Peak pixel coordinates with shape (n_peaks, 2).
Notes
Implemented as described in the PhD thesis of Thomas White, University of
Cambridge, 2009, with minor modifications to resolve ambiguities.
The algorithm is as follows:
Adjust the contrast and intensity bias of the image so that all pixels
have values between 0 and 1.
For each pixel, determine the mean and standard deviation of all pixels
inside a circle of radius 10 pixels centered on that pixel.
If the value of the pixel is greater than the mean of the pixels in the
circle by more than one standard deviation, set that pixel to have an
intensity of 1. Otherwise, set the intensity to 0.
Smooth the image by convolving it twice with a flat 3x3 kernel.
Let k = (1/2 - mu)/sigma where mu and sigma are the mean and standard
deviations of all the pixel intensities in the image.
For each pixel in the image, if the value of the pixel is greater than
mu + k*sigma set that pixel to have an intensity of 1. Otherwise, set the
intensity to 0.
Detect peaks in the image by locating the centers of gravity of regions
of adjacent pixels with a value of 1.
Repeat #4-7 until the number of peaks found in the previous step
converges to within the user-defined convergence_ratio.
Find peaks in the cross correlation between the image and a template by
using the find_peaks_minmax() function
to find the peaks on the cross correlation result obtained using the
skimage.feature.match_template() function.
Implemented as described in Zaefferer “New developments of computer-aided
crystallographic analysis in transmission electron microscopy”, J. Appl. Cryst.
This version by Ben Martineau (2016)
There are several places to obtain help with HyperSpy:
The HyperSpy Gitter chat is a good place to go for both troubleshooting and general questions.
Issues with installation? There are some troubleshooting tips built into the installation page.
If you want to request new features or if you’re confident that you have found a bug, please
create a new issue on the HyperSpy GitHub issues page.
When reporting bugs, please try to replicate the bug with the HyperSpy sample data, and make every effort
to simplify your example script to only the elements necessary to replicate the bug.
plot_roi_map() doesn’t return the sum of all ROI maps (all_sum) and the signals sliced by the ROIs (roi_signals); these can be obtained separately using the rois returned by plot_roi_map() and interactive(). (#3364)
compute() will now pass keyword arguments to the dask dask.array.Array.compute() method. This enables setting the scheduler and the number of computational workers. (#2971)
Add functionality to select navigation position using shift + click in the navigator. (#3175)
Added a plot_residual to plot(). When True, a residual line (Signal - Model) appears in the model figure. (#3186)
Switch to matplotlib.axes.Axes.pcolormesh() for image plots involving non-uniform axes.
The following cases are covered: 2D-signal with arbitrary navigation-dimension, 1D-navigation and 1D-signal (linescan).
Not covered are 2D-navigation images (still uses sliders). (#3192)
New interpolate_on_axis() method to switch one axis of a signal. The data is interpolated in the process. (#3214)
Added plot_roi_map(). Allows interactively using a set of ROIs to select regions of the signal axes of a signal and visualise how the signal varies in this range spatially. (#3224)
Avoid slowing down fitting by optimising attribute access of model. (#3155)
Fix harmless error message when using multiple RectangularROI: check if resizer patches are drawn before removing them. Don’t display resizers when adding the widget to the figure, for consistency with the unselected state (#3222)
Fix keeping dtype in rebin() when the endianness is specified in the dtype (#3237)
Fix serialization error due to traits.api.Property not being serializable if a dtype is specified.
See #3261 for more details. (#3262)
Fix setting bounds for "trf", "dogbox" optimizer (#3244)
Fix bugs in new marker implementation:
Markers str representation fails if the marker isn’t added to a signal
RosettaSciIO was split out of the HyperSpy repository on July 23, 2022. The IO-plugins and related functions so far developed in HyperSpy were moved to the RosettaSciIO repository. (#2972)
Extend the IO functions to accept alias names for format name as defined in RosettaSciIO. (#3009)
Fix behaviour of print_current_values(), print_current_values()
and print_known_signal_types(), which were not printing when running from a script - they were only printing when running in notebook or qtconsole. Now all print_* functions behave consistently: they all print the output instead of returning an object (string or html). The IPython.display.display() will pick a suitable rendering when running in an “ipython” context, for example notebook, qtconsole. (#3145)
The markers have been refactored - see the new markers API and the gallery of examples for usage. The new Markers uses matplotlib.collections.Collection, is faster and more generic than the previous implementation and also supports lazy markers. Markers saved in HyperSpy files (hspy, zspy) with HyperSpy < 2.0 are converted automatically when loading the file. (#3148)
For all functions with the rechunk parameter, the default has been changed from True to False. This means HyperSpy will not automatically try to change the chunking for lazy signals. The old behaviour could lead to a reduction in performance when working with large lazy datasets, for example 4D-STEM data. (#3166)
Replace the max_workers argument with num_workers to be consistent with dask
Adds more documentation on setting the dask backend and how to use multiple cores
Adds navigation_chunk argument for setting the chunks with a non-lazy signal
Fix axes handling when the function to be mapped can be applied to the whole dataset - typically when it has the axis or axes keyword argument. (#3198)
Remove physics_tools since it is not used and doesn’t fit in the scope of HyperSpy. (#3235)
Improve the readability of the code by replacing the __call__ method of some objects with the more explicit _get_current_data.
Rename __call__ method of BaseSignal to _get_current_data.
As the HyperSpy API evolves, some of its parts are occasionally reorganized or removed.
When APIs evolve, the old API is deprecated and eventually removed in a major
release. The functions and methods removed in HyperSpy 2.0 are listed below along
with migration advice:
The API of the Polynomial has changed (it was deprecated in HyperSpy 1.5). The old API had a single parameter, coefficients, which has been replaced by a0, a1, etc.
The creation of markers has changed to use their class name instead of aliases, for example,
use m=hs.plot.markers.Lines instead of m=hs.plot.markers.line_segment.
fast_svd → SVD along with the argument svd_solver="randomized"
svd → SVD
fast_mlpca → MLPCA along with the argument svd_solver="randomized"
mlpca → MLPCA
nmf → NMF
RPCA_GoDec → RPCA
The argument learning_rate of the ORPCA algorithm has been renamed to subspace_learning_rate.
The argument momentum of the ORPCA algorithm has been renamed to subspace_momentum.
The list of possible values for the centre keyword argument of the decomposition() method
when using the SVD algorithm has been changed according to the following table:
For lazy signals, a possible value of the algorithm keyword argument of the
decomposition() method has been changed
from "ONMF" to "ORNMF".
Setting the metadata and original_metadata attribute of signals is removed, use
the set_item() and
add_dictionary() methods of the
metadata and original_metadata attribute instead.
The set_signal_type() now raises an error when passing
None to the signal_type argument. Use signal_type="" instead.
Passing an “iterating over navigation argument” to the map()
method is removed, pass a HyperSpy signal with suitable navigation and signal shape instead.
Add pooch as test dependency, as it is required to use scipy.dataset in latest scipy (1.10) and update plotting test. Fix warning when plotting non-uniform axis (#3079)
Fix matplotlib 3.7 and scikit-learn 1.4 deprecations (#3102)
Add a note in the user guide to explain that when a file contains several datasets, load() returns a list of signals instead of a single signal and that list indexation can be used to access a single signal. (#2975)
Fixes invalid file chunks when saving some signals to hspy/zspy formats. (#2940)
Fix issue where a TIFF image from an FEI FIB/SEM navigation camera image would not be read due to missing metadata (#2941)
Respect show_progressbar parameter in map() (#2946)
Fix regression in set_signal_range() which was raising an error when used interactively (#2948)
Fix SpanROI regression: the output of interactive() was not updated when the ROI was changed. Fix errors with updating limits when plotting empty slice of data. Improve docstrings and test coverage. (#2952)
Fix stacking signals that contain their variance in metadata. Previously it was raising an error when specifying the stacking axis. (#2954)
Fix missing API documentation of several signal classes. (#2957)
Add the zspy format: hspy specification with the zarr format. Particularly useful to speed up loading and saving large datasets by using concurrency. (#2825)
Fix bug in axes.UnitConversion: the offset value was initialized by units. (#2864)
Fix bug where the map() function wasn’t operating properly when an iterating signal was larger than the input signal. (#2878)
In case the Bruker defined XML element node at SpectrumRegion contains no information on the
specific selected X-ray line (if there is only a single line available), assume it is the ‘Ka’ line. (#2881)
When loading Bruker Bcf, cutoff_at_kV=None does no cutoff (#2898)
Fix bug where the map() function wasn’t operating properly when an iterating signal was not an array. (#2903)
Fix bug for not saving ragged arrays with dimensions larger than 2 in the ragged dimension. (#2906)
Fix bug with importing some spectra from eelsdb and add progress bar (#2916)
Fix bug when the spikes_removal_tool would not work interactively for signal with 0-dimension navigation space. (#2918)
Document reading Attolight data with the sur/pro format reader (#2559)
Lazy signals now cache the current data chunk when using multifit and when plotting, improving performance. (#2568)
Read cathodoluminescence metadata from digital micrograph files, amended in PR #2894 (#2590)
Add possibility to search/access nested items in DictionaryTreeBrowser (metadata) without providing full path to item. (#2633)
Improve map() function in BaseSignal by utilizing dask for both lazy and non-lazy signals. This includes adding a lazy_output parameter, meaning non-lazy signals now can output lazy results. See the user guide for more information. (#2703)
NeXus file with more options when reading and writing (#2725)
Add Github workflow to run test suite of extension from a pull request. (#2824)
Add ragged attribute to BaseSignal to clarify when a signal contains a ragged array. Fix inconsistency caused by ragged array and add a ragged array section to the user guide (#2842)
Import hyperspy submodules lazily to speed up importing hyperspy. Fix autocompletion signals submodule (#2850)
Add new markers hyperspy.drawing._markers.arrow, hyperspy.drawing._markers.ellipse and filled hyperspy.drawing._markers.rectangle. (#2871)
Add metadata about the file-reading and saving operations to the Signals
produced by load() and save()
(see the metadata structure section of the user guide) (#2873)
Expose stage coordinates and rotation angle in metadata for SEM images in the bcf reader. (#2911)
metadata.Signal.binned is replaced by an axis parameter, e.g. axes_manager[-1].is_binned (#2652)
When loading Bruker bcf files, cutoff_at_kV=None (default) no longer applies an automatic cutoff.
The new accepted values "zealous" and "auto" apply an automatic cutoff. (#2910)
Deprecate the ability to directly set metadata and original_metadata Signal
attributes in favor of using set_item()
and add_dictionary() methods or
specifying metadata when creating signals (#2913)
When saving a dataset with a dtype other than
uint8 to a blockfile
(blo) it is now possible to provide the argument intensity_scaling to map
the intensity values to the reduced range (#2774)
Use importlib_metadata instead of pkg_resources for extension
registration to speed up the import process and make it possible to install
extensions and use them without restarting the python session (#2709)
Don’t import hyperspy extensions when registering extensions (#2711)
Improve docstrings of various fitting methods (#2724)
Improve performance issue with the map method of lazy signal (#2617)
Add option to copy/load original metadata in hs.stack and hs.load to avoid large original_metadata which can slow down processing. Close #1398, #2045, #2536 and #1568. (#2691)
This is a maintenance release that adds compatibility with h5py 3.0 and includes
numerous bug fixes and enhancements.
See the issue tracker
for details.
Improve thread-based parallelism. Add max_workers argument to the
map() method, such that the user can directly
control how many threads they launch.
Many improvements to the decomposition() and
blind_source_separation() methods, including support for
scikit-learn like algorithms, better API and much improved documentation.
See Machine learning and the API changes section below.
Add option to calculate the absolute thickness to the EELS
hyperspy._signals.eels.EELSSpectrum.estimate_thickness method.
See Thickness estimation.
Vastly improved performance and memory footprint of the
estimate_shift2D() method.
The remove_background() method can
now remove Doniach, exponential, Lorentzian, skew normal,
split Voigt and Voigt functions. Furthermore, it can return the background
model that includes an estimation of the reduced chi-squared.
The performance of the maximum-likelihood PCA method was greatly improved.
All ROIs now have a __getitem__ method, enabling e.g. using them with the
unpack * operator. See Slicing using ROIs for an example.
New syntax to set the contrast when plotting images. In particular, the
vmin and vmax keywords now take values like vmin="30th" to
clip the minimum value to the 30th percentile. See Fast Fourier Transform (FFT)
for an example.
This is a maintenance release that adds compatibility with Numpy 1.17 and Dask
2.3.0 and fixes a bug in the Bruker reader. See the issue tracker
for details.
The following components have been rewritten using
hyperspy._components.expression.Expression, boosting their
speeds among other benefits. Multiple issues have been fixed on the way.
The hyperspy._components.polynomial_deprecated.Polynomial
component will be deprecated in HyperSpy 2.0 in favour of the new
hyperspy._components.polynomial.Polynomial component, that is based on
hyperspy._components.expression.Expression and has an improved API. To
start using the new component, pass the legacy=False keyword to the
hyperspy._components.polynomial_deprecated.Polynomial component
constructor.
This is a maintenance release. Among many other fixes and enhancements, this
release fixes compatibility issues with Matplotlib v3.1. Follow these
links for details on all the bugs fixed
and enhancements.
Reading and writing the mrcz open format. See MRCZ format.
New hyperspy.datasets.artificial_data module which contains functions for generating
artificial data, for use in things like docstrings or for people to test
HyperSpy functionalities. See Loading example data and data from online databases.
The cmap option of plot_images()
supports iterable types, allowing the user to specify different colormaps
for the different images that are plotted by providing a list or other
generator.
Clicking on an individual image updates it.
New customizable keyboard shortcuts to navigate multi-dimensional datasets. See Data visualization.
The remove_background() method now operates much faster
in multi-dimensional datasets and adds the options to interactively plot the remainder of the operation and
to set the removed background to zero. See Background removal for details.
The plot() method now takes a norm keyword that can be “linear”, “log”,
“auto” or a matplotlib norm. See Customising image plot for details.
Moreover, there are two new keyword
arguments, fft_shift and power_spectrum, that are useful when plotting Fourier transforms. See
Fast Fourier Transform (FFT).
This is a maintenance release. Follow these links for details on all
the bugs fixed
and enhancements.
Starting with this version, the HyperSpy WinPython Bundle distribution is
no longer released in sync with HyperSpy. For HyperSpy WinPython Bundle
releases see hyperspy/hyperspy-bundle
The vmin and vmax arguments of the
plot_images() function now accept lists to enable
setting these parameters for each plot individually.
The plot_decomposition_results() and
plot_bss_results() methods now make a better
guess of the number of navigators (if any) required to visualise the
components. (Previously they were always plotting four figures by default.)
All functions that take a signal range can now take a SpanROI.
The following ROIs can now be used for indexing or slicing (see here for details):
Permanent markers (if any) are now displayed when plotting by default.
HyperSpy no longer depends on traitsui (fixing many installation issues) and
ipywidgets as the GUI elements based on these packages have now been split
into separate packages and are not installed by default.
The following methods now raise a ValueError when not providing the
number of components if output_dimension was not specified when
performing a decomposition. (Previously they would plot as many figures
as available components, usually resulting in memory saturation):
In addition to adding ipywidgets GUI elements, the traitsui GUI elements have
been split into a separate package. See the new
hyperspy_gui_traitsui
package.
The new hyperspy.ui_registry enables easy connection of external
GUI elements to HyperSpy. This is the mechanism used to split the traitsui
and ipywidgets GUI elements.
Markers can now be saved to hdf5 and creating many markers is easier and
faster. See Markers.
Add option to save to HDF5 file using the “.hspy” extension instead of
“.hdf5”. See HSpy - HyperSpy’s HDF5 Specification. This will be the default extension in
HyperSpy 1.3.
New metadata added to the HyperSpy metadata specifications: magnification,
frame_number, camera_length, authors, doi, notes and
quantity. See Metadata structure for details.
The y-axis label (for 1D signals) and colorbar label (for 2D signals)
are now taken from the new metadata.Signal.quantity.
The time and date metadata are now stored in the ISO 8601 format.
All metadata in the HyperSpy metadata specification is now read from all
supported file formats when available.
The new BaseSignal,
Signal1D and
Signal2D deprecate hyperspy.signal.Signal,
Signal1D and Signal2D
respectively. Also as_signal1D, as_signal2D, to_signal1D and to_signal2D
deprecate as_signal1D, as_signal2D, to_spectrum and to_image. See #963 and #943 for details.
This is a maintenance release that includes fixes for multiple bugs, some
enhancements, new features and API changes. This is set to be the last HyperSpy
release for Python 2. The next release (HyperSpy 0.8.4) will support only Python 3.
Importantly, the way to start HyperSpy changes (again) in this release. Please
read carefully Starting Python in Windows for details.
The broadcasting rules have also changed. See Signal operations
for details.
In HyperSpy 0.8.1 the full content of hyperspy.hspy is still
imported in the user namespace, but this can now be disabled in
hs.preferences.General.import_hspy. In HyperSpy 1.0 it will be
disabled by default and the hyperspy.hspy module will be fully
removed in HyperSpy 0.10. We encourage all users to migrate to the new
syntax. For more details see Starting Python in Windows.
Indexing the hyperspy.signal.Signal class is now deprecated. We encourage
all users to use isig and inav instead for indexing.
hyperspy.hspy.create_model is now deprecated in favour of the new
equivalent hyperspy.signal.Signal.create_model Signal method.
hyperspy.signal.Signal.unfold_if_multidim is deprecated.
New method for quantifying EDS TEM spectra using Cliff-Lorimer method, hyperspy._signals.eds_tem.EDSTEMSpectrum.quantification. See EDS Quantification.
New method to estimate for background subtraction, hyperspy._signals.eds.EDSSpectrum.estimate_background_windows. See Background subtraction.
New method to estimate the windows of integration, hyperspy._signals.eds.EDSSpectrum.estimate_integration_windows.
New specific hyperspy._signals.eds.EDSSpectrum.plot method, with markers to indicate the X-ray lines, the window of integration or/and the windows for background subtraction. See Plotting X-ray lines.
New example signals in the hspy.utils.example_signals module.
New method to mask the vacuum, hyperspy._signals.eds_tem.EDSTEMSpectrum.vacuum_mask, and a specific hyperspy._signals.eds_tem.EDSTEMSpectrum.decomposition method that incorporates the vacuum mask.
New Signal methods to transform between Signal subclasses. More information
here.
hyperspy.signal.Signal.set_signal_type
hyperspy.signal.Signal.set_signal_origin
hyperspy.signal.Signal.as_signal2D
hyperspy.signal.Signal.as_signal1D
The string representation of the Signal class now prints the shape of the
data and includes a separator between the navigation and the signal axes, e.g.
(100, 10| 5) for a signal with two navigation axes of size 100 and 10 and one
signal axis of size 5.
The default toolkit can now be saved in the preferences.
Added full compatibility with the Qt toolkit that is now the default.
Added compatibility with the GTK and TK toolkits, although with no GUI
features.
It is now possible to run HyperSpy in a headless system.
Added a CLI to hyperspy.signal.Signal1DTools.remove_background.
New hyperspy.signal.Signal1DTools.estimate_peak_width method to estimate
peak width.
New methods to integrate over one axis:
hyperspy.signal.Signal.integrate1D and
hyperspy.signal.Signal1DTools.integrate_in_range.
New hyperspy.signal.Signal.metadata attribute, Signal.binned. Several
methods behave differently on binned and unbinned signals.
See Binned and unbinned signals.
New hyperspy.signal.Signal.map method to easily transform the
data using a function that operates on individual signals. See
Iterating over the navigation axes.
New hyperspy.signal.Signal.get_histogram and
hyperspy.signal.Signal.print_summary_statistics methods.
The spikes removal tool has been moved to the Signal1D
class so that it is available for all its subclasses.
The hyperspy.signal.Signal.split method can now automatically split
stacked signals back into their original parts. See Splitting and stacking.
New method,
hyperspy._signals.eels.EELSSpectrum.kramers_kronig_analysis to calculate
the dielectric function from low-loss electron energy-loss spectra based on
the Kramers-Kronig relations. See Kramers-Kronig Analysis.
New method to align the zero-loss peak,
hyperspy._signals.eels.EELSSpectrum.align_zero_loss_peak.
New signal, EDSSpectrum, specialized in EDS data analysis, with subsignals
for EDS with SEM and with TEM: EDSSEMSpectrum and EDSTEMSpectrum. See
Energy-Dispersive X-ray Spectrometry (EDS).
New database of EDS lines available in the elements attribute of the
hspy.utils.material module.
Specific methods to describe the sample,
hyperspy._signals.eds.EDSSpectrum.add_elements and
hyperspy._signals.eds.EDSSpectrum.add_lines. See Describing the sample
New method to get the intensity of specific X-ray lines:
hyperspy._signals.eds.EDSSpectrum.get_lines_intensity. See
Describing the sample
hyperspy.misc has been reorganized. Most of the functions in misc.utils have
been relocated to specialized modules. misc.utils is no longer imported in
hyperspy.hspy. A new hyperspy.utils module is imported instead.
Signal now supports indexing and slicing. See Indexing.
Most arithmetic and rich arithmetic operators work with signals.
See Signal operations.
Much improved EELSSpectrum methods:
hyperspy._signals.eels.EELSSpectrum.estimate_zero_loss_peak_centre,
hyperspy._signals.eels.EELSSpectrum.estimate_elastic_scattering_intensity and
hyperspy._signals.eels.EELSSpectrum.estimate_elastic_scattering_threshold.
The axes can now be given using their name, e.g. s.crop("x", 1, 10).
New syntax to specify position over axes: an integer specifies the indexes
over the axis and a floating number specifies the position in the axis units,
e.g. s.crop("x", 1, 10.) crops over the axis x (in meters) from index 1
to value 10 meters. Note that this may make your old scripts behave in
unexpected ways as just renaming the old *_in_units and *_in_values methods
won’t work in most cases.
Most methods now use the natural order, i.e. X, Y, Z..., to index the axes.
Add padding to fourier-log and fourier-ratio deconvolution to fix the
wrap-around problem and increase its performance.
New
hyperspy.components.eels_cl_edge.EELSCLEdge.get_fine_structure_as_spectrum
EELSCLEdge method.
New hyperspy.components.arctan.Arctan model component.
New
hyperspy.model.Model.enable_adjust_position
and hyperspy.model.Model.disable_adjust_position
to easily change the position of components using the mouse on the plot.
New Model methods
hyperspy.model.Model.set_parameters_value,
hyperspy.model.Model.set_parameters_free and
hyperspy.model.Model.set_parameters_not_free
to easily set several important component attributes of a list of components
at once.
New Signal methods:
hyperspy.signal.Signal.integrate_simpson,
hyperspy.signal.Signal.max,
hyperspy.signal.Signal.min,
hyperspy.signal.Signal.var, and
hyperspy.signal.Signal.std.
New sliders window to easily navigate signals with navigation_dimension > 2.
The Ripple (rpl) reader can now read rpl files produced by INCA.
Change syntax to create Signal objects. Instead of a dictionary,
Signal.__init__ now takes keywords, e.g. with the new syntax:
>>> s = signals.Signal1D(np.arange(10))
instead of
>>> s = signals.Signal1D({'data': np.arange(10)})
The documentation was thoroughly revised, courtesy of M. Walls.
New user interface to remove spikes from EELS spectra.
New align2D signals.Signal2D method to align image stacks.
When loading image files, the data are now automatically converted to
grayscale when all the color channels are equal.
Add the possibility to load a stack memory mapped (similar to ImageJ
virtual stack).
Improved hyperspy starter script that now includes the possibility
to start HyperSpy in the new IPython notebook.
Add “HyperSpy notebook here” to the Windows context menu.
The information displayed in the plots produced by Signal.plot have
been enhanced.
Added Egerton’s sigmak3 and sigmal3 GOS calculations (translated
from matlab by I. Iyengar) to the EELS core loss component.
A browsable dictionary containing the chemical elements and
their onset energies is now available in the user namespace under
the variable name elements.
The ripple file format now supports storing the beam energy, the collection and the convergence angle.
The EELS core loss component had a bug in the calculation of the
relativistic gamma that produced a gamma that was always
approximately zero. As a consequence the GOS calculation was wrong,
especially for high beam energies.
Loading msa files was broken when running on Python 2.7.2 and newer.
Saving images to rpl format was broken.
Performing BSS on data decomposed with poissonian noise normalization
was failing when some columns or rows of the unfolded data were zero,
which often occurs in EDX data, for example.
Importing some versions of scikit-learn was broken.
The progress bar was not working properly in the new IPython notebook.
The contrast of the image was not automatically updated.
Signal1D and Signal2D are not loaded into the user namespace by default.
The signals module is loaded instead.
Change the default BSS algorithm to sklearn's FastICA, which is now
distributed with HyperSpy and used in case sklearn is not
installed, e.g. when using EPDFree.
_slicing_axes was renamed to signal_axes.
_non_slicing_axes to navigation_axes.
All the Model *_in_pixels methods were renamed to _*_in_pixel.
EELSCLEdge.fs_state was renamed to fine_structure_active.
EELSCLEdge.fslist was renamed to fine_structure_coeff.
EELSCLEdge.fs_emax was renamed to fine_structure_width.
EELSCLEdge.freedelta was renamed to free_energy_shift.
EELSCLEdge.delta was renamed to energy_shift.
A value of True in a mask now means that the item is masked all over
HyperSpy.
The EELS automatic background feature creates a PowerLaw component, adds it to the model and assigns it to a variable in the user namespace. The variable has been renamed from bg to background.
pes_gaussian Component renamed to pes_core_line_shape.
This guide is intended to give people who want to start contributing
to HyperSpy a foothold to kick-start the process.
We anticipate that many potential contributors and developers will be
scientists who may have a lot to offer in terms of expert knowledge but may
have little experience when it comes to working on a reasonably large
open-source project like HyperSpy. This guide is aimed at you – helping to
reduce the barrier to make a contribution.
You would probably not be interested in contributing to HyperSpy if you were
not already a user, but, just in case: the best way to start understanding how
HyperSpy works and to build a broad overview of the code as it stands is to
use it – so what are you waiting for? Install HyperSpy!
The HyperSpy User Guide also provides a good overview
of all the parts of the code that are currently implemented as well as much
information about how everything
works – so read it well.
Open source projects are all about community – we put in much effort to make
good tools available to all and most people are happy to help others start out.
Everyone had to start at some point and the philosophy of these projects
centres around the fact that we can do better by working together.
Much of the conversation happens in ‘public’ via online platforms. The main two
forums used by HyperSpy developers are:
Gitter – where we host a live
chat-room in which people can ask questions and discuss things in a relatively
informal way.
Github – the main repository
for the source code also enables issues to be raised in a way that means
they’re logged until dealt with. This is also a good place to make a proposal
for some new feature or tool that you want to work on.
You don’t need to be a professional programmer to contribute to HyperSpy.
Indeed, there are many ways to contribute:
Just by asking a question in our
Gitter chat room
instead of sending a private email to the developers you are contributing to
HyperSpy. Once you get more familiar with HyperSpy, it will be awesome if
you could help others with their questions.
Issues reported in the
issues tracker
are precious contributions.
Pull request reviews are
essential for the sustainability of open development software projects
and HyperSpy is no exception. Therefore, reviews are highly appreciated.
While you may need a good familiarity with
the HyperSpy code base to review complex contributions,
you can start by reviewing simpler ones such as documentation
contributions or simple bug fixes.
Last but not least, you can contribute code in the form of
documentation, bug fixes, enhancements or new features. That is the main
topic of the rest of this guide.
You may have a very clear idea of what you want to contribute, but if you’re
not sure where to start, you can always look through the issues and pull
requests on the GitHub Page.
You’ll find that there are many known areas for development in the issues
and a number of pull-requests are partially finished projects just sitting
there waiting for a keen new contributor to come and learn by finishing.
The documentation (let it be the docstrings,
guides or the website) is always in need of some care. Besides,
contributing to HyperSpy’s documentation is a very good way to get
familiar with GitHub.
When you’ve decided what you’re going to work on – let people know using the
online forums! It may be that someone else is doing something similar and
can help; it is
also good to make sure that those working on related projects are pulling in
the same direction.
There are 3 key points to get right when starting out as a contributor:
Work out what you want to contribute and break it down into manageable
chunks. Use Git branches to keep work separated
in manageable sections.
The IO plugins formerly developed within HyperSpy have now been moved to
the separate RosettaSciIO repository
in order to facilitate wider use also by other packages. Plugins supporting
additional formats or corrections/enhancements to existing plugins should now
be contributed to the RosettaSciIO repository
and file format specific issues should be reported to the RosettaSciIO issue
tracker.
For developing the code, the home of HyperSpy is on
GitHub, and you’ll see that
a lot of this guide boils down to properly using that platform. So, visit the
following link and poke around the code, issues, and pull requests: HyperSpy
on GitHub.
It is probably also worth visiting github.com
and going through the “boot camp” to get a feel for the terminology.
In brief, to give you a hint on the terminology to search for and get
accustomed to, the contribution pattern is:
Setup git/github, if you don’t have it yet.
Fork HyperSpy on GitHub.
Checkout your fork on your local machine.
Create a new branch locally, where you will make your changes.
Push the local changes to your own HyperSpy fork on GitHub.
Create a pull request (PR) to the official HyperSpy repository.
Note
You cannot mess up the main HyperSpy project unless you have been
given write access and promoted to the dev-team. So when you’re starting out be
confident to play, get it wrong, and if it all goes wrong, you can always get
a fresh install of HyperSpy!!
PS: If you choose to develop in Windows/Mac you may find Github Desktop useful.
By now, you will have had a look around GitHub – but why is it so important?
Well, GitHub is the public forum in which we manage and discuss development of
the code. More importantly, it enables every developer to use Git, which is
an open source “version control” system. By version control, we mean that you
can separate out your contribution to the code into many versions (called
branches) and switch between them easily. Later, you can choose which version
you want to have integrated into HyperSpy. You can learn all about Git at
git-scm!
It is very important to separate your contributions, so
that each branch is a small advancement on the “master” code or on another
branch. In the end, each branch will have to be checked and reviewed by
someone else before it can be included – so if it is too big, you will be
asked to split it up!
Before integrating things into the main HyperSpy code, you
can merge some branches for your personal use. However, make sure each new
feature has its own branch that is contributed through a separate pull
request!
Diagrammatically, you should be aiming for something like this:
HyperSpy versioning follows semantic versioning
and the version number is therefore a three-part number: MAJOR.MINOR.PATCH.
Each number will change depending on the type of changes according to the following:
MAJOR increases when making incompatible API changes,
MINOR increases when adding functionality in a backwards compatible manner, and
PATCH increases when making backwards compatible bug fixes.
The git repository of HyperSpy has 3 main branches matching the above pattern
and depending on the type of pull request, you will need to base your pull request
on one of the following branches:
RELEASE_next_major to change the API in a not backward-compatible fashion,
RELEASE_next_minor to add new features and improvement,
RELEASE_next_patch for bug fixes.
The RELEASE_next_patch branch is merged daily into RELEASE_next_minor by the github action
Nightly Merge.
If you started your work in the wrong branch (typically on RELEASE_next_minor
instead of RELEASE_next_patch and you are doing a bug fix), you can change the
base branch using git rebase --onto, like this:
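For example (branch names purely illustrative), to move a bug-fix branch that was mistakenly based on RELEASE_next_minor onto RELEASE_next_patch:
$ git rebase --onto RELEASE_next_patch RELEASE_next_minor my_bugfix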
For review, and for revisiting changes at a later point, it is advisable to keep a “clean” git history, i.e. a meaningful succession of commits. In some cases, it is useful to rewrite the git history to keep it more readable:
it is not always possible to keep a clean history and quite often the code development follows an exploratory process with code changes going back and forth, etc.
Commits that only fix typographic mistakes, formatting or failing tests usually can be squashed (merged) into the previous commits.
When using a GUI for interaction with git, check out its features for joining and reordering commits.
When using git in the command line, use git rebase with the interactive option. For example, to rearrange the last five commits:
$ git rebase -i HEAD~5
In a text editor, you can then edit the commit history. If you have commits a...e and want to merge b and e into a and d, respectively, while moving c to the end of the history, your file would look like the following:
pick a ...
squash b ...
pick d ...
squash e ...
pick c ...
Afterwards, you get a chance to edit the commit messages.
Finally, to push the changes, use a + in front of the branch name, to override commits you have already pushed to github previously:
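For example (branch name illustrative):
$ git push origin +my_bugfix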
Every new function that is written in to HyperSpy should be tested and
documented. HyperSpy uses the pytest library
for testing. The tests reside in the hyperspy.tests module.
Tests are short functions, found in hyperspy/tests, that call your functions
under some known conditions and check the outputs against known values. They
should depend on as few other features as possible so that when they break,
we know exactly what caused it. Ideally, the tests should be written at the
same time as the code itself, as they are very convenient to run to check
outputs when coding. Writing tests can seem laborious but you’ll probably
soon find that they are very important, as they force you to sanity-check
all the work that you do.
The hyperspy.misc.test_utils module contains a few useful functions for
testing.
@pytest.mark.parametrize() is a very convenient decorator to test several
parameters of the same function without having to write too much repetitive
code, which is often error-prone. See pytest documentation for more details.
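For instance, a minimal sketch of a parametrized test could look like this (the signal, the values and the assumption that rebin conserves the total counts are illustrative, not part of the real test suite):
import numpy as np
import pytest
import hyperspy.api as hs

@pytest.mark.parametrize("scale", [1, 2, 4])
def test_rebin_conserves_counts(scale):
    s = hs.signals.Signal1D(np.ones(8))
    s2 = s.rebin(scale=[scale])
    # Rebinning sums adjacent bins, so the total intensity should be conserved.
    np.testing.assert_allclose(s2.data.sum(), s.data.sum())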
It is good to check that the tests do not use too much memory after
creating new tests. If you need to explicitly delete your objects and free
memory, you can do the following to release the memory associated with the
s object:
>>> import gc
>>> # Do some work with the object
>>> s = ...
>>> # Clear the memory
>>> del s
>>> gc.collect()
First ensure pytest and its plugins are installed by:
# If using a standard hyperspy install
$ pip install hyperspy[tests]
# Or, from a hyperspy local development directory
$ pip install -e .[tests]
# Or just installing the dependencies using conda
$ conda install -c conda-forge pytest pytest-mpl
To run them:
$ pytest --mpl --pyargs hyperspy
Or, from HyperSpy’s project folder, simply:
$ pytest
Note
pytest configuration options are set in the setup.cfg file, under the
[tool:pytest] section. See the pytest configuration documentation for more details.
The HyperSpy test suite can also be run in parallel if you have multiple CPUs
available, using the pytest-xdist plugin.
If you have the plugin installed, HyperSpy will automatically run the test suite in
parallel on your machine.
# To run on all the cores of your machine
$ pytest -n auto --dist loadfile
# To run on 2 cores
$ pytest -n 2 --dist loadfile
The --dist loadfile argument will group tests by their containing file. The
groups are then distributed to available workers as whole units, thus guaranteeing
that all tests in a file run in the same worker.
Note
Running tests in parallel using pytest-xdist will change the content
and format of the output of pytest to the console. We recommend installing
pytest-sugar to produce
nicer-looking output including an animated progressbar.
To test docstring examples, assuming the current location is the HyperSpy root
directory:
# All
$ pytest --doctest-modules --ignore-glob=hyperspy/tests --pyargs hyperspy
# In a single file, like the signal.py file
$ pytest --doctest-modules hyperspy/signal.py
Test functions can sometimes exhibit intermittent or sporadic failure, with seemingly
random or non-deterministic behaviour. They may sometimes pass or sometimes fail, and
it won’t always be clear why. These are usually known as “flaky” tests.
One way to approach flaky tests is to rerun them, to see if the failure was a one-off.
This can be achieved using the pytest-rerunfailures plugin.
# To re-run all test suite failures a maximum of 3 times
$ pytest --reruns 3
# To wait 1 second before the next retry
$ pytest --reruns 3 --reruns-delay 1
Once you have pushed your pull request to the official HyperSpy repository,
you can see the coverage of your tests using the
codecov.io check for
your PR. There should be a link to it at the bottom of your PR on the Github
PR page. This service can help you to find how well your code is being tested
and exactly which parts are not currently tested.
You can also measure code coverage locally. If you have installed pytest-cov,
you can run (from HyperSpy’s project folder):
$ pytest --cov=hyperspy
Configuration options for code coverage are also set in the setup.cfg file,
under the [coverage:run] and [coverage:report] sections. See the coverage
documentation
for more details.
Note
The codecov.io check in your
PR will fail if it either decreases the overall test coverage of HyperSpy,
or if any of the lines introduced in your diff are not covered.
The HyperSpy test suite is run using continuous integration services provided by
Github Actions and
Azure Pipelines.
In case of Azure Pipelines, CI helper scripts are pulled from the
ci-scripts repository.
The testing matrix is as follows:
Github Actions: test a range of Python versions on Linux, MacOS and Windows;
all dependencies are installed from PyPI.
See .github/workflows/tests.yml in the HyperSpy repository for further details.
Azure Pipeline: test a range of Python versions on Linux, MacOS and Windows;
all dependencies are installed from Anaconda Cloud
using the “conda-forge” channel.
See azure-pipelines.yml in the HyperSpy repository for further details.
This testing matrix has been designed to be simple and easy to maintain, whilst
ensuring that packages from PyPI and Anaconda cloud are not mixed in order to
avoid red herring failures of the test suite caused by application binary
interface (ABI) incompatibility between dependencies.
The most recent versions of packages are usually available first on PyPI, before
they are available on Anaconda Cloud. This means that if a recent release of a
dependency breaks the test suite, it should happen first on Github Actions.
Similarly, deprecation warnings will usually appear first on Github Actions.
The documentation build is done on both Github Actions and
Read the Docs, and it is worth checking that no new
warnings have been introduced when writing documentation in the user guide or
in the docstrings.
The Github Actions testing matrix also includes the following special cases:
The test suite is run against HyperSpy’s minimum requirements on Python 3.7
on Linux. This will skip any tests that require optional packages such as
scikit-learn.
The test suite is run against the oldest supported versions of numpy,
matplotlib and scipy. For more details, see this
Github issue.
The test suite is run against the development supported versions of numpy,
scipy, scikit-learn and scikit-image using the weekly build wheels
available on https://anaconda.org/scipy-wheels-nightly. For more details, see
this Github issue.
Plotting is tested using the @pytest.mark.mpl_image_compare decorator of
the pytest mpl plugin. This
decorator uses reference images to compare with the generated output during the
tests. The reference images are located in the folder defined by the argument
baseline_dir of the @pytest.mark.mpl_image_compare decorator.
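For illustration, a plot test might be sketched as follows (the baseline folder, tolerance and the way the figure is retrieved are assumptions, not the exact form used in HyperSpy's own tests):
import numpy as np
import pytest
import hyperspy.api as hs

@pytest.mark.mpl_image_compare(baseline_dir="plot_signal", tolerance=2)
def test_plot_signal1d():
    s = hs.signals.Signal1D(np.arange(100))
    s.plot()
    # pytest-mpl compares the returned matplotlib figure to the baseline image.
    return s._plot.signal_plot.figure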
To run plot tests, you simply need to add the option --mpl:
$ pytest --mpl
If you don’t use --mpl, the test functions will be executed, but the
images will not be compared to the reference images.
If you need to add or change some plots, follow the workflow below:
Write the tests using appropriate decorators such as
@pytest.mark.mpl_image_compare.
If you need to generate a new reference image in the folder
plot_test_dir, for example, run: pytest --mpl-generate-path=plot_test_dir
Run the tests again and this time they should pass.
Use git add to put the new file in the git repository.
When the plotting tests fail, it is possible to download the figure
comparison images generated by pytest-mpl from the artifacts tab of the
corresponding build on Azure Pipelines.
Note
To generate the baseline images, the version of matplotlib defined in
conda_environment_dev.yml
is required.
The plotting tests are tested on Azure Pipelines against a specific version of
matplotlib defined in conda_environment_dev.yml.
This is because small changes in the way matplotlib generates the figure between
versions can sometimes make the tests fail.
For plotting tests, the matplotlib backend is set to agg by setting
the MPLBACKEND environment variable to agg. At the first import of
matplotlib.pyplot, matplotlib will look at the MPLBACKEND environment
variable and accordingly set the backend.
Docstrings are written at the start of a function and give essential information
about how it should be used, such as which arguments can be passed to it and
what the syntax should be. The docstrings need to follow the numpy
specification,
as shown in this example.
As a general rule, any code that is part of the public API (i.e. any function
or class that an end-user might access) should have a clear and comprehensive
docstring explaining how to use it. Private methods that are never intended to
be exposed to the end-user (usually a function or class starting with an underscore)
should still be documented to the extent that future developers can understand
what the function does.
To test the code in the “examples” sections of the docstrings, run:
pytest --doctest-modules --ignore=hyperspy/tests
You can check whether your docstrings follow the convention by using the
flake8-docstrings extension,
like this:
# If not already installed, you need flake8 and flake8-docstrings
pip install flake8 flake8-docstrings
# Run flake8 on your file
flake8 /path/to/your/file.py
# Example output
/path/to/your/file.py:46:1: D103 Missing docstring in public function
/path/to/your/file.py:59:1: D205 1 blank line required between summary line and description
The user guide gives a description of the functionality of the code and
how to use it, with examples and links to the relevant code.
When writing both the docstrings and user guide documentation, it is useful to
have some data which the users can use themselves. Artificial
datasets for this purpose can be found in data.
Example code in the user guide can be tested using
doctest:
To check the output of what you wrote, you can build
the documentation by running the make command in the hyperspy/doc
directory. For example, make html will build the whole documentation in
html format. See the make command documentation for more details.
To install the documentation dependencies, run either
$ conda install hyperspy-dev
or
$ pip install hyperspy[doc]
When writing documentation, the Python package sphobjinv can be useful for writing
cross-references. For example, to find how to write a cross-reference to
set_signal_type(), use:
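For example (the path to the built objects.inv inventory file is an assumption and depends on where the documentation was built):
$ sphobjinv suggest doc/_build/html/objects.inv set_signal_type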
Builds of the documentation for each minor and major release are hosted in the hyperspy/hyperspy-doc
repository and are used by the version switcher
of the documentation.
The "dev" version is updated automatically when pushing on the RELEASE_next_minor branch and the “current” (stable)
version is updated automatically when a tag is pushed.
When releasing a minor and major release, two manual steps are required:
in hyperspy/hyperspy-doc, copy the “current” stable documentation to a separate folder named with the corresponding version
update the documentation version switch, in doc/_static/switcher.json:
copy and paste the “current” documentation entry
update the version in the “current” entry to match the version to be released, e.g. increment the minor or major digit
in the newly created entry, update the link to the folder created in step 1.
HyperSpy follows the Style Guide for Python Code - these are rules
for code consistency that you can read all about in the Python Style Guide. You can use the
black or ruff code formatter to automatically
fix the style of your code using pre-commit hooks.
Linting errors can be suppressed in the code using the # noqa marker;
more information is available in the ruff documentation.
Code linting and formatting is checked continuously using ruff pre-commit hooks.
These can be run locally by using pre-commit.
Alternatively, the comment pre-commit.ci autofix can be added to a PR to fix the formatting
using pre-commit.ci.
MAJOR version when you make incompatible API changes
MINOR version when you add functionality in a backward compatible manner
PATCH version when you make backward compatible bug fixes
This means that as little, ideally no, functionality should break between minor releases.
Deprecation warnings are raised whenever possible and feasible for functions/methods/properties/arguments,
so that users get a heads-up one (minor) release before something is removed or changes, with a possible
alternative to be used.
A deprecation decorator should be placed right above the object signature to be deprecated:
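As a rough sketch only (the decorator name and import location are assumptions; check HyperSpy's own deprecation utilities for the exact API):
from hyperspy.decorators import deprecated  # import location assumed

class SomeClass:
    # The decorator sits right above the signature of the object being deprecated.
    @deprecated(since="1.7.0", removal="2.0.0", alternative="bar")
    def foo(self, n):
        return n + 1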
This will update the docstring, and print a visible deprecation warning telling
the user to use the alternative function or argument.
These deprecation wrappers are inspired by those in kikuchipy.
Tips for writing methods that work on lazy signals
With the addition of the LazySignal class and its derivatives, adding
methods that operate on the data becomes slightly more complicated. However, we
have attempted to streamline it as much as possible. LazySignals use
dask.array.Array for the data field instead of the usual
numpy.ndarray. The full documentation is available
here. While interfaces of
the two arrays are indeed almost identical, the most important differences are
(da being dask.array.Array in the examples):
Dask arrays are immutable: da[3] = 2 does not work. da += 2
does, but it’s actually a new object – you might as well use da = da + 2
for a better distinction.
Unknown shapes are problematic: res = da[da > 0.3] works, but the
shape of the result depends on the values and cannot be inferred without
execution. Hence, few operations can be run on res lazily, and it should
be avoided if possible.
Computations in Dask are lazy: Dask only performs a computation when it has to. For example,
the sum function isn’t run until compute is called. This also means that some functions can be
applied to only a portion of the data.
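A short, self-contained illustration of these points (purely for reference):
import dask.array as da
import numpy as np

d = da.from_array(np.arange(10), chunks=5)
# d[3] = 2             # item assignment should be avoided; treat dask arrays as immutable
d2 = d + 2              # fine, but d2 is a new (lazy) object
res = d[d > 3]          # works, but res has an unknown shape until computed
total = d.sum()         # nothing is computed yet...
print(total.compute())  # ...the computation only happens here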
The easiest way to add new methods that work both with arbitrary navigation
dimensions and LazySignals is by using the map method to map your function func across
all “navigation pixels” (e.g. spectra in a spectrum-image). map methods
will run the function on all pixels efficiently and put the results back in the
correct order. func is not constrained by dask and can use whatever
code (assignment, etc.) you wish.
The map function is flexible and should be able to handle most operations that
operate on some signal. If you add a BaseSignal with the same navigation size
as the signal, it will be iterated alongside the mapped signal, otherwise a keyword
argument is assumed to be constant and is applied to every signal.
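For example, a per-image function can usually be wrapped like this (the filter and parameters are illustrative):
import numpy as np
import hyperspy.api as hs
from scipy.ndimage import gaussian_filter

s = hs.signals.Signal2D(np.random.random((5, 64, 64)))  # a stack of 5 images
# gaussian_filter is applied to each navigation pixel (here, each image);
# the same call works if s is converted to a lazy signal with s.as_lazy().
s_smooth = s.map(gaussian_filter, sigma=2, inplace=False)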
If the new method cannot be coerced into a shape suitable for map, separate
cases for lazy signals will have to be written. If a function operates on
arbitrary-sized arrays and the shape of the output can be known before calling,
da.map_blocks and da.map_overlap are efficient and flexible.
Finally, in addition to _iterate_signal that is available to all HyperSpy
signals, lazy counterparts also have the _block_iterator method that
supports signal and navigation masking and yields (returns on subsequent calls)
the underlying dask blocks as numpy arrays. It is important to note that
stacking all (flat) blocks and reshaping the result into the initial data shape
will not result in identical arrays. For illustration it is best to see the
dask documentation.
Interactive plotting in HyperSpy is handled through matplotlib and is primarily driven through
event handling.
Specifically, for some signal s, when the index value for some BaseDataAxis
is changed, then the signal plot is updated to reflect the data at that index. Each signal has a
_get_current_data function, which will return the data at the current navigation index.
For lazy signals, the _get_current_data function works slightly differently as the current chunk is cached. As a result,
the _get_current_data function first checks if the current chunk is cached and then either computes the chunk where the
navigation index resides or just pulls the value from the cached chunk.
Python is not the fastest language, but this is not usually an issue because
most scientific Python software uses libraries written in compiled languages
such as Numpy for data processing, hence running at close to C-speed.
Nevertheless, sometimes it is necessary to improve the speed of some parts of
the code by writing some functions
in compiled languages or by using Just-in-time (JIT) compilation. Before taking
this approach, please make
sure that the extra complexity is worth it by writing a first implementation of
the functionality using Python and Numpy and profiling your code.
If you need to improve the speed of a given part of the code your first choice
should be Numba. The motivation is that Numba
code is very similar (when not identical) to Python code, and therefore, it is
a lot easier to maintain than Cython code (see below).
Numba is also a required dependency for HyperSpy, unlike Cython which
is only an optional dependency.
If it is not possible to speed up the function using Numba, Cython can be used
instead. In that case, the Cython code must be accompanied by a pure Python
version of the same code that behaves exactly in the same way when the
compiled C extension is not present. This extra version is required because
we may not be able to provide binaries for all platforms and not all users
will be able to compile C code in their platforms.
Please read through the official Cython recommendations
(http://docs.cython.org/) before writing Cython code.
To help troubleshoot potential deprecations in future Cython releases, add a
comment in the header of your .pyx files stating the Cython version you used
when writing the code.
Note that the “cythonized” .c or .cpp files are not welcome in the git source
repository because they are typically very large.
Once you have written your Cython files, add them to raw_extensions in
setup.py.
Added in version 1.5: External packages can extend HyperSpy by registering signals,
components and widgets.
Warning
The mechanism to register extensions is in beta state. This means that it can
change between minor and patch versions. Therefore, if you maintain a package
that registers HyperSpy extensions, please verify that it works properly with
any future HyperSpy release. We expect it to reach maturity with the release
of HyperSpy 2.0.
External packages can extend HyperSpy by registering signals, components and
widgets. Objects registered by external packages are “first-class citizens” i.e.
they can be used, saved and loaded like any of those objects shipped with
HyperSpy. Because of HyperSpy’s structure, we anticipate that most packages
registering HyperSpy extensions will provide support for specific sorts of
data.
It is good practice to add all packages that extend HyperSpy
to the list of known extensions regardless of their
maturity level. In this way, we can avoid duplication of efforts and issues
arising from naming conflicts. This repository also runs an integration test
suite daily,
which runs the test suite of all extensions to check the status of
the ecosystem. See the corresponding section
for more details.
At this point, it is worth noting that HyperSpy’s main strength is its amazing
community of users and developers. We trust that the developers of packages
that extend HyperSpy will play by the same rules that have made the Python
scientific ecosystem successful. In particular, avoiding duplication of
efforts and being good community players by contributing code to the best
matching project are essential for the sustainability of our open software
ecosystem.
When and where to create a new BaseSignal subclass
HyperSpy provides most of its functionality through the different
BaseSignal
subclasses. A HyperSpy “signal” is a class that contains data for analysis
and functions to perform the analysis in the form of class methods. Functions
that are useful for the analysis of most datasets are in the
BaseSignal class. All other functions are in
specialized subclasses.
The flowchart below can help you decide where to add
a new data analysis function. Notice that only if no suitable package exists
for your function, you should consider creating your own.
digraph G {
A [label="New function needed!"]
B [label="Is it useful for data of any type and dimensions?",shape="diamond"]
C [label="Contribute it to BaseSignal"]
D [label="Does a SignalxD for the required dimension exist in HyperSpy?",shape="diamond"]
E [label="Contribute new SignalxD to HyperSpy"]
F [label="Is the function useful for a specific type of data only?",shape="diamond"]
G [label="Contribute it to SignalxD"]
H [label="Does a signal for that sort of data exist?",shape="diamond"]
I [label="Contribute to package providing the relevant signal"]
J [label="Create your own package and signal subclass to host the function"]
A->B
B->C [label="Yes"]
B->D [label="No"]
D->F [label="Yes"]
D->E [label="No"]
E->F
F->H [label="Yes"]
F->G [label="No"]
H->I [label="Yes"]
H->J [label="No"]
}
To register a new BaseSignal subclass you must add it to the
hyperspy_extension.yaml file, as in the following example:
signals:
    MySignal:
        signal_type: "MySignal"
        signal_type_aliases:
        - MS
        - ThisIsMySignal
        # The dimension of the signal subspace. For example, 2 for images, 1 for
        # spectra. If the signal can take any signal dimension, set it to -1.
        signal_dimension: 1
        # The data type, "real" or "complex".
        dtype: real
        # True for LazySignal subclasses
        lazy: False
        # The module where the signal is located.
        module: my_package.signal
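A hypothetical my_package/signal.py matching the registration above could be as simple as the following (sketch only; a real subclass would add the specialised analysis methods):
from hyperspy.signals import Signal1D

class MySignal(Signal1D):
    # signal_type must match the value declared in hyperspy_extension.yaml
    _signal_type = "MySignal"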
Note that HyperSpy uses signal_type to determine which class is the most
appropriate to deal with a particular sort of data. Therefore, the signal type
must be specific enough for HyperSpy to find a single signal subclass
match for each sort of data.
Warning
HyperSpy assumes that only one signal
subclass exists for a particular signal_type. It is up to external
package developers to avoid signal_type clashes, typically by collaborating
in developing a single package per data type.
The optional signal_type_aliases are used to determine the most appropriate
signal subclass when using
set_signal_type().
For example, if the signal_type ElectronEnergyLossSpectroscopy
has an EELS alias, setting the signal type to EELS will correctly assign
the signal subclass with ElectronEnergyLossSpectroscopy signal type.
It is good practice to choose a very explicit signal_type while leaving
acronyms for signal_type_aliases.
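For illustration, a minimal sketch of the my_package.signal module referenced
in the registration above could look as follows (the class, module and alias
names are the hypothetical ones used in the example):

# my_package/signal.py -- minimal sketch matching the registration above
from hyperspy.signals import Signal1D


class MySignal(Signal1D):
    """Signal subclass for a hypothetical sort of 1D data."""

    # Must be identical to the ``signal_type`` declared in hyperspy_extension.yaml
    _signal_type = "MySignal"

Once the extension is installed and registered, calling
set_signal_type("MS") on a compatible signal should assign the MySignal
subclass, as described above.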
HyperSpy provides the hyperspy._components.expression.Expression
component that enables easy creation of 1D and 2D components from
mathematical expressions. Therefore, strictly speaking, we only need to
create new components when they cannot be expressed as simple mathematical
equations. However, HyperSpy is all about simplifying the interactive data
processing workflow. Therefore, we consider that functions that are commonly
used for model fitting, in general or specific domains, are worth adding to
HyperSpy itself (if they are of common interest) or to specialized external
packages extending HyperSpy.
The flowchart below can help you decide when and where to add a new
hyperspy.component.Component. Notice that you should only consider creating
your own package to host it if no suitable package already exists.
digraph G {
A [label="New component needed!"]
B [label="Can it be declared using Expression?",shape="diamond"]
C [label="Can it be useful to other users?",shape="diamond"]
D [label="Just use Expression"]
E [label="Create new component using Expression"]
F [label="Create new component from scratch"]
G [label="Is it useful for general users?",shape="diamond"]
H [label="Contribute it to HyperSpy"]
I [label="Does a suitable package exist?",shape="diamond"]
J [label="Contribute it to the relevant package"]
K [label="Create your own package to host it"]
A->B
B->C [label="Yes"]
B->F [label="No"]
C->E [label="Yes"]
C->D [label="No"]
E->G
F->G
G->H [label="Yes"]
G->I [label="No"]
I->J [label="Yes"]
I->K [label="No"]
}
All new components must be a subclass of
hyperspy._components.expression.Expression. To register a new
1D component add it to the hyperspy_extension.yaml file as in the following
example:
components1D:
    # _id_name of the component. It must be a UUID4. This can be generated
    # using ``uuid.uuid4()``. Also, many editors can automatically generate
    # UUIDs. The same UUID must be stored in the component's ``_id_name`` attribute.
    fc731a2c-0a05-4acb-91df-d15743b531c3:
        # The module where the component class is located.
        module: my_package.components
        # The actual class of the component
        class: MyComponent1DClass
Equivalently, to add a new component 2D:
components2D:
    # _id_name of the component. It must be a UUID4. This can be generated
    # using ``uuid.uuid4()``. Also, many editors can automatically generate
    # UUIDs. The same UUID must be stored in the component's ``_id_name`` attribute.
    2ffbe0b5-a991-4fc5-a089-d2818a80a7e0:
        # The module where the component is located.
        module: my_package.components
        class: MyComponent2DClass
Note
HyperSpy’s legacy components use their class name instead of a UUID as
_id_name. This is for compatibility with old versions of the software.
New components (including those provided through the extension mechanism)
must use a UUID4 in order to i) avoid name clashes ii) make it easy to find
the component online if e.g. the package is renamed or the component
relocated.
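For illustration, a minimal sketch of the registered 1D component could look
like the following (the class name, parameter names, expression and UUID are
the hypothetical ones from the example above; the exact Expression keyword
arguments should be checked against the API documentation):

# my_package/components.py -- sketch of an Expression-based 1D component
from hyperspy._components.expression import Expression


class MyComponent1DClass(Expression):
    def __init__(self, a=1.0, b=0.0, module="numpy", **kwargs):
        super().__init__(
            expression="a * x + b",
            name="MyComponent1DClass",
            a=a,
            b=b,
            module=module,
            **kwargs,
        )
        # Must be identical to the UUID registered in hyperspy_extension.yaml
        self._id_name = "fc731a2c-0a05-4acb-91df-d15743b531c3"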
Creating and registering new widgets and toolkeys#
To generate GUIs of specific methods and functions, HyperSpy uses widgets and
toolkeys:
widgets (typically ipywidgets or traitsui objects) generate GUIs,
toolkeys are functions which associate widgets to a signal method
or to a module function.
Declare a new toolkey, e.g. by adding the hyperspy.ui_registry.add_gui_method
decorator to the function you want to assign a widget to.
Register a new toolkey that you have declared in your package by adding it to
the hyperspy_extension.yaml file, as in the following example:
GUI:
    # In order to assign a widget to a function, that function must declare
    # a `toolkey`. The `toolkeys` list contains a list of all the toolkeys
    # provided by extensions. In order to avoid name clashes, by convention,
    # toolkeys must start with the name of the package that provides them.
    toolkeys:
    - my_package.MyComponent
In the example below, we register a new ipywidgets widget for the
my_package.MyComponent toolkey of the previous example. The function
simply returns the widget to display. The module key defines where the
function resides.
GUI:
    widgets:
        ipywidgets:
            # Each widget is declared using a dictionary with two keys, `module` and `function`.
            my_package.MyComponent:
                # The function that creates the widget
                function: get_mycomponent_widget
                # The module where the function resides.
                module: my_package.widgets
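Putting the two registrations above together, the Python side of the example
could be sketched as follows (all names are the hypothetical ones used in the
YAML; the exact decorator and widget-function signatures should be checked
against hyperspy.ui_registry and the hyperspy_gui_ipywidgets conventions):

# my_package/components.py (continued) -- declare the toolkey on the component
from hyperspy.ui_registry import add_gui_method


@add_gui_method(toolkey="my_package.MyComponent")
class MyComponent1DClass(Expression):
    # ... definition as in the previous sketch ...
    pass


# my_package/widgets.py -- the function registered for the ipywidgets toolkit
import ipywidgets


def get_mycomponent_widget(obj, **kwargs):
    # ``obj`` is the object the GUI was requested for (here, a MyComponent1DClass
    # instance). Build and return the ipywidgets object to display.
    return ipywidgets.FloatSlider(description="a", value=obj.a.value)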
The integration test suite
runs the test suite of HyperSpy and of all registered HyperSpy extensions on a daily basis against both the
release and development versions. The build matrix is as follows:
The development packages of the dependencies are provided by the
scipy-wheels-nightly
repository, which provides scipy, numpy, scikit-learn and scikit-image
at the time of writing.
The pre-release packages are obtained from PyPI and these
will be used for any dependency which provides a pre-release package on PyPI.
A similar Integration test
workflow can run from pull requests (PR) to the
hyperspy repository when the label
run-extension-tests is added to a PR or when a PR review is edited.
NEP 29
(NumPy Enhancement Proposals) recommends that all projects across the
Scientific Python ecosystem adopt a common “time window-based” policy for
support of Python and NumPy versions. Standardizing a recommendation for
project support of minimum Python and NumPy versions will improve downstream
project planning.
All minor versions of Python released 42 months prior to the project, and
at minimum the two latest minor versions.
All minor versions of numpy released in the 24 months prior to the project,
and at minimum the last three minor versions.
In setup.py, the python_requires variable should be set to the minimum
supported version of Python. All supported minor versions of Python should be
in the test matrix and have binary artifacts built for the release.
Minimum Python and NumPy version support should be adjusted upward on every
major and minor release, but never on a patch release.
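As an illustration only, the corresponding declaration in setup.py might look
like the following sketch (the version numbers are placeholders and should be
chosen according to the NEP 29 window at release time):

# setup.py (excerpt) -- illustrative version bounds only
from setuptools import setup

setup(
    name="my_package",
    # Oldest Python minor version inside the NEP 29 support window.
    python_requires=">=3.8",
    install_requires=[
        # Likewise, the oldest NumPy minor version still supported.
        "numpy>=1.20",
    ],
)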
Download metrics are available from pypi and Anaconda cloud, but the reliability
of these numbers is poor for the following reasons:
hyperspy is distributed by other means: the
hyperspy-bundle, or by
various linux distributions (Arch-Linux, openSUSE)
these packages may be used by the continuous integration of other python libraries
However, the distribution of downloaded versions can be useful to identify
issues, such as version pinning or library incompatibilities. Various services
processing the pypi data
are available online:
For use inside jupyter notebooks, html representations are functions which allow for
more detailed data representations using snippets of populated HTML.
HyperSpy uses jinja and extends dask's html representations in many cases, in
line with this PR: dask/dask#8019
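As a generic illustration (not HyperSpy's actual templates), an html
representation in Jupyter boils down to a _repr_html_ method that returns an
HTML string, which can be populated from a jinja2 template:

# Illustrative only: a minimal _repr_html_ built from a jinja2 template
from jinja2 import Template

_TEMPLATE = Template(
    "<table><tr><th>Title</th><td>{{ title }}</td>"
    "<th>Shape</th><td>{{ shape }}</td></tr></table>"
)


class MyHtmlObject:
    def __init__(self, title, shape):
        self.title = title
        self.shape = shape

    def _repr_html_(self):
        # Jupyter calls this method to render the object as HTML in the notebook.
        return _TEMPLATE.render(title=self.title, shape=self.shape)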
Some of these workflows need to access GitHub “secrets”,
which are private to the HyperSpy repository. The personal access token PAT_DOCUMENTATION is set to be able to push
the documentation built to the hyperspy/hyperspy-doc repository.
To reduce the risk that these “secrets” are made accessible publicly, for example, through the
injection of malicious code by third parties in one of the GitHub workflows used in the HyperSpy
organisation, the third party actions (those that are not provided by established trusted parties)
are pinned to the SHA of a specific commit, which is trusted not to contain malicious code.
The workflows in the HyperSpy repository use GitHub actions provided by established trusted parties
and third parties. They are updated regularly by the
dependabot
in pull requests.
When updating a third party action, the action has to be pinned using the SHA of the commit of
the updated version and the corresponding code changes will need to be reviewed to verify that it
doesn’t include malicious code.
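For example, a third-party action in a workflow file is referenced by the full
commit SHA rather than a mutable tag (the action name and SHA below are
placeholders); the trailing comment records the human-readable version that
the SHA corresponds to:

# .github/workflows/example.yml (excerpt) -- placeholder action and SHA
steps:
  - uses: some-org/some-action@1234567890abcdef1234567890abcdef12345678  # v1.2.3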
HyperSpy is an open source Python library which provides tools to facilitate
the interactive data analysis of multidimensional datasets that can be
described as multidimensional arrays of a given signal (e.g. a 2D array of
spectra, a.k.a. a spectrum image).
HyperSpy aims at making it easy and natural to apply analytical procedures
that operate on an individual signal to multidimensional datasets of any
size, as well as providing easy access to analytical tools that exploit their
multidimensionality.
Added in version 1.5: External packages can extend HyperSpy by registering signals,
components and widgets.
The functionality of HyperSpy can be extended by external packages, e.g. to
implement features for analyzing a particular sort of data (usually related to a
specific set of experimental methods). A list of packages that extend HyperSpy is curated in a
dedicated repository. For details on how to register extensions see
Writing packages that extend HyperSpy.
Changed in version 2.0: HyperSpy was split into a core package (HyperSpy) that provides the common
infrastructure for multidimensional datasets and the dedicated IO package
RosettaSciIO. Signal classes focused on
specific types of data previously included in HyperSpy (EELS, EDS, Holography)
were moved to specialized HyperSpy extensions.
HyperSpy has been written by a subset of the people who use it, a particularity
that sets its character:
To us, this program is a research tool, much like a screwdriver or a Green’s
function. We believe that the better our tools are, the better our research
will be. We also think that it is beneficial for the advancement of knowledge
to share our research tools and to forge them in a collaborative way. This is
because by collaborating we advance faster, mainly by avoiding reinventing the
wheel. Idealistic as it may sound, many other people think like this and it is
thanks to them that this project exists.
Not surprisingly, we care about making it easy for others to contribute to
HyperSpy. In other words,
we aim at minimising the “user becomes developer” threshold.
Do you want to contribute already? No problem, see the Introduction
for details.
The main way of interacting with the program is through scripting.
This is because Jupyter exists, making your
interactive data analysis productive, scalable, reproducible and,
most importantly, fun. That said, widgets to interact with HyperSpy
elements are provided where there
is a clear productivity advantage in doing so. See the
hyperspy-gui-ipywidgets
and
hyperspy-gui-traitsui
packages for details. Not enough? If you
need a full, standalone GUI, HyperSpyUI
is for you.
New to HyperSpy or Python? The getting started guide provides an
introduction on basic usage of HyperSpy and how to install it.
User Guide
The user guide provides in-depth information on key concepts of HyperSpy
and how to use it along with background information and explanations.
Reference
Documentation of the metadata specification and of the Application Programming Interface (API),
which describe how HyperSpy functions work and which parameters can be used.
Examples
Gallery of short examples illustrating simple tasks that can be performed with HyperSpy.
Tutorials
Tutorials in form of Jupyter Notebooks to learn how to
process multi-dimensional data using HyperSpy.
Contributing
HyperSpy is a community project maintained for and by its users.
There are many ways you can help!
If HyperSpy has been significant to a project that leads to an academic
publication, please acknowledge that fact by citing it. The DOI in the
badge below is the Concept DOI of
HyperSpy. It can be used to cite the project without referring to a specific
version. If you are citing HyperSpy because you have used it to process data,
please use the DOI of the specific version that you have employed. You can
find it by clicking on the DOI badge below.
Given the increasing number of articles that cite HyperSpy, we do not maintain a list of
articles citing HyperSpy. For an up-to-date list, search for
HyperSpy in a scientific database, e.g. Google Scholar.
Note
Articles published before 2012 may mention the HyperSpy project under
its old name, EELSLab.