tslearn’s documentation¶
tslearn is a Python package that provides machine learning tools for the analysis of time series. This package builds on (and hence depends on) the scikit-learn, numpy and scipy libraries.
This documentation contains a quick-start guide (including installation procedure and basic usage of the toolkit), a complete API Reference, as well as a gallery of examples.
Finally, if you use tslearn in a scientific publication, we would appreciate citations.
Quick-start guide¶
For a list of functions and classes available in tslearn, please have a look at our API Reference.
Installation¶
Using conda¶
The easiest way to install tslearn is probably via conda:
conda install -c conda-forge tslearn
Using PyPI¶
Using pip should also work fine:
python -m pip install tslearn
In this case, you should have numpy, cython and C++ build tools available at build time.
Using latest github-hosted version¶
If you want to get tslearn’s latest version, you can refer to the repository hosted at GitHub:
python -m pip install https://github.com/tslearn-team/tslearn/archive/main.zip
In this case, you should have numpy, cython and C++ build tools available at build time.
It seems that on some platforms the Cython dependency does not install properly. If you experience such an issue, try installing it with the following command:
python -m pip install cython
before you start installing tslearn.
If it still does not work, we suggest you switch to conda installation.
Other requirements¶
tslearn builds on (and hence depends on) the scikit-learn, numpy and scipy libraries.
If you plan to use the tslearn.shapelets module from tslearn, tensorflow (v2) should also be installed.
h5py is required for reading or writing models using the hdf5 file format.
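For instance, a minimal sketch of saving and re-loading a fitted model via hdf5 (assuming h5py is installed and using the to_hdf5 / from_hdf5 helpers exposed by tslearn estimators) could look like:
from tslearn.clustering import TimeSeriesKMeans
from tslearn.generators import random_walks
# Fit a small model on synthetic data (random_walks is used here purely for illustration)
X = random_walks(n_ts=20, sz=32, d=1, random_state=0)
km = TimeSeriesKMeans(n_clusters=2, metric="euclidean", random_state=0).fit(X)
# Persist the model to hdf5 and load it back (requires h5py)
km.to_hdf5("km_model.hdf5")
km_loaded = TimeSeriesKMeans.from_hdf5("km_model.hdf5")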
In order to load multivariate datasets from the UCR/UEA archive using the tslearn.datasets.UCR_UEA_datasets class, the installed scipy version should be greater than 1.3.0.
Getting started¶
This tutorial will guide you to format your first time series data, import standard datasets, and manipulate them using dedicated machine learning algorithms.
Time series format¶
First, let us have a look at what the tslearn time series format is. To do so, we will use the to_time_series utility from tslearn.utils:
>>> from tslearn.utils import to_time_series
>>> my_first_time_series = [1, 3, 4, 2]
>>> formatted_time_series = to_time_series(my_first_time_series)
>>> print(formatted_time_series.shape)
(4, 1)
In tslearn, a time series is nothing more than a two-dimensional numpy array with its first dimension corresponding to the time axis and the second one being the feature dimensionality (1 by default).
Then, if we want to manipulate sets of time series, we can cast them to three-dimensional arrays, using to_time_series_dataset. If time series from the set are not equal-sized, NaN values are appended to the shorter ones and the shape of the resulting array is (n_ts, max_sz, d) where max_sz is the maximum of sizes for time series in the set.
>>> from tslearn.utils import to_time_series_dataset
>>> my_first_time_series = [1, 3, 4, 2]
>>> my_second_time_series = [1, 2, 4, 2]
>>> formatted_dataset = to_time_series_dataset([my_first_time_series, my_second_time_series])
>>> print(formatted_dataset.shape)
(2, 4, 1)
>>> my_third_time_series = [1, 2, 4, 2, 2]
>>> formatted_dataset = to_time_series_dataset([my_first_time_series,
...                                             my_second_time_series,
...                                             my_third_time_series])
>>> print(formatted_dataset.shape)
(3, 5, 1)
Importing standard time series datasets¶
If you aim at experimenting with standard time series datasets, you should have a look at the tslearn.datasets module.
>>> from tslearn.datasets import UCR_UEA_datasets
>>> X_train, y_train, X_test, y_test = UCR_UEA_datasets().load_dataset("TwoPatterns")
>>> print(X_train.shape)
(1000, 128, 1)
>>> print(y_train.shape)
(1000,)
Note that when working with time series datasets, it can be useful to rescale time series using tools from the tslearn.preprocessing module.
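As a brief sketch (reusing the X_train array loaded above), rescaling each series to zero mean and unit variance could look like:
>>> from tslearn.preprocessing import TimeSeriesScalerMeanVariance
>>> X_train_scaled = TimeSeriesScalerMeanVariance(mu=0., std=1.).fit_transform(X_train)
>>> print(X_train_scaled.shape)
(1000, 128, 1)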
If you want to import other time series from text files, the expected format is:
each line represents a single time series (and time series from a dataset are not forced to be the same length);
in each line, modalities are separated by a | character (useless if you only have one modality in your data);
in each modality, observations are separated by a space character.
Here is an example of such a file storing two time series of dimension 2 (the first time series is of length 3 and the second one is of length 2).
1.0 0.0 2.5|3.0 2.0 1.0
1.0 2.0|4.333 2.12
To read from / write to this format, have a look at the tslearn.utils module:
>>> from tslearn.utils import save_time_series_txt, load_time_series_txt
>>> time_series_dataset = load_time_series_txt("path/to/your/file.txt")
>>> save_time_series_txt("path/to/another/file.txt", dataset_to_be_saved)
Playing with your data¶
Once your data is loaded and formatted according to tslearn standards, the next step is to feed machine learning models with it. Most tslearn models inherit from scikit-learn base classes, hence interacting with them is very similar to interacting with a scikit-learn model, except that datasets are not two-dimensional arrays, but rather tslearn time series datasets (i.e. three-dimensional arrays or lists of two-dimensional arrays).
>>> from tslearn.clustering import TimeSeriesKMeans
>>> km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
>>> km.fit(X_train)
As seen above, one key parameter when applying machine learning methods to time series datasets is the metric to be used. You can learn more about it in the dedicated section of this documentation.
Methods for variable-length time series¶
This page lists machine learning methods in tslearn that are able to deal with datasets containing time series of different lengths. We also provide example usage for these methods using the following variable-length time series dataset:
from tslearn.utils import to_time_series_dataset
X = to_time_series_dataset([[1, 2, 3, 4], [1, 2, 3], [2, 5, 6, 7, 8, 9]])
y = [0, 0, 1]
Classification¶
Examples¶
from tslearn.neighbors import KNeighborsTimeSeriesClassifier
knn = KNeighborsTimeSeriesClassifier(n_neighbors=2)
knn.fit(X, y)
from tslearn.svm import TimeSeriesSVC
clf = TimeSeriesSVC(C=1.0, kernel="gak")
clf.fit(X, y)
from tslearn.shapelets import LearningShapelets
clf = LearningShapelets(n_shapelets_per_size={3: 1})
clf.fit(X, y)
Regression¶
Examples¶
from tslearn.svm import TimeSeriesSVR
clf = TimeSeriesSVR(C=1.0, kernel="gak")
y_reg = [1.3, 5.2, -12.2]
clf.fit(X, y_reg)
Nearest-neighbor search¶
Examples¶
from tslearn.neighbors import KNeighborsTimeSeries
knn = KNeighborsTimeSeries(n_neighbors=2)
knn.fit(X)
knn.kneighbors() # Search for neighbors using series from `X` as queries
knn.kneighbors(X2) # Search for neighbors using series from `X2` as queries
Clustering¶
Examples¶
from tslearn.clustering import KernelKMeans
gak_km = KernelKMeans(n_clusters=2, kernel="gak")
labels_gak = gak_km.fit_predict(X)
from tslearn.clustering import TimeSeriesKMeans
km = TimeSeriesKMeans(n_clusters=2, metric="dtw")
labels = km.fit_predict(X)
km_bis = TimeSeriesKMeans(n_clusters=2, metric="softdtw")
labels_bis = km_bis.fit_predict(X)
from tslearn.clustering import TimeSeriesKMeans, silhouette_score
km = TimeSeriesKMeans(n_clusters=2, metric="dtw")
labels = km.fit_predict(X)
silhouette_score(X, labels, metric="dtw")
Barycenter computation¶
Examples¶
from tslearn.barycenters import dtw_barycenter_averaging
bar = dtw_barycenter_averaging(X, barycenter_size=3)
from tslearn.barycenters import softdtw_barycenter
from tslearn.utils import ts_zeros
initial_barycenter = ts_zeros(sz=5)
bar = softdtw_barycenter(X, init=initial_barycenter)
Model selection¶
Also, model selection tools offered by scikit-learn can be used on variable-length data, in a standard way, such as:
from sklearn.model_selection import KFold, GridSearchCV
from tslearn.neighbors import KNeighborsTimeSeriesClassifier
knn = KNeighborsTimeSeriesClassifier(metric="dtw")
p_grid = {"n_neighbors": [1, 5]}
cv = KFold(n_splits=2, shuffle=True, random_state=0)
clf = GridSearchCV(estimator=knn, param_grid=p_grid, cv=cv)
clf.fit(X, y)
Resampling¶
Finally, if you want to use a method that cannot run on variable-length time series, one option would be to first resample your data so that all your time series have the same length and then run your method on this resampled version of your dataset.
Note however that resampling will introduce temporal distortions in your data. Use with great care!
from tslearn.preprocessing import TimeSeriesResampler
resampled_X = TimeSeriesResampler(sz=X.shape[1]).fit_transform(X)
Backend selection and use¶
tslearn proposes different backends (NumPy and PyTorch) to compute time series metrics such as DTW and Soft-DTW. The PyTorch backend can be used to compute gradients of metric functions thanks to automatic differentiation.
Backend selection¶
A backend can be instantiated using the function instantiate_backend. To specify which backend should be instantiated (NumPy or PyTorch), this function accepts four different kinds of input parameters:
a string equal to "numpy" or "pytorch";
a NumPy array or a Torch tensor;
a Backend instance (the input backend is then returned);
None or anything else than mentioned previously (the NumPy backend is then instantiated).
Examples¶
If the input is the string "numpy", the NumPyBackend is instantiated.
>>> from tslearn.backend import instantiate_backend
>>> be = instantiate_backend("numpy")
>>> print(be.backend_string)
"numpy"
If the input is the string "pytorch", the PyTorchBackend is instantiated.
>>> be = instantiate_backend("pytorch")
>>> print(be.backend_string)
"pytorch"
If the input is a NumPy array, the NumPyBackend is instantiated.
>>> import numpy as np
>>> be = instantiate_backend(np.array([0]))
>>> print(be.backend_string)
"numpy"
If the input is a Torch tensor, the PyTorchBackend is instantiated.
>>> import torch
>>> be = instantiate_backend(torch.tensor([0]))
>>> print(be.backend_string)
"pytorch"
If the input is a Backend instance, the input backend is returned.
>>> print(be.backend_string)
"pytorch"
>>> be = instantiate_backend(be)
>>> print(be.backend_string)
"pytorch"
If the input is None, the NumPyBackend is instantiated.
>>> be = instantiate_backend(None)
>>> print(be.backend_string)
"numpy"
If the input is anything else, the NumPyBackend is instantiated.
>>> be = instantiate_backend("Hello, World!")
>>> print(be.backend_string)
"numpy"
The function instantiate_backend accepts any number of input parameters, including zero. To select which backend should be instantiated (NumPy or PyTorch), a for loop is performed on the inputs until a backend is selected.
>>> be = instantiate_backend(1, None, "Hello, World!", torch.tensor([0]), "numpy")
>>> print(be.backend_string)
"pytorch"
If none of the inputs are related to NumPy or PyTorch, the NumPyBackend is instantiated.
>>> be = instantiate_backend(1, None, "Hello, World!")
>>> print(be.backend_string)
"numpy"
Use the backends¶
The names of the attributes and methods of the backends are inspired by the NumPy backend.
Examples¶
Create backend objects.
>>> be = instantiate_backend("pytorch")
>>> mat = be.array([[0, 1], [2, 3]], dtype=float)
>>> print(mat)
tensor([[0., 1.],
[2., 3.]], dtype=torch.float64)
Use backend functions.
>>> norm = be.linalg.norm(mat)
>>> print(norm)
tensor(3.7417, dtype=torch.float64)
Choose the backend used by metric functions¶
tslearn’s metric functions have an optional input parameter be to specify the backend to use to compute the metric.
Examples¶
>>> import torch
>>> from tslearn.metrics import dtw
>>> s1 = torch.tensor([[1.0], [2.0], [3.0]], requires_grad=True)
>>> s2 = torch.tensor([[3.0], [4.0], [-3.0]])
>>> sim = dtw(s1, s2, be="pytorch")
>>> print(sim)
tensor(6.4807, grad_fn=<SqrtBackward0>)
By default, the optional input parameter be is equal to None.
Note that the first line of the function dtw is:
be = instantiate_backend(be, s1, s2)
Therefore, even if be=None, the PyTorchBackend is instantiated and used to compute the DTW metric since s1 and s2 are Torch tensors.
>>> sim = dtw(s1, s2)
>>> print(sim)
tensor(6.4807, grad_fn=<SqrtBackward0>)
Automatic differentiation¶
The PyTorch backend can be used to compute the gradients of the metric functions thanks to automatic differentiation.
Examples¶
Compute the gradient of the Dynamic Time Warping similarity measure.
>>> s1 = torch.tensor([[1.0], [2.0], [3.0]], requires_grad=True)
>>> s2 = torch.tensor([[3.0], [4.0], [-3.0]])
>>> sim = dtw(s1, s2, be="pytorch")
>>> sim.backward()
>>> d_s1 = s1.grad
>>> print(d_s1)
tensor([[-0.3086],
[-0.1543],
[ 0.7715]])
Compute the gradient of the Soft-DTW similarity measure.
>>> from tslearn.metrics import soft_dtw
>>> ts1 = torch.tensor([[1.0], [2.0], [3.0]], requires_grad=True)
>>> ts2 = torch.tensor([[3.0], [4.0], [-3.0]])
>>> sim = soft_dtw(ts1, ts2, gamma=1.0, be="pytorch", compute_with_backend=True)
>>> print(sim)
tensor(41.1876, dtype=torch.float64, grad_fn=<SelectBackward0>)
>>> sim.backward()
>>> d_ts1 = ts1.grad
>>> print(d_ts1)
tensor([[-4.0001],
[-2.2852],
[10.1643]])
Integration with other Python packages¶
tslearn is a general-purpose Python machine learning library for time series that offers tools for pre-processing and feature extraction as well as dedicated models for clustering, classification and regression.
To ensure compatibility with more specific Python packages, we provide utilities to convert data sets from and to other formats.
tslearn.utils.to_time_series_dataset() is a general function that transforms an array-like object into a three-dimensional array of shape (n_ts, sz, d) with the following conventions:
the first axis is the sample axis, n_ts being the number of time series;
the second axis is the time axis, sz being the maximum number of time points;
the third axis is the dimension axis, d being the number of dimensions.
This is how a data set of time series is represented in tslearn.
The following sections briefly explain how to transform a data set from tslearn to another supported Python package and vice versa.
scikit-learn¶
scikit-learn is a popular Python package for machine learning.
tslearn.utils.to_sklearn_dataset() converts a data set from tslearn format to scikit-learn format. To convert a data set from scikit-learn, you can use tslearn.utils.to_time_series_dataset().
>>> from tslearn.utils import to_sklearn_dataset
>>> to_sklearn_dataset([[1, 2], [1, 4, 3]])
array([[ 1., 2., nan],
[ 1., 4., 3.]])
>>> to_time_series_dataset([[ 1., 2., None], [ 1., 4., 3.]])
array([[[ 1.],
[ 2.],
[nan]],
[[ 1.],
[ 4.],
[ 3.]]])
pyts¶
pyts is a Python package dedicated to time series classification.
tslearn.utils.to_pyts_dataset() and tslearn.utils.from_pyts_dataset() allow users to convert a data set from tslearn format to pyts format and vice versa.
>>> from tslearn.utils import from_pyts_dataset, to_pyts_dataset
>>> from_pyts_dataset([[1, 2], [1, 4]])
array([[[1],
[2]],
[[1],
[4]]])
>>> to_pyts_dataset([[[1], [2]], [[1], [4]]])
array([[1., 2.],
[1., 4.]])
seglearn¶
seglearn is a python package for machine learning time series or sequences.
tslearn.utils.to_seglearn_dataset() and tslearn.utils.from_seglearn_dataset() allow users to convert a data set from tslearn format to seglearn format and vice versa.
>>> from tslearn.utils import from_seglearn_dataset, to_seglearn_dataset
>>> from_seglearn_dataset([[1, 2], [1, 4, 3]])
array([[[ 1.],
[ 2.],
[nan]],
[[ 1.],
[ 4.],
[ 3.]]])
>>> to_seglearn_dataset([[[1], [2], [None]], [[1], [4], [3]]])
array([array([[1.],
[2.]]),
array([[1.],
[4.],
[3.]])], dtype=object)
stumpy¶
stumpy is a powerful and scalable Python library for computing a Matrix Profile, which can be used for a variety of time series data mining tasks.
tslearn.utils.to_stumpy_dataset() and tslearn.utils.from_stumpy_dataset() allow users to convert a data set from tslearn format to stumpy format and vice versa.
>>> import numpy as np
>>> from tslearn.utils import from_stumpy_dataset, to_stumpy_dataset
>>> from_stumpy_dataset([np.array([1, 2]), np.array([1, 4, 3])])
array([[[ 1.],
[ 2.],
[nan]],
[[ 1.],
[ 4.],
[ 3.]]])
>>> to_stumpy_dataset([[[1], [2], [None]], [[1], [4], [3]]])
[array([1., 2.]), array([1., 4., 3.])]
sktime¶
sktime is a scikit-learn compatible Python toolbox for learning with time series.
tslearn.utils.to_sktime_dataset() and tslearn.utils.from_sktime_dataset() allow users to convert a data set from tslearn format to sktime format and vice versa.
pandas is a required dependency to use these functions.
>>> import pandas as pd
>>> from tslearn.utils import from_sktime_dataset, to_sktime_dataset
>>> df = pd.DataFrame()
>>> df["dim_0"] = [pd.Series([1, 2]), pd.Series([1, 4, 3])]
>>> from_sktime_dataset(df)
array([[[ 1.],
[ 2.],
[nan]],
[[ 1.],
[ 4.],
[ 3.]]])
>>> to_sktime_dataset([[[1], [2], [None]], [[1], [4], [3]]]).shape
(2, 1)
pyflux¶
pyflux is a library for time series analysis and prediction.
tslearn.utils.to_pyflux_dataset() and tslearn.utils.from_pyflux_dataset() allow users to convert a data set from tslearn format to pyflux format and vice versa.
pandas is a required dependency to use these functions.
>>> import pandas as pd
>>> from tslearn.utils import from_pyflux_dataset, to_pyflux_dataset
>>> df = pd.DataFrame([1, 2], columns=["dim_0"])
>>> from_pyflux_dataset(df)
array([[[1.],
[2.]]])
>>> to_pyflux_dataset([[[1], [2]]]).shape
(2, 1)
tsfresh¶
tsfresh is a python package that automatically calculates a large number of time series characteristics.
tslearn.utils.to_tsfresh_dataset() and tslearn.utils.from_tsfresh_dataset() allow users to convert a data set from tslearn format to tsfresh format and vice versa.
pandas is a required dependency to use these functions.
>>> import pandas as pd
>>> from tslearn.utils import from_tsfresh_dataset, to_tsfresh_dataset
>>> df = pd.DataFrame([[0, 0, 1.0],
... [0, 1, 2.0],
... [1, 0, 1.0],
... [1, 1, 4.0],
... [1, 2, 3.0]], columns=['id', 'time', 'dim_0'])
>>> from_tsfresh_dataset(df)
array([[[ 1.],
[ 2.],
[nan]],
[[ 1.],
[ 4.],
[ 3.]]])
>>> to_tsfresh_dataset([[[1], [2], [None]], [[1], [4], [3]]]).shape
(5, 3)
cesium¶
cesium is an open-source platform for time series inference.
tslearn.utils.to_cesium_dataset() and tslearn.utils.from_cesium_dataset() allow users to convert a data set from tslearn format to cesium format and vice versa.
cesium is a required dependency to use these functions.
>>> from tslearn.utils import from_cesium_dataset, to_cesium_dataset
>>> from cesium.data_management import TimeSeries
>>> from_cesium_dataset([TimeSeries(m=[1, 2]), TimeSeries(m=[1, 4, 3])])
array([[[ 1.],
[ 2.],
[nan]],
[[ 1.],
[ 4.],
[ 3.]]])
>>> len(to_cesium_dataset([[[1], [2], [None]], [[1], [4], [3]]]))
2
Contributing¶
First of all, thank you for considering contributing to tslearn. It’s people like you that will help make tslearn a great toolkit.
Contributions are managed through GitHub Issues and Pull Requests.
We are welcoming contributions in the following forms:
Bug reports: when filing an issue to report a bug, please use the search tool to ensure the bug hasn’t been reported yet;
New feature suggestions: if you think tslearn should include a new algorithm, please open an issue to ask for it (of course, you should always check that the feature has not been asked for yet :). Think about linking to a pdf version of the paper that first proposed the method when suggesting a new algorithm.
Bug fixes and new feature implementations: if you feel you can fix a reported bug/implement a suggested feature yourself, do not hesitate to:
fork the project;
implement your bug fix;
submit a pull request referencing the ID of the issue in which the bug was reported / the feature was suggested;
If you would like to contribute by implementing a new feature reported in the Issues, maybe starting with Issues that have the “good first issue” label attached would be a good idea.
When submitting code, please think about code quality, add proper docstrings, and include doctests with high code coverage.
More details on Pull requests¶
The preferred workflow for contributing to tslearn is to fork the main repository on GitHub, clone, and develop on a branch. Steps:
Fork the project repository by clicking on the ‘Fork’ button near the top right of the page. This creates a copy of the code under your GitHub user account. For more details on how to fork a repository see this guide.
Clone your fork of the tslearn repo from your GitHub account to your local disk:
$ git clone git@github.com:YourLogin/tslearn.git
$ cd tslearn
Create a my-feature branch to hold your development changes. Always use a my-feature branch. It’s good practice to never work on the master branch:
$ git checkout -b my-feature
Develop the feature on your feature branch. To record your changes in git, add changed files using git add and then git commit files:
$ git add modified_files
$ git commit
Push the changes to your GitHub account with:
$ git push -u origin my-feature
Follow these instructions to create a pull request from your fork. This will send an email to the committers.
(If any of the above seems like magic to you, please look up the Git documentation on the web, or ask a friend or another contributor for help.)
Pull Request Checklist¶
We recommend that your contribution comply with the following rules before you submit a pull request:
Follow the PEP8 Guidelines.
If your pull request addresses an issue, please use the pull request title to describe the issue and mention the issue number in the pull request description. This will make sure a link back to the original issue is created.
All public methods should have informative docstrings with sample usage presented as doctests when appropriate.
Please prefix the title of your pull request with [MRG] (Ready for Merge) if the contribution is complete and ready for a detailed review. An incomplete contribution – where you expect to do more work before receiving a full review – should be prefixed [WIP] (to indicate a work in progress) and changed to [MRG] when it matures. WIPs may be useful to: indicate you are working on something to avoid duplicated work, request broad review of functionality or API, or seek collaborators. WIPs often benefit from the inclusion of a task list in the PR description.
When adding additional functionality, provide at least one example script in the tslearn/docs/examples/ folder. Have a look at other examples for reference. Examples should demonstrate why the new functionality is useful in practice and, if possible, compare it to other methods available in tslearn.
Documentation and high-coverage tests are necessary for enhancements to be accepted. Bug fixes or new features should be provided with non-regression tests. These tests verify the correct behavior of the fix or feature. In this manner, further modifications of the code base are guaranteed to be consistent with the desired behavior. For bug fixes, at the time of the PR, these tests should fail for the code base in master and pass for the PR code.
At least one paragraph of narrative documentation with links to references in the literature (with PDF links when possible) and the example.
Here is a description of useful tools to check your code locally:
No PEP8 or PEP257 errors; check with the flake8 Python package:
$ pip install flake8
$ flake8 path/to/module.py  # check for errors in one file
$ flake8 path/to/folder  # check for errors in all the files in a folder
$ git diff -u | flake8 --diff  # check for errors in the modified code only
To run the tests locally and get code coverage, use the pytest and pytest-cov Python packages:
$ pip install pytest pytest-cov
$ pytest --cov tslearn
To build the documentation locally, install the following packages and run the make html command in the tslearn/docs folder:
$ pip install sphinx==1.8.5 sphinx-gallery sphinx-bootstrap-theme nbsphinx
$ pip install numpydoc matplotlib
$ cd tslearn/docs
$ make html
The documentation will be generated in the _build/html folder. You can double click on index.html to open the index page, which will look like the first page that you see on the online documentation. Then you can move to the pages that you modified and have a look at your changes.
Bonus points for contributions that include a performance analysis with a benchmark script and profiling output.
User Guide¶
Dynamic Time Warping¶
Dynamic Time Warping (DTW) [1] is a similarity measure between time series.
Let us consider two time series \(x = (x_0, \dots, x_{n-1})\) and
\(y = (y_0, \dots, y_{m-1})\) of respective lengths \(n\) and
\(m\).
Here, all elements \(x_i\) and \(y_j\) are assumed to lie in the same
\(d\)-dimensional space.
In tslearn, such time series would be represented as arrays of respective shapes (n, d) and (m, d) and DTW can be computed using the following code:
from tslearn.metrics import dtw, dtw_path
dtw_score = dtw(x, y)
# Or, if the path is also an important information:
optimal_path, dtw_score = dtw_path(x, y)
Optimization problem¶
DTW between \(x\) and \(y\) is formulated as the following optimization problem:
\[DTW(x, y) = \min_\pi \sqrt{ \sum_{(i, j) \in \pi} d(x_i, y_j)^2 }\]
where \(\pi = [\pi_0, \dots , \pi_K]\) is a path that satisfies the following properties:
it is a list of index pairs \(\pi_k = (i_k, j_k)\) with \(0 \leq i_k < n\) and \(0 \leq j_k < m\)
\(\pi_0 = (0, 0)\) and \(\pi_K = (n - 1, m - 1)\)
for all \(k > 0\) , \(\pi_k = (i_k, j_k)\) is related to \(\pi_{k-1} = (i_{k-1}, j_{k-1})\) as follows:
\(i_{k-1} \leq i_k \leq i_{k-1} + 1\)
\(j_{k-1} \leq j_k \leq j_{k-1} + 1\)
Here, a path can be seen as a temporal alignment of time series such that Euclidean distance between aligned (i.e. resampled) time series is minimal.
The following image exhibits the DTW path (in white) for a given pair of time series, on top of the cross-similarity matrix that stores \(d(x_i, y_j)\) values.
Code to produce such visualization is available in our Gallery of examples.
Algorithmic solution¶
There exists an \(O(mn)\) algorithm to compute the exact optimum for this problem (pseudo-code is provided for time series indexed from 1 for simplicity):
def dtw(x, y):
    # Initialization
    for i = 0..n
        for j = 0..m
            C[i, j] = inf
    C[0, 0] = 0.

    # Main loop
    for i = 1..n
        for j = 1..m
            dist = d(x_i, y_j) ** 2
            C[i, j] = dist + min(C[i-1, j], C[i, j-1], C[i-1, j-1])

    return sqrt(C[n, m])
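For readers who prefer executable code over pseudo-code, below is a small NumPy sketch of the same recurrence (illustrative only; in practice, tslearn.metrics.dtw should be preferred since it is optimized and supports global constraints):
import numpy as np

def dtw_naive(x, y):
    # Naive O(nm) DTW following the pseudo-code above.
    # x and y are arrays of shape (n, d) and (m, d); 1-D inputs are treated as univariate series.
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n, m = len(x), len(y)
    C = np.full((n + 1, m + 1), np.inf)
    C[0, 0] = 0.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = np.sum((x[i - 1] - y[j - 1]) ** 2)
            C[i, j] = dist + min(C[i - 1, j], C[i, j - 1], C[i - 1, j - 1])
    return np.sqrt(C[n, m])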
Using a different ground metric¶
By default, tslearn uses squared Euclidean distance as the base metric (i.e. \(d(\cdot, \cdot)\) in the optimization problem above is the Euclidean distance). If one wants to use another ground metric, the code would then be:
from tslearn.metrics import dtw_path_from_metric
path, cost = dtw_path_from_metric(x, y, metric=compatible_metric)
in which case the optimization problem that would be solved would be:
\[\min_{\pi} \sum_{(i, j) \in \pi} \tilde{d}(x_i, y_j)\]
where \(\tilde{d}(\cdot, \cdot)\) is the user-defined ground metric, denoted compatible_metric in the code snippet above.
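For instance, a hedged sketch using a named ground metric (here "sqeuclidean", one of the metric names accepted by the underlying pairwise-distance computation; a callable taking two samples also works):
import numpy as np
from tslearn.metrics import dtw_path_from_metric

x = np.array([[1.0], [2.0], [3.0]])
y = np.array([[2.0], [3.0], [4.0], [5.0]])
# Use squared Euclidean distance as the ground metric instead of the default
path, cost = dtw_path_from_metric(x, y, metric="sqeuclidean")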
Properties¶
Dynamic Time Warping holds the following properties:
\(\forall x, y, DTW(x, y) \geq 0\)
\(\forall x, DTW(x, x) = 0\)
However, mathematically speaking, DTW is not a valid distance since it does not satisfy the triangular inequality.
Additional constraints¶
The set of temporal deformations to which DTW is invariant can be reduced by setting additional constraints on the set of acceptable paths. These constraints typically consist in forcing paths to lie close to the diagonal.
First, the Sakoe-Chiba band is parametrized by a radius \(r\) (number of off-diagonal elements to consider, also called warping window size sometimes), as illustrated below:

\(n = m = 10, r = 3\). Diagonal is marked in grey for better readability.¶
The corresponding code would be:
from tslearn.metrics import dtw
cost = dtw(x, y, global_constraint="sakoe_chiba", sakoe_chiba_radius=3)
Second, the Itakura parallelogram sets a maximum slope \(s\) for alignment paths, which leads to a parallelogram-shaped constraint:

\(n = m = 10, s = 2\). Diagonal is marked in grey for better readability.¶
The corresponding code would be:
from tslearn.metrics import dtw
cost = dtw(x, y, global_constraint="itakura", itakura_max_slope=2.)
Alternatively, one can put an upper bound on the warping path length so as to discard complex paths, as described in [2]:
from tslearn.metrics import dtw_limited_warping_length
cost = dtw_limited_warping_length(x, y, max_length)
Barycenters¶
Computing the barycenter (also known as the Fréchet mean) of a set \(\mathcal{D}\) for DTW corresponds to the following optimization problem:
\[\min_\mu \sum_{x \in \mathcal{D}} DTW(\mu, x)^2\]
Optimizing this quantity can be done through the DTW Barycenter Averaging (DBA) algorithm presented in [3].
from tslearn.barycenters import dtw_barycenter_averaging
b = dtw_barycenter_averaging(dataset)
This is the algorithm at stake when invoking tslearn.clustering.TimeSeriesKMeans with metric="dtw".
soft-DTW¶
DTW is not differentiable with respect to its inputs because of the non-differentiability of the min operation. A differentiable extension has been presented in [4] in which the min operator is replaced by soft-min, using the log-sum-exp formulation:
\[\text{soft-min}_\gamma(a_1, \dots, a_n) = -\gamma \log \sum_i e^{-a_i / \gamma}\]
soft-DTW hence depends on a hyper-parameter \(\gamma\) that controls the smoothing of the resulting metric (squared DTW corresponds to the limit case \(\gamma \rightarrow 0\)).
from tslearn.metrics import soft_dtw
soft_dtw_score = soft_dtw(x, y, gamma=.1)
When a strictly positive value is set for \(\gamma\), the corresponding alignment matrix corresponds to a blurred version of the DTW one:

Also, barycenters for soft-DTW can be estimated through gradient descent:
from tslearn.barycenters import softdtw_barycenter
b = softdtw_barycenter(dataset, gamma=.1)
This is the algorithm at stake when invoking tslearn.clustering.TimeSeriesKMeans with metric="softdtw".
Examples Involving DTW variants¶
References¶
Longest Common Subsequence¶
Longest Common Subsequence (LCSS) [1] is a similarity measure between time series.
Let us consider two time series \(x = (x_0, \dots, x_{n-1})\) and
\(y = (y_0, \dots, y_{m-1})\) of respective lengths \(n\) and
\(m\).
Here, all elements \(x_i\) and \(y_j\) are assumed to lie in the same
\(d\)-dimensional space.
In tslearn, such time series would be represented as arrays of respective shapes (n, d) and (m, d) and LCSS can be computed using the following code:
from tslearn.metrics import lcss, lcss_path
lcss_score = lcss(x, y)
# Or, if the path is also an important information:
path, lcss_score = lcss_path(x, y)
Problem¶
The similarity \(S\) between \(x\) and \(y\), given a positive real number \(\epsilon\), is formulated as follows:
The constant \(\epsilon\) is the matching threshold.
Here, a path can be seen as the parts of the time series where the Euclidean distance between them does not exceed a given threshold, i.e., they are close/similar.
To retrieve a meaningful similarity value from the length of the longest common subsequence, the percentage of that value regarding the length of the shortest time series is returned.
Algorithmic solution¶
There exists an \(O(nm)\) algorithm to compute the solution for this problem (pseudo-code is provided for time series indexed from 1 for simplicity):
def lcss(x, y):
    # Initialization
    for i = 0..n
        C[i, 0] = 0
    for j = 0..m
        C[0, j] = 0

    # Main loop
    for i = 1..n
        for j = 1..m
            if dist(x_i, y_j) <= epsilon:
                C[i, j] = C[i-1, j-1] + 1
            else:
                C[i, j] = max(C[i, j-1], C[i-1, j])

    return C[n, m] / min(n, m)
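As for DTW above, a small NumPy sketch of this recurrence (illustrative only; tslearn.metrics.lcss is the optimized implementation to use in practice):
import numpy as np

def lcss_naive(x, y, epsilon=1.0):
    # Naive O(nm) LCSS following the pseudo-code above.
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n, m = len(x), len(y)
    C = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if np.linalg.norm(x[i - 1] - y[j - 1]) <= epsilon:
                C[i, j] = C[i - 1, j - 1] + 1
            else:
                C[i, j] = max(C[i, j - 1], C[i - 1, j])
    return C[n, m] / min(n, m)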
Using a different ground metric¶
By default, tslearn uses squared Euclidean distance as the base metric (i.e. \(dist()\) in the problem above is the Euclidean distance). If one wants to use another ground metric, the code would then be:
from tslearn.metrics import lcss_path_from_metric
path, cost = lcss_path_from_metric(x, y, metric=compatible_metric)
Properties¶
The Longest Common Subsequence holds the following properties:
\(\forall x, y, LCSS(x, y) \in \left[0, 1\right]\)
\(\forall x, y, LCSS(x, y) = LCSS(y, x)\)
\(\forall x, LCSS(x, x) = 1\)
The values returned by LCSS range from 0 to 1, the value 1 being taken when the two time series completely match.
Additional constraints¶
One can set additional constraints on the set of acceptable paths. These constraints typically consist in forcing paths to lie close to the diagonal.
First, the Sakoe-Chiba band is parametrized by a radius \(r\) (number of off-diagonal elements to consider, also called warping window size sometimes), as illustrated below:

\(n = m = 10, r = 3\). Diagonal is marked in grey for better readability.¶
The corresponding code would be:
from tslearn.metrics import lcss
cost = lcss(x, y, global_constraint="sakoe_chiba", sakoe_chiba_radius=3)
The Sakoe-Chiba radius corresponds to the parameter \(\delta\) mentioned in [1]; it controls how far in time we can go in order to match a given point from one time series to a point in another time series.
Second, the Itakura parallelogram sets a maximum slope \(s\) for alignment paths, which leads to a parallelogram-shaped constraint:

\(n = m = 10, s = 2\). Diagonal is marked in grey for better readability.¶
The corresponding code would be:
from tslearn.metrics import lcss
cost = lcss(x, y, global_constraint="itakura", itakura_max_slope=2.)
Examples Involving LCSS variants¶

Longest Common Subsequence with a custom distance metric
References¶
M. Vlachos, D. Gunopoulos, and G. Kollios. 2002. “Discovering Similar Multidimensional Trajectories”, In Proceedings of the 18th International Conference on Data Engineering (ICDE ‘02). IEEE Computer Society, USA, 673.
Kernel Methods¶
In the following, we will discuss the use of kernels to compare time series. A kernel \(k(\cdot, \cdot)\) is such that there exists an unknown map \(\Phi\) such that:
\[k(\mathbf{x}, \mathbf{y}) = \left\langle \Phi(\mathbf{x}), \Phi(\mathbf{y}) \right\rangle_{\mathcal{H}}\]
i.e. \(k(\cdot, \cdot)\) is the inner product between \(\mathbf{x}\) and \(\mathbf{y}\) in some (unknown) embedding space \(\mathcal{H}\). In practice, \(k(\mathbf{x}, \mathbf{y})\) will be large when \(\mathbf{x}\) and \(\mathbf{y}\) are similar and close to 0 when they are very dissimilar.
A large number of kernel methods from the machine learning literature rely on the so-called kernel trick, that consists in performing computations in the embedding space \(\mathcal{H}\) without ever actually performing any embedding. As an example, one can compute the distance between \(\mathbf{x}\) and \(\mathbf{y}\) in \(\mathcal{H}\) via:
\[\left\| \Phi(\mathbf{x}) - \Phi(\mathbf{y}) \right\|_{\mathcal{H}} = \sqrt{k(\mathbf{x}, \mathbf{x}) + k(\mathbf{y}, \mathbf{y}) - 2 k(\mathbf{x}, \mathbf{y})}\]
Such computations are used, for example, in the kernel-\(k\)-means algorithm (see below).
Global Alignment Kernel¶
The Global Alignment Kernel (GAK) is a kernel that operates on time series.
It is defined, for a given bandwidth \(\sigma\), as:
\[k(\mathbf{x}, \mathbf{y}) = \sum_{\pi \in \mathcal{A}(\mathbf{x}, \mathbf{y})} \prod_{i=1}^{|\pi|} \exp\left(-\frac{\left\| x_{\pi_1(i)} - y_{\pi_2(i)} \right\|^2}{2 \sigma^2}\right)\]
where \(\mathcal{A}(\mathbf{x}, \mathbf{y})\) is the set of all possible alignments between series \(\mathbf{x}\) and \(\mathbf{y}\).
It is advised in [1] to set the bandwidth \(\sigma\) as a multiple of a simple estimate of the median distance of different points observed in different time-series of your training set, scaled by the square root of the median length of time-series in the set. This estimate is made available in tslearn through tslearn.metrics.sigma_gak:
from tslearn.metrics import gak, sigma_gak
sigma = sigma_gak(X)
k_01 = gak(X[0], X[1], sigma=sigma)
Note however that, on long time series, this estimate can lead to numerical overflows, which smaller values can avoid.
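To compute a full kernel (Gram) matrix over a dataset, one can use tslearn.metrics.cdist_gak; a short sketch (assuming X is a tslearn-formatted dataset as in the previous sections):
from tslearn.metrics import cdist_gak, sigma_gak

# Gram matrix of GAK values between all pairs of series in X
K = cdist_gak(X, sigma=sigma_gak(X))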
Finally, GAK is related to softDTW [3] through the following formula:
\[k(\mathbf{x}, \mathbf{y}) = \exp\left(- \frac{\text{softDTW}_\gamma(\mathbf{x}, \mathbf{y})}{\gamma}\right)\]
where \(\gamma\) is the hyper-parameter controlling softDTW smoothness, which is related to the bandwidth parameter of GAK through \(\gamma = 2 \sigma^2\).
Clustering and Classification¶
Kernel \(k\)-means [2] is a method that uses the kernel trick to implicitly perform \(k\)-means clustering in the embedding space associated to a kernel. This method is discussed in our User Guide section dedicated to clustering.
Kernels can also be used in classification settings.
tslearn.svm offers implementations of Support Vector Machines (SVM) that accept GAK as a kernel. This implementation heavily relies on scikit-learn and libsvm. One consequence is that the predict_proba and predict_log_proba methods are computed based on cross-validation probability estimates, which has two main implications, as discussed in more detail in scikit-learn’s user guide:
1. setting the constructor option probability to True makes the fit step longer since it then relies on an expensive five-fold cross-validation;
2. the probability estimates obtained through predict_proba may be inconsistent with the scores provided by decision_function and the predicted class output by predict.
Examples Using Kernel Methods¶
References¶
Cuturi. “Fast Global Alignment Kernels,” ICML 2011.
I. S. Dhillon, Y. Guan & B. Kulis. “Kernel k-means, Spectral Clustering and Normalized Cuts,” KDD 2004.
M. Cuturi, M. Blondel “Soft-DTW: a Differentiable Loss Function for Time-Series,” ICML 2017.
Time Series Clustering¶
Clustering is the task of grouping together similar objects. This task hence heavily depends on the notion of similarity one relies on.
The following Figure illustrates why choosing an adequate similarity function is key (code to reproduce is available in the Gallery of Examples).
\(k\)-means clustering with Euclidean distance. Each subfigure represents series from a given cluster and their centroid (in red).¶
This Figure is the result of a \(k\)-means clustering that uses Euclidean distance as a base metric. One issue with this metric is that it is not invariant to time shifts, while the dataset at stake clearly holds such invariants.
\(k\)-means and Dynamic Time Warping¶
To overcome the previously illustrated issue, distance metrics dedicated to time series, such as Dynamic Time Warping (DTW), are required. As can be seen in the Figure below, the use of such metrics produces more meaningful results.
The tslearn.clustering module in tslearn offers an option to use DTW as the core metric in a \(k\)-means algorithm, which leads to better clusters and centroids:
\(k\)-means clustering with Dynamic Time Warping. Each subfigure represents series from a given cluster and their centroid (in red).¶
First, clusters gather time series of similar shapes, which is due to the ability of Dynamic Time Warping (DTW) to deal with time shifts, as explained above. Second, cluster centers (aka centroids) are computed as the barycenters with respect to DTW, hence they make it possible to retrieve a sensible average shape whatever the temporal shifts within the cluster (see our dedicated User Guide section for more details on how these barycenters are computed).
In tslearn, clustering a time series dataset with \(k\)-means and a dedicated time series metric is as easy as
from tslearn.clustering import TimeSeriesKMeans
model = TimeSeriesKMeans(n_clusters=3, metric="dtw",
                         max_iter=10, random_state=seed)
model.fit(X_train)
where X_train is the considered unlabelled dataset of time series. The metric parameter can also be set to "softdtw" as an alternative time series metric (cf. our User Guide section on soft-DTW).
Kernel \(k\)-means and Time Series Kernels¶
Another option to deal with such time shifts is to rely on the kernel trick. Indeed, [1] introduces a positive semidefinite kernel for time series, inspired from DTW. Then, the kernel \(k\)-means algorithm [2], that is equivalent to a \(k\)-means that would operate in the Reproducing Kernel Hilbert Space associated to the chosen kernel, can be used:
Kernel \(k\)-means clustering with Global Alignment Kernel. Each subfigure represents series from a given cluster.¶
A first significant difference (when compared to \(k\)-means) is that cluster centers are never computed explicitly, hence time series assignments to clusters are the only kind of information available once the clustering is performed.
Second, one should note that the clusters generated by kernel-\(k\)-means are phase dependent (see clusters 2 and 3 that differ in phase rather than in shape). This is because Global Alignment Kernel is not invariant to time shifts, as demonstrated in [3] for the closely related soft-DTW [4].
Examples Using Clustering Estimators¶
References¶
Cuturi. “Fast Global Alignment Kernels,” ICML 2011.
I. S. Dhillon, Y. Guan & B. Kulis. “Kernel k-means, Spectral Clustering and Normalized Cuts,” KDD 2004.
H. Janati, M. Cuturi, A. Gramfort. “Spatio-Temporal Alignments: Optimal transport through space and time,” AISTATS 2020
M. Cuturi, M. Blondel “Soft-DTW: a Differentiable Loss Function for Time-Series,” ICML 2017.
Shapelets¶
Shapelets are defined in [1] as “subsequences that are in some sense maximally representative of a class”. Informally, if we assume a binary classification setting, a shapelet is discriminant if it is present in most series of one class and absent from series of the other class. To assess the level of presence, one uses shapelet matches:
\[d(\mathbf{x}, \mathbf{s}) = \min_t \left\| \mathbf{x}_{t\rightarrow t+L} - \mathbf{s} \right\|_2\]
where \(L\) is the length (number of timestamps) of shapelet \(\mathbf{s}\) and \(\mathbf{x}_{t\rightarrow t+L}\) is the subsequence extracted from time series \(\mathbf{x}\) that starts at time index \(t\) and stops at \(t+L\). If the above-defined distance is small enough, then shapelet \(\textbf{s}\) is supposed to be present in time series \(\mathbf{x}\).
The distance from a time series to a shapelet is computed by sliding the shorter shapelet over the longer time series and calculating the point-wise distances. The minimal distance found is returned.¶
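A minimal NumPy sketch of this sliding-window distance (illustrative only, for a univariate series x and shapelet s):
import numpy as np

def shapelet_match_distance(x, s):
    # Slide the shapelet s (length L) over the series x and return the minimal
    # Euclidean distance between s and any length-L subsequence of x.
    x, s = np.asarray(x, dtype=float), np.asarray(s, dtype=float)
    L = len(s)
    return min(np.linalg.norm(x[t:t + L] - s) for t in range(len(x) - L + 1))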
In a classification setting, the goal is then to find the most discriminant shapelets given some labeled time series data. Shapelets can be mined from the training set [1] or learned using gradient-descent.
Learning Time-series Shapelets¶
tslearn provides an implementation of “Learning Time-series Shapelets”, introduced in [2], that is an instance of the latter category.
In Learning Shapelets, shapelets are learned such that time series represented in their shapelet-transform space (i.e. their distances to each of the shapelets) are linearly separable. A shapelet-transform representation of a time series \(\mathbf{x}\) given a set of shapelets \(\{\mathbf{s}_i\}_{i \leq k}\) is the feature vector \([d(\mathbf{x}, \mathbf{s}_1), \cdots, d(\mathbf{x}, \mathbf{s}_k)]\). This is illustrated below with a two-dimensional example.
This is illustrated below with a two-dimensional example.
An example of how time series are transformed into linearly separable distances.¶
In tslearn, in order to learn shapelets and transform time series to their corresponding shapelet-transform space, the following code can be used:
from tslearn.shapelets import LearningShapelets
model = LearningShapelets(n_shapelets_per_size={3: 2})
model.fit(X_train, y_train)
train_distances = model.transform(X_train)
test_distances = model.transform(X_test)
shapelets = model.shapelets_as_time_series_
A tslearn.shapelets.LearningShapelets model has several hyper-parameters, such as the maximum number of iterations and the batch size. One important hyper-parameter is n_shapelets_per_size, a dictionary whose keys correspond to the desired lengths of the shapelets and whose values correspond to the desired number of shapelets per length. When set to None, this dictionary is determined by a heuristic. After creating the model, we can fit the optimal shapelets using our training data. After the fitting phase, the distances can be calculated using the transform function. Moreover, you can easily access the learned shapelets by using the shapelets_as_time_series_ attribute.
It is important to note that, because a gradient-descent-based technique is used to learn the shapelets, the model can be prone to numerical issues (e.g. exploding and vanishing gradients). For that reason, it is important to normalize your data. This can be done before passing the data to the fit and transform methods, by using our tslearn.preprocessing module, but it can also be done internally by the algorithm itself by setting the scale parameter.
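For instance (a hedged sketch, reusing the model definition above), internal scaling can be requested at construction time:
from tslearn.shapelets import LearningShapelets

# scale=True asks the estimator to normalize the series internally
model = LearningShapelets(n_shapelets_per_size={3: 2}, scale=True)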
Examples Involving Shapelet-based Estimators¶
Learning Shapelets: decision boundaries in 2D distance space
References¶
L. Ye & E. Keogh. Time series shapelets: a new primitive for data mining. SIGKDD 2009.
Grabocka et al. Learning Time-Series Shapelets. SIGKDD 2014.
Matrix Profile¶
The Matrix Profile, \(MP\), is a new time series that can be calculated based on an input time series \(T\) and a subsequence length \(m\). \(MP_i\) corresponds to the minimal distance from the query subsequence \(T_{i\rightarrow i+m}\) to any subsequence in \(T\) [1]. As the distance from the query subsequence to itself will be equal to zero, \(T_{i-\frac{m}{4}\rightarrow i+\frac{m}{4}}\) is considered as an exclusion zone. In order to construct the Matrix Profile, a distance profile, which is similar to the distance calculation used to transform time series into their shapelet-transform space, is calculated for each subsequence, as illustrated below:
For each segment, the distances to all subsequences of the time series are calculated and the minimal distance that does not correspond to the original location of the segment (where the distance is zero) is returned.¶
Implementation¶
The Matrix Profile implementation provided in tslearn uses numpy or wraps around STUMPY [2]. Three different versions are available:
numpy: a slow implementation;
stump: a fast CPU version, which requires STUMPY to be installed;
gpu_stump: the fastest version, which requires STUMPY to be installed and a GPU.
Possible Applications¶
The Matrix Profile allows for many possible applications, which are well documented on the page created by the original authors [3]. Some of these applications include: motif and shapelet extraction, discord detection, earthquake detection, and many more.
Examples Involving Matrix Profile¶
References¶
C. M. Yeh, Y. Zhu, L. Ulanova, N. Begum et al. Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets. ICDM 2016.
Early Classification of Time Series¶
Early classification of time series is the task of performing a classification as early as possible for an incoming time series; the decision about when to trigger the classification is part of the prediction process.
Early Classification Cost Function¶
Dachraoui et al. [1] introduces a composite loss function for early classification of time series that balances earliness and accuracy.
The cost function is of the following form:
where \(\mathcal{L}_c(\cdot,\cdot,\cdot)\) is a classification loss and \(t\) is the time at which a decision is triggered by the system (\(\mathbf{x}_{\rightarrow t}\) is time series \(\mathbf{x}\) observed up to time \(t\)). In this setting, \(\alpha\) drives the tradeoff between accuracy and earliness and is supposed to be a hyper-parameter of the method.
The authors rely on (i) a clustering of the training time series and (ii) individual classifiers \(m_t(\cdot)\) trained at all possible timestamps, so as to be able to predict, at time \(t\), an expected cost for all future times \(t + \tau\) with \(\tau \geq 0\):
where:
\(P(C_k | \mathbf{x}_{\rightarrow t})\) is a soft-assignment weight of \(\mathbf{x}_{\rightarrow t}\) to cluster \(C_k\);
\(P(y=i | C_k)\) is obtained from a contingency table that stores the number of training time series of each class in each cluster;
\(P_{t+\tau}(\hat{y} = j | y=i, C_k)\) is obtained through training time confusion matrices built on time series from cluster \(C_k\) using classifier \(m_{t+\tau}(\cdot)\).
At test time, if a series is observed up to time \(t\) and if, for all positive \(\tau\) we have \(f_\tau(\mathbf{x}_{\rightarrow t}, y) \geq f_0(\mathbf{x}_{\rightarrow t}, y)\), then a decision is made using classifier \(m_t(\cdot)\).
Early classification. At test time, prediction is made at a timestamp such that the expected earliness-accuracy is optimized, which can hence vary between time series.¶
To use this early classifier in tslearn, one can rely on the tslearn.early_classification.NonMyopicEarlyClassifier class:
from tslearn.early_classification import NonMyopicEarlyClassifier
early_clf = NonMyopicEarlyClassifier(n_clusters=3,
                                     cost_time_parameter=1e-3,
                                     lamb=1e2,
                                     random_state=0)
early_clf.fit(X_train, y_train)
preds, times = early_clf.predict_class_and_earliness(X_test)
where cost_time_parameter is the \(\alpha\) parameter presented above and lamb is a trade-off parameter for the soft-assignment of partial series to clusters \(P(C_k | \mathbf{x}_{\rightarrow t})\) (when lamb tends to infinity, the assignment tends to hard-assignment, and when lamb is set to 0, equal probabilities are obtained for all clusters).
Examples Involving Early Classification Estimators¶
References¶
A. Dachraoui, A. Bondu and A. Cornuejols. “Early classification of time series as a non myopic sequential decision making problem,” ECML/PKDD 2015
API Reference¶
The complete tslearn project is automatically documented for every module.
Gallery of examples¶
Metrics¶

Longest Common Subsequence with a custom distance metric
Nearest Neighbors¶
Hyper-parameter tuning of a Pipeline with KNeighborsTimeSeriesClassifier
Clustering and Barycenters¶
Classification¶
Learning Shapelets: decision boundaries in 2D distance space
Automatic differentiation¶
Miscellaneous¶
Metrics¶

Longest Common Subsequence with a custom distance metric
Longest Common Subsequence¶
This example illustrates LCSS computation between time series, plots the alignment path [1], and shows its relationship to DTW.
Since LCSS focuses on the similar parts between two time series, a potential use case is to identify the similarity between time series whose lengths differ greatly or that contain noise. As one example, M. Vlachos et al. [1] used this method to cluster time series regarding human writing in the presence of noise.
The example demonstrates the use of the functions lcss_path and dtw_path to calculate the alignment path between them and compare the two approaches when dealing with unequal-length sequence data and noise.
[1] M. Vlachos, D. Gunopoulos, and G. Kollios. 2002. “Discovering Similar Multidimensional Trajectories”, In Proceedings of the 18th International Conference on Data Engineering (ICDE ‘02). IEEE Computer Society, USA, 673.
# Author: Daniela Duarte
# License: BSD 3 clause
import numpy
import matplotlib.pyplot as plt
from tslearn.generators import random_walks
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn import metrics
numpy.random.seed(0)
n_ts, sz, d = 2, 100, 1
dataset = random_walks(n_ts=n_ts, sz=sz, d=d, random_state=5)
scaler = TimeSeriesScalerMeanVariance(mu=0., std=1.) # Rescale time series
dataset_scaled = scaler.fit_transform(dataset)
lcss_path, sim_lcss = metrics.lcss_path(dataset_scaled[0, :, 0], dataset_scaled[1, :40, 0], eps=1.5)
dtw_path, sim_dtw = metrics.dtw_path(dataset_scaled[0, :, 0], dataset_scaled[1, :40, 0])
plt.figure(1, figsize=(8, 8))
plt.plot(dataset_scaled[0, :, 0], "b-", label='First time series')
plt.plot(dataset_scaled[1, :40, 0], "g-", label='Second time series')
for positions in lcss_path:
    plt.plot([positions[0], positions[1]],
             [dataset_scaled[0, positions[0], 0], dataset_scaled[1, positions[1], 0]],
             color='orange')
plt.legend()
plt.title("Time series matching with LCSS")
plt.figure(2, figsize=(8, 8))
plt.plot(dataset_scaled[0, :, 0], "b-", label='First time series')
plt.plot(dataset_scaled[1, :40, 0], "g-", label='Second time series')
for positions in dtw_path:
    plt.plot([positions[0], positions[1]],
             [dataset_scaled[0, positions[0], 0], dataset_scaled[1, positions[1], 0]],
             color='orange')
plt.legend()
plt.title("Time series matching with DTW")
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 3.105 seconds)
LB_Keogh¶
This example illustrates the principle of time series envelope and its relationship to the “LB_Keogh” lower bound [1].
The envelope of a time series consists of two time series such that the original time series is between the two time series. Denoting the original time series \(X = (X_i)_{1 \leq i \leq n}\), the envelope of this time series is an ensemble of two time series of same length \(L = (l_i)_{1 \leq i \leq n}\) and \(U = (u_i)_{1 \leq i \leq n}\) such that for all \(i \in \{1, \ldots, n\}\):
\[u_i = \max(X_{i-r}, \dots, X_{i+r}) \quad \text{and} \quad l_i = \min(X_{i-r}, \dots, X_{i+r})\]
where \(r\) is the radius of the envelope.
The distance between a time series \(Q\) and an envelope \((L, U)\) is defined as:
\[LB_{Keogh}(Q, (L, U)) = \sqrt{\sum_{i=1}^n \begin{cases} (q_i - u_i)^2 & \text{if } q_i > u_i \\ (q_i - l_i)^2 & \text{if } q_i < l_i \\ 0 & \text{otherwise} \end{cases}}\]
So it is simply the Euclidean distance between \(Q\) and the envelope.
[1] E. Keogh and C. A. Ratanamahatana, “Exact indexing of dynamic time warping”. Knowledge and Information Systems, 7(3), 358-386 (2004).
# Author: Romain Tavenard
# Johann Faouzi
# License: BSD 3 clause
# sphinx_gallery_thumbnail_number = 2
import numpy
import matplotlib.pyplot as plt
from tslearn.generators import random_walks
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn import metrics
numpy.random.seed(0)
n_ts, sz, d = 2, 100, 1
dataset = random_walks(n_ts=n_ts, sz=sz, d=d)
scaler = TimeSeriesScalerMeanVariance(mu=0., std=1.) # Rescale time series
dataset_scaled = scaler.fit_transform(dataset)
plt.figure(figsize=(14, 8))
envelope_down, envelope_up = metrics.lb_envelope(dataset_scaled[0], radius=3)
plt.plot(dataset_scaled[0, :, 0], "r-", label='First time series')
plt.plot(envelope_down[:, 0], "b-", label='Lower envelope')
plt.plot(envelope_up[:, 0], "g-", label='Upper envelope')
plt.legend()
plt.title('Envelope around a time series with radius=3')
plt.figure(figsize=(14, 8))
plt.plot(envelope_down[:, 0], "b-", label='Lower envelope')
plt.plot(envelope_up[:, 0], "g-", label='Upper envelope')
plt.plot(dataset_scaled[1, :, 0], "k-", label='Second time series')
plt.vlines(numpy.arange(sz), dataset_scaled[1, :, 0],
           numpy.clip(dataset_scaled[1, :, 0], envelope_down[:, 0], envelope_up[:, 0]),
           label='Distance', color='orange')
plt.legend()
lb_k_sim = metrics.lb_keogh(dataset_scaled[1],
                            envelope_candidate=(envelope_down, envelope_up))
plt.title('Distance between the second time series and \n'
          'the envelope = {:.4f}'.format(lb_k_sim))
plt.show()
Total running time of the script: (0 minutes 1.365 seconds)
Canonical Time Warping¶
This example illustrates the use of Canonical Time Warping (CTW) between time series and plots the matches obtained by the method [1].
Note that, contrary to Dynamic Time Warping (DTW) [2], CTW can almost retrieve the ground-truth alignment (green matches) even when time series have suffered a rigid transformation (rotation+translation here).
The time series at stake in this example are color-coded trajectories whose starting (resp. end) point are depicted in blue (resp. red).
[1] F. Zhou and F. Torre, “Canonical time warping for alignment of human behavior”. NIPS 2009.
[2] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition”. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43-49 (1978).
# Author: Romain Tavenard
# License: BSD 3 clause
import matplotlib.pyplot as plt
import numpy as np
from tslearn.metrics import dtw_path, ctw_path
def plot_trajectory(ts, ax, color_code=None, alpha=1.):
    if color_code is not None:
        colors = [color_code] * len(ts)
    else:
        colors = plt.cm.jet(np.linspace(0, 1, len(ts)))
    for i in range(len(ts) - 1):
        ax.plot(ts[i:i+2, 0], ts[i:i+2, 1],
                marker='o', c=colors[i], alpha=alpha)


def get_rot2d(theta):
    return np.array(
        [[np.cos(theta), -np.sin(theta)],
         [np.sin(theta), np.cos(theta)]]
    )


def make_one_folium(sz, a=1., noise=.1, resample_fun=None):
    theta = np.linspace(0, 1, sz)
    if resample_fun is not None:
        theta = resample_fun(theta)
    theta -= .5
    theta *= .9 * np.pi
    theta = theta.reshape((-1, 1))
    r = a / 2 * (4 * np.cos(theta) - 1. / np.cos(theta))
    x = r * np.cos(theta) + np.random.rand(sz, 1) * noise
    y = r * np.sin(theta) + np.random.rand(sz, 1) * noise
    return np.array(np.hstack((x, y)))
trajectory = make_one_folium(sz=30).dot(get_rot2d(np.pi + np.pi / 3))
rotated_trajectory = trajectory.dot(get_rot2d(np.pi / 4)) + np.array([0., 3.])
path_dtw, _ = dtw_path(trajectory, rotated_trajectory)
path_ctw, cca, _ = ctw_path(trajectory, rotated_trajectory,
                            max_iter=100, n_components=2)
plt.figure(figsize=(8, 4))
ax = plt.subplot(1, 2, 1)
for (i, j) in path_dtw:
    ax.plot([trajectory[i, 0], rotated_trajectory[j, 0]],
            [trajectory[i, 1], rotated_trajectory[j, 1]],
            color='g' if i == j else 'r', alpha=.5)
plot_trajectory(trajectory, ax)
plot_trajectory(rotated_trajectory, ax)
ax.set_xticks([])
ax.set_yticks([])
ax.set_title("DTW")
ax = plt.subplot(1, 2, 2)
for (i, j) in path_ctw:
    ax.plot([trajectory[i, 0], rotated_trajectory[j, 0]],
            [trajectory[i, 1], rotated_trajectory[j, 1]],
            color='g' if i == j else 'r', alpha=.5)
plot_trajectory(trajectory, ax)
plot_trajectory(rotated_trajectory, ax)
ax.set_xticks([])
ax.set_yticks([])
ax.set_title("CTW")
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 0.386 seconds)
sDTW multi path matching¶
This example illustrates how subsequent DTW can be used to find multiple matches of a sequence in a longer sequence.
A potential use case is to identify occurrences of certain events in continuous sensor signals. For example, Barth et al. [1] used this method to find strides in sensor recordings of gait.
The example demonstrates the use of the functions subsequence_cost_matrix and subsequence_path to manually calculate warping paths from multiple potential alignments. If you are only interested in finding the optimal alignment, you can directly use dtw_subsequence_path.
[1] Barth, et al. (2013): Subsequence dynamic time warping as a method for robust step segmentation using gyroscope signals of daily life activities, EMBS, https://doi.org/10.1109/EMBC.2013.6611104
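If only the single best match is needed, dtw_subsequence_path can be called directly. The following is a minimal sketch (separate from the example script below), assuming the query is simply a slice of the long signal:
import numpy
from tslearn.generators import random_walks
from tslearn.metrics import dtw_subsequence_path

numpy.random.seed(0)
long_sequence = random_walks(n_ts=1, sz=500, d=1)[0]  # signal to search in
short_sequence = long_sequence[100:180]                # query known to occur in it

# dtw_subsequence_path returns the optimal warping path and its cost;
# path entries are (index_in_query, index_in_long_sequence) pairs.
path, dist = dtw_subsequence_path(short_sequence, long_sequence)
start, end = path[0][1], path[-1][1]
print("Best match spans indices {} to {} (cost {:.3f})".format(start, end, dist))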
Shape long sequence: (500, 1)
Shape short sequence: (100, 1)
# Author: Arne Kuederle
# License: BSD 3 clause
import matplotlib.pyplot as plt
import numpy
from scipy.signal import find_peaks
from tslearn import metrics
from tslearn.generators import random_walks
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
numpy.random.seed(0)
n_ts, sz, d = 2, 100, 1
n_repeat = 5
dataset = random_walks(n_ts=n_ts, sz=sz, d=d)
scaler = TimeSeriesScalerMeanVariance(mu=0., std=1.) # Rescale time series
dataset_scaled = scaler.fit_transform(dataset)
# We repeat the long sequence multiple times to generate multiple possible
# matches
long_sequence = numpy.tile(dataset_scaled[1], (n_repeat, 1))
short_sequence = dataset_scaled[0]
sz1 = len(long_sequence)
sz2 = len(short_sequence)
print('Shape long sequence: {}'.format(long_sequence.shape))
print('Shape short sequence: {}'.format(short_sequence.shape))
# Calculate the accumulated cost matrix
mat = metrics.subsequence_cost_matrix(short_sequence,
long_sequence)
# Calculate cost function
cost_func = mat[-1, :]
# Identify potential matches in the cost function (parameters are tuned to
# fit this example)
potential_matches = find_peaks(-cost_func, distance=sz * 0.75, height=-50)[0]
# Calculate the optimal warping path starting from each of the identified
# minima
paths = [metrics.subsequence_path(mat, match) for match in
potential_matches]
plt.figure(1, figsize=(6 * n_repeat, 6))
# definitions for the axes
left, bottom = 0.01, 0.1
h_ts = 0.2
w_ts = h_ts / n_repeat
left_h = left + w_ts + 0.02
width = height = 0.65
bottom_h = bottom + height + 0.02
rect_s_y = [left, bottom, w_ts, height]
rect_gram = [left_h, bottom, width, height]
rect_s_x = [left_h, bottom_h, width, h_ts]
ax_gram = plt.axes(rect_gram)
ax_s_x = plt.axes(rect_s_x)
ax_s_y = plt.axes(rect_s_y)
ax_gram.imshow(numpy.sqrt(mat))
ax_gram.axis("off")
ax_gram.autoscale(False)
# Plot the paths
for path in paths:
ax_gram.plot([j for (i, j) in path], [i for (i, j) in path], "w-",
linewidth=3.)
ax_s_x.plot(numpy.arange(sz1), long_sequence, "b-", linewidth=3.)
ax_s_x.axis("off")
ax_s_x.set_xlim((0, sz1 - 1))
ax_s_y.plot(- short_sequence, numpy.arange(sz2)[::-1], "b-", linewidth=3.)
ax_s_y.axis("off")
ax_s_y.set_ylim((0, sz2 - 1))
plt.show()
Total running time of the script: (0 minutes 0.856 seconds)
Longest Common Subsequence with a custom distance metric¶
This example illustrates how to use the LCSS computation of the alignment path [1] on a user-defined distance matrix using lcss_path_from_metric().
The example computes the LCSS of two angular time series using the length of the arc on the unit circle as a distance metric [2].
The image represents the cost matrix, that is, the length of the arc between each pair of angles on the unit circle. The corresponding time series are represented at the left and at the top of the cost matrix.
The alignment path, that is, the path along which matches between the two time series occur within the pre-defined threshold, is represented in white on the image.
[1] M. Vlachos, D. Gunopulos, and G. Kollios. 2002. “Discovering Similar Multidimensional Trajectories”, In Proceedings of the 18th International Conference on Data Engineering (ICDE ‘02). IEEE Computer Society, USA, 673.
[2] Definition of the length of an arc on Wikipedia.
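The matching threshold mentioned above is exposed through the eps parameter of lcss_path_from_metric (this keyword is assumed here; the example below keeps its default value). A minimal sketch on two noisy angular series:
import numpy as np
from numpy import pi
from tslearn.metrics import lcss_path_from_metric

def arc_length(angle_1, angle_2, r=1.):
    """Length of the arc between two angles (in rad) on a circle of radius r."""
    theta = np.mod(angle_2 - angle_1, 2 * pi)
    if theta > pi:
        theta -= 2 * pi
    return r * np.abs(theta)

np.random.seed(0)
s1 = np.linspace(-pi / 2, pi / 2, 50).reshape((-1, 1))
s2 = s1 + np.random.uniform(-.3, .3, size=s1.shape)

# A permissive threshold matches almost every point; a strict one matches fewer.
for eps in [1., .1]:
    path, sim = lcss_path_from_metric(s1, s2, eps=eps, metric=arc_length)
    print("eps={:.1f}: similarity={:.2f}, matched pairs={}".format(eps, sim, len(path)))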
# Author: Daniela Duarte
# License: BSD 3 clause
# sphinx_gallery_thumbnail_number = 2
import numpy as np
from numpy import pi
from sklearn.metrics import pairwise_distances
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from tslearn.generators import random_walks
from tslearn import metrics
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
np.random.seed(0)
n_ts, sz = 2, 100
# Example : Length of the arc between two angles on a circle
def arc_length(angle_1, angle_2, r=1.):
"""Length of the arc between two angles (in rad) on a circle of
radius r.
"""
# Compute the angle between the two inputs between 0 and 2*pi.
theta = np.mod(angle_2 - angle_1, 2*pi)
if theta > pi:
theta = theta - 2 * pi
# Return the length of the arc
L = r * np.abs(theta)
    return L
dataset_1 = random_walks(n_ts=n_ts, sz=sz, d=1)
scaler = TimeSeriesScalerMeanVariance(mu=0., std=pi) # Rescale the time series
dataset_scaled_1 = scaler.fit_transform(dataset_1)
# LCSS using a function as the metric argument
path_1, sim_1 = metrics.lcss_path_from_metric(
dataset_scaled_1[0], dataset_scaled_1[1], metric=arc_length
)
# Plots
# Compute the distance matrices for the plot
distances_1 = pairwise_distances(
dataset_scaled_1[0], dataset_scaled_1[1], metric=arc_length
)
# Definitions for the axes
left, bottom = 0.01, 0.1
w_ts = h_ts = 0.2
left_h = left + w_ts + 0.02
width = height = 0.65
bottom_h = bottom + height + 0.02
rect_s_y = [left, bottom, w_ts, height]
rect_dist = [left_h, bottom, width, height]
rect_s_x = [left_h, bottom_h, width, h_ts]
# Plot example
plt.figure(1, figsize=(6, 6))
ax_dist = plt.axes(rect_dist)
ax_s_x = plt.axes(rect_s_x)
ax_s_y = plt.axes(rect_s_y)
ax_dist.imshow(distances_1, origin='lower')
ax_dist.axis("off")
ax_dist.autoscale(False)
ax_dist.plot(*zip(*path_1), "w-", linewidth=3.)
ticks_location = [-pi, 0, pi]
ticks_labels = [r"$\bf-\pi$", r"$\bf0$", r"$\bf\pi$"]
ax_s_x.plot([0, sz - 1], [ticks_location]*2, "k--", alpha=.2)
ax_s_x.plot(np.arange(sz), dataset_scaled_1[1], "b-", linewidth=3.)
ax_s_x.set_xlim((0, sz - 1))
ax_s_x.axis("off")
ax_s_y.plot([ticks_location]*2, [0, sz - 1], "k--", alpha=.2)
ax_s_y.plot(-dataset_scaled_1[0], np.arange(sz), "b-", linewidth=3.)
ax_s_y.set_ylim((0, sz - 1))
ax_s_y.axis("off")
for loc, s in zip(ticks_location, ticks_labels):
ax_s_x.text(0, loc, s, fontsize="large", color="grey",
horizontalalignment="right", verticalalignment="center")
ax_s_y.text(-loc, 0, s, fontsize="large", color="grey",
horizontalalignment="center", verticalalignment="top")
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 1.249 seconds)
Dynamic Time Warping¶
This example illustrates Dynamic Time Warping (DTW) computation between time series and plots the optimal alignment path [1].
The image represents the cost matrix, that is, the squared Euclidean distance for each pair of time points between both time series, which are represented at the left and at the top of the cost matrix.
The optimal path, that is the path that minimizes the total cost to go from the first time point to the last one, is represented in white on the image.
[1] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition”. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43-49 (1978).
# Author: Romain Tavenard
# License: BSD 3 clause
import numpy
from scipy.spatial.distance import cdist
import matplotlib.pyplot as plt
from tslearn import metrics
numpy.random.seed(0)
s_x = numpy.array(
[-0.790, -0.765, -0.734, -0.700, -0.668, -0.639, -0.612, -0.587, -0.564,
-0.544, -0.529, -0.518, -0.509, -0.502, -0.494, -0.488, -0.482, -0.475,
-0.472, -0.470, -0.465, -0.464, -0.461, -0.458, -0.459, -0.460, -0.459,
-0.458, -0.448, -0.431, -0.408, -0.375, -0.333, -0.277, -0.196, -0.090,
0.047, 0.220, 0.426, 0.671, 0.962, 1.300, 1.683, 2.096, 2.510, 2.895,
3.219, 3.463, 3.621, 3.700, 3.713, 3.677, 3.606, 3.510, 3.400, 3.280,
3.158, 3.038, 2.919, 2.801, 2.676, 2.538, 2.382, 2.206, 2.016, 1.821,
1.627, 1.439, 1.260, 1.085, 0.917, 0.758, 0.608, 0.476, 0.361, 0.259,
0.173, 0.096, 0.027, -0.032, -0.087, -0.137, -0.179, -0.221, -0.260,
-0.293, -0.328, -0.359, -0.385, -0.413, -0.437, -0.458, -0.480, -0.498,
-0.512, -0.526, -0.536, -0.544, -0.552, -0.556, -0.561, -0.565, -0.568,
-0.570, -0.570, -0.566, -0.560, -0.549, -0.532, -0.510, -0.480, -0.443,
-0.402, -0.357, -0.308, -0.256, -0.200, -0.139, -0.073, -0.003, 0.066,
0.131, 0.186, 0.229, 0.259, 0.276, 0.280, 0.272, 0.256, 0.234, 0.209,
0.186, 0.162, 0.139, 0.112, 0.081, 0.046, 0.008, -0.032, -0.071, -0.110,
-0.147, -0.180, -0.210, -0.235, -0.256, -0.275, -0.292, -0.307, -0.320,
-0.332, -0.344, -0.355, -0.363, -0.367, -0.364, -0.351, -0.330, -0.299,
-0.260, -0.217, -0.172, -0.128, -0.091, -0.060, -0.036, -0.022, -0.016,
-0.020, -0.037, -0.065, -0.104, -0.151, -0.201, -0.253, -0.302, -0.347,
-0.388, -0.426, -0.460, -0.491, -0.517, -0.539, -0.558, -0.575, -0.588,
-0.600, -0.606, -0.607, -0.604, -0.598, -0.589, -0.577, -0.558, -0.531,
-0.496, -0.454, -0.410, -0.364, -0.318, -0.276, -0.237, -0.203, -0.176,
-0.157, -0.145, -0.142, -0.145, -0.154, -0.168, -0.185, -0.206, -0.230,
-0.256, -0.286, -0.318, -0.351, -0.383, -0.414, -0.442, -0.467, -0.489,
-0.508, -0.523, -0.535, -0.544, -0.552, -0.557, -0.560, -0.560, -0.557,
-0.551, -0.542, -0.531, -0.519, -0.507, -0.494, -0.484, -0.476, -0.469,
-0.463, -0.456, -0.449, -0.442, -0.435, -0.431, -0.429, -0.430, -0.435,
-0.442, -0.452, -0.465, -0.479, -0.493, -0.506, -0.517, -0.526, -0.535,
-0.548, -0.567, -0.592, -0.622, -0.655, -0.690, -0.728, -0.764, -0.795,
-0.815, -0.823, -0.821])
s_y1 = numpy.concatenate((s_x, s_x)).reshape((-1, 1))
s_y2 = numpy.concatenate((s_x, s_x[::-1])).reshape((-1, 1))
sz = s_y1.shape[0]
path, sim = metrics.dtw_path(s_y1, s_y2)
plt.figure(1, figsize=(8, 8))
# definitions for the axes
left, bottom = 0.01, 0.1
w_ts = h_ts = 0.2
left_h = left + w_ts + 0.02
width = height = 0.65
bottom_h = bottom + height + 0.02
rect_s_y = [left, bottom, w_ts, height]
rect_gram = [left_h, bottom, width, height]
rect_s_x = [left_h, bottom_h, width, h_ts]
ax_gram = plt.axes(rect_gram)
ax_s_x = plt.axes(rect_s_x)
ax_s_y = plt.axes(rect_s_y)
mat = cdist(s_y1, s_y2)
ax_gram.imshow(mat, origin='lower')
ax_gram.axis("off")
ax_gram.autoscale(False)
ax_gram.plot([j for (i, j) in path], [i for (i, j) in path], "w-",
linewidth=3.)
ax_s_x.plot(numpy.arange(sz), s_y2, "b-", linewidth=3.)
ax_s_x.axis("off")
ax_s_x.set_xlim((0, sz - 1))
ax_s_y.plot(- s_y1, numpy.arange(sz), "b-", linewidth=3.)
ax_s_y.axis("off")
ax_s_y.set_ylim((0, sz - 1))
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 0.259 seconds)
Soft Dynamic Time Warping¶
This example illustrates Soft Dynamic Time Warping (Soft-DTW) computation between time series and plots the resulting soft alignment matrices [1].
[1] M. Cuturi, M. Blondel “Soft-DTW: a Differentiable Loss Function for Time-Series,” ICML 2017.
# Author: Romain Tavenard
# License: BSD 3 clause
# sphinx_gallery_thumbnail_number = 3
import numpy
from scipy.spatial.distance import cdist
import matplotlib.pyplot as plt
from tslearn import metrics
numpy.random.seed(0)
s_x = numpy.array(
[-0.790, -0.765, -0.734, -0.700, -0.668, -0.639, -0.612, -0.587, -0.564,
-0.544, -0.529, -0.518, -0.509, -0.502, -0.494, -0.488, -0.482, -0.475,
-0.472, -0.470, -0.465, -0.464, -0.461, -0.458, -0.459, -0.460, -0.459,
-0.458, -0.448, -0.431, -0.408, -0.375, -0.333, -0.277, -0.196, -0.090,
0.047, 0.220, 0.426, 0.671, 0.962, 1.300, 1.683, 2.096, 2.510, 2.895,
3.219, 3.463, 3.621, 3.700, 3.713, 3.677, 3.606, 3.510, 3.400, 3.280,
3.158, 3.038, 2.919, 2.801, 2.676, 2.538, 2.382, 2.206, 2.016, 1.821,
1.627, 1.439, 1.260, 1.085, 0.917, 0.758, 0.608, 0.476, 0.361, 0.259,
0.173, 0.096, 0.027, -0.032, -0.087, -0.137, -0.179, -0.221, -0.260,
-0.293, -0.328, -0.359, -0.385, -0.413, -0.437, -0.458, -0.480, -0.498,
-0.512, -0.526, -0.536, -0.544, -0.552, -0.556, -0.561, -0.565, -0.568,
-0.570, -0.570, -0.566, -0.560, -0.549, -0.532, -0.510, -0.480, -0.443,
-0.402, -0.357, -0.308, -0.256, -0.200, -0.139, -0.073, -0.003, 0.066,
0.131, 0.186, 0.229, 0.259, 0.276, 0.280, 0.272, 0.256, 0.234, 0.209,
0.186, 0.162, 0.139, 0.112, 0.081, 0.046, 0.008, -0.032, -0.071, -0.110,
-0.147, -0.180, -0.210, -0.235, -0.256, -0.275, -0.292, -0.307, -0.320,
-0.332, -0.344, -0.355, -0.363, -0.367, -0.364, -0.351, -0.330, -0.299,
-0.260, -0.217, -0.172, -0.128, -0.091, -0.060, -0.036, -0.022, -0.016,
-0.020, -0.037, -0.065, -0.104, -0.151, -0.201, -0.253, -0.302, -0.347,
-0.388, -0.426, -0.460, -0.491, -0.517, -0.539, -0.558, -0.575, -0.588,
-0.600, -0.606, -0.607, -0.604, -0.598, -0.589, -0.577, -0.558, -0.531,
-0.496, -0.454, -0.410, -0.364, -0.318, -0.276, -0.237, -0.203, -0.176,
-0.157, -0.145, -0.142, -0.145, -0.154, -0.168, -0.185, -0.206, -0.230,
-0.256, -0.286, -0.318, -0.351, -0.383, -0.414, -0.442, -0.467, -0.489,
-0.508, -0.523, -0.535, -0.544, -0.552, -0.557, -0.560, -0.560, -0.557,
-0.551, -0.542, -0.531, -0.519, -0.507, -0.494, -0.484, -0.476, -0.469,
-0.463, -0.456, -0.449, -0.442, -0.435, -0.431, -0.429, -0.430, -0.435,
-0.442, -0.452, -0.465, -0.479, -0.493, -0.506, -0.517, -0.526, -0.535,
-0.548, -0.567, -0.592, -0.622, -0.655, -0.690, -0.728, -0.764, -0.795,
-0.815, -0.823, -0.821])
s_y1 = numpy.concatenate((s_x, s_x))[::2].reshape((-1, 1))
s_y2 = numpy.concatenate((s_x, s_x[::-1]))[::2].reshape((-1, 1))
sz = s_y1.shape[0]
for gamma in [0., .1, 1.]:
alignment, sim = metrics.soft_dtw_alignment(s_y1, s_y2, gamma=gamma)
plt.figure(figsize=(8, 8))
# definitions for the axes
left, bottom = 0.01, 0.1
w_ts = h_ts = 0.2
left_h = left + w_ts + 0.02
width = height = 0.65
bottom_h = bottom + height + 0.02
rect_s_y = [left, bottom, w_ts, height]
rect_gram = [left_h, bottom, width, height]
rect_s_x = [left_h, bottom_h, width, h_ts]
ax_gram = plt.axes(rect_gram)
ax_s_x = plt.axes(rect_s_x)
ax_s_y = plt.axes(rect_s_y)
mat = cdist(s_y1, s_y2)
ax_gram.imshow(alignment, origin='lower')
ax_gram.axis("off")
ax_gram.autoscale(False)
plt.suptitle("$\\gamma={:.1f}$".format(gamma), fontsize=24)
ax_s_x.plot(numpy.arange(sz), s_y2, "b-", linewidth=3.)
ax_s_x.axis("off")
ax_s_x.set_xlim((0, sz - 1))
ax_s_y.plot(- s_y1, numpy.arange(sz), "b-", linewidth=3.)
ax_s_y.axis("off")
ax_s_y.set_ylim((0, sz - 1))
plt.show()
Total running time of the script: (0 minutes 1.152 seconds)
DTW computation with a custom distance metric¶
This example illustrates how to use the DTW computation of the optimal alignment path [1] on a user-defined distance matrix using dtw_path_from_metric().
On the left is the DTW of two angular time series using the length of the arc on the unit circle as a distance metric [2], and on the right is the DTW of two multi-dimensional boolean time series using the Hamming distance [3].
The images represent cost matrices, that is, on the left the length of the arc between each pair of angles on the unit circle and on the right the Hamming distances between the multi-dimensional boolean arrays. In both cases, the corresponding time series are represented at the left and at the top of each cost matrix.
The optimal path, that is the path that minimizes the total user-defined cost from the first time point to the last one, is represented in white on the image.
[1] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition”. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43-49 (1978).
[2] Definition of the length of an arc on Wikipedia.
[3] See Hamming distance in SciPy’s documentation.
# Author: Romain Fayat
# License: BSD 3 clause
# sphinx_gallery_thumbnail_number = 2
import numpy as np
from numpy import pi
from sklearn.metrics import pairwise_distances
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from tslearn.generators import random_walks
from tslearn import metrics
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
np.random.seed(0)
n_ts, sz = 2, 100
# Example 1 : Length of the arc between two angles on a circle
def arc_length(angle_1, angle_2, r=1.):
"""Length of the arc between two angles (in rad) on a circle of
radius r.
"""
# Compute the angle between the two inputs between 0 and 2*pi.
theta = np.mod(angle_2 - angle_1, 2*pi)
if theta > pi:
theta = theta - 2 * pi
# Return the length of the arc
L = r * np.abs(theta)
    return L
dataset_1 = random_walks(n_ts=n_ts, sz=sz, d=1)
scaler = TimeSeriesScalerMeanVariance(mu=0., std=pi) # Rescale the time series
dataset_scaled_1 = scaler.fit_transform(dataset_1)
# DTW using a function as the metric argument
path_1, sim_1 = metrics.dtw_path_from_metric(
dataset_scaled_1[0], dataset_scaled_1[1], metric=arc_length
)
# Example 2 : Hamming distance between 2 multi-dimensional boolean time series
rw = random_walks(n_ts=n_ts, sz=sz, d=15, std=.3)
dataset_2 = np.mod(np.floor(rw), 4) == 0
# DTW using one of the options of sklearn.metrics.pairwise_distances
path_2, sim_2 = metrics.dtw_path_from_metric(
dataset_2[0], dataset_2[1], metric="hamming"
)
# Plots
# Compute the distance matrices for the plots
distances_1 = pairwise_distances(
dataset_scaled_1[0], dataset_scaled_1[1], metric=arc_length
)
distances_2 = pairwise_distances(dataset_2[0], dataset_2[1], metric="hamming")
# Definitions for the axes
left, bottom = 0.01, 0.1
w_ts = h_ts = 0.2
left_h = left + w_ts + 0.02
width = height = 0.65
bottom_h = bottom + height + 0.02
rect_s_y = [left, bottom, w_ts, height]
rect_dist = [left_h, bottom, width, height]
rect_s_x = [left_h, bottom_h, width, h_ts]
# Plot example 1
plt.figure(1, figsize=(6, 6))
ax_dist = plt.axes(rect_dist)
ax_s_x = plt.axes(rect_s_x)
ax_s_y = plt.axes(rect_s_y)
ax_dist.imshow(distances_1, origin='lower')
ax_dist.axis("off")
ax_dist.autoscale(False)
ax_dist.plot(*zip(*path_1), "w-", linewidth=3.)
ticks_location = [-pi, 0, pi]
ticks_labels = [r"$\bf-\pi$", r"$\bf0$", r"$\bf\pi$"]
ax_s_x.plot([0, sz - 1], [ticks_location]*2, "k--", alpha=.2)
ax_s_x.plot(np.arange(sz), dataset_scaled_1[1], "b-", linewidth=3.)
ax_s_x.set_xlim((0, sz - 1))
ax_s_x.axis("off")
ax_s_y.plot([ticks_location]*2, [0, sz - 1], "k--", alpha=.2)
ax_s_y.plot(-dataset_scaled_1[0], np.arange(sz), "b-", linewidth=3.)
ax_s_y.set_ylim((0, sz - 1))
ax_s_y.axis("off")
for loc, s in zip(ticks_location, ticks_labels):
ax_s_x.text(0, loc, s, fontsize="large", color="grey",
horizontalalignment="right", verticalalignment="center")
ax_s_y.text(-loc, 0, s, fontsize="large", color="grey",
horizontalalignment="center", verticalalignment="top")
# Plot example 2
plt.figure(2, figsize=(6, 6))
ax_dist = plt.axes(rect_dist)
ax_s_x = plt.axes(rect_s_x)
ax_s_y = plt.axes(rect_s_y)
ax_dist.imshow(distances_2, origin='lower')
ax_dist.axis("off")
ax_dist.autoscale(False)
ax_dist.plot(*zip(*path_2), "w-", linewidth=3.)
colors = [(1, 1, 1), (0, 0, 1)] # White -> Blue
cmap_name = 'white_blue'
cm = LinearSegmentedColormap.from_list(cmap_name, colors, N=2)
ax_s_x.imshow(dataset_2[1].T, aspect="auto", cmap=cm)
ax_s_x.axis("off")
ax_s_y.imshow(np.flip(dataset_2[0], axis=1), aspect="auto", cmap=cm)
ax_s_y.axis("off")
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 0.636 seconds)
Nearest Neighbors¶
k-NN search¶
This example performs a \(k\)-Nearest-Neighbor search in a database of time series using DTW as a base metric.
To do so, we use the tslearn.neighbors.KNeighborsTimeSeries class, which provides utilities for the \(k\)-Nearest-Neighbor algorithm for time series.
[1] Wikipedia entry for the k-nearest neighbors algorithm
[2] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition”. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43-49 (1978).
# Author: Romain Tavenard
# License: BSD 3 clause
import numpy
import matplotlib.pyplot as plt
from tslearn.neighbors import KNeighborsTimeSeries
from tslearn.datasets import CachedDatasets
seed = 0
numpy.random.seed(seed)
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
n_queries = 2
n_neighbors = 4
knn = KNeighborsTimeSeries(n_neighbors=n_neighbors)
knn.fit(X_train)
ind = knn.kneighbors(X_test[:n_queries], return_distance=False)
plt.figure()
for idx_ts in range(n_queries):
plt.subplot(n_neighbors + 1, n_queries, idx_ts + 1)
plt.plot(X_test[idx_ts].ravel(), "k-")
plt.xticks([])
for rank_nn in range(n_neighbors):
plt.subplot(n_neighbors + 1, n_queries,
idx_ts + (n_queries * (rank_nn + 1)) + 1)
plt.plot(X_train[ind[idx_ts, rank_nn]].ravel(), "r-")
plt.xticks([])
plt.suptitle("Queries (in black) and their nearest neighbors (red)")
plt.show()
Total running time of the script: (0 minutes 1.940 seconds)
Nearest neighbors¶
This example illustrates the use of nearest neighbor methods for database search and classification tasks.
The three-nearest neighbors of the time series from a test set are computed. Then, the predictive performance of a three-nearest neighbors classifier [1] is computed with three different metrics: Dynamic Time Warping [2], Euclidean distance and SAX-MINDIST [3].
[1] Wikipedia entry for the k-nearest neighbors algorithm
[2] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition”. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43-49 (1978).
[3] J. Lin, E. Keogh, L. Wei and S. Lonardi, “Experiencing SAX: a novel symbolic representation of time series”. Data Mining and Knowledge Discovery, 15(2), 107-144 (2007).
1. Nearest neighbour search
Computed nearest neighbor indices (wrt DTW)
[[10 12 2]
[ 0 13 5]
[ 0 1 13]
[ 0 11 5]
[16 18 12]
[ 3 17 9]
[12 2 16]
[ 7 3 17]
[12 2 10]
[12 2 18]
[12 8 2]
[ 3 17 7]
[18 19 2]
[ 0 17 13]
[ 9 3 7]
[12 2 8]
[ 3 7 9]
[ 0 1 13]
[18 10 2]
[10 12 2]]
First nearest neighbor class: [0 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0]
2. Nearest neighbor classification using DTW
Correct classification rate: 1.0
3. Nearest neighbor classification using L2
Correct classification rate: 1.0
4. Nearest neighbor classification using SAX+MINDIST
Correct classification rate: 0.5
# Author: Romain Tavenard
# License: BSD 3 clause
import numpy
from sklearn.metrics import accuracy_score
from tslearn.generators import random_walk_blobs
from tslearn.preprocessing import TimeSeriesScalerMinMax, \
TimeSeriesScalerMeanVariance
from tslearn.neighbors import KNeighborsTimeSeriesClassifier, \
KNeighborsTimeSeries
numpy.random.seed(0)
n_ts_per_blob, sz, d, n_blobs = 20, 100, 1, 2
# Prepare data
X, y = random_walk_blobs(n_ts_per_blob=n_ts_per_blob,
sz=sz,
d=d,
n_blobs=n_blobs)
scaler = TimeSeriesScalerMinMax(value_range=(0., 1.)) # Rescale time series
X_scaled = scaler.fit_transform(X)
indices_shuffle = numpy.random.permutation(n_ts_per_blob * n_blobs)
X_shuffle = X_scaled[indices_shuffle]
y_shuffle = y[indices_shuffle]
X_train = X_shuffle[:n_ts_per_blob * n_blobs // 2]
X_test = X_shuffle[n_ts_per_blob * n_blobs // 2:]
y_train = y_shuffle[:n_ts_per_blob * n_blobs // 2]
y_test = y_shuffle[n_ts_per_blob * n_blobs // 2:]
# Nearest neighbor search
knn = KNeighborsTimeSeries(n_neighbors=3, metric="dtw")
knn.fit(X_train, y_train)
dists, ind = knn.kneighbors(X_test)
print("1. Nearest neighbour search")
print("Computed nearest neighbor indices (wrt DTW)\n", ind)
print("First nearest neighbor class:", y_test[ind[:, 0]])
# Nearest neighbor classification
knn_clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, metric="dtw")
knn_clf.fit(X_train, y_train)
predicted_labels = knn_clf.predict(X_test)
print("\n2. Nearest neighbor classification using DTW")
print("Correct classification rate:", accuracy_score(y_test, predicted_labels))
# Nearest neighbor classification with a different metric (Euclidean distance)
knn_clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, metric="euclidean")
knn_clf.fit(X_train, y_train)
predicted_labels = knn_clf.predict(X_test)
print("\n3. Nearest neighbor classification using L2")
print("Correct classification rate:", accuracy_score(y_test, predicted_labels))
# Nearest neighbor classification based on SAX representation
metric_params = {'n_segments': 10, 'alphabet_size_avg': 5}
knn_clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, metric="sax",
metric_params=metric_params)
knn_clf.fit(X_train, y_train)
predicted_labels = knn_clf.predict(X_test)
print("\n4. Nearest neighbor classification using SAX+MINDIST")
print("Correct classification rate:", accuracy_score(y_test, predicted_labels))
Total running time of the script: (0 minutes 2.766 seconds)
Hyper-parameter tuning of a Pipeline with KNeighborsTimeSeriesClassifier¶
In this example, we demonstrate how it is possible to use the different algorithms of tslearn in combination with sklearn utilities, such as the sklearn.pipeline.Pipeline and sklearn.model_selection.GridSearchCV. In this specific example, we will tune two of the hyper-parameters of a KNeighborsTimeSeriesClassifier.
Performing hyper-parameter tuning of KNN classifier... Done!
Got the following accuracies on the test set for each fold:
|n_neighbors | weights |score_fold_1|score_fold_2|score_fold_3|
-----------------------------------------------------------------
| 5| uniform| 0.64706| 0.82353| 0.6875|
| 5| distance| 0.70588| 0.88235| 0.8125|
| 25| uniform| 0.64706| 0.64706| 0.625|
| 25| distance| 0.82353| 0.76471| 0.8125|
Best parameter combination:
weights=distance, n_neighbors=5
# Author: Gilles Vandewiele
# License: BSD 3 clause
from tslearn.neighbors import KNeighborsTimeSeriesClassifier
from tslearn.preprocessing import TimeSeriesScalerMinMax
from tslearn.datasets import CachedDatasets
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
import numpy as np
import matplotlib.pyplot as plt
# Our pipeline consists of two phases. First, data will be normalized using
# min-max normalization. Afterwards, it is fed to a KNN classifier. For the
# KNN classifier, we tune the n_neighbors and weights hyper-parameters.
n_splits = 3
pipeline = GridSearchCV(
Pipeline([
('normalize', TimeSeriesScalerMinMax()),
('knn', KNeighborsTimeSeriesClassifier())
]),
{'knn__n_neighbors': [5, 25], 'knn__weights': ['uniform', 'distance']},
cv=StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
)
X_train, y_train, _, _ = CachedDatasets().load_dataset("Trace")
# Keep only timeseries of class 1, 2, 3
X_train = X_train[y_train > 0]
y_train = y_train[y_train > 0]
# Keep only the first 50 timeseries and retain only time steps 50 to 150
# of each of them
X_train, y_train = X_train[:50, 50:150], y_train[:50]
# Plot our timeseries
colors = ['g', 'b', 'r']
plt.figure()
for ts, label in zip(X_train, y_train):
plt.plot(ts, c=colors[label - 2], alpha=0.5)
plt.title('The timeseries in the dataset')
plt.tight_layout()
plt.show()
# Fit our pipeline
print(end='Performing hyper-parameter tuning of KNN classifier... ')
pipeline.fit(X_train, y_train)
results = pipeline.cv_results_
# Print each possible configuration parameter and the out-of-fold accuracies
print('Done!')
print()
print('Got the following accuracies on the test set for each fold:')
header_str = '|'
columns = ['n_neighbors', 'weights']
columns += ['score_fold_{}'.format(i + 1) for i in range(n_splits)]
for col in columns:
header_str += '{:^12}|'.format(col)
print(header_str)
print('-'*(len(columns) * 13))
for i in range(len(results['params'])):
s = '|'
s += '{:>12}|'.format(results['params'][i]['knn__n_neighbors'])
s += '{:>12}|'.format(results['params'][i]['knn__weights'])
for k in range(n_splits):
score = results['split{}_test_score'.format(k)][i]
score = np.around(score, 5)
s += '{:>12}|'.format(score)
print(s.strip())
best_comb = np.argmax(results['mean_test_score'])
best_params = results['params'][best_comb]
print()
print('Best parameter combination:')
print('weights={}, n_neighbors={}'.format(best_params['knn__weights'],
best_params['knn__n_neighbors']))
Total running time of the script: (0 minutes 12.886 seconds)
1-NN with SAX + MINDIST¶
This example presents a comparison between k-Nearest Neighbor runs with k=1. It compares the use of MINDIST (see [1]) on SAX representations of the data with the Euclidean distance on the raw values of the time series.
The comparison is based on test accuracy using several benchmark datasets.
[1] J. Lin, E. Keogh, L. Wei and S. Lonardi, “Experiencing SAX: a novel symbolic representation of time series”. Data Mining and Knowledge Discovery, 15(2), 107-144 (2007).
| dataset | sax error | sax time | eucl error | eucl time |
--------------------------------------------------------------------------
| SyntheticControl| 0.03| 3.48727| 0.12| 1.03348|
| GunPoint| 0.20667| 1.88121| 0.08667| 0.73752|
| FaceFour| 0.14773| 2.17096| 0.21591| 0.90353|
| Lightning2| 0.19672| 3.92335| 0.2459| 1.71236|
| Lightning7| 0.46575| 2.47131| 0.42466| 1.06485|
| ECG200| 0.12| 1.26332| 0.12| 0.50789|
| Plane| 0.04762| 1.89187| 0.0381| 0.74948|
| Car| 0.35| 3.61388| 0.26667| 1.54936|
| Beef| 0.53333| 1.61011| 0.33333| 0.65551|
| Coffee| 0.46429| 0.87851| 0.0| 0.37575|
| OliveOil| 0.83333| 1.79912| 0.13333| 0.77401|
--------------------------------------------------------------------------
# Author: Gilles Vandewiele
# License: BSD 3 clause
import warnings
import time
import numpy
import matplotlib.pyplot as plt
from scipy.stats import norm
from tslearn.datasets import UCR_UEA_datasets
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.neighbors import KNeighborsTimeSeriesClassifier
from sklearn.base import clone
from sklearn.metrics import pairwise_distances, accuracy_score
from sklearn.neighbors import KNeighborsClassifier
warnings.filterwarnings('ignore')
def print_table(accuracies, times):
"""Utility function to pretty print the obtained accuracies"""
header_str = '|'
header_str += '{:^20}|'.format('dataset')
columns = ['sax error', 'sax time', 'eucl error', 'eucl time']
for col in columns:
header_str += '{:^12}|'.format(col)
print(header_str)
print('-'*(len(columns) * 13 + 22))
for dataset in accuracies:
acc_sax, acc_euclidean = accuracies[dataset]
time_sax, time_euclidean = times[dataset]
sax_error = numpy.around(1 - acc_sax, 5)
eucl_error = numpy.around(1 - acc_euclidean, 5)
time_sax = numpy.around(time_sax, 5)
time_euclidean = numpy.around(time_euclidean, 5)
s = '|'
s += '{:>20}|'.format(dataset)
s += '{:>12}|'.format(sax_error)
s += '{:>12}|'.format(time_sax)
s += '{:>12}|'.format(eucl_error)
s += '{:>12}|'.format(time_euclidean)
print(s.strip())
print('-'*(len(columns) * 13 + 22))
# Set seed
numpy.random.seed(0)
# Defining dataset and the number of segments
data_loader = UCR_UEA_datasets()
datasets = [
('SyntheticControl', 16),
('GunPoint', 64),
('FaceFour', 128),
('Lightning2', 256),
('Lightning7', 128),
('ECG200', 32),
('Plane', 64),
('Car', 256),
('Beef', 128),
('Coffee', 128),
('OliveOil', 256)
]
# We will compare the accuracies & execution times of 1-NN using:
# (i) MINDIST on SAX representations, and
# (ii) euclidean distance on raw values
knn_sax = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric='sax')
knn_eucl = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric='euclidean')
accuracies = {}
times = {}
for dataset, w in datasets:
X_train, y_train, X_test, y_test = data_loader.load_dataset(dataset)
ts_scaler = TimeSeriesScalerMeanVariance()
X_train = ts_scaler.fit_transform(X_train)
X_test = ts_scaler.fit_transform(X_test)
# Fit 1-NN using SAX representation & MINDIST
metric_params = {'n_segments': w, 'alphabet_size_avg': 10}
knn_sax = clone(knn_sax).set_params(metric_params=metric_params)
start = time.time()
knn_sax.fit(X_train, y_train)
acc_sax = accuracy_score(y_test, knn_sax.predict(X_test))
time_sax = time.time() - start
# Fit 1-NN using euclidean distance on raw values
start = time.time()
knn_eucl.fit(X_train, y_train)
acc_euclidean = accuracy_score(y_test, knn_eucl.predict(X_test))
time_euclidean = time.time() - start
accuracies[dataset] = (acc_sax, acc_euclidean)
times[dataset] = (time_sax, time_euclidean)
print_table(accuracies, times)
Total running time of the script: (0 minutes 58.611 seconds)
Clustering and Barycenters¶
KShape¶
This example uses the KShape clustering method [1] that is based on cross-correlation to cluster time series.
[1] J. Paparrizos & L. Gravano. k-Shape: Efficient and Accurate Clustering of Time Series. SIGMOD 2015. pp. 1855-1870.
0.008 --> 0.006 --> 0.004 --> 0.004 --> 0.004 --> 0.003 --> 0.003 --> 0.003 --> 0.003 --> 0.002 --> 0.002 --> 0.002 --> 0.002 --> 0.002 --> 0.002 --> 0.002 --> 0.002 --> 0.002 --> 0.002 --> 0.002 --> 0.002 --> 0.002 --> 0.002 --> 0.002 --> 0.002 --> 0.002 --> 0.002 -->
# Author: Romain Tavenard
# License: BSD 3 clause
import numpy
import matplotlib.pyplot as plt
from tslearn.clustering import KShape
from tslearn.datasets import CachedDatasets
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
seed = 0
numpy.random.seed(seed)
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
# Keep first 3 classes and 50 first time series
X_train = X_train[y_train < 4]
X_train = X_train[:50]
numpy.random.shuffle(X_train)
# For this method to operate properly, prior scaling is required
X_train = TimeSeriesScalerMeanVariance().fit_transform(X_train)
sz = X_train.shape[1]
# kShape clustering
ks = KShape(n_clusters=3, verbose=True, random_state=seed)
y_pred = ks.fit_predict(X_train)
plt.figure()
for yi in range(3):
plt.subplot(3, 1, 1 + yi)
for xx in X_train[y_pred == yi]:
plt.plot(xx.ravel(), "k-", alpha=.2)
plt.plot(ks.cluster_centers_[yi].ravel(), "r-")
plt.xlim(0, sz)
plt.ylim(-4, 4)
plt.title("Cluster %d" % (yi + 1))
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 26.418 seconds)
Kernel k-means¶
This example uses the Global Alignment Kernel (GAK, [1]) at the core of a kernel \(k\)-means algorithm [2] to perform time series clustering.
Note that, contrary to \(k\)-means, a centroid cannot be computed when using kernel \(k\)-means. However, one can still report cluster assignments, which is what is provided here: each subfigure represents the set of time series from the training set that were assigned to the considered cluster.
[1] M. Cuturi, “Fast global alignment kernels,” ICML 2011.
[2] I. S. Dhillon, Y. Guan, B. Kulis. Kernel k-means, Spectral Clustering and Normalized Cuts. KDD 2004.
Init 1
80.948 --> 70.106 --> 66.011 --> 63.422 --> 59.720 --> 58.005 --> 57.563 --> 57.563 -->
Init 2
80.519 --> 70.023 --> 66.522 --> 65.914 --> 65.914 -->
Init 3
80.374 --> 67.064 --> 62.859 --> 62.220 --> 59.391 --> 59.391 -->
Init 4
77.700 --> 69.585 --> 67.474 --> 67.022 --> 66.104 --> 65.075 --> 63.516 --> 62.861 --> 62.410 --> 61.166 --> 59.759 --> 59.759 -->
Init 5
79.246 --> 66.190 --> 63.040 --> 63.040 -->
Init 6
78.590 --> 68.315 --> 66.321 --> 65.633 --> 63.898 --> 63.898 -->
Init 7
75.299 --> 63.203 --> 59.963 --> 57.563 --> 57.563 -->
Init 8
76.876 --> 67.042 --> 66.764 --> 66.764 -->
Init 9
81.317 --> 69.313 --> 63.927 --> 61.124 --> 59.391 --> 59.391 -->
Init 10
79.317 --> 72.390 --> 70.197 --> 70.218 --> 70.218 -->
Init 11
78.202 --> 66.888 --> 60.961 --> 57.946 --> 57.387 --> 57.387 -->
Init 12
78.194 --> 67.992 --> 65.263 --> 63.436 --> 61.177 --> 57.799 --> 57.387 --> 57.387 -->
Init 13
77.553 --> 64.028 --> 64.008 --> 64.008 -->
Init 14
77.853 --> 62.815 --> 57.799 --> 57.387 --> 57.387 -->
Init 15
81.746 --> 67.617 --> 63.332 --> 62.827 --> 62.234 --> 58.470 --> 57.387 --> 57.387 -->
Init 16
78.934 --> 69.153 --> 65.466 --> 63.619 --> 63.619 -->
Init 17
78.303 --> 65.546 --> 63.619 --> 63.619 -->
Init 18
77.760 --> 67.020 --> 66.729 --> 65.900 --> 65.900 -->
Init 19
79.795 --> 70.429 --> 69.098 --> 69.098 -->
Init 20
79.419 --> 67.908 --> 65.330 --> 63.388 --> 61.019 --> 58.186 --> 57.387 --> 57.387 -->
# Author: Romain Tavenard
# License: BSD 3 clause
import numpy
import matplotlib.pyplot as plt
from tslearn.clustering import KernelKMeans
from tslearn.datasets import CachedDatasets
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
seed = 0
numpy.random.seed(seed)
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
# Keep first 3 classes
X_train = X_train[y_train < 4]
numpy.random.shuffle(X_train)
# Keep only 50 time series
X_train = TimeSeriesScalerMeanVariance().fit_transform(X_train[:50])
sz = X_train.shape[1]
gak_km = KernelKMeans(n_clusters=3,
kernel="gak",
kernel_params={"sigma": "auto"},
n_init=20,
verbose=True,
random_state=seed)
y_pred = gak_km.fit_predict(X_train)
plt.figure()
for yi in range(3):
plt.subplot(3, 1, 1 + yi)
for xx in X_train[y_pred == yi]:
plt.plot(xx.ravel(), "k-", alpha=.2)
plt.xlim(0, sz)
plt.ylim(-4, 4)
plt.title("Cluster %d" % (yi + 1))
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 7.585 seconds)
Barycenters¶
This example shows three methods to compute barycenters of time series.
For an overview of the available methods, see the tslearn.barycenters module.
tslearn provides three methods for calculating barycenters for a given set of time series:
Euclidean barycenter is simply the arithmetic mean for each individual point in time, minimizing the sum of squared Euclidean distances to the series. As can be seen below, it is very different from the DTW-based methods and may often be inappropriate. However, it is the fastest of the methods shown.
DTW Barycenter Averaging (DBA) is an iteratively refined barycenter, starting out with a (potentially) bad candidate and improving it until convergence criteria are met. The optimization can be accomplished with (a) expectation-maximization [1] and (b) stochastic subgradient descent [2]. Empirically, the latter “is [often] more stable and finds better solutions in shorter time” [2].
Soft-DTW barycenter uses a differentiable loss function to iteratively find a barycenter [3]. The method itself and the parameter \(\gamma=1.0\) are described in more detail in the section on DTW. There is also a dedicated example available.
[1] F. Petitjean, A. Ketterlin & P. Gancarski. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition, Elsevier, 2011, Vol. 44, Num. 3, pp. 678-693.
[2] D. Schultz & B. Jain. Nonsmooth Analysis and Subgradient Methods for Averaging in Dynamic Time Warping Spaces. Pattern Recognition, 74, 340-358.
[3] M. Cuturi & M. Blondel. Soft-DTW: a Differentiable Loss Function for Time-Series. ICML 2017.
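The three functions share the same basic calling convention. A minimal sketch on a toy dataset (independent of the Trace data used in the script below):
from tslearn.utils import to_time_series_dataset
from tslearn.barycenters import euclidean_barycenter, \
    dtw_barycenter_averaging, softdtw_barycenter

# Three short, slightly shifted versions of the same bump
X = to_time_series_dataset([[0., 1., 4., 1., 0., 0.],
                            [0., 0., 1., 4., 1., 0.],
                            [0., 0., 0., 1., 4., 1.]])

print(euclidean_barycenter(X).ravel())                       # arithmetic mean per time step
print(dtw_barycenter_averaging(X, max_iter=10).ravel())      # DBA
print(softdtw_barycenter(X, gamma=1., max_iter=10).ravel())  # Soft-DTW barycenter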
# Author: Romain Tavenard, Felix Divo
# License: BSD 3 clause
import numpy
import matplotlib.pyplot as plt
from tslearn.barycenters import \
euclidean_barycenter, \
dtw_barycenter_averaging, \
dtw_barycenter_averaging_subgradient, \
softdtw_barycenter
from tslearn.datasets import CachedDatasets
# fetch the example data set
numpy.random.seed(0)
X_train, y_train, _, _ = CachedDatasets().load_dataset("Trace")
X = X_train[y_train == 2]
length_of_sequence = X.shape[1]
def plot_helper(barycenter):
# plot all points of the data set
for series in X:
plt.plot(series.ravel(), "k-", alpha=.2)
# plot the given barycenter of them
plt.plot(barycenter.ravel(), "r-", linewidth=2)
# plot the four variants with the same number of iterations and a tolerance of
# 1e-3 where applicable
ax1 = plt.subplot(4, 1, 1)
plt.title("Euclidean barycenter")
plot_helper(euclidean_barycenter(X))
plt.subplot(4, 1, 2, sharex=ax1)
plt.title("DBA (vectorized version of Petitjean's EM)")
plot_helper(dtw_barycenter_averaging(X, max_iter=50, tol=1e-3))
plt.subplot(4, 1, 3, sharex=ax1)
plt.title("DBA (subgradient descent approach)")
plot_helper(dtw_barycenter_averaging_subgradient(X, max_iter=50, tol=1e-3))
plt.subplot(4, 1, 4, sharex=ax1)
plt.title("Soft-DTW barycenter ($\gamma$=1.0)")
plot_helper(softdtw_barycenter(X, gamma=1., max_iter=50, tol=1e-3))
# clip the axes for better readability
ax1.set_xlim([0, length_of_sequence])
# show the plot(s)
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 11.128 seconds)
Soft-DTW weighted barycenters¶
This example presents the weighted Soft-DTW time series barycenter method.
Soft-DTW [1] is a differentiable loss function for Dynamic Time Warping, allowing for the use of gradient-based algorithms. The barycenter corresponds to the time series that minimizes the sum of the distances between that time series and all the time series from a dataset. It is thus an optimization problem, and having a differentiable loss function makes finding the solution much easier.
In this example, we consider four time series \(X_0, X_1, X_2\) and \(X_3\) from four different classes in the Trace dataset. We compute the barycenters for different sets of weights and plot them. The closer to a time series the barycenter is, the higher the weight for this time series is.
[1] M. Cuturi and M. Blondel, “Soft-DTW: a Differentiable Loss Function for Time-Series”. International Conference on Machine Learning, 2017.
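As a minimal sketch of the effect of the weights (on two toy series, separate from the Trace-based code below), shifting the weight from one series to the other moves the barycenter accordingly:
import numpy
from tslearn.utils import to_time_series_dataset
from tslearn.barycenters import softdtw_barycenter

X = to_time_series_dataset([[0., 0., 1., 2., 1., 0.],
                            [0., 2., 4., 2., 0., 0.]])
for w in ([1., 0.], [.5, .5], [0., 1.]):
    bary = softdtw_barycenter(X, gamma=1., weights=numpy.array(w))
    print(w, numpy.round(bary.ravel(), 2))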
# Author: Romain Tavenard
# License: BSD 3 clause
import numpy
import matplotlib.pyplot as plt
import matplotlib.colors
from tslearn.preprocessing import TimeSeriesScalerMinMax
from tslearn.barycenters import softdtw_barycenter
from tslearn.datasets import CachedDatasets
def row_col(position, n_cols=5):
idx_row = (position - 1) // n_cols
idx_col = position - n_cols * idx_row - 1
return idx_row, idx_col
def get_color(weights):
baselines = numpy.zeros((4, 3))
weights = numpy.array(weights).reshape(1, 4)
for i, c in enumerate(["r", "g", "b", "y"]):
baselines[i] = matplotlib.colors.ColorConverter().to_rgb(c)
return numpy.dot(weights, baselines).ravel()
numpy.random.seed(0)
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
X_out = numpy.empty((4, X_train.shape[1], X_train.shape[2]))
plt.figure()
for i in range(4):
X_out[i] = X_train[y_train == (i + 1)][0]
X_out = TimeSeriesScalerMinMax().fit_transform(X_out)
for i, pos in enumerate([1, 5, 21, 25]):
plt.subplot(5, 5, pos)
w = [0.] * 4
w[i] = 1.
plt.plot(X_out[i].ravel(),
color=matplotlib.colors.rgb2hex(get_color(w)),
linewidth=2)
plt.text(X_out[i].shape[0], 0., "$X_%d$" % i,
horizontalalignment="right",
verticalalignment="baseline",
fontsize=24)
plt.xticks([])
plt.yticks([])
for pos in range(2, 25):
if pos in [1, 5, 21, 25]:
continue
plt.subplot(5, 5, pos)
idxr, idxc = row_col(pos, 5)
w = numpy.array([0.] * 4)
w[0] = (4 - idxr) * (4 - idxc) / 16
w[1] = (4 - idxr) * idxc / 16
w[2] = idxr * (4 - idxc) / 16
w[3] = idxr * idxc / 16
plt.plot(softdtw_barycenter(X=X_out, weights=w).ravel(),
color=matplotlib.colors.rgb2hex(get_color(w)),
linewidth=2)
plt.xticks([])
plt.yticks([])
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 8.508 seconds)
k-means¶
This example uses \(k\)-means clustering for time series. Three variants of the algorithm are available: standard Euclidean \(k\)-means, DBA-\(k\)-means (for DTW Barycenter Averaging [1]) and Soft-DTW \(k\)-means [2].
In the figure below, each row corresponds to the result of a different clustering. In a row, each sub-figure corresponds to a cluster. It represents the set of time series from the training set that were assigned to the considered cluster (in black) as well as the barycenter of the cluster (in red).
A note on pre-processing¶
In this example, time series are preprocessed using TimeSeriesScalerMeanVariance. This scaler is such that each output time series has zero mean and unit variance. The assumption here is that the range of a given time series is uninformative and one only wants to compare shapes in an amplitude-invariant manner (when time series are multivariate, this also rescales all modalities such that there will not be a single modality responsible for a large part of the variance). This means that one cannot scale barycenters back to data range because each time series is scaled independently and there is hence no such thing as an overall data range.
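To make this concrete, here is a small sketch (on synthetic random walks rather than the Trace data used below) checking that each series is rescaled independently to zero mean and unit variance:
import numpy
from tslearn.generators import random_walks
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

numpy.random.seed(0)
X = random_walks(n_ts=3, sz=32, d=1)
X_scaled = TimeSeriesScalerMeanVariance().fit_transform(X)

print(X_scaled.mean(axis=1).ravel())  # per-series means, all close to 0
print(X_scaled.std(axis=1).ravel())   # per-series standard deviations, all close to 1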
[1] F. Petitjean, A. Ketterlin & P. Gancarski. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition, Elsevier, 2011, Vol. 44, Num. 3, pp. 678-693.
[2] M. Cuturi, M. Blondel “Soft-DTW: a Differentiable Loss Function for Time-Series,” ICML 2017.
Euclidean k-means
15.795 --> 7.716 --> 7.716 -->
DBA k-means
Init 1
0.637 --> 0.458 --> 0.458 -->
Init 2
0.826 --> 0.525 --> 0.477 --> 0.472 --> 0.472 -->
Soft-DTW k-means
0.472 --> 0.144 --> 0.142 --> 0.143 --> 0.142 --> 0.143 --> 0.142 --> 0.143 --> 0.142 --> 0.142 --> 0.142 --> 0.142 --> 0.142 --> 0.142 --> 0.142 --> 0.142 --> 0.142 --> 0.142 --> 0.142 --> 0.142 --> 0.142 -->
# Author: Romain Tavenard
# License: BSD 3 clause
import numpy
import matplotlib.pyplot as plt
from tslearn.clustering import TimeSeriesKMeans
from tslearn.datasets import CachedDatasets
from tslearn.preprocessing import TimeSeriesScalerMeanVariance, \
TimeSeriesResampler
seed = 0
numpy.random.seed(seed)
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
X_train = X_train[y_train < 4] # Keep first 3 classes
numpy.random.shuffle(X_train)
# Keep only 50 time series
X_train = TimeSeriesScalerMeanVariance().fit_transform(X_train[:50])
# Make time series shorter
X_train = TimeSeriesResampler(sz=40).fit_transform(X_train)
sz = X_train.shape[1]
# Euclidean k-means
print("Euclidean k-means")
km = TimeSeriesKMeans(n_clusters=3, verbose=True, random_state=seed)
y_pred = km.fit_predict(X_train)
plt.figure()
for yi in range(3):
plt.subplot(3, 3, yi + 1)
for xx in X_train[y_pred == yi]:
plt.plot(xx.ravel(), "k-", alpha=.2)
plt.plot(km.cluster_centers_[yi].ravel(), "r-")
plt.xlim(0, sz)
plt.ylim(-4, 4)
plt.text(0.55, 0.85,'Cluster %d' % (yi + 1),
transform=plt.gca().transAxes)
if yi == 1:
plt.title("Euclidean $k$-means")
# DBA-k-means
print("DBA k-means")
dba_km = TimeSeriesKMeans(n_clusters=3,
n_init=2,
metric="dtw",
verbose=True,
max_iter_barycenter=10,
random_state=seed)
y_pred = dba_km.fit_predict(X_train)
for yi in range(3):
plt.subplot(3, 3, 4 + yi)
for xx in X_train[y_pred == yi]:
plt.plot(xx.ravel(), "k-", alpha=.2)
plt.plot(dba_km.cluster_centers_[yi].ravel(), "r-")
plt.xlim(0, sz)
plt.ylim(-4, 4)
plt.text(0.55, 0.85,'Cluster %d' % (yi + 1),
transform=plt.gca().transAxes)
if yi == 1:
plt.title("DBA $k$-means")
# Soft-DTW-k-means
print("Soft-DTW k-means")
sdtw_km = TimeSeriesKMeans(n_clusters=3,
metric="softdtw",
metric_params={"gamma": .01},
verbose=True,
random_state=seed)
y_pred = sdtw_km.fit_predict(X_train)
for yi in range(3):
plt.subplot(3, 3, 7 + yi)
for xx in X_train[y_pred == yi]:
plt.plot(xx.ravel(), "k-", alpha=.2)
plt.plot(sdtw_km.cluster_centers_[yi].ravel(), "r-")
plt.xlim(0, sz)
plt.ylim(-4, 4)
plt.text(0.55, 0.85,'Cluster %d' % (yi + 1),
transform=plt.gca().transAxes)
if yi == 1:
plt.title("Soft-DTW $k$-means")
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 29.845 seconds)
Classification¶
SVM and GAK¶
This example illustrates the use of the global alignment kernel (GAK) for support vector classification.
This metric is defined in the tslearn.metrics module and explained in detail in [1].
In this example, a TimeSeriesSVC model that uses GAK as its kernel is fit, and the support vectors for each class are reported.
[1] M. Cuturi, “Fast global alignment kernels,” ICML 2011.
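The kernel can also be evaluated directly from tslearn.metrics. A minimal sketch (assuming the gak and cdist_gak helpers, on toy random walks rather than the Trace data used below):
import numpy
from tslearn.generators import random_walks
from tslearn.metrics import gak, cdist_gak

numpy.random.seed(0)
X = random_walks(n_ts=4, sz=32, d=1)

print(gak(X[0], X[1], sigma=1.))  # kernel value between two series
K = cdist_gak(X, sigma=1.)        # Gram matrix between all pairs of series
print(K.shape)                    # (4, 4)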
Correct classification rate: 1.0
# Author: Romain Tavenard
# License: BSD 3 clause
import numpy
import matplotlib.pyplot as plt
from tslearn.datasets import CachedDatasets
from tslearn.preprocessing import TimeSeriesScalerMinMax
from tslearn.svm import TimeSeriesSVC
numpy.random.seed(0)
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
X_train = TimeSeriesScalerMinMax().fit_transform(X_train)
X_test = TimeSeriesScalerMinMax().fit_transform(X_test)
clf = TimeSeriesSVC(kernel="gak", gamma=.1)
clf.fit(X_train, y_train)
print("Correct classification rate:", clf.score(X_test, y_test))
n_classes = len(set(y_train))
plt.figure()
support_vectors = clf.support_vectors_
for i, cl in enumerate(set(y_train)):
plt.subplot(n_classes, 1, i + 1)
plt.title("Support vectors for class %d" % cl)
for ts in support_vectors[i]:
plt.plot(ts.ravel())
plt.tight_layout()
plt.show()
Total running time of the script: (1 minutes 8.057 seconds)
Learning Shapelets¶
This example illustrates how the “Learning Shapelets” method can quickly find a set of shapelets that results in excellent predictive performance when used for a shapelet transform.
More information on the method can be found at: http://fs.ismll.de/publicspace/LearningShapelets/.
Correct classification rate: 1.0
# Author: Romain Tavenard
# License: BSD 3 clause
import numpy
from sklearn.metrics import accuracy_score
import tensorflow as tf
import matplotlib.pyplot as plt
from tslearn.datasets import CachedDatasets
from tslearn.preprocessing import TimeSeriesScalerMinMax
from tslearn.shapelets import LearningShapelets, \
grabocka_params_to_shapelet_size_dict
from tslearn.utils import ts_size
# Set seed for determinism
numpy.random.seed(0)
# Load the Trace dataset
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
# Normalize each of the timeseries in the Trace dataset
X_train = TimeSeriesScalerMinMax().fit_transform(X_train)
X_test = TimeSeriesScalerMinMax().fit_transform(X_test)
# Get statistics of the dataset
n_ts, ts_sz = X_train.shape[:2]
n_classes = len(set(y_train))
# Set the number of shapelets per size as done in the original paper
shapelet_sizes = grabocka_params_to_shapelet_size_dict(n_ts=n_ts,
ts_sz=ts_sz,
n_classes=n_classes,
l=0.1,
r=1)
# Define the model using parameters provided by the authors (except that we
# use fewer iterations here)
shp_clf = LearningShapelets(n_shapelets_per_size=shapelet_sizes,
optimizer=tf.optimizers.Adam(.01),
batch_size=16,
weight_regularizer=.01,
max_iter=200,
random_state=42,
verbose=0)
shp_clf.fit(X_train, y_train)
# Make predictions and calculate accuracy score
pred_labels = shp_clf.predict(X_test)
print("Correct classification rate:", accuracy_score(y_test, pred_labels))
# Plot the different discovered shapelets
plt.figure()
for i, sz in enumerate(shapelet_sizes.keys()):
plt.subplot(len(shapelet_sizes), 1, i + 1)
plt.title("%d shapelets of size %d" % (shapelet_sizes[sz], sz))
for shp in shp_clf.shapelets_:
if ts_size(shp) == sz:
plt.plot(shp.ravel())
plt.xlim([0, max(shapelet_sizes.keys()) - 1])
plt.tight_layout()
plt.show()
# The loss history is accessible via the `model_` that is a keras model
plt.figure()
plt.plot(numpy.arange(1, shp_clf.n_iter_ + 1), shp_clf.history_["loss"])
plt.title("Evolution of cross-entropy loss during training")
plt.xlabel("Epochs")
plt.show()
Total running time of the script: (1 minutes 11.990 seconds)
Early Classification¶
This example presents the concept of early classification.
Early classifiers are implemented in the tslearn.early_classification module; in this example we use the method from [1].
[1] A. Dachraoui, A. Bondu & A. Cornuejols. Early classification of time series as a non myopic sequential decision making problem. ECML/PKDD 2015
# Author: Romain Tavenard
# License: BSD 3 clause
# sphinx_gallery_thumbnail_number = 2
import numpy
import matplotlib.pyplot as plt
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.early_classification import NonMyopicEarlyClassifier
from tslearn.datasets import UCR_UEA_datasets
def plot_partial(time_series, t, y_true=0, y_pred=0, color="k"):
plt.plot(time_series[:t+1].ravel(), color=color, linewidth=1.5)
plt.plot(numpy.arange(t+1, time_series.shape[0]),
time_series[t+1:].ravel(),
linestyle="dashed", color=color, linewidth=1.5)
plt.axvline(x=t, color=color, linewidth=1.5)
plt.text(x=t - 20, y=time_series.max() - .25, s="Prediction time")
plt.title(
"Sample of class {} predicted as class {}".format(y_true, y_pred)
)
plt.xlim(0, time_series.shape[0] - 1)
Data loading and visualization¶
numpy.random.seed(0)
X_train, y_train, X_test, y_test = UCR_UEA_datasets().load_dataset("ECG200")
# Scale time series
X_train = TimeSeriesScalerMeanVariance().fit_transform(X_train)
X_test = TimeSeriesScalerMeanVariance().fit_transform(X_test)
size = X_train.shape[1]
n_classes = len(set(y_train))
plt.figure()
for i, cl in enumerate(set(y_train)):
plt.subplot(n_classes, 1, i + 1)
for ts in X_train[y_train == cl]:
plt.plot(ts.ravel(), color="orange" if cl > 0 else "blue", alpha=.3)
plt.xlim(0, size - 1)
plt.suptitle("Training time series")
plt.show()
Model fitting¶
As observed in the following figure, the optimal classification time as estimated by NonMyopicEarlyClassifier is data-dependent.
early_clf = NonMyopicEarlyClassifier(n_clusters=3,
cost_time_parameter=1e-3,
lamb=1e2,
random_state=0)
early_clf.fit(X_train, y_train)
preds, times = early_clf.predict_class_and_earliness(X_test)
plt.figure()
plt.subplot(2, 1, 1)
ts_idx = 0
t = times[ts_idx]
plot_partial(X_test[ts_idx], t, y_test[ts_idx], preds[ts_idx], color="orange")
plt.subplot(2, 1, 2)
ts_idx = 9
t = times[ts_idx]
plot_partial(X_test[ts_idx], t, y_test[ts_idx], preds[ts_idx], color="blue")
plt.tight_layout()
plt.show()
Earliness-Accuracy trade-off¶
The trade-off between earliness and accuracy is controlled via cost_time_parameter.
plt.figure()
hatches = ["///", "\\\\\\", "*"]
for i, cost_t in enumerate([1e-4, 1e-3, 1e-2]):
early_clf.set_params(cost_time_parameter=cost_t)
early_clf.fit(X_train, y_train)
preds, times = early_clf.predict_class_and_earliness(X_test)
plt.hist(times,
alpha=.5, hatch=hatches[i],
density=True,
label="$\\alpha={}$".format(cost_t),
bins=numpy.arange(0, size, 5))
plt.legend(loc="upper right")
plt.xlim(0, size - 1)
plt.xlabel("Prediction times")
plt.title("Impact of cost_time_parameter ($\\alpha$)")
plt.show()
Total running time of the script: (1 minutes 15.858 seconds)
Aligning discovered shapelets with timeseries¶
This example illustrates the use of the “Learning Shapelets” method in order to learn a collection of shapelets that linearly separates the timeseries. In this example, we will extract a single shapelet in order to distinguish between two classes of the “Trace” dataset. Afterwards, we show how our time series can be transformed to distances by aligning the shapelets along each of the time series. This alignment is performed by shifting the smaller shapelet across the longer time series and taking the minimal pointwise distance.
More information on the method can be found at: http://fs.ismll.de/publicspace/LearningShapelets/.
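Concretely, the aligned distance described above can be written as follows (our notation; tslearn may additionally normalize by the shapelet length):

d(\mathbf{x}, \mathbf{s}) = \min_{i} \left\| (x_i, \ldots, x_{i+L-1}) - \mathbf{s} \right\|_2

where L is the shapelet length; the minimizing index i is the alignment position that the locate method returns later in this example.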
# Author: Romain Tavenard
# License: BSD 3 clause
import numpy
import matplotlib.pyplot as plt
from tslearn.datasets import CachedDatasets
from tslearn.preprocessing import TimeSeriesScalerMinMax
from tslearn.shapelets import LearningShapelets, \
grabocka_params_to_shapelet_size_dict
from tensorflow.keras.optimizers import Adam
# Set a seed to ensure determinism
numpy.random.seed(42)
# Load the Trace dataset
X_train, y_train, _, _ = CachedDatasets().load_dataset("Trace")
# Filter out classes 2 and 4
mask = numpy.isin(y_train, [1, 3])
X_train = X_train[mask]
y_train = y_train[mask]
# Normalize the time series
X_train = TimeSeriesScalerMinMax().fit_transform(X_train)
# Get statistics of the dataset
n_ts, ts_sz = X_train.shape[:2]
n_classes = len(set(y_train))
# We will extract 1 shapelet and align it with a time series
shapelet_sizes = {20: 1}
# Define the model and fit it using the training data
shp_clf = LearningShapelets(n_shapelets_per_size=shapelet_sizes,
weight_regularizer=0.001,
optimizer=Adam(learning_rate=0.01),
max_iter=250,
verbose=0,
scale=False,
random_state=42)
shp_clf.fit(X_train, y_train)
# Get the number of extracted shapelets, the (minimal) distances from
# each of the timeseries to each of the shapelets, and the corresponding
# locations (index) where the minimal distance was found
n_shapelets = sum(shapelet_sizes.values())
distances = shp_clf.transform(X_train)
predicted_locations = shp_clf.locate(X_train)
f, ax = plt.subplots(2, 1, sharex=True)
# Plot the shapelet and align it on the best matched time series. The optimizer
# will often enlarge the shapelet to create a larger gap between the distances
# of both classes. We therefore normalize the shapelet again before plotting.
test_ts_id = numpy.argmin(numpy.sum(distances, axis=1))
shap = shp_clf.shapelets_[0]
shap = TimeSeriesScalerMinMax().fit_transform(shap.reshape(1, -1, 1)).flatten()
pos = predicted_locations[test_ts_id, 0]
ax[0].plot(X_train[test_ts_id].ravel())
ax[0].plot(numpy.arange(pos, pos + len(shap)), shap, linewidth=2)
ax[0].axvline(pos, color='k', linestyle='--', alpha=0.25)
ax[0].set_title("The aligned extracted shapelet")
# We calculate the distances from the shapelet to the timeseries ourselves.
distances = []
time_series = X_train[test_ts_id].ravel()
for i in range(len(time_series) - len(shap) + 1):
    distances.append(numpy.linalg.norm(time_series[i:i+len(shap)] - shap))
ax[1].plot(distances)
ax[1].axvline(numpy.argmin(distances), color='k', linestyle='--', alpha=0.25)
ax[1].set_title('The distances between the time series and the shapelet')
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 51.078 seconds)
Learning Shapelets: decision boundaries in 2D distance space¶
This example illustrates the use of the “Learning Shapelets” method in order to learn a collection of shapelets that linearly separates the time series. In this example, we will extract two shapelets which are then used to transform our input time series into a two-dimensional space, which is called the shapelet-transform space in the related literature. Moreover, we plot the decision boundaries of our classifier for each of the different classes.
More information on the method can be found at: http://fs.ismll.de/publicspace/LearningShapelets/.
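Since the classification layer of LearningShapelets is linear in this shapelet-transform space, the decision rule visualized at the end of this example amounts to (our notation, using the weights and biases retrieved from the model below):

\hat{y}(\mathbf{x}) = \arg\max_{k} \left( b_k + w_{1,k} \, d(\mathbf{x}, \mathbf{s}_1) + w_{2,k} \, d(\mathbf{x}, \mathbf{s}_2) \right)

where d(\mathbf{x}, \mathbf{s}_j) is the minimal distance between time series \mathbf{x} and shapelet \mathbf{s}_j.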
# Author: Gilles Vandewiele
# License: BSD 3 clause
import numpy
from matplotlib import cm
import matplotlib.pyplot as plt
from tslearn.datasets import CachedDatasets
from tslearn.preprocessing import TimeSeriesScalerMinMax
from tslearn.shapelets import LearningShapelets
from tensorflow.keras.optimizers import Adam
# Set a seed to ensure determinism
numpy.random.seed(42)
# Load the Trace dataset
X_train, y_train, _, _ = CachedDatasets().load_dataset("Trace")
# Normalize the time series
X_train = TimeSeriesScalerMinMax().fit_transform(X_train)
# Get statistics of the dataset
n_ts, ts_sz = X_train.shape[:2]
n_classes = len(set(y_train))
# We will extract 2 shapelets and align them with the time series
shapelet_sizes = {20: 2}
# Define the model and fit it using the training data
shp_clf = LearningShapelets(n_shapelets_per_size=shapelet_sizes,
weight_regularizer=0.0001,
optimizer=Adam(learning_rate=0.01),
max_iter=300,
verbose=0,
scale=False,
random_state=42)
shp_clf.fit(X_train, y_train)
# We will plot our distances in a 2D space
distances = shp_clf.transform(X_train).reshape((-1, 2))
weights, biases = shp_clf.get_weights('classification')
# Create a grid for our two shapelets on the left and distances on the right
viridis = cm.get_cmap('viridis', 4)
fig = plt.figure(constrained_layout=True)
gs = fig.add_gridspec(3, 9)
fig_ax1 = fig.add_subplot(gs[0, :2])
fig_ax2 = fig.add_subplot(gs[0, 2:4])
fig_ax3a = fig.add_subplot(gs[1, :2])
fig_ax3b = fig.add_subplot(gs[1, 2:4])
fig_ax3c = fig.add_subplot(gs[2, :2])
fig_ax3d = fig.add_subplot(gs[2, 2:4])
fig_ax4 = fig.add_subplot(gs[:, 4:])
# Plot our two shapelets on the left side
fig_ax1.plot(shp_clf.shapelets_[0])
fig_ax1.set_title(r'Shapelet $\mathbf{s}_1$')
fig_ax2.plot(shp_clf.shapelets_[1])
fig_ax2.set_title(r'Shapelet $\mathbf{s}_2$')
# Create the time series of each class
for i, subfig in enumerate([fig_ax3a, fig_ax3b, fig_ax3c, fig_ax3d]):
for k, ts in enumerate(X_train[y_train == i + 1]):
subfig.plot(ts.flatten(), c=viridis(i / 3), alpha=0.25)
subfig.set_title('Class {}'.format(i + 1))
fig.text(x=.15, y=.02, s='Input time series', fontsize=12)
# Create a scatter plot of the 2D distances for the time series of each class.
for i, y in enumerate(numpy.unique(y_train)):
fig_ax4.scatter(distances[y_train == y][:, 0],
distances[y_train == y][:, 1],
c=[viridis(i / 3)] * numpy.sum(y_train == y),
edgecolors='k',
label='Class {}'.format(y))
# Create a meshgrid of the decision boundaries
xmin = numpy.min(distances[:, 0]) - 0.1
xmax = numpy.max(distances[:, 0]) + 0.1
ymin = numpy.min(distances[:, 1]) - 0.1
ymax = numpy.max(distances[:, 1]) + 0.1
xx, yy = numpy.meshgrid(numpy.arange(xmin, xmax, (xmax - xmin)/200),
numpy.arange(ymin, ymax, (ymax - ymin)/200))
Z = []
for x, y in numpy.c_[xx.ravel(), yy.ravel()]:
Z.append(numpy.argmax([biases[i] + weights[0][i]*x + weights[1][i]*y
for i in range(4)]))
Z = numpy.array(Z).reshape(xx.shape)
cs = fig_ax4.contourf(xx, yy, Z / 3, cmap=viridis, alpha=0.25)
fig_ax4.legend()
fig_ax4.set_xlabel(r'$d(\mathbf{x}, \mathbf{s}_1)$')
fig_ax4.set_ylabel(r'$d(\mathbf{x}, \mathbf{s}_2)$')
fig_ax4.set_xlim((xmin, xmax))
fig_ax4.set_ylim((ymin, ymax))
fig_ax4.set_title('Distance transformed time series')
plt.show()
Total running time of the script: (1 minute 4.805 seconds)
Automatic differentiation¶
Soft-DTW loss for PyTorch neural network¶
The aim here is to use the Soft Dynamic Time Warping metric as a loss function of a PyTorch Neural Network for time series forecasting.
The torch-compatible implementation of the soft-DTW loss function is available from the tslearn.metrics module.
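Before building a full network, here is a minimal sketch of this loss used on its own (the tensors below are random and purely illustrative; note that the loss is not reduced to a single scalar by default, which is why the training loop later in this example calls .mean() on it):

import torch
from tslearn.metrics import SoftDTWLossPyTorch

soft_dtw_loss = SoftDTWLossPyTorch(gamma=0.1)
x = torch.randn(4, 32, 1)  # batch of 4 series, length 32, 1 feature
y = torch.randn(4, 32, 1)
values = soft_dtw_loss(x, y)  # one (possibly negative) value per pair in the batch
print(values.mean())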
# Authors: Yann Cabanes, Romain Tavenard
# License: BSD 3 clause
# sphinx_gallery_thumbnail_number = 2
"""Import the modules"""
import numpy as np
import matplotlib.pyplot as plt
from tslearn.datasets import CachedDatasets
from tslearn.metrics import SoftDTWLossPyTorch
import torch
from torch import nn
Load the dataset¶
Using the CachedDatasets utility from tslearn, we load the “Trace” time series dataset. The dimensions of the arrays storing the time series training and testing datasets are (100, 275, 1). We create a new dataset X_subset made of 50 random time series from classes indexed 1 to 3 (y_train < 4) in the training set: X_subset is of shape (50, 275, 1).
data_loader = CachedDatasets()
X_train, y_train, X_test, y_test = data_loader.load_dataset("Trace")
X_subset = X_train[y_train < 4]
np.random.shuffle(X_subset)
X_subset = X_subset[:50]
Multi-step ahead forecasting¶
In this section, our goal is to implement a single-hidden-layer perceptron for time series forecasting, trained to minimize the soft-DTW metric through its torch-compatible implementation. The code below implements a generic multi-layer perceptron class in torch; we will then rely on it to build a forecasting MLP with a soft-DTW loss.
# Note that the soft-DTW loss can take negative values due to the regularization parameter gamma.
# The normalized soft-DTW (also coined soft-DTW divergence) between time series x and y is defined as:
# Soft-DTW(x, y) - (Soft-DTW(x, x) + Soft-DTW(y, y)) / 2
# The normalized soft-DTW is always non-negative.
# However, computing it requires three soft-DTW evaluations, so it is roughly three times as costly as the plain soft-DTW.
class MultiLayerPerceptron(torch.nn.Module):
def __init__(self, layers, loss=None):
# At init, we define our layers
super(MultiLayerPerceptron, self).__init__()
self.layers = layers
if loss is None:
self.loss = torch.nn.MSELoss(reduction="none")
else:
self.loss = loss
self.optimizer = torch.optim.SGD(self.parameters(), lr=0.001)
def forward(self, X):
# The forward method informs about the forward pass: how one computes outputs of the network
# from the input and the parameters of the layers registered at init
if not isinstance(X, torch.Tensor):
X = torch.Tensor(X)
batch_size = X.size(0)
X_reshaped = torch.reshape(X, (batch_size, -1)) # Manipulations to deal with time series format
output = self.layers(X_reshaped)
return torch.reshape(output, (batch_size, -1, 1)) # Manipulations to deal with time series format
def fit(self, X, y, max_epochs=10):
# The fit method performs the actual optimization
X_torch = torch.Tensor(X)
y_torch = torch.Tensor(y)
for e in range(max_epochs):
self.optimizer.zero_grad()
# Forward pass
y_pred = self.forward(X_torch)
# Compute Loss
loss = self.loss(y_pred, y_torch).mean()
# Backward pass
loss.backward()
self.optimizer.step()
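As a side note, a minimal sketch (not part of the original example) of the normalized soft-DTW mentioned in the comments above can be built from three calls to the SoftDTWLossPyTorch loss imported earlier:

def soft_dtw_divergence(x, y, gamma=0.1):
    # D(x, y) = Soft-DTW(x, y) - (Soft-DTW(x, x) + Soft-DTW(y, y)) / 2, always non-negative
    sdtw = SoftDTWLossPyTorch(gamma=gamma)
    return sdtw(x, y) - 0.5 * (sdtw(x, x) + sdtw(y, y))

Such a divergence could be passed as the loss argument of the MultiLayerPerceptron class above, at roughly three times the cost of the plain soft-DTW loss. Depending on your tslearn version, the loss class itself may also expose a normalization option; check the tslearn.metrics API reference.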
Using MSE as a loss function¶
We use the class above to train a single-hidden-layer model with mean squared error (MSE) as the loss function to be optimized. The network is trained for 1000 epochs on a forecasting task: given the first 150 elements of a time series, predict the next 125.
model = MultiLayerPerceptron(
layers=nn.Sequential(
nn.Linear(in_features=150, out_features=256),
nn.ReLU(),
nn.Linear(in_features=256, out_features=125)
)
)
# Inputs are the first 150 time points of each series, targets are the remaining 125
model.fit(X_subset[:, :150], X_subset[:, 150:], max_epochs=1000)
ts_index = 50
y_pred = model(X_test[:, :150, 0]).detach().numpy()
plt.figure()
plt.title('Multi-step ahead forecasting using MSE')
plt.plot(X_test[ts_index].ravel())
plt.plot(np.arange(150, 275), y_pred[ts_index], 'r-')
Using Soft-DTW as a loss function¶
We now train the same single-hidden-layer architecture using soft-DTW as the criterion to be optimized, this time for 100 epochs, on the same forecasting task: given the first 150 elements of a time series, predict the next 125.
model = MultiLayerPerceptron(
layers=nn.Sequential(
nn.Linear(in_features=150, out_features=256),
nn.ReLU(),
nn.Linear(in_features=256, out_features=125)
),
loss=SoftDTWLossPyTorch(gamma=0.1)
)
model.fit(X_subset[:, :150], X_subset[:, 150:], max_epochs=100)
y_pred = model(X_test[:, :150, 0]).detach().numpy()
plt.figure()
plt.title('Multi-step ahead forecasting using Soft-DTW loss')
plt.plot(X_test[ts_index].ravel())
plt.plot(np.arange(150, 275), y_pred[ts_index], 'r-')
Total running time of the script: (0 minutes 10.831 seconds)
Miscellaneous¶
Model Persistence¶
Many tslearn models can be saved to disk and used for predictions at a later time. This can be particularly useful when a model takes a long time to train.
Available formats: hdf5, json, pickle
Save a model to disk:
model.to_<format>
Load a model from disk:
model.from_<format>
Basic usage
# Instantiate a model
model = ModelClass(<hyper-parameters>)
# Train the model
model.fit(X_train)
# Save the model to disk
model.to_hdf5('./trained_model.hdf5')
# Load model from disk
model.from_hdf5('./trained_model.hdf5')
# Make predictions
y = model.predict(X_test)
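The same pattern applies to the other formats. A short sketch, assuming the json and pickle counterparts that mirror the hdf5 methods shown above:

# Save / load using json
model.to_json('./trained_model.json')
model = ModelClass.from_json('./trained_model.json')
# Save / load using pickle
model.to_pickle('./trained_model.pkl')
model = ModelClass.from_pickle('./trained_model.pkl')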
Note
For the following models the training data are saved to disk and may result in a large model file if the training dataset is large: KNeighborsTimeSeries, KNeighborsTimeSeriesClassifier, and KernelKMeans.
# Example using KShape
import numpy
import matplotlib.pyplot as plt
from tslearn.clustering import KShape
from tslearn.datasets import CachedDatasets
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
seed = 0
numpy.random.seed(seed)
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
# Keep first 3 classes
X_train = X_train[y_train < 4]
numpy.random.shuffle(X_train)
# Keep only 50 time series
X_train = TimeSeriesScalerMeanVariance().fit_transform(X_train[:50])
sz = X_train.shape[1]
# Instantiate k-Shape model
ks = KShape(n_clusters=3, verbose=True, random_state=seed)
# Train
ks.fit(X_train)
# Save model
ks.to_hdf5('./ks_trained.hdf5')
# Load model
trained_ks = KShape.from_hdf5('./ks_trained.hdf5')
# Use loaded model to make predictions
y_pred = trained_ks.predict(X_train)
plt.figure()
for yi in range(3):
plt.subplot(3, 1, 1 + yi)
for xx in X_train[y_pred == yi]:
plt.plot(xx.ravel(), "k-", alpha=.2)
plt.plot(ks.cluster_centers_[yi].ravel(), "r-")
plt.xlim(0, sz)
plt.ylim(-4, 4)
plt.title("Cluster %d" % (yi + 1))
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 10.789 seconds)
PAA and SAX features¶
This example presents a comparison between PAA [1], SAX [2] and 1d-SAX [3] features.
PAA (Piecewise Aggregate Approximation) corresponds to a downsampling of the original time series and, in each segment (segments have fixed size), the mean value is retained.
SAX (Symbolic Aggregate approXimation) builds upon PAA by quantizing the mean value. Quantization boundaries are computed for all symbols to be equiprobable, under a standard normal distribution assumption.
Finally, 1d-SAX is an extension of SAX in which each segment is represented by an affine function (2 parameters per segment are hence quantized: slope and mean value).
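To make this more concrete, for a series x_1, ..., x_n split into M equal-sized segments and an alphabet of size a, the PAA value of segment j can be written as:

\bar{x}_j = \frac{M}{n} \sum_{i=(j-1)\,n/M + 1}^{j\,n/M} x_i

SAX then maps \bar{x}_j to symbol k whenever \beta_{k-1} \leq \bar{x}_j < \beta_k, where the breakpoints \beta_0 < \beta_1 < \dots < \beta_a (with \beta_0 = -\infty and \beta_a = +\infty) are chosen so that each interval has probability 1/a under the standard normal distribution; 1d-SAX applies the same kind of quantization to both the mean and the slope of each segment.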
[1] E. Keogh & M. Pazzani. Scaling up dynamic time warping for datamining applications. SIGKDD 2000, pp. 285–289.
[2] J. Lin, E. Keogh, L. Wei, et al. Experiencing SAX: a novel symbolic representation of time series. Data Mining and Knowledge Discovery, 2007. vol. 15(107)
[3] S. Malinowski, T. Guyet, R. Quiniou, R. Tavenard. 1d-SAX: a Novel Symbolic Representation for Time Series. IDA 2013.
# Author: Romain Tavenard
# License: BSD 3 clause
import numpy
import matplotlib.pyplot as plt
from tslearn.generators import random_walks
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.piecewise import PiecewiseAggregateApproximation
from tslearn.piecewise import SymbolicAggregateApproximation, \
OneD_SymbolicAggregateApproximation
numpy.random.seed(0)
# Generate a random walk time series
n_ts, sz, d = 1, 100, 1
dataset = random_walks(n_ts=n_ts, sz=sz, d=d)
scaler = TimeSeriesScalerMeanVariance(mu=0., std=1.) # Rescale time series
dataset = scaler.fit_transform(dataset)
# PAA transform (and inverse transform) of the data
n_paa_segments = 10
paa = PiecewiseAggregateApproximation(n_segments=n_paa_segments)
paa_dataset_inv = paa.inverse_transform(paa.fit_transform(dataset))
# SAX transform
n_sax_symbols = 8
sax = SymbolicAggregateApproximation(n_segments=n_paa_segments,
alphabet_size_avg=n_sax_symbols)
sax_dataset_inv = sax.inverse_transform(sax.fit_transform(dataset))
# 1d-SAX transform
n_sax_symbols_avg = 8
n_sax_symbols_slope = 8
one_d_sax = OneD_SymbolicAggregateApproximation(
n_segments=n_paa_segments,
alphabet_size_avg=n_sax_symbols_avg,
alphabet_size_slope=n_sax_symbols_slope)
transformed_data = one_d_sax.fit_transform(dataset)
one_d_sax_dataset_inv = one_d_sax.inverse_transform(transformed_data)
plt.figure()
plt.subplot(2, 2, 1) # First, raw time series
plt.plot(dataset[0].ravel(), "b-")
plt.title("Raw time series")
plt.subplot(2, 2, 2) # Second, PAA
plt.plot(dataset[0].ravel(), "b-", alpha=0.4)
plt.plot(paa_dataset_inv[0].ravel(), "b-")
plt.title("PAA")
plt.subplot(2, 2, 3) # Then SAX
plt.plot(dataset[0].ravel(), "b-", alpha=0.4)
plt.plot(sax_dataset_inv[0].ravel(), "b-")
plt.title("SAX, %d symbols" % n_sax_symbols)
plt.subplot(2, 2, 4) # Finally, 1d-SAX
plt.plot(dataset[0].ravel(), "b-", alpha=0.4)
plt.plot(one_d_sax_dataset_inv[0].ravel(), "b-")
plt.title("1d-SAX, %d symbols"
"(%dx%d)" % (n_sax_symbols_avg * n_sax_symbols_slope,
n_sax_symbols_avg,
n_sax_symbols_slope))
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 6.257 seconds)
Matrix Profile¶
This example presents a toy example of using Matrix Profile [1] for anomaly detection.
Matrix Profile transforms a time series into a sequence of 1-Nearest-Neighbor distances between its subsequences.
[1] C. M. Yeh, Y. Zhu, L. Ulanova, N. Begum et al. Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets. ICDM 2016.
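In symbols (a paraphrase of the description above), for a subsequence length m, the matrix profile at position i is

MP[i] = \min_{j} \; d\big(x_{i:i+m}, \, x_{j:j+m}\big)

where the minimum is taken over all positions j outside a small exclusion zone around i (to avoid trivial matches of a subsequence with itself) and d is the Euclidean distance between subsequences, computed on z-normalized subsequences when scale=True. High values of MP[i] flag subsequences that resemble no other part of the series, which is why the position of the largest value is highlighted as the anomaly below.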
# Author: Romain Tavenard
# License: BSD 3 clause
import numpy
import matplotlib.pyplot as plt
import matplotlib.transforms as mtransforms
from tslearn.matrix_profile import MatrixProfile
s_x = numpy.array(
[-0.790, -0.765, -0.734, -0.700, -0.668, -0.639, -0.612, -0.587, -0.564,
-0.544, -0.529, -0.518, -0.509, -0.502, -0.494, -0.488, -0.482, -0.475,
-0.472, -0.470, -0.465, -0.464, -0.461, -0.458, -0.459, -0.460, -0.459,
-0.458, -0.448, -0.431, -0.408, -0.375, -0.333, -0.277, -0.196, -0.090,
0.047, 0.220, 0.426, 0.671, 0.962, 1.300, 1.683, 2.096, 2.510, 2.895,
3.219, 3.463, 3.621, 3.700, 3.713, 3.677, 3.606, 3.510, 3.400, 3.280,
3.158, 3.038, 2.919, 2.801, 2.676, 2.538, 2.382, 2.206, 2.016, 1.821,
1.627, 1.439, 1.260, 1.085, 0.917, 0.758, 0.608, 0.476, 0.361, 0.259,
0.173, 0.096, 0.027, -0.032, -0.087, -0.137, -0.179, -0.221, -0.260,
-0.293, -0.328, -0.359, -0.385, -0.413, -0.437, -0.458, -0.480, -0.498,
-0.512, -0.526, -0.536, -0.544, -0.552, -0.556, -0.561, -0.565, -0.568,
-0.570, -0.570, -0.566, -0.560, -0.549, -0.532, -0.510, -0.480, -0.443,
-0.402, -0.357, -0.308, -0.256, -0.200, -0.139, -0.073, -0.003, 0.066,
0.131, 0.186, 0.229, 0.259, 0.276, 0.280, 0.272, 0.256, 0.234, 0.209,
0.186, 0.162, 0.139, 0.112, 0.081, 0.046, 0.008, -0.032, -0.071, -0.110,
-0.147, -0.180, -0.210, -0.235, -0.256, -0.275, -0.292, -0.307, -0.320,
-0.332, -0.344, -0.355, -0.363, -0.367, -0.364, -0.351, -0.330, -0.299,
-0.260, -0.217, -0.172, -0.128, -0.091, -0.060, -0.036, -0.022, -0.016,
-0.020, -0.037, -0.065, -0.104, -0.151, -0.201, -0.253, -0.302, -0.347,
-0.388, -0.426, -0.460, -0.491, -0.517, -0.539, -0.558, -0.575, -0.588,
-0.600, -0.606, -0.607, -0.604, -0.598, -0.589, -0.577, -0.558, -0.531,
-0.496, -0.454, -0.410, -0.364, -0.318, -0.276, -0.237, -0.203, -0.176,
-0.157, -0.145, -0.142, -0.145, -0.154, -0.168, -0.185, -0.206, -0.230,
-0.256, -0.286, -0.318, -0.351, -0.383, -0.414, -0.442, -0.467, -0.489,
-0.508, -0.523, -0.535, -0.544, -0.552, -0.557, -0.560, -0.560, -0.557,
-0.551, -0.542, -0.531, -0.519, -0.507, -0.494, -0.484, -0.476, -0.469,
-0.463, -0.456, -0.449, -0.442, -0.435, -0.431, -0.429, -0.430, -0.435,
-0.442, -0.452, -0.465, -0.479, -0.493, -0.506, -0.517, -0.526, -0.535,
-0.548, -0.567, -0.592, -0.622, -0.655, -0.690, -0.728, -0.764, -0.795,
-0.815, -0.823, -0.821]).reshape((-1, 1))
mp = MatrixProfile(subsequence_length=20, scale=False)
mp_series = mp.fit_transform([s_x])[0]
t_star = numpy.argmax(mp_series.ravel())
plt.figure()
ax = plt.subplot(2, 1, 1) # First, raw time series
trans = mtransforms.blended_transform_factory(ax.transData, ax.transAxes)
plt.plot(s_x.ravel(), "b-")
plt.xlim([0, s_x.shape[0]])
plt.axvline(x=t_star, c="red", linewidth=2)
plt.fill_between(x=[t_star, t_star+mp.subsequence_length], y1=0., y2=1.,
facecolor="r", alpha=.2, transform=trans)
plt.title("Raw time series")
plt.subplot(2, 1, 2) # Second, Matrix Profile
plt.plot(mp_series.ravel(), "b-")
plt.axvline(x=t_star, c="red", linewidth=2, linestyle="dashed")
plt.xlim([0, s_x.shape[0]])
plt.title("Matrix Profile")
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 0.453 seconds)
Distance and Matrix Profiles¶
This example illustrates how the matrix profile is calculated. For each segment of a time series with a specified length, the distances between that segment and every subsequence of the series are computed, and the smallest one is retained, excluding a zone of trivial matches around the location the segment was extracted from (where the distance would otherwise be zero).
# Author: Gilles Vandewiele
# License: BSD 3 clause
import numpy
import matplotlib.patches as patches
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
import matplotlib.pyplot as plt
from tslearn.datasets import CachedDatasets
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.matrix_profile import MatrixProfile
import warnings
warnings.filterwarnings('ignore')
# Set a seed to ensure determinism
numpy.random.seed(42)
# Load the Trace dataset
X_train, y_train, _, _ = CachedDatasets().load_dataset("Trace")
# Normalize the time series
scaler = TimeSeriesScalerMeanVariance()
X_train = scaler.fit_transform(X_train)
# Take the first time series
ts = X_train[0, :, :]
# We will take the spike as a segment
subseq_len = 20
start = 45
segment = ts[start:start + subseq_len]
# Create our matrix profile
matrix_profiler = MatrixProfile(subsequence_length=subseq_len, scale=True)
mp = matrix_profiler.fit_transform([ts]).flatten()
# Create a grid for our plots
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, sharex=True)
# Plot our timeseries
ax1.plot(ts, c='b', label='time series')
ax1.add_patch(patches.Rectangle((start, numpy.min(ts) - 0.1), subseq_len,
numpy.max(ts) - numpy.min(ts) + 0.2,
facecolor='b', alpha=0.25,
label='segment'))
ax1.axvline(start, c='b', linestyle='--', lw=2, alpha=0.5,
label='segment start')
ax1.legend(loc='lower right', ncol=4, fontsize=8,
handletextpad=0.1, columnspacing=0.5)
ax1.set_title('The time series')
# Inset plot with our segment
fig_ax_in = ax1.inset_axes([0.5, 0.55, 0.2, 0.4])
fig_ax_in.plot(scaler.fit_transform(segment.reshape(1, -1, 1))[0], c='b')
ax1.indicate_inset(inset_ax=fig_ax_in, transform=ax1.transData,
bounds=[start, numpy.min(ts) - 0.1, subseq_len,
numpy.max(ts) - numpy.min(ts) + 0.2],
linestyle='--', alpha=0.75)
fig_ax_in.tick_params(labelleft=False, labelbottom=False)
fig_ax_in.xaxis.set_visible(False)
fig_ax_in.yaxis.set_visible(False)
# Calculate a distance profile, which represents the distance from each
# subsequence of the time series and the segment
distances = []
for i in range(len(ts) - subseq_len + 1):
scaled_ts = scaler.fit_transform(ts[i:i+subseq_len].reshape(1, -1, 1))
scaled_segment = scaler.fit_transform(segment.reshape(1, -1, 1))
distances.append(numpy.linalg.norm(scaled_ts - scaled_segment))
# Mask out the distances in the trivial match zone, get the nearest
# neighbor and put the old distances back in place so we can plot them.
distances = numpy.array(distances)
mask = list(range(start - subseq_len // 4, start + subseq_len // 4))
old_distances = distances[mask]
distances[mask] = numpy.inf
nearest_neighbor = numpy.argmin(distances)
dist_nn = distances[nearest_neighbor]
distances[mask] = old_distances
# Plot our distance profile
ax2.plot(distances, c='b')
ax2.set_title('Segment distance profile')
dist_diff = numpy.max(distances) - numpy.min(distances)
ax2.add_patch(patches.Rectangle((start - subseq_len // 4,
numpy.min(distances) - 0.1),
subseq_len // 2,
dist_diff + 0.2,
facecolor='r', alpha=0.5,
label='exclusion zone'))
ax2.scatter(nearest_neighbor, dist_nn, c='r', marker='x', s=50,
label='neighbor dist = {}'.format(numpy.around(dist_nn, 3)))
ax2.axvline(start, c='b', linestyle='--', lw=2, alpha=0.5,
label='segment start')
ax2.legend(loc='lower right', fontsize=8, ncol=3,
handletextpad=0.1, columnspacing=0.5)
# Plot our matrix profile
ax3.plot(mp, c='b')
ax3.set_title('Matrix profile')
ax3.scatter(start, mp[start],
c='r', marker='x', s=75,
label='MP segment = {}'.format(numpy.around(mp[start], 3)))
ax3.axvline(start, c='b', linestyle='--', lw=2, alpha=0.5,
label='segment start')
ax3.legend(loc='lower right', fontsize=8,
handletextpad=0.1, columnspacing=0.25)
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 2.839 seconds)
Citing tslearn¶
If you use tslearn in a scientific publication, we would appreciate citations:
Bibtex entry:
@article{JMLR:v21:20-091,
  author  = {Romain Tavenard and Johann Faouzi and Gilles Vandewiele and Felix Divo and Guillaume Androz and Chester Holtz and Marie Payne and Roman Yurchak and Marc Ru{\ss}wurm and Kushal Kolar and Eli Woods},
  title   = {Tslearn, A Machine Learning Toolkit for Time Series Data},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {118},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v21/20-091.html}
}