GPy - A Gaussian Process (GP) framework in Python¶
Introduction¶
GPy is a Gaussian Process (GP) framework written in Python, from the Sheffield machine learning group. It includes support for basic GP regression, multiple output GPs (using coregionalization), various noise models, sparse GPs, non-parametric regression and latent variables.
The GPy homepage contains tutorials for users and further information on the project, including installation instructions.
The documentation hosted here is mostly aimed at developers interacting closely with the code-base.
Source Code¶
The code can be found on our Github project page. It is open source and provided under the BSD license.
Installation¶
Installation instructions can currently be found on our Github project page.
Tutorials¶
Several tutorials have been developed in the form of Jupyter Notebooks.
Architecture¶
GPy is a large, powerful package with many features. In general terms, GPy is used as follows. A model (GPy.models) is created - this is at the heart of GPy from a user perspective. A kernel (GPy.kern), data and, usually, a representation of noise are assigned to the model. Specific models require, or can make use of, additional information. The kernel and noise are controlled by hyperparameters - calling the optimize (GPy.core.gp.GP.optimize) method on the model invokes an iterative process which seeks optimal hyperparameter values. The model object can then be used to make plots and predictions (GPy.core.gp.GP.predict).
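To make this concrete, here is a minimal sketch of that workflow using GPy's regression model; the toy data arrays X and Y are placeholders:
import numpy as np
import GPy
# toy 1D data (placeholder)
X = np.random.uniform(-3., 3., (20, 1))
Y = np.sin(X) + np.random.randn(20, 1) * 0.05
kernel = GPy.kern.RBF(input_dim=1)              # the kernel
model = GPy.models.GPRegression(X, Y, kernel)   # model = kernel + data + Gaussian noise
model.optimize()                                # optimize the hyperparameters
mean, var = model.predict(np.array([[1.5]]))    # predict at a new input
fig = model.plot()                              # plot the fit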
(Architecture diagram: Data, Kernel and Noise feed into the Model; Hyperparameters control the Kernel and Noise; the Model supports the Optimize, Predict and Plot operations, with Optimize updating the Hyperparameters.)
Creating new Models¶
In GPy all models inherit from the base class Parameterized
. Parameterized
is a class which allows for the parameterization of objects. It holds functionality for tying, bounding and fixing parameters, and for searching and manipulating parameters using regular-expression syntax. See Parameterized
for more information.
The Model
class provides parameter introspection, the objective function and optimization.
In order to use the full functionality of Model, some methods need to be implemented or overridden, and the model needs to be told its parameters, so that it can provide optimized parameter distribution and handling.
In order to explain the functionality of those methods we will use a wrapper around the scipy rosen (Rosenbrock) function, which holds the
input parameters \(\mathbf{X}\), where
\(\mathbf{X}\in\mathbb{R}^{N\times 1}\).
Obligatory methods¶
__init__:
Initialize the model with the given parameters. These need to be added to the model by calling self.add_parameter(<param>), where param needs to be a parameter handle (see Parameter handles for details):
self.X = GPy.Param("input", X)
self.add_parameter(self.X)
log_likelihood:
Returns the log-likelihood of the model. For our example this is just the call to rosen, and as we want to minimize it, we negate the objective:
return -scipy.optimize.rosen(self.X)
parameters_changed:
Updates the internal state of the model and sets the gradient of each parameter handle in the hierarchy with respect to the log_likelihood. Here we need to set the negative derivative of the Rosenbrock function for the parameters; in this case it is the gradient for self.X:
self.X.gradient = -scipy.optimize.rosen_der(self.X)
Here is the full code for the Rosen class:
from GPy import Model, Param
import scipy.optimize
class Rosen(Model):
def __init__(self, X, name='rosenbrock'):
super(Rosen, self).__init__(name=name)
self.X = Param("input", X)
self.add_parameter(self.X)
def log_likelihood(self):
return -scipy.optimize.rosen(self.X)
def parameters_changed(self):
self.X.gradient = -scipy.optimize.rosen_der(self.X)
In order to test the newly created model, we can check the gradients and run a standard Rosenbrock optimization:
>>> m = Rosen(np.array([-1,-1]))
>>> print(m)
Name : rosenbrock
Log-likelihood : -404.0
Number of Parameters : 2
Parameters:
rosenbrock. | Value | Constraint | Prior | Tied to
input | (2,) | | |
>>> m.checkgrad(verbose=True)
Name | Ratio | Difference | Analytical | Numerical
------------------------------------------------------------------------------------------
rosenbrock.input[[0]] | 1.000000 | 0.000000 | -804.000000 | -804.000000
rosenbrock.input[[1]] | 1.000000 | 0.000000 | -400.000000 | -400.000000
>>> m.optimize()
>>> print(m)
Name : rosenbrock
Log-likelihood : -6.52150088871e-15
Number of Parameters : 2
Parameters:
rosenbrock. | Value | Constraint | Prior | Tied to
input | (2,) | | |
>>> print(m.input)
Index | rosenbrock.input | Constraint | Prior | Tied to
[0] | 0.99999994 | | | N/A
[1] | 0.99999987 | | | N/A
>>> print(m.gradient)
[ -1.91169809e-06, 1.01852309e-06]
This is the optimum of the 2D Rosenbrock function, as expected, and the gradients of the inputs are almost zero.
Optional methods¶
Currently none.
Creating new kernels¶
We will see in this tutorial how to create new kernels in GPy. We will also give details on how to implement each function of the kernel and illustrate with a running example: the rational quadratic kernel.
Structure of a kernel in GPy¶
In GPy a kernel object is made of a list of kernpart objects, which correspond to symmetric positive definite functions. More precisely, the kernel should be understood as the sum of its kernparts. In order to implement a new covariance, the following steps must be followed:
- implement the new covariance as a GPy.kern.src.kern.Kern object
- update the GPy.kern.src file
These steps are detailed below.
Implementing a Kern object¶
We advise the reader to start by copy-pasting an existing kernel and modifying the new file. We will now give a description of the various functions that can be found in a Kern object, some of which are mandatory for the new kernel to work.
Header¶
The header is similar to all kernels:
from .kern import Kern
import numpy as np
class RationalQuadratic(Kern):
GPy.kern.src.kern.Kern.__init__(self, input_dim, param1, param2, *args)¶
The implementation of this function is mandatory.
For all Kerns the first parameter input_dim corresponds to the dimension of the input space, and the following parameters stand for the parameterization of the kernel.
You have to call super(<class_name>, self).__init__(input_dim, active_dims, name) to make sure the input dimension (and possible dimension restrictions using active_dims) and the name of the kernel are stored in the right place. These attributes are available as self.input_dim and self.name at runtime. Parameterization is done by adding Param objects to self and using them as normal numpy array-likes in your code. The parameters have to be added by calling link_parameters(*parameters) with the Param objects as arguments:
from .core.parameterization import Param
def __init__(self,input_dim,variance=1.,lengthscale=1.,power=1.,active_dims=None):
super(RationalQuadratic, self).__init__(input_dim, active_dims, 'rat_quad')
assert input_dim == 1, "For this kernel we assume input_dim=1"
self.variance = Param('variance', variance)
self.lengthscale = Param('lengthscale', lengthscale)
self.power = Param('power', power)
self.link_parameters(self.variance, self.lengthscale, self.power)
From now on you can use the parameters self.variance, self.lengthscale and self.power as normal numpy array-likes in your code. Updates from the optimization routine will be done automatically.
parameters_changed(self)¶
The implementation of this function is optional.
This function is called as a callback upon each successful change to the parameters. If one optimization step was successful and the parameters (linked by link_parameters(*parameters)) are changed, this callback function will be called. This callback may be used to update precomputations for the kernel. Do not implement the gradient updates here, as gradient updates are performed by the model enclosing the kernel. In this example, we issue a no-op:
def parameters_changed(self):
# nothing todo here
pass
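As a hypothetical illustration (not part of the rational quadratic example), a kernel with an expensive parameter-dependent precomputation might cache it here instead:
def parameters_changed(self):
    # illustrative only: cache a quantity that depends purely on the parameters,
    # so K and Kdiag can reuse it without recomputing it on every call
    self._inv_l2 = 1. / np.square(self.lengthscale)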
K(self, X, X2)¶
The implementation of this function is mandatory.
This function is used to compute the covariance matrix associated with the inputs X, X2 (np.arrays with an arbitrary number of rows, \(n_1\) and \(n_2\), corresponding to the number of samples over which to calculate covariance, and self.input_dim columns).
def K(self,X,X2):
if X2 is None: X2 = X
dist2 = np.square((X-X2.T)/self.lengthscale)
return self.variance*(1 + dist2/2.)**(-self.power)
Kdiag(self, X)¶
The implementation of this function is mandatory.
This function is similar to K but it computes only the values of the kernel on the diagonal. It returns a 1-dimensional np.array of length \(n\).
def Kdiag(self,X):
return self.variance*np.ones(X.shape[0])
update_gradients_full(self, dL_dK, X, X2=None)¶
This function is required for the optimization of the parameters.
It computes the gradients and sets them on the parameters of this kernel. For example, if the kernel is parameterized by \(\sigma^2, \theta\), then
\[\sum_{i,j}\frac{\partial L}{\partial K_{ij}}\frac{\partial K_{ij}}{\partial \sigma^2}\]
is added to the gradient of \(\sigma^2\): self.variance.gradient = <gradient>, and
\[\sum_{i,j}\frac{\partial L}{\partial K_{ij}}\frac{\partial K_{ij}}{\partial \theta}\]
to the gradient of \(\theta\).
def update_gradients_full(self, dL_dK, X, X2):
    if X2 is None: X2 = X
    dist2 = np.square((X-X2.T)/self.lengthscale)
    # derivatives of K = variance*(1 + dist2/2.)**(-power) with respect to each parameter
    dvar = (1 + dist2/2.)**(-self.power)
    dl = self.power * self.variance * dist2 / self.lengthscale * (1 + dist2/2.)**(-self.power-1)
    dp = - self.variance * np.log(1 + dist2/2.) * (1 + dist2/2.)**(-self.power)
    self.variance.gradient = np.sum(dvar*dL_dK)
    self.lengthscale.gradient = np.sum(dl*dL_dK)
    self.power.gradient = np.sum(dp*dL_dK)
update_gradients_diag(self, dL_dKdiag, X)¶
This function is required for BGPLVM, sparse models and uncertain inputs.
As previously, the gradient
\[\sum_{i}\frac{\partial L}{\partial K_{ii}}\frac{\partial K_{ii}}{\partial \theta}\]
is set on each parameter \(\theta\).
def update_gradients_diag(self, dL_dKdiag, X):
self.variance.gradient = np.sum(dL_dKdiag)
# here self.lengthscale and self.power have no influence on Kdiag, so their gradients are left unchanged
gradients_X(self, dL_dK, X, X2)¶
This function is required for GPLVM, BGPLVM, sparse models and uncertain inputs.
Computes the derivative of the likelihood with respect to the inputs X (an \(n \times q\) np.array), that is, it calculates the quantity:
\[\frac{\partial L}{\partial X_{nq}} = \sum_{i,j}\frac{\partial L}{\partial K_{ij}}\frac{\partial K_{ij}}{\partial X_{nq}}\]
The partial derivative matrix, in this case, comes out as an \(n \times q\) np.array.
def gradients_X(self,dL_dK,X,X2):
    """derivative of the likelihood with respect to X, calculated using dL_dK*dK_dX"""
    if X2 is None: X2 = X
    dist2 = np.square((X-X2.T)/self.lengthscale)
    # dK/dX for K = variance*(1 + dist2/2.)**(-power)
    dK_dX = -self.variance*self.power * (X-X2.T)/self.lengthscale**2 * (1 + dist2/2.)**(-self.power-1)
    return np.sum(dL_dK*dK_dX,1)[:,None]
Were the number of parameters or the number of input dimensions larger than 1, the calculated partial derivative would be a 3- or 4-tensor.
gradients_X_diag(self, dL_dKdiag, X)¶
This function is required for BGPLVM, sparse models and uncertain inputs. Analogously to update_gradients_diag, the quantity
\[\sum_{i}\frac{\partial L}{\partial K_{ii}}\frac{\partial K_{ii}}{\partial X_{nq}}\]
is added to each element of the gradient of X. For this kernel the diagonal does not depend on X, so there is nothing to compute:
def gradients_X_diag(self,dL_dKdiag,X):
# no diagonal gradients
pass
Second order derivatives¶
These functions are required for the magnification factor and are analogous to the first order gradients with respect to X, but for the second order derivatives:
GPy.kern.src.kern.Kern.gradients_XX(self, dL_dK, X, X2)
GPy.kern.src.kern.Kern.gradients_XX_diag(self, dL_dKdiag, X)
Psi statistics¶
The psi statistics and their derivatives are required only for BGPLVM and GPs with uncertain inputs; the expressions are as follows, with a stub sketch after the list:
- psi0(self, Z, variational_posterior)
- \[\psi_0 = \sum_{i=0}^{n}E_{q(X)}[k(X_i, X_i)]\]
- psi1(self, Z, variational_posterior)
- \[\psi_1^{n,m} = E_{q(X)}[k(X_n, Z_m)]\]
- psi2(self, Z, variational_posterior)
- \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
- psi2n(self, Z, variational_posterior)
- \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
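If your kernel should support these models, the corresponding methods are added to the Kern subclass with the signatures above. A minimal sketch (raising NotImplementedError as placeholders, since the expectations depend on the specific kernel) might look like:
def psi0(self, Z, variational_posterior):
    # sum of E_q(X)[k(X_i, X_i)] over the data points
    raise NotImplementedError
def psi1(self, Z, variational_posterior):
    # n x m matrix of E_q(X)[k(X_n, Z_m)]
    raise NotImplementedError
def psi2(self, Z, variational_posterior):
    # m x m matrix, summed over the data points
    raise NotImplementedError
def psi2n(self, Z, variational_posterior):
    # n x m x m tensor, one m x m slice per data point
    raise NotImplementedError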
Defining a new plotting function in GPy¶
GPy has a wrapper for different plotting backends.
There are some functions you can use for standard plotting.
Anything going beyond the scope of the plot definitions of the AbstractPlottingLibrary classes should be considered carefully, and may be a special case for your plotting library only.
All plotting-related code lives in GPy.plotting and beneath. No plotting-related code needs to be anywhere else in GPy.
As examples are always the easiest way to learn, we will implement an example plotting function which plots the covariance of a kernel.
Write your plotting function into a module under GPy.plotting.gpy_plot.<module_name>, using the plotting routines provided in GPy.plotting.plotting_library.
I like to write from . import plotting_library as pl and then always access the functionality of the plotting library through pl().
For the covariance plot we define the function in GPy.plotting.kernel_plots.
The first thing is to define the function parameters and write the documentation for them!
The first argument of the plotting function is always self, for the class this plotting function will be attached to (we will get to attaching the function to a class in detail later on):
def plot_covariance(kernel, x=None, label=None,
plot_limits=None, visible_dims=None, resolution=None,
projection=None, levels=20, **kwargs):
"""
Plot a kernel covariance w.r.t. another x.
:param array-like x: the value to use for the other kernel argument (kernels are a function of two variables!)
:param plot_limits: the range over which to plot the kernel
:type plot_limits: Either (xmin, xmax) for 1D or (xmin, xmax, ymin, ymax) / ((xmin, xmax), (ymin, ymax)) for 2D
:param array-like visible_dims: input dimensions (!) to use for x. Make sure to select 2 or less dimensions to plot.
:param int resolution: the resolution of the lines used in plotting. For 2D this defines the grid for kernel evaluation.
:param {2d|3d} projection: What projection shall we use to plot the kernel?
:param int levels: for 2D projection, how many levels for the contour plot to use?
:param kwargs: valid kwargs for your specific plotting library
"""
Having defined the outline of the function we can start implementing the real plotting.
First, we will write the necessary logic behind getting the covariance function. This involves getting an Xgrid to plot with and the second x to compare the covariance to:
from .plot_util import helper_for_plot_data
X = np.ones((2, kernel.input_dim)) * [-4, 4]
_, free_dims, Xgrid, xx, yy, _, _, resolution = helper_for_plot_data(kernel, X, plot_limits, visible_dims, None, resolution)
from numbers import Number
if x is None:
x = np.zeros((1, kernel.input_dim))
elif isinstance(x, Number):
x = np.ones((1, kernel.input_dim))*x
K = kernel.K(Xgrid, x)
free_dims holds the free dimensions after selecting from the visible_dims, Xgrid is the grid for the covariance, xx, yy are the grid positions for 2D plotting, x is the X2 for the kernel, and K holds the kernel covariance for all positions between Xgrid and x.
Then we need a canvas to plot on. Always push the keyword arguments of the specific library through GPy.plotting.abstract_plotting_library.AbstractPlottingLibrary.new_canvas:
if projection == '3d':
    xlabel = 'X[:,0]'
    ylabel = 'X[:,1]'
    zlabel = "k(X, {!s})".format(np.asanyarray(x).tolist())
else:
    xlabel = 'X'
    ylabel = "k(X, {!s})".format(np.asanyarray(x).tolist())
    zlabel = None  # no z axis needed for the 1D/2D projections
canvas, kwargs = pl().new_canvas(projection=projection, xlabel=xlabel, ylabel=ylabel, zlabel=zlabel, **kwargs)
It is also very important to use the defaults, which are defined for all implemented plotting libraries.
This is done by updating the kwargs from the defaults. There is a helper function which takes care of existing keyword arguments. In this case we will just use the default for plotting a mean function for the covariance plot as well. If you want to define your own defaults, add them to the defaults for each library and use them here. See for example the defaults for matplotlib in GPy.plotting.matplot_dep.defaults. There is also the default for meanplot_1d, which we use for the 1D plot:
from .plot_util import update_not_existing_kwargs
update_not_existing_kwargs(kwargs, pl().defaults.meanplot_1d) # @UndefinedVariable
The full definition of the plotting then looks like this:
if len(free_dims)<=2:
if len(free_dims)==1:
# 1D plotting:
update_not_existing_kwargs(kwargs, pl().defaults.meanplot_1d) # @UndefinedVariable
plots = dict(covariance=[pl().plot(canvas, Xgrid[:, free_dims], K, label=label, **kwargs)])
else:
if projection == '2d':
update_not_existing_kwargs(kwargs, pl().defaults.meanplot_2d) # @UndefinedVariable
plots = dict(covariance=[pl().contour(canvas, xx[:, 0], yy[0, :],
K.reshape(resolution, resolution),
levels=levels, label=label, **kwargs)])
elif projection == '3d':
update_not_existing_kwargs(kwargs, pl().defaults.meanplot_3d) # @UndefinedVariable
plots = dict(covariance=[pl().surface(canvas, xx, yy,
K.reshape(resolution, resolution),
label=label,
**kwargs)])
return pl().add_to_canvas(canvas, plots)
else:
raise NotImplementedError("Cannot plot a kernel with more than two input dimensions")
We return whatever is returned by GPy.plotting.abstract_plotting_library.AbstractPlottingLibrary.add_to_canvas, so that the plotting library can choose what to do with the plot later, when we want to show it. In order to show a plot, we can just call GPy.plotting.show with the output of the plot above.
Now we want to add the plot to GPy.kern.src.kern.Kern. In order to do that, we inject the plotting function into the class in GPy.plotting.__init__, which makes sure that changing the backend on the fly works smoothly. Thus, in GPy.plotting.__init__ we add the lines:
from ..kern import Kern
Kern.plot_covariance = gpy_plot.kernel_plots.plot_covariance
And that’s it. The plot can be shown in plotly by calling:
GPy.plotting.change_plotting_library('plotly')
k = GPy.kern.RBF(1) + GPy.kern.Matern32(1)
k.randomize()
fig = k.plot()
GPy.plotting.show(fig, <plot_library specific **kwargs>)
k = GPy.kern.RBF(2) + GPy.kern.Matern32(2)
k.randomize()
fig = k.plot()
GPy.plotting.show(fig, <plot_library specific **kwargs>)
k = GPy.kern.RBF(1) + GPy.kern.Matern32(2)
k.randomize()
fig = k.plot(projection='3d')
GPy.plotting.show(fig, <plot_library specific **kwargs>)
This brings us to the next point: changing the backend works on-the-fly. To show the above example in matplotlib, we just exchange the first line with GPy.plotting.change_plotting_library('matplotlib').
Parameterization handling¶
Parameterization in GPy is done through so-called parameter handles. Parameter handles are handles to the parameters of a model of any kind. A parameter handle can be constrained, fixed, randomized and more. All parameters in GPy have a name, with which they can be accessed in the model. The most common way of accessing a parameter programmatically, though, is by variable name.
Parameter handles¶
A parameter handle in GPy is, as the name suggests, a handle on a parameter. A parameter can be constrained, fixed, randomized and more (see e.g. working with models). This gives the model the freedom to handle parameter distribution and model updates as efficiently as possible. All parameter handles share a common memory space, which is just a flat numpy array stored in the highest parent of a model hierarchy. In the following we introduce and elucidate the different parameter handles which exist in GPy.
Parameterized
¶
A parameterized object itself holds parameter handles and is just a summary of the parameters below it. It can use those parameters to change the internal state of the model, and GPy ensures that those parameters always hold the right value during an optimization routine or any other update.
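As a minimal sketch of how parameter handles are typically used (assuming a GPRegression model built from placeholder data arrays X and Y):
m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(1))
print(m)                                       # summary of all parameters
m.rbf.lengthscale.constrain_bounded(0.1, 10.)  # constrain a single parameter
m.Gaussian_noise.variance.fix()                # fix the noise variance at its current value
print(m['.*lengthscale'])                      # select parameters by regular expression
m.randomize()                                  # redraw all free parameters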
GPy.core package¶
Introduction¶
This module contains the fundamental classes of GPy - classes that are inherited by objects in other parts of GPy in order to provide a consistent interface to major functionality.
GPy.core.model is inherited by GPy.core.gp.GP, and GPy.core.model itself inherits paramz.model.Model from the paramz package. paramz essentially provides an inherited set of properties and functions used to manage the state (and state changes) of the model.
GPy.core.gp.GP
represents a GP model. Such an entity is
typically passed variables representing known (x) and observed (y)
data, along with a kernel and other information needed to create the
specific model. It exposes functions which return information derived
from the inputs to the model, for example predicting unobserved
variables based on new known variables, or the log marginal likelihood
of the current state of the model.
optimize
is called to optimize
hyperparameters of the model. The optimizer argument takes a string
which is used to specify non-default optimization schemes.
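For example, assuming a model m has already been constructed, a typical call looks like:
m.optimize(optimizer='lbfgs', messages=True, max_iters=1000)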
Various plotting functions can be called against GPy.core.gp.GP
.
GPy.core.gp.GP
is used as the basis for classes supporting
more specialized types of Gaussian Process model. These are however
generally still not specific enough to be called by the user and are
inherited by members of the GPy.models
package.
-
randomize
(self, rand_gen=None, *args, **kwargs)[source]¶ Randomize the model. Make this draw from the prior if one exists, else draw from given random generator
Parameters: - rand_gen – np random number generator which takes args and kwargs
- loc (float) – loc parameter for random number generator
- scale (float) – scale parameter for random number generator
- kwargs (args,) – will be passed through to random number generator
Subpackages¶
GPy.core.parameterization package¶
Introduction¶
Extends the functionality of the paramz package (a dependency) to support parameterization of priors (GPy.core.parameterization.priors
).
Submodules¶
GPy.core.parameterization.param module¶
-
class
Param
(name, input_array, default_constraint=None, *a, **kw)[source]¶ Bases:
paramz.param.Param
,GPy.core.parameterization.priorizable.Priorizable
-
randomize
(rand_gen=None, *args, **kwargs)¶ Randomize the model. Make this draw from the prior if one exists, else draw from given random generator
Parameters: - rand_gen – np random number generator which takes args and kwargs
- loc (float) – loc parameter for random number generator
- scale (float) – scale parameter for random number generator
- kwargs (args,) – will be passed through to random number generator
-
GPy.core.parameterization.parameterized module¶
-
class
Parameterized
(name=None, parameters=[])[source]¶ Bases:
paramz.parameterized.Parameterized
,GPy.core.parameterization.priorizable.Priorizable
Parameterized class
Say m is a handle to a parameterized class.
Printing parameters:
- print m: prints a nice summary over all parameters
- print m.name: prints details for the param with name 'name'
- print m[regexp]: prints details for all the parameters which match (!) regexp
- print m['']: prints details for all parameters
Fields:
Name: the name of the param, can be renamed!
Value: shape, or value if one-valued
Constraint: constraint of the param; curly "{c}" brackets indicate that some parameters are constrained by c. See the detailed print to get the exact constraints.
Tied_to: which parameter it is tied to.
Getting and setting parameters:
Set all values in a param to one: m.name.to.param = 1
Handling of constraining, fixing and tying parameters:
You can constrain parameters by calling constrain on the param itself, e.g.:
- m.name[:,1].constrain_positive()
- m.name[0].tie_to(m.name[1])
Fixing parameters will fix them to the value they have right now. If you change the parameter's value, the param will be fixed to the new value!
If you want to operate on all parameters use m[''] to wildcard-select all parameters and concatenate them. Printing m[''] will result in the printing of all parameters in detail.
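A short hedged illustration of these conventions, assuming m is any parameterized GPy model:
print(m)                              # summary of all parameters
print(m[''])                          # detailed print of every parameter
m['.*variance'] = 1.                  # set all parameters whose name matches the regexp to one
m.kern.variance.constrain_positive()  # constrain a single parameter handle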
-
randomize
(rand_gen=None, *args, **kwargs)¶ Randomize the model. Make this draw from the prior if one exists, else draw from given random generator
Parameters: - rand_gen – np random number generator which takes args and kwargs
- loc (float) – loc parameter for random number generator
- scale (float) – scale parameter for random number generator
- kwargs (args,) – will be passed through to random number generator
GPy.core.parameterization.priorizable module¶
-
class
Priorizable
(name, default_prior=None, *a, **kw)[source]¶ Bases:
paramz.core.parameter_core.Parameterizable
GPy.core.parameterization.priors module¶
-
class
DGPLVM
(sigma2, lbl, x_shape)[source]¶ Bases:
GPy.core.parameterization.priors.Prior
Implementation of the Discriminative Gaussian Process Latent Variable model paper, by Raquel.
Parameters: sigma2 – constant Note
DGPLVM for Classification paper implementation
-
domain
= 'real'¶
-
-
class
DGPLVM_KFDA
(lambdaa, sigma2, lbl, kern, x_shape)[source]¶ Bases:
GPy.core.parameterization.priors.Prior
Implementation of the Discriminative Gaussian Process Latent Variable model using Kernel Fisher Discriminant Analysis by Seung-Jean Kim, for implementing the face paper by Chaochao Lu.
Parameters: - lambdaa – constant
- sigma2 – constant
Note
Surpassing Human-Level Face paper dgplvm implementation
A description for init
-
domain
= 'real'¶
-
class
DGPLVM_Lamda
(sigma2, lbl, x_shape, lamda, name='DP_prior')[source]¶ Bases:
GPy.core.parameterization.priors.Prior
,GPy.core.parameterization.parameterized.Parameterized
Implementation of the Discriminative Gaussian Process Latent Variable model paper, by Raquel.
Parameters: sigma2 – constant Note
DGPLVM for Classification paper implementation
-
domain
= 'real'¶
-
-
class
DGPLVM_T
(sigma2, lbl, x_shape, vec)[source]¶ Bases:
GPy.core.parameterization.priors.Prior
Implementation of the Discriminative Gaussian Process Latent Variable model paper, by Raquel.
Parameters: sigma2 – constant Note
DGPLVM for Classification paper implementation
-
domain
= 'real'¶
-
-
class
Exponential
(l)[source]¶ Bases:
GPy.core.parameterization.priors.Prior
Implementation of the Exponential probability function, coupled with random variables.
Parameters: l – shape parameter -
domain
= 'positive'¶
-
-
class
Gamma
(a, b)[source]¶ Bases:
GPy.core.parameterization.priors.Prior
Implementation of the Gamma probability function, coupled with random variables.
Parameters: - a – shape parameter
- b – rate parameter (warning: it’s the inverse of the scale)
Note
Bishop 2006 notation is used throughout the code
-
static
from_EV
(E, V)[source]¶ Creates an instance of a Gamma Prior by specifying the Expected value(s) and Variance(s) of the distribution.
Parameters: - E – expected value
- V – variance
-
a
¶
-
b
¶
-
domain
= 'positive'¶
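As an aside, a common use of these priors is to place them on model parameters via set_prior; a brief hedged sketch, assuming a GPRegression model m with an RBF kernel:
# a Gamma prior with expected value 1.0 and variance 10.0
prior = GPy.priors.Gamma.from_EV(1., 10.)
m.rbf.lengthscale.set_prior(prior)   # the prior now contributes to the objective
m.randomize()                        # randomize() will draw from this prior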
-
class
Gaussian
(mu, sigma)[source]¶ Bases:
GPy.core.parameterization.priors.Prior
Implementation of the univariate Gaussian probability function, coupled with random variables.
Parameters: - mu – mean
- sigma – standard deviation
Note
Bishop 2006 notation is used throughout the code
-
domain
= 'real'¶
-
class
HalfT
(A, nu)[source]¶ Bases:
GPy.core.parameterization.priors.Prior
Implementation of the half student t probability function, coupled with random variables.
Parameters: - A – scale parameter
- nu – degrees of freedom
-
domain
= 'positive'¶
-
class
InverseGamma
(a, b)[source]¶ Bases:
GPy.core.parameterization.priors.Gamma
Implementation of the inverse-Gamma probability function, coupled with random variables.
Parameters: - a – shape parameter
- b – rate parameter (warning: it’s the inverse of the scale)
Note
Bishop 2006 notation is used throughout the code
-
static
from_EV
(E, V)[source]¶ Creates an instance of a Gamma Prior by specifying the Expected value(s) and Variance(s) of the distribution.
Parameters: - E – expected value
- V – variance
-
domain
= 'positive'¶
-
class
LogGaussian
(mu, sigma)[source]¶ Bases:
GPy.core.parameterization.priors.Gaussian
Implementation of the univariate log-Gaussian probability function, coupled with random variables.
Parameters: - mu – mean
- sigma – standard deviation
Note
Bishop 2006 notation is used throughout the code
-
domain
= 'positive'¶
-
class
MultivariateGaussian
(mu, var)[source]¶ Bases:
GPy.core.parameterization.priors.Prior
Implementation of the multivariate Gaussian probability function, coupled with random variables.
Parameters: - mu – mean (N-dimensional array)
- var – covariance matrix (NxN)
Note
Bishop 2006 notation is used throughout the code
-
domain
= 'real'¶
-
class
StudentT
(mu, sigma, nu)[source]¶ Bases:
GPy.core.parameterization.priors.Prior
Implementation of the student t probability function, coupled with random variables.
Parameters: - mu – mean
- sigma – standard deviation
- nu – degrees of freedom
Note
Bishop 2006 notation is used throughout the code
-
domain
= 'real'¶
GPy.core.parameterization.transformations module¶
GPy.core.parameterization.variational module¶
Created on 6 Nov 2013
@author: maxz
-
class
NormalPosterior
(means=None, variances=None, name='latent space', *a, **kw)[source]¶ Bases:
GPy.core.parameterization.variational.VariationalPosterior
NormalPosterior distribution for variational approximations.
holds the means and variances for a factorizing multivariate normal distribution
-
class
NormalPrior
(name='normal_prior', **kw)[source]¶ Bases:
GPy.core.parameterization.variational.VariationalPrior
-
class
SpikeAndSlabPosterior
(means, variances, binary_prob, group_spike=False, sharedX=False, name='latent space')[source]¶ Bases:
GPy.core.parameterization.variational.VariationalPosterior
The SpikeAndSlab distribution for variational approximations.
binary_prob : the probability of the distribution on the slab part.
-
class
SpikeAndSlabPrior
(pi=None, learnPi=False, variance=1.0, group_spike=False, name='SpikeAndSlabPrior', **kw)[source]¶ Bases:
GPy.core.parameterization.variational.VariationalPrior
-
class
VariationalPosterior
(means=None, variances=None, name='latent space', *a, **kw)[source]¶ Bases:
GPy.core.parameterization.parameterized.Parameterized
Submodules¶
GPy.core.gp module¶
-
class
GP
(X, Y, kernel, likelihood, mean_function=None, inference_method=None, name='gp', Y_metadata=None, normalizer=False)[source]¶ Bases:
GPy.core.model.Model
General purpose Gaussian process model
Parameters: - X – input observations
- Y – output observations
- kernel – a GPy kernel
- likelihood – a GPy likelihood
- inference_method – the LatentFunctionInference inference method to use for this GP
- normalizer (Norm) – normalize the outputs Y. Prediction will be un-normalized using this normalizer. If normalizer is True, we will normalize using Standardize. If normalizer is False, no normalization will be done.
Return type: model object
Note
Multiple independent outputs are allowed using columns of Y
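A hedged sketch of constructing a GP directly from this class (most users would use a subclass from GPy.models instead); X and Y are placeholder data arrays:
kern = GPy.kern.RBF(input_dim=1)
lik = GPy.likelihoods.Gaussian()
m = GPy.core.GP(X, Y, kernel=kern, likelihood=lik)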
-
infer_newX
(Y_new, optimize=True)[source]¶ Infer X for the new observed data Y_new.
Parameters: - Y_new (numpy.ndarray) – the new observed data for inference
- optimize (boolean) – whether to optimize the location of new X (True by default)
Returns: a tuple containing the posterior estimate of X and the model that optimizes X
Return type: (VariationalPosterior and numpy.ndarray, Model)
-
log_likelihood
()[source]¶ The log marginal likelihood of the model, \(p(\mathbf{y})\), this is the objective function of the model being optimised
-
log_predictive_density
(x_test, y_test, Y_metadata=None)[source]¶ Calculation of the log predictive density
Parameters: - x_test ((Nx1) array) – test locations (x_{*})
- y_test ((Nx1) array) – test observations (y_{*})
- Y_metadata – metadata associated with the test points
-
log_predictive_density_sampling
(x_test, y_test, Y_metadata=None, num_samples=1000)[source]¶ Calculation of the log predictive density by sampling
Parameters: - x_test ((Nx1) array) – test locations (x_{*})
- y_test ((Nx1) array) – test observations (y_{*})
- Y_metadata – metadata associated with the test points
- num_samples (int) – number of samples to use in Monte Carlo integration
-
optimize
(optimizer=None, start=None, messages=False, max_iters=1000, ipython_notebook=True, clear_after_finish=False, **kwargs)[source]¶ Optimize the model using self.log_likelihood and self.log_likelihood_gradient, as well as self.priors. kwargs are passed to the optimizer. They can be:
Parameters: - max_iters (int) – maximum number of function evaluations
- messages (bool) – whether to display during optimisation
- optimizer (string) – which optimizer to use (defaults to self.preferred_optimizer); a range of optimizers can be found in GPy.inference.optimization, including 'scg', 'lbfgs' and 'tnc'.
- ipython_notebook (bool) – whether to use ipython notebook widgets or not.
- clear_after_finish (bool) – if in ipython notebook, we can clear the widgets after optimization.
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular, in the GP class this method re-performs inference, recalculating the posterior, the log marginal likelihood and the gradients of the model.
Warning
This method is not designed to be called manually; the framework is set up to automatically call this method upon changes to parameters. If you call this method yourself, there may be unexpected consequences.
-
plot
(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, samples_likelihood=0, lower=2.5, upper=97.5, plot_data=True, plot_inducing=True, plot_density=False, predict_kw=None, projection='2d', legend=True, **kwargs)¶ Convenience function for plotting the fit of a GP.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
If you want fine-grained control use the specific plotting functions supplied in the model.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaluts to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- levels (int) – the number of levels in the density (a number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
- samples_likelihood (int) – the number of samples to draw from the GP and apply the likelihood noise. This is usually not what you want!
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- projection ({2d|3d}) – plot in 2d or 3d?
- legend (bool) – convenience, whether to put a legend on the plot or not.
-
plot_confidence
(lower=2.5, upper=97.5, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', label='gp confidence', predict_kw=None, **kwargs)¶ Plot the confidence interval between the percentiles lower and upper. E.g. the 95% confidence interval corresponds to lower=2.5, upper=97.5. Note: Only implemented for one dimension!
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- which_data_ycols (array-like) – which columns of the output y (!) to plot (array-like or list of ints)
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
-
plot_data
(which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **plot_kwargs)¶ - Plot the training data
- For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.
Can plot only part of the data using which_data_rows and which_data_ycols.
Parameters: - which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
- projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
- label (str) – the label for the plot
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns list: of plots created.
-
plot_data_error
(which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **error_kwargs)¶ Plot the training data input error.
For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.
Can plot only part of the data using which_data_rows and which_data_ycols.
Parameters: - which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
- projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- label (str) – the label for the plot
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns list: of plots created.
-
plot_density
(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=35, label='gp density', predict_kw=None, **kwargs)¶ Plot the confidence interval between the percentiles lower and upper. E.g. the 95% confidence interval corresponds to lower=2.5, upper=97.5. Note: Only implemented for one dimension!
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaluts to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
- levels (int) – the number of levels in the density (a number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
-
plot_errorbars_trainset
(which_data_rows='all', which_data_ycols='all', fixed_inputs=None, plot_raw=False, apply_link=False, label=None, projection='2d', predict_kw=None, **plot_kwargs)¶ Plot the errorbars of the GP likelihood on the training data. These are the errorbars after the appropriate approximations according to the likelihood are done.
This also works for heteroscedastic likelihoods.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- which_data_ycols – when the data has several columns (independent outputs), only plot these
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- predict_kwargs (dict) – kwargs for the prediction used to predict the right quantiles.
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
-
plot_f
(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)¶ Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!
If you want fine-grained control use the specific plotting functions supplied in the model.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaluts to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
- levels (int) – the number of levels in the density (a number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
-
plot_latent
(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)¶ Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!
If you want fine-grained control use the specific plotting functions supplied in the model.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaluts to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
- levels (int) – the number of levels in the density (a number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
-
plot_magnification
(labels=None, which_indices=None, resolution=60, marker='<>^vsd', legend=True, plot_limits=None, updates=False, mean=True, covariance=True, kern=None, num_samples=1000, scatter_kwargs=None, plot_scatter=True, **imshow_kwargs)¶ Plot the magnification factor of the GP on the inputs. This is the density of the GP as a gray scale.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- marker (str) – markers to use - cycle if more labels than markers are given
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- mean (bool) – use the mean of the Wishart embedding for the magnification factor
- covariance (bool) – use the covariance of the Wishart embedding for the magnification factor
- kern (
Kern
) – the kernel to use for prediction - num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher then num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- kwargs – the kwargs for the scatter plots
-
plot_mean
(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=20, projection='2d', label='gp mean', predict_kw=None, **kwargs)¶ Plot the mean of the GP.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaluts to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
- levels (int) – for 2D plotting, the number of contour levels to use
- projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
- label (str) – the label for the plot.
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
-
plot_noiseless
(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)¶ Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!
If you want fine-grained control use the specific plotting functions supplied in the model.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaluts to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
- levels (int) – the number of levels in the density (a number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
-
plot_samples
(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=True, apply_link=False, visible_dims=None, which_data_ycols='all', samples=3, projection='2d', label='gp_samples', predict_kw=None, **kwargs)¶ Plot samples from the GP.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaluts to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
- plot_raw (bool) – plot the latent function (usually denoted f) only? This is usually what you want!
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- levels (int) – for 2D plotting, the number of contour levels to use
-
posterior_covariance_between_points
(X1, X2, Y_metadata=None, likelihood=None, include_likelihood=True)[source]¶ Computes the posterior covariance between points. Includes likelihood variance as well as normalization so that evaluation at (x,x) is consistent with model.predict
Parameters: - X1 – some input observations
- X2 – other input observations
- Y_metadata – metadata about the predicting point to pass to the likelihood
- include_likelihood (bool) – Whether or not to add likelihood noise to the predicted underlying latent function f.
Returns: cov: posterior covariance, a Numpy array, Nnew x Nnew if self.output_dim == 1, and Nnew x Nnew x self.output_dim otherwise.
-
posterior_samples
(X, size=10, Y_metadata=None, likelihood=None, **predict_kwargs)[source]¶ Samples the posterior GP at the points X.
Parameters: - X (np.ndarray (Nnew x self.input_dim.)) – the points at which to take the samples.
- size (int.) – the number of a posteriori samples.
- noise_model (integer.) – for mixed noise likelihood, the noise model to use in the samples.
Returns: Ysim: set of simulations,
Return type: np.ndarray (D x N x samples) (if D==1 we flatten out the first dimension)
-
posterior_samples_f
(X, size=10, **predict_kwargs)[source]¶ Samples the posterior GP at the points X.
Parameters: - X (np.ndarray (Nnew x self.input_dim)) – The points at which to take the samples.
- size (int.) – the number of a posteriori samples.
Returns: set of simulations
Return type: np.ndarray (Nnew x D x samples)
-
predict
(Xnew, full_cov=False, Y_metadata=None, kern=None, likelihood=None, include_likelihood=True)[source]¶ Predict the function(s) at the new point(s) Xnew. This includes the likelihood variance added to the predicted underlying function (usually referred to as f).
In order to predict without adding in the likelihood give include_likelihood=False, or refer to self.predict_noiseless().
Parameters: - Xnew (np.ndarray (Nnew x self.input_dim)) – The points at which to make a prediction
- full_cov (bool) – whether to return the full covariance matrix, or just the diagonal
- Y_metadata – metadata about the predicting point to pass to the likelihood
- kern – The kernel to use for prediction (defaults to the model kern). this is useful for examining e.g. subprocesses.
- include_likelihood (bool) – Whether or not to add likelihood noise to the predicted underlying latent function f.
Returns: (mean, var): mean: posterior mean, a Numpy array, Nnew x self.input_dim; var: posterior variance, a Numpy array, Nnew x 1 if full_cov=False, Nnew x Nnew otherwise
If full_cov and self.input_dim > 1, the return shape of var is Nnew x Nnew x self.input_dim. If self.input_dim == 1, the return shape is Nnew x Nnew. This is to allow for different normalizations of the output dimensions.
Note: If you want the predictive quantiles (e.g. 95% confidence interval) use
predict_quantiles
.
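A brief hedged sketch of typical prediction calls on a fitted model m, with Xnew a placeholder array of test inputs:
mean, var = m.predict(Xnew)                      # includes likelihood noise
f_mean, f_var = m.predict_noiseless(Xnew)        # latent function f only
lower, upper = m.predict_quantiles(Xnew, quantiles=(2.5, 97.5))  # 95% interval
samples = m.posterior_samples_f(Xnew, size=10)   # draws from the posterior over f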
-
predict_jacobian
(Xnew, kern=None, full_cov=False)[source]¶ Compute the derivatives of the posterior of the GP.
Given a set of points at which to predict X* (size [N*,Q]), compute the mean and variance of the derivative. Resulting arrays are sized:
- dL_dX* – [N*, Q, D], where D is the number of outputs in this GP (usually one).
- Note that this is the mean and variance of the derivative, not the derivative of the mean and variance! (See predictive_gradients for that)
- dv_dX* – [N*, Q], (since all outputs have the same variance)
- If there is missing data, it is not implemented for now, but there will be one output variance per output dimension.
Parameters: - X (np.ndarray (Xnew x self.input_dim)) – The points at which to get the predictive gradients.
- kern – The kernel to compute the jacobian for.
- full_cov (boolean) – whether to return the cross-covariance terms between
the N* Jacobian vectors
Returns: dmu_dX, dv_dX
Return type: [np.ndarray (N*, Q, D), np.ndarray (N*, Q, (D))]
-
predict_magnification
(Xnew, kern=None, mean=True, covariance=True, dimensions=None)[source]¶ Predict the magnification factor as sqrt(det(G)) for each point N in Xnew.
Parameters: - mean (bool) – whether to include the mean of the Wishart embedding.
- covariance (bool) – whether to include the covariance of the Wishart embedding.
- dimensions (array-like) – which dimensions of the input space to use [defaults to self.get_most_significant_input_dimensions()[:2]]
-
predict_noiseless
(Xnew, full_cov=False, Y_metadata=None, kern=None)[source]¶ Convenience function to predict the underlying function of the GP (often referred to as f) without adding the likelihood variance to the prediction.
This is most likely what you want to use for your predictions.
Parameters: - Xnew (np.ndarray (Nnew x self.input_dim)) – The points at which to make a prediction
- full_cov (bool) – whether to return the full covariance matrix, or just the diagonal
- Y_metadata – metadata about the predicting point to pass to the likelihood
- kern – The kernel to use for prediction (defaults to the model kern). This is useful for examining, e.g., subprocesses.
Returns: (mean, var):
mean: posterior mean, a Numpy array, Nnew x self.input_dim
var: posterior variance, a Numpy array, Nnew x 1 if full_cov=False, Nnew x Nnew otherwise
If full_cov and self.input_dim > 1, the return shape of var is Nnew x Nnew x self.input_dim. If self.input_dim == 1, the return shape is Nnew x Nnew. This is to allow for different normalizations of the output dimensions.
Note: If you want the predictive quantiles (e.g. the 95% confidence interval) use predict_quantiles.
-
predict_quantiles
(X, quantiles=(2.5, 97.5), Y_metadata=None, kern=None, likelihood=None)[source]¶ Get the predictive quantiles around the prediction at X
Parameters: - X (np.ndarray (Xnew x self.input_dim)) – The points at which to make a prediction
- quantiles (tuple) – tuple of quantiles, default is (2.5, 97.5) which is the 95% interval
- kern – optional kernel to use for prediction
Returns: list of quantiles for each X and predictive quantiles for interval combination
Return type: [np.ndarray (Xnew x self.output_dim), np.ndarray (Xnew x self.output_dim)]
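For example (continuing the hypothetical fitted model m and test points Xnew from the predict sketch above), a 95% predictive interval and a custom interval can be obtained as:

    lower, upper = m.predict_quantiles(Xnew)                    # default (2.5, 97.5) quantiles
    q25, q75 = m.predict_quantiles(Xnew, quantiles=(25., 75.))  # interquartile range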
-
predict_wishart_embedding
(Xnew, kern=None, mean=True, covariance=True)[source]¶ Predict the Wishart embedding G of the GP. This is the density of the input of the GP defined by the probabilistic function mapping f. G = J_mean.T*J_mean + output_dim*J_cov.
Parameters: - Xnew (array-like) – The points at which to evaluate the magnification.
- kern (Kern) – The kernel to use for the magnification. Supplying only a part of the learning kernel gives insight into the density of that specific kernel part of the input function, e.g. one can see how dense the linear part of a kernel is compared to the non-linear part.
-
predictive_gradients
(Xnew, kern=None)[source]¶ Compute the derivatives of the predicted latent function with respect to X*
Given a set of points at which to predict X* (size [N*,Q]), compute the derivatives of the mean and variance. Resulting arrays are sized:
- dmu_dX* – [N*, Q, D], where D is the number of outputs in this GP (usually one). Note that this is not the same as computing the mean and variance of the derivative of the function!
- dv_dX* – [N*, Q] (since all outputs have the same variance)
Parameters: X (np.ndarray (Xnew x self.input_dim)) – The points at which to get the predictive gradients
Returns: dmu_dX, dv_dX
Return type: [np.ndarray (N*, Q, D), np.ndarray (N*, Q)]
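For instance (again continuing the hypothetical 1-D model m and points Xnew from the predict sketch above), the gradients of the predictive mean and variance can be obtained as:

    dmu_dX, dv_dX = m.predictive_gradients(Xnew)  # shapes (Nnew, 1, 1) and (Nnew, 1) for a 1-D model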
-
set_XY
(X=None, Y=None)[source]¶ Set the input/output data of the model. This is useful if we wish to change our existing data but keep the same model.
Parameters: - X (np.ndarray) – input observations
- Y (np.ndarray) – output observations
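A brief sketch of swapping in new observations while keeping the same model (continuing the hypothetical m from the predict sketch above; the data are illustrative):

    # replace the training data; the kernel and hyperparameters are kept
    X2 = np.random.uniform(-3., 3., (50, 1))
    Y2 = np.sin(X2) + np.random.randn(50, 1) * 0.05
    m.set_XY(X2, Y2)
    m.optimize()  # optionally re-optimize for the new data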
-
to_dict
(save_data=True)[source]¶ Convert the object into a json serializable dictionary. Note: It uses the private method _save_to_input_dict of the parent.
Parameters: save_data (boolean) – if true, it adds the training data self.X and self.Y to the dictionary
Return dict: json serializable dictionary containing the needed information to instantiate the object
-
input_dim
¶
-
num_data
¶
GPy.core.gp_grid module¶
-
class
GpGrid
(X, Y, kernel, likelihood, inference_method=None, name='gp grid', Y_metadata=None, normalizer=False)[source]¶ Bases:
GPy.core.gp.GP
A GP model for Grid inputs
Parameters: - X (np.ndarray (num_data x input_dim)) – inputs
- likelihood (GPy.likelihood.(Gaussian | EP | Laplace)) – a likelihood instance, containing the observed data
- kernel (a GPy.kern.kern instance) – the kernel (covariance function). See link kernels
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
GPy.core.mapping module¶
-
class
Bijective_mapping
(input_dim, output_dim, name='bijective_mapping')[source]¶ Bases:
GPy.core.mapping.Mapping
This is a mapping that is bijective, i.e. you can go from X to f and also back from f to X. The inverse mapping is called g().
-
class
Mapping
(input_dim, output_dim, name='mapping')[source]¶ Bases:
GPy.core.parameterization.parameterized.Parameterized
Base model for shared mapping behaviours
-
static
from_dict
(input_dict)[source]¶ Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. If needed, please override _build_from_input_dict instead.
Parameters: input_dict (dict) – Dictionary with all the information needed to instantiate the object.
GPy.core.model module¶
-
class
Model
(name)[source]¶ Bases:
paramz.model.Model, GPy.core.parameterization.priorizable.Priorizable
-
static
from_dict
(input_dict, data=None)[source]¶ Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. If needed, please override _build_from_input_dict instead.
Parameters: input_dict (dict) – Dictionary with all the information needed to instantiate the object.
-
objective_function
()[source]¶ The objective function for the given algorithm.
This function is the true objective, which is to be minimized. Note that all parameters are already set and in place, so you just need to return the objective function here.
For probabilistic models this is the negative log_likelihood (including the MAP prior), so we return it here. If your model is not probabilistic, just return your objective to minimize here!
-
objective_function_gradients
()[source]¶ The gradients for the objective function for the given algorithm. The gradients are w.r.t. the negative objective function, as this framework works with negative log-likelihoods as a default.
You can find the gradient for the parameters in self.gradient at all times. This is the place, where gradients get stored for parameters.
This function is the true objective, which is to be minimized. Note that all parameters are already set and in place, so you just need to return the gradient here.
For probabilistic models this is the gradient of the negative log_likelihood (including the MAP prior), so we return it here. If your model is not probabilistic, just return your negative gradient here!
-
randomize
(rand_gen=None, *args, **kwargs)¶ Randomize the model. Make this draw from the prior if one exists, else draw from given random generator
Parameters: - rand_gen – np random number generator which takes args and kwargs
- loc (float) – loc parameter for random number generator
- scale (float) – scale parameter for random number generator
- kwargs (args,) – will be passed through to random number generator
GPy.core.sparse_gp module¶
-
class
SparseGP
(X, Y, Z, kernel, likelihood, mean_function=None, X_variance=None, inference_method=None, name='sparse gp', Y_metadata=None, normalizer=False)[source]¶ Bases:
GPy.core.gp.GP
A general purpose Sparse GP model
This model allows (approximate) inference using variational DTC or FITC (Gaussian likelihoods) as well as non-conjugate sparse methods based on these.
This is not for missing data, as the implementation for missing data involves some inefficient optimization routine decisions. See the missing-data SparseGP implementation in GPy.models.sparse_gp_minibatch.SparseGPMiniBatch.
Parameters: - X (np.ndarray (num_data x input_dim)) – inputs
- likelihood (GPy.likelihood.(Gaussian | EP | Laplace)) – a likelihood instance, containing the observed data
- kernel (a GPy.kern.kern instance) – the kernel (covariance function). See link kernels
- X_variance (np.ndarray (num_data x input_dim) | None) – The uncertainty in the measurements of X (Gaussian variance)
- Z (np.ndarray (num_inducing x input_dim)) – inducing inputs
- num_inducing (int) – Number of inducing points (optional, default 10. Ignored if Z is not None)
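In practice this class is usually reached through the end-user wrapper GPy.models.SparseGPRegression rather than directly; a minimal sketch on toy data (illustrative values):

    import numpy as np
    import GPy

    X = np.random.uniform(-3., 3., (500, 1))
    Y = np.sin(X) + np.random.randn(500, 1) * 0.05

    # 20 inducing inputs (Z is initialised from the data when not supplied)
    m = GPy.models.SparseGPRegression(X, Y, kernel=GPy.kern.RBF(1), num_inducing=20)
    m.optimize()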
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
-
plot_inducing
(visible_dims=None, projection='2d', label='inducing', legend=True, **plot_kwargs)¶ Plot the inducing inputs of a sparse GP model
Parameters: - visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
- plot_kwargs (kwargs) – keyword arguments for the plotting library
GPy.core.sparse_gp_mpi module¶
-
class
SparseGP_MPI
(X, Y, Z, kernel, likelihood, variational_prior=None, mean_function=None, inference_method=None, name='sparse gp', Y_metadata=None, mpi_comm=None, normalizer=False)[source]¶ Bases:
GPy.core.sparse_gp.SparseGP
A general purpose Sparse GP model with MPI parallelization support
This model allows (approximate) inference using variational DTC or FITC (Gaussian likelihoods) as well as non-conjugate sparse methods based on these.
Parameters: - X (np.ndarray (num_data x input_dim)) – inputs
- likelihood (GPy.likelihood.(Gaussian | EP | Laplace)) – a likelihood instance, containing the observed data
- kernel (a GPy.kern.kern instance) – the kernel (covariance function). See link kernels
- X_variance (np.ndarray (num_data x input_dim) | None) – The uncertainty in the measurements of X (Gaussian variance)
- Z (np.ndarray (num_inducing x input_dim)) – inducing inputs
- num_inducing (int) – Number of inducing points (optional, default 10. Ignored if Z is not None)
- mpi_comm (mpi4py.MPI.Intracomm) – The communication group of MPI, e.g. mpi4py.MPI.COMM_WORLD
-
optimize
(optimizer=None, start=None, **kwargs)[source]¶ Optimize the model using self.log_likelihood and self.log_likelihood_gradient, as well as self.priors. kwargs are passed to the optimizer. They can be:
Parameters: - max_iters (int) – maximum number of function evaluations
- messages (bool) – whether to display during optimisation
- optimizer (string) – which optimizer to use (defaults to self.preferred_optimizer); a range of optimizers can be found in GPy.inference.optimization, they include 'scg', 'lbfgs', 'tnc'.
- ipython_notebook (bool) – whether to use ipython notebook widgets or not.
- clear_after_finish (bool) – if in ipython notebook, we can clear the widgets after optimization.
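For example (assuming an already-constructed model m), these keyword arguments are passed straight through to optimize:

    # 'lbfgs' and 'scg' are among the optimizers GPy ships with
    m.optimize(optimizer='lbfgs', max_iters=500, messages=True)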
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
-
optimizer_array
¶ Array for the optimizer to work on. This array always lives in the space the optimizer works in, i.e. parameter values are mapped into it through their Transformations.
Setting this array makes sure the transformed parameters of this model are set accordingly. It has to be set with an array retrieved from this property, as e.g. fixing parameters will resize the array.
The optimizer should only interact with this array, so that the transformations remain consistent.
GPy.core.svgp module¶
-
class
SVGP
(X, Y, Z, kernel, likelihood, mean_function=None, name='SVGP', Y_metadata=None, batchsize=None, num_latent_functions=None)[source]¶ Bases:
GPy.core.sparse_gp.SparseGP
Stochastic Variational GP.
For Gaussian Likelihoods, this implements
Gaussian Processes for Big Data, Hensman, Fusi and Lawrence, UAI 2013,
but without natural gradients. We use the lower-triangular representation of the covariance matrix to ensure positive-definiteness.
For Non Gaussian Likelihoods, this implements
Hensman, Matthews and Ghahramani, Scalable Variational GP Classification, ArXiv 1411.2005
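A minimal sketch of constructing an SVGP classifier on toy data (the data, inducing-input choice and batch size are purely illustrative):

    import numpy as np
    import GPy
    from GPy.core.svgp import SVGP

    # toy binary classification data
    N = 1000
    X = np.random.rand(N, 1)
    Y = (np.sin(X * 6.) > 0.).astype(float)   # labels in {0, 1}

    Z = np.random.rand(20, 1)                 # inducing inputs
    m = SVGP(X, Y, Z,
             kernel=GPy.kern.RBF(1),
             likelihood=GPy.likelihoods.Bernoulli(),
             batchsize=100)
    m.optimize(max_iters=200, messages=True)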
-
new_batch
()[source]¶ Return a new batch of X and Y by taking a chunk of data from the complete X and Y
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
-
GPy.core.symbolic module¶
GPy.models package¶
Introduction¶
This package principally contains classes ultimately inherited from GPy.core.gp.GP
intended as models for end user consumption - much of GPy.core.gp.GP
is not intended to be called directly. The general form of a “model” is a function that takes some data, a kernel (see GPy.kern
) and other parameters, returning an object representation.
Several models directly inherit GPy.core.gp.GP
:

Some models fall into conceptually related groups of models (e.g. GPy.core.sparse_gp
, GPy.core.sparse_gp_mpi
):

In some cases one end-user model inherits from another, e.g.:

Submodules¶
GPy.models.bayesian_gplvm module¶
-
class
BayesianGPLVM
(Y, input_dim, X=None, X_variance=None, init='PCA', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='bayesian gplvm', mpi_comm=None, normalizer=None, missing_data=False, stochastic=False, batchsize=1, Y_metadata=None)[source]¶ Bases:
GPy.core.sparse_gp_mpi.SparseGP_MPI
Bayesian Gaussian Process Latent Variable Model
Parameters: - Y (np.ndarray| GPy.likelihood instance) – observed data (np.ndarray) or GPy.likelihood
- input_dim (int) – latent dimensionality
- init ('PCA'|'random') – initialisation method for the latent space
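A minimal sketch (toy data, illustrative dimensions) of fitting the model and inspecting the learned latent space:

    import numpy as np
    import GPy

    Y = np.random.randn(100, 12)  # toy high-dimensional observations

    # learn a 3-dimensional latent space with 10 inducing points
    m = GPy.models.BayesianGPLVM(Y, input_dim=3, num_inducing=10)
    m.optimize(messages=True)
    m.plot_latent()  # requires a plotting backend such as matplotlib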
-
get_X_gradients
(X)[source]¶ Get the gradients of the posterior distribution of X in its specific form.
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
-
plot_inducing
(which_indices=None, legend=False, plot_limits=None, marker=None, projection='2d', **kwargs)¶ Plot a scatter plot of the inducing inputs.
Parameters: - which_indices ([int]) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – marker to use [default is custom arrow like]
- kwargs – the kwargs for the scatter plots
- projection (str) – for now 2d or 3d projection (other projections can be implemented, see developer documentation)
-
plot_latent
(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', scatter_kwargs=None, **imshow_kwargs)¶ Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (Kern) – the kernel to use for prediction
- marker (str) – markers to use - cycle if more labels than markers are given
- num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- scatter_kwargs – the kwargs for the scatter plots
-
plot_scatter
(labels=None, which_indices=None, legend=True, plot_limits=None, marker='<>^vsd', num_samples=1000, projection='2d', **kwargs)¶ Plot a scatter plot of the latent space.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – markers to use - cycle if more labels than markers are given
- kwargs – the kwargs for the scatter plots
-
plot_steepest_gradient_map
(output_labels=None, data_labels=None, which_indices=None, resolution=15, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, annotation_kwargs=None, scatter_kwargs=None, **imshow_kwargs)¶ Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure, if int plot legend columns on legend
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (Kern) – the kernel to use for prediction
- marker (str) – markers to use - cycle if more labels than markers are given
- num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- annotation_kwargs – the kwargs for the annotation plot
- scatter_kwargs – the kwargs for the scatter plots
GPy.models.bayesian_gplvm_minibatch module¶
-
class
BayesianGPLVMMiniBatch
(Y, input_dim, X=None, X_variance=None, init='PCA', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='bayesian gplvm', normalizer=None, missing_data=False, stochastic=False, batchsize=1)[source]¶ Bases:
GPy.models.sparse_gp_minibatch.SparseGPMiniBatch
Bayesian Gaussian Process Latent Variable Model
Parameters: - Y (np.ndarray| GPy.likelihood instance) – observed data (np.ndarray) or GPy.likelihood
- input_dim (int) – latent dimensionality
- init ('PCA'|'random') – initialisation method for the latent space
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
-
plot_inducing
(which_indices=None, legend=False, plot_limits=None, marker=None, projection='2d', **kwargs)¶ Plot a scatter plot of the inducing inputs.
Parameters: - which_indices ([int]) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – marker to use [default is custom arrow like]
- kwargs – the kwargs for the scatter plots
- projection (str) – for now 2d or 3d projection (other projections can be implemented, see developer documentation)
-
plot_latent
(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', scatter_kwargs=None, **imshow_kwargs)¶ Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (Kern) – the kernel to use for prediction
- marker (str) – markers to use - cycle if more labels than markers are given
- num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- scatter_kwargs – the kwargs for the scatter plots
-
plot_scatter
(labels=None, which_indices=None, legend=True, plot_limits=None, marker='<>^vsd', num_samples=1000, projection='2d', **kwargs)¶ Plot a scatter plot of the latent space.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – markers to use - cycle if more labels than markers are given
- kwargs – the kwargs for the scatter plots
-
plot_steepest_gradient_map
(output_labels=None, data_labels=None, which_indices=None, resolution=15, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, annotation_kwargs=None, scatter_kwargs=None, **imshow_kwargs)¶ Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure, if int plot legend columns on legend
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (Kern) – the kernel to use for prediction
- marker (str) – markers to use - cycle if more labels than markers are given
- num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- annotation_kwargs – the kwargs for the annotation plot
- scatter_kwargs – the kwargs for the scatter plots
GPy.models.bcgplvm module¶
-
class
BCGPLVM
(Y, input_dim, kernel=None, mapping=None)[source]¶ Bases:
GPy.models.gplvm.GPLVM
Back constrained Gaussian Process Latent Variable Model
Parameters: - Y (np.ndarray) – observed data
- input_dim (int) – latent dimensionality
- mapping (GPy.core.Mapping object) – mapping for back constraint
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
GPy.models.dpgplvm module¶
-
class
DPBayesianGPLVM
(Y, input_dim, X_prior, X=None, X_variance=None, init='PCA', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='bayesian gplvm', mpi_comm=None, normalizer=None, missing_data=False, stochastic=False, batchsize=1)[source]¶ Bases:
GPy.models.bayesian_gplvm.BayesianGPLVM
Bayesian Gaussian Process Latent Variable Model with Discriminative prior
GPy.models.gp_classification module¶
-
class
GPClassification
(X, Y, kernel=None, Y_metadata=None, mean_function=None, inference_method=None, likelihood=None, normalizer=False)[source]¶ Bases:
GPy.core.gp.GP
Gaussian Process classification
This is a thin wrapper around the models.GP class, with a set of sensible defaults
Parameters: - X – input observations
- Y – observed values, can be None if likelihood is not None
- kernel – a GPy kernel, defaults to rbf
- likelihood – a GPy likelihood, defaults to Bernoulli
- inference_method (
GPy.inference.latent_function_inference.LatentFunctionInference
) – Latent function inference to use, defaults to EP
Note
Multiple independent outputs are allowed using columns of Y
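A minimal sketch on toy 1-D data (illustrative only); the defaults give an RBF kernel, a Bernoulli likelihood and EP inference:

    import numpy as np
    import GPy

    X = np.random.uniform(-3., 3., (50, 1))
    Y = (np.sin(X) > 0.).astype(float)  # labels in {0, 1} for the Bernoulli likelihood

    m = GPy.models.GPClassification(X, Y)
    m.optimize()
    probs, _ = m.predict(X)  # predictive class probabilities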
-
static
from_dict
(input_dict, data=None)[source]¶ Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. If needed, please override _build_from_input_dict instead.
Parameters: input_dict (dict) – Dictionary with all the information needed to instantiate the object.
-
to_dict
(save_data=True)[source]¶ Convert the object into a json serializable dictionary. Note: It uses the private method _save_to_input_dict of the parent.
Parameters: save_data (boolean) – if true, it adds the training data self.X and self.Y to the dictionary
Return dict: json serializable dictionary containing the needed information to instantiate the object
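A sketch of the serialization round trip using these two methods (continuing the hypothetical classification model m from the sketch above; json is only one possible storage format):

    import json

    d = m.to_dict(save_data=True)   # json-serializable dict, including self.X and self.Y
    s = json.dumps(d)               # e.g. write this string to disk

    m2 = GPy.models.GPClassification.from_dict(json.loads(s))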
GPy.models.gp_coregionalized_regression module¶
-
class
GPCoregionalizedRegression
(X_list, Y_list, kernel=None, likelihoods_list=None, name='GPCR', W_rank=1, kernel_name='coreg')[source]¶ Bases:
GPy.core.gp.GP
Gaussian Process model for heteroscedastic multioutput regression
This is a thin wrapper around the models.GP class, with a set of sensible defaults
Parameters: - X_list (list of numpy arrays) – list of input observations corresponding to each output
- Y_list (list of numpy arrays) – list of observed values related to the different noise models
- kernel (None | GPy.kernel defaults) – a GPy kernel ** Coregionalized, defaults to RBF ** Coregionalized
- name (string) – model name
- W_rank (integer) – number of tuples of the coregionalization parameters 'W' (see coregionalize kernel documentation)
- kernel_name (string) – name of the kernel
Likelihoods_list: a list of likelihoods, defaults to list of Gaussian likelihoods
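A minimal construction sketch for two related outputs observed at different inputs (toy data; note that prediction with this model additionally requires the output index appended to X and passed through Y_metadata, see the GPy coregionalized regression tutorial):

    import numpy as np
    import GPy

    X1 = np.random.rand(40, 1) * 10.
    X2 = np.random.rand(30, 1) * 10.
    Y1 = np.sin(X1) + np.random.randn(40, 1) * 0.05
    Y2 = np.sin(X2) + 0.5 + np.random.randn(30, 1) * 0.05

    # default kernel: RBF combined with a Coregionalize kernel over the two outputs
    m = GPy.models.GPCoregionalizedRegression([X1, X2], [Y1, Y2])
    m.optimize()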
GPy.models.gp_grid_regression module¶
-
class
GPRegressionGrid
(X, Y, kernel=None, Y_metadata=None, normalizer=None)[source]¶ Bases:
GPy.core.gp_grid.GpGrid
Gaussian Process model for grid inputs using Kronecker products
This is a thin wrapper around the models.GpGrid class, with a set of sensible defaults
Parameters: - X – input observations
- Y – observed values
- kernel – a GPy kernel, defaults to the kron variation of SqExp
- normalizer (Norm) – [default False] Normalize Y with the norm given. If normalizer is False, no normalization will be done. If it is None, we use GaussianNorm(alization).
Note
Multiple independent outputs are allowed using columns of Y
GPy.models.gp_heteroscedastic_regression module¶
-
class
GPHeteroscedasticRegression
(X, Y, kernel=None, Y_metadata=None)[source]¶ Bases:
GPy.core.gp.GP
Gaussian Process model for heteroscedastic regression
This is a thin wrapper around the models.GP class, with a set of sensible defaults
Parameters: - X – input observations
- Y – observed values
- kernel – a GPy kernel, defaults to rbf
NB: This model does not make inference on the noise outside the training set
GPy.models.gp_kronecker_gaussian_regression module¶
-
class
GPKroneckerGaussianRegression
(X1, X2, Y, kern1, kern2, noise_var=1.0, name='KGPR')[source]¶ Bases:
GPy.core.model.Model
Kronecker GP regression
Take two kernels computed on separate spaces K1(X1), K2(X2), and a data matrix Y which is of size (N1, N2).
The effective covariance is np.kron(K2, K1). The effective data is vec(Y) = Y.flatten(order='F').
The noise must be iid Gaussian.
See [stegle_et_al_2011].
References
[stegle_et_al_2011] Stegle, O.; Lippert, C.; Mooij, J.M.; Lawrence, N.D.; Borgwardt, K.: Efficient inference in matrix-variate Gaussian models with iid observation noise. In: Advances in Neural Information Processing Systems, 2011, pp. 630-638.
-
parameters_changed
()[source]¶ This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer.
-
predict
(X1new, X2new)[source]¶ Return the predictive mean and variance at a series of new points X1new, X2new. Only the diagonal of the predictive variance is returned, for now.
Parameters: - X1new (np.ndarray, Nnew x self.input_dim1) – The points at which to make a prediction
- X2new (np.ndarray, Nnew x self.input_dim2) – The points at which to make a prediction
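A minimal sketch with a toy function observed on a 2-D grid (data and names illustrative):

    import numpy as np
    import GPy

    x1 = np.linspace(0., 10., 30)[:, None]
    x2 = np.linspace(0., 5., 20)[:, None]
    Y = np.sin(x1) + np.cos(x2).T  # (30, 20) matrix of observations on the grid

    m = GPy.models.GPKroneckerGaussianRegression(x1, x2, Y,
                                                 GPy.kern.RBF(1), GPy.kern.RBF(1))
    m.optimize()
    mu, var = m.predict(x1, x2)  # diagonal of the predictive variance only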
-
GPy.models.gp_multiout_regression module¶
-
class
GPMultioutRegression
(X, Y, Xr_dim, kernel=None, kernel_row=None, Z=None, Z_row=None, X_row=None, Xvariance_row=None, num_inducing=(10, 10), qU_var_r_W_dim=None, qU_var_c_W_dim=None, init='GP', name='GPMR')[source]¶ Bases:
GPy.core.sparse_gp.SparseGP
Gaussian Process model for multi-output regression without missing data
This is an implementation of Latent Variable Multiple Output Gaussian Processes (LVMOGP) in [Dai_et_al_2017].
References
[Dai_et_al_2017] Dai, Z.; Alvarez, M.A.; Lawrence, N.D.: Efficient Modeling of Latent Information in Supervised Learning using Gaussian Processes. In NIPS, 2017.
Parameters: - X (numpy.ndarray) – input observations.
- Y (numpy.ndarray) – output observations, each column corresponding to an output dimension.
- Xr_dim (int) – the dimensionality of the latent space in which output dimensions are embedded
- kernel (GPy.kern.Kern or None) – a GPy kernel for GP of individual output dimensions ** defaults to RBF **
- kernel_row (GPy.kern.Kern or None) – a GPy kernel for the GP of the latent space ** defaults to RBF **
- Z (numpy.ndarray or None) – inducing inputs
- Z_row (numpy.ndarray or None) – inducing inputs for the latent space
- X_row (numpy.ndarray or None) – the initial value of the mean of the variational posterior distribution of points in the latent space
- Xvariance_row (numpy.ndarray or None) – the initial value of the variance of the variational posterior distribution of points in the latent space
- num_inducing ((int, int)) – a tuple (M, Mr). M is the number of inducing points for GP of individual output dimensions. Mr is the number of inducing points for the latent space.
- qU_var_r_W_dim (int) – the dimensionality of the covariance of q(U) for the latent space. If it is smaller than the number of inducing points, it represents a low-rank parameterization of the covariance matrix.
- qU_var_c_W_dim (int) – the dimensionality of the covariance of q(U) for the GP regression. If it is smaller than the number of inducing points, it represents a low-rank parameterization of the covariance matrix.
- init (str) – the choice of initialization: ‘GP’ or ‘rand’. With ‘rand’, the model is initialized randomly. With ‘GP’, the model is initialized through a protocol as follows: (1) fits a sparse GP (2) fits a BGPLVM based on the outcome of sparse GP (3) initialize the model based on the outcome of the BGPLVM.
- name (str) – the name of the model
-
optimize_auto
(max_iters=10000, verbose=True)[source]¶ Optimize the model parameters through a pre-defined protocol.
Parameters: - max_iters (int) – the maximum number of iterations.
- verbose (boolean) – print the progress of optimization or not.
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
GPy.models.gp_multiout_regression_md module¶
-
class
GPMultioutRegressionMD
(X, Y, indexD, Xr_dim, kernel=None, kernel_row=None, Z=None, Z_row=None, X_row=None, Xvariance_row=None, num_inducing=(10, 10), qU_var_r_W_dim=None, qU_var_c_W_dim=None, init='GP', heter_noise=False, name='GPMRMD')[source]¶ Bases:
GPy.core.sparse_gp.SparseGP
Gaussian Process model for multi-output regression with missing data
This is an implementation of Latent Variable Multiple Output Gaussian Processes (LVMOGP) in [Dai_et_al_2017]. This model targets the use case in which each output dimension is observed at a different set of inputs. The model takes a different data format: the input and output observations of all the output dimensions are stacked together correspondingly into two matrices. An extra array is used to indicate the index of the output dimension for each data point. The output dimensions are indexed using integers from 0 to D-1, assuming there are D output dimensions.
References
[Dai_et_al_2017] Dai, Z.; Alvarez, M.A.; Lawrence, N.D.: Efficient Modeling of Latent Information in Supervised Learning using Gaussian Processes. In NIPS, 2017.
Parameters: - X (numpy.ndarray) – input observations.
- Y (numpy.ndarray) – output observations, each column corresponding to an output dimension.
- indexD (numpy.ndarray) – the array containing the index of output dimension for each data point
- Xr_dim (int) – the dimensionality of the latent space in which output dimensions are embedded
- kernel (GPy.kern.Kern or None) – a GPy kernel for GP of individual output dimensions ** defaults to RBF **
- kernel_row (GPy.kern.Kern or None) – a GPy kernel for the GP of the latent space ** defaults to RBF **
- Z (numpy.ndarray or None) – inducing inputs
- Z_row (numpy.ndarray or None) – inducing inputs for the latent space
- X_row (numpy.ndarray or None) – the initial value of the mean of the variational posterior distribution of points in the latent space
- Xvariance_row (numpy.ndarray or None) – the initial value of the variance of the variational posterior distribution of points in the latent space
- num_inducing ((int, int)) – a tuple (M, Mr). M is the number of inducing points for GP of individual output dimensions. Mr is the number of inducing points for the latent space.
- qU_var_r_W_dim (int) – the dimensionality of the covariance of q(U) for the latent space. If it is smaller than the number of inducing points, it represents a low-rank parameterization of the covariance matrix.
- qU_var_c_W_dim (int) – the dimensionality of the covariance of q(U) for the GP regression. If it is smaller than the number of inducing points, it represents a low-rank parameterization of the covariance matrix.
- init (str) – the choice of initialization: ‘GP’ or ‘rand’. With ‘rand’, the model is initialized randomly. With ‘GP’, the model is initialized through a protocol as follows: (1) fits a sparse GP (2) fits a BGPLVM based on the outcome of sparse GP (3) initialize the model based on the outcome of the BGPLVM.
- heter_noise (boolean) – whether assuming heteroscedastic noise in the model, boolean
- name (str) – the name of the model
-
optimize_auto
(max_iters=10000, verbose=True)[source]¶ Optimize the model parameters through a pre-defined protocol.
Parameters: - max_iters (int) – the maximum number of iterations.
- verbose (boolean) – print the progress of optimization or not.
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
GPy.models.gp_offset_regression module¶
-
class
GPOffsetRegression
(X, Y, kernel=None, Y_metadata=None, normalizer=None, noise_var=1.0, mean_function=None)[source]¶ Bases:
GPy.core.gp.GP
Gaussian Process model for offset regression
Parameters: - X – input observations, we assume for this class that this has one dimension of actual inputs and the last dimension should be the index of the cluster (so X should be Nx2)
- Y – observed values (Nx1?)
- kernel – a GPy kernel, defaults to rbf
- normalizer (Norm) – [default False] Normalize Y with the norm given. If normalizer is False, no normalization will be done. If it is None, we use GaussianNorm(alization).
- noise_var – the noise variance for the Gaussian likelihood, defaults to 1.
Note
Multiple independent outputs are allowed using columns of Y
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
GPy.models.gp_regression module¶
-
class
GPRegression
(X, Y, kernel=None, Y_metadata=None, normalizer=None, noise_var=1.0, mean_function=None)[source]¶ Bases:
GPy.core.gp.GP
Gaussian Process model for regression
This is a thin wrapper around the models.GP class, with a set of sensible defaults
Parameters: - X – input observations
- Y – observed values
- kernel – a GPy kernel, defaults to rbf
- normalizer (Norm) – [default False] Normalize Y with the norm given. If normalizer is False, no normalization will be done. If it is None, we use GaussianNorm(alization).
- noise_var – the noise variance for the Gaussian likelihood, defaults to 1.
Note
Multiple independent outputs are allowed using columns of Y
-
to_dict
(save_data=True)[source]¶ Convert the object into a json serializable dictionary. Note: It uses the private method _save_to_input_dict of the parent.
Parameters: save_data (boolean) – if true, it adds the training data self.X and self.Y to the dictionary
Return dict: json serializable dictionary containing the needed information to instantiate the object
GPy.models.gp_var_gauss module¶
-
class
GPVariationalGaussianApproximation
(X, Y, kernel, likelihood, Y_metadata=None)[source]¶ Bases:
GPy.core.gp.GP
The Variational Gaussian Approximation revisited
References
[opper_archambeau_2009] Opper, M.; Archambeau, C.; The Variational Gaussian Approximation Revisited. Neural Comput. 2009, pages 786-792.
GPy.models.gplvm module¶
-
class
GPLVM
(Y, input_dim, init='PCA', X=None, kernel=None, name='gplvm', Y_metadata=None, normalizer=False)[source]¶ Bases:
GPy.core.gp.GP
Gaussian Process Latent Variable Model
Parameters: - Y (np.ndarray) – observed data
- input_dim (int) – latent dimensionality
- init ('PCA'|'random') – initialisation method for the latent space
- normalizer (bool) – normalize the outputs Y. If normalizer is True, we will normalize using Standardize. If normalizer is False (the default), no normalization will be done.
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
-
plot_inducing
(which_indices=None, legend=False, plot_limits=None, marker=None, projection='2d', **kwargs)¶ Plot a scatter plot of the inducing inputs.
Parameters: - which_indices ([int]) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – marker to use [default is custom arrow like]
- kwargs – the kwargs for the scatter plots
- projection (str) – for now 2d or 3d projection (other projections can be implemented, see developer documentation)
-
plot_latent
(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', scatter_kwargs=None, **imshow_kwargs)¶ Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (Kern) – the kernel to use for prediction
- marker (str) – markers to use - cycle if more labels than markers are given
- num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- scatter_kwargs – the kwargs for the scatter plots
-
plot_scatter
(labels=None, which_indices=None, legend=True, plot_limits=None, marker='<>^vsd', num_samples=1000, projection='2d', **kwargs)¶ Plot a scatter plot of the latent space.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – markers to use - cycle if more labels than markers are given
- kwargs – the kwargs for the scatter plots
-
plot_steepest_gradient_map
(output_labels=None, data_labels=None, which_indices=None, resolution=15, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, annotation_kwargs=None, scatter_kwargs=None, **imshow_kwargs)¶ Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure, if int plot legend columns on legend
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (Kern) – the kernel to use for prediction
- marker (str) – markers to use - cycle if more labels than markers are given
- num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- annotation_kwargs – the kwargs for the annotation plot
- scatter_kwargs – the kwargs for the scatter plots
GPy.models.gradient_checker module¶
-
class
GradientChecker
(f, df, x0, names=None, *args, **kwargs)[source]¶ Bases:
GPy.core.model.Model
Parameters: - f – Function to check gradient for
- df – Gradient of function to check
- x0 ([array-like] | array-like | float | int) – Initial guess for inputs x (if it has a shape (a,b) this will be reflected in the parameter names). Can be a list of arrays, if f takes a list of arrays. This list will be passed to f and df in the same order as given here. If there is only one argument, make sure not to pass a list!!!
- names – Names to print, when performing gradcheck. If a list was passed to x0 a list of names with the same length is expected.
- args – Arguments passed as f(x, *args, **kwargs) and df(x, *args, **kwargs)
Examples
Initialisation:
    import numpy
    import GPy
    from GPy.models import GradientChecker
    N, M, Q = 10, 5, 3
Sinusoid:
    X = numpy.random.rand(N, Q)
    grad = GradientChecker(numpy.sin, numpy.cos, X, 'x')
    grad.checkgrad(verbose=1)
Using GPy:
    X, Z = numpy.random.randn(N, Q), numpy.random.randn(M, Q)
    kern = GPy.kern.linear(Q, ARD=True) + GPy.kern.rbf(Q, ARD=True)
    grad = GradientChecker(kern.K,
                           lambda x: 2*kern.dK_dX(numpy.ones((1, 1)), x),
                           x0=X.copy(),
                           names='X')
    grad.checkgrad(verbose=1)
    grad.randomize()
    grad.checkgrad(verbose=1)
-
class
HessianChecker
(f, df, ddf, x0, names=None, *args, **kwargs)[source]¶ Bases:
GPy.models.gradient_checker.GradientChecker
Parameters: - f – Function (only used for numerical hessian gradient)
- df – Gradient of function to check
- ddf – Analytical gradient function
- x0 ([array-like] | array-like | float | int) – Initial guess for inputs x (if it has a shape (a,b) this will be reflected in the parameter names). Can be a list of arrays, if f takes a list of arrays. This list will be passed to f and df in the same order as given here. If there is only one argument, make sure not to pass a list!!!
- names – Names to print, when performing gradcheck. If a list was passed to x0 a list of names with the same length is expected.
- args – Arguments passed as f(x, *args, **kwargs) and df(x, *args, **kwargs)
-
checkgrad
(target_param=None, verbose=False, step=1e-06, tolerance=0.001, block_indices=None, plot=False)[source]¶ Overwrite checkgrad method to check whole block instead of looping through
Shows diagnostics using matshow instead
Parameters: - verbose (bool) – If True, print a “full” checking of each parameter
- step (float (default 1e-6)) – The size of the step around which to linearise the objective
- tolerance (float (default 1e-3)) – the tolerance allowed (see note)
Note: The gradient is considered correct if the ratio of the analytical and numerical gradients is within <tolerance> of unity.
-
class
SkewChecker
(df, ddf, dddf, x0, names=None, *args, **kwargs)[source]¶ Bases:
GPy.models.gradient_checker.HessianChecker
Parameters: - df – gradient of function
- ddf – Gradient of function to check (hessian)
- dddf – Analytical gradient function (third derivative)
- x0 ([array-like] | array-like | float | int) – Initial guess for inputs x (if it has a shape (a,b) this will be reflected in the parameter names). Can be a list of arrays, if df takes a list of arrays. This list will be passed to f and df in the same order as given here. If there is only one argument, make sure not to pass a list!!!
- names – Names to print, when performing gradcheck. If a list was passed to x0 a list of names with the same length is expected.
- args – Arguments passed as f(x, *args, **kwargs) and df(x, *args, **kwargs)
GPy.models.ibp_lfm module¶
-
class
IBPLFM
(X, Y, input_dim=2, output_dim=1, rank=1, Gamma=None, num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='IBP for LFM', alpha=2.0, beta=2.0, connM=None, tau=None, mpi_comm=None, normalizer=False, variational_prior=None, **kwargs)[source]¶ Bases:
GPy.core.sparse_gp_mpi.SparseGP_MPI
Indian Buffet Process for Latent Force Models
Parameters: - Y (np.ndarray| GPy.likelihood instance) – observed data (np.ndarray) or GPy.likelihood
- X (np.ndarray) – input data (np.ndarray) [X:values, X:index], index refers to the number of the output
- input_dim (int) – latent dimensionality
- rank – number of latent functions
-
get_Zp_gradients
(Zp)[source]¶ Get the gradients of the posterior distribution of Zp in its specific form.
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
-
class
IBPPosterior
(binary_prob, tau=None, name='Sensitivity space', *a, **kw)[source]¶ Bases:
GPy.core.parameterization.parameterized.Parameterized
The IBP distribution for variational approximations.
binary_prob : the probability of including a latent function over an output.
-
class
IBPPrior
(rank, alpha=2.0, name='IBPPrior', **kw)[source]¶ Bases:
GPy.core.parameterization.variational.VariationalPrior
-
class
VarDTC_minibatch_IBPLFM
(batchsize=None, limit=3, mpi_comm=None)[source]¶ Bases:
GPy.inference.latent_function_inference.var_dtc_parallel.VarDTC_minibatch
Modifications of VarDTC_minibatch for IBP LFM
GPy.models.input_warped_gp module¶
-
class
InputWarpedGP
(X, Y, kernel=None, normalizer=False, warping_function=None, warping_indices=None, Xmin=None, Xmax=None, epsilon=None)[source]¶ Bases:
GPy.core.gp.GP
Input Warped GP
This defines a GP model that applies a warping function to the Input. By default, it uses Kumar Warping (CDF of Kumaraswamy distribution)
X : array_like, shape = (n_samples, n_features) for input data
Y : array_like, shape = (n_samples, 1) for output data
- kernel : object, optional
- An instance of a kernel function defined in GPy.kern. Defaults to Matern32
- warping_function : object, optional
- An instance of a warping function defined in GPy.util.input_warping_functions. Defaults to KumarWarping
- warping_indices : list of int, optional
- A list of indices of which features in X should be warped. It is used in the Kumar warping function
- normalizer : bool, optional
- A bool variable indicating whether to normalize the output
- Xmin : list of float, optional
- The min values for every feature in X. It is used in the Kumar warping function
- Xmax : list of float, optional
- The max values for every feature in X. It is used in the Kumar warping function
- epsilon : float, optional
- We normalize X to [0+e, 1-e]. If not given, the default value defined in the KumarWarping function is used
- X_untransformed : array_like, shape = (n_samples, n_features)
- A copy of the original input X
- X_warped : array_like, shape = (n_samples, n_features)
- Input data after warping
- warping_function : object, optional
- An instance of a warping function defined in GPy.util.input_warping_functions. Defaults to KumarWarping
Kumar warping uses the CDF of Kumaraswamy distribution. More on the Kumaraswamy distribution can be found at the wiki page: https://en.wikipedia.org/wiki/Kumaraswamy_distribution
Snoek, J.; Swersky, K.; Zemel, R. S. & Adams, R. P. Input Warping for Bayesian Optimization of Non-stationary Functions preprint arXiv:1402.0929, 2014
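A minimal sketch on a toy non-stationary 1-D function (illustrative data; by default a Kumar warping function is applied to the inputs):

    import numpy as np
    import GPy

    X = np.random.uniform(0., 1., (60, 1))
    Y = np.sin(1. / (X + 0.1)) + np.random.randn(60, 1) * 0.05  # varies quickly near 0

    m = GPy.models.InputWarpedGP(X, Y, kernel=GPy.kern.Matern32(1))
    m.optimize()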
-
log_likelihood
()[source]¶ Compute the marginal log likelihood
For input warping, just use the normal GP log likelihood
-
parameters_changed
()[source]¶ Update the gradients of parameters for warping function
This method is called when having new values of parameters for warping function, kernels and other parameters in a normal GP
GPy.models.mrd module¶
-
class
MRD
(Ylist, input_dim, X=None, X_variance=None, initx='PCA', initz='permute', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihoods=None, name='mrd', Ynames=None, normalizer=False, stochastic=False, batchsize=10)[source]¶ Bases:
GPy.models.bayesian_gplvm_minibatch.BayesianGPLVMMiniBatch
!WARNING: This is bleeding edge code and still in development. Functionality may change fundamentally during development!
Apply MRD to all given datasets Y in Ylist.
Y_i in [n x p_i]
If Ylist is a dictionary, the keys of the dictionary are the names, and the values are the different datasets to compare.
The samples n in the datasets need to match up, whereas the dimensionality p_d can differ.
Parameters: - Ylist ([array-like]) – List of datasets to apply MRD on
- input_dim (int) – latent dimensionality
- X (array-like) – mean of starting latent space q in [n x q]
- X_variance (array-like) – variance of starting latent space q in [n x q]
- initx (['concat'|'single'|'random']) – initialisation method for the latent space:
- 'concat' – PCA on concatenation of all datasets
- 'single' – concatenation of PCA on datasets, respectively
- 'random' – random draw from a Normal(0,1)
- initz ('permute'|'random') – initialisation method for inducing inputs
- num_inducing – number of inducing inputs to use
- Z – initial inducing inputs
- kernel ([GPy.kernels.kernels] | GPy.kernels.kernels | None (default)) – list of kernels or kernel to copy for each output
- inference_method (GPy.inference.latent_function_inference) – InferenceMethodList of inferences, or one inference method for all
- likelihoods – the likelihoods to use
- name (str) – the name of this model
- Ynames ([str]) – the names for the datasets given, must be of equal length as Ylist or None
- normalizer (bool|Norm) – how to normalize the data
- stochastic (bool) – should this model be using stochastic gradient descent over the dimensions?
- batchsize (bool|[bool]) – either one batchsize for all, or one batchsize per dataset.
-
factorize_space
(threshold=0.005, printOut=False, views=None)[source]¶ Given a trained MRD model, this function looks at the optimized ARD weights (lengthscales) and decides which part of the latent space is shared across views or private, according to a threshold. The threshold is applied after all weights are normalized so that the maximum value is 1.
-
log_likelihood
()[source]¶ The log marginal likelihood of the model, \(p(\mathbf{y})\), this is the objective function of the model being optimised
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
-
plot_latent
(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', predict_kwargs={}, scatter_kwargs=None, **imshow_kwargs)[source]¶ see plotting.matplot_dep.dim_reduction_plots.plot_latent if predict_kwargs is None, will plot latent spaces for 0th dataset (and kernel), otherwise give predict_kwargs=dict(Yindex=’index’) for plotting only the latent space of dataset with ‘index’.
GPy.models.multioutput_gp module¶
-
class
MultioutputGP
(X_list, Y_list, kernel_list, likelihood_list, name='multioutputgp', kernel_cross_covariances={}, inference_method=None)[source]¶ Bases:
GPy.core.gp.GP
Gaussian process model for using observations from multiple likelihoods and different kernels
Parameters: - X_list – input observations in a list for each likelihood
- Y_list – output observations in a list for each likelihood
- kernel_list – kernels in a list for each likelihood
- likelihood_list – likelihoods in a list
- kernel_cross_covariances – cross covariances between different likelihoods. See class MultioutputKern for more
- inference_method – the LatentFunctionInference inference method to use for this GP
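A minimal construction sketch, assuming Gaussian likelihoods and RBF kernels for two outputs observed at different inputs (data are synthetic and illustrative):

    import numpy as np
    import GPy

    X1 = np.random.rand(30, 1); Y1 = np.sin(X1) + 0.05 * np.random.randn(30, 1)
    X2 = np.random.rand(20, 1); Y2 = np.cos(X2) + 0.05 * np.random.randn(20, 1)
    m = GPy.models.MultioutputGP(X_list=[X1, X2], Y_list=[Y1, Y2],
                                 kernel_list=[GPy.kern.RBF(1), GPy.kern.RBF(1)],
                                 likelihood_list=[GPy.likelihoods.Gaussian(), GPy.likelihoods.Gaussian()])
    m.optimize()
-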
log_predictive_density
(x_test, y_test, Y_metadata=None)[source]¶ Calculation of the log predictive density
Parameters: - x_test ((Nx1) array) – test locations (x_{*})
- y_test ((Nx1) array) – test observations (y_{*})
- Y_metadata – metadata associated with the test points
-
predict
(Xnew, full_cov=False, Y_metadata=None, kern=None, likelihood=None, include_likelihood=True)[source]¶ Predict the function(s) at the new point(s) Xnew. This includes the likelihood variance added to the predicted underlying function (usually referred to as f).
In order to predict without adding in the likelihood give include_likelihood=False, or refer to self.predict_noiseless().
Parameters: - Xnew (np.ndarray (Nnew x self.input_dim)) – The points at which to make a prediction
- full_cov (bool) – whether to return the full covariance matrix, or just the diagonal
- Y_metadata – metadata about the predicting point to pass to the likelihood
- kern – The kernel to use for prediction (defaults to the model kern). this is useful for examining e.g. subprocesses.
- include_likelihood (bool) – Whether or not to add likelihood noise to the predicted underlying latent function f.
Returns: (mean, var): mean: posterior mean, a Numpy array, Nnew x self.input_dim var: posterior variance, a Numpy array, Nnew x 1 if full_cov=False,
Nnew x Nnew otherwise
If full_cov and self.input_dim > 1, the return shape of var is Nnew x Nnew x self.input_dim. If self.input_dim == 1, the return shape is Nnew x Nnew. This is to allow for different normalizations of the output dimensions.
Note: If you want the predictive quantiles (e.g. 95% confidence interval) use
predict_quantiles
.
-
predict_noiseless
(Xnew, full_cov=False, Y_metadata=None, kern=None)[source]¶ Convenience function to predict the underlying function of the GP (often referred to as f) without adding the likelihood variance on the prediction function.
This is most likely what you want to use for your predictions.
Parameters: - Xnew (np.ndarray (Nnew x self.input_dim)) – The points at which to make a prediction
- full_cov (bool) – whether to return the full covariance matrix, or just the diagonal
- Y_metadata – metadata about the predicting point to pass to the likelihood
- kern – The kernel to use for prediction (defaults to the model kern). this is useful for examining e.g. subprocesses.
Returns: (mean, var): mean: posterior mean, a Numpy array, Nnew x self.input_dim; var: posterior variance, a Numpy array, Nnew x 1 if full_cov=False, Nnew x Nnew otherwise
If full_cov and self.input_dim > 1, the return shape of var is Nnew x Nnew x self.input_dim. If self.input_dim == 1, the return shape is Nnew x Nnew. This is to allow for different normalizations of the output dimensions.
Note: If you want the predictive quantiles (e.g. 95% confidence interval) use
predict_quantiles
.
-
predict_quantiles
(X, quantiles=(2.5, 97.5), Y_metadata=None, kern=None, likelihood=None)[source]¶ Get the predictive quantiles around the prediction at X
Parameters: - X (np.ndarray (Xnew x self.input_dim)) – The points at which to make a prediction
- quantiles (tuple) – tuple of quantiles, default is (2.5, 97.5) which is the 95% interval
- kern – optional kernel to use for prediction
Returns: list of quantiles for each X and predictive quantiles for interval combination
Return type: [np.ndarray (Xnew x self.output_dim), np.ndarray (Xnew x self.output_dim)]
-
predictive_gradients
(Xnew, kern=None)[source]¶ Compute the derivatives of the predicted latent function with respect to X*. Given a set of points at which to predict X* (size [N*, Q]), compute the derivatives of the mean and variance. Resulting arrays are sized:
- dmu_dX* – [N*, Q, D], where D is the number of outputs in this GP (usually one). Note that this is not the same as computing the mean and variance of the derivative of the function!
- dv_dX* – [N*, Q], (since all outputs have the same variance)
Parameters: X (np.ndarray (Xnew x self.input_dim)) – The points at which to get the predictive gradients Returns: dmu_dX, dv_dX Return type: [np.ndarray (N*, Q ,D), np.ndarray (N*,Q) ]
-
GPy.models.one_vs_all_classification module¶
-
class
OneVsAllClassification
(X, Y, kernel=None, Y_metadata=None, messages=True)[source]¶ Bases:
object
Gaussian Process classification: One vs all
This is a thin wrapper around the models.GPClassification class, with a set of sensible defaults
Parameters: - X – input observations
- Y – observed values, can be None if likelihood is not None
- kernel – a GPy kernel, defaults to rbf
Note
Multiple independent outputs are not allowed
GPy.models.one_vs_all_sparse_classification module¶
-
class
OneVsAllSparseClassification
(X, Y, kernel=None, Y_metadata=None, messages=True, num_inducing=10)[source]¶ Bases:
object
Gaussian Process classification: One vs all
This is a thin wrapper around the models.GPClassification class, with a set of sensible defaults
Parameters: - X – input observations
- Y – observed values, can be None if likelihood is not None
- kernel – a GPy kernel, defaults to rbf
Note
Multiple independent outputs are not allowed
GPy.models.sparse_gp_classification module¶
-
class
SparseGPClassification
(X, Y=None, likelihood=None, kernel=None, Z=None, num_inducing=10, Y_metadata=None, mean_function=None, inference_method=None, normalizer=False)[source]¶ Bases:
GPy.core.sparse_gp.SparseGP
Sparse Gaussian Process model for classification
This is a thin wrapper around the sparse_GP class, with a set of sensible defaults
Parameters: - X – input observations
- Y – observed values
- likelihood – a GPy likelihood, defaults to Bernoulli
- kernel – a GPy kernel, defaults to rbf+white
- inference_method (
GPy.inference.latent_function_inference.LatentFunctionInference
) – Latent function inference to use, defaults to EPDTC - normalize_X (False|True) – whether to normalize the input data before computing (predictions will be in original scales)
- normalize_Y (False|True) – whether to normalize the output data before computing (predictions will be in original scales)
Return type: model object
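A minimal usage sketch on synthetic binary labels (data and settings are illustrative):

    import numpy as np
    import GPy

    X = np.random.rand(100, 1)
    Y = (np.sin(6 * X) > 0).astype(float)   # binary labels in {0, 1}, shape (100, 1)
    m = GPy.models.SparseGPClassification(X, Y, num_inducing=10)
    m.optimize()
    p, _ = m.predict(np.linspace(0, 1, 20)[:, None])   # predictive mean (class-1 probability under the Bernoulli likelihood)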
-
static
from_dict
(input_dict, data=None)[source]¶ Instantiate an SparseGPClassification object using the information in input_dict (built by the to_dict method).
Parameters: data (tuple( np.ndarray
,np.ndarray
)) – It is used to provide X and Y for the case when the model was saved using save_data=False in to_dict method.
-
class
SparseGPClassificationUncertainInput
(X, X_variance, Y, kernel=None, Z=None, num_inducing=10, Y_metadata=None, normalizer=None)[source]¶ Bases:
GPy.core.sparse_gp.SparseGP
Sparse Gaussian Process model for classification with uncertain inputs.
This is a thin wrapper around the sparse_GP class, with a set of sensible defaults
Parameters: - X (np.ndarray (num_data x input_dim)) – input observations
- X_variance (np.ndarray (num_data x input_dim)) – The uncertainty in the measurements of X (Gaussian variance, optional)
- Y – observed values
- kernel – a GPy kernel, defaults to rbf+white
- Z (np.ndarray (num_inducing x input_dim) | None) – inducing inputs (optional, see note)
- num_inducing (int) – number of inducing points (ignored if Z is passed, see note)
Return type: model object
Note
If no Z array is passed, num_inducing (default 10) points are selected from the data. Otherwise num_inducing is ignored
Note
Multiple independent outputs are allowed using columns of Y
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
GPy.models.sparse_gp_coregionalized_regression module¶
-
class
SparseGPCoregionalizedRegression
(X_list, Y_list, Z_list=[], kernel=None, likelihoods_list=None, num_inducing=10, X_variance=None, name='SGPCR', W_rank=1, kernel_name='coreg')[source]¶ Bases:
GPy.core.sparse_gp.SparseGP
Sparse Gaussian Process model for heteroscedastic multioutput regression
This is a thin wrapper around the SparseGP class, with a set of sensible defaults
Parameters: - X_list (list of numpy arrays) – list of input observations corresponding to each output
- Y_list (list of numpy arrays) – list of observed values related to the different noise models
- Z_list (empty list | list of numpy arrays) – list of inducing inputs (optional)
- kernel (None | GPy.kernel defaults) – a GPy kernel combined (multiplied) with a coregionalization kernel; defaults to RBF combined with a coregionalization kernel
- num_inducing (integer | list of integers) – number of inducing inputs, defaults to 10 per output (ignored if Z_list is not empty)
- name (string) – model name
- W_rank (integer) – rank of the coregionalization parameters ‘W’ (see coregionalize kernel documentation)
- kernel_name (string) – name of the kernel
Likelihoods_list: a list of likelihoods, defaults to list of Gaussian likelihoods
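A minimal construction sketch for two coupled outputs observed at different inputs (synthetic data; the default kernel and Gaussian likelihoods are used):

    import numpy as np
    import GPy

    X1 = np.random.rand(40, 1); Y1 = np.sin(X1) + 0.05 * np.random.randn(40, 1)
    X2 = np.random.rand(25, 1); Y2 = np.sin(X2) + 0.3 + 0.05 * np.random.randn(25, 1)
    m = GPy.models.SparseGPCoregionalizedRegression(X_list=[X1, X2], Y_list=[Y1, Y2], num_inducing=10)
    m.optimize()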
GPy.models.sparse_gp_minibatch module¶
-
class
SparseGPMiniBatch
(X, Y, Z, kernel, likelihood, inference_method=None, name='sparse gp', Y_metadata=None, normalizer=False, missing_data=False, stochastic=False, batchsize=1)[source]¶ Bases:
GPy.core.sparse_gp.SparseGP
A general purpose Sparse GP model, allowing missing data and stochastic optimization across dimensions.
This model allows (approximate) inference using variational DTC or FITC (Gaussian likelihoods) as well as non-conjugate sparse methods based on these.
Parameters: - X (np.ndarray (num_data x input_dim)) – inputs
- likelihood (GPy.likelihood.(Gaussian | EP | Laplace)) – a likelihood instance, containing the observed data
- kernel (a GPy.kern.kern instance) – the kernel (covariance function). See link kernels
- X_variance (np.ndarray (num_data x input_dim) | None) – The uncertainty in the measurements of X (Gaussian variance)
- Z (np.ndarray (num_inducing x input_dim)) – inducing inputs
- num_inducing (int) – Number of inducing points (optional, default 10. Ignored if Z is not None)
-
optimize
(optimizer=None, start=None, **kwargs)[source]¶ Optimize the model using self.log_likelihood and self.log_likelihood_gradient, as well as self.priors. kwargs are passed to the optimizer. They can be:
Parameters: - max_iters (int) – maximum number of function evaluations
- messages (bool) – whether to display during optimisation
- optimizer (string) – which optimizer to use (defaults to self.preferred optimizer), a range of optimisers can be found in :module:`~GPy.inference.optimization`, they include ‘scg’, ‘lbfgs’, ‘tnc’.
- ipython_notebook (bool) – whether to use ipython notebook widgets or not.
- clear_after_finish (bool) – if in ipython notebook, we can clear the widgets after optimization.
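For example, for a constructed model m, a typical call might be (the optimizer name and iteration budget below are illustrative):

    m.optimize(optimizer='lbfgs', max_iters=500, messages=True)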
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
GPy.models.sparse_gp_regression module¶
-
class
SparseGPRegression
(X, Y, kernel=None, Z=None, num_inducing=10, X_variance=None, mean_function=None, normalizer=None, mpi_comm=None, name='sparse_gp')[source]¶ Bases:
GPy.core.sparse_gp_mpi.SparseGP_MPI
Gaussian Process model for regression
This is a thin wrapper around the SparseGP class, with a set of sensible defaults
Parameters: - X – input observations
- X_variance – input uncertainties, one per input X
- Y – observed values
- kernel – a GPy kernel, defaults to rbf+white
- Z (np.ndarray (num_inducing x input_dim) | None) – inducing inputs (optional, see note)
- num_inducing (int) – number of inducing points (ignored if Z is passed, see note)
Return type: model object
Note
If no Z array is passed, num_inducing (default 10) points are selected from the data. Otherwise num_inducing is ignored
Note
Multiple independent outputs are allowed using columns of Y
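A minimal usage sketch on synthetic 1-D data (settings are illustrative):

    import numpy as np
    import GPy

    X = np.random.rand(200, 1)
    Y = np.sin(6 * X) + 0.1 * np.random.randn(200, 1)
    m = GPy.models.SparseGPRegression(X, Y, num_inducing=15)
    m.optimize()
    mu, var = m.predict(np.linspace(0, 1, 50)[:, None])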
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
GPy.models.sparse_gp_regression_md module¶
-
class
SparseGPRegressionMD
(X, Y, indexD, kernel=None, Z=None, num_inducing=10, normalizer=None, mpi_comm=None, individual_Y_noise=False, name='sparse_gp')[source]¶ Bases:
GPy.core.sparse_gp_mpi.SparseGP_MPI
Sparse Gaussian Process Regression with Missing Data
This model targets the use case in which there are multiple output dimensions (different dimensions are assumed to be independent, following the same GP prior) and each output dimension is observed at a different set of inputs. The model takes a different data format: the input and output observations of all the output dimensions are stacked together into two matrices. An extra array is used to indicate the index of the output dimension for each data point. The output dimensions are indexed using integers from 0 to D-1, assuming there are D output dimensions.
Parameters: - X (numpy.ndarray) – input observations.
- Y (numpy.ndarray) – output observations, each column corresponding to an output dimension.
- indexD (numpy.ndarray) – the array containing the index of output dimension for each data point
- kernel (GPy.kern.Kern or None) – a GPy kernel for the GPs of individual output dimensions, defaults to RBF
- Z (numpy.ndarray or None) – inducing inputs
- num_inducing ((int, int)) – a tuple (M, Mr). M is the number of inducing points for GP of individual output dimensions. Mr is the number of inducing points for the latent space.
- individual_Y_noise (boolean) – whether individual output dimensions have their own noise variance or not, boolean
- name (str) – the name of the model
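A minimal construction sketch showing the stacked data format with an output-index array (synthetic data; shapes follow the description above):

    import numpy as np
    import GPy

    # two output dimensions observed at different inputs, stacked together
    X = np.vstack([np.random.rand(30, 1), np.random.rand(20, 1)])
    Y = np.vstack([np.sin(X[:30]), np.cos(X[30:])]) + 0.05 * np.random.randn(50, 1)
    indexD = np.hstack([np.zeros(30), np.ones(20)]).astype(int)   # output index per data point
    m = GPy.models.SparseGPRegressionMD(X, Y, indexD)
    m.optimize()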
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
GPy.models.sparse_gplvm module¶
-
class
SparseGPLVM
(Y, input_dim, X=None, kernel=None, init='PCA', num_inducing=10)[source]¶ Bases:
GPy.models.sparse_gp_regression.SparseGPRegression
Sparse Gaussian Process Latent Variable Model
Parameters: - Y (np.ndarray) – observed data
- input_dim (int) – latent dimensionality
- init ('PCA'|'random') – initialisation method for the latent space
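A minimal usage sketch (synthetic data; a 2-D latent space is assumed for illustration):

    import numpy as np
    import GPy

    Y = np.random.randn(60, 10)            # 60 observations of a 10-D signal
    m = GPy.models.SparseGPLVM(Y, input_dim=2, num_inducing=10)
    m.optimize(messages=True)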
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
-
plot_latent
(labels=None, which_indices=None, resolution=50, ax=None, marker='o', s=40, fignum=None, plot_inducing=True, legend=True, plot_limits=None, aspect='auto', updates=False, predict_kwargs={}, imshow_kwargs={})[source]¶ Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!
If you want fine-grained control use the specific plotting functions supplied in the model.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
- levels (int) – the number of levels in the density (a number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
GPy.models.ss_gplvm module¶
-
class
IBPPosterior
(means, variances, binary_prob, tau=None, sharedX=False, name='latent space')[source]¶ Bases:
GPy.core.parameterization.variational.SpikeAndSlabPosterior
The SpikeAndSlab distribution for variational approximations.
binary_prob : the probability of the distribution on the slab part.
-
class
IBPPrior
(input_dim, alpha=2.0, name='IBPPrior', **kw)[source]¶ Bases:
GPy.core.parameterization.variational.VariationalPrior
-
class
SLVMPosterior
(means, variances, binary_prob, tau=None, name='latent space')[source]¶ Bases:
GPy.core.parameterization.variational.SpikeAndSlabPosterior
The SpikeAndSlab distribution for variational approximations.
binary_prob : the probability of the distribution on the slab part.
-
class
SLVMPrior
(input_dim, alpha=1.0, beta=1.0, Z=None, name='SLVMPrior', **kw)[source]¶ Bases:
GPy.core.parameterization.variational.VariationalPrior
-
class
SSGPLVM
(Y, input_dim, X=None, X_variance=None, Gamma=None, init='PCA', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='Spike_and_Slab GPLVM', group_spike=False, IBP=False, SLVM=False, alpha=2.0, beta=2.0, connM=None, tau=None, mpi_comm=None, pi=None, learnPi=False, normalizer=False, sharedX=False, variational_prior=None, **kwargs)[source]¶ Bases:
GPy.core.sparse_gp_mpi.SparseGP_MPI
Spike-and-Slab Gaussian Process Latent Variable Model
Parameters: - Y (np.ndarray| GPy.likelihood instance) – observed data (np.ndarray) or GPy.likelihood
- input_dim (int) – latent dimensionality
- init ('PCA'|'random') – initialisation method for the latent space
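A minimal construction sketch (synthetic data; settings are illustrative and most keyword arguments are left at their defaults):

    import numpy as np
    import GPy

    Y = np.random.randn(50, 12)
    m = GPy.models.SSGPLVM(Y, input_dim=4, num_inducing=10)
    m.optimize(messages=True, max_iters=200)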
-
get_X_gradients
(X)[source]¶ Get the gradients of the posterior distribution of X in its specific form.
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model.
Warning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
-
plot_inducing
(which_indices=None, legend=False, plot_limits=None, marker=None, projection='2d', **kwargs)¶ Plot a scatter plot of the inducing inputs.
Parameters: - which_indices ([int]) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – marker to use [default is custom arrow like]
- kwargs – the kwargs for the scatter plots
- projection (str) – for now 2d or 3d projection (other projections can be implemented, see developer documentation)
-
plot_latent
(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', scatter_kwargs=None, **imshow_kwargs)¶ Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (
Kern
) – the kernel to use for prediction - marker (str) – markers to use - cycle if more labels than markers are given
- num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- scatter_kwargs – the kwargs for the scatter plots
-
plot_scatter
(labels=None, which_indices=None, legend=True, plot_limits=None, marker='<>^vsd', num_samples=1000, projection='2d', **kwargs)¶ Plot a scatter plot of the latent space.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – markers to use - cycle if more labels than markers are given
- kwargs – the kwargs for the scatter plots
-
plot_steepest_gradient_map
(output_labels=None, data_labels=None, which_indices=None, resolution=15, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, annotation_kwargs=None, scatter_kwargs=None, **imshow_kwargs)¶ Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure, if int plot legend columns on legend
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (
Kern
) – the kernel to use for prediction - marker (str) – markers to use - cycle if more labels than markers are given
- num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- annotation_kwargs – the kwargs for the annotation plot
- scatter_kwargs – the kwargs for the scatter plots
GPy.models.ss_mrd module¶
The Manifold Relevance Determination model with the spike-and-slab prior
-
class
IBPPrior_SSMRD
(nModels, input_dim, alpha=2.0, tau=None, name='IBPPrior', **kw)[source]¶ Bases:
GPy.core.parameterization.variational.VariationalPrior
-
class
SSMRD
(Ylist, input_dim, X=None, X_variance=None, Gammas=None, initx='PCA_concat', initz='permute', num_inducing=10, Zs=None, kernels=None, inference_methods=None, likelihoods=None, group_spike=True, pi=0.5, name='ss_mrd', Ynames=None, mpi_comm=None, IBP=False, alpha=2.0, taus=None)[source]¶ Bases:
GPy.core.model.Model
-
optimize
(optimizer=None, start=None, **kwargs)[source]¶ Optimize the model using self.log_likelihood and self.log_likelihood_gradient, as well as self.priors.
kwargs are passed to the optimizer. They can be:
Parameters: - max_iters (int) – maximum number of function evaluations
- optimizer (string) – which optimizer to use (defaults to self.preferred optimizer)
- messages – True: display messages during optimisation; 'ipython_notebook': use IPython notebook widgets
- Valid optimizers are:
- ‘scg’: scaled conjugate gradient method, recommended for stability. See also GPy.inference.optimization.scg
- ‘fmin_tnc’: truncated Newton method (see scipy.optimize.fmin_tnc)
- ‘simplex’: the Nelder-Mead simplex method (see scipy.optimize.fmin)
- ‘lbfgsb’: the l-bfgs-b method (see scipy.optimize.fmin_l_bfgs_b)
- ‘lbfgs’: the bfgs method (see scipy.optimize.fmin_bfgs)
- ‘sgd’: stochastic gradient descent. For experts only!
-
parameters_changed
()[source]¶ This method gets called when parameters have changed. Another way of listening to parameter changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer
-
optimizer_array
¶ Array for the optimizer to work on. This array always lives in the space of the optimizer; thus it is the unconstrained view obtained by mapping the parameters through their Transformations.
Setting this array will make sure the transformed parameters of this model are set accordingly. It has to be set with an array retrieved from this property, as e.g. fixing will resize the array.
The optimizer should only interact with this array, so that the transformations are handled correctly.
-
-
class
SpikeAndSlabPrior_SSMRD
(nModels, pi=0.5, learnPi=False, group_spike=True, variance=1.0, name='SSMRDPrior', **kw)[source]¶ Bases:
GPy.core.parameterization.variational.SpikeAndSlabPrior
GPy.models.state_space module¶
GPy.models.state_space_cython module¶
GPy.models.state_space_main module¶
Main functionality for state-space inference.
-
class
AddMethodToClass
(func=None, tp='staticmethod')[source]¶ Bases:
object
func: function to add
tp: string – type of the method: normal, staticmethod, classmethod
-
class
ContDescrStateSpace
[source]¶ Bases:
GPy.models.state_space_main.DescreteStateSpace
Class for continuous-discrete Kalman filter. State equation is continuous while measurement equation is discrete.
d x(t)/dt = F x(t) + L q,  where q ~ N(0, Qc)
y_{t_k} = H_{k} x_{t_k} + r_{k};  r_{k} ~ N(0, R_{k})
-
class
AQcompute_batch_Python
(F, L, Qc, dt, compute_derivatives=False, grad_params_no=None, P_inf=None, dP_inf=None, dF=None, dQc=None)[source]¶ Bases:
GPy.models.state_space_main.Q_handling_Python
Class for calculating matrices A, Q, dA, dQ of the discrete Kalman Filter from the matrices F, L, Qc, P_inf, dF, dQc, dP_inf of the continuous state equation. dt - time steps.
It has the same interface as AQcompute_once.
It computes matrices for all time steps. This object is used when there are not so many (controlled by an internal variable) different time steps and storing all the matrices does not take too much memory.
Since all the matrices are computed all together, this object can be used in smoother without repeating the computations.
Constructor. All necessary parameters are passed here and stored in the object.
- F, L, Qc, P_inf : matrices
- Parameters of corresponding continuous state model
- dt: array
- All time steps
- compute_derivatives: bool
- Whether to calculate derivatives
- dP_inf, dF, dQc: 3D array
- Derivatives if they are required
Nothing
-
Ak
(k, m, P)[source]¶ function (k, m, P) return Jacobian of dynamic function, it is passed into p_a.
k (iteration number), starts at 0 m: point where Jacobian is evaluated P: parameter for Jacobian, usually covariance matrix.
-
Q_inverse
(k, p_largest_cond_num, p_regularization_type)[source]¶ Function inverts Q matrix and regularizes the inverse. Regularization is useful when original matrix is badly conditioned. Function is currently used only in SparseGP code.
k: int Iteration number.
p_largest_cond_num: float Largest condition value for the inverted matrix. If the condition number is smaller than that, no regularization happens.
regularization_type: 1 or 2 Regularization type.
regularization_type: int (1 or 2)
type 1: 1/(S[k] + regularizer) regularizer is computed type 2: S[k]/(S^2[k] + regularizer) regularizer is computed
-
Qk
(k)[source]¶ function (k). Returns noise matrix of dynamic model on iteration k. k (iteration number). starts at 0
-
dAk
(k)[source]¶ function (k). Returns the derivative of A on iteration k. k (iteration number). starts at 0
-
dQk
(k)[source]¶ function (k). Returns the derivative of Q on iteration k. k (iteration number). starts at 0
-
class
AQcompute_once
(F, L, Qc, dt, compute_derivatives=False, grad_params_no=None, P_inf=None, dP_inf=None, dF=None, dQc=None)[source]¶ Bases:
GPy.models.state_space_main.Q_handling_Python
Class for calculating matrices A, Q, dA, dQ of the discrete Kalman Filter from the matrices F, L, Qc, P_inf, dF, dQc, dP_inf of the continuous state equation. dt - time steps.
It has the same interface as AQcompute_batch.
It computes matrices for only one time step. This object is used when there are many different time steps and storing matrices for each of them would take too much memory.
Constructor. All necessary parameters are passed here and stored in the object.
- F, L, Qc, P_inf : matrices
- Parameters of corresponding continuous state model
- dt: array
- All time steps
- compute_derivatives: bool
- Whether to calculate derivatives
- dP_inf, dF, dQc: 3D array
- Derivatives if they are required
Nothing
-
Ak
(k, m, P)[source]¶ function (k, m, P) return Jacobian of dynamic function, it is passed into p_a.
k (iteration number), starts at 0 m: point where Jacobian is evaluated P: parameter for Jacobian, usually covariance matrix.
-
Q_inverse
(k, p_largest_cond_num, p_regularization_type)[source]¶ Function inverts Q matrix and regularizes the inverse. Regularization is useful when original matrix is badly conditioned. Function is currently used only in SparseGP code.
k: int Iteration number.
p_largest_cond_num: float Largest condition value for the inverted matrix. If the condition number is smaller than that, no regularization happens.
regularization_type: 1 or 2 Regularization type.
regularization_type: int (1 or 2)
type 1: 1/(S[k] + regularizer) regularizer is computed type 2: S[k]/(S^2[k] + regularizer) regularizer is computed
-
Q_srk
(k)[source]¶ Check square root, maybe rewriting for Spectral decomposition is needed. Square root of the noise matrix Q
-
Qk
(k)[source]¶ function (k). Returns noise matrix of dynamic model on iteration k. k (iteration number). starts at 0
-
dAk
(k)[source]¶ function (k). Returns the derivative of A on iteration k. k (iteration number). starts at 0
-
dQk
(k)[source]¶ function (k). Returns the derivative of Q on iteration k. k (iteration number). starts at 0
-
classmethod
cont_discr_kalman_filter
(F, L, Qc, p_H, p_R, P_inf, X, Y, index=None, m_init=None, P_init=None, p_kalman_filter_type='regular', calc_log_likelihood=False, calc_grad_log_likelihood=False, grad_params_no=0, grad_calc_params=None)[source]¶ This function implements the continuous-discrete Kalman Filter algorithm. These notations for the State-Space model are assumed:
d/dt x(t) = F * x(t) + L * w(t);    w(t) ~ N(0, Qc)
y_{k} = H_{k} * x_{k} + r_{k};    r_{k} ~ N(0, R_{k})
Returns estimated filter distributions x_{k} ~ N(m_{k}, P(k))
1) The function generally does not modify the passed parameters; if it does, it is an error. There are several exceptions: scalars can be promoted to a matrix, and in some rare cases the shapes of the derivative matrices may be changed; this is ignored for now.
2) Copies of F, L, Qc are created in memory because they may be used later in the smoother. References to the copies are kept in the “AQcomp” object return parameter.
3) The function supports a “multiple time series mode”, which means that exactly the same State-Space model is used to filter several sets of measurements. In this case the third dimension of Y should contain these measurements; Log_likelihood and Grad_log_likelihood then have the corresponding dimensions.
4) Calculation of Grad_log_likelihood is not supported if matrices H or R change over time (with index k). (This may be changed later.)
5) Measurements may include missing values. In this case the update step is not done for such a measurement. (This may be changed later.)
- F: (state_dim, state_dim) matrix
- F in the model.
- L: (state_dim, noise_dim) matrix
- L in the model.
- Qc: (noise_dim, noise_dim) matrix
- Q_c in the model.
- p_H: scalar, matrix (measurement_dim, state_dim) , 3D array
- H_{k} in the model. If matrix then H_{k} = H - constant. If it is a 3D array then H_{k} = p_H[:,:, index[2,k]]
- p_R: scalar, square symmetric matrix, 3D array
- R_{k} in the model. If matrix then R_{k} = R - constant. If it is 3D array then R_{k} = p_R[:,:, index[3,k]]
- P_inf: (state_dim, state_dim) matrix
- State variance matrix at infinity.
- X: 1D array
- Time points of measurements. Needed for converting the continuous problem to the discrete one.
- Y: matrix or vector or 3D array
- Data. If Y is matrix then samples are along 0-th dimension and features along the 1-st. If 3D array then third dimension correspond to “multiple time series mode”.
- index: vector
- Which indices (on the 3rd dimension) from arrays p_H, p_R to use on every time step. If this parameter is None then it is assumed that p_H, p_R do not change over time and indices are not needed. index[0,:] - corresponds to H, index[1,:] - corresponds to R. If index.shape[0] == 1, it is assumed that the indices for all matrices are the same.
- m_init: vector or matrix
- Initial distribution mean. If None it is assumed to be zero. For “multiple time series mode” it is matrix, second dimension of which correspond to different time series. In regular case (“one time series mode”) it is a vector.
- P_init: square symmetric matrix or scalar
- Initial covariance of the states. If the parameter is scalar then it is assumed that the initial covariance matrix is the unit matrix multiplied by this scalar. If None the unit matrix is used instead. “multiple time series mode” does not affect it, since it does not affect anything related to state variances.
- p_kalman_filter_type: string, one of (‘regular’, ‘svd’)
- Which Kalman Filter is used. Regular or SVD. SVD is more numerically stable; in particular, covariance matrices are guaranteed to be positive semi-definite. However, ‘svd’ works slower, especially for small data, due to the SVD call overhead.
- calc_log_likelihood: boolean
- Whether to calculate marginal likelihood of the state-space model.
- calc_grad_log_likelihood: boolean
- Whether to calculate gradient of the marginal likelihood of the state-space model. If true then “grad_calc_params” parameter must provide the extra parameters for gradient calculation.
- grad_params_no: int
- If previous parameter is true, then this parameters gives the total number of parameters in the gradient.
- grad_calc_params: dictionary
- Dictionary with derivatives of model matrices with respect to parameters “dF”, “dL”, “dQc”, “dH”, “dR”, “dm_init”, “dP_init”. They can be None, in this case zero matrices (no dependence on parameters) is assumed. If there is only one parameter then third dimension is automatically added.
- M: (no_steps+1,state_dim) matrix or (no_steps+1,state_dim, time_series_no) 3D array
- Filter estimates of the state means. In the extra step the initial value is included. In the “multiple time series mode” the third dimension corresponds to different time series.
- P: (no_steps+1, state_dim, state_dim) 3D array
- Filter estimates of the state covariances. In the extra step the initial value is included.
log_likelihood: double or (1, time_series_no) 3D array.
If the parameter calc_log_likelihood was set to true, return logarithm of marginal likelihood of the state-space model. If the parameter was false, return None. In the “multiple time series mode” it is a vector providing log_likelihood for each time series.- grad_log_likelihood: column vector or (grad_params_no, time_series_no) matrix
- If calc_grad_log_likelihood is true, return gradient of log likelihood with respect to parameters. It returns it column wise, so in “multiple time series mode” gradients for each time series is in the corresponding column.
- AQcomp: object
- Contains some pre-computed values for converting the continuous model into a discrete one. It can be used later in the smoothing phase.
-
classmethod
cont_discr_rts_smoother
(state_dim, filter_means, filter_covars, p_dynamic_callables=None, X=None, F=None, L=None, Qc=None)[source]¶ Continuous-discrete Rauch–Tung–Striebel (RTS) smoother.
This function implements Rauch–Tung–Striebel(RTS) smoother algorithm based on the results of _cont_discr_kalman_filter_raw.
- Model:
- d/dt x(t) = F * x(t) + L * w(t);    w(t) ~ N(0, Qc)
y_{k} = H_{k} * x_{k} + r_{k};    r_{k} ~ N(0, R_{k})
Returns estimated smoother distributions x_{k} ~ N(m_{k}, P(k))
- filter_means: (no_steps+1,state_dim) matrix or (no_steps+1,state_dim, time_series_no) 3D array
- Results of the Kalman Filter means estimation.
- filter_covars: (no_steps+1, state_dim, state_dim) 3D array
- Results of the Kalman Filter covariance estimation.
- Dynamic_callables: object or None
- Object from the filter phase which provides functions for computing A, Q, dA, dQ for the discrete model from the continuous model.
- X, F, L, Qc: matrices
- If AQcomp is None, these matrices are used to create this object from scratch.
- M: (no_steps+1,state_dim) matrix
- Smoothed estimates of the state means
- P: (no_steps+1,state_dim, state_dim) 3D array
- Smoothed estimates of the state covariances
-
static
lti_sde_to_descrete
(F, L, Qc, dt, compute_derivatives=False, grad_params_no=None, P_inf=None, dP_inf=None, dF=None, dQc=None)[source]¶ Linear Time-Invariant Stochastic Differential Equation (LTI SDE):
dx(t) = F x(t) dt + L d eta, where
x(t): (vector) stochastic process
eta: (vector) Brownian motion process
F, L: (time invariant) matrices of corresponding dimensions
Qc: covariance of noise.
This function rewrites it into the corresponding state-space form:
x_{k} = A_{k} * x_{k-1} + q_{k-1};    q_{k-1} ~ N(0, Q_{k-1})
TODO: this function can be redone to “preprocess dataset”, when close time points are handled properly (with a rounding parameter) and values are averaged accordingly.
F,L: LTI SDE matrices of corresponding dimensions
- Qc: matrix (n,n)
- Covariance between different dimensions of the noise eta. n is the dimensionality of the noise.
- dt: double or iterable
- Time difference used on this iteration. If dt is iterable, then A and Q_noise are computed for every unique dt
- compute_derivatives: boolean
- Whether derivatives of A and Q are required.
- grad_params_no: int
- Number of gradient parameters
P_inf: (state_dim, state_dim) matrix
dP_inf
- dF: 3D array
- Derivatives of F
- dQc: 3D array
- Derivatives of Qc
- dR: 3D array
- Derivatives of R
- A: matrix
- A_{k}. Because this is an LTI SDE, only dt can affect the matrix difference for different k.
- Q_noise: matrix
- Covariance matrix of (vector) q_{k-1}. Only dt can affect the matrix difference for different k.
- reconstruct_index: array
- If dt was iterable, return three-dimensional arrays A and Q_noise. The third dimension of these arrays corresponds to the unique dt’s. This reconstruct_index contains the indices of the original dt’s in the unique dt sequence. A[:,:, reconstruct_index[5]] is the matrix A of the 6th (indices start from zero) dt in the original sequence.
- dA: 3D array
- Derivatives of A
- dQ: 3D array
- Derivatives of Q
-
class
DescreteStateSpace
[source]¶ Bases:
object
This class implements state-space inference for linear and non-linear state-space models. Linear models are:
x_{k} = A_{k} * x_{k-1} + q_{k-1};    q_{k-1} ~ N(0, Q_{k-1})
y_{k} = H_{k} * x_{k} + r_{k};    r_{k} ~ N(0, R_{k})
Nonlinear:
x_{k} = f_a(k, x_{k-1}, A_{k}) + q_{k-1};    q_{k-1} ~ N(0, Q_{k-1})
y_{k} = f_h(k, x_{k}, H_{k}) + r_{k};    r_{k} ~ N(0, R_{k})
Here f_a and f_h are some functions of k (iteration number), x_{k-1} or x_{k} (state value on a certain iteration), and A_{k} and H_{k} - the Jacobian matrices of f_a and f_h respectively. In the linear case they are exactly A_{k} and H_{k}.
Currently two nonlinear Gaussian filter algorithms are implemented: the Extended Kalman Filter (EKF) and the Statistically Linearized Filter (SLF), whose implementations are very similar.
-
classmethod
extended_kalman_filter
(p_state_dim, p_a, p_f_A, p_f_Q, p_h, p_f_H, p_f_R, Y, m_init=None, P_init=None, calc_log_likelihood=False)[source]¶ Extended Kalman Filter
p_state_dim: integer
- p_a: if None - the function from the linear model is assumed. No non-
linearity in the dynamic is assumed.
function (k, x_{k-1}, A_{k}). Dynamic function. k: (iteration number), x_{k-1}: (previous state) x_{k}: Jacobian matrices of f_a. In the linear case it is exactly A_{k}.
- p_f_A: matrix - in this case function which returns this matrix is assumed.
Look at this parameter description in kalman_filter function.
function (k, m, P) return Jacobian of dynamic function, it is passed into p_a.
k: (iteration number), m: point where Jacobian is evaluated P: parameter for Jacobian, usually covariance matrix.
- p_f_Q: matrix. In this case a function which returns this matrix is assumed.
Look at this parameter description in kalman_filter function.
function (k). Returns noise matrix of dynamic model on iteration k. k: (iteration number).
- p_h: if None - the function from the linear measurement model is assumed.
No nonlinearity in the measurement is assumed.
function (k, x_{k}, H_{k}). Measurement function. k: (iteration number), x_{k}: (current state) H_{k}: Jacobian matrices of f_h. In the linear case it is exactly H_{k}.
- p_f_H: matrix - in this case function which returns this matrix is assumed.
- function (k, m, P) return Jacobian of dynamic function, it is passed into p_h. k: (iteration number), m: point where Jacobian is evaluated P: parameter for Jacobian, usually covariance matrix.
- p_f_R: matrix. In this case a function which returns this matrix is assumed.
- function (k). Returns noise matrix of measurement equation on iteration k. k: (iteration number).
- Y: matrix or vector
- Data. If Y is matrix then samples are along 0-th dimension and features along the 1-st. May have missing values.
- p_mean: vector
- Initial distribution mean. If None it is assumed to be zero
- P_init: square symmetric matrix or scalar
- Initial covariance of the states. If the parameter is scalar then it is assumed that initial covariance matrix is unit matrix multiplied by this scalar. If None the unit matrix is used instead.
- calc_log_likelihood: boolean
- Whether to calculate marginal likelihood of the state-space model.
-
classmethod
kalman_filter
(p_A, p_Q, p_H, p_R, Y, index=None, m_init=None, P_init=None, p_kalman_filter_type='regular', calc_log_likelihood=False, calc_grad_log_likelihood=False, grad_params_no=None, grad_calc_params=None)[source]¶ This function implements the basic Kalman Filter algorithm. These notations for the State-Space model are assumed:
x_{k} = A_{k} * x_{k-1} + q_{k-1};    q_{k-1} ~ N(0, Q_{k-1})
y_{k} = H_{k} * x_{k} + r_{k};    r_{k} ~ N(0, R_{k})
Returns estimated filter distributions x_{k} ~ N(m_{k}, P(k))
1) The function generally does not modify the passed parameters; if it does, it is an error. There are several exceptions: scalars can be promoted to a matrix, and in some rare cases the shapes of the derivative matrices may be changed; this is ignored for now.
2) Copies of p_A, p_Q, index are created in memory to be used later in the smoother. References to the copies are kept in the “matrs_for_smoother” return parameter.
3) The function supports a “multiple time series mode”, which means that exactly the same State-Space model is used to filter several sets of measurements. In this case the third dimension of Y should contain these measurements; Log_likelihood and Grad_log_likelihood then have the corresponding dimensions.
4) Calculation of Grad_log_likelihood is not supported if matrices A, Q, H, or R change over time. (This may be changed later.)
5) Measurements may include missing values. In this case the update step is not done for such a measurement. (This may be changed later.)
- p_A: scalar, square matrix, 3D array
- A_{k} in the model. If matrix then A_{k} = A - constant. If it is 3D array then A_{k} = p_A[:,:, index[0,k]]
- p_Q: scalar, square symmetric matrix, 3D array
- Q_{k-1} in the model. If matrix then Q_{k-1} = Q - constant. If it is 3D array then Q_{k-1} = p_Q[:,:, index[1,k]]
- p_H: scalar, matrix (measurement_dim, state_dim) , 3D array
- H_{k} in the model. If matrix then H_{k} = H - constant. If it is a 3D array then H_{k} = p_H[:,:, index[2,k]]
- p_R: scalar, square symmetric matrix, 3D array
- R_{k} in the model. If matrix then R_{k} = R - constant. If it is 3D array then R_{k} = p_R[:,:, index[3,k]]
- Y: matrix or vector or 3D array
- Data. If Y is matrix then samples are along 0-th dimension and features along the 1-st. If 3D array then third dimension correspond to “multiple time series mode”.
- index: vector
- Which indices (on the 3rd dimension) from arrays p_A, p_Q, p_H, p_R to use on every time step. If this parameter is None then it is assumed that p_A, p_Q, p_H, p_R do not change over time and indices are not needed. index[0,:] - corresponds to A, index[1,:] - corresponds to Q, index[2,:] - corresponds to H, index[3,:] - corresponds to R. If index.shape[0] == 1, it is assumed that the indices for all matrices are the same.
- m_init: vector or matrix
- Initial distribution mean. If None it is assumed to be zero. For “multiple time series mode” it is matrix, second dimension of which correspond to different time series. In regular case (“one time series mode”) it is a vector.
- P_init: square symmetric matrix or scalar
- Initial covariance of the states. If the parameter is scalar then it is assumed that the initial covariance matrix is the unit matrix multiplied by this scalar. If None the unit matrix is used instead. “multiple time series mode” does not affect it, since it does not affect anything related to state variances.
- calc_log_likelihood: boolean
- Whether to calculate marginal likelihood of the state-space model.
- calc_grad_log_likelihood: boolean
- Whether to calculate gradient of the marginal likelihood of the state-space model. If true then “grad_calc_params” parameter must provide the extra parameters for gradient calculation.
- grad_params_no: int
- If previous parameter is true, then this parameters gives the total number of parameters in the gradient.
- grad_calc_params: dictionary
- Dictionary with derivatives of model matrices with respect to parameters “dA”, “dQ”, “dH”, “dR”, “dm_init”, “dP_init”. They can be None, in this case zero matrices (no dependence on parameters) is assumed. If there is only one parameter then third dimension is automatically added.
- M: (no_steps+1,state_dim) matrix or (no_steps+1,state_dim, time_series_no) 3D array
- Filter estimates of the state means. In the extra step the initial value is included. In the “multiple time series mode” the third dimension corresponds to different time series.
- P: (no_steps+1, state_dim, state_dim) 3D array
- Filter estimates of the state covariances. In the extra step the initial value is included.
- log_likelihood: double or (1, time_series_no) 3D array.
- If the parameter calc_log_likelihood was set to true, return logarithm of marginal likelihood of the state-space model. If the parameter was false, return None. In the “multiple time series mode” it is a vector providing log_likelihood for each time series.
- grad_log_likelihood: column vector or (grad_params_no, time_series_no) matrix
- If calc_grad_log_likelihood is true, return gradient of log likelihood with respect to parameters. It returns it column wise, so in “multiple time series mode” gradients for each time series is in the corresponding column.
- matrs_for_smoother: dict
- Dictionary with model functions for the smoother. The intrinsic model functions are computed in this function and returned for use in the smoother for convenience. They are: ‘p_a’, ‘p_f_A’, ‘p_f_Q’. The dictionary contains the same fields.
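A rough sketch of filtering a 1-D random-walk model with constant matrices; the call follows the documented signature, and the exact composition of the return value should be checked against the description above (data and matrices below are illustrative):

    import numpy as np
    from GPy.models.state_space_main import DescreteStateSpace

    A = np.array([[1.0]]); Q = np.array([[0.1]])     # random-walk dynamics
    H = np.array([[1.0]]); R = np.array([[0.5]])     # direct, noisy observation
    Y = np.cumsum(0.3 * np.random.randn(100, 1), axis=0)
    out = DescreteStateSpace.kalman_filter(A, Q, H, R, Y, calc_log_likelihood=True)
    # per the description above, out contains the filtered means M, covariances P,
    # the (optional) likelihood terms, and the matrices prepared for the smoother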
-
classmethod
rts_smoother
(state_dim, p_dynamic_callables, filter_means, filter_covars)[source]¶ This function implements Rauch–Tung–Striebel(RTS) smoother algorithm based on the results of kalman_filter_raw. These notations are the same:
x_{k} = A_{k} * x_{k-1} + q_{k-1};    q_{k-1} ~ N(0, Q_{k-1})
y_{k} = H_{k} * x_{k} + r_{k};    r_{k} ~ N(0, R_{k})
Returns estimated smoother distributions x_{k} ~ N(m_{k}, P(k))
- p_a: function (k, x_{k-1}, A_{k}). Dynamic function.
- k (iteration number), starts at 0 x_{k-1} State from the previous step A_{k} Jacobian matrices of f_a. In the linear case it is exactly A_{k}.
- p_f_A: function (k, m, P) return Jacobian of dynamic function, it is
- passed into p_a. k (iteration number), starts at 0 m: point where Jacobian is evaluated P: parameter for Jacobian, usually covariance matrix.
- p_f_Q: function (k). Returns noise matrix of dynamic model on iteration k.
- k (iteration number). starts at 0
- filter_means: (no_steps+1,state_dim) matrix or (no_steps+1,state_dim, time_series_no) 3D array
- Results of the Kalman Filter means estimation.
- filter_covars: (no_steps+1, state_dim, state_dim) 3D array
- Results of the Kalman Filter covariance estimation.
- M: (no_steps+1, state_dim) matrix
- Smoothed estimates of the state means
- P: (no_steps+1, state_dim, state_dim) 3D array
- Smoothed estimates of the state covariances
-
class
DescreteStateSpaceMeta
[source]¶ Bases:
type
Substitute necessary methods from Cython.
After this, the class object is created.
-
Dynamic_Callables_Class
¶ alias of
GPy.models.state_space_main.Dynamic_Callables_Python
-
class
Dynamic_Callables_Python
[source]¶ Bases:
object
-
Ak
(k, m, P)[source]¶ function (k, m, P) return Jacobian of dynamic function, it is passed into p_a.
k (iteration number), starts at 0 m: point where Jacobian is evaluated P: parameter for Jacobian, usually covariance matrix.
-
Q_srk
(k)[source]¶ function (k). Returns the square root of noise matrix of dynamic model on iteration k.
k (iteration number). starts at 0
This function is implemented to use SVD prediction step.
-
Qk
(k)[source]¶ - function (k). Returns noise matrix of dynamic model on iteration k.
- k (iteration number). starts at 0
-
dAk
(k)[source]¶ - function (k). Returns the derivative of A on iteration k.
- k (iteration number). starts at 0
-
dQk
(k)[source]¶ - function (k). Returns the derivative of Q on iteration k.
- k (iteration number). starts at 0
-
-
Measurement_Callables_Class
¶ alias of
GPy.models.state_space_main.Measurement_Callables_Python
-
class
Measurement_Callables_Python
[source]¶ Bases:
object
-
Hk
(k, m_pred, P_pred)[source]¶ - function (k, m, P) return Jacobian of measurement function, it is
- passed into p_h. k (iteration number), starts at 0 m: point where Jacobian is evaluated P: parameter for Jacobian, usually covariance matrix.
-
R_isrk
(k)[source]¶ - function (k). Returns the square root of the noise matrix of
- measurement equation on iteration k. k (iteration number). starts at 0
This function is implemented to use SVD prediction step.
-
Rk
(k)[source]¶ - function (k). Returns noise matrix of measurement equation
- on iteration k. k (iteration number). starts at 0
-
dHk
(k)[source]¶ - function (k). Returns the derivative of H on iteration k.
- k (iteration number). starts at 0
-
dRk
(k)[source]¶ - function (k). Returns the derivative of R on iteration k.
- k (iteration number). starts at 0
-
-
Q_handling_Class
¶
-
class
Q_handling_Python
(Q, index, Q_time_var_index, unique_Q_number, dQ=None)[source]¶ Bases:
GPy.models.state_space_main.Dynamic_Callables_Python
- R - array with noise on various steps. The result of preprocessing
- the noise input.
- index - for each step of Kalman filter contains the corresponding index
- in the array.
- R_time_var_index - another index in the array R. Computed earlier and
- passed here.
- unique_R_number - number of unique noise matrices below which square
- roots are cached and above which they are computed each time.
- dQ: 3D array[:, :, param_num]
- derivative of Q. Derivative is supported only when Q does not change over time
- Object which has two necessary functions:
- f_R(k) inv_R_square_root(k)
-
Q_srk
(k)[source]¶ - function (k). Returns the square root of noise matrix of dynamic model
- on iteration k.
k (iteration number). starts at 0
This function is implemented to use SVD prediction step.
-
R_handling_Class
¶
-
class
R_handling_Python
(R, index, R_time_var_index, unique_R_number, dR=None)[source]¶ Bases:
GPy.models.state_space_main.Measurement_Callables_Python
The class handles the noise matrix R.
- R - array with noise on various steps. The result of preprocessing
- the noise input.
- index - for each step of Kalman filter contains the corresponding index
- in the array.
- R_time_var_index - another index in the array R. Computed earlier and
- is passed here.
- unique_R_number - number of unique noise matrices below which square
- roots are cached and above which they are computed each time.
- dR: 3D array[:, :, param_num]
- derivative of R. Derivative is supported only when R does not change over time
- Object which has two necessary functions:
- f_R(k) inv_R_square_root(k)
-
Std_Dynamic_Callables_Class
¶ alias of
GPy.models.state_space_main.Std_Dynamic_Callables_Python
-
class
Std_Dynamic_Callables_Python
(A, A_time_var_index, Q, index, Q_time_var_index, unique_Q_number, dA=None, dQ=None)[source]¶ Bases:
GPy.models.state_space_main.Q_handling_Python
-
Ak
(k, m_pred, P_pred)[source]¶ - function (k, m, P) return Jacobian of measurement function, it is
- passed into p_h. k (iteration number), starts at 0 m: point where Jacobian is evaluated P: parameter for Jacobian, usually covariance matrix.
-
dAk
(k)[source]¶ function (k). Returns the derivative of A on iteration k. k (iteration number). starts at 0
-
-
Std_Measurement_Callables_Class
¶ alias of
GPy.models.state_space_main.Std_Measurement_Callables_Python
-
class
Std_Measurement_Callables_Python
(H, H_time_var_index, R, index, R_time_var_index, unique_R_number, dH=None, dR=None)[source]¶ Bases:
GPy.models.state_space_main.R_handling_Python
-
Hk
(k, m_pred, P_pred)[source]¶ - function (k, m, P) return Jacobian of measurement function, it is
- passed into p_h. k (iteration number), starts at 0 m: point where Jacobian is evaluated P: parameter for Jacobian, usually covariance matrix.
-
-
balance_matrix
(A)[source]¶ Balance a matrix, i.e. find a similarity transformation of the original matrix A: A = T * bA * T^{-1}, such that the norms of the columns of bA and of the rows of bA are as close as possible. It is usually used as a preprocessing step in eigenvalue calculation routines. It is also useful for State-Space models.
- See also:
- [1] Beresford N. Parlett and Christian Reinsch (1969). Balancing
- a matrix for calculation of eigenvalues and eigenvectors. Numerische Mathematik, 13(4): 293-304.
- A: square matrix
- Matrix to be balanced
- bA: matrix
- Balanced matrix
- T: matrix
- Left part of the similarity transformation
- T_inv: matrix
- Right part of the similarity transformation.
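For illustration, a minimal sketch of calling this function (assuming it is importable from GPy.models.state_space_main as documented, and that the return order is bA, T, T_inv):
    import numpy as np
    from GPy.models.state_space_main import balance_matrix

    # A badly scaled square matrix (illustrative only)
    A = np.array([[1.0, 1e4],
                  [1e-4, 1.0]])

    # Assumed return order (bA, T, T_inv), as documented above
    bA, T, T_inv = balance_matrix(A)

    # The similarity transformation should reproduce A up to numerical error
    assert np.allclose(A, T @ bA @ T_inv)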
GPy.models.state_space_model module¶
-
class
StateSpace
(X, Y, kernel=None, noise_var=1.0, kalman_filter_type='regular', use_cython=False, balance=False, name='StateSpace')[source]¶ Bases:
GPy.core.model.Model
balance (bool) – whether or not to balance the model as a whole
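A minimal, hedged construction sketch (hypothetical data; it assumes StateSpace is exported as GPy.models.StateSpace and that the chosen kernel, e.g. Matern32, has a state-space representation):
    import numpy as np
    import GPy

    # Hypothetical 1D time-series data
    X = np.linspace(0, 10, 200)[:, None]
    Y = np.sin(X) + 0.1 * np.random.randn(*X.shape)

    # Matern32 is a common choice because it admits an exact state-space (SDE) form
    kernel = GPy.kern.Matern32(1)
    m = GPy.models.StateSpace(X, Y, kernel=kernel, noise_var=1.0, balance=False)
    m.optimize()
    m.plot()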
-
plot
(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, samples_likelihood=0, lower=2.5, upper=97.5, plot_data=True, plot_inducing=True, plot_density=False, predict_kw=None, projection='2d', legend=True, **kwargs)¶ Convenience function for plotting the fit of a GP.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
If you want fine-grained control, use the specific plotting functions supplied in the model.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
- samples_likelihood (int) – the number of samples to draw from the GP and apply the likelihood noise. This is usually not what you want!
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- projection ({2d|3d}) – plot in 2d or 3d?
- legend (bool) – convenience, whether to put a legend on the plot or not.
-
plot_confidence
(lower=2.5, upper=97.5, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', label='gp confidence', predict_kw=None, **kwargs)¶ Plot the confidence interval between the percentiles lower and upper. E.g. the 95% confidence interval is $2.5, 97.5$. Note: Only implemented for one dimension!
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- which_data_ycols (array-like) – which columns of the output y (!) to plot (array-like or list of ints)
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
-
plot_data
(which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **plot_kwargs)¶ - Plot the training data
- For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.
Can plot only part of the data using which_data_rows and which_data_ycols.
Parameters: - which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
- projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
- label (str) – the label for the plot
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns list: of plots created.
-
plot_data_error
(which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **error_kwargs)¶ Plot the training data input error.
For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.
Can plot only part of the data using which_data_rows and which_data_ycols.
Parameters: - which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
- projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- label (str) – the label for the plot
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns list: of plots created.
-
plot_density
(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=35, label='gp density', predict_kw=None, **kwargs)¶ Plot the predictive density of the GP as a shaded region (rather than a single confidence interval). Note: Only implemented for one dimension!
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
- levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
-
plot_errorbars_trainset
(which_data_rows='all', which_data_ycols='all', fixed_inputs=None, plot_raw=False, apply_link=False, label=None, projection='2d', predict_kw=None, **plot_kwargs)¶ Plot the errorbars of the GP likelihood on the training data. These are the errorbars after the appropriate approximations according to the likelihood are done.
This also works for heteroscedastic likelihoods.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- which_data_ycols – when the data has several columns (independent outputs), only plot these
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- predict_kwargs (dict) – kwargs for the prediction used to predict the right quantiles.
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
-
plot_f
(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)¶ Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!
If you want fine-grained control, use the specific plotting functions supplied in the model.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
- levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
-
plot_latent
(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)¶ Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!
If you want fine-grained control, use the specific plotting functions supplied in the model.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
- levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
-
plot_mean
(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=20, projection='2d', label='gp mean', predict_kw=None, **kwargs)¶ Plot the mean of the GP.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
- levels (int) – for 2D plotting, the number of contour levels to use
- projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
- label (str) – the label for the plot.
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
-
plot_noiseless
(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)¶ Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!
If you want fine-grained control, use the specific plotting functions supplied in the model.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
- levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
-
plot_samples
(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=True, apply_link=False, visible_dims=None, which_data_ycols='all', samples=3, projection='2d', label='gp_samples', predict_kw=None, **kwargs)¶ Plot samples drawn from the GP.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
- plot_raw (bool) – plot the latent function (usually denoted f) only? This is usually what you want!
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- levels (int) – for 2D plotting, the number of contour levels to use
-
GPy.models.state_space_setup module¶
This module is intended for the setup of the state_space_main module. It exists because of the way the state_space_main module connects with the Cython code.
GPy.models.tp_regression module¶
-
class
TPRegression
(X, Y, kernel=None, deg_free=5.0, normalizer=None, mean_function=None, name='TP regression')[source]¶ Bases:
GPy.core.model.Model
Student-t Process model for regression, as presented in
Shah, A., Wilson, A. and Ghahramani, Z., 2014, April. Student-t processes as alternatives to Gaussian processes. In Artificial Intelligence and Statistics (pp. 877-885).
Parameters: - X – input observations
- Y – observed values
- kernel – a GPy kernel, defaults to rbf
- deg_free – initial value for the degrees of freedom hyperparameter
- normalizer (Norm) – [False]
Normalize Y with the norm given. If normalizer is False, no normalization will be done. If it is None, Gaussian normalization (GaussianNorm) is used.
Note
Multiple independent outputs are allowed using columns of Y
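A minimal, hedged usage sketch with hypothetical data (assuming the class is exported as GPy.models.TPRegression):
    import numpy as np
    import GPy

    # Hypothetical 1D regression data
    X = np.random.rand(50, 1)
    Y = np.sin(6 * X) + 0.05 * np.random.randn(50, 1)

    # Student-t process regression; deg_free is the degrees-of-freedom hyperparameter
    m = GPy.models.TPRegression(X, Y, kernel=GPy.kern.RBF(1), deg_free=5.0)
    m.optimize()

    Xnew = np.linspace(0, 1, 20)[:, None]
    mu, var = m.predict(Xnew)   # for a TP this equals predict_noiseless (see below)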
-
log_likelihood
()[source]¶ The log marginal likelihood of the model, \(p(\mathbf{y})\), this is the objective function of the model being optimised
-
parameters_changed
()[source]¶ Method that is called upon any changes to
Param
variables within the model. In particular in this class this method re-performs inference, recalculating the posterior, log marginal likelihood and gradients of the modelWarning
This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.
-
plot
(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, samples_likelihood=0, lower=2.5, upper=97.5, plot_data=True, plot_inducing=True, plot_density=False, predict_kw=None, projection='2d', legend=True, **kwargs)¶ Convenience function for plotting the fit of a GP.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
If you want fine-grained control, use the specific plotting functions supplied in the model.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
- samples_likelihood (int) – the number of samples to draw from the GP and apply the likelihood noise. This is usually not what you want!
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- projection ({2d|3d}) – plot in 2d or 3d?
- legend (bool) – convenience, whether to put a legend on the plot or not.
-
plot_confidence
(lower=2.5, upper=97.5, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', label='gp confidence', predict_kw=None, **kwargs)¶ Plot the confidence interval between the percentiles lower and upper. E.g. the 95% confidence interval is $2.5, 97.5$. Note: Only implemented for one dimension!
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- which_data_ycols (array-like) – which columns of the output y (!) to plot (array-like or list of ints)
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
-
plot_data
(which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **plot_kwargs)¶ - Plot the training data
- For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.
Can plot only part of the data using which_data_rows and which_data_ycols.
Parameters: - which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
- projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
- label (str) – the label for the plot
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns list: of plots created.
-
plot_data_error
(which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **error_kwargs)¶ Plot the training data input error.
For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.
Can plot only part of the data using which_data_rows and which_data_ycols.
Parameters: - which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
- projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- label (str) – the label for the plot
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns list: of plots created.
-
plot_density
(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=35, label='gp density', predict_kw=None, **kwargs)¶ Plot the predictive density of the GP as a shaded region (rather than a single confidence interval). Note: Only implemented for one dimension!
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
- levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
-
plot_errorbars_trainset
(which_data_rows='all', which_data_ycols='all', fixed_inputs=None, plot_raw=False, apply_link=False, label=None, projection='2d', predict_kw=None, **plot_kwargs)¶ Plot the errorbars of the GP likelihood on the training data. These are the errorbars after the appropriate approximations according to the likelihood are done.
This also works for heteroscedastic likelihoods.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- which_data_ycols – when the data has several columns (independent outputs), only plot these
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- predict_kwargs (dict) – kwargs for the prediction used to predict the right quantiles.
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
-
plot_f
(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)¶ Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!
If you want fine-grained control, use the specific plotting functions supplied in the model.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
- levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
-
plot_latent
(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)¶ Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!
If you want fine-grained control, use the specific plotting functions supplied in the model.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
- levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
-
plot_mean
(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=20, projection='2d', label='gp mean', predict_kw=None, **kwargs)¶ Plot the mean of the GP.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
- levels (int) – for 2D plotting, the number of contour levels to use
- projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
- label (str) – the label for the plot.
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
-
plot_noiseless
(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)¶ Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!
If you want fine-grained control, use the specific plotting functions supplied in the model.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
- levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
-
plot_samples
(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=True, apply_link=False, visible_dims=None, which_data_ycols='all', samples=3, projection='2d', label='gp_samples', predict_kw=None, **kwargs)¶ Plot samples drawn from the GP.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
- plot_raw (bool) – plot the latent function (usually denoted f) only? This is usually what you want!
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- levels (int) – for 2D plotting, the number of contour levels to use
-
posterior_samples
(X, size=10, full_cov=False, Y_metadata=None, likelihood=None, **predict_kwargs)[source]¶ Samples the posterior GP at the points X, equivalent to posterior_samples_f due to the absence of a likelihood.
-
posterior_samples_f
(X, size=10, full_cov=True, **predict_kwargs)[source]¶ Samples the posterior TP at the points X.
Parameters: - X (np.ndarray (Nnew x self.input_dim)) – The points at which to take the samples.
- size (int.) – the number of a posteriori samples.
- full_cov (bool.) – whether to return the full covariance matrix, or just the diagonal.
Returns: fsim: set of simulations
Return type: np.ndarray (D x N x samples) (if D==1 we flatten out the first dimension)
-
predict
(Xnew, full_cov=False, kern=None, **kwargs)[source]¶ Predict the function(s) at the new point(s) Xnew. For Student-t processes, this method is equivalent to predict_noiseless as no likelihood is included in the model.
-
predict_noiseless
(Xnew, full_cov=False, kern=None)[source]¶ Predict the underlying function f at the new point(s) Xnew.
Parameters: - Xnew (np.ndarray (Nnew x self.input_dim)) – The points at which to make a prediction
- full_cov (bool) – whether to return the full covariance matrix, or just the diagonal
- kern – The kernel to use for prediction (defaults to the model kern).
Returns: - (mean, var):
mean: posterior mean, a Numpy array, Nnew x self.input_dim var: posterior variance, a Numpy array, Nnew x 1 if full_cov=False, Nnew x Nnew otherwise
If full_cov and self.input_dim > 1, the return shape of var is Nnew x Nnew x self.input_dim. If self.input_dim == 1, the return shape is Nnew x Nnew. This is to allow for different normalizations of the output dimensions.
-
predict_quantiles
(X, quantiles=(2.5, 97.5), kern=None, **kwargs)[source]¶ Get the predictive quantiles around the prediction at X
Parameters: - X (np.ndarray (Xnew x self.input_dim)) – The points at which to make a prediction
- quantiles (tuple) – tuple of quantiles, default is (2.5, 97.5) which is the 95% interval
- kern – optional kernel to use for prediction
Returns: list of quantiles for each X and predictive quantiles for interval combination
Return type: [np.ndarray (Xnew x self.output_dim), np.ndarray (Xnew x self.output_dim)]
GPy.models.warped_gp module¶
-
class
WarpedGP
(X, Y, kernel=None, warping_function=None, warping_terms=3, normalizer=False)[source]¶ Bases:
GPy.core.gp.GP
This defines a GP Regression model that applies a warping function to the output.
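A minimal, hedged usage sketch with hypothetical data (assuming the class is exported as GPy.models.WarpedGP):
    import numpy as np
    import GPy

    # Hypothetical positive, skewed targets for which warping the outputs can help
    X = np.random.rand(40, 1)
    Y = np.exp(np.sin(6 * X) + 0.05 * np.random.randn(40, 1))

    # The warping function is learned jointly with the GP hyperparameters
    m = GPy.models.WarpedGP(X, Y, kernel=GPy.kern.RBF(1), warping_terms=3)
    m.optimize()

    mu, var = m.predict(X)               # predictions in the original output space
    med, _ = m.predict(X, median=True)   # median-based prediction (see predict below)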
-
log_predictive_density
(x_test, y_test, Y_metadata=None)[source]¶ Calculation of the log predictive density. Notice we add the jacobian of the warping function here.
Parameters: - x_test ((Nx1) array) – test locations (x_{*})
- y_test ((Nx1) array) – test observations (y_{*})
- Y_metadata – metadata associated with the test points
-
predict
(Xnew, kern=None, pred_init=None, Y_metadata=None, median=False, deg_gauss_hermite=20, likelihood=None)[source]¶ Prediction results depend on:
- the value of the self.predict_in_warped_space flag
- the median flag passed as an argument
The likelihood keyword is never used; it is only there to follow the plotting API.
-
predict_quantiles
(X, quantiles=(2.5, 97.5), Y_metadata=None, likelihood=None, kern=None)[source]¶ Get the predictive quantiles around the prediction at X
Parameters: - X (np.ndarray (Xnew x self.input_dim)) – The points at which to make a prediction
- quantiles (tuple) – tuple of quantiles, default is (2.5, 97.5) which is the 95% interval
Returns: list of quantiles for each X and predictive quantiles for interval combination
Return type: [np.ndarray (Xnew x self.input_dim), np.ndarray (Xnew x self.input_dim)]
-
GPy.kern package¶
Introduction¶
In terms of Gaussian Processes, a kernel is a function that specifies the degree of similarity between variables given their relative positions in parameter space. If known variables x and x’ are close together then observed variables y and y’ may also be similar, depending on the kernel function and its parameters. Note: this may be too simple a definition for the broad range of kernels available in GPy.
GPy.kern.src.kern.Kern
is a generic kernel object
inherited by more specific, end-user kernels used in models. It
provides methods that specific kernels should generally have such as
GPy.kern.src.kern.Kern.K
to compute the value of the
kernel, GPy.kern.src.kern.Kern.add
to combine kernels and
numerous functions providing information on kernel gradients.
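For instance, a short sketch of building kernels and evaluating their covariance (the values are illustrative only):
    import numpy as np
    import GPy

    k = GPy.kern.RBF(input_dim=1, variance=1.0, lengthscale=0.5)

    X = np.random.rand(5, 1)
    K = k.K(X)           # 5 x 5 covariance matrix k(X, X)
    Kdiag = k.Kdiag(X)   # just the diagonal of k(X, X)

    # Kernels combine with + and *, producing CombinationKernel objects
    k_sum = k + GPy.kern.Bias(1)
    k_prod = k * GPy.kern.Linear(1)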
There are several inherited types of kernel that provide a basis for specific end user kernels:

e.g. the archetype GPy.kern.RBF
does not inherit directly from GPy.kern.src.kern.Kern
, but from GPy.kern.src.stationary
.

Subpackages¶
GPy.kern.src package¶
Subpackages¶
An approximated psi-statistics implementation based on Gauss-Hermite Quadrature
The package for the Psi statistics computation of the linear kernel for Bayesian GPLVM
The module for psi-statistics for RBF kernel
The module for psi-statistics for RBF kernel
The package for the Psi statistics computation of the linear kernel for SSGPLVM
The package for the psi statistics computation
The module for psi-statistics for RBF kernel for Spike-and-Slab GPLVM
-
class
PSICOMP_SSRBF_GPU
(threadnum=128, blocknum=15, GPU_direct=False)[source]¶
Submodules¶
GPy.kern.src.ODE_UY module¶
-
class
ODE_UY
(input_dim, variance_U=3.0, variance_Y=1.0, lengthscale_U=1.0, lengthscale_Y=1.0, active_dims=None, name='ode_uy')[source]¶ Bases:
GPy.kern.src.kern.Kern
GPy.kern.src.ODE_UYC module¶
-
class
ODE_UYC
(input_dim, variance_U=3.0, variance_Y=1.0, lengthscale_U=1.0, lengthscale_Y=1.0, ubias=1.0, active_dims=None, name='ode_uyc')[source]¶ Bases:
GPy.kern.src.kern.Kern
GPy.kern.src.ODE_st module¶
-
class
ODE_st
(input_dim, a=1.0, b=1.0, c=1.0, variance_Yx=3.0, variance_Yt=1.5, lengthscale_Yx=1.5, lengthscale_Yt=1.5, active_dims=None, name='ode_st')[source]¶ Bases:
GPy.kern.src.kern.Kern
kernel resulting from a first order ODE with an OU driving GP
Parameters: - input_dim (int) – the number of input dimension, has to be equal to one
- varianceU (float) – variance of the driving GP
- lengthscaleU (float) – lengthscale of the driving GP (sqrt(3)/lengthscaleU)
- varianceY (float) – ‘variance’ of the transfer function
- lengthscaleY (float) – ‘lengthscale’ of the transfer function (1/lengthscaleY)
Return type: kernel object
GPy.kern.src.ODE_t module¶
-
class
ODE_t
(input_dim, a=1.0, c=1.0, variance_Yt=3.0, lengthscale_Yt=1.5, ubias=1.0, active_dims=None, name='ode_st')[source]¶ Bases:
GPy.kern.src.kern.Kern
GPy.kern.src.add module¶
-
class
Add
(subkerns, name='sum')[source]¶ Bases:
GPy.kern.src.kern.CombinationKernel
Add the given list of kernels together. Propagates gradients through.
This kernel will take over the active dims of its subkernels passed in.
NOTE: The subkernels will be copies of the original kernels, to prevent unexpected behavior.
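The Add kernel is normally created implicitly via the + operator rather than constructed directly; a short sketch:
    import GPy

    k1 = GPy.kern.RBF(1)
    k2 = GPy.kern.Bias(1)

    # k is an Add kernel holding copies of k1 and k2 (see the NOTE above)
    k = k1 + k2
    print(k.name)               # 'sum'
    print(k.parameter_names())  # parameters of both sub-kernels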
-
K
(X, X2=None, which_parts=None)[source]¶ Add all kernels together. If a list of parts (of this kernel!) which_parts is given, only the parts of the list are taken to compute the covariance.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ Compute the gradient of the objective function with respect to X.
Parameters: - dL_dK (np.ndarray (num_samples x num_inducing)) – An array of gradients of the objective function with respect to the covariance function.
- X (np.ndarray (num_samples x input_dim)) – Observed data inputs
- X2 (np.ndarray (num_inducing x input_dim)) – Observed data inputs (optional, defaults to X)
-
gradients_XX
(dL_dK, X, X2)[source]¶
\[\frac{\partial^2 L}{\partial X \partial X_2} = \frac{\partial L}{\partial K}\frac{\partial^2 K}{\partial X \partial X_2}\]
-
gradients_Z_expectations
(dL_psi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.
-
gradients_qX_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel
-
input_sensitivity
(summarize=True)[source]¶ If summarize is true, we return the summarized view of the sensitivities; otherwise everything is put into an array with shape (#kernels, input_dim), in the order of appearance of the kernels in the parameterized object.
-
psi2
(Z, variational_posterior)[source]¶
\[\psi_2^{m,m'} = \sum_{i=0}^{n} E_{q(X)}[k(Z_m, X_i)\, k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
sde_update_gradient_full
(gradients)[source]¶ Update gradient in the order in which parameters are represented in the kernel
-
to_dict
()[source]¶ Convert the object into a json serializable dictionary.
Note: It uses the private method _save_to_input_dict of the parent.
Return dict: json serializable dictionary containing the needed information to instantiate the object
-
update_gradients_diag
(dL_dK, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
-
GPy.kern.src.basis_funcs module¶
-
class
BasisFuncKernel
(input_dim, variance=1.0, active_dims=None, ARD=False, name='basis func kernel')[source]¶ Bases:
GPy.kern.src.kern.Kern
Abstract superclass for kernels with explicit basis functions for use in GPy.
This class does NOT automatically add an offset to the design matrix phi!
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
concatenate_offset
(X)[source]¶ Convenience function to add an offset column to phi. You can use this function to add an offset (bias on y axis) to phi in your custom self._phi(X).
-
parameters_changed
()[source]¶ This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer.
-
posterior_inf
(X=None, posterior=None)[source]¶ Do the posterior inference on the parameters given this kernel’s functions and the model posterior, which has to be a GPy posterior, usually found at m.posterior if m is a GPy model. If not given, we search for the highest parent that is a model containing the posterior, and for X accordingly.
-
-
class
ChangePointBasisFuncKernel
(input_dim, changepoint, variance=1.0, active_dims=None, ARD=False, name='changepoint')[source]¶ Bases:
GPy.kern.src.basis_funcs.BasisFuncKernel
The basis function has a changepoint. That is, it is constant, jumps at a single point (given as changepoint) and is constant again. You can give multiple changepoints. The changepoints are calculated using np.where(self.X < self.changepoint, -1, 1)
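A small, hedged sketch importing from the module path documented here (whether the class is also re-exported at the GPy.kern top level is not assumed):
    import numpy as np
    from GPy.kern.src.basis_funcs import ChangePointBasisFuncKernel

    # Basis function that is constant, jumps at x = 5.0, and is constant again
    k = ChangePointBasisFuncKernel(input_dim=1, changepoint=5.0, variance=1.0)

    X = np.linspace(0.0, 10.0, 8)[:, None]
    print(k.K(X).shape)   # (8, 8)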
-
class
DomainKernel
(input_dim, start, stop, variance=1.0, active_dims=None, ARD=False, name='constant_domain')[source]¶ Bases:
GPy.kern.src.basis_funcs.LinearSlopeBasisFuncKernel
Create a constant plateau of correlation between start and stop and zero elsewhere. This is a constant shift of the outputs along the y-axis in the range from start to stop.
-
class
LinearSlopeBasisFuncKernel
(input_dim, start, stop, variance=1.0, active_dims=None, ARD=False, name='linear_segment')[source]¶ Bases:
GPy.kern.src.basis_funcs.BasisFuncKernel
A linear segment transformation. The segments start at start, are then linear to stop and constant again. The segments are normalized, so that they have exactly as much mass above as below the origin.
Start and stop can be tuples or lists of starts and stops. The behaviour of start/stop is as np.where(X < start) would give.
-
class
LogisticBasisFuncKernel
(input_dim, centers, variance=1.0, slope=1.0, active_dims=None, ARD=False, ARD_slope=True, name='logistic')[source]¶ Bases:
GPy.kern.src.basis_funcs.BasisFuncKernel
Create a series of logistic basis functions with the given centers. The slope is computed from the data fit. The number of centers determines the number of logistic functions.
-
class
PolynomialBasisFuncKernel
(input_dim, degree, variance=1.0, active_dims=None, ARD=True, name='polynomial_basis')[source]¶ Bases:
GPy.kern.src.basis_funcs.BasisFuncKernel
A linear segment transformation. The segments start at start, are then linear to stop and constant again. The segments are normalized, so that they have exactly as much mass above as below the origin.
Start and stop can be tuples or lists of starts and stops. Behaviour of start stop is as np.where(X<start) would do.
GPy.kern.src.brownian module¶
-
class
Brownian
(input_dim=1, variance=1.0, active_dims=None, name='Brownian')[source]¶ Bases:
GPy.kern.src.kern.Kern
Brownian motion in 1D only.
Negative times are treated as a separate (backwards!) Brownian motion.
Parameters: - input_dim (int) – the number of input dimensions
- variance (float) –
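A small sketch; for non-negative inputs the Brownian covariance is variance * min(t, t'):
    import numpy as np
    import GPy

    k = GPy.kern.Brownian(input_dim=1, variance=1.0)

    t = np.linspace(0.0, 1.0, 5)[:, None]
    K = k.K(t)
    # For non-negative times this should equal variance * min(t, t')
    print(np.allclose(K, np.minimum(t, t.T)))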
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
GPy.kern.src.coregionalize module¶
-
class
Coregionalize
(input_dim, output_dim, rank=1, W=None, kappa=None, active_dims=None, name='coregion')[source]¶ Bases:
GPy.kern.src.kern.Kern
Covariance function for intrinsic/linear coregionalization models
This covariance has the form:
\[\mathbf{B} = \mathbf{W}\mathbf{W}^\intercal + \mathrm{diag}(\kappa)\]
An intrinsic/linear coregionalization covariance function of the form:
\[k_2(x, y) = \mathbf{B}\, k(x, y)\]
It is obtained as the tensor product between a covariance function k(x, y) and B.
Parameters: - output_dim (int) – number of outputs to coregionalize
- rank (int) – number of columns of the W matrix (this parameter is ignored if parameter W is not None)
- W (numpy array of dimensionality (num_outputs, W_columns)) – a low rank matrix that determines the correlations between the different outputs; together with kappa it forms the coregionalization matrix B
- kappa (numpy array of dimensionality (output_dim, )) – a vector which allows the outputs to behave independently
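This kernel is typically used in a product with a data kernel, with the coregionalization acting on an extra output-index column of X; a hedged sketch of that construction:
    import GPy

    # Data kernel acts on input column 0; the coregionalization kernel acts on
    # column 1, which holds the integer output index (0 or 1 for two outputs)
    k_data = GPy.kern.RBF(input_dim=1, active_dims=[0])
    k_coreg = GPy.kern.Coregionalize(input_dim=1, output_dim=2, rank=1, active_dims=[1])

    # Intrinsic coregionalization model kernel: k_2(x, y) = B * k(x, y)
    k_icm = k_data * k_coreg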
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
parameters_changed
()[source]¶ This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer.
-
to_dict
()[source]¶ Convert the object into a json serializable dictionary.
Note: It uses the private method _save_to_input_dict of the parent.
Return dict: json serializable dictionary containing the needed information to instantiate the object
GPy.kern.src.coregionalize_cython module¶
GPy.kern.src.diff_kern module¶
-
class
DiffKern
(base_kern, dimension)[source]¶ Bases:
GPy.kern.src.kern.Kern
DiffKern is a thin wrapper for using partial derivatives of kernels as kernels. E.g., in combination with the Multioutput kernel this allows the user to train GPs with observations of both the latent function and its derivatives. NOTE: DiffKern only works when used with the Multioutput kernel; do not use it as a standalone kernel.
The parameters the kernel needs are:
- base_kern: a member of the Kern class that is used for observations
- dimension: integer that indicates in which dimension the partial derivative observations are
-
K
(X, X2=None, dimX2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2)[source]¶
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
parameters_changed
()[source]¶ This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer.
-
update_gradients_diag
(dL_dK_diag, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_full
(dL_dK, X, X2=None, dimX2=None)[source]¶ Set the gradients of all parameters when doing full (N) inference.
-
gradient
¶
-
GPy.kern.src.eq_ode1 module¶
-
class
EQ_ODE1
(input_dim=2, output_dim=1, rank=1, W=None, lengthscale=None, decay=None, active_dims=None, name='eq_ode1')[source]¶ Bases:
GPy.kern.src.kern.Kern
Covariance function for first order differential equation driven by an exponentiated quadratic covariance.
The outputs of this kernel have the form
\[\frac{\mathrm{d}y_j(t)}{\mathrm{d}t} = \sum_{i=1}^R w_{j,i} u_i(t-\delta_j) - d_j y_j(t)\]
where \(R\) is the rank of the system, \(w_{j,i}\) is the sensitivity of the \(j\)th output to the \(i\)th latent force, \(d_j\) is the decay rate of the \(j\)th output, and the latent forces \(u_i(t)\) are independent Gaussian processes governed by an exponentiated quadratic covariance.
Parameters: - output_dim (int) – number of outputs driven by the latent function.
- W (ndarray (output_dim x rank)) – sensitivities of each output to the latent driving function.
- rank (int) – if rank is greater than 1 then there are assumed to be a total of rank latent forces independently driving the system, each with identical covariance.
- decay (array of length output_dim) – decay rates for the first order system.
- delay (array of length output_dim) – delay between latent force and output response.
- kappa (array of length output_dim) – diagonal term that allows each latent output to have an independent component to the response.
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
GPy.kern.src.eq_ode2 module¶
-
class
EQ_ODE2
(input_dim=2, output_dim=1, rank=1, W=None, lengthscale=None, C=None, B=None, active_dims=None, name='eq_ode2')[source]¶ Bases:
GPy.kern.src.kern.Kern
Covariance function for second order differential equation driven by an exponentiated quadratic covariance.
The outputs of this kernel have the form
\[\frac{\text{d}^2 y_j(t)}{\text{d}t^2} + C_j \frac{\text{d}y_j(t)}{\text{d}t} + B_j y_j(t) = \sum_{i=1}^R w_{j,i} u_i(t)\]
where \(R\) is the rank of the system, \(w_{j,i}\) is the sensitivity of the \(j\)th output to the \(i\)th latent function, \(C_j\) and \(B_j\) are the damper and spring constants of the \(j\)th output, and \(u_i(t)\) are independent latent Gaussian processes governed by an exponentiated quadratic covariance.
Parameters: - output_dim (int) – number of outputs driven by the latent function.
- W (ndarray (output_dim x rank)) – sensitivities of each output to the latent driving function.
- rank (int) – if rank is greater than 1 then there are assumed to be a total of rank latent forces independently driving the system, each with identical covariance.
- C (array of length output_dim) – damper constant for the second order system.
- B (array of length output_dim) – spring constant for the second order system.
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
GPy.kern.src.grid_kerns module¶
-
class
GridKern
(input_dim, variance, lengthscale, ARD, active_dims, name, originalDimensions, useGPU=False)[source]¶
-
class
GridRBF
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='gridRBF', originalDimensions=1, useGPU=False)[source]¶ Bases:
GPy.kern.src.grid_kerns.GridKern
Similar to the regular RBF kernel, but supplemented with methods required for Gaussian grid regression. Radial Basis Function kernel, aka squared-exponential, exponentiated quadratic or Gaussian kernel:
\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg)\]
GPy.kern.src.independent_outputs module¶
-
class
Hierarchical
(kernels, name='hierarchy')[source]¶ Bases:
GPy.kern.src.kern.CombinationKernel
A kernel which can represent a simple hierarchical model.
See Hensman et al 2013, “Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters” http://www.biomedcentral.com/1471-2105/14/252
To construct this kernel, you must pass a list of kernels. The first kernel will be assumed to be the ‘base’ kernel, and will be computed everywhere. For every additional kernel, we assume another layer in the hierarchy, with a corresponding column of the input matrix which indexes which function the data are in at that level.
For more, see the ipython notebook documentation on Hierarchical covariances.
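For orientation, a small hedged sketch of constructing such a kernel (the RBF choices, the number of replicates and the data are assumptions for the example, and GPy.kern.Hierarchical is assumed to be the exported name):

    import numpy as np
    import GPy

    # Two-level hierarchy: a 'base' kernel computed everywhere, plus one kernel
    # for the replicate layer. The extra (last) column of X indexes which
    # replicate each row belongs to.
    base = GPy.kern.RBF(input_dim=1)
    replicate = GPy.kern.RBF(input_dim=1)
    kern = GPy.kern.Hierarchical([base, replicate])

    t = np.random.rand(20, 1)                    # actual inputs
    idx = np.random.randint(0, 3, size=(20, 1))  # replicate index column
    X = np.hstack([t, idx])
    K = kern.K(X)                                # covariance combining both levels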
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
-
class
IndependentOutputs
(kernels, index_dim=-1, name='independ')[source]¶ Bases:
GPy.kern.src.kern.CombinationKernel
A kernel which can represent several independent functions. This kernel ‘switches off’ parts of the matrix where the output indexes are different.
The index of the functions is given by the last column in the input X; the rest of the columns of X are passed to the underlying kernel for computation (in blocks).
Parameters: kernels – either a kernel, or a list of kernels to work with. If it is a list of kernels, the indices in index_dim index the kernels you gave.
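A short hedged sketch of the input layout described above (the data and the shared RBF kernel are assumptions for illustration); the last column of X carries the output index, and covariances between rows with different indices are switched off:

    import numpy as np
    import GPy

    # One underlying RBF kernel shared by two independent functions.
    kern = GPy.kern.IndependentOutputs(GPy.kern.RBF(1))

    x = np.random.rand(10, 1)
    index = np.vstack([np.zeros((5, 1)), np.ones((5, 1))])  # rows 0-4 -> fn 0, rows 5-9 -> fn 1
    X = np.hstack([x, index])
    K = kern.K(X)  # cross-blocks between the two functions are zero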
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
GPy.kern.src.integral module¶
-
class
Integral
(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]¶ Bases:
GPy.kern.src.kern.Kern
Integral kernel between…
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
GPy.kern.src.integral_limits module¶
-
class
Integral_Limits
(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]¶ Bases:
GPy.kern.src.kern.Kern
Integral kernel. This kernel allows 1d histogram or binned data to be modelled. The outputs are the counts in each bin. The inputs (on two dimensions) are the start and end points of each bin. The kernel’s predictions are the latent function which might have generated those binned results.
-
K
(X, X2=None)[source]¶ - Note: We have a latent function and an output function. We want to be able to find:
- the covariance between values of the output function
- the covariance between values of the latent function
- the “cross covariance” between values of the output function and the latent function
This method is used by GPy either to get the covariance between the outputs (K_xx) or to get the cross covariance between the latent function and the outputs (K_xf). We take advantage of the places where this function is used:
- if X2 is None, then we know that the items being compared (to get the covariance for) are both from the OUTPUT FUNCTION.
- if X2 is not None, then we know that the items being compared are from two different sets (the OUTPUT FUNCTION and the LATENT FUNCTION).
If we want the covariance between values of the LATENT FUNCTION, we take advantage of the fact that we only need that when we do prediction, and this only calls Kdiag (not K). So the covariance between LATENT FUNCTIONS is available from Kdiag.
-
Kdiag
(X)[source]¶ I’ve used the fact that we call this method during prediction (instead of K). When we do prediction we want to know the covariance between LATENT FUNCTIONS (K_ff) (as that’s probably what the user wants). $K_{ff}^{post} = K_{ff} - K_{fx} K_{xx}^{-1} K_{xf}$
-
k_ff
(t, tprime, l)[source]¶ Doesn’t need s or sprime as we’re looking at the ‘derivatives’, so no domains over which to integrate are required
-
k_xf
(t, tprime, s, l)[source]¶ Covariance between the gradient (latent value) and the actual (observed) value.
Note that sprime isn’t actually used in this expression, presumably because the ‘primes’ are the gradient (latent) values which don’t involve an integration, and thus there is no domain over which they’re integrated, just a single value that we want.
-
k_xx
(t, tprime, s, sprime, l)[source]¶ Covariance between observed values.
s and t are one domain of the integral (i.e. the integral between s and t) sprime and tprime are another domain of the integral (i.e. the integral between sprime and tprime)
We’re interested in how correlated these two integrals are.
Note: We’ve not multiplied by the variance, this is done in K.
-
GPy.kern.src.kern module¶
-
class
CombinationKernel
(kernels, name, extra_dims=[], link_parameters=True)[source]¶ Bases:
GPy.kern.src.kern.Kern
Abstract super class for combination kernels. A combination kernel combines (a list of) kernels and works on those. Examples are the HierarchicalKernel or Add and Prod kernels.
Parameters: - kernels (list) – List of kernels to combine (can be only one element)
- name (str) – name of the combination kernel
- extra_dims (array-like) – if needed extra dimensions for the combination kernel to work on
-
input_sensitivity
(summarize=True)[source]¶ If summarize is true, we want to get the summarized view of the sensitivities, otherwise put everything into an array with shape (#kernels, input_dim) in the order of appearance of the kernels in the parameterized object.
-
parts
¶
-
class
Kern
(input_dim, active_dims, name, useGPU=False, *a, **kw)[source]¶ Bases:
GPy.core.parameterization.parameterized.Parameterized
The base class for a kernel: a positive definite function which forms a covariance function (kernel).
input_dim:
is the number of dimensions to work on. Make sure to give the tight dimensionality of inputs. You most likely want this to be the integer telling the number of input dimensions of the kernel.
active_dims:
is the active dimensions of inputs X we will work on. All kernels will get sliced Xes as inputs, if _all_dims_active is not None. Only positive integers are allowed in active_dims! If active_dims is None, slicing is switched off and all of X will be passed through as given.
Parameters: - input_dim (int) – the number of input dimensions to the function
- active_dims (array-like|None) – list of indices on which dimensions this kernel works on, or none if no slicing
Do not instantiate.
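As a hedged illustration of the slicing behaviour described above (the column choice is arbitrary): a kernel declared with input_dim=1 and active_dims=[2] only ever sees column 2 of the data it is given:

    import numpy as np
    import GPy

    # The kernel works on a single dimension, namely column 2 of X.
    kern = GPy.kern.RBF(input_dim=1, active_dims=[2])
    X = np.random.randn(4, 3)
    K = kern.K(X)  # computed from X[:, 2] only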
-
K
(X, X2)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
add
(other, name='sum')[source]¶ Add another kernel to this one.
Parameters: other (GPy.kern) – the other kernel to be added
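For example (a small sketch; the + operator on GPy kernels is the usual shorthand for this method):

    import GPy

    k1 = GPy.kern.RBF(input_dim=1)
    k2 = GPy.kern.White(input_dim=1)
    ksum = k1.add(k2)  # explicit call
    ksum2 = k1 + k2    # operator form, equivalent to add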
-
static
from_dict
(input_dict)[source]¶ Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. If it is needed, please override _build_from_input_dict instead.
Parameters: input_dict (dict) – Dictionary with all the information needed to instantiate the object.
-
get_most_significant_input_dimensions
(which_indices=None)[source]¶ Determine which dimensions should be plotted
Returns the top three most significant input dimensions.
If there are fewer than three dimensions, the non-existing dimensions are labelled as None, so for a 1-dimensional input this returns (0, None, None).
Parameters: which_indices (int or tuple(int,int) or tuple(int,int,int)) – force the indices to be the given indices.
-
gradients_X
(dL_dK, X, X2)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
gradients_XX
(dL_dK, X, X2, cov=True)[source]¶ - \[\frac{\partial^2 L}{\partial X\partial X_2} = \frac{\partial L}{\partial K}\frac{\partial^2 K}{\partial X\partial X_2}\]
-
gradients_XX_diag
(dL_dKdiag, X, cov=True)[source]¶ The diagonal of the second derivative w.r.t. X and X2
-
gradients_Z_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior, psi0=None, psi1=None, psi2=None)[source]¶ Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.
-
gradients_qX_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel
-
input_sensitivity
(summarize=True)[source]¶ Returns the sensitivity for each dimension of this kernel.
This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.
Use this as relative measurement, not for absolute comparison between kernels.
-
plot
(*args, **kwargs)¶
-
plot_ARD
(filtering=None, legend=False, canvas=None, **kwargs)¶ If an ARD kernel is present, plot a bar representation using matplotlib
Parameters: - fignum – figure number of the plot
- filtering (list of names to use for ARD plot) – list of names, which to use for plotting ARD parameters. Only kernels which match names in the list of names in filtering will be used for plotting.
-
plot_covariance
(x=None, label=None, plot_limits=None, visible_dims=None, resolution=None, projection='2d', levels=20, **kwargs)¶ Plot a kernel covariance w.r.t. another x.
Parameters: - x (array-like) – the value to use for the other kernel argument (kernels are a function of two variables!)
- plot_limits (Either (xmin, xmax) for 1D or (xmin, xmax, ymin, ymax) / ((xmin, xmax), (ymin, ymax)) for 2D) – the range over which to plot the kernel
- visible_dims (array-like) – input dimensions (!) to use for x. Make sure to select 2 or less dimensions to plot.
- projection ({2d|3d}) – What projection shall we use to plot the kernel?
- levels (int) – for 2D projection, how many levels for the contour plot to use?
- kwargs – valid kwargs for your specific plotting library
Resolution: the resolution of the lines used in plotting. For 2D this defines the grid for kernel evaluation.
-
prod
(other, name='mul')[source]¶ Multiply two kernels (either on the same space, or on the tensor product of the input space).
Parameters: other (GPy.kern) – the other kernel to be multiplied
-
psi2
(Z, variational_posterior)[source]¶ - \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
GPy.kern.src.kernel_slice_operations module¶
Created on 11 Mar 2014
@author: @mzwiessele
This module provides a meta class for the kernels. The meta class is for slicing the inputs (X, X2) for the kernels, before K (or any other method involving X) gets called. The _all_dims_active of a kernel decides which dimensions the kernel works on.
GPy.kern.src.linear module¶
-
class
Linear
(input_dim, variances=None, ARD=False, active_dims=None, name='linear')[source]¶ Bases:
GPy.kern.src.kern.Kern
Linear kernel
\[k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i x_iy_i\]Parameters: - input_dim (int) – the number of input dimensions
- variances (array or list of the appropriate size (or float if there is only one variance parameter)) – the vector of variances \(\sigma^2_i\)
- ARD (Boolean) – Auto Relevance Determination. If False, the kernel has only one variance parameter sigma^2, otherwise there is one variance parameter per dimension.
Return type: kernel object
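A brief hedged sketch of constructing and evaluating this kernel (the data and the ARD choice are assumptions for the example):

    import numpy as np
    import GPy

    # A 2-d linear kernel with one variance parameter per dimension (ARD=True).
    kern = GPy.kern.Linear(input_dim=2, ARD=True)
    X = np.random.randn(5, 2)
    K = kern.K(X)          # 5 x 5 Gram matrix of sum_i sigma_i^2 x_i x'_i
    print(kern.variances)  # the per-dimension sigma^2_i parameters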
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
gradients_XX
(dL_dK, X, X2=None)[source]¶ Given the derivative of the objective wrt K (dL_dK), compute the second derivative of K wrt X and X2:
Returns the full covariance matrix [QxQ] of the input dimensions for each pair of vectors, thus the returned array is of shape [NxNxQxQ].
\[\frac{\partial^2 K}{\partial X_2^2} = - \frac{\partial^2 K}{\partial X \partial X_2}\]
Returns:
- dL2_dXdX2: [NxMxQxQ] for X [NxQ] and X2 [MxQ] (X2 is X if X2 is None).
Thus, we return the second derivative in X2.
-
gradients_Z_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.
-
gradients_qX_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel
-
input_sensitivity
(summarize=True)[source]¶ Returns the sensitivity for each dimension of this kernel.
This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.
Use this as relative measurement, not for absolute comparison between kernels.
-
psi2
(Z, variational_posterior)[source]¶ - \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
-
class
LinearFull
(input_dim, rank, W=None, kappa=None, active_dims=None, name='linear_full')[source]¶ Bases:
GPy.kern.src.kern.Kern
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
GPy.kern.src.mlp module¶
-
class
MLP
(input_dim, variance=1.0, weight_variance=1.0, bias_variance=1.0, ARD=False, active_dims=None, name='mlp')[source]¶ Bases:
GPy.kern.src.kern.Kern
Multi layer perceptron kernel (also known as arc sine kernel or neural network kernel)
\[k(x,y) = \sigma^{2}\frac{2}{\pi } \text{asin} \left ( \frac{ \sigma_w^2 x^\top y+\sigma_b^2}{\sqrt{\sigma_w^2x^\top x + \sigma_b^2 + 1}\sqrt{\sigma_w^2 y^\top y + \sigma_b^2 +1}} \right )\]Parameters: - input_dim (int) – the number of input dimensions
- variance (float) – the variance \(\sigma^2\)
- weight_variance (array or list of the appropriate size (or float if there is only one weight variance parameter)) – the vector of the variances of the prior over input weights in the neural network \(\sigma^2_w\)
- bias_variance – the variance of the prior over bias parameters \(\sigma^2_b\)
- ARD (Boolean) – Auto Relevance Determination. If equal to “False”, the kernel is isotropic (ie. one weight variance parameter sigma^2_w), otherwise there is one weight variance parameter per dimension.
Return type: Kernpart object
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
GPy.kern.src.multidimensional_integral_limits module¶
-
class
Multidimensional_Integral_Limits
(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]¶ Bases:
GPy.kern.src.kern.Kern
Integral kernel, can include limits on each integral value. This kernel allows an n-dimensional histogram or binned data to be modelled. The outputs are the counts in each bin. The inputs are the start and end points of each bin: Pairs of inputs act as the limits on each bin. So inputs 4 and 5 provide the start and end values of each bin in the 3rd dimension. The kernel’s predictions are the latent function which might have generated those binned results.
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
Kdiag
(X)[source]¶ I’ve used the fact that we call this method for K_ff when finding the covariance as a hack so I know if I should return K_ff or K_xx. In this case we’re returning K_ff!! $K_{ff}^{post} = K_{ff} - K_{fx} K_{xx}^{-1} K_{xf}$
-
k_ff
(t, tprime, l)[source]¶ Doesn’t need s or sprime as we’re looking at the ‘derivatives’, so no domains over which to integrate are required
-
k_xf
(t, tprime, s, l)[source]¶ Covariance between the gradient (latent value) and the actual (observed) value.
Note that sprime isn’t actually used in this expression, presumably because the ‘primes’ are the gradient (latent) values which don’t involve an integration, and thus there is no domain over which they’re integrated, just a single value that we want.
-
k_xx
(t, tprime, s, sprime, l)[source]¶ Covariance between observed values.
s and t are one domain of the integral (i.e. the integral between s and t) sprime and tprime are another domain of the integral (i.e. the integral between sprime and tprime)
We’re interested in how correlated these two integrals are.
Note: We’ve not multiplied by the variance, this is done in K.
-
GPy.kern.src.multioutput_derivative_kern module¶
-
class
KernWrapper
(fk, fug, fg, base_kern)[source]¶ Bases:
GPy.kern.src.kern.Kern
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
update_gradients_full
(dL_dK, X, X2=None)[source]¶ Set the gradients of all parameters when doing full (N) inference.
-
gradient
¶
-
-
class
MultioutputDerivativeKern
(kernels, cross_covariances={}, name='MultioutputDerivativeKern')[source]¶ Bases:
GPy.kern.src.multioutput_kern.MultioutputKern
Multioutput derivative kernel is a meta class for combining different kernels for multioutput GPs. It is only a thin wrapper around the Multioutput kernel so that the user does not have to define the cross covariances.
GPy.kern.src.multioutput_kern module¶
-
class
MultioutputKern
(kernels, cross_covariances={}, name='MultioutputKern')[source]¶ Bases:
GPy.kern.src.kern.CombinationKernel
Multioutput kernel is a meta class for combining different kernels for multioutput GPs.
As an example let us have inputs x1 for output 1 with covariance k1 and x2 for output 2 with covariance k2. In addition, we need to define the cross covariances k12(x1,x2) and k21(x2,x1). Then the kernel becomes: k([x1,x2],[x1,x2]) = [k1(x1,x1) k12(x1, x2); k21(x2, x1), k2(x2,x2)]
For the kernel, the kernels of the outputs are given as a list in the param “kernels”, and cross covariances are given in the param “cross_covariances” as a dictionary with tuples (i,j) as keys. If no cross covariance is given, it defaults to zero, as in k12(x1,x2)=0.
In the cross covariance dictionary, the value needs to be a struct with elements:
- ‘kernel’: a member of the Kernel class that stores the hyperparameters to be updated when optimizing the GP
- ‘K’: function defining the cross covariance
- ‘update_gradients_full’: a function to be used for updating gradients
- ‘gradients_X’: gives a gradient of the cross covariance with respect to the first input
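A minimal hedged sketch of the default behaviour described above (the two kernel choices are assumptions; with no cross_covariances given, the cross blocks default to zero):

    import GPy

    k1 = GPy.kern.RBF(input_dim=1)       # covariance of output 1
    k2 = GPy.kern.Matern32(input_dim=1)  # covariance of output 2

    # No cross_covariances dictionary is passed, so k12(x1, x2) = k21(x2, x1) = 0.
    kern = GPy.kern.MultioutputKern(kernels=[k1, k2])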
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
-
class
ZeroKern
[source]¶ Bases:
GPy.kern.src.kern.Kern
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
update_gradients_full
(dL_dK, X, X2=None)[source]¶ Set the gradients of all parameters when doing full (N) inference.
-
gradient
¶
-
GPy.kern.src.periodic module¶
-
class
Periodic
(input_dim, variance, lengthscale, period, n_freq, lower, upper, active_dims, name)[source]¶ Bases:
GPy.kern.src.kern.Kern
Parameters: - variance (float) – the variance of the Matern kernel
- lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
- period (float) – the period
- n_freq (int) – the number of frequencies considered for the periodic subspace
Return type: kernel object
-
class
PeriodicExponential
(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_exponential')[source]¶ Bases:
GPy.kern.src.periodic.Periodic
Kernel of the periodic subspace (up to a given frequency) of a exponential (Matern 1/2) RKHS.
Only defined for input_dim=1.
-
class
PeriodicMatern32
(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_Matern32')[source]¶ Bases:
GPy.kern.src.periodic.Periodic
Kernel of the periodic subspace (up to a given frequency) of a Matern 3/2 RKHS. Only defined for input_dim=1.
Parameters: - input_dim (int) – the number of input dimensions
- variance (float) – the variance of the Matern kernel
- lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
- period (float) – the period
- n_freq (int) – the number of frequencies considered for the periodic subspace
Return type: kernel object
-
class
PeriodicMatern52
(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_Matern52')[source]¶ Bases:
GPy.kern.src.periodic.Periodic
Kernel of the periodic subspace (up to a given frequency) of a Matern 5/2 RKHS. Only defined for input_dim=1.
Parameters: - input_dim (int) – the number of input dimensions
- variance (float) – the variance of the Matern kernel
- lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
- period (float) – the period
- n_freq (int) – the number of frequencies considered for the periodic subspace
Return type: kernel object
GPy.kern.src.poly module¶
-
class
Poly
(input_dim, variance=1.0, scale=1.0, bias=1.0, order=3.0, active_dims=None, name='poly')[source]¶ Bases:
GPy.kern.src.kern.Kern
Polynomial kernel
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
GPy.kern.src.prod module¶
-
class
Prod
(kernels, name='mul')[source]¶ Bases:
GPy.kern.src.kern.CombinationKernel
Computes the product of 2 kernels
Parameters: k1, k2 – the kernels to multiply Return type: kernel object -
K
(X, X2=None, which_parts=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
input_sensitivity
(summarize=True)[source]¶ If summarize is true, we want to get the summarized view of the sensitivities, otherwise put everything into an array with shape (#kernels, input_dim) in the order of appearance of the kernels in the parameterized object.
-
sde_update_gradient_full
(gradients)[source]¶ Update gradient in the order in which parameters are represented in the kernel
-
to_dict
()[source]¶ Convert the object into a json serializable dictionary.
Note: It uses the private method _save_to_input_dict of the parent.
Return dict: json serializable dictionary containing the needed information to instantiate the object
-
-
dkron
(A, dA, B, dB, operation='prod')[source]¶ Function computes the derivative of Kronecker product A*B (or Kronecker sum A+B).
- A: 2D matrix – some matrix
- dA: 3D (or 2D) matrix – derivatives of A
- B: 2D matrix – some matrix
- dB: 3D (or 2D) matrix – derivatives of B
- operation: str, ‘prod’ or ‘sum’ – which operation is considered. If the operation is ‘sum’ it is assumed that A and B are square matrices.
- Output:
- dC: 3D matrix – derivative of the Kronecker product A*B (or Kronecker sum A+B)
GPy.kern.src.rbf module¶
-
class
RBF
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='rbf', useGPU=False, inv_l=False)[source]¶ Bases:
GPy.kern.src.stationary.Stationary
Radial Basis Function kernel, aka squared-exponential, exponentiated quadratic or Gaussian kernel:
\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg)\]-
dK2_drdr_diag
()[source]¶ Second order derivative of K in r_{i,i}. The diagonal entries are always zero, so we do not give it here.
-
gradients_Z_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.
-
gradients_qX_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel
-
parameters_changed
()[source]¶ This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See :py:function:
paramz.param.Observable.add_observer
-
psi2
(Z, variational_posterior)[source]¶ - \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
to_dict
()[source]¶ Convert the object into a json serializable dictionary.
Note: It uses the private method _save_to_input_dict of the parent.
Return dict: json serializable dictionary containing the needed information to instantiate the object
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ Given the derivative of the objective with respect to the diagonal of the covariance matrix, compute the derivative wrt the parameters of this kernel and store it in the <parameter>.gradient field.
See also update_gradients_full
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
-
GPy.kern.src.sde_brownian module¶
Classes in this module enhance Brownian motion covariance function with the Stochastic Differential Equation (SDE) functionality.
-
class
sde_Brownian
(input_dim=1, variance=1.0, active_dims=None, name='Brownian')[source]¶ Bases:
GPy.kern.src.brownian.Brownian
Class provides extra functionality to transfer this covariance function into SDE form.
Linear kernel:
\[k(x,y) = \sigma^2 min(x,y)\]
GPy.kern.src.sde_linear module¶
Classes in this module enhance Linear covariance function with the Stochastic Differential Equation (SDE) functionality.
-
class
sde_Linear
(input_dim, X, variances=None, ARD=False, active_dims=None, name='linear')[source]¶ Bases:
GPy.kern.src.linear.Linear
Class provides extra functionality to transfer this covariance function into SDE form.
Linear kernel:
\[k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i x_iy_i\]Modify the init method, because one extra parameter is required: X - points on the X axis.
GPy.kern.src.sde_matern module¶
Classes in this module enhance Matern covariance functions with the Stochastic Differential Equation (SDE) functionality.
-
class
sde_Matern32
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat32')[source]¶ Bases:
GPy.kern.src.stationary.Matern32
Class provides extra functionality to transfer this covariance function into SDE form.
Matern 3/2 kernel:
\[k(r) = \sigma^2 (1 + \sqrt{3} r) \exp(- \sqrt{3} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
-
class
sde_Matern52
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat52')[source]¶ Bases:
GPy.kern.src.stationary.Matern52
Class provides extra functionality to transfer this covariance function into SDE form.
Matern 5/2 kernel:
\[k(r) = \sigma^2 (1 + \sqrt{5} r + \frac{5}{3}r^2) \exp(- \sqrt{5} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
GPy.kern.src.sde_standard_periodic module¶
Classes in this module enhance Matern covariance functions with the Stochastic Differential Equation (SDE) functionality.
-
class
sde_StdPeriodic
(*args, **kwargs)[source]¶ Bases:
GPy.kern.src.standard_periodic.StdPeriodic
Class provides extra functionality to transfer this covariance function into SDE form.
Standard Periodic kernel:
\[k(x,y) = \theta_1 \exp \left[ - \frac{1}{2} \sum_{i=1}^{\text{input_dim}} \left( \frac{\sin( \frac{\pi}{\lambda_i} (x_i - y_i) )}{l_i} \right)^2 \right]\]
Init constructor.
Two optional extra parameters are added in addition to the ones in the StdPeriodic kernel.
Parameters: - approx_order (int) – approximation order for the RBF covariance. (Default 7)
- balance (bool) – Whether to balance this kernel separately. (Default False). The model has a separate parameter for balancing.
-
sde
()[source]¶ Return the state space representation of the standard periodic covariance.
! Note: one must constrain the lengthscale not to drop below 0.2 (independently of the approximation order). Below this, Bessel functions of the first kind become NaN. Rescaling the time variable might help.
! Note: the period must also not be very low, because then the gradients wrt the wavelength become unstable. However, this might depend on the data. For a test example with 300 data points the lower limit is 0.15.
-
seriescoeff
(m=6, lengthScale=1.0, magnSigma2=1.0, true_covariance=False)[source]¶ Calculate the coefficients q_j^2 for the covariance function approximation:
\[k(\tau) = \sum_{j=0}^{+\infty} q_j^2 \cos(j\omega_0 \tau)\]
Reference is:
[1] Arno Solin and Simo Särkkä (2014). Explicit link between periodic covariance functions and state space models. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS 2014). JMLR: W&CP, volume 33.
Note! Only the infinite approximation (through the Bessel function) is currently implemented.
Parameters:
- m (int) – degree of approximation. Default 6.
- lengthScale (float) – length scale parameter in the kernel.
- magnSigma2 (float) – multiplier in front of the kernel.
Returns:
- coeffs (array(m+1)) – covariance series coefficients
- coeffs_dl (array(m+1)) – derivatives of the coefficients with respect to lengthscale
GPy.kern.src.sde_static module¶
Classes in this module enhance Static covariance functions with the Stochastic Differential Equation (SDE) functionality.
-
class
sde_Bias
(input_dim, variance=1.0, active_dims=None, name='bias')[source]¶ Bases:
GPy.kern.src.static.Bias
Class provides extra functionality to transfer this covariance function into SDE form.
Bias kernel:
\[k(x,y) = \alpha\]
-
class
sde_White
(input_dim, variance=1.0, active_dims=None, name='white')[source]¶ Bases:
GPy.kern.src.static.White
Class provides extra functionality to transfer this covariance function into SDE form.
White kernel:
\[k(x,y) = \alpha \delta(x-y)\]
GPy.kern.src.sde_stationary module¶
Classes in this module enhance several stationary covariance functions with the Stochastic Differential Equation (SDE) functionality.
-
class
sde_Exponential
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Exponential')[source]¶ Bases:
GPy.kern.src.stationary.Exponential
Class provides extra functionality to transfer this covariance function into SDE form.
Exponential kernel:
\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r \bigg) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
-
class
sde_RBF
(*args, **kwargs)[source]¶ Bases:
GPy.kern.src.rbf.RBF
Class provides extra functionality to transfer this covariance function into SDE form.
Radial Basis Function kernel:
\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
Init constructor.
Two optional extra parameters are added in addition to the ones in the RBF kernel.
Parameters: - approx_order (int) – approximation order for the RBF covariance. (Default 10)
- balance (bool) – Whether to balance this kernel separately. (Default True). The model has a separate parameter for balancing.
-
sde
()[source]¶ Return the state space representation of the covariance.
Note! For sparse GP inference, too small or too high values of the lengthscale lead to instabilities. This is because Qc is too high or too low and P_inf is not full rank. This effect depends on the approximation order. For N = 10 the lengthscale must be in (0.8, 8); for other N, tests must be conducted (N=6: (0.06, 31)). The variance should be within reasonable bounds as well, but its dependence is linear.
The above facts do not take into account regularization.
-
class
sde_RatQuad
(input_dim, variance=1.0, lengthscale=None, power=2.0, ARD=False, active_dims=None, name='RatQuad')[source]¶ Bases:
GPy.kern.src.stationary.RatQuad
Class provides extra functionality to transfer this covariance function into SDE form.
Rational Quadratic kernel:
\[k(r) = \sigma^2 \bigg( 1 + \frac{r^2}{2} \bigg)^{- \alpha} \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
GPy.kern.src.spline module¶
-
class
Spline
(input_dim, variance=1.0, c=1.0, active_dims=None, name='spline')[source]¶ Bases:
GPy.kern.src.kern.Kern
Linear spline kernel. You need to specify 2 parameters: the variance and c. The variance is defined in powers of 10, so specifying -2 means 10^-2. The parameter c allows you to define the stiffness of the spline fit. A very stiff spline equals linear regression. See https://www.youtube.com/watch?v=50Vgw11qn0o starting at minute 1:17:28. Lit: Wahba, 1990
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
GPy.kern.src.splitKern module¶
A new kernel
-
class
DEtime
(kernel, idx_p, Xp, index_dim=-1, name='DiffGenomeKern')[source]¶ Bases:
GPy.kern.src.kern.Kern
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
-
class
SplitKern
(kernel, Xp, index_dim=-1, name='SplitKern')[source]¶ Bases:
GPy.kern.src.kern.CombinationKernel
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
-
class
SplitKern_cross
(kernel, Xp, name='SplitKern_cross')[source]¶ Bases:
GPy.kern.src.kern.Kern
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
GPy.kern.src.standard_periodic module¶
The standard periodic kernel which mentioned in:
[1] Gaussian Processes for Machine Learning, C. E. Rasmussen, C. K. I. Williams. The MIT Press, 2005.
[2] Introduction to Gaussian processes. D. J. C. MacKay. In C. M. Bishop, editor, Neural Networks and Machine Learning, pages 133-165. Springer, 1998.
-
class
StdPeriodic
(input_dim, variance=1.0, period=None, lengthscale=None, ARD1=False, ARD2=False, active_dims=None, name='std_periodic', useGPU=False)[source]¶ Bases:
GPy.kern.src.kern.Kern
Standard periodic kernel
\[k(x,y) = \theta_1 \exp \left[ - \frac{1}{2} \sum_{i=1}^{\text{input_dim}} \left( \frac{\sin( \frac{\pi}{T_i} (x_i - y_i) )}{l_i} \right)^2 \right]\]
Parameters: - input_dim (int) – the number of input dimensions
- variance (float) – the variance \(\theta_1\) in the formula above
- period (array or list of the appropriate size, or float if there is only one period parameter) – the vector of periods \(T_i\). If None then 1.0 is assumed.
- lengthscale (array or list of the appropriate size, or float if there is only one lengthscale parameter) – the vector of lengthscales \(l_i\). If None then 1.0 is assumed.
- ARD1 (Boolean) – Auto Relevance Determination with respect to period. If equal to “False” a single period parameter \(T_i\) is assumed for all dimensions, otherwise there is one period parameter per dimension.
- ARD2 (Boolean) – Auto Relevance Determination with respect to lengthscale. If equal to “False” a single lengthscale parameter \(l_i\) is assumed for all dimensions, otherwise there is one lengthscale parameter per dimension.
- active_dims (array or list of the appropriate size) – indices of dimensions which are used in the computation of the kernel
- name (String) – name of the kernel for output
- useGPU (Boolean) – whether or not to use GPU
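As a hedged usage sketch (the parameter values are arbitrary and chosen only for illustration):

    import GPy

    # A 1-d standard periodic kernel with an assumed period of 2.0 and
    # lengthscale of 0.5.
    kern = GPy.kern.StdPeriodic(input_dim=1, variance=1.0, period=2.0, lengthscale=0.5)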
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
input_sensitivity
(summarize=True)[source]¶ Returns the sensitivity for each dimension of this kernel.
This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.
Use this as relative measurement, not for absolute comparison between kernels.
-
parameters_changed
()[source]¶ This function acts as a callback for each optimization iteration. If one optimization step was successful and the parameters have changed, this callback function will be called so that any precomputations for the kernel can be updated.
-
to_dict
()[source]¶ Convert the object into a json serializable dictionary.
Note: It uses the private method _save_to_input_dict of the parent.
Return dict: json serializable dictionary containing the needed information to instantiate the object
GPy.kern.src.static module¶
-
class
Bias
(input_dim, variance=1.0, active_dims=None, name='bias')[source]¶ Bases:
GPy.kern.src.static.Static
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
psi2
(Z, variational_posterior)[source]¶ - \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
-
-
class
Fixed
(input_dim, covariance_matrix, variance=1.0, active_dims=None, name='fixed')[source]¶ Bases:
GPy.kern.src.static.Static
Parameters: - input_dim (int) – the number of input dimensions
- variance (float) – the variance of the kernel
-
K
(X, X2)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
psi2
(Z, variational_posterior)[source]¶ - \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
-
class
Precomputed
(input_dim, covariance_matrix, variance=1.0, active_dims=None, name='precomputed')[source]¶ Bases:
GPy.kern.src.static.Fixed
Class for precomputed kernels, indexed by columns in X
Usage example:
    import numpy as np
    from GPy.models import GPClassification
    from GPy.kern import Precomputed
    from sklearn.cross_validation import LeaveOneOut

    n = 10
    d = 100
    X = np.arange(n).reshape((n,1))          # column vector of indices
    y = 2*np.random.binomial(1,0.5,(n,1))-1
    X0 = np.random.randn(n,d)
    k = np.dot(X0,X0.T)
    kern = Precomputed(1,k)                  # k is a n x n covariance matrix

    cv = LeaveOneOut(n)
    ypred = y.copy()
    for train, test in cv:
        m = GPClassification(X[train], y[train], kernel=kern)
        m.optimize()
        ypred[test] = 2*(m.predict(X[test])[0]>0.5)-1
Parameters: - input_dim (int) – the number of input dimensions
- variance (float) – the variance of the kernel
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
class
Static
(input_dim, variance, active_dims, name)[source]¶ Bases:
GPy.kern.src.kern.Kern
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
gradients_XX
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial^2 L}{\partial X\partial X_2} = \frac{\partial L}{\partial K}\frac{\partial^2 K}{\partial X\partial X_2}\]
-
gradients_XX_diag
(dL_dKdiag, X, cov=False)[source]¶ The diagonal of the second derivative w.r.t. X and X2
-
gradients_Z_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.
-
gradients_qX_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel
-
-
class
White
(input_dim, variance=1.0, active_dims=None, name='white')[source]¶ Bases:
GPy.kern.src.static.Static
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
psi2
(Z, variational_posterior)[source]¶ - \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
-
-
class
WhiteHeteroscedastic
(input_dim, num_data, variance=1.0, active_dims=None, name='white_hetero')[source]¶ Bases:
GPy.kern.src.static.Static
A heteroscedastic White kernel (nugget/noise). It defines one variance (nugget) per input sample.
Prediction excludes any noise learnt by this Kernel, so be careful using this kernel.
You can plot the errors learnt by this kernel with something similar to: plt.errorbar(m.X, m.Y, yerr=2*np.sqrt(m.kern.white.variance))
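A hedged sketch of how this kernel is commonly combined with a smooth kernel in a regression model (the data, the RBF choice and the GPRegression model are assumptions for the example):

    import numpy as np
    import GPy

    N = 30
    X = np.linspace(0, 1, N)[:, None]
    Y = np.sin(6 * X) + 0.1 * np.random.randn(N, 1)

    # One noise variance is learnt per training point; predictions exclude this noise.
    kern = GPy.kern.RBF(1) + GPy.kern.WhiteHeteroscedastic(input_dim=1, num_data=N)
    m = GPy.models.GPRegression(X, Y, kern)
    m.optimize()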
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
psi2
(Z, variational_posterior)[source]¶ - \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
-
GPy.kern.src.stationary module¶
-
class
Cosine
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Cosine')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
Cosine Covariance function
\[k(r) = \sigma^2 \cos(r)\]
-
class
ExpQuad
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='ExpQuad')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
The Exponentiated quadratic covariance function.
\[k(r) = \sigma^2 \exp(- 0.5 r^2)\]
Note: This is exactly the same as the RBF covariance function, but the RBF implementation also has some features for doing variational kernels (the psi-statistics).
-
class
ExpQuadCosine
(input_dim, variance=1.0, lengthscale=None, period=1.0, ARD=False, active_dims=None, name='ExpQuadCosine')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
Exponentiated quadratic multiplied by cosine covariance function (spectral mixture kernel).
\[k(r) = \sigma^2 \exp(-2\pi^2r^2)\cos(2\pi r/T)\]
-
class
Exponential
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Exponential')[source]¶
-
class
Matern32
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat32')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
Matern 3/2 kernel:
\[k(r) = \sigma^2 (1 + \sqrt{3} r) \exp(- \sqrt{3} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]-
Gram_matrix
(F, F1, F2, lower, upper)[source]¶ Return the Gram matrix of the vector of functions F with respect to the RKHS norm. The use of this function is limited to input_dim=1.
Parameters: - F (np.array) – vector of functions
- F1 (np.array) – vector of derivatives of F
- F2 (np.array) – vector of second derivatives of F
- lower,upper (floats) – boundaries of the input domain
-
-
class
Matern52
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat52')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
Matern 5/2 kernel:
\[k(r) = \sigma^2 (1 + \sqrt{5} r + \frac53 r^2) \exp(- \sqrt{5} r)\]-
Gram_matrix
(F, F1, F2, F3, lower, upper)[source]¶ Return the Gram matrix of the vector of functions F with respect to the RKHS norm. The use of this function is limited to input_dim=1.
Parameters: - F (np.array) – vector of functions
- F1 (np.array) – vector of derivatives of F
- F2 (np.array) – vector of second derivatives of F
- F3 (np.array) – vector of third derivatives of F
- lower,upper (floats) – boundaries of the input domain
-
-
class
OU
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='OU')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
OU kernel:
\[k(r) = \sigma^2 \exp(- r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
-
class
RatQuad
(input_dim, variance=1.0, lengthscale=None, power=2.0, ARD=False, active_dims=None, name='RatQuad')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
Rational Quadratic Kernel
\[k(r) = \sigma^2 \bigg( 1 + \frac{r^2}{2} \bigg)^{- \alpha}\]-
to_dict
()[source]¶ Convert the object into a json serializable dictionary.
Note: It uses the private method _save_to_input_dict of the parent.
Return dict: json serializable dictionary containing the needed information to instantiate the object
-
-
class
Sinc
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Sinc')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
Sinc Covariance function
\[k(r) = \sigma^2 \,\text{sinc}(\pi r)\]
-
class
Stationary
(input_dim, variance, lengthscale, ARD, active_dims, name, useGPU=False)[source]¶ Bases:
GPy.kern.src.kern.Kern
Stationary kernels (covariance functions).
Stationary covariance functions depend only on r, where r is defined as
\[r(x, x') = \sqrt{ \sum_{q=1}^Q (x_q - x'_q)^2 }\]The covariance function k(x, x') can then be written k(r).
In this implementation, r is scaled by the lengthscale parameter(s):
\[r(x, x') = \sqrt{ \sum_{q=1}^Q \frac{(x_q - x'_q)^2}{\ell_q^2} }.\]By default, there is only one lengthscale: separate lengthscales for each dimension can be enabled by setting ARD=True.
To implement a stationary covariance function using this class, one need only define the covariance function k(r) and its derivative:
```
def K_of_r(self, r):
    return foo

def dK_dr(self, r):
    return bar
```
The lengthscale(s) and variance parameters are added to the structure automatically.
Thanks to @strongh: In Stationary, a covariance function is defined in GPy as stationary when it depends only on the l2-norm |x_1 - x_2 |. However this is the typical definition of isotropy, while stationarity is usually a bit more relaxed. The more common version of stationarity is that the covariance is a function of x_1 - x_2 (See e.g. R&W first paragraph of section 4.1).
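As an illustration of the recipe above, the following sketch subclasses Stationary to reimplement an exponential (OU-like) covariance; the class and parameter names are illustrative, and the constructor simply forwards the arguments listed in the Stationary signature:

```
import numpy as np
from GPy.kern.src.stationary import Stationary

class MyExponential(Stationary):
    """Illustrative stationary kernel with k(r) = variance * exp(-r)."""
    def __init__(self, input_dim, variance=1.0, lengthscale=None, ARD=False,
                 active_dims=None, name='my_exponential'):
        super(MyExponential, self).__init__(input_dim, variance, lengthscale,
                                            ARD, active_dims, name)

    def K_of_r(self, r):
        # covariance as a function of the scaled distance r
        return self.variance * np.exp(-r)

    def dK_dr(self, r):
        # derivative of K wrt r; used for gradients wrt X and the lengthscales
        return -self.variance * np.exp(-r)
```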
-
K
(X, X2=None)[source]¶ Kernel function applied on inputs X and X2. In the stationary case there is an inner function depending on the distances from X to X2, called r.
K(X, X2) = K_of_r(r(X, X2)), where r is the lengthscale-scaled distance defined in the class description above.
-
dK2_drdr_diag
()[source]¶ Second order derivative of K in r_{i,i}. The diagonal entries are always zero, so we do not give it here.
-
get_one_dimensional_kernel
(dimensions)[source]¶ Specially intended for the grid regression case For a given covariance kernel, this method returns the corresponding kernel for a single dimension. The resulting values can then be used in the algorithm for reconstructing the full covariance matrix.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ Given the derivative of the objective wrt K (dL_dK), compute the derivative wrt X
-
gradients_XX
(dL_dK, X, X2=None)[source]¶ Given the derivative of the objective wrt K (dL_dK), compute the second derivative of K wrt X and X2:
returns the full covariance matrix [QxQ] of the input dimension for each pair of vectors, thus the returned array is of shape [NxNxQxQ].
\[\frac{\partial^2 K}{\partial X2^2} = -\frac{\partial^2 K}{\partial X \partial X2}\]
Returns: dL2_dXdX2: [NxMxQxQ] in the cov=True case, or [NxMxQ] in the cov=False case, for X [NxQ] and X2 [MxQ] (X2 is X if X2 is None). Thus, we return the second derivative in X2.
-
gradients_XX_diag
(dL_dK_diag, X)[source]¶ Given the derivative of the objective dL_dK, compute the second derivative of K wrt X:
\[\frac{\partial^2 K}{\partial X \partial X}\]
Returns: dL2_dXdX: [NxQxQ]
-
input_sensitivity
(summarize=True)[source]¶ Returns the sensitivity for each dimension of this kernel.
This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.
Use this as relative measurement, not for absolute comparison between kernels.
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ Given the derivative of the objective with respect to the diagonal of the covariance matrix, compute the derivative wrt the parameters of this kernel and store it in the <parameter>.gradient field.
See also update_gradients_full
GPy.kern.src.stationary_cython module¶
GPy.kern.src.symbolic module¶
GPy.kern.src.symmetric module¶
-
class
Symmetric
(base_kernel, transform, symmetry_type='even')[source]¶ Bases:
GPy.kern.src.kern.Kern
Symmetric kernel that models a function with even or odd symmetry:
For even symmetry we have:
\[f(x) = f(Ax)\]we then model the function as:
\[f(x) = g(x) + g(Ax)\]the corresponding kernel is:
\[k(x, x') + k(Ax, x') + k(x, Ax') + k(Ax, Ax')\]For odd symmetry we have:
\[f(x) = -f(Ax)\]it does this by modelling:
\[f(x) = g(x) - g(Ax)\]with kernel
\[k(x, x') - k(Ax, x') - k(x, Ax') + k(Ax, Ax')\]where k(x, x’) is the kernel of g(x)
Parameters: - base_kernel – kernel to make symmetric
- transform – transformation matrix describing symmetry plane, A in equations above
- symmetry_type – ‘odd’ or ‘even’ depending on the symmetry needed
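A minimal sketch of using the class with a 1-D reflection about the origin as the symmetry transform (the data and the matrix A here are purely illustrative):

```
import numpy as np
import GPy
from GPy.kern.src.symmetric import Symmetric

A = -np.eye(1)                        # reflection about the origin in 1-D
base = GPy.kern.RBF(input_dim=1)

k_even = Symmetric(base, A, 'even')   # models f(x) = g(x) + g(Ax)

X = np.linspace(-1.0, 1.0, 5)[:, None]
K = k_even.K(X, X)                    # covariance under the even-symmetry construction
```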
-
K
(X, X2)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2)[source]¶ \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
GPy.kern.src.trunclinear module¶
-
class
TruncLinear
(input_dim, variances=None, delta=None, ARD=False, active_dims=None, name='linear')[source]¶ Bases:
GPy.kern.src.kern.Kern
Truncated Linear kernel
\[k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i \max(0, x_i y_i - \sigma_q)\]Parameters: - input_dim (int) – the number of input dimensions
- variances (array or list of the appropriate size (or float if there is only one variance parameter)) – the vector of variances \(\sigma^2_i\)
- ARD (Boolean) – Auto Relevance Determination. If False, the kernel has only one variance parameter sigma^2, otherwise there is one variance parameter per dimension.
Return type: kernel object
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
input_sensitivity
()[source]¶ Returns the sensitivity for each dimension of this kernel.
This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.
Use this as relative measurement, not for absolute comparison between kernels.
-
class
TruncLinear_inf
(input_dim, interval, variances=None, ARD=False, active_dims=None, name='linear')[source]¶ Bases:
GPy.kern.src.kern.Kern
Truncated Linear kernel
\[k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i \max(0, x_i y_i - \sigma_q)\]Parameters: - input_dim (int) – the number of input dimensions
- variances (array or list of the appropriate size (or float if there is only one variance parameter)) – the vector of variances \(\sigma^2_i\)
- ARD (Boolean) – Auto Relevance Determination. If False, the kernel has only one variance parameter sigma^2, otherwise there is one variance parameter per dimension.
Return type: kernel object
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
input_sensitivity
()[source]¶ Returns the sensitivity for each dimension of this kernel.
This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.
Use this as relative measurement, not for absolute comparison between kernels.
GPy.likelihoods package¶
Introduction¶
The likelihood is \(p(y|f,X)\): how well we predict target values \(y\) given inputs \(X\) and our latent function \(f\) (\(y\) without noise). The marginal likelihood \(p(y|X)\) is the same, except that we marginalize out the latent function \(f\). The importance of likelihoods in Gaussian processes is in determining the ‘best’ values of the kernel and noise hyperparameters to relate known, observed and unobserved data. The purpose of optimizing a model (e.g. GPy.models.GPRegression) is to determine the ‘best’ hyperparameters, i.e. those that minimize the negative log marginal likelihood.

Most likelihood classes inherit directly from GPy.likelihoods.likelihood, although an intermediary class GPy.likelihoods.mixed_noise.MixedNoise is used by GPy.likelihoods.multioutput_likelihood.
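As a concrete illustration, optimizing a regression model adjusts the kernel and noise hyperparameters to minimize the negative log marginal likelihood (the data below are synthetic):

```
import numpy as np
import GPy

X = np.linspace(0.0, 10.0, 50)[:, None]
Y = np.sin(X) + 0.1 * np.random.randn(50, 1)

m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(input_dim=1))
m.optimize()                 # minimizes the negative log marginal likelihood
print(m.log_likelihood())    # log marginal likelihood at the optimized hyperparameters
```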
Submodules¶
GPy.likelihoods.bernoulli module¶
-
class
Bernoulli
(gp_link=None)[source]¶ Bases:
GPy.likelihoods.likelihood.Likelihood
Bernoulli likelihood
\[p(y_{i}|\lambda(f_{i})) = \lambda(f_{i})^{y_{i}}(1-\lambda(f_{i}))^{1-y_{i}}\]Note
Y takes values in either {-1, 1} or {0, 1}. link function should have the domain [0, 1], e.g. probit (default) or Heaviside
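A minimal sketch of this likelihood in use for binary classification via GPy.models.GPClassification, which pairs a Bernoulli likelihood with EP inference by default (synthetic data):

```
import numpy as np
import GPy

X = np.random.rand(40, 1) * 10
Y = (np.sin(X) > 0).astype(float)       # labels in {0, 1}

m = GPy.models.GPClassification(X, Y, kernel=GPy.kern.RBF(input_dim=1))
m.optimize()

# predictions are probabilities of the positive class
probs, _ = m.predict(np.linspace(0, 10, 5)[:, None])
```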
-
d2logpdf_dlink2
(inv_link_f, y, Y_metadata=None)[source]¶ Hessian at y, given inv_link_f, w.r.t inv_link_f the hessian will be 0 unless i == j i.e. second derivative logpdf at y given inverse link of f_i and inverse link of f_j w.r.t inverse link of f_i and inverse link of f_j.
\[\frac{d^{2}\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)^{2}} = \frac{-y_{i}}{\lambda(f)^{2}} - \frac{(1-y_{i})}{(1-\lambda(f))^{2}}\]Parameters: - inv_link_f (Nx1 array) – latent variables inverse link of f.
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in bernoulli
Returns: Diagonal of log hessian matrix (second derivative of log likelihood evaluated at points inverse link of f.
Return type: Nx1 array
Note
Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on inverse link of f_i not on inverse link of f_(j!=i)
-
d3logpdf_dlink3
(inv_link_f, y, Y_metadata=None)[source]¶ Third order derivative log-likelihood function at y given inverse link of f w.r.t inverse link of f
\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = \frac{2y_{i}}{\lambda(f)^{3}} - \frac{2(1-y_{i})}{(1-\lambda(f))^{3}}\]Parameters: - inv_link_f (Nx1 array) – latent variables passed through inverse link of f.
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in bernoulli
Returns: third derivative of log likelihood evaluated at points inverse_link(f)
Return type: Nx1 array
-
dlogpdf_dlink
(inv_link_f, y, Y_metadata=None)[source]¶ Gradient of the pdf at y, given inverse link of f w.r.t inverse link of f.
\[\frac{d\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{y_{i}}{\lambda(f_{i})} - \frac{(1 - y_{i})}{(1 - \lambda(f_{i}))}\]Parameters: - inv_link_f (Nx1 array) – latent variables inverse link of f.
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in bernoulli
Returns: gradient of log likelihood evaluated at points inverse link of f.
Return type: Nx1 array
-
logpdf_link
(inv_link_f, y, Y_metadata=None)[source]¶ Log Likelihood function given inverse link of f.
\[\ln p(y_{i}|\lambda(f_{i})) = y_{i}\log\lambda(f_{i}) + (1-y_{i})\log (1-\lambda(f_{i}))\]Parameters: - inv_link_f (Nx1 array) – latent variables inverse link of f.
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in bernoulli
Returns: log likelihood evaluated at points inverse link of f.
Return type: float
-
moments_match_ep
(Y_i, tau_i, v_i, Y_metadata_i=None)[source]¶ Moments match of the marginal approximation in EP algorithm
Parameters: - i – number of observation (int)
- tau_i – precision of the cavity distribution (float)
- v_i – mean/variance of the cavity distribution (float)
-
pdf_link
(inv_link_f, y, Y_metadata=None)[source]¶ Likelihood function given inverse link of f.
\[p(y_{i}|\lambda(f_{i})) = \lambda(f_{i})^{y_{i}}(1-\lambda(f_{i}))^{1-y_{i}}\]Parameters: - inv_link_f (Nx1 array) – latent variables inverse link of f.
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in bernoulli
Returns: likelihood evaluated for this point
Return type: float
-
predictive_mean
(mu, variance, Y_metadata=None)[source]¶ Quadrature calculation of the predictive mean: E(Y_star|Y) = E( E(Y_star|f_star, Y) )
Parameters: - mu – mean of posterior
- sigma – standard deviation of posterior
-
predictive_quantiles
(mu, var, quantiles, Y_metadata=None)[source]¶ Get the “quantiles” of the binary labels (Bernoulli draws). all the quantiles must be either 0 or 1, since those are the only values the draw can take!
-
predictive_variance
(mu, variance, pred_mean, Y_metadata=None)[source]¶ Approximation to the predictive variance: V(Y_star)
The following variance decomposition is used: V(Y_star) = E( V(Y_star|f_star)**2 ) + V( E(Y_star|f_star) )**2
Parameters: - mu – mean of posterior
- sigma – standard deviation of posterior
Predictive_mean: output’s predictive mean, if None _predictive_mean function will be called.
-
samples
(gp, Y_metadata=None)[source]¶ Returns a set of samples of observations based on a given value of the latent variable.
Parameters: gp – latent variable
-
to_dict
()[source]¶ Convert the object into a json serializable dictionary.
Note: It uses the private method _save_to_input_dict of the parent.
Return dict: json serializable dictionary containing the needed information to instantiate the object
-
variational_expectations
(Y, m, v, gh_points=None, Y_metadata=None)[source]¶ Use Gauss-Hermite Quadrature to compute
E_p(f) [ log p(y|f) ] d/dm E_p(f) [ log p(y|f) ] d/dv E_p(f) [ log p(y|f) ]where p(f) is a Gaussian with mean m and variance v. The shapes of Y, m and v should match.
If no gh_points are passed, we construct them using default options.
-
GPy.likelihoods.binomial module¶
-
class
Binomial
(gp_link=None)[source]¶ Bases:
GPy.likelihoods.likelihood.Likelihood
Binomial likelihood
\[p(y_{i}|\lambda(f_{i})) = \binom{N_{i}}{y_{i}}\lambda(f_{i})^{y_{i}}(1-\lambda(f_{i}))^{N_{i}-y_{i}}\]Note
Y takes non-negative integer count values; the number of trials per observation must be supplied through Y_metadata['trials']. The link function should have the domain [0, 1], e.g. probit (default) or Heaviside.
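A minimal sketch of pairing this likelihood with Laplace inference; the number of trials per observation is supplied through Y_metadata['trials'] (synthetic data):

```
import numpy as np
import GPy

X = np.random.rand(30, 1) * 10
trials = np.full((30, 1), 20.0)                 # 20 trials per input location
p = 1.0 / (1.0 + np.exp(-np.sin(X)))
Y = np.random.binomial(20, p).astype(float)     # number of successes

m = GPy.core.GP(X=X, Y=Y, kernel=GPy.kern.RBF(1),
                likelihood=GPy.likelihoods.Binomial(),
                inference_method=GPy.inference.latent_function_inference.Laplace(),
                Y_metadata={'trials': trials})
m.optimize()
```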
-
d2logpdf_dlink2
(inv_link_f, y, Y_metadata=None)[source]¶ Hessian at y, given inv_link_f, w.r.t inv_link_f the hessian will be 0 unless i == j i.e. second derivative logpdf at y given inverse link of f_i and inverse link of f_j w.r.t inverse link of f_i and inverse link of f_j.
\[\frac{d^{2}\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)^{2}} = \frac{-y_{i}}{\lambda(f)^{2}} - \frac{(N-y_{i})}{(1-\lambda(f))^{2}}\]Parameters: - inv_link_f (Nx1 array) – latent variables inverse link of f.
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in binomial
Returns: Diagonal of log hessian matrix (second derivative of log likelihood evaluated at points inverse link of f.
Return type: Nx1 array
Note
Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on inverse link of f_i not on inverse link of f_(j!=i)
-
d3logpdf_dlink3
(inv_link_f, y, Y_metadata=None)[source]¶ Third order derivative log-likelihood function at y given inverse link of f w.r.t inverse link of f
\[\frac{d^{3}\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)^{3}} = \frac{2y_{i}}{\lambda(f)^{3}} - \frac{2(N-y_{i})}{(1-\lambda(f))^{3}}\]Parameters: - inv_link_f (Nx1 array) – latent variables inverse link of f.
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in binomial
Returns: Diagonal of log hessian matrix (second derivative of log likelihood evaluated at points inverse link of f.
Return type: Nx1 array
Note
Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on inverse link of f_i not on inverse link of f_(j!=i)
-
dlogpdf_dlink
(inv_link_f, y, Y_metadata=None)[source]¶ Gradient of the pdf at y, given inverse link of f w.r.t inverse link of f.
\[\frac{d\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{y_{i}}{\lambda(f)} - \frac{(N-y_{i})}{(1-\lambda(f))}\]Parameters: - inv_link_f (Nx1 array) – latent variables inverse link of f.
- y (Nx1 array) – data
- Y_metadata – Y_metadata must contain ‘trials’
Returns: gradient of log likelihood evaluated at points inverse link of f.
Return type: Nx1 array
-
logpdf_link
(inv_link_f, y, Y_metadata=None)[source]¶ Log Likelihood function given inverse link of f.
\[\ln p(y_{i}|\lambda(f_{i})) = y_{i}\log\lambda(f_{i}) + (N_{i}-y_{i})\log (1-\lambda(f_{i}))\]Parameters: - inv_link_f (Nx1 array) – latent variables inverse link of f.
- y (Nx1 array) – data
- Y_metadata – Y_metadata must contain ‘trials’
Returns: log likelihood evaluated at points inverse link of f.
Return type: float
-
moments_match_ep
(obs, tau, v, Y_metadata_i=None)[source]¶ Calculation of moments using quadrature :param obs: observed output :param tau: cavity distribution 1st natural parameter (precision) :param v: cavity distribution 2nd natural parameter (mu*precision)
-
pdf_link
(inv_link_f, y, Y_metadata)[source]¶ Likelihood function given inverse link of f.
\[p(y_{i}|\lambda(f_{i})) = \binom{N_{i}}{y_{i}}\lambda(f_{i})^{y_{i}}(1-\lambda(f_{i}))^{N_{i}-y_{i}}\]Parameters: - inv_link_f (Nx1 array) – latent variables inverse link of f.
- y (Nx1 array) – data
- Y_metadata – Y_metadata must contain ‘trials’
Returns: likelihood evaluated for this point
Return type: float
-
samples
(gp, Y_metadata=None, **kw)[source]¶ Returns a set of samples of observations based on a given value of the latent variable.
Parameters: gp – latent variable
-
variational_expectations
(Y, m, v, gh_points=None, Y_metadata=None)[source]¶ Use Gauss-Hermite Quadrature to compute
E_p(f) [ log p(y|f) ] d/dm E_p(f) [ log p(y|f) ] d/dv E_p(f) [ log p(y|f) ]where p(f) is a Gaussian with mean m and variance v. The shapes of Y, m and v should match.
If no gh_points are passed, we construct them using default options.
-
GPy.likelihoods.exponential module¶
-
class
Exponential
(gp_link=None)[source]¶ Bases:
GPy.likelihoods.likelihood.Likelihood
Exponential likelihood. Y is expected to take non-negative real values.
\[p(y_{i}|\lambda(f_{i})) = \lambda(f_{i})\exp(-y_{i}\lambda(f_{i}))\]
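A minimal sketch of using this likelihood with Laplace inference for positive-valued observations (synthetic data):

```
import numpy as np
import GPy

X = np.random.rand(40, 1) * 10
rate = np.exp(np.sin(X))
Y = np.random.exponential(1.0 / rate)           # positive observations

m = GPy.core.GP(X=X, Y=Y, kernel=GPy.kern.RBF(1),
                likelihood=GPy.likelihoods.Exponential(),
                inference_method=GPy.inference.latent_function_inference.Laplace())
m.optimize()
```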
-
d2logpdf_dlink2
(link_f, y, Y_metadata=None)[source]¶ Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j
\[\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)} = -\frac{1}{\lambda(f_{i})^{2}}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in exponential distribution
Returns: Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)
Return type: Nx1 array
Note
Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))
-
d3logpdf_dlink3
(link_f, y, Y_metadata=None)[source]¶ Third order derivative log-likelihood function at y given link(f) w.r.t link(f)
\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = \frac{2}{\lambda(f_{i})^{3}}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in exponential distribution
Returns: third derivative of likelihood evaluated at points f
Return type: Nx1 array
-
dlogpdf_dlink
(link_f, y, Y_metadata=None)[source]¶ Gradient of the log likelihood function at y, given link(f) w.r.t link(f)
\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{1}{\lambda(f)} - y_{i}\]Parameters: - link_f (Nx1 array) – latent variables (f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in exponential distribution
Returns: gradient of likelihood evaluated at points
Return type: Nx1 array
-
logpdf_link
(link_f, y, Y_metadata=None)[source]¶ Log Likelihood Function given link(f)
\[\ln p(y_{i}|\lambda(f_{i})) = \ln \lambda(f_{i}) - y_{i}\lambda(f_{i})\]Parameters: - link_f (Nx1 array) – latent variables (link(f))
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in exponential distribution
Returns: likelihood evaluated for this point
Return type: float
-
pdf_link
(link_f, y, Y_metadata=None)[source]¶ Likelihood function given link(f)
\[p(y_{i}|\lambda(f_{i})) = \lambda(f_{i})\exp (-y\lambda(f_{i}))\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in exponential distribution
Returns: likelihood evaluated for this point
Return type: float
-
GPy.likelihoods.gamma module¶
-
class
Gamma
(gp_link=None, beta=1.0)[source]¶ Bases:
GPy.likelihoods.likelihood.Likelihood
Gamma likelihood
\[\begin{split}p(y_{i}|\lambda(f_{i})) = \frac{\beta^{\alpha_{i}}}{\Gamma(\alpha_{i})}y_{i}^{\alpha_{i}-1}e^{-\beta y_{i}}\\ \alpha_{i} = \beta y_{i}\end{split}\]-
d2logpdf_dlink2
(link_f, y, Y_metadata=None)[source]¶ Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j
\[\begin{split}\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)} = -\beta^{2}\frac{d\Psi(\alpha_{i})}{d\alpha_{i}}\\ \alpha_{i} = \beta y_{i}\end{split}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in gamma distribution
Returns: Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)
Return type: Nx1 array
Note
Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))
-
d3logpdf_dlink3
(link_f, y, Y_metadata=None)[source]¶ Third order derivative log-likelihood function at y given link(f) w.r.t link(f)
\[\begin{split}\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = -\beta^{3}\frac{d^{2}\Psi(\alpha_{i})}{d\alpha_{i}^{2}}\\ \alpha_{i} = \beta y_{i}\end{split}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in gamma distribution
Returns: third derivative of likelihood evaluated at points f
Return type: Nx1 array
-
dlogpdf_dlink
(link_f, y, Y_metadata=None)[source]¶ Gradient of the log likelihood function at y, given link(f) w.r.t link(f)
\[\begin{split}\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \beta (\log \beta y_{i}) - \Psi(\alpha_{i})\beta\\ \alpha_{i} = \beta y_{i}\end{split}\]Parameters: - link_f (Nx1 array) – latent variables (f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in gamma distribution
Returns: gradient of likelihood evaluated at points
Return type: Nx1 array
-
logpdf_link
(link_f, y, Y_metadata=None)[source]¶ Log Likelihood Function given link(f)
\[\begin{split}\ln p(y_{i}|\lambda(f_{i})) = \alpha_{i}\log \beta - \log \Gamma(\alpha_{i}) + (\alpha_{i} - 1)\log y_{i} - \beta y_{i}\\ \alpha_{i} = \beta y_{i}\end{split}\]Parameters: - link_f (Nx1 array) – latent variables (link(f))
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in gamma distribution
Returns: likelihood evaluated for this point
Return type: float
-
pdf_link
(link_f, y, Y_metadata=None)[source]¶ Likelihood function given link(f)
\[\begin{split}p(y_{i}|\lambda(f_{i})) = \frac{\beta^{\alpha_{i}}}{\Gamma(\alpha_{i})}y_{i}^{\alpha_{i}-1}e^{-\beta y_{i}}\\ \alpha_{i} = \beta y_{i}\end{split}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in gamma distribution
Returns: likelihood evaluated for this point
Return type: float
-
GPy.likelihoods.gaussian module¶
A lot of this code assumes that the link function is the identity.
I think the Laplace code is okay, but I’m quite sure that the EP moments will only work if the link is the identity.
Furthermore, exact Gaussian inference can only be done for the identity link, so we should be asserting so for all calls which relate to that.
James 11/12/13
-
class
Gaussian
(gp_link=None, variance=1.0, name='Gaussian_noise')[source]¶ Bases:
GPy.likelihoods.likelihood.Likelihood
Gaussian likelihood
\[\ln p(y_{i}|\lambda(f_{i})) = -\frac{N \ln 2\pi}{2} - \frac{\ln |K|}{2} - \frac{(y_{i} - \lambda(f_{i}))^{T}\sigma^{-2}(y_{i} - \lambda(f_{i}))}{2}\]Parameters: - variance – variance value of the Gaussian distribution
- N (int) – Number of data points
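For example, GPy.models.GPRegression constructs this likelihood (named ‘Gaussian_noise’) internally; a minimal sketch of inspecting, setting and fixing its variance hyperparameter:

```
import numpy as np
import GPy

X = np.random.rand(20, 1)
Y = np.sin(X) + 0.05 * np.random.randn(20, 1)

m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(1))
print(m.likelihood)              # the Gaussian noise likelihood and its variance
m.likelihood.variance = 0.01     # set the noise variance
m.likelihood.variance.fix()      # optionally fix it during optimization
m.optimize()
```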
-
d2logpdf_dlink2
(link_f, y, Y_metadata=None)[source]¶ Hessian at y, given link_f, w.r.t link_f. i.e. second derivative logpdf at y given link(f_i) link(f_j) w.r.t link(f_i) and link(f_j)
The hessian will be 0 unless i == j
\[\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}f} = -\frac{1}{\sigma^{2}}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in gaussian
Returns: Diagonal of log hessian matrix (second derivative of log likelihood evaluated at points link(f))
Return type: Nx1 array
Note
Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))
-
d2logpdf_dlink2_dvar
(link_f, y, Y_metadata=None)[source]¶ Gradient of the hessian (d2logpdf_dlink2) w.r.t variance parameter (noise_variance)
\[\frac{d}{d\sigma^{2}}(\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)}) = \frac{1}{\sigma^{4}}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in gaussian
Returns: derivative of log hessian evaluated at points link(f_i) and link(f_j) w.r.t variance parameter
Return type: Nx1 array
-
d3logpdf_dlink3
(link_f, y, Y_metadata=None)[source]¶ Third order derivative log-likelihood function at y given link(f) w.r.t link(f)
\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = 0\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in gaussian
Returns: third derivative of log likelihood evaluated at points link(f)
Return type: Nx1 array
-
dlogpdf_dlink
(link_f, y, Y_metadata=None)[source]¶ Gradient of the pdf at y, given link(f) w.r.t link(f)
\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{1}{\sigma^{2}}(y_{i} - \lambda(f_{i}))\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in gaussian
Returns: gradient of log likelihood evaluated at points link(f)
Return type: Nx1 array
-
dlogpdf_dlink_dvar
(link_f, y, Y_metadata=None)[source]¶ Derivative of the dlogpdf_dlink w.r.t variance parameter (noise_variance)
\[\frac{d}{d\sigma^{2}}(\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)}) = \frac{1}{\sigma^{4}}(-y_{i} + \lambda(f_{i}))\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in gaussian
Returns: derivative of log likelihood evaluated at points link(f) w.r.t variance parameter
Return type: Nx1 array
-
dlogpdf_link_dvar
(link_f, y, Y_metadata=None)[source]¶ Gradient of the log-likelihood function at y given link(f), w.r.t variance parameter (noise_variance)
\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\sigma^{2}} = -\frac{N}{2\sigma^{2}} + \frac{(y_{i} - \lambda(f_{i}))^{2}}{2\sigma^{4}}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in gaussian
Returns: derivative of log likelihood evaluated at points link(f) w.r.t variance parameter
Return type: float
-
ep_gradients
(Y, cav_tau, cav_v, dL_dKdiag, Y_metadata=None, quad_mode='gk', boost_grad=1.0)[source]¶
-
logpdf_link
(link_f, y, Y_metadata=None)[source]¶ Log likelihood function given link(f)
\[\ln p(y_{i}|\lambda(f_{i})) = -\frac{N \ln 2\pi}{2} - \frac{\ln |K|}{2} - \frac{(y_{i} - \lambda(f_{i}))^{T}\sigma^{-2}(y_{i} - \lambda(f_{i}))}{2}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in gaussian
Returns: log likelihood evaluated for this point
Return type: float
-
moments_match_ep
(data_i, tau_i, v_i, Y_metadata_i=None)[source]¶ Moments match of the marginal approximation in EP algorithm
Parameters: - i – number of observation (int)
- tau_i – precision of the cavity distribution (float)
- v_i – mean/variance of the cavity distribution (float)
-
pdf_link
(link_f, y, Y_metadata=None)[source]¶ Likelihood function given link(f)
\[\ln p(y_{i}|\lambda(f_{i})) = -\frac{N \ln 2\pi}{2} - \frac{\ln |K|}{2} - \frac{(y_{i} - \lambda(f_{i}))^{T}\sigma^{-2}(y_{i} - \lambda(f_{i}))}{2}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in gaussian
Returns: likelihood evaluated for this point
Return type: float
-
predictive_mean
(mu, sigma)[source]¶ Quadrature calculation of the predictive mean: E(Y_star|Y) = E( E(Y_star|f_star, Y) )
Parameters: - mu – mean of posterior
- sigma – standard deviation of posterior
-
predictive_values
(mu, var, full_cov=False, Y_metadata=None)[source]¶ Compute mean, variance of the predictive distribution.
Parameters: - mu – mean of the latent variable, f, of posterior
- var – variance of the latent variable, f, of posterior
- full_cov (Boolean) – whether to use the full covariance or just the diagonal
-
predictive_variance
(mu, sigma, predictive_mean=None)[source]¶ Approximation to the predictive variance: V(Y_star)
The following variance decomposition is used: V(Y_star) = E( V(Y_star|f_star)**2 ) + V( E(Y_star|f_star) )**2
Parameters: - mu – mean of posterior
- sigma – standard deviation of posterior
Predictive_mean: output’s predictive mean, if None _predictive_mean function will be called.
-
samples
(gp, Y_metadata=None)[source]¶ Returns a set of samples of observations based on a given value of the latent variable.
Parameters: gp – latent variable
-
to_dict
()[source]¶ Convert the object into a json serializable dictionary.
Note: It uses the private method _save_to_input_dict of the parent.
Return dict: json serializable dictionary containing the needed information to instantiate the object
-
variational_expectations
(Y, m, v, gh_points=None, Y_metadata=None)[source]¶ Use Gauss-Hermite Quadrature to compute
E_p(f) [ log p(y|f) ] d/dm E_p(f) [ log p(y|f) ] d/dv E_p(f) [ log p(y|f) ]where p(f) is a Gaussian with mean m and variance v. The shapes of Y, m and v should match.
If no gh_points are passed, we construct them using default options.
-
class
HeteroscedasticGaussian
(Y_metadata, gp_link=None, variance=1.0, name='het_Gauss')[source]¶ Bases:
GPy.likelihoods.gaussian.Gaussian
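This likelihood carries one noise variance per data point, indexed through Y_metadata['output_index']. A minimal sketch, assuming the convenience wrapper GPy.models.GPHeteroscedasticRegression is available to build it (synthetic data):

```
import numpy as np
import GPy

X = np.random.rand(25, 1)
Y = np.sin(X) + 0.1 * np.random.randn(25, 1)

# the wrapper builds a HeteroscedasticGaussian likelihood with one noise
# variance per observation, indexed through Y_metadata['output_index']
m = GPy.models.GPHeteroscedasticRegression(X, Y, GPy.kern.RBF(1))
m.optimize()
```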
-
predictive_values
(mu, var, full_cov=False, Y_metadata=None)[source]¶ Compute mean, variance of the predictive distribution.
Parameters: - mu – mean of the latent variable, f, of posterior
- var – variance of the latent variable, f, of posterior
- full_cov (Boolean) – whether to use the full covariance or just the diagonal
-
GPy.likelihoods.likelihood module¶
-
class
Likelihood
(gp_link, name)[source]¶ Bases:
GPy.core.parameterization.parameterized.Parameterized
Likelihood base class, used to define p(y|f).
All instances use _inverse_ link functions, which can be swapped out. It is expected that inheriting classes define a default inverse link function
To use this class, inherit and define the missing functionality (a minimal sketch of such a subclass follows the list below).
- Inheriting classes must implement:
- pdf_link : a bound method which turns the output of the link function into the pdf; logpdf_link : the logarithm of the above
- To enable use with EP, inheriting classes must define:
- TODO: a suitable derivative function for any parameters of the class
- It is also desirable to define:
- moments_match_ep : a function to compute the EP moments If this isn’t defined, the moments will be computed using 1D quadrature.
- To enable use with Laplace approximation, inheriting classes must define:
- Some derivative functions AS TODO
For exact Gaussian inference, define JH TODO
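A minimal sketch of such a subclass, implementing only pdf_link and logpdf_link for an exponential observation model with a Log link (so the inverse link exp(f) keeps the rate positive); the class name is illustrative and no EP or Laplace support is added:

```
import numpy as np
from GPy.likelihoods.likelihood import Likelihood
from GPy.likelihoods import link_functions

class MyExponentialLikelihood(Likelihood):
    """Illustrative likelihood: p(y|f) = lambda(f) * exp(-y * lambda(f))."""
    def __init__(self, gp_link=None):
        if gp_link is None:
            gp_link = link_functions.Log()   # inverse link exp(f) keeps the rate positive
        super(MyExponentialLikelihood, self).__init__(gp_link, name='my_exponential')

    def pdf_link(self, link_f, y, Y_metadata=None):
        # likelihood as a function of the inverse-link values lambda(f)
        return link_f * np.exp(-y * link_f)

    def logpdf_link(self, link_f, y, Y_metadata=None):
        # logarithm of pdf_link
        return np.log(link_f) - y * link_f
```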
-
MCMC_pdf_samples
(fNew, num_samples=1000, starting_loc=None, stepsize=0.1, burn_in=1000, Y_metadata=None)[source]¶ Simple implementation of Metropolis sampling algorithm
Will run a parallel chain for each input dimension (treats each f independently), thus assumes f*_1 is independent of f*_2 etc.
Parameters: - num_samples – Number of samples to take
- fNew – f at which to sample around
- starting_loc – Starting locations of the independent chains (usually will be conditional_mean of likelihood), often link_f
- stepsize – Stepsize for the normal proposal distribution (will need modifying)
- burn_in – number of samples to use for burn-in (will need modifying)
- Y_metadata – Y_metadata for pdf
-
conditional_variance
(gp)[source]¶ The variance of the random variable conditioned on one value of the GP
-
d2logpdf_df2
(*args, **kwargs)¶
-
d3logpdf_df3
(*args, **kwargs)¶
-
dlogpdf_df
(f, y, Y_metadata=None)[source]¶ Evaluates the link function link(f) then computes the derivative of log likelihood using it Uses the Faa di Bruno’s formula for the chain rule
\[\frac{d\log p(y|\lambda(f))}{df} = \frac{d\log p(y|\lambda(f))}{d\lambda(f)}\frac{d\lambda(f)}{df}\]Parameters: - f (Nx1 array) – latent variables f
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in student t distribution - not used
Returns: derivative of log likelihood evaluated for this point
Return type: 1xN array
-
ep_gradients
(Y, cav_tau, cav_v, dL_dKdiag, Y_metadata=None, quad_mode='gk', boost_grad=1.0)[source]¶
-
static
from_dict
(input_dict)[source]¶ Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. In case it is needed, please override _build_from_input_dict instead.
Parameters: input_dict (dict) – Dictionary with all the information needed to instantiate the object.
-
log_predictive_density
(y_test, mu_star, var_star, Y_metadata=None)[source]¶ Calculation of the log predictive density
Parameters: - y_test ((Nx1) array) – test observations (y_{*})
- mu_star ((Nx1) array) – predictive mean of gaussian p(f_{*}|mu_{*}, var_{*})
- var_star ((Nx1) array) – predictive variance of gaussian p(f_{*}|mu_{*}, var_{*})
-
log_predictive_density_sampling
(y_test, mu_star, var_star, Y_metadata=None, num_samples=1000)[source]¶ Calculation of the log predictive density via sampling
Parameters: - y_test ((Nx1) array) – test observations (y_{*})
- mu_star ((Nx1) array) – predictive mean of gaussian p(f_{*}|mu_{*}, var_{*})
- var_star ((Nx1) array) – predictive variance of gaussian p(f_{*}|mu_{*}, var_{*})
- num_samples (int) – num samples of p(f_{*}|mu_{*}, var_{*}) to take
-
logpdf
(f, y, Y_metadata=None)[source]¶ Evaluates the link function link(f) then computes the log likelihood (log pdf) using it
Parameters: - f (Nx1 array) – latent variables f
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in student t distribution - not used
Returns: log likelihood evaluated for this point
Return type: float
-
logpdf_sum
(f, y, Y_metadata=None)[source]¶ Convenience function that can be overridden for functions where this could be computed more efficiently
-
moments_match_ep
(obs, tau, v, Y_metadata_i=None)[source]¶ Calculation of moments using quadrature
Parameters: - obs – observed output
- tau – cavity distribution 1st natural parameter (precision)
- v – cavity distribution 2nd natural parameter (mu*precision)
-
pdf
(f, y, Y_metadata=None)[source]¶ Evaluates the link function link(f) then computes the likelihood (pdf) using it
Parameters: - f (Nx1 array) – latent variables f
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in student t distribution - not used
Returns: likelihood evaluated for this point
Return type: float
-
predictive_mean
(mu, variance, Y_metadata=None)[source]¶ Quadrature calculation of the predictive mean: E(Y_star|Y) = E( E(Y_star|f_star, Y) )
Parameters: - mu – mean of posterior
- sigma – standard deviation of posterior
-
predictive_values
(mu, var, full_cov=False, Y_metadata=None)[source]¶ Compute mean, variance of the predictive distribution.
Parameters: - mu – mean of the latent variable, f, of posterior
- var – variance of the latent variable, f, of posterior
- full_cov (Boolean) – whether to use the full covariance or just the diagonal
-
predictive_variance
(mu, variance, predictive_mean=None, Y_metadata=None)[source]¶ Approximation to the predictive variance: V(Y_star)
The following variance decomposition is used: V(Y_star) = E( V(Y_star|f_star)**2 ) + V( E(Y_star|f_star) )**2
Parameters: - mu – mean of posterior
- sigma – standard deviation of posterior
Predictive_mean: output’s predictive mean, if None _predictive_mean function will be called.
-
request_num_latent_functions
(Y)[source]¶ The likelihood should infer how many latent functions are needed for the likelihood
Default is the number of outputs
-
samples
(gp, Y_metadata=None, samples=1)[source]¶ Returns a set of samples of observations based on a given value of the latent variable.
Parameters: - gp – latent variable
- samples – number of samples to take for each f location
-
variational_expectations
(Y, m, v, gh_points=None, Y_metadata=None)[source]¶ Use Gauss-Hermite Quadrature to compute
E_p(f) [ log p(y|f) ] d/dm E_p(f) [ log p(y|f) ] d/dv E_p(f) [ log p(y|f) ]where p(f) is a Gaussian with mean m and variance v. The shapes of Y, m and v should match.
If no gh_points are passed, we construct them using default options.
GPy.likelihoods.link_functions module¶
-
class
Cloglog
[source]¶ Bases:
GPy.likelihoods.link_functions.GPTransformation
Complementary log-log link
\[p(f) = 1 - e^{-e^{f}} \quad \text{or} \quad f = \log(-\log(1-p))\]
-
class
GPTransformation
[source]¶ Bases:
object
Link function class for doing non-Gaussian likelihoods approximation
Parameters: Y – observed output (Nx1 numpy.ndarray) Note
Y values allowed depend on the likelihood_function used
-
static
from_dict
(input_dict)[source]¶ Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. In case it is needed, please override _build_from_input_dict instead.
Parameters: input_dict (dict) – Dictionary with all the information needed to instantiate the object.
-
class
Heaviside
[source]¶ Bases:
GPy.likelihoods.link_functions.GPTransformation
\[g(f) = I_{x \geq 0}\]
-
class
Identity
[source]¶ Bases:
GPy.likelihoods.link_functions.GPTransformation
\[g(f) = f\]
-
class
Log
[source]¶ Bases:
GPy.likelihoods.link_functions.GPTransformation
\[g(f) = \log(\mu)\]
-
class
Log_ex_1
[source]¶ Bases:
GPy.likelihoods.link_functions.GPTransformation
\[g(f) = \log(\exp(\mu) - 1)\]
-
class
Probit
[source]¶ Bases:
GPy.likelihoods.link_functions.GPTransformation
\[g(f) = \Phi^{-1}(\mu)\]
-
class
ScaledProbit
(nu=1.0)[source]¶ Bases:
GPy.likelihoods.link_functions.Probit
\[g(f) = \Phi^{-1}(\nu\mu)\]
GPy.likelihoods.loggaussian module¶
-
class
LogGaussian
(gp_link=None, sigma=1.0)[source]¶ Bases:
GPy.likelihoods.likelihood.Likelihood
\[p(y_{i}|f_{i}, z_{i}) = \prod_{i=1}^{n} \left(\frac{r y^{r-1}}{\exp(f(x_{i}))}\right)^{1-z_i} \left(1 + \left(\frac{y}{\exp(f(x_{i}))}\right)^{r}\right)^{z_i-2}\]-
d2logpdf_dlink2
(link_f, y, Y_metadata=None)[source]¶ Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j
\[\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)
Return type: Nx1 array
Note
Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))
-
d2logpdf_dlink2_dtheta
(f, y, Y_metadata=None)[source]¶ Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in gaussian
Returns: derivative of log likelihood evaluated at points link(f) w.r.t variance parameter
Return type: Nx1 array
-
d2logpdf_dlink2_dvar
(link_f, y, Y_metadata=None)[source]¶ Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in gaussian
Returns: derivative of log likelihood evaluated at points link(f) w.r.t variance parameter
Return type: Nx1 array
-
d3logpdf_dlink3
(link_f, y, Y_metadata=None)[source]¶ Gradient of the log-likelihood function at y given f, w.r.t shape parameter
\[\]Parameters: - inv_link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: derivative of likelihood evaluated at points f w.r.t variance parameter
Return type: float
-
dlogpdf_dlink
(link_f, y, Y_metadata=None)[source]¶ Derivative of logpdf w.r.t. link_f.
\[\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: likelihood evaluated for this point
Return type: float
-
dlogpdf_dlink_dtheta
(f, y, Y_metadata=None)[source]¶ Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in gaussian
Returns: derivative of log likelihood evaluated at points link(f) w.r.t variance parameter
Return type: Nx1 array
-
dlogpdf_dlink_dvar
(link_f, y, Y_metadata=None)[source]¶ Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in gaussian
Returns: derivative of log likelihood evaluated at points link(f) w.r.t variance parameter
Return type: Nx1 array
-
dlogpdf_link_dtheta
(f, y, Y_metadata=None)[source]¶ Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata not used in gaussian
Returns: derivative of log likelihood evaluated at points link(f) w.r.t variance parameter
Return type: Nx1 array
-
dlogpdf_link_dvar
(link_f, y, Y_metadata=None)[source]¶ Gradient of the log-likelihood function at y given f, w.r.t variance parameter
\[\]Parameters: - inv_link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: derivative of likelihood evaluated at points f w.r.t variance parameter
Return type: float
-
logpdf_link
(link_f, y, Y_metadata=None)[source]¶ Parameters: - link_f (Nx1 array) – latent variables (link(f))
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: likelihood evaluated for this point
Return type: float
-
pdf_link
(link_f, y, Y_metadata=None)[source]¶ Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: likelihood evaluated for this point
Return type: float
-
GPy.likelihoods.loglogistic module¶
-
class
LogLogistic
(gp_link=None, r=1.0)[source]¶ Bases:
GPy.likelihoods.likelihood.Likelihood
\[p(y_{i}|f_{i}, z_{i}) = \prod_{i=1}^{n} \left(\frac{r y^{r-1}}{\exp(f(x_{i}))}\right)^{1-z_i} \left(1 + \left(\frac{y}{\exp(f(x_{i}))}\right)^{r}\right)^{z_i-2}\]-
d2logpdf_dlink2
(link_f, y, Y_metadata=None)[source]¶ Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j
\[\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)
Return type: Nx1 array
Note
Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))
-
d2logpdf_dlink2_dr
(inv_link_f, y, Y_metadata=None)[source]¶ Gradient of the hessian (d2logpdf_dlink2) w.r.t shape parameter
\[\]Parameters: - inv_link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: derivative of hessian evaluated at points f and f_j w.r.t variance parameter
Return type: Nx1 array
-
d3logpdf_dlink3
(link_f, y, Y_metadata=None)[source]¶ Third order derivative log-likelihood function at y given link(f) w.r.t link(f)
\[\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: third derivative of likelihood evaluated at points f
Return type: Nx1 array
-
dlogpdf_dlink
(link_f, y, Y_metadata=None)[source]¶ Gradient of the log likelihood function at y, given link(f) w.r.t link(f)
\[\]Parameters: - link_f (Nx1 array) – latent variables (f)
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: gradient of likelihood evaluated at points
Return type: Nx1 array
-
dlogpdf_dlink_dr
(inv_link_f, y, Y_metadata=None)[source]¶ Derivative of the dlogpdf_dlink w.r.t shape parameter
\[\]Parameters: - inv_link_f (Nx1 array) – latent variables inv_link_f
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: derivative of likelihood evaluated at points f w.r.t variance parameter
Return type: Nx1 array
-
dlogpdf_link_dr
(inv_link_f, y, Y_metadata=None)[source]¶ Gradient of the log-likelihood function at y given f, w.r.t shape parameter
\[\]Parameters: - inv_link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: derivative of likelihood evaluated at points f w.r.t variance parameter
Return type: float
-
logpdf_link
(link_f, y, Y_metadata=None)[source]¶ Log Likelihood Function given link(f)
\[\]Parameters: - link_f (Nx1 array) – latent variables (link(f))
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: likelihood evaluated for this point
Return type: float
-
pdf_link
(link_f, y, Y_metadata=None)[source]¶ Likelihood function given link(f)
\[\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: likelihood evaluated for this point
Return type: float
-
GPy.likelihoods.mixed_noise module¶
-
class
MixedNoise
(likelihoods_list, name='mixed_noise')[source]¶ Bases:
GPy.likelihoods.likelihood.Likelihood
-
predictive_values
(mu, var, full_cov=False, Y_metadata=None)[source]¶ Compute mean, variance of the predictive distribution.
Parameters: - mu – mean of the latent variable, f, of posterior
- var – variance of the latent variable, f, of posterior
- full_cov (Boolean) – whether to use the full covariance or just the diagonal
-
predictive_variance
(mu, sigma, Y_metadata)[source]¶ Approximation to the predictive variance: V(Y_star)
The following variance decomposition is used: V(Y_star) = E( V(Y_star|f_star)**2 ) + V( E(Y_star|f_star) )**2
Parameters: - mu – mean of posterior
- sigma – standard deviation of posterior
Predictive_mean: output’s predictive mean, if None _predictive_mean function will be called.
-
samples
(gp, Y_metadata)[source]¶ Returns a set of samples of observations based on a given value of the latent variable.
Parameters: gp – latent variable
-
GPy.likelihoods.multioutput_likelihood module¶
-
class
MultioutputLikelihood
(likelihoods_list, name='multioutput_likelihood')[source]¶ Bases:
GPy.likelihoods.mixed_noise.MixedNoise
CombinedLikelihood is used to combine different likelihoods for multioutput models, where different outputs have different observation models.
As input the likelihood takes a list of likelihoods used. The likelihood uses “output_index” in Y_metadata to connect observations to likelihoods.
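A minimal sketch of building the combined likelihood and the matching Y_metadata; here the first n1 rows of Y are (illustratively) Gaussian observations and the remaining n2 rows are Bernoulli observations:

```
import numpy as np
import GPy
from GPy.likelihoods.multioutput_likelihood import MultioutputLikelihood

n1, n2 = 30, 20
lik = MultioutputLikelihood([GPy.likelihoods.Gaussian(), GPy.likelihoods.Bernoulli()])

# 'output_index' maps each row of Y to an entry of the likelihoods list
Y_metadata = {'output_index': np.vstack([np.zeros((n1, 1), dtype=int),
                                         np.ones((n2, 1), dtype=int)])}
```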
-
dlogpdf_df
(f, y, Y_metadata)[source]¶ Evaluates the link function link(f) then computes the derivative of log likelihood using it Uses the Faa di Bruno’s formula for the chain rule
\[\frac{d\log p(y|\lambda(f))}{df} = \frac{d\log p(y|\lambda(f))}{d\lambda(f)}\frac{d\lambda(f)}{df}\]Parameters: - f (Nx1 array) – latent variables f
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in student t distribution - not used
Returns: derivative of log likelihood evaluated for this point
Return type: 1xN array
-
ep_gradients
(Y, cav_tau, cav_v, dL_dKdiag, Y_metadata=None, quad_mode='gk', boost_grad=1.0)[source]¶
-
log_predictive_density
(y_test, mu_star, var_star, Y_metadata=None)[source]¶ Calculation of the log predictive density
Parameters: - y_test ((Nx1) array) – test observations (y_{*})
- mu_star ((Nx1) array) – predictive mean of gaussian p(f_{*}|mu_{*}, var_{*})
- var_star ((Nx1) array) – predictive variance of gaussian p(f_{*}|mu_{*}, var_{*})
-
logpdf
(f, y, Y_metadata=None)[source]¶ Evaluates the link function link(f) then computes the log likelihood (log pdf) using it
Parameters: - f (Nx1 array) – latent variables f
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in student t distribution - not used
Returns: log likelihood evaluated for this point
Return type: float
-
moments_match_ep
(data_i, tau_i, v_i, Y_metadata_i)[source]¶ Calculation of moments using quadrature
Parameters: - obs – observed output
- tau – cavity distribution 1st natural parameter (precision)
- v – cavity distribution 2nd natural parameter (mu*precision)
-
pdf
(f, y, Y_metadata=None)[source]¶ Evaluates the link function link(f) then computes the likelihood (pdf) using it
Parameters: - f (Nx1 array) – latent variables f
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in student t distribution - not used
Returns: likelihood evaluated for this point
Return type: float
-
predictive_values
(mu, var, full_cov=False, Y_metadata=None)[source]¶ Compute mean, variance of the predictive distribution.
Parameters: - mu – mean of the latent variable, f, of posterior
- var – variance of the latent variable, f, of posterior
- full_cov (Boolean) – whether to use the full covariance or just the diagonal
-
predictive_variance
(mu, sigma, Y_metadata)[source]¶ Approximation to the predictive variance: V(Y_star)
The following variance decomposition is used: V(Y_star) = E( V(Y_star|f_star)**2 ) + V( E(Y_star|f_star) )**2
Parameters: - mu – mean of posterior
- sigma – standard deviation of posterior
Predictive_mean: output’s predictive mean, if None _predictive_mean function will be called.
-
GPy.likelihoods.poisson module¶
-
class
Poisson
(gp_link=None)[source]¶ Bases:
GPy.likelihoods.likelihood.Likelihood
Poisson likelihood
\[p(y_{i}|\lambda(f_{i})) = \frac{\lambda(f_{i})^{y_{i}}}{y_{i}!}e^{-\lambda(f_{i})}\]Note
Y is expected to take values in {0,1,2,…}
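A minimal sketch of count regression with this likelihood and Laplace inference (synthetic data):

```
import numpy as np
import GPy

X = np.random.rand(50, 1) * 10
Y = np.random.poisson(np.exp(np.sin(X))).astype(float)   # counts in {0, 1, 2, ...}

m = GPy.core.GP(X=X, Y=Y, kernel=GPy.kern.RBF(1),
                likelihood=GPy.likelihoods.Poisson(),
                inference_method=GPy.inference.latent_function_inference.Laplace())
m.optimize()
```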
-
conditional_variance
(gp)[source]¶ The variance of the random variable conditioned on one value of the GP
-
d2logpdf_dlink2
(link_f, y, Y_metadata=None)[source]¶ Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j
\[\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)} = \frac{-y_{i}}{\lambda(f_{i})^{2}}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in poisson distribution
Returns: Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)
Return type: Nx1 array
Note
Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))
-
d3logpdf_dlink3
(link_f, y, Y_metadata=None)[source]¶ Third order derivative log-likelihood function at y given link(f) w.r.t link(f)
\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = \frac{2y_{i}}{\lambda(f_{i})^{3}}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in poisson distribution
Returns: third derivative of likelihood evaluated at points f
Return type: Nx1 array
-
dlogpdf_dlink
(link_f, y, Y_metadata=None)[source]¶ Gradient of the log likelihood function at y, given link(f) w.r.t link(f)
\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{y_{i}}{\lambda(f_{i})} - 1\]Parameters: - link_f (Nx1 array) – latent variables (f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in poisson distribution
Returns: gradient of likelihood evaluated at points
Return type: Nx1 array
-
logpdf_link
(link_f, y, Y_metadata=None)[source]¶ Log Likelihood Function given link(f)
\[\ln p(y_{i}|\lambda(f_{i})) = -\lambda(f_{i}) + y_{i}\log \lambda(f_{i}) - \log y_{i}!\]Parameters: - link_f (Nx1 array) – latent variables (link(f))
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in poisson distribution
Returns: likelihood evaluated for this point
Return type: float
-
pdf_link
(link_f, y, Y_metadata=None)[source]¶ Likelihood function given link(f)
\[p(y_{i}|\lambda(f_{i})) = \frac{\lambda(f_{i})^{y_{i}}}{y_{i}!}e^{-\lambda(f_{i})}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in poisson distribution
Returns: likelihood evaluated for this point
Return type: float
-
GPy.likelihoods.student_t module¶
-
class
StudentT
(gp_link=None, deg_free=5, sigma2=2)[source]¶ Bases:
GPy.likelihoods.likelihood.Likelihood
Student T likelihood
For nomenclature see Bayesian Data Analysis 2003 p576
\[p(y_{i}|\lambda(f_{i})) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\Gamma\left(\frac{v}{2}\right)\sqrt{v\pi\sigma^{2}}}\left(1 + \frac{1}{v}\left(\frac{(y_{i} - \lambda(f_{i}))^{2}}{\sigma^{2}}\right)\right)^{-\frac{v+1}{2}}\]-
conditional_variance
(gp)[source]¶ The variance of the random variable conditioned on one value of the GP
-
d2logpdf_dlink2
(inv_link_f, y, Y_metadata=None)[source]¶ Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j
\[\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)} = \frac{(v+1)((y_{i}-\lambda(f_{i}))^{2} - \sigma^{2}v)}{((y_{i}-\lambda(f_{i}))^{2} + \sigma^{2}v)^{2}}\]Parameters: - inv_link_f (Nx1 array) – latent variables inv_link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in student t distribution
Returns: Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)
Return type: Nx1 array
Note
Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))
-
d2logpdf_dlink2_dvar
(inv_link_f, y, Y_metadata=None)[source]¶ Gradient of the hessian (d2logpdf_dlink2) w.r.t variance parameter (t_noise)
\[\frac{d}{d\sigma^{2}}(\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}f}) = \frac{v(v+1)(\sigma^{2}v - 3(y_{i} - \lambda(f_{i}))^{2})}{(\sigma^{2}v + (y_{i} - \lambda(f_{i}))^{2})^{3}}\]Parameters: - inv_link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in student t distribution
Returns: derivative of hessian evaluated at points f and f_j w.r.t variance parameter
Return type: Nx1 array
-
d3logpdf_dlink3
(inv_link_f, y, Y_metadata=None)[source]¶ Third order derivative log-likelihood function at y given link(f) w.r.t link(f)
\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = \frac{-2(v+1)((y_{i} - \lambda(f_{i}))^3 - 3(y_{i} - \lambda(f_{i})) \sigma^{2} v))}{((y_{i} - \lambda(f_{i})) + \sigma^{2} v)^3}\]Parameters: - inv_link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in student t distribution
Returns: third derivative of likelihood evaluated at points f
Return type: Nx1 array
-
dlogpdf_dlink
(inv_link_f, y, Y_metadata=None)[source]¶ Gradient of the log likelihood function at y, given link(f) w.r.t link(f)
\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{(v+1)(y_{i}-\lambda(f_{i}))}{(y_{i}-\lambda(f_{i}))^{2} + \sigma^{2}v}\]Parameters: - inv_link_f (Nx1 array) – latent variables (f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in student t distribution
Returns: gradient of likelihood evaluated at points
Return type: Nx1 array
-
dlogpdf_dlink_dvar
(inv_link_f, y, Y_metadata=None)[source]¶ Derivative of the dlogpdf_dlink w.r.t variance parameter (t_noise)
\[\frac{d}{d\sigma^{2}}(\frac{d \ln p(y_{i}|\lambda(f_{i}))}{df}) = \frac{-2\sigma v(v + 1)(y_{i}-\lambda(f_{i}))}{(y_{i}-\lambda(f_{i}))^2 + \sigma^2 v)^2}\]Parameters: - inv_link_f (Nx1 array) – latent variables inv_link_f
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in student t distribution
Returns: derivative of likelihood evaluated at points f w.r.t variance parameter
Return type: Nx1 array
-
dlogpdf_link_dvar
(inv_link_f, y, Y_metadata=None)[source]¶ Gradient of the log-likelihood function at y given f, w.r.t variance parameter (t_noise)
\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\sigma^{2}} = \frac{v((y_{i} - \lambda(f_{i}))^{2} - \sigma^{2})}{2\sigma^{2}(\sigma^{2}v + (y_{i} - \lambda(f_{i}))^{2})}\]Parameters: - inv_link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in student t distribution
Returns: derivative of likelihood evaluated at points f w.r.t variance parameter
Return type: float
-
logpdf_link
(inv_link_f, y, Y_metadata=None)[source]¶ Log Likelihood Function given link(f)
\[\ln p(y_{i}|\lambda(f_{i})) = \ln \Gamma\left(\frac{v+1}{2}\right) - \ln \Gamma\left(\frac{v}{2}\right) - \ln \sqrt{v \pi\sigma^{2}} - \frac{v+1}{2}\ln \left(1 + \frac{1}{v}\left(\frac{(y_{i} - \lambda(f_{i}))^{2}}{\sigma^{2}}\right)\right)\]Parameters: - inv_link_f (Nx1 array) – latent variables (link(f))
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in student t distribution
Returns: likelihood evaluated for this point
Return type: float
-
pdf_link
(inv_link_f, y, Y_metadata=None)[source]¶ Likelihood function given link(f)
\[p(y_{i}|\lambda(f_{i})) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\Gamma\left(\frac{v}{2}\right)\sqrt{v\pi\sigma^{2}}}\left(1 + \frac{1}{v}\left(\frac{(y_{i} - \lambda(f_{i}))^{2}}{\sigma^{2}}\right)\right)^{-\frac{v+1}{2}}\]Parameters: - inv_link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in student t distribution
Returns: likelihood evaluated for this point
Return type: float
-
predictive_mean
(mu, sigma, Y_metadata=None)[source]¶ Quadrature calculation of the predictive mean: E(Y_star|Y) = E( E(Y_star|f_star, Y) )
Parameters: - mu – mean of posterior
- sigma – standard deviation of posterior
-
predictive_variance
(mu, variance, predictive_mean=None, Y_metadata=None)[source]¶ Approximation to the predictive variance: V(Y_star)
The following variance decomposition is used: V(Y_star) = E( V(Y_star|f_star) ) + V( E(Y_star|f_star) )
Parameters: - mu – mean of posterior
- sigma – standard deviation of posterior
Predictive_mean: the output’s predictive mean; if None, the _predictive_mean function will be called.
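A minimal usage sketch (the toy data and hyperparameter values below are assumptions for illustration, not taken from the documentation above): since the Student-t likelihood is non-Gaussian, it is typically combined with the Laplace approximation via GPy.core.GP:

import numpy as np
import GPy

np.random.seed(0)
X = np.linspace(0, 10, 50)[:, None]
Y = np.sin(X) + 0.1 * np.random.randn(50, 1)
Y[::10] += 2.0                                   # a few outliers, which the heavy tails absorb

lik = GPy.likelihoods.StudentT(deg_free=5.0, sigma2=0.1)
laplace = GPy.inference.latent_function_inference.Laplace()
m = GPy.core.GP(X=X, Y=Y, kernel=GPy.kern.RBF(1),
                likelihood=lik, inference_method=laplace)
m.optimize()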
-
GPy.likelihoods.weibull module¶
-
class
Weibull
(gp_link=None, beta=1.0)[source]¶ Bases:
GPy.likelihoods.likelihood.Likelihood
Implementation of the Weibull likelihood function.
-
d2logpdf_dlink2
(link_f, y, Y_metadata=None)[source]¶ Hessian at y, given link(f), w.r.t. link(f), i.e. the second derivative of the logpdf at y given link(f_i) and link(f_j) w.r.t. link(f_i) and link(f_j). The Hessian will be 0 unless i == j
\[\begin{split}\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)} = -\beta^{2}\frac{d\Psi(\alpha_{i})}{d\alpha_{i}}\\ \alpha_{i} = \beta y_{i}\end{split}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in the Weibull distribution
Returns: Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)
Return type: Nx1 array
Note
Will return the diagonal of the Hessian, since everywhere else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i), not on link(f_j) for j != i)
-
d2logpdf_dlink2_dr
(link_f, y, Y_metadata=None)[source]¶ Derivative of the Hessian of the log-likelihood (d2logpdf_dlink2) w.r.t. the shape parameter r. :param link_f: :param y: :param Y_metadata: :return:
-
d3logpdf_dlink3
(link_f, y, Y_metadata=None)[source]¶ Third order derivative log-likelihood function at y given link(f) w.r.t link(f)
\[\begin{split}\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = -\beta^{3}\frac{d^{2}\Psi(\alpha_{i})}{d\alpha_{i}^{2}}\\ \alpha_{i} = \beta y_{i}\end{split}\]Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in the Weibull distribution
Returns: third derivative of likelihood evaluated at points f
Return type: Nx1 array
-
d3logpdf_dlink3_dr
(link_f, y, Y_metadata=None)[source]¶ Parameters: - link_f –
- y –
- Y_metadata –
Returns:
-
dlogpdf_dlink
(link_f, y, Y_metadata=None)[source]¶ Gradient of the log likelihood function at y, given link(f) w.r.t link(f)
\[\begin{split}\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \beta (\log \beta y_{i}) - \Psi(\alpha_{i})\beta\\ \alpha_{i} = \beta y_{i}\end{split}\]Parameters: - link_f (Nx1 array) – latent variables (f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in the Weibull distribution
Returns: gradient of likelihood evaluated at points
Return type: Nx1 array
-
dlogpdf_dlink_dr
(inv_link_f, y, Y_metadata=None)[source]¶ First order derivative of the log-likelihood w.r.t. the shape parameter r
Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in the Weibull distribution
Returns: third derivative of likelihood evaluated at points f
Return type: Nx1 array
-
dlogpdf_link_dr
(inv_link_f, y, Y_metadata=None)[source]¶ Gradient of the log-likelihood function at y given f, w.r.t shape parameter
\[\]Parameters: - inv_link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – includes censoring information in dictionary key ‘censored’
Returns: derivative of likelihood evaluated at points f w.r.t variance parameter
Return type: float
-
logpdf_link
(link_f, y, Y_metadata=None)[source]¶ Log Likelihood Function given link(f)
\[\begin{split}\ln p(y_{i}|\lambda(f_{i})) = \alpha_{i}\log \beta - \log \Gamma(\alpha_{i}) + (\alpha_{i} - 1)\log y_{i} - \beta y_{i}\\ \alpha_{i} = \beta y_{i}\end{split}\]Parameters: - link_f (Nx1 array) – latent variables (link(f))
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in the Weibull distribution
Returns: likelihood evaluated for this point
Return type: float
-
pdf_link
(link_f, y, Y_metadata=None)[source]¶ Likelihood function given link(f)
Parameters: - link_f (Nx1 array) – latent variables link(f)
- y (Nx1 array) – data
- Y_metadata – Y_metadata which is not used in weibull distribution
Returns: likelihood evaluated for this point
Return type: float
-
GPy.mappings package¶
Submodules¶
GPy.mappings.additive module¶
-
class
Additive
(mapping1, mapping2)[source]¶ Bases:
GPy.core.mapping.Mapping
Mapping based on adding two existing mappings together.
\[f(\mathbf{x}*) = f_1(\mathbf{x}*) + f_2(\mathbf(x)*)\]Parameters: - mapping1 (GPy.mappings.Mapping) – first mapping to add together.
- mapping2 (GPy.mappings.Mapping) – second mapping to add together.
GPy.mappings.compound module¶
-
class
Compound
(mapping1, mapping2)[source]¶ Bases:
GPy.core.mapping.Mapping
Mapping based on passing one mapping through another
\[f(\mathbf{x}) = f_2(f_1(\mathbf{x}))\]Parameters: - mapping1 (GPy.mappings.Mapping) – first mapping
- mapping2 (GPy.mappings.Mapping) – second mapping
GPy.mappings.constant module¶
-
class
Constant
(input_dim, output_dim, value=0.0, name='constmap')[source]¶ Bases:
GPy.core.mapping.Mapping
A constant mapping.
\[F(\mathbf{x}) = c\]Parameters: - input_dim (int) – dimension of input.
- output_dim (int) – dimension of output.
Param value: the value of this constant mapping
GPy.mappings.identity module¶
-
class
Identity
(input_dim, output_dim, name='identity')[source]¶ Bases:
GPy.core.mapping.Mapping
A mapping that does nothing!
GPy.mappings.kernel module¶
-
class
Kernel
(input_dim, output_dim, Z, kernel, name='kernmap')[source]¶ Bases:
GPy.core.mapping.Mapping
Mapping based on a kernel/covariance function.
\[f(\mathbf{x}) = \sum_i \alpha_i k(\mathbf{z}_i, \mathbf{x})\]or for multiple outputs
\[f_i(\mathbf{x}) = \sum_j \alpha_{i,j} k(\mathbf{z}_j, \mathbf{x})\]Parameters: - input_dim (int) – dimension of input.
- output_dim (int) – dimension of output.
- Z (ndarray) – input observations containing \(\mathbf{Z}\)
- kernel (GPy.kern.kern) – a GPy kernel, defaults to GPy.kern.RBF
GPy.mappings.linear module¶
-
class
Linear
(input_dim, output_dim, name='linmap')[source]¶ Bases:
GPy.core.mapping.Mapping
A Linear mapping.
\[F(\mathbf{x}) = \mathbf{A} \mathbf{x}\]Parameters: - input_dim (int) – dimension of input.
- output_dim (int) – dimension of output.
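As a usage sketch, a mapping can act as a parametric mean function for a GP model (this assumes GPy.models.GPRegression accepts a mean_function argument, which recent GPy versions do); the data below are made up for illustration:

import numpy as np
import GPy

X = np.random.rand(30, 1)
Y = 2.0 * X + np.sin(6 * X) + 0.05 * np.random.randn(30, 1)

mean_fn = GPy.mappings.Linear(input_dim=1, output_dim=1)      # learnable linear trend
m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(1), mean_function=mean_fn)
m.optimize()

Additive and Compound can be used in the same way to combine several mappings into a single mean function.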
GPy.mappings.mlp module¶
GPy.mappings.mlpext module¶
-
class
MLPext
(input_dim=1, output_dim=1, hidden_dims=[3], prior=None, activation='tanh', name='mlpmap')[source]¶ Bases:
GPy.core.mapping.Mapping
Mapping based on a multi-layer perceptron neural network model, with multiple hidden layers. Activation function is applied to all hidden layers. The output is a linear combination of the last layer features, i.e. the last layer is linear.
Parameters: - input_dim – number of input dimensions
- output_dim – number of output dimensions
- hidden_dims – list of hidden sizes of hidden layers
- prior – variance of Gaussian prior on all variables. If None, no prior is used (default: None)
- activation – choose activation function. Allowed values are ‘tanh’ and ‘sigmoid’
- name –
GPy.mappings.piecewise_linear module¶
-
class
PiecewiseLinear
(input_dim, output_dim, values, breaks, name='piecewise_linear')[source]¶ Bases:
GPy.core.mapping.Mapping
A piecewise-linear mapping.
The parameters of this mapping are the positions and values of the function where it is broken (self.breaks, self.values).
Outside the range of the breaks, the function is assumed to have gradient 1
GPy.examples package¶
Introduction¶
The examples in this package usually depend on pods so make sure you have that installed before running examples. The easiest way to do this is to run pip install pods. pods enables access to 3rd party data required for most of the examples.
The examples are executable and self-contained workflows in that they have their own source data, create their own models, kernels and other objects as needed, execute optimisation as required, and display output.
Viewing the source code of each model will clarify the steps taken in its execution, and may provide inspiration for developing user-specific applications of GPy.
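As a quick sketch of how the examples are invoked (most of them return the fitted model, so it can be inspected or re-optimized afterwards; plot=False avoids opening figures):

import GPy

m = GPy.examples.regression.toy_rbf_1d(optimize=True, plot=False)
print(m)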
Submodules¶
GPy.examples.classification module¶
Gaussian Processes classification examples
-
crescent_data
(model_type='Full', num_inducing=10, seed=10000, kernel=None, optimize=True, plot=True)[source]¶ Run a Gaussian process classification on the crescent data. The demonstration calls the basic GP classification model and uses EP to approximate the likelihood.
Parameters: - model_type – type of model to fit [‘Full’, ‘FITC’, ‘DTC’].
- num_inducing (int) – number of inducing variables (only used for ‘FITC’ or ‘DTC’).
- seed (int) – seed value for data generation.
- kernel (a GPy kernel) – kernel to use in the model
-
oil
(num_inducing=50, max_iters=100, kernel=None, optimize=True, plot=True)[source]¶ Run a Gaussian process classification on the three phase oil data. The demonstration calls the basic GP classification model and uses EP to approximate the likelihood.
-
sparse_toy_linear_1d_classification
(num_inducing=10, seed=10000, optimize=True, plot=True)[source]¶ Sparse 1D classification example
Parameters: seed (int) – seed value for data generation (default is 4).
-
sparse_toy_linear_1d_classification_uncertain_input
(num_inducing=10, seed=10000, optimize=True, plot=True)[source]¶ Sparse 1D classification example
Parameters: seed (int) – seed value for data generation (default is 4).
-
toy_heaviside
(seed=10000, max_iters=100, optimize=True, plot=True)[source]¶ Simple 1D classification example using a Heaviside GP transformation
Parameters: seed (int) – seed value for data generation (default is 4).
GPy.examples.dimensionality_reduction module¶
-
bgplvm_oil
(optimize=True, verbose=1, plot=True, N=200, Q=7, num_inducing=40, max_iters=1000, **k)[source]¶
-
bgplvm_simulation_missing_data
(optimize=True, verbose=1, plot=True, plot_sim=False, max_iters=20000.0, percent_missing=0.1, d=13)[source]¶
-
bgplvm_simulation_missing_data_stochastics
(optimize=True, verbose=1, plot=True, plot_sim=False, max_iters=20000.0, percent_missing=0.1, d=13, batchsize=2)[source]¶
-
bgplvm_test_model
(optimize=False, verbose=1, plot=False, output_dim=200, nan=False)[source]¶ Model for testing purposes. Samples from a GP with an RBF kernel and learns the samples with a new kernel. Normally not for optimization, just model checking
-
cmu_mocap
(subject='35', motion=['01'], in_place=True, optimize=True, verbose=True, plot=True)[source]¶
-
sparse_gplvm_oil
(optimize=True, verbose=0, plot=True, N=100, Q=6, num_inducing=15, max_iters=50)[source]¶
-
ssgplvm_oil
(optimize=True, verbose=1, plot=True, N=200, Q=7, num_inducing=40, max_iters=1000, **k)[source]¶
-
ssgplvm_simulation
(optimize=True, verbose=1, plot=True, plot_sim=False, max_iters=20000.0, useGPU=False)[source]¶
GPy.examples.non_gaussian module¶
GPy.examples.regression module¶
Gaussian Processes regression examples
-
coregionalization_sparse
(optimize=True, plot=True)[source]¶ A simple demonstration of coregionalization on two sinusoidal functions using sparse approximations.
-
coregionalization_toy
(optimize=True, plot=True)[source]¶ A simple demonstration of coregionalization on two sinusoidal functions.
-
epomeo_gpx
(max_iters=200, optimize=True, plot=True)[source]¶ Perform Gaussian process regression on the latitude and longitude data from the Mount Epomeo runs. Requires gpxpy to be installed on your system to load in the data.
-
multiple_optima
(gene_number=937, resolution=80, model_restarts=10, seed=10000, max_iters=300, optimize=True, plot=True)[source]¶ Show an example of a multimodal error surface for Gaussian process regression. Gene 939 has bimodal behaviour where the noisy mode is higher.
-
olympic_100m_men
(optimize=True, plot=True)[source]¶ Run a standard Gaussian process regression on the Rogers and Girolami olympics data.
-
olympic_marathon_men
(optimize=True, plot=True)[source]¶ Run a standard Gaussian process regression on the Olympic marathon data.
-
parametric_mean_function
(max_iters=100, optimize=True, plot=True)[source]¶ A linear mean function with parameters that we’ll learn alongside the kernel
-
robot_wireless
(max_iters=100, kernel=None, optimize=True, plot=True)[source]¶ Predict the location of a robot given wireless signal strength readings.
-
silhouette
(max_iters=100, optimize=True, plot=True)[source]¶ Predict the pose of a figure given a silhouette. This is a task from Agarwal and Triggs 2004 ICML paper.
-
simple_mean_function
(max_iters=100, optimize=True, plot=True)[source]¶ The simplest possible mean function. No parameters, just a simple Sinusoid.
-
sparse_GP_regression_1D
(num_samples=400, num_inducing=5, max_iters=100, optimize=True, plot=True, checkgrad=False)[source]¶ Run a 1D example of a sparse GP regression.
-
sparse_GP_regression_2D
(num_samples=400, num_inducing=50, max_iters=100, optimize=True, plot=True, nan=False)[source]¶ Run a 2D example of a sparse GP regression.
-
toy_ARD
(max_iters=1000, kernel_type='linear', num_samples=300, D=4, optimize=True, plot=True)[source]¶
-
toy_ARD_sparse
(max_iters=1000, kernel_type='linear', num_samples=300, D=4, optimize=True, plot=True)[source]¶
-
toy_poisson_rbf_1d_laplace
(optimize=True, plot=True)[source]¶ Run a simple demonstration of a Gaussian process with a Poisson likelihood and Laplace approximation, fit to data generated from an RBF covariance.
-
toy_rbf_1d
(optimize=True, plot=True)[source]¶ Run a simple demonstration of a standard Gaussian process fitting it to data sampled from an RBF covariance.
-
toy_rbf_1d_50
(optimize=True, plot=True)[source]¶ Run a simple demonstration of a standard Gaussian process fitting it to data sampled from an RBF covariance.
GPy.util package¶
Introduction¶
A variety of utility functions including matrix operations and quick access to test datasets.
Submodules¶
GPy.util.block_matrices module¶
-
block_dot
(A, B, diagonal=False)[source]¶ Element-wise dot product on block matrices:
[[A11, A12],     [[B11, B12],      [[A11.B11, A12.B12],
 [A21, A22]]  o   [B21, B22]]   =   [A21.B21, A22.B22]]
Note
If any block of either A or B is stored as a 1d vector, it is taken to denote a diagonal matrix, and a more efficient dot product using numpy broadcasting is used, i.e. A11*B11.
GPy.util.choleskies module¶
-
backprop_gradient
(dL, L)¶ Given the derivative of an objective fn with respect to the cholesky L, compute the derivative with respect to the original matrix K, defined as
K = L L^T, where L was obtained by Cholesky decomposition
-
flat_to_triang
(flat_mat)¶
-
indexes_to_fix_for_low_rank
(rank, size)[source]¶ Work out which indexes of the flattened array should be fixed if we want the Cholesky factor to represent a low rank matrix
-
triang_to_flat
(L)¶
GPy.util.choleskies_cython module¶
GPy.util.classification module¶
-
conf_matrix
(p, labels, names=['1', '0'], threshold=0.5, show=True)[source]¶ Returns error rate and true/false positives in a binary classification problem - Actual classes are displayed by column. - Predicted classes are displayed by row.
Parameters: - p – array of class ‘1’ probabilities.
- labels – array of actual classes.
- names – list of class names, defaults to [‘1’,’0’].
- threshold – probability value used to decide the class.
- show (False|True) – whether the matrix should be shown or not
GPy.util.cluster_with_offset module¶
-
cluster
(data, inputs, verbose=False)[source]¶ Clusters data
Using the new offset model, this method uses a greedy algorithm to cluster the data. It starts with all the data points in separate clusters and tests whether combining them increases the overall log-likelihood (LL). It then iteratively joins pairs of clusters which cause the greatest increase in the LL, until no join increases the LL.
arguments: inputs – the ‘X’s in a list, one item per cluster data – the ‘Y’s in a list, one item per cluster
returns a list of the clusters.
-
get_log_likelihood
(inputs, data, clust)[source]¶ Get the LL of a combined set of clusters, ignoring time series offsets.
Get the log likelihood of a cluster without accounting for the fact that different time series are offset. It is used here mainly for the case in which we only have one cluster to get the log likelihood of.
arguments: inputs – the ‘X’s in a list, one item per cluster data – the ‘Y’s in a list, one item per cluster clust – list of clusters to use
returns a tuple: log likelihood and the offset (which is always zero for this model)
-
get_log_likelihood_offset
(inputs, data, clust)[source]¶ Get the log likelihood of a combined set of clusters, fitting the offsets
arguments: inputs – the ‘X’s in a list, one item per cluster data – the ‘Y’s in a list, one item per cluster clust – list of clusters to use
returns a tuple: log likelihood and the offset
GPy.util.config module¶
GPy.util.datasets module¶
Check with the user that they are happy with the terms and conditions for the data set.
-
cifar10_patches
(data_set='cifar-10')[source]¶ The Canadian Institute for Advanced Research 10 (CIFAR-10) image data set. Code for loading this data is taken from Boris Babenko’s blog post; the original code is available here: http://bbabenko.tumblr.com/post/86756017649/learning-low-level-vision-feautres-in-10-lines-of-code
-
cmu_mocap
(subject, train_motions, test_motions=[], sample_every=4, data_set='cmu_mocap')[source]¶ Load a given subject’s training and test motions from the CMU motion capture data.
-
cmu_mocap_35_walk_jog
(data_set='cmu_mocap')[source]¶ Load CMU subject 35’s walking and jogging motions, the same data that was used by Taylor, Roweis and Hinton at NIPS 2007, but without their preprocessing. Also used by Lawrence at AISTATS 2007.
-
cmu_mocap_49_balance
(data_set='cmu_mocap')[source]¶ Load CMU subject 49’s one legged balancing motion that was used by Alvarez, Luengo and Lawrence at AISTATS 2009.
-
cmu_urls_files
(subj_motions, messages=True)[source]¶ Find which resources are missing on the local disk for the requested CMU motion capture motions.
-
crescent_data
(num_data=200, seed=10000)[source]¶ Data set formed from a mixture of four Gaussians. In each class two of the Gaussians are elongated at right angles to each other and offset to form an approximation to the crescent data that is popular in semi-supervised learning as a toy problem.
Parameters: - num_data (int) – number of data points to be sampled (default is 200).
- seed (int) – random seed to be used for data generation.
-
data_available
(dataset_name=None)[source]¶ Check if the data set is available on the local machine already.
-
data_details_return
(data, data_set)[source]¶ Update the data component of the data dictionary with details drawn from the data_resources.
-
decampos_digits
(data_set='decampos_characters', which_digits=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9])[source]¶
-
download_data
(dataset_name=None)[source]¶ Check with the user that they are happy with the terms and conditions for the data set, then download it.
-
download_url
(url, store_directory, save_name=None, messages=True, suffix='')[source]¶ Download a file from a url and save it to disk.
-
football_data
(season='1314', data_set='football_data')[source]¶ Football data from English games since 1993. This downloads data from football-data.co.uk for the given season.
-
global_average_temperature
(data_set='global_temperature', num_train=1000, refresh_data=False)[source]¶
-
google_trends
(query_terms=['big data', 'machine learning', 'data science'], data_set='google_trends', refresh_data=False)[source]¶ Data downloaded from Google trends for given query terms.
Warning: if you use this function multiple times in a row you may get blocked due to terms of service violations. The function will cache the result of your query; if you wish to refresh an old query, set refresh_data to True.
The function is inspired by this notebook: http://nbviewer.ipython.org/github/sahuguet/notebooks/blob/master/GoogleTrends%20meet%20Notebook.ipynb
-
hapmap3
(data_set='hapmap3')[source]¶ The HapMap phase three SNP dataset - 1184 samples out of 11 populations.
SNP_matrix (A) encoding [see Paschou et al. 2007 (PCA-Correlated SNPs…)]: let (B1,B2) be the alphabetically sorted bases which occur in the j-th SNP; then
- Aij = 1, if SNPij == (B1,B1)
- Aij = 0, if SNPij == (B1,B2)
- Aij = -1, if SNPij == (B2,B2)
The SNP data and the meta information (such as iid, sex and phenotype) are stored in the dataframe datadf, index is the Individual ID, with following columns for metainfo:
- family_id -> Family ID
- paternal_id -> Paternal ID
- maternal_id -> Maternal ID
- sex -> Sex (1=male; 2=female; other=unknown)
- phenotype -> Phenotype (-9, or 0 for unknown)
- population -> Population string (e.g. ‘ASW’ - ‘YRI’)
- rest are SNP rs (ids)
More information is given in infodf:
- Chromosome:
- autosomal chromosomes -> 1-22
- X X chromosome -> 23
- Y Y chromosome -> 24
- XY Pseudo-autosomal region of X -> 25
- MT Mitochondrial -> 26
- Relative Position (to Chromosome) [base pairs]
-
oil
(data_set='three_phase_oil_flow')[source]¶ The three phase oil data from Bishop and James (1993).
-
olympic_sprints
(data_set='rogers_girolami_data')[source]¶ All olympics sprint winning times for multiple output prediction.
-
toy_rbf_1d
(seed=10000, num_samples=500)[source]¶ Samples values of a function from an RBF covariance with very small noise for inputs uniformly distributed between -1 and 1.
Parameters: - seed (int) – seed to use for random sampling.
- num_samples (int) – number of samples to sample in the function (default 500).
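A minimal sketch of the typical dataset access pattern (assuming the loader returns a dictionary with at least 'X' and 'Y' entries, which is the convention used by data_details_return above):

import GPy

data = GPy.util.datasets.toy_rbf_1d()
X, Y = data['X'], data['Y']          # inputs and targets as numpy arrays
print(sorted(data.keys()))           # remaining keys hold citation / licensing details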
GPy.util.debug module¶
The module for some general debug tools
GPy.util.decorators module¶
GPy.util.diag module¶
-
add
(A, b, offset=0)[source]¶ Add b to the view of A in place (!). Returns modified A. Broadcasting is allowed, thus b can be scalar.
if offset is not zero, make sure b is of right shape!
Parameters: - A (ndarray) – 2 dimensional array
- b (ndarray-like) – either one dimensional or scalar
- offset (int) – same as in view.
Return type: view of A, which is adjusted inplace
-
divide
(A, b, offset=0)[source]¶ Divide the view of A by b in place (!). Returns modified A Broadcasting is allowed, thus b can be scalar.
if offset is not zero, make sure b is of right shape!
Parameters: - A (ndarray) – 2 dimensional array
- b (ndarray-like) – either one dimensional or scalar
- offset (int) – same as in view.
Return type: view of A, which is adjusted inplace
-
multiply
(A, b, offset=0)¶ Multiply the view of A by b in place (!). Returns modified A. Broadcasting is allowed, thus b can be scalar.
if offset is not zero, make sure b is of right shape!
Parameters: - A (ndarray) – 2 dimensional array
- b (ndarray-like) – either one dimensional or scalar
- offset (int) – same as in view.
Return type: view of A, which is adjusted inplace
-
subtract
(A, b, offset=0)[source]¶ Subtract b from the view of A in place (!). Returns modified A. Broadcasting is allowed, thus b can be scalar.
if offset is not zero, make sure b is of right shape!
Parameters: - A (ndarray) – 2 dimensional array
- b (ndarray-like) – either one dimensional or scalar
- offset (int) – same as in view.
Return type: view of A, which is adjusted inplace
-
times
(A, b, offset=0)[source]¶ Multiply the view of A by b in place (!). Returns modified A. Broadcasting is allowed, thus b can be scalar.
if offset is not zero, make sure b is of right shape!
Parameters: - A (ndarray) – 2 dimensional array
- b (ndarray-like) – either one dimensional or scalar
- offset (int) – same as in view.
Return type: view of A, which is adjusted inplace
-
view
(A, offset=0)[source]¶ Get a view on the diagonal elements of a 2D array.
This is actually a view (!) on the diagonal of the array, so you can in-place adjust the view.
:param ndarray A: 2 dimensional numpy array
:param int offset: view offset to give back (negative entries allowed)
:rtype: ndarray view of diag(A)

>>> import numpy as np
>>> X = np.arange(9).reshape(3,3)
>>> view(X)
array([0, 4, 8])
>>> d = view(X)
>>> d += 2
>>> view(X)
array([ 2,  6, 10])
>>> view(X, offset=-1)
array([3, 7])
>>> subtract(X, 3, offset=-1)
array([[ 2,  1,  2],
       [ 0,  6,  5],
       [ 6,  4, 10]])
GPy.util.functions module¶
GPy.util.gpu_init module¶
The package for scikits.cuda initialization
Global variables: initSuccess providing CUBLAS handle: cublas_handle
GPy.util.input_warping_functions module¶
-
class
IdentifyWarping
[source]¶ Bases:
GPy.util.input_warping_functions.InputWarpingFunction
The identity warping function, for testing
-
class
InputWarpingFunction
(name)[source]¶ Bases:
GPy.core.parameterization.parameterized.Parameterized
Abstract class for input warping functions
-
class
InputWarpingTest
[source]¶ Bases:
GPy.util.input_warping_functions.InputWarpingFunction
The identity warping function, for testing
-
class
KumarWarping
(X, warping_indices=None, epsilon=None, Xmin=None, Xmax=None)[source]¶ Bases:
GPy.util.input_warping_functions.InputWarpingFunction
Kumar Warping for input data
- X : array_like, shape = (n_samples, n_features)
- The input data that is going to be warped
- warping_indices: list of int, optional
- The features that are going to be warped Default to warp all the features
- epsilon: float, optional
- Used to normalize input data to [0+e, 1-e]. Defaults to 1e-6
- Xmin : list of float, Optional
- The min values for each feature defined by users Default to the train minimum
- Xmax : list of float, Optional
- The max values for each feature defined by users Default to the train maximum
- warping_indices: list of int
- The features that are going to be warped Default to warp all the features
- warping_dim: int
- The number of features to be warped
- Xmin : list of float
- The min values for each feature defined by users Default to the train minimum
- Xmax : list of float
- The max values for each feature defined by users Default to the train maximum
- epsilon: float
- Used to normalize input data to [0+e, 1-e]. Defaults to 1e-6
- X_normalized : array_like, shape = (n_samples, n_features)
- The normalized training X
- scaling : list of float, length = n_features in X
- Defined as 1.0 / (self.Xmax - self.Xmin)
- params : list of Param
- The list of all the parameters used in Kumar Warping
- num_parameters: int
- The number of parameters used in Kumar Warping
-
f
(X, test_data=False)[source]¶ Apply warping_function to some Input data
X : array_like, shape = (n_samples, n_features)
- test_data: bool, optional
- Default to False, should set to True when transforming test data
- X_warped : array_like, shape = (n_samples, n_features)
- The warped input data
f(x) = 1 - (1 - x^a)^b
-
fgrad_X
(X)[source]¶ Compute the gradient of warping function with respect to X
- X : array_like, shape = (n_samples, n_features)
- The location to compute gradient
- grad : array_like, shape = (n_samples, n_features)
- The gradient for every location at X
grad = a * b * x ^(a-1) * (1 - x^a)^(b-1)
-
update_grads
(X, dL_dW)[source]¶ Update the gradients of marginal log likelihood with respect to the parameters of warping function
- X : array_like, shape = (n_samples, n_features)
- The input BEFORE warping
- dL_dW : array_like, shape = (n_samples, n_features)
- The gradient of marginal log likelihood with respect to the Warped input
let w = f(x), the input after warping, then dW_da = b * (1 - x^a)^(b - 1) * x^a * ln(x) dW_db = - (1 - x^a)^b * ln(1 - x^a) dL_da = dL_dW * dW_da dL_db = dL_dW * dW_db
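A small sketch using only the methods documented above (the input array is made up and assumed to lie roughly in the unit box, as KumarWarping normalizes to [0+e, 1-e]):

import numpy as np
from GPy.util.input_warping_functions import KumarWarping

X = np.random.rand(20, 2)
warp = KumarWarping(X)           # warp all features by default
X_warped = warp.f(X)             # apply f(x) = 1 - (1 - x^a)^b feature-wise
grad = warp.fgrad_X(X)           # d f(x) / d x at every location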
GPy.util.linalg module¶
-
DSYR_blas
(A, x, alpha=1.0)[source]¶ Performs a symmetric rank-1 update operation: A <- A + alpha * np.dot(x,x.T)
Parameters: - A – Symmetric NxN np.array
- x – Nx1 np.array
- alpha – scalar
-
DSYR_numpy
(A, x, alpha=1.0)[source]¶ Performs a symmetric rank-1 update operation: A <- A + alpha * np.dot(x,x.T)
Parameters: - A – Symmetric NxN np.array
- x – Nx1 np.array
- alpha – scalar
-
backsub_both_sides
(L, X, transpose='left')[source]¶ Return L^-T * X * L^-1, assuming X is symmetric and L is a lower Cholesky factor
-
dpotri
(A, lower=1)[source]¶ Wrapper for lapack dpotri function
DPOTRI computes the inverse of a real symmetric positive definite matrix A using the Cholesky factorization A = U**T*U or A = L*L**T computed by DPOTRF.
Parameters: - A – Matrix A
- lower – is matrix lower (true) or upper (false)
Returns: A inverse
-
dpotrs
(A, B, lower=1)[source]¶ Wrapper for lapack dpotrs function :param A: Matrix A :param B: Matrix B :param lower: is matrix lower (true) or upper (false) :returns:
-
dtrtri
(L)[source]¶ Inverts a Cholesky lower triangular matrix
Parameters: L – lower triangular matrix Return type: inverse of L
-
dtrtrs
(A, B, lower=1, trans=0, unitdiag=0)[source]¶ Wrapper for lapack dtrtrs function
DTRTRS solves a triangular system of the form
A * X = B or A**T * X = B,where A is a triangular matrix of order N, and B is an N-by-NRHS matrix. A check is made to verify that A is nonsingular.
Parameters: - A – Matrix A(triangular)
- B – Matrix B
- lower – is matrix lower (true) or upper (false)
Returns: Solution to A * X = B or A**T * X = B
-
ijk_ljk_to_ilk
(A, B)[source]¶ Faster version of einsum np.einsum(‘ijk,ljk->ilk’, A, B)
I.e A.dot(B.T) for every dimension
-
mdot
(*args)[source]¶ Multiply all the arguments using matrix product rules. The output is equivalent to multiplying the arguments one by one from left to right using dot(). Precedence can be controlled by creating tuples of arguments, for instance mdot(a,((b,c),d)) multiplies a (a*((b*c)*d)). Note that this means the output of dot(a,b) and mdot(a,b) will differ if a or b is a pure tuple of numbers.
-
multiple_pdinv
(A)[source]¶ Parameters: A – A DxDxN numpy array (each A[:,:,i] is pd) Rval invs: the inverses of A Rtype invs: np.ndarray Rval hld: 0.5* the log of the determinants of A Rtype hld: np.array
-
pca
(Y, input_dim)[source]¶ Principal component analysis: maximum likelihood solution by SVD
Parameters: - Y – NxD np.array of data
- input_dim – int, dimension of projection
Rval X: - Nxinput_dim np.array of dimensionality reduced data
Rval W: - input_dimxD mapping from X to Y
-
pdinv
(A, *args)[source]¶ Parameters: A – A DxD pd numpy array Rval Ai: the inverse of A Rtype Ai: np.ndarray Rval L: the Cholesky decomposition of A Rtype L: np.ndarray Rval Li: the Cholesky decomposition of Ai Rtype Li: np.ndarray Rval logdet: the log of the determinant of A Rtype logdet: float64
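A sketch of the usual GP linear-algebra pattern built from the helpers above (the matrix and right-hand side are made up; the return order of pdinv follows its documentation):

import numpy as np
from GPy.util.linalg import pdinv, dpotrs

N = 5
A = np.random.randn(N, N)
K = A.dot(A.T) + np.eye(N)          # a positive definite matrix
y = np.random.randn(N, 1)

Ki, L, Li, logdet = pdinv(K)        # inverse of K, Cholesky of K, Cholesky of Ki, log det
alpha, _ = dpotrs(L, y, lower=1)    # solve K alpha = y using the Cholesky factor
assert np.allclose(K.dot(alpha), y)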
-
ppca
(Y, Q, iterations=100)[source]¶ EM implementation for probabilistic pca.
Parameters: - Y (array-like) – Observed Data
- Q (int) – Dimensionality for reduced array
- iterations (int) – number of iterations for EM
-
symmetrify
(A, upper=False)[source]¶ Take the square matrix A and make it symmetric by copying elements from the lower half to the upper
works IN PLACE.
note: tries to use cython, falls back to a slower numpy version
GPy.util.linalg_cython module¶
GPy.util.linalg_gpu module¶
GPy.util.ln_diff_erfs module¶
-
ln_diff_erfs
(x1, x2, return_sign=False)[source]¶ Compute the log of the difference of two erfs in a numerically stable manner. :param x1 : argument of the positive erf :type x1: ndarray :param x2 : argument of the negative erf :type x2: ndarray :return: tuple containing (log(abs(erf(x1) - erf(x2))), sign(erf(x1) - erf(x2)))
Based on MATLAB code that was written by Antti Honkela and modified by David Luengo and originally derived from code by Neil Lawrence.
GPy.util.misc module¶
-
chain_1
(df_dg, dg_dx)[source]¶ Generic chaining function for first derivative
\[\frac{d(f . g)}{dx} = \frac{df}{dg} \frac{dg}{dx}\]
-
chain_2
(d2f_dg2, dg_dx, df_dg, d2g_dx2)[source]¶ Generic chaining function for second derivative
\[\frac{d^{2}(f . g)}{dx^{2}} = \frac{d^{2}f}{dg^{2}}(\frac{dg}{dx})^{2} + \frac{df}{dg}\frac{d^{2}g}{dx^{2}}\]
-
chain_3
(d3f_dg3, dg_dx, d2f_dg2, d2g_dx2, df_dg, d3g_dx3)[source]¶ Generic chaining function for third derivative
\[\frac{d^{3}(f . g)}{dx^{3}} = \frac{d^{3}f}{dg^{3}}(\frac{dg}{dx})^{3} + 3\frac{d^{2}f}{dg^{2}}\frac{dg}{dx}\frac{d^{2}g}{dx^{2}} + \frac{df}{dg}\frac{d^{3}g}{dx^{3}}\]
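A small sketch of the chain helpers for the composite h(x) = sin(x**2), treating all quantities as element-wise numpy arrays (the example function is an assumption for illustration):

import numpy as np
from GPy.util.misc import chain_1, chain_2

x = np.linspace(0.1, 1.0, 5)
g, dg_dx, d2g_dx2 = x**2, 2 * x, 2 * np.ones_like(x)
df_dg, d2f_dg2 = np.cos(g), -np.sin(g)

dh_dx = chain_1(df_dg, dg_dx)                        # cos(x^2) * 2x
d2h_dx2 = chain_2(d2f_dg2, dg_dx, df_dg, d2g_dx2)    # matches the second-derivative formula above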
-
kmm_init
(X, m=10)[source]¶ This is the same initialization algorithm that is used in K-means++. It’s quite simple and very useful for initializing the locations of the inducing points in sparse GPs.
Parameters: - X – data
- m – number of inducing points
-
linear_grid
(D, n=100, min_max=(-100, 100))[source]¶ Creates a D-dimensional grid of n linearly spaced points
Parameters: - D – dimension of the grid
- n – number of points
- min_max – (min, max) list
-
opt_wrapper
(m, **kwargs)[source]¶ This function wraps the optimization procedure of a GPy object so that optimize() is pickleable (necessary for multiprocessing).
-
param_to_array
(*param)[source]¶ Convert an arbitrary number of parameters to :class:ndarray class objects. This is for converting parameter objects to numpy arrays, when using scipy.weave.inline routine. In scipy.weave.blitz there is no automatic array detection (even when the array inherits from :class:ndarray)
GPy.util.mocap module¶
-
class
acclaim_skeleton
(file_name=None)[source]¶ Bases:
GPy.util.mocap.skeleton
-
load_skel
(file_name)[source]¶ Loads an ASF file into a skeleton structure.
Parameters: file_name – The file name to load in.
-
read_line
(fid)[source]¶ Read a line from a file string and check it isn’t either empty or commented before returning.
-
resolve_indices
(index, start_val)[source]¶ Get indices for the skeleton from the channels when loading in channel data.
-
-
class
skeleton
[source]¶ Bases:
GPy.util.mocap.tree
-
finalize
()[source]¶ After loading in a skeleton ensure parents are correct, vertex orders are correct and rotation matrices are correct.
-
-
class
tree
[source]¶ Bases:
object
-
find_children
()[source]¶ Take a tree and set the children according to the parents.
Takes a tree structure which lists the parents of each vertex and computes the children for each vertex and places them in.
-
find_parents
()[source]¶ Take a tree and set the parents according to the children
Takes a tree structure which lists the children of each vertex and computes the parents for each vertex and places them in.
-
order_vertices
()[source]¶ Order vertices in the graph such that parents always have a lower index than children.
-
swap_vertices
(i, j)[source]¶ Swap two vertices in the tree structure array. swap_vertex swaps the location of two vertices in a tree structure array.
Parameters: - tree – the tree for which two vertices are to be swapped.
- i – the index of the first vertex to be swapped.
- j – the index of the second vertex to be swapped.
Rval tree: the tree structure with the two vertex locations swapped.
-
-
load_text_data
(dataset, directory, centre=True)[source]¶ Load in a data set of marker points from the Ohio State University C3D motion capture files (http://accad.osu.edu/research/mocap/mocap_data.htm).
-
parse_text
(file_name)[source]¶ Parse data from Ohio State University text mocap files (http://accad.osu.edu/research/mocap/mocap_data.htm).
-
read_connections
(file_name, point_names)[source]¶ Read a file detailing which markers should be connected to which for motion capture data.
-
rotation_matrix
(xangle, yangle, zangle, order='zxy', degrees=False)[source]¶ Compute the rotation matrix for an angle in each direction. This is a helper function for computing the rotation matrix for a given set of angles in a given order.
Parameters: - xangle – rotation for x-axis.
- yangle – rotation for y-axis.
- zangle – rotation for z-axis.
- order – the order for the rotations.
GPy.util.multioutput module¶
-
ICM
(input_dim, num_outputs, kernel, W_rank=1, W=None, kappa=None, name='ICM')[source]¶ Builds a kernel for an Intrinsic Coregionalization Model
Input_dim: Input dimensionality (does not include dimension of indices)
Num_outputs: Number of outputs
Parameters: - kernel (a GPy kernel) – kernel that will be multiplied by the coregionalize kernel (matrix B).
- W_rank (integer) – rank of the coregionalization matrix, i.e. the number of columns of the coregionalization parameter ‘W’
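A usage sketch (the two-output toy data are made up; GPy.models.GPCoregionalizedRegression is one consumer of such a kernel):

import numpy as np
import GPy

k = GPy.util.multioutput.ICM(input_dim=1, num_outputs=2,
                             kernel=GPy.kern.RBF(1), W_rank=1)

X1, X2 = np.random.rand(20, 1), np.random.rand(15, 1)
Y1, Y2 = np.sin(6 * X1), np.sin(6 * X2) + 0.5
m = GPy.models.GPCoregionalizedRegression([X1, X2], [Y1, Y2], kernel=k)
m.optimize()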
-
LCM
(input_dim, num_outputs, kernels_list, W_rank=1, name='ICM')[source]¶ Builds a kernel for a Linear Coregionalization Model
Input_dim: Input dimensionality (does not include dimension of indices)
Num_outputs: Number of outputs
Parameters: - kernel (a GPy kernel) – kernel that will be multiplied by the coregionalize kernel (matrix B).
- W_rank (integer) – rank of the coregionalization matrix, i.e. the number of columns of the coregionalization parameter ‘W’
-
Private
(input_dim, num_outputs, kernel, output, kappa=None, name='X')[source]¶ Builds a kernel for an Intrinsic Coregionalization Model
Input_dim: Input dimensionality
Num_outputs: Number of outputs
Parameters: - kernel (a GPy kernel) – kernel that will be multiplied by the coregionalize kernel (matrix B).
- W_rank (integer) – rank of the coregionalization matrix, i.e. the number of columns of the coregionalization parameter ‘W’
-
index_to_slices
(index)[source]¶ take a numpy array of integers (index) and return a nested list of slices such that the slices describe the start, stop points for each integer in the index.
e.g. >>> index = np.asarray([0,0,0,1,1,1,2,2,2]) returns >>> [[slice(0,3,None)],[slice(3,6,None)],[slice(6,9,None)]]
or, a more complicated example >>> index = np.asarray([0,0,1,1,0,2,2,2,1,1]) returns >>> [[slice(0,2,None),slice(4,5,None)],[slice(2,4,None),slice(8,10,None)],[slice(5,8,None)]]
GPy.util.netpbmfile module¶
Read and write image data from and to Netpbm files, respectively.
This implementation follows the Netpbm format specifications at http://netpbm.sourceforge.net/doc/. No gamma correction is performed.
The following image formats are supported: PBM (bi-level), PGM (grayscale), PPM (color), PAM (arbitrary), XV thumbnail (RGB332, read-only).
Author: | Christoph Gohlke |
---|---|
Organization: | Laboratory for Fluorescence Dynamics, University of California, Irvine |
Version: | 2013.01.18 |
Requirements¶
- CPython 2.7, 3.2 or 3.3
- Numpy 1.7
- Matplotlib 1.2 (optional for plotting)
Examples¶
>>> im1 = numpy.array([[0, 1],[65534, 65535]], dtype=numpy.uint16)
>>> imsave('_tmp.pgm', im1)
>>> im2 = imread('_tmp.pgm')
>>> assert numpy.all(im1 == im2)
-
class
NetpbmFile
(arg=None, **kwargs)[source]¶ Bases:
object
Read and write Netpbm PAM, PBM, PGM, PPM, files.
Initialize instance from filename, open file, or numpy array.
GPy.util.normalizer module¶
Created on Aug 27, 2014
@author: Max Zwiessele
GPy.util.parallel module¶
The module of tools for parallelization (MPI)
GPy.util.pca module¶
Created on 10 Sep 2012
@author: Max Zwiessele @copyright: Max Zwiessele 2012
-
class
PCA
(X)[source]¶ Bases:
object
PCA module with automatic primal/dual determination.
-
plot_2d
(X, labels=None, s=20, marker='o', dimensions=(0, 1), ax=None, colors=None, fignum=None, cmap=None, **kwargs)[source]¶ Plot the given dimensions against each other in PC space, using the given labels. Labels can be any sequence of labels of length X.shape[0]. Labels can be drawn with a subsequent call to legend()
-
GPy.util.quad_integrate module¶
The file for utilities related to integration by quadrature methods - will contain implementation for gaussian-kronrod integration.
-
quadgk_int
(f, fmin=-inf, fmax=inf, difftol=0.1)[source]¶ Integrate f from fmin to fmax. Integration is done by the substitution x = r / (1-r**2): as r goes from -1 to 1, x goes from -inf to inf. The interval for the quadgk function is from -1 to +1, so we transform the space from (-inf, inf) to (-1, 1). :param f: :param fmin: :param fmax: :param difftol: :return:
-
quadvgk
(feval, fmin, fmax, tol1=1e-05, tol2=1e-05)[source]¶ NumPy implementation that makes use of the code here: http://se.mathworks.com/matlabcentral/fileexchange/18801-quadvgk We use Gauss-Kronrod integration, already used in GPstuff, for evaluating one dimensional integrals. This is vectorised quadrature, which means that several functions can be evaluated at the same time over a grid of points. :param f: :param fmin: :param fmax: :param difftol: :return:
GPy.util.subarray_and_sorting module¶
Module author: Max Zwiessele <ibinbei@gmail.com>
-
common_subarrays
(X, axis=0)[source]¶ Find common subarrays of 2 dimensional X, where axis is the axis to apply the search over. Common subarrays are returned as a dictionary of <subarray, [index]> pairs, where the subarray is a tuple representing the subarray and the index is the index for the subarray in X, where index is the index to the remaining axis.
:param np.ndarray X: 2d array to check for common subarrays in
:param int axis: axis to apply subarray detection over. When the axis is 0, compare rows; columns otherwise.

In a 2d array:
>>> import numpy as np
>>> X = np.zeros((3,6), dtype=bool)
>>> X[[1,1,1],[0,4,5]] = 1; X[1:,[2,3]] = 1
>>> X
array([[False, False, False, False, False, False],
       [ True, False,  True,  True,  True,  True],
       [False, False,  True,  True, False, False]], dtype=bool)
>>> d = common_subarrays(X,axis=1)
>>> len(d)
3
>>> X[:, d[tuple(X[:,0])]]
array([[False, False, False],
       [ True,  True,  True],
       [False, False, False]], dtype=bool)
>>> d[tuple(X[:,4])] == d[tuple(X[:,0])] == [0, 4, 5]
True
>>> d[tuple(X[:,1])]
[1]
GPy.util.univariate_Gaussian module¶
-
cdfNormal
(z)[source]¶ Robust implementation of the cdf of a standard normal.
@see [[https://github.com/mseeger/apbsint/blob/master/src/eptools/potentials/SpecfunServices.h original implementation]] in C from Matthias Seeger.
-
derivLogCdfNormal
(z)[source]¶ Robust implementations of derivative of the log cdf of a standard normal.
@see [[https://github.com/mseeger/apbsint/blob/master/src/eptools/potentials/SpecfunServices.h original implementation]] in C from Matthias Seeger.
-
inv_std_norm_cdf
(x)[source]¶ Inverse cumulative standard Gaussian distribution Based on Winitzki, S. (2008)
-
logCdfNormal
(z)[source]¶ Robust implementations of log cdf of a standard normal.
@see [[https://github.com/mseeger/apbsint/blob/master/src/eptools/potentials/SpecfunServices.h original implementation]] in C from Matthias Seeger.
-
logPdfNormal
(z)[source]¶ Robust implementations of log pdf of a standard normal.
@see [[https://github.com/mseeger/apbsint/blob/master/src/eptools/potentials/SpecfunServices.h original implementation]] in C from Matthias Seeger.
GPy.util.warping_functions module¶
-
class
IdentityFunction
(closed_inverse=True)[source]¶ Bases:
GPy.util.warping_functions.WarpingFunction
Identity warping function. This is for testing and sanity check purposes and should not be used in practice. The closed_inverse flag should only be set to False for debugging and testing purposes.
-
class
LogFunction
(closed_inverse=True)[source]¶ Bases:
GPy.util.warping_functions.WarpingFunction
Easy wrapper for applying a fixed log warping function to positive-only values. The closed_inverse flag should only be set to False for debugging and testing purposes.
-
class
TanhFunction
(n_terms=3, initial_y=None)[source]¶ Bases:
GPy.util.warping_functions.WarpingFunction
This is the function proposed in Snelson et al.: A sum of tanh functions with linear trends outside the range. Notice the term ‘d’, which scales the linear trend.
n_terms specifies the number of tanh terms to be used
-
f
(y)[source]¶ Transform y with f using the parameter vector psi, psi = [[a,b,c]]
\(f = (y * d) + \sum_{terms} a * tanh(b *(y + c))\)
-
fgrad_y
(y, return_precalc=False)[source]¶ Gradient of f w.r.t. y ([N x 1])
Returns: Nx1 vector of derivatives; if return_precalc is true, it also returns the precomputed values
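A sketch of how the tanh warping is typically used through the warped GP model (the data are made up; the warping_terms argument, assumed here to map to n_terms, selects the number of tanh terms):

import numpy as np
import GPy

X = np.random.rand(40, 1) * 5
Y = np.exp(np.sin(X) + 0.05 * np.random.randn(40, 1))   # positive, skewed observations

m = GPy.models.WarpedGP(X, Y, kernel=GPy.kern.RBF(1), warping_terms=3)
m.optimize()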
-
-
class
WarpingFunction
(name)[source]¶ Bases:
GPy.core.parameterization.parameterized.Parameterized
Abstract class for warping functions z = f(y)
GPy.plotting package¶
Introduction¶
GPy.plotting
effectively extends models based on
GPy.core.gp.GP
(and other classes) by adding methods to
plot useful charts. ‘matplotlib’, ‘plotly’ (online) and ‘plotly’
(offline) are supported. The methods in GPy.plotting
(and
child classes GPy.plotting.gpy_plot
and
GPy.plotting.matplot_dep
) are not intended to be called
directly, but rather are ‘injected’ into other classes (notably
GPy.core.gp.GP
). Documentation describing plots is best
found associated with the model being plotted
e.g. GPy.core.gp.GP.plot_confidence
.
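A minimal sketch of the injected plotting workflow (toy data assumed; the library name passed to change_plotting_library selects the backend):

import numpy as np
import GPy

X = np.linspace(0, 10, 30)[:, None]
Y = np.sin(X) + 0.1 * np.random.randn(30, 1)
m = GPy.models.GPRegression(X, Y)
m.optimize()

GPy.plotting.change_plotting_library('matplotlib')   # 'plotly' backends are also supported
fig = m.plot()        # convenience plot of the GP fit, injected into the model class
m.plot_mean()         # finer-grained plots are available as separate methods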
Subpackages¶
GPy.plotting.gpy_plot package¶
Submodules¶
GPy.plotting.gpy_plot.data_plots module¶
-
plot_data
(self, which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **plot_kwargs)[source]¶ Plot the training data.
For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.
Can plot only part of the data using which_data_rows and which_data_ycols.
Parameters: - which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
- projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
- label (str) – the label for the plot
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns list: of plots created.
-
plot_data_error
(self, which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **error_kwargs)[source]¶ Plot the training data input error.
For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.
Can plot only part of the data using which_data_rows and which_data_ycols.
Parameters: - which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
- projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- label (str) – the label for the plot
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns list: of plots created.
-
plot_errorbars_trainset
(self, which_data_rows='all', which_data_ycols='all', fixed_inputs=None, plot_raw=False, apply_link=False, label=None, projection='2d', predict_kw=None, **plot_kwargs)[source]¶ Plot the errorbars of the GP likelihood on the training data. These are the errorbars after the appropriate approximations according to the likelihood are done.
This also works for heteroscedastic likelihoods.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- which_data_ycols – when the data has several columns (independent outputs), only plot these
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- predict_kwargs (dict) – kwargs for the prediction used to predict the right quantiles.
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
-
plot_inducing
(self, visible_dims=None, projection='2d', label='inducing', legend=True, **plot_kwargs)[source]¶ Plot the inducing inputs of a sparse gp model
Parameters: - visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
- plot_kwargs (kwargs) – keyword arguments for the plotting library
GPy.plotting.gpy_plot.gp_plots module¶
-
plot
(self, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, samples_likelihood=0, lower=2.5, upper=97.5, plot_data=True, plot_inducing=True, plot_density=False, predict_kw=None, projection='2d', legend=True, **kwargs)[source]¶ Convenience function for plotting the fit of a GP.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
If you want fine grained control use the specific plotting functions supplied in the model.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- levels (int) – the number of levels in the density (a number greater than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
- samples_likelihood (int) – the number of samples to draw from the GP and apply the likelihood noise. This is usually not what you want!
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- projection ({2d|3d}) – plot in 2d or 3d?
- legend (bool) – convenience, whether to put a legend on the plot or not.
-
plot_confidence
(self, lower=2.5, upper=97.5, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', label='gp confidence', predict_kw=None, **kwargs)[source]¶ Plot the confidence interval between the percentiles lower and upper. E.g. the 95% confidence interval corresponds to lower=2.5, upper=97.5. Note: Only implemented for one dimension!
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- which_data_ycols (array-like) – which columns of the output y (!) to plot (array-like or list of ints)
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
-
plot_density
(self, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=35, label='gp density', predict_kw=None, **kwargs)[source]¶ Plot the density of the GP between the percentiles lower and upper, e.g. the 95% region corresponds to the percentiles 2.5 and 97.5. Note: Only implemented for one dimension!
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
- levels (int) – the number of levels in the density (a number greater than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
-
plot_f
(self, plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)[source]¶ Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!
If you want fine grained control use the specific plotting functions supplied in the model.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default:200]
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
- levels (int) – the number of levels in the density (a number greater than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
-
plot_mean
(self, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=20, projection='2d', label='gp mean', predict_kw=None, **kwargs)[source]¶ Plot the mean of the GP.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuples [(i,v), (i,v), …], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
- plot_raw (bool) – plot the latent function (usually denoted f) only?
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
- levels (int) – for 2D plotting, the number of contour levels to use
- projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
- label (str) – the label for the plot.
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
-
plot_samples
(self, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=True, apply_link=False, visible_dims=None, which_data_ycols='all', samples=3, projection='2d', label='gp_samples', predict_kw=None, **kwargs)[source]¶ Plot samples drawn from the GP.
You can deactivate the legend for this one plot by supplying None to label.
Give the Y_metadata in the predict_kw if you need it.
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- fixed_inputs (a list of tuples) – a list of tuples [(i,v), (i,v), …], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
- plot_raw (bool) – plot the latent function (usually denoted f) only? This is usually what you want!
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
- levels (int) – for 2D plotting, the number of contour levels to use
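As a rough sketch of how these gpy_plot helpers are typically called on a fitted model (the data, kernel choice and hyperparameters below are illustrative, and a plotting backend such as matplotlib must be installed):
import numpy as np
import GPy
# Toy 1D regression problem (illustrative only)
X = np.linspace(0, 10, 50)[:, None]
Y = np.sin(X) + 0.1 * np.random.randn(50, 1)
m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(1))
m.optimize()
m.plot_mean(label='gp mean')   # posterior mean only
m.plot_density(levels=35)      # shaded density instead of a single interval
m.plot_f(samples=3)            # latent function fit, with 3 posterior samples
m.plot_samples(samples=5)      # samples drawn from the latent function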
GPy.plotting.gpy_plot.inference_plots module¶
GPy.plotting.gpy_plot.kernel_plots module¶
-
plot_ARD
(kernel, filtering=None, legend=False, canvas=None, **kwargs)[source]¶ If an ARD kernel is present, plot a bar representation using matplotlib
Parameters: - fignum – figure number of the plot
- filtering (list of names to use for ARD plot) – a list of kernel names to use for plotting ARD parameters. Only kernels whose names match an entry in filtering will be used for plotting.
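A brief sketch of how plot_ARD is typically used, assuming an ARD kernel and the matplotlib backend (the data below is synthetic):
import numpy as np
import GPy
X = np.random.rand(100, 3)
Y = np.sin(6 * X[:, :1]) + 0.05 * np.random.randn(100, 1)  # only input 0 matters
m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(3, ARD=True))
m.optimize()
m.kern.plot_ARD()  # one bar per input dimension, reflecting its relevance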
-
plot_covariance
(kernel, x=None, label=None, plot_limits=None, visible_dims=None, resolution=None, projection='2d', levels=20, **kwargs)[source]¶ Plot a kernel covariance w.r.t. another x.
Parameters: - x (array-like) – the value to use for the other kernel argument (kernels are a function of two variables!)
- plot_limits (Either (xmin, xmax) for 1D or (xmin, xmax, ymin, ymax) / ((xmin, xmax), (ymin, ymax)) for 2D) – the range over which to plot the kernel
- visible_dims (array-like) – input dimensions (!) to use for x. Make sure to select two or fewer dimensions to plot.
- projection ({2d|3d}) – What projection shall we use to plot the kernel?
- levels (int) – for 2D projection, how many levels for the contour plot to use?
- kwargs – valid kwargs for your specific plotting library
- resolution – the resolution of the lines used in plotting. For 2D this defines the grid for kernel evaluation.
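A minimal sketch of plotting a 1D kernel covariance (assuming a matplotlib backend; the kernel and plot limits are illustrative):
import numpy as np
import GPy
k = GPy.kern.Matern32(1, lengthscale=0.5)
# Plot k(x, x') as a function of x for the fixed second argument x' = 0
k.plot_covariance(x=np.zeros((1, 1)), plot_limits=(-3, 3))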
GPy.plotting.gpy_plot.latent_plots module¶
-
plot_latent
(self, labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', scatter_kwargs=None, **imshow_kwargs)[source]¶ Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (Kern) – the kernel to use for prediction
- marker (str) – markers to use; cycle if more labels than markers are given
- num_samples (int) – the maximum number of samples to plot. We do a stratified subsample from the labels if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- scatter_kwargs – the kwargs for the scatter plots
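A rough sketch of latent-space plotting for a Bayesian GPLVM (the data and labels here are random placeholders, and the optimization is kept short for illustration):
import numpy as np
import GPy
Y = np.random.randn(100, 12)            # high-dimensional observations (placeholder)
labels = np.random.randint(0, 3, 100)   # one class label per data point
m = GPy.models.BayesianGPLVM(Y, input_dim=5)
m.optimize(messages=False, max_iters=200)
m.plot_latent(labels=labels, which_indices=(0, 1))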
-
plot_latent_inducing
(self, which_indices=None, legend=False, plot_limits=None, marker=None, projection='2d', **kwargs)[source]¶ Plot a scatter plot of the inducing inputs.
Parameters: - which_indices ([int]) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – marker to use [default is custom arrow like]
- kwargs – the kwargs for the scatter plots
- projection (str) – for now 2d or 3d projection (other projections can be implemented, see developer documentation)
-
plot_latent_scatter
(self, labels=None, which_indices=None, legend=True, plot_limits=None, marker='<>^vsd', num_samples=1000, projection='2d', **kwargs)[source]¶ Plot a scatter plot of the latent space.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – markers to use; cycle if more labels than markers are given
- kwargs – the kwargs for the scatter plots
-
plot_magnification
(self, labels=None, which_indices=None, resolution=60, marker='<>^vsd', legend=True, plot_limits=None, updates=False, mean=True, covariance=True, kern=None, num_samples=1000, scatter_kwargs=None, plot_scatter=True, **imshow_kwargs)[source]¶ Plot the magnification factor of the GP on the inputs. This is the density of the GP as a gray scale.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- marker (str) – markers to use; cycle if more labels than markers are given
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- mean (bool) – use the mean of the Wishart embedding for the magnification factor
- covariance (bool) – use the covariance of the Wishart embedding for the magnification factor
- kern (Kern) – the kernel to use for prediction
- num_samples (int) – the maximum number of samples to plot. We do a stratified subsample from the labels if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- kwargs – the kwargs for the scatter plots
-
plot_steepest_gradient_map
(self, output_labels=None, data_labels=None, which_indices=None, resolution=15, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, annotation_kwargs=None, scatter_kwargs=None, **imshow_kwargs)[source]¶ Plot the steepest gradient map of the GP over the input dimensions selected by which_indices, together with a scatter plot of the data.
Parameters: - labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure, if int plot legend columns on legend
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (Kern) – the kernel to use for prediction
- marker (str) – markers to use; cycle if more labels than markers are given
- num_samples (int) – the maximum number of samples to plot. We do a stratified subsample from the labels if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- annotation_kwargs – the kwargs for the annotation plot
- scatter_kwargs – the kwargs for the scatter plots
GPy.plotting.gpy_plot.plot_util module¶
-
get_fixed_dims
(fixed_inputs)[source]¶ Work out the fixed dimensions from the fixed_inputs list of tuples.
-
get_free_dims
(model, visible_dims, fixed_dims)[source]¶ Work out which input dimensions to use for plotting (1D or 2D).
The visible dimensions are the dimensions to be shown; the fixed_dims are the dimensions held fixed.
The free_dims are then the visible dims without the fixed dims.
-
get_x_y_var
(model)[source]¶ Extract the data from a model: X (the inputs), X_variance (the variance of the inputs, default None) and Y (the outputs).
If (X, X_variance, Y) is already given, this just returns it.
Returns: (X, X_variance, Y)
-
helper_for_plot_data
(self, X, plot_limits, visible_dims, fixed_inputs, resolution)[source]¶ Figure out the data, free_dims and create an Xgrid for the prediction.
This is only implemented for two dimensions for now!
-
helper_predict_with_model
(self, Xgrid, plot_raw, apply_link, percentiles, which_data_ycols, predict_kw, samples=0)[source]¶ Make the right decisions for prediction with a model based on the standard arguments of plotting.
This is quite complex and will take a while to understand, so do not change anything in here lightly!!!
-
subsample_X
(X, labels, num_samples=1000)[source]¶ Stratified subsampling if labels are given. Due to rounding errors you might get slight differences between num_samples and the size of the returned subsampled X.
-
update_not_existing_kwargs
(to_update, update_from)[source]¶ This function updates the keyword arguments from update_from in to_update, only if the keys are not already set in to_update.
This is used to update kwargs from the default dicts.
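A small illustration of the kwarg-merging behaviour described above (the dictionary contents are arbitrary):
from GPy.plotting.gpy_plot.plot_util import update_not_existing_kwargs
defaults = dict(color='#3465a4', linewidth=2)
user_kwargs = dict(color='red')
update_not_existing_kwargs(user_kwargs, defaults)
# user_kwargs is now {'color': 'red', 'linewidth': 2}:
# 'color' was already set by the user and is left untouched,
# 'linewidth' is filled in from the defaults.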
GPy.plotting.matplot_dep package¶
Subpackages¶
Created on 24 Jul 2013
@author: maxz
-
class
AxisChangedController
(ax, update_lim=None)[source]¶ Bases:
GPy.plotting.matplot_dep.controllers.axis_event_controller.AxisEventController
Buffered control of axis limit changes
Constructor
-
class
BufferedAxisChangedController
(ax, plot_function, plot_limits, resolution=50, update_lim=None, **kwargs)[source]¶ Bases:
GPy.plotting.matplot_dep.controllers.axis_event_controller.AxisChangedController
Buffered axis changed controller. Controls the buffer and handles update events for when the axes changed.
Updated plotting will be after first reload (first time will be within plot limits, after that the limits will be buffered)
Parameters: - plot_function (function) – function to use for creating image for plotting (return ndarray-like) plot_function gets called with (2D!) Xtest grid if replotting required
- plot_limits – beginning plot limits [xmin, ymin, xmax, ymax]
- kwargs – additional kwargs are for pyplot.imshow(**kwargs)
Created on 24 Jul 2013
@author: maxz
-
class
ImAnnotateController
(ax, plot_function, plot_limits, resolution=20, update_lim=0.99, imshow_kwargs=None, **kwargs)[source]¶ Bases:
GPy.plotting.matplot_dep.controllers.imshow_controller.ImshowController
Parameters: - plot_function (function) – function to use for creating image for plotting (return ndarray-like) plot_function gets called with (2D!) Xtest grid if replotting required
- plot_limits – beginning plot limits [xmin, ymin, xmax, ymax]
- text_props – kwargs for pyplot.text(**text_props)
- kwargs – additional kwargs are for pyplot.imshow(**kwargs)
-
class
ImshowController
(ax, plot_function, plot_limits, resolution=50, update_lim=0.9, **kwargs)[source]¶ Bases:
GPy.plotting.matplot_dep.controllers.axis_event_controller.BufferedAxisChangedController
Parameters: - plot_function (function) – function to use for creating image for plotting (return ndarray-like) plot_function gets called with (2D!) Xtest grid if replotting required
- plot_limits – beginning plot limits [xmin, ymin, xmax, ymax]
- kwargs – additional kwargs are for pyplot.imshow(**kwargs)
Submodules¶
GPy.plotting.matplot_dep.base_plots module¶
-
gpplot
(x, mu, lower, upper, edgecol='#3300FF', fillcol='#33CCFF', ax=None, fignum=None, **kwargs)[source]¶
GPy.plotting.matplot_dep.defaults module¶
GPy.plotting.matplot_dep.img_plots module¶
The module contains the tools for plotting 2D image visualizations
GPy.plotting.matplot_dep.mapping_plots module¶
-
plot_mapping
(self, plot_limits=None, which_data='all', which_parts='all', resolution=None, levels=20, samples=0, fignum=None, ax=None, fixed_inputs=[], linecol='#204a87')[source]¶ - Plots the mapping associated with the model.
- In one dimension, the function is plotted.
- In two dimensions, a contour plot shows the function
- In higher dimensions, we’ve not implemented this yet !TODO!
Can plot only part of the data and part of the posterior functions using which_data and which_functions
Parameters: - plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
- which_data ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- which_parts ('all', or list of bools) – which of the kernel functions to plot (additively)
- resolution (int) – the number of intervals to sample the GP on. Defaults to 200 in 1D and 50 (a 50x50 grid) in 2D
- levels (int) – number of levels to plot in a contour plot.
- samples (int) – the number of a posteriori samples to plot
- fignum (figure number) – figure to plot on.
- ax (axes handle) – axes to plot on.
- fixed_inputs (a list of tuples) – a list of tuples [(i,v), (i,v), …], specifying that input index i should be set to value v.
- linecol – color of line to plot.
- levels – for 2D plotting, the number of contour levels to use. If ax is None, a new figure is created.
GPy.plotting.matplot_dep.maps module¶
-
bbox_match
(sf, bbox, inside_only=True)[source]¶ Return the geometry and attributes of a shapefile that lie within (or intersect) a bounding box
Parameters: - sf (shapefile object) – shapefile
- bbox (list of floats [x_min,y_min,x_max,y_max]) – bounding box
- inside_only (bool) – True if the objects returned are those that lie within the bbox, False if the objects returned are any that intersect the bbox
-
plot
(shape_records, facecolor='w', edgecolor='k', linewidths=0.5, ax=None, xlims=None, ylims=None)[source]¶ Plot the geometry of a shapefile
Parameters: - shape_records (ShapeRecord object (output of a shapeRecords() method)) – geometry and attributes list
- facecolor – color to be used to fill in polygons
- edgecolor – color to be used for lines
- ax (axes handle) – axes to plot on.
-
plot_bbox
(sf, bbox, inside_only=True)[source]¶ Plot the geometry of a shapefile within a bbox
Parameters: - sf (shapefile object) – shapefile
- bbox (list of floats [x_min,y_min,x_max,y_max]) – bounding box
- inside_only (bool) – True if the objects returned are those that lie within the bbox, False if the objects returned are any that intersect the bbox
GPy.plotting.matplot_dep.plot_definitions module¶
-
class
MatplotlibPlots
[source]¶ Bases:
GPy.plotting.abstract_plotting_library.AbstractPlottingLibrary
-
add_to_canvas
(ax, plots, legend=False, title=None, **kwargs)[source]¶ Add plots to the canvas; plots is either a dictionary with the plots as its items or a list of plots.
The kwargs are plotting library specific kwargs!
E.g. in matplotlib this does not have to do anything to add stuff, but we set the legend and title.
!This function returns the updated canvas!
Parameters: - title – the title of the plot
- legend – whether to plot a legend or not
-
annotation_heatmap
(ax, X, annotation, extent=None, label=None, imshow_kwargs=None, **annotation_kwargs)[source]¶ Plot an annotation heatmap. That is like an imshow, but put the text of the annotation inside the cells of the heatmap (centered).
Parameters: - canvas – the canvas to plot on
- annotation (array-like) – the annotation labels for the heatmap
- extent ([horizontal_min,horizontal_max,vertical_min,vertical_max]) – the extent of where to place the heatmap
- label (str) – the label for the heatmap
Returns: a list of both the heatmap and annotation plots [heatmap, annotation], or the interactive update object (alone)
-
annotation_heatmap_interact
(ax, plot_function, extent, label=None, resolution=15, imshow_kwargs=None, **annotation_kwargs)[source]¶ If plot_function is not None, return an interactively updated heatmap, which updates on axis events, so that one can zoom in and out and the heatmap gets updated. See the matplotlib implementation in matplot_dep.controllers.
the plot_function returns a pair (X, annotation) to plot, when called with a new input X (which would be the grid, which is visible on the plot right now)
Parameters: - canvas – the canvas to plot on
- annotation (array-like) – the annotation labels for the heatmap
- extent ([horizontal_min,horizontal_max,vertical_min,vertical_max]) – the extent of where to place the heatmap
- label (str) – the label for the heatmap
- plot_function – the function, which generates new data for given input locations X
- resolution (int) – the resolution of the interactive plot redraw - this is only needed when giving a plot_function
Returns: a list of both the heatmap and annotation plots [heatmap, annotation], or the interactive update object (alone)
-
barplot
(ax, x, height, width=0.8, bottom=0, color='#3465a4', label=None, **kwargs)[source]¶ Plot vertical bar plot centered at x with height and width of bars. The y level is at bottom.
the kwargs are plotting library specific kwargs!
Parameters: - x (array-like) – the center points of the bars
- height (array-like) – the height of the bars
- width (array-like) – the width of the bars
- bottom (array-like) – the start y level of the bars
- kwargs – kwargs for the specific library you are using.
-
contour
(ax, X, Y, C, levels=20, label=None, **kwargs)[source]¶ Make a contour plot at (X, Y) with heights/colors stored in C on the canvas.
if Z is not None: make 3d contour plot at (X, Y, Z) with heights/colors stored in C on the canvas.
the kwargs are plotting library specific kwargs!
-
figure
(rows=1, cols=1, gridspec_kwargs={}, tight_layout=True, **kwargs)[source]¶ Get a new figure with nrows and ncolumns subplots. Does not initialize the canvases yet.
There are individual kwargs for the individual plotting libraries to use.
-
fill_between
(ax, X, lower, upper, color='#3465a4', label=None, **kwargs)[source]¶ Fill along the xaxis between lower and upper.
the kwargs are plotting library specific kwargs!
-
fill_gradient
(canvas, X, percentiles, color='#3465a4', label=None, **kwargs)[source]¶ Plot a gradient (in alpha values) for the given percentiles.
the kwargs are plotting library specific kwargs!
-
imshow
(ax, X, extent=None, label=None, vmin=None, vmax=None, **imshow_kwargs)[source]¶ Show the image stored in X on the canvas.
The origin of the image show is (0,0), such that X[0,0] gets plotted at [0,0] of the image!
the kwargs are plotting library specific kwargs!
-
imshow_interact
(ax, plot_function, extent, label=None, resolution=None, vmin=None, vmax=None, **imshow_kwargs)[source]¶ This function is optional!
Create an imshow controller to stream the image returned by the plot_function. There is an imshow controller written for matplotlib, which updates the imshow on changes in axis.
The origin of the image show is (0,0), such that X[0,0] gets plotted at [0,0] of the image!
the kwargs are plotting library specific kwargs!
-
new_canvas
(figure=None, row=1, col=1, projection='2d', xlabel=None, ylabel=None, zlabel=None, title=None, xlim=None, ylim=None, zlim=None, **kwargs)[source]¶ Return a canvas, kwargupdate for your plotting library.
if figure is not None, create a canvas in the figure at subplot position (col, row).
This method does two things, it creates an empty canvas and updates the kwargs (deletes the unnecessary kwargs) for further usage in normal plotting.
the kwargs are plotting library specific kwargs!
Parameters: projection ({'2d'|'3d'}) – The projection to use. E.g. in matplotlib this means it deletes references to ax, as plotting is done on the axis itself and is not a kwarg.
Parameters: - xlabel – the label to put on the xaxis
- ylabel – the label to put on the yaxis
- zlabel – the label to put on the zaxis (if plotting in 3d)
- title – the title of the plot
- legend – if True, plot a legend, if int make legend rows in the legend
- xlim ((float, float)) – the limits for the xaxis
- ylim ((float, float)) – the limits for the yaxis
- zlim ((float, float)) – the limits for the zaxis (if plotting in 3d)
-
plot
(ax, X, Y, Z=None, color=None, label=None, **kwargs)[source]¶ Make a line plot of Y against X (Y = f(X)) on the canvas. If Z is not None, plot in 3d!
the kwargs are plotting library specific kwargs!
-
plot_axis_lines
(ax, X, color='#a40000', label=None, **kwargs)[source]¶ Plot lines at the bottom (lower boundary of yaxis) of the axis at input location X.
If X is two dimensional, plot in 3d and connect the axis lines to the bottom of the Z axis.
the kwargs are plotting library specific kwargs!
-
scatter
(ax, X, Y, Z=None, color='#3465a4', label=None, marker='o', **kwargs)[source]¶ Make a scatter plot between X and Y on the canvas given.
the kwargs are plotting library specific kwargs!
Parameters: - canvas – the plotting library's specific canvas to plot on.
- X (array-like) – the inputs to plot.
- Y (array-like) – the outputs to plot.
- Z (array-like) – the Z level to plot (if plotting 3d).
- c (array-like) – the colorlevel for each point.
- vmin (float) – minimum colorscale
- vmax (float) – maximum colorscale
- kwargs – the specific kwargs for your plotting library
-
surface
(ax, X, Y, Z, color=None, label=None, **kwargs)[source]¶ Plot a surface for 3d plotting for the inputs (X, Y, Z).
the kwargs are plotting library specific kwargs!
-
GPy.plotting.matplot_dep.ssgplvm module¶
The module plotting results for SSGPLVM
GPy.plotting.matplot_dep.svig_plots module¶
GPy.plotting.matplot_dep.util module¶
-
align_subplot_array
(axes, xlim=None, ylim=None)[source]¶ Make all of the axes in the array have the same limits and turn off unnecessary ticks. Use plt.subplots() to get an array of axes.
-
align_subplots
(N, M, xlim=None, ylim=None)[source]¶ Make all of the subplots have the same limits and turn off unnecessary ticks.
-
fixed_inputs
(model, non_fixed_inputs, fix_routine='median', as_list=True, X_all=False)[source]¶ Convenience function for returning back fixed_inputs where the other inputs are fixed using fix_routine.
Parameters: - model (Model) – model
- non_fixed_inputs (list) – dimensions of non fixed inputs
- fix_routine (string) – fixing routine to use, 'mean', 'median', 'zero'
- as_list (boolean) – if true, will return a list of tuples with (dimension, fixed_val); otherwise it will create the corresponding X matrix
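A short sketch of how fixed_inputs can be used to build the fixed_inputs argument of the plotting functions above (the model and data are illustrative):
import numpy as np
import GPy
from GPy.plotting.matplot_dep.util import fixed_inputs
X = np.random.rand(40, 3)
Y = np.sin(6 * X[:, :1]) + 0.05 * np.random.randn(40, 1)
m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(3))
# Fix every input dimension except dimension 0 at its median value
fixed = fixed_inputs(m, non_fixed_inputs=[0], fix_routine='median')
# -> a list of (dimension, value) tuples, e.g. [(1, 0.51), (2, 0.47)]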
GPy.plotting.matplot_dep.variational_plots module¶
-
plot
(parameterized, fignum=None, ax=None, colors=None, figsize=(12, 6))[source]¶ Plot latent space X in 1D:
- if fig is given, create input_dim subplots in fig and plot in these
- if ax is given plot input_dim 1D latent space plots of X into each axis
- if neither fig nor ax is given create a figure with fignum and plot in there
- colors – colors for the different latent space dimensions (input_dim of them)
-
plot_SpikeSlab
(parameterized, fignum=None, ax=None, colors=None, side_by_side=True)[source]¶ Plot latent space X in 1D:
- if fig is given, create input_dim subplots in fig and plot in these
- if ax is given plot input_dim 1D latent space plots of X into each axis
- if neither fig nor ax is given create a figure with fignum and plot in there
- colors – colors for the different latent space dimensions (input_dim of them)
GPy.plotting.matplot_dep.visualize module¶
-
class
data_show
(vals)[source]¶ Bases:
object
The data_show class is a base class which describes how to visualize a particular data set. For example, motion capture data can be plotted as a stick figure, or images are shown using imshow. This class enables latent-to-data visualizations for the GP-LVM.
-
class
image_show
(vals, axes=None, dimensions=(16, 16), transpose=False, order='C', invert=False, scale=False, palette=[], preset_mean=0.0, preset_std=1.0, select_image=0, cmap=None)[source]¶ Bases:
GPy.plotting.matplot_dep.visualize.matplotlib_show
Show a data vector as an image. This visualizer reshapes the output vector and displays it as an image.
Parameters: - vals – the values of the output to display.
- axes (axes handle) – the axes to show the output on.
- dimensions (tuple) – the dimensions that the image needs to be transposed to for display.
- transpose – whether to transpose the image before display.
- order (string) – whether the array is in Fortran ordering ('F') or C ordering ('C'). Default is C ('C').
- invert (bool) – whether to invert the pixels or not (default False).
- palette – a palette to use for the image.
- preset_mean (double) – the preset mean of a scaled image.
- preset_std (double) – the preset standard deviation of a scaled image.
- cmap (matplotlib.cm) – the colormap for image visualization
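A minimal sketch of using image_show to display a flattened image vector (assuming a matplotlib backend; the 16x16 random image is a placeholder):
import numpy as np
from GPy.plotting.matplot_dep.visualize import image_show
vals = np.random.rand(1, 256)                 # one output row of 16*16 pixels
viewer = image_show(vals, dimensions=(16, 16))
viewer.modify(np.random.rand(1, 256))         # data_show update hook: replace the displayed image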
-
class
lvm
(vals, model, data_visualize, latent_axes=None, sense_axes=None, latent_index=[0, 1], disable_drag=False)[source]¶ Bases:
GPy.plotting.matplot_dep.visualize.matplotlib_show
Visualize a latent variable model
Parameters: - model – the latent variable model to visualize.
- data_visualize (visualize.data_show type.) – the object used to visualize the data which has been modelled.
- latent_axes – the axes where the latent visualization should be plotted.
-
class
lvm_dimselect
(vals, model, data_visualize, latent_axes=None, sense_axes=None, latent_index=[0, 1], labels=None)[source]¶ Bases:
GPy.plotting.matplot_dep.visualize.lvm
A visualizer for latent variable models which allows selection of the latent dimensions to use by clicking on a bar chart of their length scales.
For an example of the visualizer’s use try:
GPy.examples.dimensionality_reduction.BGPVLM_oil()
-
class
lvm_subplots
(vals, Model, data_visualize, latent_axes=None, sense_axes=None)[source]¶ Bases:
GPy.plotting.matplot_dep.visualize.lvm
latent_axes is a np array of dimension np.ceil(input_dim/2), one for each pair of the latent dimensions.
-
class
matplotlib_show
(vals, axes=None)[source]¶ Bases:
GPy.plotting.matplot_dep.visualize.data_show
the matplotlib_show class is a base class for all visualization methods that use matplotlib. It is initialized with an axis. If the axis is set to None it creates a figure window.
-
class
mocap_data_show
(vals, axes=None, connect=None, color='b')[source]¶ Bases:
GPy.plotting.matplot_dep.visualize.matplotlib_show
Base class for visualizing motion capture data.
-
class
mocap_data_show_vpython
(vals, scene=None, connect=None, radius=0.1)[source]¶ Bases:
GPy.plotting.matplot_dep.visualize.vpython_show
Base class for visualizing motion capture data using visual module.
-
class
skeleton_show
(vals, skel, axes=None, padding=0, color='b')[source]¶ Bases:
GPy.plotting.matplot_dep.visualize.mocap_data_show
data_show class for visualizing motion capture data encoded as a skeleton with angles.
Parameters: - vals (np.array) – set of modelled angles to use for printing in the axis when it's first created.
- skel (mocap.skeleton object) – skeleton object that has the parameters of the motion capture skeleton associated with it.
- padding (int) –
-
class
stick_show
(vals, connect=None, axes=None)[source]¶ Bases:
GPy.plotting.matplot_dep.visualize.mocap_data_show
Show a three dimensional point cloud as a figure. Connect elements of the figure together using the matrix connect.
-
class
vector_show
(vals, axes=None)[source]¶ Bases:
GPy.plotting.matplot_dep.visualize.matplotlib_show
A base visualization class that just shows a data vector as a plot of vector elements alongside their indices.
-
class
vpython_show
(vals, scene=None)[source]¶ Bases:
GPy.plotting.matplot_dep.visualize.data_show
the vpython_show class is a base class for all visualization methods that use vpython to display. It is initialized with a scene. If the scene is set to None it creates a scene window.
-
data_play
(Y, visualizer, frame_rate=30)[source]¶ Play a data set using the data_show object given.
Parameters: - Y – the data set to be visualized.
- visualizer (data_show) – the data_show object used to display the data.
Example usage:
This example loads in the CMU mocap database (http://mocap.cs.cmu.edu) subject number 35 motion number 01. It then plays it using the mocap_show visualize object.
data = GPy.util.datasets.cmu_mocap(subject='35', train_motions=['01'])
Y = data['Y']
Y[:, 0:3] = 0.  # Make figure walk in place
visualize = GPy.util.visualize.skeleton_show(Y[0, :], data['skel'])
GPy.util.visualize.data_play(Y, visualize)
Submodules¶
GPy.plotting.Tango module¶
GPy.plotting.abstract_plotting_library module¶
-
class
AbstractPlottingLibrary
[source]¶ Bases:
object
Set the defaults dictionary in the _defaults variable:
E.g. for matplotlib we define a file defaults.py and set its dictionary here:
from . import defaults
_defaults = defaults.__dict__
-
add_to_canvas
(canvas, plots, legend=True, title=None, **kwargs)[source]¶ Add plots to the canvas; plots is either a dictionary with the plots as its items or a list of plots.
The kwargs are plotting library specific kwargs!
E.g. in matplotlib this does not have to do anything to add stuff, but we set the legend and title.
!This function returns the updated canvas!
Parameters: - title – the title of the plot
- legend – whether to plot a legend or not
-
annotation_heatmap
(canvas, X, annotation, extent, label=None, **kwargs)[source]¶ Plot an annotation heatmap. That is like an imshow, but put the text of the annotation inside the cells of the heatmap (centered).
Parameters: - canvas – the canvas to plot on
- annotation (array-like) – the annotation labels for the heatmap
- extent ([horizontal_min,horizontal_max,vertical_min,vertical_max]) – the extent of where to place the heatmap
- label (str) – the label for the heatmap
Returns: a list of both the heatmap and annotation plots [heatmap, annotation], or the interactive update object (alone)
-
annotation_heatmap_interact
(canvas, plot_function, extent, label=None, resolution=15, **kwargs)[source]¶ If plot_function is not None, return an interactively updated heatmap, which updates on axis events, so that one can zoom in and out and the heatmap gets updated. See the matplotlib implementation in matplot_dep.controllers.
the plot_function returns a pair (X, annotation) to plot, when called with a new input X (which would be the grid, which is visible on the plot right now)
Parameters: - canvas – the canvas to plot on
- annotation (array-like) – the annotation labels for the heatmap
- extent ([horizontal_min,horizontal_max,vertical_min,vertical_max]) – the extent of where to place the heatmap
- label (str) – the label for the heatmap
- plot_function – the function, which generates new data for given input locations X
- resolution (int) – the resolution of the interactive plot redraw - this is only needed when giving a plot_function
Returns: a list of both the heatmap and annotation plots [heatmap, annotation], or the interactive update object (alone)
-
barplot
(canvas, x, height, width=0.8, bottom=0, color=None, label=None, **kwargs)[source]¶ Plot vertical bar plot centered at x with height and width of bars. The y level is at bottom.
the kwargs are plotting library specific kwargs!
Parameters: - x (array-like) – the center points of the bars
- height (array-like) – the height of the bars
- width (array-like) – the width of the bars
- bottom (array-like) – the start y level of the bars
- kwargs – kwargs for the specific library you are using.
-
contour
(canvas, X, Y, C, Z=None, color=None, label=None, **kwargs)[source]¶ Make a contour plot at (X, Y) with heights/colors stored in C on the canvas.
if Z is not None: make 3d contour plot at (X, Y, Z) with heights/colors stored in C on the canvas.
the kwargs are plotting library specific kwargs!
-
figure
(nrows, ncols, **kwargs)[source]¶ Get a new figure with nrows and ncolumns subplots. Does not initialize the canvases yet.
There are individual kwargs for the individual plotting libraries to use.
-
fill_between
(canvas, X, lower, upper, color=None, label=None, **kwargs)[source]¶ Fill along the xaxis between lower and upper.
the kwargs are plotting library specific kwargs!
-
fill_gradient
(canvas, X, percentiles, color=None, label=None, **kwargs)[source]¶ Plot a gradient (in alpha values) for the given percentiles.
the kwargs are plotting library specific kwargs!
-
imshow
(canvas, X, extent=None, label=None, vmin=None, vmax=None, **kwargs)[source]¶ Show the image stored in X on the canvas.
The origin of the image show is (0,0), such that X[0,0] gets plotted at [0,0] of the image!
the kwargs are plotting library specific kwargs!
-
imshow_interact
(canvas, plot_function, extent=None, label=None, vmin=None, vmax=None, **kwargs)[source]¶ This function is optional!
Create an imshow controller to stream the image returned by the plot_function. There is an imshow controller written for matplotlib, which updates the imshow on changes in axis.
The origin of the image show is (0,0), such that X[0,0] gets plotted at [0,0] of the image!
the kwargs are plotting library specific kwargs!
-
new_canvas
(figure=None, col=1, row=1, projection='2d', xlabel=None, ylabel=None, zlabel=None, title=None, xlim=None, ylim=None, zlim=None, **kwargs)[source]¶ Return a canvas, kwargupdate for your plotting library.
if figure is not None, create a canvas in the figure at subplot position (col, row).
This method does two things, it creates an empty canvas and updates the kwargs (deletes the unnecessary kwargs) for further usage in normal plotting.
the kwargs are plotting library specific kwargs!
Parameters: projection ({'2d'|'3d'}) – The projection to use. E.g. in matplotlib this means it deletes references to ax, as plotting is done on the axis itself and is not a kwarg.
Parameters: - xlabel – the label to put on the xaxis
- ylabel – the label to put on the yaxis
- zlabel – the label to put on the zaxis (if plotting in 3d)
- title – the title of the plot
- legend – if True, plot a legend, if int make legend rows in the legend
- xlim ((float, float)) – the limits for the xaxis
- ylim ((float, float)) – the limits for the yaxis
- zlim ((float, float)) – the limits for the zaxis (if plotting in 3d)
-
plot
(cavas, X, Y, Z=None, color=None, label=None, **kwargs)[source]¶ Make a line plot of Y against X (Y = f(X)) on the canvas. If Z is not None, plot in 3d!
the kwargs are plotting library specific kwargs!
-
plot_axis_lines
(ax, X, color=None, label=None, **kwargs)[source]¶ Plot lines at the bottom (lower boundary of yaxis) of the axis at input location X.
If X is two dimensional, plot in 3d and connect the axis lines to the bottom of the Z axis.
the kwargs are plotting library specific kwargs!
-
scatter
(canvas, X, Y, Z=None, color=None, vmin=None, vmax=None, label=None, **kwargs)[source]¶ Make a scatter plot between X and Y on the canvas given.
the kwargs are plotting library specific kwargs!
Parameters: - canvas – the plotting library's specific canvas to plot on.
- X (array-like) – the inputs to plot.
- Y (array-like) – the outputs to plot.
- Z (array-like) – the Z level to plot (if plotting 3d).
- c (array-like) – the colorlevel for each point.
- vmin (float) – minimum colorscale
- vmax (float) – maximum colorscale
- kwargs – the specific kwargs for your plotting library
-
surface
(canvas, X, Y, Z, color=None, label=None, **kwargs)[source]¶ Plot a surface for 3d plotting for the inputs (X, Y, Z).
the kwargs are plotting library specific kwargs!
-
xerrorbar
(canvas, X, Y, error, color=None, label=None, **kwargs)[source]¶ Make an errorbar along the xaxis for points at (X,Y) on the canvas. if error is two dimensional, the lower error is error[:,0] and the upper error is error[:,1]
the kwargs are plotting library specific kwargs!
-
yerrorbar
(canvas, X, Y, error, color=None, label=None, **kwargs)[source]¶ Make errorbars along the yaxis on the canvas given. if error is two dimensional, the lower error is error[0, :] and the upper error is error[1, :]
the kwargs are plotting library specific kwargs!
-
defaults
¶
GPy.inference.optimization package¶
Submodules¶
GPy.inference.optimization.stochastics module¶
-
class
SparseGPMissing
(model, batchsize=1)[source]¶ Bases:
GPy.inference.optimization.stochastics.StochasticStorage
Here we want to loop over all dimensions every time. Thus, we can just make sure the loop goes over self.d every time. We will try to get batches which look the same together, which speeds up calculations significantly.
-
class
SparseGPStochastics
(model, batchsize=1, missing_data=True)[source]¶ Bases:
GPy.inference.optimization.stochastics.StochasticStorage
For the sparse gp we need to store the dimension we are in, and the indices corresponding to those
-
class
StochasticStorage
(model)[source]¶ Bases:
object
This is a container for holding the stochastic parameters, such as subset indices or step length and so on.
self.d has to be a list of lists: [dimension indices, nan indices for those dimensions] so that the minibatches can be used as efficiently as possible.
Initialize this stochastic container using the given model
GPy.inference.latent_function_inference package¶
Introduction¶
Certain GPy.models
can be instantiated with an inference_method. This submodule contains objects that can be assigned to inference_method.
Inference over Gaussian process latent functions¶
In all our GP models, the consistency property means that we have a Gaussian prior over a finite set of points f. This prior is
\(p(\mathbf{f}) = \mathcal{N}(\mathbf{f} \,|\, \mathbf{0}, K),\)
where \(K\) is the kernel matrix.
We also have a likelihood (see GPy.likelihoods
) which defines how the data are
related to the latent function: \(p(y | f)\). If the likelihood is also a Gaussian,
the inference over \(f\) is tractable (see GPy.inference.latent_function_inference.exact_gaussian_inference
).
If the likelihood object is something other than Gaussian, then exact inference
is not tractable. We then resort to a Laplace approximation (GPy.inference.latent_function_inference.laplace
) or
expectation propagation (GPy.inference.latent_function_inference.expectation_propagation
).
The inference methods return a
Posterior
instance, which is a simple
structure which contains a summary of the posterior. The model classes can then
use this posterior object for making predictions, optimizing hyper-parameters,
etc.
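As a concrete sketch of assigning an inference_method, the following builds a GP classifier from the core GP class with an explicit Laplace approximation (the data is synthetic and the kernel choice is illustrative):
import numpy as np
import GPy
from GPy.inference.latent_function_inference.laplace import Laplace
X = np.random.rand(80, 1)
Y = (np.sin(10 * X) > 0).astype(float)   # binary labels
m = GPy.core.GP(X, Y,
                kernel=GPy.kern.RBF(1),
                likelihood=GPy.likelihoods.Bernoulli(),
                inference_method=Laplace())
m.optimize()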
-
class
InferenceMethodList
[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
,list
-
class
LatentFunctionInference
[source]¶ Bases:
object
-
static
from_dict
(input_dict)[source]¶ Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. In case it is needed, please override _build_from_input_dict instead.
Parameters: input_dict (dict) – Dictionary with all the information needed to instantiate the object.
-
static
Submodules¶
GPy.inference.latent_function_inference.dtc module¶
-
class
DTC
[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
An object for inference when the likelihood is Gaussian, but we want to do sparse inference.
The function self.inference returns a Posterior object, which summarizes the posterior.
NB. It’s not recommended to use this function! It’s here for historical purposes.
GPy.inference.latent_function_inference.exact_gaussian_inference module¶
-
class
ExactGaussianInference
[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
An object for inference when the likelihood is Gaussian.
The function self.inference returns a Posterior object, which summarizes the posterior.
For efficiency, we sometimes work with the Cholesky of Y*Y.T. To save repeatedly recomputing this, we cache it.
-
LOO
(kern, X, Y, likelihood, posterior, Y_metadata=None, K=None)[source]¶ Leave one out error as found in “Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models” Vehtari et al. 2014.
-
GPy.inference.latent_function_inference.exact_studentt_inference module¶
-
class
ExactStudentTInference
[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
An object for inference of student-t processes (not for GP with student-t likelihood!).
The function self.inference returns a StudentTPosterior object, which summarizes the posterior.
GPy.inference.latent_function_inference.expectation_propagation module¶
-
class
EP
(epsilon=1e-06, eta=1.0, delta=1.0, always_reset=False, max_iters=inf, ep_mode='alternated', parallel_updates=False, loading=False)[source]¶ Bases:
GPy.inference.latent_function_inference.expectation_propagation.EPBase
,GPy.inference.latent_function_inference.exact_gaussian_inference.ExactGaussianInference
The expectation-propagation algorithm. For nomenclature see Rasmussen & Williams 2006.
Parameters: - epsilon (float) – Convergence criterion, maximum squared difference allowed between mean updates to stop iterations (float)
- eta (float64) – parameter for fractional EP updates.
- delta (float64) – damping EP updates factor.
- always_reset – setting to always reset the approximation at the beginning of every inference call.
- max_iters (int) – maximum number of iterations
- ep_mode (string) – can be 'nested' (EP is run every time the hyperparameters change) or 'alternated' (EP runs at the beginning and then the hyperparameters are optimized)
- parallel_updates (boolean) – if True, the parameters of the sites are updated in parallel
- loading (boolean) – if True, prevents the EP parameters from changing. Hack used when loading a serialized model
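A short sketch of constructing an EP object and passing it as the inference_method of a classification GP (the parameter values and data are illustrative):
import numpy as np
import GPy
from GPy.inference.latent_function_inference.expectation_propagation import EP
X = np.random.rand(60, 1)
Y = (X > 0.5).astype(float)   # binary labels
ep = EP(epsilon=1e-6, eta=1.0, delta=0.5, ep_mode='nested')
m = GPy.core.GP(X, Y,
                kernel=GPy.kern.RBF(1),
                likelihood=GPy.likelihoods.Bernoulli(),
                inference_method=ep)
m.optimize()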
-
class
EPBase
(epsilon=1e-06, eta=1.0, delta=1.0, always_reset=False, max_iters=inf, ep_mode='alternated', parallel_updates=False, loading=False)[source]¶ Bases:
object
The expectation-propagation algorithm. For nomenclature see Rasmussen & Williams 2006.
Parameters: - epsilon (float) – Convergence criterion, maximum squared difference allowed between mean updates to stop iterations (float)
- eta (float64) – parameter for fractional EP updates.
- delta (float64) – damping EP updates factor.
- always_reset – setting to always reset the approximation at the beginning of every inference call.
- max_iters (int) – maximum number of iterations
- ep_mode (string) – can be 'nested' (EP is run every time the hyperparameters change) or 'alternated' (EP runs at the beginning and then the hyperparameters are optimized)
- parallel_updates (boolean) – if True, the parameters of the sites are updated in parallel
- loading (boolean) – if True, prevents the EP parameters from changing. Hack used when loading a serialized model
-
class
EPDTC
(epsilon=1e-06, eta=1.0, delta=1.0, always_reset=False, max_iters=inf, ep_mode='alternated', parallel_updates=False, loading=False)[source]¶ Bases:
GPy.inference.latent_function_inference.expectation_propagation.EPBase
,GPy.inference.latent_function_inference.var_dtc.VarDTC
The expectation-propagation algorithm. For nomenclature see Rasmussen & Williams 2006.
Parameters: - epsilon (float) – Convergence criterion, maximum squared difference allowed between mean updates to stop iterations (float)
- eta (float64) – parameter for fractional EP updates.
- delta (float64) – damping EP updates factor.
- always_reset – setting to always reset the approximation at the beginning of every inference call.
- max_iters (int) – maximum number of iterations
- ep_mode (string) – can be 'nested' (EP is run every time the hyperparameters change) or 'alternated' (EP runs at the beginning and then the hyperparameters are optimized)
- parallel_updates (boolean) – if True, the parameters of the sites are updated in parallel
- loading (boolean) – if True, prevents the EP parameters from changing. Hack used when loading a serialized model
-
class
posteriorParams
(mu, Sigma, L=None)[source]¶ Bases:
GPy.inference.latent_function_inference.expectation_propagation.posteriorParamsBase
-
class
posteriorParamsDTC
(mu, Sigma_diag)[source]¶ Bases:
GPy.inference.latent_function_inference.expectation_propagation.posteriorParamsBase
GPy.inference.latent_function_inference.fitc module¶
-
class
FITC
[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
An object for inference when the likelihood is Gaussian, but we want to do sparse inference.
The function self.inference returns a Posterior object, which summarizes the posterior.
-
const_jitter
= 1e-06¶
-
GPy.inference.latent_function_inference.gaussian_grid_inference module¶
-
class
GaussianGridInference
[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
An object for inference when the likelihood is Gaussian and inputs are on a grid.
The function self.inference returns a GridPosterior object, which summarizes the posterior.
GPy.inference.latent_function_inference.grid_posterior module¶
-
class
GridPosterior
(alpha_kron=None, QTs=None, Qs=None, V_kron=None)[source]¶ Bases:
object
Specially intended for the grid regression case. An object to represent a Gaussian posterior over latent function values, p(f|D).
The purpose of this class is to serve as an interface between the inference schemes and the model classes.
alpha_kron :
QTs : transposes of the eigenvectors resulting from the decomposition of the single-dimension covariance matrices
Qs : eigenvectors resulting from the decomposition of the single-dimension covariance matrices
V_kron : Kronecker product of the eigenvalues resulting from the decomposition of the single-dimension covariance matrices
-
QTs
¶ array of transposed eigenvectors resulting from the single-dimension covariance decompositions
-
Qs
¶ array of eigenvectors resulting from the single-dimension covariance decompositions
-
V_kron
¶ Kronecker product of the eigenvalues
-
alpha
¶
-
GPy.inference.latent_function_inference.inferenceX module¶
-
class
InferenceX
(model, Y, name='inferenceX', init='L2')[source]¶ Bases:
GPy.core.model.Model
The model class for inference of new X with given new Y. (replacing the “do_test_latent” in Bayesian GPLVM) It is a tiny inference model created from the original GP model. The kernel, likelihood (only Gaussian is supported at the moment) and posterior distribution are taken from the original model. For Regression models and GPLVM, a point estimate of the latent variable X will be inferred. For Bayesian GPLVM, the variational posterior of X will be inferred. X is inferred through a gradient optimization of the inference model.
Parameters: - model (GPy.core.Model) – the GPy model used in inference
- Y (numpy.ndarray) – the new observed data for inference
- init ('L2', 'NCC' and 'rand') – the distance metric of Y for initializing X with the nearest neighbour.
-
infer_newX
(model, Y_new, optimize=True, init='L2')[source]¶ Infer the distribution of X for the new observed data Y_new.
Parameters: - model (GPy.core.Model) – the GPy model used in inference
- Y_new (numpy.ndarray) – the new observed data for inference
- optimize (boolean) – whether to optimize the location of new X (True by default)
Returns: a tuple containing the estimated posterior distribution of X and the model that optimizes X
Return type: (GPy.core.parameterization.variational.VariationalPosterior, GPy.core.Model)
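A rough sketch of infer_newX on a fitted Bayesian GPLVM (the training data Y and held-out data Y_new are random placeholders):
import numpy as np
import GPy
from GPy.inference.latent_function_inference.inferenceX import infer_newX
Y = np.random.randn(60, 8)
Y_new = np.random.randn(5, 8)
m = GPy.models.BayesianGPLVM(Y, input_dim=3)
m.optimize(messages=False, max_iters=200)
X_new, inference_model = infer_newX(m, Y_new, optimize=True, init='L2')
# X_new holds the (variational) posterior over the latent positions of Y_new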
GPy.inference.latent_function_inference.laplace module¶
-
class
Laplace
[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
Laplace Approximation
Find the mode \(\hat{f}\) and the Hessian of the unnormalised posterior at this point (using Newton-Raphson)
-
LOO
(kern, X, Y, likelihood, posterior, Y_metadata=None, K=None, f_hat=None, W=None, Ki_W_i=None)[source]¶ Leave one out log predictive density as found in “Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models” Vehtari et al. 2014.
-
inference
(kern, X, likelihood, Y, mean_function=None, Y_metadata=None)[source]¶ Returns a Posterior class containing essential quantities of the posterior
-
mode_computations
(f_hat, Ki_f, K, Y, likelihood, kern, Y_metadata)[source]¶ At the mode, compute the hessian and effective covariance matrix.
Returns:
- logZ : approximation to the marginal likelihood
- woodbury_inv : variable required for calculating the approximation to the covariance matrix
- dL_dthetaL : array of derivatives (1 x num_kernel_params)
- dL_dthetaL : array of derivatives (1 x num_likelihood_params)
-
rasm_mode
(K, Y, likelihood, Ki_f_init, Y_metadata=None, *args, **kwargs)[source]¶ Rasmussen's numerically stable mode finding. For nomenclature see Rasmussen & Williams 2006. Influenced by GPML (BSD) code; all errors are our own.
Parameters: - K (NxN matrix) – Covariance matrix evaluated at locations X
- Y (np.ndarray) – The data
- likelihood (a GPy.likelihood object) – the likelihood of the latent function value for the given data
- Ki_f_init (np.ndarray) – the initial guess at the mode
- Y_metadata (np.ndarray | None) – information about the data, e.g. which likelihood to take from a multi-likelihood object
Returns: f_hat, the mode at which to make the Laplace approximation
Return type: np.ndarray
-
-
class
LaplaceBlock
[source]¶ Bases:
GPy.inference.latent_function_inference.laplace.Laplace
Laplace Approximation
Find the mode \(\hat{f}\) and the Hessian of the unnormalised posterior at this point (using Newton-Raphson)
-
mode_computations
(f_hat, Ki_f, K, Y, likelihood, kern, Y_metadata)[source]¶ At the mode, compute the hessian and effective covariance matrix.
Returns:
- logZ : approximation to the marginal likelihood
- woodbury_inv : variable required for calculating the approximation to the covariance matrix
- dL_dthetaL : array of derivatives (1 x num_kernel_params)
- dL_dthetaL : array of derivatives (1 x num_likelihood_params)
-
rasm_mode
(K, Y, likelihood, Ki_f_init, Y_metadata=None, *args, **kwargs)[source]¶ Rasmussen's numerically stable mode finding. For nomenclature see Rasmussen & Williams 2006. Influenced by GPML (BSD) code; all errors are our own.
Parameters: - K (NxN matrix) – Covariance matrix evaluated at locations X
- Y (np.ndarray) – The data
- likelihood (a GPy.likelihood object) – the likelihood of the latent function value for the given data
- Ki_f_init (np.ndarray) – the initial guess at the mode
- Y_metadata (np.ndarray | None) – information about the data, e.g. which likelihood to take from a multi-likelihood object
Returns: f_hat, the mode at which to make the Laplace approximation
Return type: np.ndarray
-
GPy.inference.latent_function_inference.pep module¶
-
class
PEP
(alpha)[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
Sparse Gaussian processes using Power-Expectation Propagation for regression: \(\alpha \approx 0\) gives VarDTC and \(\alpha = 1\) gives FITC
Reference: A Unifying Framework for Sparse Gaussian Process Approximation using Power Expectation Propagation, https://arxiv.org/abs/1605.07066
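As a small sketch, a PEP inference object is constructed with an alpha between the two limiting cases; it is intended to be supplied as the inference_method of a sparse GP core model (treat the exact model wiring as an assumption):
from GPy.inference.latent_function_inference.pep import PEP
# alpha close to 0 behaves like VarDTC, alpha = 1 recovers FITC
pep_inference = PEP(alpha=0.5)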
-
const_jitter
= 1e-06¶
-
GPy.inference.latent_function_inference.posterior module¶
-
class
Posterior
(woodbury_chol=None, woodbury_vector=None, K=None, mean=None, cov=None, K_chol=None, woodbury_inv=None, prior_mean=0)[source]¶ Bases:
object
An object to represent a Gaussian posterior over latent function values, p(f|D). This may be computed exactly for Gaussian likelihoods, or approximated for non-Gaussian likelihoods.
The purpose of this class is to serve as an interface between the inference schemes and the model classes. the model class can make predictions for the function at any new point x_* by integrating over this posterior.
woodbury_chol : a lower triangular matrix L that satisfies posterior_covariance = K - K L^{-T} L^{-1} K
woodbury_vector : a matrix (or vector, as Nx1 matrix) M which satisfies posterior_mean = K M
K : the prior covariance (required for lazy computation of various quantities)
mean : the posterior mean
cov : the posterior covariance
Not all of the above need to be supplied! You must supply:
- K (for lazy computation) or K_chol (for lazy computation)
You may supply either:
- woodbury_chol and woodbury_vector
Or:
- mean and cov
Of course, you can supply more than that, but this class will lazily compute all other quantities on demand.
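A minimal sketch of constructing a Posterior directly from a mean and covariance (the numbers are arbitrary); the remaining quantities are then derived lazily, as described above:
import numpy as np
from GPy.inference.latent_function_inference.posterior import Posterior
K = np.eye(3)             # prior covariance (needed for lazy computation)
mu = np.zeros((3, 1))     # posterior mean
Sigma = 0.5 * np.eye(3)   # posterior covariance
post = Posterior(mean=mu, cov=Sigma, K=K)
prec = post.precision     # inverse of the posterior covariance, computed on demand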
-
covariance_between_points
(kern, X, X1, X2)[source]¶ Computes the posterior covariance between points.
Parameters: - kern – GP kernel
- X – current input observations
- X1 – some input observations
- X2 – other input observations
-
K_chol
¶ Cholesky of the prior covariance K
-
covariance
¶ Posterior covariance $$ K_{xx} - K_{xx} W_{xx}^{-1} K_{xx}, \qquad W_{xx} := \texttt{Woodbury inv} $$
-
mean
¶ Posterior mean $$ K_{xx} v, \qquad v := \texttt{Woodbury vector} $$
-
precision
¶ Inverse of posterior covariance
-
woodbury_chol
¶ Return $L_{W}$, the lower triangular Cholesky decomposition of the Woodbury matrix: $$ L_{W} L_{W}^{\top} = W^{-1}, \qquad W^{-1} := \texttt{Woodbury inv} $$
-
woodbury_inv
¶ The inverse of the Woodbury matrix; in the Gaussian likelihood case it is defined as $$ (K_{xx} + \Sigma_{xx})^{-1}, \qquad \Sigma_{xx} := \texttt{Likelihood.variance / Approximate likelihood covariance} $$
-
woodbury_vector
¶ The Woodbury vector; in the Gaussian likelihood case only it is defined as $$ (K_{xx} + \Sigma)^{-1} Y, \qquad \Sigma := \texttt{Likelihood.variance / Approximate likelihood covariance} $$
-
-
class
PosteriorEP
(woodbury_chol=None, woodbury_vector=None, K=None, mean=None, cov=None, K_chol=None, woodbury_inv=None, prior_mean=0)[source]¶ Bases:
GPy.inference.latent_function_inference.posterior.Posterior
woodbury_chol : a lower triangular matrix L that satisfies posterior_covariance = K - K L^{-T} L^{-1} K
woodbury_vector : a matrix (or vector, as Nx1 matrix) M which satisfies posterior_mean = K M
K : the prior covariance (required for lazy computation of various quantities)
mean : the posterior mean
cov : the posterior covariance
Not all of the above need to be supplied! You must supply:
- K (for lazy computation) or K_chol (for lazy computation)
You may supply either:
- woodbury_chol and woodbury_vector
Or:
- mean and cov
Of course, you can supply more than that, but this class will lazily compute all other quantities on demand.
-
class
PosteriorExact
(woodbury_chol=None, woodbury_vector=None, K=None, mean=None, cov=None, K_chol=None, woodbury_inv=None, prior_mean=0)[source]¶ Bases:
GPy.inference.latent_function_inference.posterior.Posterior
woodbury_chol : a lower triangular matrix L that satisfies posterior_covariance = K - K L^{-T} L^{-1} K
woodbury_vector : a matrix (or vector, as Nx1 matrix) M which satisfies posterior_mean = K M
K : the prior covariance (required for lazy computation of various quantities)
mean : the posterior mean
cov : the posterior covariance
Not all of the above need to be supplied! You must supply:
K (for lazy computation) or K_chol (for lazy computation)You may supply either:
woodbury_chol woodbury_vectorOr:
mean covOf course, you can supply more than that, but this class will lazily compute all other quantites on demand.
-
class
StudentTPosterior
(deg_free, **kwargs)[source]¶ Bases:
GPy.inference.latent_function_inference.posterior.PosteriorExact
GPy.inference.latent_function_inference.svgp module¶
-
class
SVGP
[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
GPy.inference.latent_function_inference.var_dtc module¶
-
class
VarDTC
(limit=1)[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
An object for inference when the likelihood is Gaussian, but we want to do sparse inference.
The function self.inference returns a Posterior object, which summarizes the posterior.
For efficiency, we sometimes work with the Cholesky of Y*Y.T. To save repeatedly recomputing this, we cache it.
-
inference
(kern, X, Z, likelihood, Y, Y_metadata=None, mean_function=None, precision=None, Lm=None, dL_dKmm=None, psi0=None, psi1=None, psi2=None, Z_tilde=None)[source]¶
-
const_jitter
= 1e-08¶
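A hedged usage sketch: the sparse GP models normally select this inference scheme automatically, but it can also be passed in explicitly. The snippet below assumes that GPy.core.SparseGP accepts an inference_method argument; the data and inducing inputs are invented for illustration:

    import numpy as np
    import GPy
    from GPy.inference.latent_function_inference.var_dtc import VarDTC

    # Toy 1-D regression data and inducing inputs (illustrative only)
    X = np.random.uniform(0., 10., (100, 1))
    Y = np.sin(X) + 0.1 * np.random.randn(100, 1)
    Z = np.linspace(0., 10., 10)[:, None]

    m = GPy.core.SparseGP(X, Y, Z, GPy.kern.RBF(1),
                          GPy.likelihoods.Gaussian(variance=0.1),
                          inference_method=VarDTC(limit=1))
    m.optimize()
    print(m.log_likelihood())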
-
GPy.inference.latent_function_inference.var_dtc_parallel module¶
-
class
VarDTC_minibatch
(batchsize=None, limit=3, mpi_comm=None)[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
An object for inference when the likelihood is Gaussian, but we want to do sparse inference.
The function self.inference returns a Posterior object, which summarizes the posterior.
For efficiency, we sometimes work with the Cholesky of Y*Y.T. To save repeatedly recomputing this, we cache it.
-
inference_likelihood
(kern, X, Z, likelihood, Y)[source]¶ The first phase of inference. Computes: log-likelihood, dL_dKmm.
Cached intermediate results: Kmm, KmmInv.
-
inference_minibatch
(kern, X, Z, likelihood, Y)[source]¶ The second phase of inference: compute the derivatives over a minibatch of Y. Computes: dL_dpsi0, dL_dpsi1, dL_dpsi2, dL_dthetaL. Returns a flag (isEnd) indicating whether the end of Y has been reached.
-
const_jitter
= 1e-08¶
-
GPy.inference.latent_function_inference.var_gauss module¶
-
class
VarGauss
(alpha, beta)[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
The Variational Gaussian Approximation revisited
@article{Opper:2009,
    title = {The Variational Gaussian Approximation Revisited},
    author = {Opper, Manfred and Archambeau, C{\'e}dric},
    journal = {Neural Comput.},
    year = {2009},
    pages = {786--792},
}
Parameters: - alpha – GPy.core.Param variational parameter
- beta – GPy.core.Param variational parameter
GPy.inference.latent_function_inference.vardtc_md module¶
-
class
VarDTC_MD
[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
The VarDTC inference method for sparse GP with missing data (GPy.models.SparseGPRegressionMD)
-
inference
(kern, X, Z, likelihood, Y, indexD, output_dim, Y_metadata=None, Lm=None, dL_dKmm=None, Kuu_sigma=None)[source]¶ The first phase of inference. Computes: log-likelihood, dL_dKmm.
Cached intermediate results: Kmm, KmmInv.
-
const_jitter
= 1e-06¶
-
GPy.inference.latent_function_inference.vardtc_svi_multiout module¶
-
class
PosteriorMultioutput
(LcInvMLrInvT, LcInvScLcInvT, LrInvSrLrInvT, Lr, Lc, kern_r, Xr, Zr)[source]¶ Bases:
object
-
class
VarDTC_SVI_Multiout
[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
The VarDTC inference method for Multi-output GP regression (GPy.models.GPMultioutRegression)
-
inference
(kern_r, kern_c, Xr, Xc, Zr, Zc, likelihood, Y, qU_mean, qU_var_r, qU_var_c)[source]¶ The SVI-VarDTC inference
-
const_jitter
= 1e-06¶
-
GPy.inference.latent_function_inference.vardtc_svi_multiout_miss module¶
-
class
VarDTC_SVI_Multiout_Miss
[source]¶ Bases:
GPy.inference.latent_function_inference.LatentFunctionInference
The VarDTC inference method for Multi-output GP regression with missing data (GPy.models.GPMultioutRegressionMD)
-
inference
(kern_r, kern_c, Xr, Xc, Zr, Zc, likelihood, Y, qU_mean, qU_var_r, qU_var_c, indexD, output_dim)[source]¶ The SVI-VarDTC inference
-
inference_d
(d, beta, Y, indexD, grad_dict, mid_res, uncertain_inputs_r, uncertain_inputs_c, Mr, Mc)[source]¶
-
const_jitter
= 1e-06¶
-
GPy.inference.mcmc package¶
Submodules¶
GPy.inference.mcmc.hmc module¶
-
class
HMC
(model, M=None, stepsize=0.1)[source]¶ Bases:
object
An implementation of Hybrid Monte Carlo (HMC) for GPy models
Initialize an object for HMC sampling. Note that the state of the model (its parameter values) will be changed during sampling.
Parameters: - model (GPy.core.Model) – the GPy model that will be sampled
- M (numpy.ndarray) – the mass matrix (an identity matrix by default)
- stepsize (float) – the step size for HMC sampling
-
sample
(num_samples=1000, hmc_iters=20)[source]¶ Sample the (unfixed) model parameters.
Parameters: - num_samples (int) – the number of samples to draw (1000 by default)
- hmc_iters (int) – the number of leap-frog iterations (20 by default)
Returns: an array of parameter samples of size N x P (N - the number of samples, P - the number of parameters to sample)
Return type: numpy.ndarray
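A hedged end-to-end sketch of HMC sampling with a GPy regression model; the data, priors and sampler settings below are assumed purely for illustration:

    import numpy as np
    import GPy

    # Toy data (illustrative only)
    X = np.linspace(0., 10., 50)[:, None]
    Y = np.sin(X) + 0.1 * np.random.randn(50, 1)

    m = GPy.models.GPRegression(X, Y)
    # Give the hyperparameters priors so the sampled posterior is well defined
    m.kern.lengthscale.set_prior(GPy.priors.Gamma.from_EV(1., 10.))
    m.kern.variance.set_prior(GPy.priors.Gamma.from_EV(1., 10.))
    m.likelihood.variance.set_prior(GPy.priors.Gamma.from_EV(1., 10.))

    hmc = GPy.inference.mcmc.HMC(m, stepsize=5e-2)
    samples = hmc.sample(num_samples=300, hmc_iters=20)
    print(samples.shape)   # (300, number of sampled parameters)

As the class description notes, the model's parameter values are modified in place during sampling, so re-optimize or reset the model afterwards if point estimates are still needed.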