GPy - A Gaussian Process (GP) framework in Python

Introduction

GPy is a Gaussian Process (GP) framework written in Python, from the Sheffield machine learning group. It includes support for basic GP regression, multiple output GPs (using coregionalization), various noise models, sparse GPs, non-parametric regression and latent variables.

The GPy homepage contains tutorials for users and further information on the project, including installation instructions.

The documentation hosted here is mostly aimed at developers interacting closely with the code-base.

Source Code

The code can be found on our Github project page. It is open source and provided under the BSD license.

Installation

Installation instructions can currently be found on our Github project page.

Tutorials

Several tutorials have been developed in the form of Jupyter Notebooks.

Architecture

GPy is a large, powerful package with many features. In general terms, GPy is used roughly as follows. A model (GPy.models) is created - this is at the heart of GPy from a user perspective. A kernel (GPy.kern), data and, usually, a representation of noise are assigned to the model. Specific models require, or can make use of, additional information. The kernel and noise are controlled by hyperparameters - calling the optimize (GPy.core.gp.GP.optimize) method on the model invokes an iterative process which seeks optimal hyperparameter values. The model object can then be used to make plots and predictions (GPy.core.gp.GP.predict).
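
As a minimal illustration of this workflow (a sketch only, with synthetic data and an arbitrary kernel choice):

import numpy as np
import GPy

# data: 20 noisy observations of a 1D function
X = np.random.uniform(-3., 3., (20, 1))
Y = np.sin(X) + np.random.randn(20, 1) * 0.05

# kernel (GPy.kern) and model (GPy.models); GPRegression adds Gaussian noise internally
kernel = GPy.kern.RBF(input_dim=1, variance=1., lengthscale=1.)
m = GPy.models.GPRegression(X, Y, kernel)

m.optimize(messages=True)                   # optimize the hyperparameters
mean, var = m.predict(np.array([[1.5]]))    # predict at a new input
fig = m.plot()                              # plot the fit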

digraph GPy_Arch {

   rankdir=LR
   node[shape="rectangle" style="rounded,filled" fontname="Arial"]
   edge [color="#006699" len=2.5]

   Data->Model
   Hyperparameters->Kernel
   Hyperparameters->Noise
   Kernel->Model
   Noise->Model

   Model->Optimize
   Optimize->Hyperparameters

   Model->Predict
   Model->Plot

   Optimize [shape="ellipse"]
   Predict [shape="ellipse"]
   Plot [shape="ellipse"]

   subgraph cluster_0 {
      Data
      Kernel
      Noise
   }

}

Creating new Models

In GPy all models inherit from the base class Parameterized, which allows objects to be parameterized. It provides functionality for tying, bounding and fixing parameters, as well as for searching and manipulating parameters by regular-expression syntax. See Parameterized for more information.

The Model class provides parameter introspection, objective function and optimization.

In order to use the full functionality of Model, some methods need to be implemented or overridden, and the model needs to be told about its parameters so that it can handle and optimize them. To explain these methods we will use a wrapper around SciPy's rosen function (scipy.optimize.rosen), which holds input parameters \(\mathbf{X}\), where \(\mathbf{X}\in\mathbb{R}^{N\times 1}\).

Obligatory methods

__init__ :

Initialize the model with the given parameters. These need to be added to the model by calling self.add_parameter(<param>), where <param> is a parameter handle (see Parameterized for details):

self.X = GPy.Param("input", X)
self.add_parameter(self.X)
log_likelihood :

Returns the log-likelihood of the new model. For our example this is just the call to rosen, and as we want to minimize it, we need to negate the objective:

return -scipy.optimize.rosen(self.X)
parameters_changed :

Updates the internal state of the model and sets the gradient of each parameter handle in the hierarchy with respect to the log_likelihood. Thus, here we need to set the negative derivative of the Rosenbrock function for the parameters; in this case it is the gradient for self.X:

self.X.gradient = -scipy.optimize.rosen_der(self.X)

Here is the full code for the Rosen class:

from GPy import Model, Param
import scipy.optimize
class Rosen(Model):
    def __init__(self, X, name='rosenbrock'):
        super(Rosen, self).__init__(name=name)
        self.X = Param("input", X)
        self.add_parameter(self.X)
    def log_likelihood(self):
        return -scipy.optimize.rosen(self.X)
    def parameters_changed(self):
        self.X.gradient = -scipy.optimize.rosen_der(self.X)

In order to test the newly created model, we can check the gradients and optimize a standard rosenbrock run:

>>> import numpy as np
>>> m = Rosen(np.array([-1,-1]))
>>> print(m)
Name                 : rosenbrock
Log-likelihood       : -404.0
Number of Parameters : 2
Parameters:
  rosenbrock.  |  Value  |  Constraint  |  Prior  |  Tied to
  input        |   (2,)  |              |         |
>>> m.checkgrad(verbose=True)
           Name           |     Ratio     |  Difference   |  Analytical   |   Numerical
------------------------------------------------------------------------------------------
 rosenbrock.input[[0]]    |   1.000000    |   0.000000    |  -804.000000  |  -804.000000
 rosenbrock.input[[1]]    |   1.000000    |   0.000000    |  -400.000000  |  -400.000000
>>> m.optimize()
>>> print(m)
Name                 : rosenbrock
Log-likelihood       : -6.52150088871e-15
Number of Parameters : 2
Parameters:
  rosenbrock.  |  Value  |  Constraint  |  Prior  |  Tied to
  input        |   (2,)  |              |         |
>>> print(m.input)
  Index  |  rosenbrock.input  |  Constraint  |   Prior   |  Tied to
   [0]   |        0.99999994  |              |           |    N/A
   [1]   |        0.99999987  |              |           |    N/A
>>> print(m.gradient)
[ -1.91169809e-06,   1.01852309e-06]

This is the optimum of the 2D Rosenbrock function, as expected, and the gradients of the inputs are almost zero.

Optional methods

Currently none.

Creating new kernels

We will see in this tutorial how to create new kernels in GPy. We will also give details on how to implement each function of the kernel and illustrate with a running example: the rational quadratic kernel.

Structure of a kernel in GPy

In GPy a kernel object is made of a list of kernpart objects, which correspond to symmetric positive definite functions. More precisely, the kernel should be understood as the sum of the kernparts. In order to implement a new covariance, the following steps must be followed:

  1. implement the new covariance as a GPy.kern.src.kern.Kern object
  2. update the GPy.kern.src module so that the new kernel is exposed

These steps are detailed below.

Implementing a Kern object

We advise the reader to start with copy-pasting an existing kernel and to modify the new file. We will now give a description of the various functions that can be found in a Kern object, some of which are mandatory for the new kernel to work.

GPy.kern.src.kern.Kern.__init__ (self, input_dim, param1, param2, *args)

The implementation of this function is mandatory.

For all Kerns the first parameter input_dim corresponds to the dimension of the input space, and the following parameters stand for the parameterization of the kernel.

You have to call super(<class_name>, self).__init__(input_dim, active_dims, name) to make sure the input dimension (and possible dimension restrictions using active_dims) and name of the kernel are stored in the right place. These attributes are available as self.input_dim and self.name at runtime. Parameterization is done by adding Param objects to self and using them as normal numpy array-likes in your code. The parameters have to be added by calling link_parameters(*parameters) with the Param objects as arguments:

from GPy.core.parameterization import Param

def __init__(self,input_dim,variance=1.,lengthscale=1.,power=1.,active_dims=None):
    super(RationalQuadratic, self).__init__(input_dim, active_dims, 'rat_quad')
    assert input_dim == 1, "For this kernel we assume input_dim=1"
    self.variance = Param('variance', variance)
    self.lengthscale = Param('lengthscale', lengthscale)
    self.power = Param('power', power)
    self.link_parameters(self.variance, self.lengthscale, self.power)

From now on you can use the parameters self.variance, self.lengthscale, self.power as normal numpy array-likes in your code. Updates from the optimization routine will be done automatically.

parameters_changed (self)

The implementation of this function is optional.

This function is called as a callback upon each successful change to the parameters: if an optimization step was successful and the parameters (linked by link_parameters(*parameters)) have changed, this callback is invoked. It may be used to update precomputations for the kernel. Do not implement the gradient updates here, as gradient updates are performed by the model enclosing the kernel. In this example, we issue a no-op:

def parameters_changed(self):
    # nothing to do here
    pass
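
As an illustration only (not part of the rational quadratic example), a kernel with an expensive intermediate quantity could cache it here; the attribute name below is hypothetical:

def parameters_changed(self):
    # hypothetical cached quantity, recomputed whenever the optimizer changes
    # self.lengthscale; K() and the gradient methods could then reuse it
    self._inv_l2 = 1. / (self.lengthscale ** 2)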
K (self,X,X2)

The implementation of this function is mandatory.

This function is used to compute the covariance matrix associated with the inputs X, X2 (np.arrays with an arbitrary number of rows, \(n_1\) and \(n_2\), corresponding to the number of samples over which to calculate the covariance, and self.input_dim columns).

def K(self,X,X2):
    if X2 is None: X2 = X
    dist2 = np.square((X-X2.T)/self.lengthscale)
    return self.variance*(1 + dist2/2.)**(-self.power)
Kdiag (self,X)

The implementation of this function is mandatory.

This function is similar to K but it computes only the values of the kernel on the diagonal. The returned value is thus a 1-dimensional np.array of length \(n\).

def Kdiag(self,X):
    return self.variance*np.ones(X.shape[0])
update_gradients_full (self, dL_dK, X, X2=None)

This function is required for the optimization of the parameters.

Computes the gradients and sets them on the parameters of this model. For example, if the kernel is parameterized by \(\sigma^2, \theta\), then

\[\frac{\partial L}{\partial\sigma^2} = \frac{\partial L}{\partial K} \frac{\partial K}{\partial\sigma^2}\]

is set as the gradient of \(\sigma^2\) (self.variance.gradient = <gradient>), and

\[\frac{\partial L}{\partial\theta} = \frac{\partial L}{\partial K} \frac{\partial K}{\partial\theta}\]

as the gradient of \(\theta\).

def update_gradients_full(self, dL_dK, X, X2):
    if X2 is None: X2 = X
    dist2 = np.square((X-X2.T)/self.lengthscale)

    dvar = (1 + dist2/2.)**(-self.power)
    dl = self.power * self.variance * dist2 * (1 + dist2/2.)**(-self.power-1) / self.lengthscale
    dp = - self.variance * np.log(1 + dist2/2.) * (1 + dist2/2.)**(-self.power)

    self.variance.gradient = np.sum(dvar*dL_dK)
    self.lengthscale.gradient = np.sum(dl*dL_dK)
    self.power.gradient = np.sum(dp*dL_dK)
update_gradients_diag (self,dL_dKdiag,X,target)

This function is required for BGPLVM, sparse models and uncertain inputs.

As previously, for each parameter the quantity

\[\frac{\partial L}{\partial Kdiag} \frac{\partial Kdiag}{\partial param}\]

is set as its gradient.

def update_gradients_diag(self, dL_dKdiag, X):
    self.variance.gradient = np.sum(dL_dKdiag)
    # self.lengthscale and self.power have no influence on Kdiag, so their gradients are zero
    self.lengthscale.gradient = 0.
    self.power.gradient = 0.
gradients_X (self,dL_dK, X, X2)

This function is required for GPLVM, BGPLVM, sparse models and uncertain inputs.

Computes the derivative of the likelihood with respect to the inputs X (a \(n \times q\) np.array), that is, it calculates the quantity:

\[\frac{\partial L}{\partial K} \frac{\partial K}{\partial X}\]

The partial derivative matrix, in this case, comes out as an \(n \times q\) np.array.

def gradients_X(self,dL_dK,X,X2):
    """derivative of the likelihood with respect to X, calculated using dL_dK*dK_dX"""
    if X2 is None: X2 = X
    dist2 = np.square((X-X2.T)/self.lengthscale)

    dK_dX = -self.variance*self.power * (X-X2.T)/self.lengthscale**2 * (1 + dist2/2.)**(-self.power-1)
    return np.sum(dL_dK*dK_dX,1)[:,None]

Were the number of parameters or the number of input dimensions larger than one, the calculated partial derivative would be a 3- or 4-tensor.

gradients_X_diag (self,dL_dKdiag,X)

This function is required for BGPLVM, sparse models and uncertain inputs. As for update_gradients_diag, the quantity

\[\frac{\partial L}{\partial Kdiag} \frac{\partial Kdiag}{\partial X}\]

is computed for each element of X.

def gradients_X_diag(self,dL_dKdiag,X):
    # no diagonal gradients
    pass
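
Collecting the snippets above, a minimal, self-contained sketch of the kernel (restricted to one input dimension, as assumed throughout) could look as follows. The gradient expressions follow directly from the definition of dist2 and K used above; for gradients_X_diag we return an explicit array of zeros, since Kdiag does not depend on X:

import numpy as np
from GPy.kern.src.kern import Kern
from GPy.core.parameterization.param import Param

class RationalQuadratic(Kern):
    """Rational quadratic kernel for 1D inputs: k(x, x') = variance * (1 + dist2/2)^(-power)."""
    def __init__(self, input_dim, variance=1., lengthscale=1., power=1., active_dims=None):
        super(RationalQuadratic, self).__init__(input_dim, active_dims, 'rat_quad')
        assert input_dim == 1, "For this kernel we assume input_dim=1"
        self.variance = Param('variance', variance)
        self.lengthscale = Param('lengthscale', lengthscale)
        self.power = Param('power', power)
        self.link_parameters(self.variance, self.lengthscale, self.power)

    def parameters_changed(self):
        pass  # nothing to precompute for this kernel

    def K(self, X, X2=None):
        if X2 is None: X2 = X
        dist2 = np.square((X - X2.T) / self.lengthscale)
        return self.variance * (1 + dist2 / 2.) ** (-self.power)

    def Kdiag(self, X):
        return self.variance * np.ones(X.shape[0])

    def update_gradients_full(self, dL_dK, X, X2=None):
        if X2 is None: X2 = X
        dist2 = np.square((X - X2.T) / self.lengthscale)
        dvar = (1 + dist2 / 2.) ** (-self.power)
        dl = self.power * self.variance * dist2 * (1 + dist2 / 2.) ** (-self.power - 1) / self.lengthscale
        dp = -self.variance * np.log(1 + dist2 / 2.) * (1 + dist2 / 2.) ** (-self.power)
        self.variance.gradient = np.sum(dvar * dL_dK)
        self.lengthscale.gradient = np.sum(dl * dL_dK)
        self.power.gradient = np.sum(dp * dL_dK)

    def update_gradients_diag(self, dL_dKdiag, X):
        self.variance.gradient = np.sum(dL_dKdiag)
        self.lengthscale.gradient = 0.    # Kdiag does not depend on the lengthscale
        self.power.gradient = 0.          # ... nor on the power

    def gradients_X(self, dL_dK, X, X2=None):
        if X2 is None: X2 = X
        dist2 = np.square((X - X2.T) / self.lengthscale)
        dK_dX = -self.variance * self.power * (X - X2.T) / self.lengthscale ** 2 \
                * (1 + dist2 / 2.) ** (-self.power - 1)
        return np.sum(dL_dK * dK_dX, 1)[:, None]

    def gradients_X_diag(self, dL_dKdiag, X):
        return np.zeros(X.shape)  # Kdiag is constant in X

With such a class in place the gradients can be checked in the same way as for the Rosen model above, e.g. by wrapping the kernel in a GPy.models.GPRegression model and calling m.checkgrad(verbose=True).
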
Second order derivatives

These functions are required for the magnification factor; they are analogous to the first-order gradients for X, but return the second-order derivatives:

\[\frac{\partial^2 K}{\partial X\partial X2}\]
  • GPy.kern.src.kern.gradients_XX (self,dL_dK, X, X2)
  • GPy.kern.src.kern.gradients_XX_diag (self,dL_dKdiag, X)
Psi statistics

The psi statistics and their derivatives are required only for BGPLVM and GPs with uncertain inputs. The expressions are as follows:

  • psi0(self, Z, variational_posterior)
    \[\psi_0 = \sum_{i=0}^{n}E_{q(X)}[k(X_i, X_i)]\]
  • psi1(self, Z, variational_posterior)
    \[\psi_1^{n,m} = E_{q(X)}[k(X_n, Z_m)]\]
  • psi2(self, Z, variational_posterior)
    \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
  • psi2n(self, Z, variational_posterior)
    \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Defining a new plotting function in GPy

GPy has a wrapper for different plotting backends. There are some functions you can use for standard plotting. Anything going beyond the scope of the AbstractPlottingLibrary class's plot definitions should be considered carefully, and may be a special case for your plotting library only.

All plotting related code lives in GPy.plotting and beneath. No plotting related code needs to be anywhere else in GPy.

As examples are the easiest way to learn, we will implement an example plotting function which plots the covariance of a kernel.

Write your plotting function into a module under GPy.plotting.gpy_plot.<module_name>, using the plotting routines provided in GPy.plotting.plotting_library. A convenient pattern is to from . import plotting_library as pl and then always use pl() to access the functionality of the plotting library.

For the covariance plot we define the function in GPy.plotting.kernel_plots.

The first thing is to define the function parameters and write the documentation for them! The first argument of the plotting function is always self for the class this plotting function will be attached to (we will get to attaching the function to a class in detail later on):

def plot_covariance(kernel, x=None, label=None,
             plot_limits=None, visible_dims=None, resolution=None,
             projection=None, levels=20, **kwargs):
    """
    Plot a kernel covariance w.r.t. another x.

    :param array-like x: the value to use for the other kernel argument (kernels are a function of two variables!)
    :param plot_limits: the range over which to plot the kernel
    :type plot_limits: Either (xmin, xmax) for 1D or (xmin, xmax, ymin, ymax) / ((xmin, xmax), (ymin, ymax)) for 2D
    :param array-like visible_dims: input dimensions (!) to use for x. Make sure to select 2 or less dimensions to plot.
    :param int resolution: the resolution of the lines used in plotting. For 2D this defines the grid for kernel evaluation.
    :param {2d|3d} projection: What projection shall we use to plot the kernel?
    :param int levels: for 2D projection, how many levels for the contour plot to use?
    :param kwargs:  valid kwargs for your specific plotting library
    """

Having defined the outline of the function we can start implementing the real plotting.

First, we will write the necessary logic behind getting the covariance function. This involves getting an Xgrid to plot with and the second x to compare the covariance to:

from .plot_util import helper_for_plot_data
X = np.ones((2, kernel.input_dim)) * [[-4], [4]]
_, free_dims, Xgrid, xx, yy, _, _, resolution = helper_for_plot_data(kernel, X, plot_limits, visible_dims, None, resolution)
from numbers import Number
if x is None:
    x = np.zeros((1, kernel.input_dim))
elif isinstance(x, Number):
    x = np.ones((1, kernel.input_dim))*x
K = kernel.K(Xgrid, x)

free_dims holds the free dimensions after selecting from the visible_dims, Xgrid is the grid for the covariance, xx and yy are the grid positions for 2D plotting, x is the X2 for the kernel, and K holds the kernel covariance for all positions between Xgrid and x.

Then we need a canvas to plot on. Always push the keyword arguments of the specific library through GPy.plotting.abstract_plotting_library.AbstractPlottingLibrary.new_canvas:

if projection == '3d':
    zlabel = "k(X, {!s})".format(np.asanyarray(x).tolist())
    xlabel = 'X[:,0]'
    ylabel = 'X[:,1]'
else:
    zlabel = None
    xlabel = 'X'
    ylabel = "k(X, {!s})".format(np.asanyarray(x).tolist())

canvas, kwargs = pl().new_canvas(projection=projection, xlabel=xlabel, ylabel=ylabel, zlabel=zlabel, **kwargs)

It is also very important to use the defaults, which are defined for all implemented plotting libraries. This is done by updating the kwargs from the defaults; there is a helper function which takes care of already existing keyword arguments. In this case we will just use the default for plotting a mean function for the covariance plot as well. If you want to define your own defaults, add them to the defaults for each library and use them here. See for example the defaults for matplotlib in GPy.plotting.matplot_dep.defaults. Below we use the default for meanplot_1d, which we reuse for the 1D plot:

from .plot_util import update_not_existing_kwargs
update_not_existing_kwargs(kwargs, pl().defaults.meanplot_1d)  # @UndefinedVariable

The full definition of the plotting then looks like this:

if len(free_dims)<=2:
    if len(free_dims)==1:
        # 1D plotting:
        update_not_existing_kwargs(kwargs, pl().defaults.meanplot_1d)  # @UndefinedVariable
        plots = dict(covariance=[pl().plot(canvas, Xgrid[:, free_dims], K, label=label, **kwargs)])
    else:
        if projection == '2d':
            update_not_existing_kwargs(kwargs, pl().defaults.meanplot_2d)  # @UndefinedVariable
            plots = dict(covariance=[pl().contour(canvas, xx[:, 0], yy[0, :],
                                           K.reshape(resolution, resolution),
                                           levels=levels, label=label, **kwargs)])
        elif projection == '3d':
            update_not_existing_kwargs(kwargs, pl().defaults.meanplot_3d)  # @UndefinedVariable
            plots = dict(covariance=[pl().surface(canvas, xx, yy,
                                           K.reshape(resolution, resolution),
                                           label=label,
                                           **kwargs)])
    return pl().add_to_canvas(canvas, plots)

else:
    raise NotImplementedError("Cannot plot a kernel with more than two input dimensions")

We return whatever is returned by GPy.plotting.abstract_plotting_library.AbstractPlottingLibrary.add_to_canvas, so that the plotting library can choose what to do with the plot later, when we want to show it. To show a plot, we can just call GPy.plotting.show with the output of the plotting function above.

Now we want to add the plot to GPy.kern.src.kern.Kern. To do that, we inject the plotting function into the class in GPy.plotting.__init__, which makes sure that on-the-fly changes of the backend work smoothly. Thus, in GPy.plotting.__init__ we add the lines:

from ..kern import Kern
Kern.plot_covariance = gpy_plot.kernel_plots.plot_covariance

And that’s it. The plot can be shown in plotly by calling:

GPy.plotting.change_plotting_library('plotly')

k = GPy.kern.RBF(1) + GPy.kern.Matern32(1)
k.randomize()
fig = k.plot()
GPy.plotting.show(fig, <plot_library specific **kwargs>)

k = GPy.kern.RBF(2) + GPy.kern.Matern32(2)
k.randomize()
fig = k.plot()
GPy.plotting.show(fig, <plot_library specific **kwargs>)

k = GPy.kern.RBF(1) + GPy.kern.Matern32(2)
k.randomize()
fig = k.plot(projection='3d')
GPy.plotting.show(fig, <plot_library specific **kwargs>)

This illustrates another point: changing the backend works on the fly. To show the above example in matplotlib, we just replace the first line with GPy.plotting.change_plotting_library('matplotlib').

Parameterization handling

Parameterization in GPy is done through so-called parameter handles, which are handles to the parameters of a model of any kind. A parameter handle can be constrained, fixed, randomized and more. All parameters in GPy have a name, with which they can be accessed in the model. The most common way of accessing a parameter programmatically, though, is by variable name.

Parameter handles

A parameter handle in GPy is a handle on a parameter, as the name suggests. A parameter can be constrained, fixed, randomized and more (see e.g. working with models). This gives the model the freedom to handle parameter distribution and model updates as efficiently as possible. All parameter handles share a common memory space, which is just a flat numpy array, stored in the highest parent of a model hierarchy. In the following we introduce the different parameter handles which exist in GPy.

Parameterized

A Parameterized object itself holds parameter handles and is just a summary of the parameters below it. It can use those parameters to change the internal state of the model, and GPy ensures that those parameters always hold the right value during an optimization routine or any other update.

Param

The lowest level of parameter is a numpy array. The Param class inherits all functionality of a numpy array and can simply be used as if it were a numpy array. These parameters can be indexed in the same way as a numpy array.
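
A short sketch (the parameter name here is purely illustrative):

import numpy as np
from GPy.core.parameterization.param import Param

lengthscale = Param('lengthscale', np.ones(2))  # a named parameter wrapping a numpy array
print(lengthscale * 2.0, lengthscale[0])        # numpy-style arithmetic and indexing work
lengthscale[0] = 0.5                            # item assignment as with a numpy array
lengthscale.constrain_positive()                # parameter-handle functionality on top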

GPy.core package

Introduction

This module contains the fundamental classes of GPy - classes that are inherited by objects in other parts of GPy in order to provide a consistent interface to major functionality.

Inheritance diagram of GPy.core.gp.GP

GPy.core.model is inherited by GPy.core.gp.GP, and GPy.core.model itself inherits paramz.model.Model from the paramz package. paramz essentially provides an inherited set of properties and functions used to manage the state (and state changes) of the model.

GPy.core.gp.GP represents a GP model. Such an entity is typically passed variables representing known (x) and observed (y) data, along with a kernel and other information needed to create the specific model. It exposes functions which return information derived from the inputs to the model, for example predicting unobserved variables based on new known variables, or the log marginal likelihood of the current state of the model.

optimize is called to optimize hyperparameters of the model. The optimizer argument takes a string which is used to specify non-default optimization schemes.

Various plotting functions can be called against GPy.core.gp.GP.

Inheritance diagram of GPy.core.gp_grid.GpGrid, GPy.core.sparse_gp.SparseGP, GPy.core.sparse_gp_mpi.SparseGP_MPI, GPy.core.svgp.SVGP

GPy.core.gp.GP is used as the basis for classes supporting more specialized types of Gaussian Process model. These are, however, generally still not specific enough to be called by the user and are inherited by members of the GPy.models package.

randomize(self, rand_gen=None, *args, **kwargs)[source]

Randomize the model. Make this draw from the prior if one exists, else draw from given random generator

Parameters:
  • rand_gen – np random number generator which takes args and kwargs
  • loc (float) – loc parameter for random number generator
  • scale (float) – scale parameter for random number generator
  • kwargs (args,) – will be passed through to random number generator
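
For example (the parameter and attribute names below depend on the particular model and kernel, so treat them as illustrative):

import numpy as np
import GPy

X = np.random.uniform(-3., 3., (20, 1))
Y = np.sin(X) + np.random.randn(20, 1) * 0.05
m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(1))

m.rbf.lengthscale.set_prior(GPy.priors.Gamma.from_EV(1., 10.))  # give one parameter a prior
m.randomize()                          # drawn from the prior where one is set, from the default generator otherwise
m.randomize(np.random.uniform, 0., 2.) # or pass an explicit generator with its args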

Subpackages

GPy.core.parameterization package
Introduction

Extends the functionality of the paramz package (a dependency) to support parameterization of priors (GPy.core.parameterization.priors).

Inheritance diagram of GPy.core.parameterization.priors
Submodules
GPy.core.parameterization.param module
class Param(name, input_array, default_constraint=None, *a, **kw)[source]

Bases: paramz.param.Param, GPy.core.parameterization.priorizable.Priorizable

randomize(rand_gen=None, *args, **kwargs)

Randomize the model. Make this draw from the prior if one exists, else draw from given random generator

Parameters:
  • rand_gen – np random number generator which takes args and kwargs
  • loc (float) – loc parameter for random number generator
  • scale (float) – scale parameter for random number generator
  • kwargs (args,) – will be passed through to random number generator
GPy.core.parameterization.parameterized module
class Parameterized(name=None, parameters=[])[source]

Bases: paramz.parameterized.Parameterized, GPy.core.parameterization.priorizable.Priorizable

Parameterized class

Say m is a handle to a parameterized class.

Printing parameters:

  • print m: prints a nice summary over all parameters
  • print m.name: prints details for param with name ‘name’
  • print m[regexp]: prints details for all the parameters
    which match (!) regexp
  • print m[‘’]: prints details for all parameters

Fields:

  • Name: The name of the param, can be renamed!
  • Value: Shape, or value if one-valued
  • Constraint: constraint of the param; curly brackets "{c}" indicate that some parameters are constrained by c. See the detailed print to get the exact constraints.
  • Tied_to: which parameter it is tied to.

Getting and setting parameters:

Set all values in param to one:

m.name.to.param = 1

Handling of constraining, fixing and tieing parameters:

You can constrain parameters by calling constrain on the param itself, e.g.:

  • m.name[:,1].constrain_positive()
  • m.name[0].tie_to(m.name[1])

Fixing parameters will fix them to the value they are right now. If you change the parameters value, the param will be fixed to the new value!

If you want to operate on all parameters use m[''] to wildcard-select all parameters and concatenate them. Printing m[''] will result in printing of all parameters in detail.
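
A brief illustration on a regression model (the parameter names depend on the kernel and likelihood used):

import numpy as np
import GPy

m = GPy.models.GPRegression(np.random.randn(20, 1), np.random.randn(20, 1), GPy.kern.RBF(1))

print(m)                           # summary of all parameters
print(m[''])                       # detailed view of all parameters
print(m['.*lengthscale'])          # regular-expression selection
m.rbf.variance.constrain_bounded(0.1, 10.)
m.Gaussian_noise.variance.fix()    # fix the noise variance to its current value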

randomize(rand_gen=None, *args, **kwargs)

Randomize the model. Make this draw from the prior if one exists, else draw from given random generator

Parameters:
  • rand_gen – np random number generator which takes args and kwargs
  • loc (float) – loc parameter for random number generator
  • scale (float) – scale parameter for random number generator
  • kwargs (args,) – will be passed through to random number generator
GPy.core.parameterization.priorizable module
class Priorizable(name, default_prior=None, *a, **kw)[source]

Bases: paramz.core.parameter_core.Parameterizable

log_prior()[source]

evaluate the prior

set_prior(prior, warning=True)[source]

Set the prior for this object to prior.

Parameters:
  • prior (Prior) – a prior to set for this parameter
  • warning (bool) – whether to warn if another prior was set for this parameter

unset_priors(*priors)[source]

Un-set all priors given (in *priors) from this parameter handle.

GPy.core.parameterization.priors module
class DGPLVM(sigma2, lbl, x_shape)[source]

Bases: GPy.core.parameterization.priors.Prior

Implementation of the Discriminative Gaussian Process Latent Variable model paper, by Raquel.

Parameters:sigma2 – constant

Note

DGPLVM for Classification paper implementation

compute_Mi(cls)[source]
compute_Sb(cls, M_i, M_0)[source]
compute_Sw(cls, M_i)[source]
compute_cls(x)[source]
compute_indices(x)[source]
compute_listIndices(data_idx)[source]
compute_sig_alpha_W(data_idx, lst_idx_all, W_i)[source]
compute_sig_beta_Bi(data_idx, M_i, M_0, lst_idx_all)[source]
compute_wj(data_idx, M_i)[source]
get_class_label(y)[source]
lnpdf(x)[source]
lnpdf_grad(x)[source]
rvs(n)[source]
domain = 'real'
class DGPLVM_KFDA(lambdaa, sigma2, lbl, kern, x_shape)[source]

Bases: GPy.core.parameterization.priors.Prior

Implementation of the Discriminative Gaussian Process Latent Variable function using Kernel Fisher Discriminant Analysis by Seung-Jean Kim for implementing Face paper by Chaochao Lu.

Parameters:
  • lambdaa – constant
  • sigma2 – constant

Note

Surpassing Human-Level Face paper dgplvm implementation

A description for init

compute_A(lst_ni)[source]
compute_a(lst_ni)[source]
compute_cls(x)[source]
compute_lst_ni()[source]
get_class_label(y)[source]
lnpdf(x)[source]
lnpdf_grad(x)[source]
rvs(n)[source]
x_reduced(cls)[source]
domain = 'real'
class DGPLVM_Lamda(sigma2, lbl, x_shape, lamda, name='DP_prior')[source]

Bases: GPy.core.parameterization.priors.Prior, GPy.core.parameterization.parameterized.Parameterized

Implementation of the Discriminative Gaussian Process Latent Variable model paper, by Raquel.

Parameters:sigma2 – constant

Note

DGPLVM for Classification paper implementation

compute_Mi(cls)[source]
compute_Sb(cls, M_i, M_0)[source]
compute_Sw(cls, M_i)[source]
compute_cls(x)[source]
compute_indices(x)[source]
compute_listIndices(data_idx)[source]
compute_sig_alpha_W(data_idx, lst_idx_all, W_i)[source]
compute_sig_beta_Bi(data_idx, M_i, M_0, lst_idx_all)[source]
compute_wj(data_idx, M_i)[source]
get_class_label(y)[source]
lnpdf(x)[source]
lnpdf_grad(x)[source]
rvs(n)[source]
domain = 'real'
class DGPLVM_T(sigma2, lbl, x_shape, vec)[source]

Bases: GPy.core.parameterization.priors.Prior

Implementation of the Discriminative Gaussian Process Latent Variable model paper, by Raquel.

Parameters:sigma2 – constant

Note

DGPLVM for Classification paper implementation

compute_Mi(cls)[source]
compute_Sb(cls, M_i, M_0)[source]
compute_Sw(cls, M_i)[source]
compute_cls(x)[source]
compute_indices(x)[source]
compute_listIndices(data_idx)[source]
compute_sig_alpha_W(data_idx, lst_idx_all, W_i)[source]
compute_sig_beta_Bi(data_idx, M_i, M_0, lst_idx_all)[source]
compute_wj(data_idx, M_i)[source]
get_class_label(y)[source]
lnpdf(x)[source]
lnpdf_grad(x)[source]
rvs(n)[source]
domain = 'real'
class Exponential(l)[source]

Bases: GPy.core.parameterization.priors.Prior

Implementation of the Exponential probability function, coupled with random variables.

Parameters:l – shape parameter
lnpdf(x)[source]
lnpdf_grad(x)[source]
rvs(n)[source]
summary()[source]
domain = 'positive'
class Gamma(a, b)[source]

Bases: GPy.core.parameterization.priors.Prior

Implementation of the Gamma probability function, coupled with random variables.

Parameters:
  • a – shape parameter
  • b – rate parameter (warning: it’s the inverse of the scale)

Note

Bishop 2006 notation is used throughout the code

static from_EV(E, V)[source]

Creates an instance of a Gamma Prior by specifying the Expected value(s) and Variance(s) of the distribution.

Parameters:
  • E – expected value
  • V – variance
lnpdf(x)[source]
lnpdf_grad(x)[source]
rvs(n)[source]
summary()[source]
a
b
domain = 'positive'
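
A quick usage sketch: from_EV converts an expected value and variance into the shape/rate parameterization (a = E^2/V, b = E/V):

import GPy
prior = GPy.priors.Gamma.from_EV(1., 10.)   # E=1, V=10  ->  a=0.1, b=0.1
# equivalently: GPy.priors.Gamma(a=0.1, b=0.1)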
class Gaussian(mu, sigma)[source]

Bases: GPy.core.parameterization.priors.Prior

Implementation of the univariate Gaussian probability function, coupled with random variables.

Parameters:
  • mu – mean
  • sigma – standard deviation

Note

Bishop 2006 notation is used throughout the code

lnpdf(x)[source]
lnpdf_grad(x)[source]
rvs(n)[source]
domain = 'real'
class HalfT(A, nu)[source]

Bases: GPy.core.parameterization.priors.Prior

Implementation of the half student t probability function, coupled with random variables.

Parameters:
  • A – scale parameter
  • nu – degrees of freedom
lnpdf(theta)[source]
lnpdf_grad(theta)[source]
rvs(n)[source]
domain = 'positive'
class InverseGamma(a, b)[source]

Bases: GPy.core.parameterization.priors.Gamma

Implementation of the inverse-Gamma probability function, coupled with random variables.

Parameters:
  • a – shape parameter
  • b – rate parameter (warning: it’s the inverse of the scale)

Note

Bishop 2006 notation is used throughout the code

static from_EV(E, V)[source]

Creates an instance of a Gamma Prior by specifying the Expected value(s) and Variance(s) of the distribution.

Parameters:
  • E – expected value
  • V – variance
lnpdf(x)[source]
lnpdf_grad(x)[source]
rvs(n)[source]
summary()[source]
domain = 'positive'
class LogGaussian(mu, sigma)[source]

Bases: GPy.core.parameterization.priors.Gaussian

Implementation of the univariate log-Gaussian probability function, coupled with random variables.

Parameters:
  • mu – mean
  • sigma – standard deviation

Note

Bishop 2006 notation is used throughout the code

lnpdf(x)[source]
lnpdf_grad(x)[source]
rvs(n)[source]
domain = 'positive'
class MultivariateGaussian(mu, var)[source]

Bases: GPy.core.parameterization.priors.Prior

Implementation of the multivariate Gaussian probability function, coupled with random variables.

Parameters:
  • mu – mean (N-dimensional array)
  • var – covariance matrix (NxN)

Note

Bishop 2006 notation is used throughout the code

lnpdf(x)[source]
lnpdf_grad(x)[source]
pdf(x)[source]
plot()[source]
rvs(n)[source]
summary()[source]
domain = 'real'
class Prior[source]

Bases: object

pdf(x)[source]
plot()[source]
domain = None
class StudentT(mu, sigma, nu)[source]

Bases: GPy.core.parameterization.priors.Prior

Implementation of the student t probability function, coupled with random variables.

Parameters:
  • mu – mean
  • sigma – standard deviation
  • nu – degrees of freedom

Note

Bishop 2006 notation is used throughout the code

lnpdf(x)[source]
lnpdf_grad(x)[source]
rvs(n)[source]
domain = 'real'
class Uniform(lower, upper)[source]

Bases: GPy.core.parameterization.priors.Prior

lnpdf(x)[source]
lnpdf_grad(x)[source]
rvs(n)[source]
gamma_from_EV(E, V)[source]
GPy.core.parameterization.transformations module
GPy.core.parameterization.variational module

Created on 6 Nov 2013

@author: maxz

class NormalPosterior(means=None, variances=None, name='latent space', *a, **kw)[source]

Bases: GPy.core.parameterization.variational.VariationalPosterior

NormalPosterior distribution for variational approximations.

holds the means and variances for a factorizing multivariate normal distribution

KL(other)[source]

Compute the KL divergence to another NormalPosterior Object. This only holds, if the two NormalPosterior objects have the same shape, as we do computational tricks for the multivariate normal KL divergence.

plot(*args, **kwargs)[source]

Plot latent space X in 1D:

See GPy.plotting.matplot_dep.variational_plots

class NormalPrior(name='normal_prior', **kw)[source]

Bases: GPy.core.parameterization.variational.VariationalPrior

KL_divergence(variational_posterior)[source]
update_gradients_KL(variational_posterior)[source]

updates the gradients for mean and variance in place

class SpikeAndSlabPosterior(means, variances, binary_prob, group_spike=False, sharedX=False, name='latent space')[source]

Bases: GPy.core.parameterization.variational.VariationalPosterior

The SpikeAndSlab distribution for variational approximations.

binary_prob : the probability of the distribution on the slab part.

collate_gradient()[source]
plot(*args, **kwargs)[source]

Plot latent space X in 1D:

See GPy.plotting.matplot_dep.variational_plots

propogate_val()[source]
set_gradients(grad)[source]
class SpikeAndSlabPrior(pi=None, learnPi=False, variance=1.0, group_spike=False, name='SpikeAndSlabPrior', **kw)[source]

Bases: GPy.core.parameterization.variational.VariationalPrior

KL_divergence(variational_posterior)[source]
update_gradients_KL(variational_posterior)[source]

updates the gradients for mean and variance in place

class VariationalPosterior(means=None, variances=None, name='latent space', *a, **kw)[source]

Bases: GPy.core.parameterization.parameterized.Parameterized

has_uncertain_inputs()[source]
set_gradients(grad)[source]
class VariationalPrior(name='latent prior', **kw)[source]

Bases: GPy.core.parameterization.parameterized.Parameterized

KL_divergence(variational_posterior)[source]
update_gradients_KL(variational_posterior)[source]

updates the gradients for mean and variance in place

Submodules

GPy.core.gp module

class GP(X, Y, kernel, likelihood, mean_function=None, inference_method=None, name='gp', Y_metadata=None, normalizer=False)[source]

Bases: GPy.core.model.Model

General purpose Gaussian process model

Parameters:
  • X – input observations
  • Y – output observations
  • kernel – a GPy kernel
  • likelihood – a GPy likelihood
  • inference_method – The LatentFunctionInference inference method to use for this GP
  • normalizer (Norm) – normalize the outputs Y. Prediction will be un-normalized using this normalizer. If normalizer is True, we will normalize using Standardize. If normalizer is False, no normalization will be done.
Return type:

model object

Note

Multiple independent outputs are allowed using columns of Y
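
GPy.models.GPRegression is essentially this class with a Gaussian likelihood and exact inference pre-selected; constructing a GP directly looks roughly like this (a sketch):

import numpy as np
import GPy

X = np.random.uniform(-3., 3., (20, 1))
Y = np.sin(X) + np.random.randn(20, 1) * 0.05

m = GPy.core.GP(X, Y,
                kernel=GPy.kern.RBF(1),
                likelihood=GPy.likelihoods.Gaussian(variance=1.),
                inference_method=GPy.inference.latent_function_inference.ExactGaussianInference())
m.optimize()
mean, var = m.predict(np.linspace(-3., 3., 50)[:, None])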

get_most_significant_input_dimensions(which_indices=None)[source]
infer_newX(Y_new, optimize=True)[source]

Infer X for the new observed data Y_new.

Parameters:
  • Y_new (numpy.ndarray) – the new observed data for inference
  • optimize (boolean) – whether to optimize the location of new X (True by default)
Returns:

a tuple containing the posterior estimation of X and the model that optimize X

Return type:

(VariationalPosterior and numpy.ndarray, Model)

input_sensitivity(summarize=True)[source]

Returns the sensitivity for each dimension of this model

log_likelihood()[source]

The log marginal likelihood of the model, \(p(\mathbf{y})\); this is the objective function of the model being optimised

log_predictive_density(x_test, y_test, Y_metadata=None)[source]

Calculation of the log predictive density

Parameters:
  • x_test ((Nx1) array) – test locations (x_{*})
  • y_test ((Nx1) array) – test observations (y_{*})
  • Y_metadata – metadata associated with the test points
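
For example (X_test and Y_test here stand for held-out arrays with the same shapes as the training data):

log_dens = m.log_predictive_density(X_test, Y_test)   # log predictive density per test point
print(log_dens.mean())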
log_predictive_density_sampling(x_test, y_test, Y_metadata=None, num_samples=1000)[source]

Calculation of the log predictive density by sampling

Parameters:
  • x_test ((Nx1) array) – test locations (x_{*})
  • y_test ((Nx1) array) – test observations (y_{*})
  • Y_metadata – metadata associated with the test points
  • num_samples (int) – number of samples to use in monte carlo integration
optimize(optimizer=None, start=None, messages=False, max_iters=1000, ipython_notebook=True, clear_after_finish=False, **kwargs)[source]

Optimize the model using self.log_likelihood and self.log_likelihood_gradient, as well as self.priors. kwargs are passed to the optimizer. They can be:

Parameters:
  • max_iters (int) – maximum number of function evaluations
  • messages (bool) – whether to display during optimisation
  • optimizer (string) – which optimizer to use (defaults to self.preferred_optimizer); a range of optimisers can be found in GPy.inference.optimization, and they include ‘scg’, ‘lbfgs’ and ‘tnc’.
  • ipython_notebook (bool) – whether to use ipython notebook widgets or not.
  • clear_after_finish (bool) – if in ipython notebook, we can clear the widgets after optimization.
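
A typical call (assuming m is a constructed model) might be:

m.optimize(optimizer='lbfgs', messages=True, max_iters=200)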
parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

plot(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, samples_likelihood=0, lower=2.5, upper=97.5, plot_data=True, plot_inducing=True, plot_density=False, predict_kw=None, projection='2d', legend=True, **kwargs)

Convenience function for plotting the fit of a GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

If you want fine-grained control, use the specific plotting functions supplied in the model.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
  • samples_likelihood (int) – the number of samples to draw from the GP and apply the likelihood noise. This is usually not what you want!
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • projection ({2d|3d}) – plot in 2d or 3d?
  • legend (bool) – convenience, whether to put a legend on the plot or not.
plot_confidence(lower=2.5, upper=97.5, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', label='gp confidence', predict_kw=None, **kwargs)

Plot the confidence interval between the percentiles lower and upper. E.g. the 95% confidence interval corresponds to lower=2.5, upper=97.5. Note: only implemented for one dimension!

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • which_data_ycols (array-like) – which columns of the output y (!) to plot (array-like or list of ints)
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
plot_data(which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **plot_kwargs)
Plot the training data.

For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.

Can plot only part of the data using which_data_rows and which_data_ycols.

Parameters:
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
  • projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
  • label (str) – the label for the plot
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns:

list of plots created.

plot_data_error(which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **error_kwargs)

Plot the training data input error.

For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.

Can plot only part of the data using which_data_rows and which_data_ycols.

Parameters:
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
  • projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • label (str) – the label for the plot
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns:

list of plots created.

plot_density(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=35, label='gp density', predict_kw=None, **kwargs)

Plot the confidence interval between the percentiles lower and upper. E.g. the 95% confidence interval corresponds to lower=2.5, upper=97.5. Note: only implemented for one dimension!

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
plot_errorbars_trainset(which_data_rows='all', which_data_ycols='all', fixed_inputs=None, plot_raw=False, apply_link=False, label=None, projection='2d', predict_kw=None, **plot_kwargs)

Plot the errorbars of the GP likelihood on the training data. These are the errorbars after the appropriate approximations according to the likelihood are done.

This also works for heteroscedastic likelihoods.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • which_data_ycols – when the data has several columns (independent outputs), only plot these
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • predict_kwargs (dict) – kwargs for the prediction used to predict the right quantiles.
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_f(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)

Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!

If you want fine-grained control, use the specific plotting functions supplied in the model.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_latent(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)

Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!

If you want fine-grained control, use the specific plotting functions supplied in the model.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_magnification(labels=None, which_indices=None, resolution=60, marker='<>^vsd', legend=True, plot_limits=None, updates=False, mean=True, covariance=True, kern=None, num_samples=1000, scatter_kwargs=None, plot_scatter=True, **imshow_kwargs)

Plot the magnification factor of the GP on the inputs. This is the density of the GP as a gray scale.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • resolution (int) – the resolution at which we predict the magnification factor
  • marker (str) – markers to use - cycle if more labels than markers are given
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • updates (bool) – if possible, make interactive updates using the specific library you are using
  • mean (bool) – use the mean of the Wishart embedding for the magnification factor
  • covariance (bool) – use the covariance of the Wishart embedding for the magnification factor
  • kern (Kern) – the kernel to use for prediction
  • num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
  • imshow_kwargs – the kwargs for the imshow (magnification factor)
  • kwargs – the kwargs for the scatter plots
plot_mean(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=20, projection='2d', label='gp mean', predict_kw=None, **kwargs)

Plot the mean of the GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
  • levels (int) – for 2D plotting, the number of contour levels to use
  • projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
  • label (str) – the label for the plot.
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
plot_noiseless(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)

Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!

If you want fine-grained control, use the specific plotting functions supplied in the model.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_samples(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=True, apply_link=False, visible_dims=None, which_data_ycols='all', samples=3, projection='2d', label='gp_samples', predict_kw=None, **kwargs)

Plot samples drawn from the GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuples [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
  • plot_raw (bool) – plot the latent function (usually denoted f) only? This is usually what you want!
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • levels (int) – for 2D plotting, the number of contour levels to use
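
A minimal plotting sketch, assuming a fitted 1D GPy.models.GPRegression model named m and the matplotlib backend (both assumptions, not part of the method descriptions above):

import GPy

GPy.plotting.change_plotting_library('matplotlib')   # select the plotting backend
fig = m.plot_mean(label='posterior mean')             # plot the posterior mean of the GP
fig_s = m.plot_samples(samples=5)                     # draw 5 samples of the latent function
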
posterior_covariance_between_points(X1, X2, Y_metadata=None, likelihood=None, include_likelihood=True)[source]

Computes the posterior covariance between points. Includes likelihood variance as well as normalization so that evaluation at (x,x) is consistent with model.predict

Parameters:
  • X1 – some input observations
  • X2 – other input observations
  • Y_metadata – metadata about the predicting point to pass to the likelihood
  • include_likelihood (bool) – Whether or not to add likelihood noise to the predicted underlying latent function f.
Returns:

cov: posterior covariance, a Numpy array, Nnew x Nnew if self.output_dim == 1, and Nnew x Nnew x self.output_dim otherwise.

posterior_samples(X, size=10, Y_metadata=None, likelihood=None, **predict_kwargs)[source]

Samples the posterior GP at the points X.

Parameters:
  • X (np.ndarray (Nnew x self.input_dim.)) – the points at which to take the samples.
  • size (int.) – the number of a posteriori samples.
  • noise_model (integer.) – for mixed noise likelihood, the noise model to use in the samples.
Returns:

Ysim: set of simulations,

Return type:

np.ndarray (D x N x samples) (if D==1 we flatten out the first dimension)

posterior_samples_f(X, size=10, **predict_kwargs)[source]

Samples the posterior GP at the points X.

Parameters:
  • X (np.ndarray (Nnew x self.input_dim)) – The points at which to take the samples.
  • size (int.) – the number of a posteriori samples.
Returns:

set of simulations

Return type:

np.ndarray (Nnew x D x samples)

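A minimal sketch of the two sampling methods above, assuming a fitted 1D model m (m and the toy grid Xnew are assumptions):

import numpy as np

Xnew = np.linspace(0, 1, 50)[:, None]          # Nnew x input_dim
Ysim = m.posterior_samples(Xnew, size=10)      # samples including likelihood noise
Fsim = m.posterior_samples_f(Xnew, size=10)    # samples of the latent function f only
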
predict(Xnew, full_cov=False, Y_metadata=None, kern=None, likelihood=None, include_likelihood=True)[source]

Predict the function(s) at the new point(s) Xnew. This includes the likelihood variance added to the predicted underlying function (usually referred to as f).

In order to predict without adding in the likelihood give include_likelihood=False, or refer to self.predict_noiseless().

Parameters:
  • Xnew (np.ndarray (Nnew x self.input_dim)) – The points at which to make a prediction
  • full_cov (bool) – whether to return the full covariance matrix, or just the diagonal
  • Y_metadata – metadata about the predicting point to pass to the likelihood
  • kern – The kernel to use for prediction (defaults to the model kern). this is useful for examining e.g. subprocesses.
  • include_likelihood (bool) – Whether or not to add likelihood noise to the predicted underlying latent function f.
Returns:

(mean, var):
mean: posterior mean, a Numpy array, Nnew x self.output_dim
var: posterior variance, a Numpy array, Nnew x 1 if full_cov=False, Nnew x Nnew otherwise

If full_cov and self.output_dim > 1, the return shape of var is Nnew x Nnew x self.output_dim. If self.output_dim == 1, the return shape is Nnew x Nnew. This is to allow for different normalizations of the output dimensions.

Note: If you want the predictive quantiles (e.g. 95% confidence interval) use predict_quantiles.

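A minimal prediction sketch, assuming a fitted 1D model m (m and Xnew are assumptions):

import numpy as np

Xnew = np.linspace(0, 1, 100)[:, None]
mean, var = m.predict(Xnew)                  # includes the likelihood (observation) noise
f_mean, f_var = m.predict_noiseless(Xnew)    # posterior over the latent function f only
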
predict_jacobian(Xnew, kern=None, full_cov=False)[source]

Compute the derivatives of the posterior of the GP.

Given a set of points at which to predict X* (size [N*,Q]), compute the mean and variance of the derivative. Resulting arrays are sized:

dmu_dX* – [N*, Q, D], where D is the number of outputs in this GP (usually one).
Note that this is the mean and variance of the derivative, not the derivative of the mean and variance! (See predictive_gradients for that)
dv_dX* – [N*, Q], (since all outputs have the same variance)
Missing data is not implemented for this method yet; once it is, there will be one output variance per output dimension.
Parameters:
  • Xnew (np.ndarray (N* x self.input_dim)) – The points at which to get the predictive gradients.
  • kern – The kernel to compute the jacobian for.
  • full_cov (boolean) – whether to return the cross-covariance terms between the N* Jacobian vectors

Returns:dmu_dX, dv_dX
Return type:[np.ndarray (N*, Q ,D), np.ndarray (N*,Q,(D)) ]
predict_magnification(Xnew, kern=None, mean=True, covariance=True, dimensions=None)[source]

Predict the magnification factor as

sqrt(det(G))

for each point N in Xnew.

Parameters:
  • mean (bool) – whether to include the mean of the Wishart embedding.
  • covariance (bool) – whether to include the covariance of the Wishart embedding.
  • dimensions (array-like) – which dimensions of the input space to use [defaults to self.get_most_significant_input_dimensions()[:2]]
predict_noiseless(Xnew, full_cov=False, Y_metadata=None, kern=None)[source]

Convenience function to predict the underlying function of the GP (often referred to as f) without adding the likelihood variance on the prediction function.

This is most likely what you want to use for your predictions.

Parameters:
  • Xnew (np.ndarray (Nnew x self.input_dim)) – The points at which to make a prediction
  • full_cov (bool) – whether to return the full covariance matrix, or just the diagonal
  • Y_metadata – metadata about the predicting point to pass to the likelihood
  • kern – The kernel to use for prediction (defaults to the model kern). this is useful for examining e.g. subprocesses.
Returns:

(mean, var):

mean: posterior mean, a Numpy array, Nnew x self.output_dim
var: posterior variance, a Numpy array, Nnew x 1 if full_cov=False, Nnew x Nnew otherwise

If full_cov and self.output_dim > 1, the return shape of var is Nnew x Nnew x self.output_dim. If self.output_dim == 1, the return shape is Nnew x Nnew. This is to allow for different normalizations of the output dimensions.

Note: If you want the predictive quantiles (e.g. 95% confidence interval) use predict_quantiles.

predict_quantiles(X, quantiles=(2.5, 97.5), Y_metadata=None, kern=None, likelihood=None)[source]

Get the predictive quantiles around the prediction at X

Parameters:
  • X (np.ndarray (Xnew x self.input_dim)) – The points at which to make a prediction
  • quantiles (tuple) – tuple of quantiles, default is (2.5, 97.5) which is the 95% interval
  • kern – optional kernel to use for prediction
Returns:

list of predictive quantiles at each point in X, one array per requested quantile

Return type:

[np.ndarray (Xnew x self.output_dim), np.ndarray (Xnew x self.output_dim)]

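A one-line sketch of the quantile call, assuming a fitted model m and prediction points Xnew (Nnew x input_dim), both assumptions:

lower, upper = m.predict_quantiles(Xnew, quantiles=(2.5, 97.5))   # 95% predictive interval
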
predict_wishard_embedding(Xnew, kern=None, mean=True, covariance=True)[source]
predict_wishart_embedding(Xnew, kern=None, mean=True, covariance=True)[source]

Predict the Wishart embedding G of the GP. This is the density of the input of the GP defined by the probabilistic function mapping f. G = J_mean.T*J_mean + output_dim*J_cov.

Parameters:
  • Xnew (array-like) – The points at which to evaluate the magnification.
  • kern (Kern) – The kernel to use for the magnification.

Supplying only a part of the learning kernel gives insights into the density of the specific kernel part of the input function. E.g. one can see how dense the linear part of a kernel is compared to the non-linear part etc.

predictive_gradients(Xnew, kern=None)[source]

Compute the derivatives of the predicted latent function with respect to X*

Given a set of points at which to predict X* (size [N*,Q]), compute the derivatives of the mean and variance. Resulting arrays are sized:

dmu_dX* – [N*, Q, D], where D is the number of outputs in this GP (usually one).

Note that this is not the same as computing the mean and variance of the derivative of the function!

dv_dX* – [N*, Q], (since all outputs have the same variance)
Parameters:X (np.ndarray (Xnew x self.input_dim)) – The points at which to get the predictive gradients
Returns:dmu_dX, dv_dX
Return type:[np.ndarray (N*, Q ,D), np.ndarray (N*,Q) ]
save_model(output_filename, compress=True, save_data=True)[source]
set_X(X)[source]

Set the input data of the model

Parameters:X (np.ndarray) – input observations
set_XY(X=None, Y=None)[source]

Set the input / output data of the model This is useful if we wish to change our existing data but maintain the same model

Parameters:
  • X (np.ndarray) – input observations
  • Y (np.ndarray) – output observations
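
A minimal sketch, assuming an existing model m and replacement data arrays X_new_data, Y_new_data (hypothetical names):

m.set_XY(X_new_data, Y_new_data)   # swap in new training data; kernel and hyperparameters are kept
m.optimize()                       # optionally re-fit on the new data
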
set_Y(Y)[source]

Set the output data of the model

Parameters:Y (np.ndarray) – output observations
to_dict(save_data=True)[source]

Convert the object into a json serializable dictionary. Note: It uses the private method _save_to_input_dict of the parent.

Parameters:save_data (boolean) – if true, it adds the training data self.X and self.Y to the dictionary
Return dict:json serializable dictionary containing the needed information to instantiate the object
input_dim
num_data

GPy.core.gp_grid module

class GpGrid(X, Y, kernel, likelihood, inference_method=None, name='gp grid', Y_metadata=None, normalizer=False)[source]

Bases: GPy.core.gp.GP

A GP model for Grid inputs

Parameters:
  • X (np.ndarray (num_data x input_dim)) – inputs
  • likelihood (GPy.likelihood.(Gaussian | EP | Laplace)) – a likelihood instance, containing the observed data
  • kernel (a GPy.kern.kern instance) – the kernel (covariance function). See link kernels
kron_mmprod(A, B)[source]
parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

GPy.core.mapping module

class Bijective_mapping(input_dim, output_dim, name='bijective_mapping')[source]

Bases: GPy.core.mapping.Mapping

This is a mapping that is bijective, i.e. you can go from X to f and also back from f to X. The inverse mapping is called g().

g(f)[source]

Inverse mapping from output domain of the function to the inputs.

class Mapping(input_dim, output_dim, name='mapping')[source]

Bases: GPy.core.parameterization.parameterized.Parameterized

Base model for shared mapping behaviours

f(X)[source]
static from_dict(input_dict)[source]

Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. If it is needed, please override _build_from_input_dict instead.

Parameters:input_dict (dict) – Dictionary with all the information needed to instantiate the object.
gradients_X(dL_dF, X)[source]
to_dict()[source]
update_gradients(dL_dF, X)[source]
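
Mappings are mostly used indirectly, e.g. as a parametric mean function of a GP. A minimal sketch, assuming toy data and using GPy.mappings.Linear as the concrete Mapping subclass:

import numpy as np
import GPy

X = np.random.rand(50, 1)
Y = 2.0 * X + 0.1 * np.random.randn(50, 1)
mf = GPy.mappings.Linear(input_dim=1, output_dim=1)                   # a shipped Mapping subclass
m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(1), mean_function=mf)  # the mapping acts as the GP mean
m.optimize()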

GPy.core.model module

class Model(name)[source]

Bases: paramz.model.Model, GPy.core.parameterization.priorizable.Priorizable

static from_dict(input_dict, data=None)[source]

Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. If it is needed, please override _build_from_input_dict instead.

Parameters:input_dict (dict) – Dictionary with all the information needed to instantiate the object.
static load_model(output_filename, data=None)[source]
log_likelihood()[source]
objective_function()[source]

The objective function for the given algorithm.

This function is the true objective, which is to be minimized. Note that all parameters are already set and in place, so you just need to return the objective function here.

For probabilistic models this is the negative log_likelihood (including the MAP prior), so we return it here. If your model is not probabilistic, just return your objective to minimize here!

objective_function_gradients()[source]

The gradients for the objective function for the given algorithm. The gradients are w.r.t. the negative objective function, as this framework works with negative log-likelihoods as a default.

You can find the gradient for the parameters in self.gradient at all times. This is the place, where gradients get stored for parameters.

This function is the true objective, which is to be minimized. Note that all parameters are already set and in place, so you just need to return the gradient here.

For probabilistic models this is the gradient of the negative log_likelihood (including the MAP prior), so we return it here. If your model is not probabilistic, just return your negative gradient here!

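A small sketch of the relation described above, assuming a model m with no extra priors attached (an assumption; with priors the MAP term is included as well):

import numpy as np

# the optimizer minimizes objective_function(), i.e. the negative log likelihood
assert np.isclose(m.objective_function(), -m.log_likelihood())
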
randomize(rand_gen=None, *args, **kwargs)

Randomize the model. This draws from the prior if one exists, otherwise from the given random generator.

Parameters:
  • rand_gen – np random number generator which takes args and kwargs
  • loc (float) – loc parameter for random number generator
  • scale (float) – scale parameter for random number generator
  • kwargs (args,) – will be passed through to random number generator
save_model(output_filename, compress=True, save_data=True)[source]
to_dict()[source]
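
A round-trip sketch, assuming a fitted GPy.models.GPRegression model m and that the compressed file is written with a .zip suffix (an assumption about the exact filename):

m.save_model('my_gp', compress=True, save_data=True)        # write the model (and data) to disk
m_loaded = GPy.models.GPRegression.load_model('my_gp.zip')  # static load_model of the Model class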

GPy.core.sparse_gp module

class SparseGP(X, Y, Z, kernel, likelihood, mean_function=None, X_variance=None, inference_method=None, name='sparse gp', Y_metadata=None, normalizer=False)[source]

Bases: GPy.core.gp.GP

A general purpose Sparse GP model

This model allows (approximate) inference using variational DTC or FITC (Gaussian likelihoods) as well as non-conjugate sparse methods based on these.

This is not for missing data, as the implementation for missing data involves some inefficient optimization routine decisions. See the missing-data SparseGP implementation in GPy.models.sparse_gp_minibatch.SparseGPMiniBatch.

Parameters:
  • X (np.ndarray (num_data x input_dim)) – inputs
  • likelihood (GPy.likelihood.(Gaussian | EP | Laplace)) – a likelihood instance, containing the observed data
  • kernel (a GPy.kern.kern instance) – the kernel (covariance function). See link kernels
  • X_variance (np.ndarray (num_data x input_dim) | None) – The uncertainty in the measurements of X (Gaussian variance)
  • Z (np.ndarray (num_inducing x input_dim)) – inducing inputs
  • num_inducing (int) – Number of inducing points (optional, default 10. Ignored if Z is not None)
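
A minimal sketch, using the end-user wrapper GPy.models.SparseGPRegression (which builds a SparseGP with sensible defaults) on assumed toy data:

import numpy as np
import GPy

X = np.random.rand(500, 1)
Y = np.sin(6 * X) + 0.1 * np.random.randn(500, 1)
m = GPy.models.SparseGPRegression(X, Y, num_inducing=20)   # 20 inducing inputs
m.optimize(messages=True)
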
has_uncertain_inputs()[source]
parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

plot_inducing(visible_dims=None, projection='2d', label='inducing', legend=True, **plot_kwargs)

Plot the inducing inputs of a sparse gp model

Parameters:
  • visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
  • plot_kwargs (kwargs) – keyword arguments for the plotting library
set_Z(Z, trigger_update=True)[source]
to_dict(save_data=True)[source]

Convert the object into a json serializable dictionary.

Parameters:save_data (boolean) – if true, it adds the training data self.X and self.Y to the dictionary
Return dict:json serializable dictionary containing the needed information to instantiate the object

GPy.core.sparse_gp_mpi module

class SparseGP_MPI(X, Y, Z, kernel, likelihood, variational_prior=None, mean_function=None, inference_method=None, name='sparse gp', Y_metadata=None, mpi_comm=None, normalizer=False)[source]

Bases: GPy.core.sparse_gp.SparseGP

A general purpose Sparse GP model with MPI parallelization support

This model allows (approximate) inference using variational DTC or FITC (Gaussian likelihoods) as well as non-conjugate sparse methods based on these.

Parameters:
  • X (np.ndarray (num_data x input_dim)) – inputs
  • likelihood (GPy.likelihood.(Gaussian | EP | Laplace)) – a likelihood instance, containing the observed data
  • kernel (a GPy.kern.kern instance) – the kernel (covariance function). See link kernels
  • X_variance (np.ndarray (num_data x input_dim) | None) – The uncertainty in the measurements of X (Gaussian variance)
  • Z (np.ndarray (num_inducing x input_dim)) – inducing inputs
  • num_inducing (int) – Number of inducing points (optional, default 10. Ignored if Z is not None)
  • mpi_comm (mpi4py.MPI.Intracomm) – The communication group of MPI, e.g. mpi4py.MPI.COMM_WORLD
optimize(optimizer=None, start=None, **kwargs)[source]

Optimize the model using self.log_likelihood and self.log_likelihood_gradient, as well as self.priors. kwargs are passed to the optimizer. They can be:

Parameters:
  • max_iters (int) – maximum number of function evaluations
  • messages (bool) – whether to display messages during optimisation
  • optimizer (string) – which optimizer to use (defaults to self.preferred_optimizer); a range of optimisers can be found in GPy.inference.optimization, including 'scg', 'lbfgs' and 'tnc'.
  • ipython_notebook (bool) – whether to use ipython notebook widgets or not.
  • clear_after_finish (bool) – if in ipython notebook, we can clear the widgets after optimization.
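
A one-line sketch of the call with the keyword arguments listed above (m is an assumed model):

m.optimize(optimizer='lbfgs', max_iters=1000, messages=True)
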
parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

optimizer_array

Array for the optimizer to work on. This array always lives in the optimizer's space, i.e. the parameters are mapped into unconstrained space through their Transformations.

Setting this array ensures that the constrained parameters of this model are updated accordingly. It has to be set with an array retrieved from this property, as e.g. fixing parameters will resize the array.

The optimizer should only interact with this array, so that the transformations remain consistent.

GPy.core.svgp module

class SVGP(X, Y, Z, kernel, likelihood, mean_function=None, name='SVGP', Y_metadata=None, batchsize=None, num_latent_functions=None)[source]

Bases: GPy.core.sparse_gp.SparseGP

Stochastic Variational GP.

For Gaussian Likelihoods, this implements

Gaussian Processes for Big data, Hensman, Fusi and Lawrence, UAI 2013,

But without natural gradients. We use the lower-triangular representation of the covariance matrix to ensure positive-definiteness.

For Non Gaussian Likelihoods, this implements

Hensman, Matthews and Ghahramani, Scalable Variational GP Classification, ArXiv 1411.2005
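
A minimal sketch with assumed toy data; note that with minibatches SVGP is usually paired with a stochastic optimizer, so the plain optimize() call below is only the simplest possible usage:

import numpy as np
import GPy

X = np.random.rand(1000, 1)
Y = np.sin(6 * X) + 0.1 * np.random.randn(1000, 1)
Z = np.random.rand(20, 1)                                   # inducing inputs
m = GPy.core.SVGP(X, Y, Z, GPy.kern.RBF(1), GPy.likelihoods.Gaussian(), batchsize=100)
m.optimize(messages=True)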

new_batch()[source]

Return a new batch of X and Y by taking a chunk of data from the complete X and Y

optimizeWithFreezingZ()[source]
parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

set_data(X, Y)[source]

Set the data without calling parameters_changed, to avoid wasted computation. If this is called by the stochastic_grad function, this will immediately update the gradients.

stochastic_grad(parameters)[source]

GPy.core.symbolic module

GPy.models package

Introduction

This package principally contains classes ultimately inherited from GPy.core.gp.GP intended as models for end-user consumption - much of GPy.core.gp.GP is not intended to be called directly. The general form of a “model” is a function that takes some data, a kernel (see GPy.kern) and other parameters, returning an object representation.

Several models directly inherit GPy.core.gp.GP:

Inheritance diagram of GPy.models.gp_classification, GPy.models.gp_coregionalized_regression, GPy.models.gp_heteroscedastic_regression, GPy.models.gp_offset_regression, GPy.models.gp_regression, GPy.models.gp_var_gauss, GPy.models.gplvm, GPy.models.input_warped_gp, GPy.models.multioutput_gp

Some models fall into conceptually related groups of models (e.g. GPy.core.sparse_gp, GPy.core.sparse_gp_mpi):

Inheritance diagram of GPy.models.bayesian_gplvm, GPy.models.bayesian_gplvm_minibatch, GPy.models.gp_multiout_regression, GPy.models.gp_multiout_regression_md, GPy.models.ibp_lfm.IBPLFM, GPy.models.sparse_gp_coregionalized_regression, GPy.models.sparse_gp_minibatch, GPy.models.sparse_gp_regression, GPy.models.sparse_gp_regression_md, GPy.models.sparse_gplvm

In some cases one end-user model inherits another e.g.

Inheritance diagram of GPy.models.bayesian_gplvm_minibatch

Submodules

GPy.models.bayesian_gplvm module

class BayesianGPLVM(Y, input_dim, X=None, X_variance=None, init='PCA', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='bayesian gplvm', mpi_comm=None, normalizer=None, missing_data=False, stochastic=False, batchsize=1, Y_metadata=None)[source]

Bases: GPy.core.sparse_gp_mpi.SparseGP_MPI

Bayesian Gaussian Process Latent Variable Model

Parameters:
  • Y (np.ndarray| GPy.likelihood instance) – observed data (np.ndarray) or GPy.likelihood
  • input_dim (int) – latent dimensionality
  • init ('PCA'|'random') – initialisation method for the latent space
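
A minimal sketch with assumed toy data (100 observations in 5 dimensions, a 2D latent space):

import numpy as np
import GPy

Y = np.random.randn(100, 5)
m = GPy.models.BayesianGPLVM(Y, input_dim=2, num_inducing=20)
m.optimize(messages=True)
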
get_X_gradients(X)[source]

Get the gradients of the posterior distribution of X in its specific form.

parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

plot_inducing(which_indices=None, legend=False, plot_limits=None, marker=None, projection='2d', **kwargs)

Plot a scatter plot of the inducing inputs.

Parameters:
  • which_indices ([int]) – which input dimensions to plot against each other
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • marker (str) – marker to use [default is custom arrow like]
  • kwargs – the kwargs for the scatter plots
  • projection (str) – for now 2d or 3d projection (other projections can be implemented, see developer documentation)
plot_latent(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • resolution (int) – the resolution at which we predict the magnification factor
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • updates (bool) – if possible, make interactive updates using the specific library you are using
  • kern (Kern) – the kernel to use for prediction
  • marker (str) – markers to use - cycle if more labels than markers are given
  • num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
  • imshow_kwargs – the kwargs for the imshow (magnification factor)
  • scatter_kwargs – the kwargs for the scatter plots
plot_scatter(labels=None, which_indices=None, legend=True, plot_limits=None, marker='<>^vsd', num_samples=1000, projection='2d', **kwargs)

Plot a scatter plot of the latent space.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • marker (str) – markers to use - cycle if more labels than markers are given
  • kwargs – the kwargs for the scatter plots
plot_steepest_gradient_map(output_labels=None, data_labels=None, which_indices=None, resolution=15, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, annotation_kwargs=None, scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • resolution (int) – the resolution at which we predict the magnification factor
  • legend (bool) – whether to plot the legend on the figure, if int plot legend columns on legend
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • updates (bool) – if possible, make interactive updates using the specific library you are using
  • kern (Kern) – the kernel to use for prediction
  • marker (str) – markers to use - cycle if more labels than markers are given
  • num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
  • imshow_kwargs – the kwargs for the imshow (magnification factor)
  • annotation_kwargs – the kwargs for the annotation plot
  • scatter_kwargs – the kwargs for the scatter plots
set_X_gradients(X, X_grad)[source]

Set the gradients of the posterior distribution of X in its specific form.

GPy.models.bayesian_gplvm_minibatch module

class BayesianGPLVMMiniBatch(Y, input_dim, X=None, X_variance=None, init='PCA', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='bayesian gplvm', normalizer=None, missing_data=False, stochastic=False, batchsize=1)[source]

Bases: GPy.models.sparse_gp_minibatch.SparseGPMiniBatch

Bayesian Gaussian Process Latent Variable Model

Parameters:
  • Y (np.ndarray| GPy.likelihood instance) – observed data (np.ndarray) or GPy.likelihood
  • input_dim (int) – latent dimensionality
  • init ('PCA'|'random') – initialisation method for the latent space
parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

plot_inducing(which_indices=None, legend=False, plot_limits=None, marker=None, projection='2d', **kwargs)

Plot a scatter plot of the inducing inputs.

Parameters:
  • which_indices ([int]) – which input dimensions to plot against each other
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • marker (str) – marker to use [default is custom arrow like]
  • kwargs – the kwargs for the scatter plots
  • projection (str) – for now 2d or 3d projection (other projections can be implemented, see developer documentation)
plot_latent(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • resolution (int) – the resolution at which we predict the magnification factor
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • updates (bool) – if possible, make interactive updates using the specific library you are using
  • kern (Kern) – the kernel to use for prediction
  • marker (str) – markers to use - cycle if more labels than markers are given
  • num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
  • imshow_kwargs – the kwargs for the imshow (magnification factor)
  • scatter_kwargs – the kwargs for the scatter plots
plot_scatter(labels=None, which_indices=None, legend=True, plot_limits=None, marker='<>^vsd', num_samples=1000, projection='2d', **kwargs)

Plot a scatter plot of the latent space.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • marker (str) – markers to use - cycle if more labels than markers are given
  • kwargs – the kwargs for the scatter plots
plot_steepest_gradient_map(output_labels=None, data_labels=None, which_indices=None, resolution=15, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, annotation_kwargs=None, scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • resolution (int) – the resolution at which we predict the magnification factor
  • legend (bool) – whether to plot the legend on the figure, if int plot legend columns on legend
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • updates (bool) – if possible, make interactive updates using the specific library you are using
  • kern (Kern) – the kernel to use for prediction
  • marker (str) – markers to use - cycle if more labels than markers are given
  • num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
  • imshow_kwargs – the kwargs for the imshow (magnification factor)
  • annotation_kwargs – the kwargs for the annotation plot
  • scatter_kwargs – the kwargs for the scatter plots

GPy.models.bcgplvm module

class BCGPLVM(Y, input_dim, kernel=None, mapping=None)[source]

Bases: GPy.models.gplvm.GPLVM

Back constrained Gaussian Process Latent Variable Model

Parameters:
  • Y (np.ndarray) – observed data
  • input_dim (int) – latent dimensionality
  • mapping (GPy.core.Mapping object) – mapping for back constraint
parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

GPy.models.dpgplvm module

class DPBayesianGPLVM(Y, input_dim, X_prior, X=None, X_variance=None, init='PCA', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='bayesian gplvm', mpi_comm=None, normalizer=None, missing_data=False, stochastic=False, batchsize=1)[source]

Bases: GPy.models.bayesian_gplvm.BayesianGPLVM

Bayesian Gaussian Process Latent Variable Model with Discriminative prior

GPy.models.gp_classification module

class GPClassification(X, Y, kernel=None, Y_metadata=None, mean_function=None, inference_method=None, likelihood=None, normalizer=False)[source]

Bases: GPy.core.gp.GP

Gaussian Process classification

This is a thin wrapper around the models.GP class, with a set of sensible defaults

Parameters:
  • X – input observations
  • Y – observed values, can be None if likelihood is not None
  • kernel – a GPy kernel, defaults to rbf
  • likelihood – a GPy likelihood, defaults to Bernoulli
  • inference_method (GPy.inference.latent_function_inference.LatentFunctionInference) – Latent function inference to use, defaults to EP

Note

Multiple independent outputs are allowed using columns of Y
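
A minimal sketch with assumed toy data (binary labels in {0, 1}):

import numpy as np
import GPy

X = np.random.rand(100, 1)
Y = (X > 0.5).astype(float)
m = GPy.models.GPClassification(X, Y)
m.optimize()
probs, _ = m.predict(np.linspace(0, 1, 20)[:, None])   # predictive probability of class 1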

static from_dict(input_dict, data=None)[source]

Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. If it is needed, please override _build_from_input_dict instead.

Parameters:input_dict (dict) – Dictionary with all the information needed to instantiate the object.
static from_gp(gp)[source]
save_model(output_filename, compress=True, save_data=True)[source]
to_dict(save_data=True)[source]

Convert the object into a json serializable dictionary. Note: It uses the private method _save_to_input_dict of the parent.

Parameters:save_data (boolean) – if true, it adds the training data self.X and self.Y to the dictionary
Return dict:json serializable dictionary containing the needed information to instantiate the object

GPy.models.gp_coregionalized_regression module

class GPCoregionalizedRegression(X_list, Y_list, kernel=None, likelihoods_list=None, name='GPCR', W_rank=1, kernel_name='coreg')[source]

Bases: GPy.core.gp.GP

Gaussian Process model for heteroscedastic multioutput regression

This is a thin wrapper around the models.GP class, with a set of sensible defaults

Parameters:
  • X_list (list of numpy arrays) – list of input observations corresponding to each output
  • Y_list (list of numpy arrays) – list of observed values related to the different noise models
  • kernel (None | GPy.kernel defaults) – a GPy kernel combined with a coregionalization kernel; defaults to RBF combined with a coregionalization kernel
  • name (string) – model name
  • W_rank (integer) – rank (number of columns) of the coregionalization parameter ‘W’ (see the Coregionalize kernel documentation)
  • kernel_name (string) – name of the kernel
Likelihoods_list: a list of likelihoods, defaults to a list of Gaussian likelihoods
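
A minimal sketch with assumed toy data (two outputs observed at different inputs):

import numpy as np
import GPy

X1, X2 = np.random.rand(30, 1), np.random.rand(40, 1)
Y1, Y2 = np.sin(6 * X1), np.cos(6 * X2)
m = GPy.models.GPCoregionalizedRegression([X1, X2], [Y1, Y2])
m.optimize()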

GPy.models.gp_grid_regression module

class GPRegressionGrid(X, Y, kernel=None, Y_metadata=None, normalizer=None)[source]

Bases: GPy.core.gp_grid.GpGrid

Gaussian Process model for grid inputs using Kronecker products

This is a thin wrapper around the models.GpGrid class, with a set of sensible defaults

Parameters:
  • X – input observations
  • Y – observed values
  • kernel – a GPy kernel, defaults to the kron variation of SqExp
  • normalizer (Norm) – [False] Normalize Y with the norm given. If normalizer is False, no normalization will be done. If it is None, we use GaussianNorm(alization).

Note

Multiple independent outputs are allowed using columns of Y

GPy.models.gp_heteroscedastic_regression module

class GPHeteroscedasticRegression(X, Y, kernel=None, Y_metadata=None)[source]

Bases: GPy.core.gp.GP

Gaussian Process model for heteroscedastic regression

This is a thin wrapper around the models.GP class, with a set of sensible defaults

Parameters:
  • X – input observations
  • Y – observed values
  • kernel – a GPy kernel, defaults to rbf

NB: This model does not make inference on the noise outside the training set

GPy.models.gp_kronecker_gaussian_regression module

class GPKroneckerGaussianRegression(X1, X2, Y, kern1, kern2, noise_var=1.0, name='KGPR')[source]

Bases: GPy.core.model.Model

Kronecker GP regression

Take two kernels computed on separate spaces K1(X1), K2(X2), and a data matrix Y of size (N1, N2).

The effective covariance is np.kron(K2, K1). The effective data is vec(Y) = Y.flatten(order=’F’).

The noise must be iid Gaussian.

See [stegle_et_al_2011].

References

[stegle_et_al_2011] Stegle, O.; Lippert, C.; Mooij, J.M.; Lawrence, N.D.; Borgwardt, K.: Efficient inference in matrix-variate Gaussian models with iid observation noise. In: Advances in Neural Information Processing Systems, 2011, pages 630-638.
log_likelihood()[source]
parameters_changed()[source]

This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer.

predict(X1new, X2new)[source]

Return the predictive mean and variance at a series of new points X1new, X2new. Only the diagonal of the predictive variance is returned, for now.

Parameters:
  • X1new (np.ndarray, Nnew x self.input_dim1) – The points at which to make a prediction
  • X2new (np.ndarray, Nnew x self.input_dim2) – The points at which to make a prediction
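
A minimal sketch with assumed toy data (Y lives on the grid defined by X1 and X2):

import numpy as np
import GPy

X1, X2 = np.random.rand(10, 1), np.random.rand(15, 1)
Y = np.random.randn(10, 15)
m = GPy.models.GPKroneckerGaussianRegression(X1, X2, Y, GPy.kern.RBF(1), GPy.kern.RBF(1))
m.optimize()
mu, var = m.predict(X1, X2)   # diagonal of the predictive variance only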

GPy.models.gp_multiout_regression module

class GPMultioutRegression(X, Y, Xr_dim, kernel=None, kernel_row=None, Z=None, Z_row=None, X_row=None, Xvariance_row=None, num_inducing=(10, 10), qU_var_r_W_dim=None, qU_var_c_W_dim=None, init='GP', name='GPMR')[source]

Bases: GPy.core.sparse_gp.SparseGP

Gaussian Process model for multi-output regression without missing data

This is an implementation of Latent Variable Multiple Output Gaussian Processes (LVMOGP) in [Dai_et_al_2017].

References

[Dai_et_al_2017] Dai, Z.; Alvarez, M.A.; Lawrence, N.D.: Efficient Modeling of Latent Information in Supervised Learning using Gaussian Processes. In NIPS, 2017.
Parameters:
  • X (numpy.ndarray) – input observations.
  • Y (numpy.ndarray) – output observations, each column corresponding to an output dimension.
  • Xr_dim (int) – the dimensionality of the latent space in which the output dimensions are embedded
  • kernel (GPy.kern.Kern or None) – a GPy kernel for GP of individual output dimensions ** defaults to RBF **
  • kernel_row (GPy.kern.Kern or None) – a GPy kernel for the GP of the latent space ** defaults to RBF **
  • Z (numpy.ndarray or None) – inducing inputs
  • Z_row (numpy.ndarray or None) – inducing inputs for the latent space
  • X_row (numpy.ndarray or None) – the initial value of the mean of the variational posterior distribution of points in the latent space
  • Xvariance_row (numpy.ndarray or None) – the initial value of the variance of the variational posterior distribution of points in the latent space
  • num_inducing ((int, int)) – a tuple (M, Mr). M is the number of inducing points for GP of individual output dimensions. Mr is the number of inducing points for the latent space.
  • qU_var_r_W_dim (int) – the dimensionality of the covariance of q(U) for the latent space. If it is smaller than the number of inducing points, it represents a low-rank parameterization of the covariance matrix.
  • qU_var_c_W_dim (int) – the dimensionality of the covariance of q(U) for the GP regression. If it is smaller than the number of inducing points, it represents a low-rank parameterization of the covariance matrix.
  • init (str) – the choice of initialization: ‘GP’ or ‘rand’. With ‘rand’, the model is initialized randomly. With ‘GP’, the model is initialized through a protocol as follows: (1) fit a sparse GP, (2) fit a BGPLVM based on the outcome of the sparse GP, (3) initialize the model based on the outcome of the BGPLVM.
  • name (str) – the name of the model
optimize_auto(max_iters=10000, verbose=True)[source]

Optimize the model parameters through a pre-defined protocol.

Parameters:
  • max_iters (int) – the maximum number of iterations.
  • verbose (boolean) – print the progress of optimization or not.
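
A minimal sketch with assumed toy data (8 output dimensions embedded in a 2D latent space):

import numpy as np
import GPy

X = np.random.rand(100, 1)
Y = np.random.randn(100, 8)
m = GPy.models.GPMultioutRegression(X, Y, Xr_dim=2, num_inducing=(20, 4))
m.optimize_auto(max_iters=2000)
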
parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

GPy.models.gp_multiout_regression_md module

class GPMultioutRegressionMD(X, Y, indexD, Xr_dim, kernel=None, kernel_row=None, Z=None, Z_row=None, X_row=None, Xvariance_row=None, num_inducing=(10, 10), qU_var_r_W_dim=None, qU_var_c_W_dim=None, init='GP', heter_noise=False, name='GPMRMD')[source]

Bases: GPy.core.sparse_gp.SparseGP

Gaussian Process model for multi-output regression with missing data

This is an implementation of Latent Variable Multiple Output Gaussian Processes (LVMOGP) in [Dai_et_al_2017]. This model targets the use case in which each output dimension is observed at a different set of inputs. The model takes a different data format: the input and output observations of all the output dimensions are stacked together into two matrices, and an extra array indicates the index of the output dimension for each data point. The output dimensions are indexed using integers from 0 to D-1, assuming there are D output dimensions.

References

[Dai_et_al_2017] Dai, Z.; Alvarez, M.A.; Lawrence, N.D.: Efficient Modeling of Latent Information in Supervised Learning using Gaussian Processes. In NIPS, 2017.
Parameters:
  • X (numpy.ndarray) – input observations.
  • Y (numpy.ndarray) – output observations, each column corresponding to an output dimension.
  • indexD (numpy.ndarray) – the array containing the index of output dimension for each data point
  • Xr_dim (int) – the dimensionality of the latent space in which the output dimensions are embedded
  • kernel (GPy.kern.Kern or None) – a GPy kernel for GP of individual output dimensions ** defaults to RBF **
  • kernel_row (GPy.kern.Kern or None) – a GPy kernel for the GP of the latent space ** defaults to RBF **
  • Z (numpy.ndarray or None) – inducing inputs
  • Z_row (numpy.ndarray or None) – inducing inputs for the latent space
  • X_row (numpy.ndarray or None) – the initial value of the mean of the variational posterior distribution of points in the latent space
  • Xvariance_row (numpy.ndarray or None) – the initial value of the variance of the variational posterior distribution of points in the latent space
  • num_inducing ((int, int)) – a tuple (M, Mr). M is the number of inducing points for GP of individual output dimensions. Mr is the number of inducing points for the latent space.
  • qU_var_r_W_dim (int) – the dimensionality of the covariance of q(U) for the latent space. If it is smaller than the number of inducing points, it represents a low-rank parameterization of the covariance matrix.
  • qU_var_c_W_dim (int) – the dimensionality of the covariance of q(U) for the GP regression. If it is smaller than the number of inducing points, it represents a low-rank parameterization of the covariance matrix.
  • init (str) – the choice of initialization: ‘GP’ or ‘rand’. With ‘rand’, the model is initialized randomly. With ‘GP’, the model is initialized through a protocol as follows: (1) fit a sparse GP, (2) fit a BGPLVM based on the outcome of the sparse GP, (3) initialize the model based on the outcome of the BGPLVM.
  • heter_noise (boolean) – whether to assume heteroscedastic noise in the model
  • name (str) – the name of the model
optimize_auto(max_iters=10000, verbose=True)[source]

Optimize the model parameters through a pre-defined protocol.

Parameters:
  • max_iters (int) – the maximum number of iterations.
  • verbose (boolean) – print the progress of optimization or not.
parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

GPy.models.gp_offset_regression module

class GPOffsetRegression(X, Y, kernel=None, Y_metadata=None, normalizer=None, noise_var=1.0, mean_function=None)[source]

Bases: GPy.core.gp.GP

Gaussian Process model for offset regression

Parameters:
  • X – input observations; for this class we assume that X has one dimension of actual inputs and that the last dimension is the index of the cluster (so X should be Nx2)
  • Y – observed values (Nx1)
  • kernel – a GPy kernel, defaults to rbf
  • normalizer (Norm) – [False] Normalize Y with the norm given. If normalizer is False, no normalization will be done. If it is None, we use GaussianNorm(alization).
  • noise_var – the noise variance for the Gaussian likelihood, defaults to 1.

Note

Multiple independent outputs are allowed using columns of Y

dr_doffset(X, sel, delta)[source]
parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

GPy.models.gp_regression module

class GPRegression(X, Y, kernel=None, Y_metadata=None, normalizer=None, noise_var=1.0, mean_function=None)[source]

Bases: GPy.core.gp.GP

Gaussian Process model for regression

This is a thin wrapper around the models.GP class, with a set of sensible defaults

Parameters:
  • X – input observations
  • Y – observed values
  • kernel – a GPy kernel, defaults to rbf
  • normalizer (Norm) – [False] Normalize Y with the norm given. If normalizer is False, no normalization will be done. If it is None, we use GaussianNorm(alization).
  • noise_var – the noise variance for the Gaussian likelihood, defaults to 1.

Note

Multiple independent outputs are allowed using columns of Y
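
A minimal sketch with assumed toy data:

import numpy as np
import GPy

X = np.random.rand(50, 1)
Y = np.sin(6 * X) + 0.1 * np.random.randn(50, 1)
m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(1), noise_var=0.1)
m.optimize(messages=True)
mean, var = m.predict(np.linspace(0, 1, 100)[:, None])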

static from_gp(gp)[source]
save_model(output_filename, compress=True, save_data=True)[source]
to_dict(save_data=True)[source]

Convert the object into a json serializable dictionary. Note: It uses the private method _save_to_input_dict of the parent.

Parameters:save_data (boolean) – if true, it adds the training data self.X and self.Y to the dictionary
Return dict:json serializable dictionary containing the needed information to instantiate the object

GPy.models.gp_var_gauss module

class GPVariationalGaussianApproximation(X, Y, kernel, likelihood, Y_metadata=None)[source]

Bases: GPy.core.gp.GP

The Variational Gaussian Approximation revisited

References

[opper_archambeau_2009] Opper, M.; Archambeau, C.: The Variational Gaussian Approximation Revisited. Neural Computation, 2009, pages 786-792.

GPy.models.gplvm module

class GPLVM(Y, input_dim, init='PCA', X=None, kernel=None, name='gplvm', Y_metadata=None, normalizer=False)[source]

Bases: GPy.core.gp.GP

Gaussian Process Latent Variable Model

Parameters:
  • Y (np.ndarray) – observed data
  • input_dim (int) – latent dimensionality
  • init ('PCA'|'random') – initialisation method for the latent space
  • normalizer (bool) – normalize the outputs Y. If normalizer is True, we will normalize using Standardize. If normalizer is False (the default), no normalization will be done.
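
A minimal sketch with assumed toy data (10-dimensional observations, 2D latent space):

import numpy as np
import GPy

Y = np.random.randn(100, 10)
m = GPy.models.GPLVM(Y, input_dim=2, init='PCA')
m.optimize(messages=True)
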
parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

plot_inducing(which_indices=None, legend=False, plot_limits=None, marker=None, projection='2d', **kwargs)

Plot a scatter plot of the inducing inputs.

Parameters:
  • which_indices ([int]) – which input dimensions to plot against each other
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • marker (str) – marker to use [default is custom arrow like]
  • kwargs – the kwargs for the scatter plots
  • projection (str) – for now 2d or 3d projection (other projections can be implemented, see developer documentation)
plot_latent(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • resolution (int) – the resolution at which we predict the magnification factor
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • updates (bool) – if possible, make interactive updates using the specific library you are using
  • kern (Kern) – the kernel to use for prediction
  • marker (str) – markers to use - cycle if more labels than markers are given
  • num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
  • imshow_kwargs – the kwargs for the imshow (magnification factor)
  • scatter_kwargs – the kwargs for the scatter plots
plot_scatter(labels=None, which_indices=None, legend=True, plot_limits=None, marker='<>^vsd', num_samples=1000, projection='2d', **kwargs)

Plot a scatter plot of the latent space.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • marker (str) – markers to use - cycle if more labels than markers are given
  • kwargs – the kwargs for the scatter plots
plot_steepest_gradient_map(output_labels=None, data_labels=None, which_indices=None, resolution=15, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, annotation_kwargs=None, scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • resolution (int) – the resolution at which we predict the magnification factor
  • legend (bool) – whether to plot the legend on the figure, if int plot legend columns on legend
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • updates (bool) – if possible, make interactive updates using the specific library you are using
  • kern (Kern) – the kernel to use for prediction
  • marker (str) – markers to use - cycle if more labels than markers are given
  • num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
  • imshow_kwargs – the kwargs for the imshow (magnification factor)
  • annotation_kwargs – the kwargs for the annotation plot
  • scatter_kwargs – the kwargs for the scatter plots

GPy.models.gradient_checker module

class GradientChecker(f, df, x0, names=None, *args, **kwargs)[source]

Bases: GPy.core.model.Model

Parameters:
  • f – Function to check gradient for
  • df – Gradient of function to check
  • x0 ([array-like] | array-like | float | int) – Initial guess for inputs x (if it has a shape (a,b) this will be reflected in the parameter names). Can be a list of arrays, if f takes a list of arrays. This list will be passed to f and df in the same order as given here. If only one argument, make sure not to pass a list!
  • names – Names to print, when performing gradcheck. If a list was passed to x0 a list of names with the same length is expected.
  • args – Arguments passed as f(x, *args, **kwargs) and df(x, *args, **kwargs)

Examples

Initialisation:

import numpy
import GPy
from GPy.models import GradientChecker
N, M, Q = 10, 5, 3

Sinusoid:

X = numpy.random.rand(N, Q)
grad = GradientChecker(numpy.sin,numpy.cos,X,'x')
grad.checkgrad(verbose=1)

Using GPy:

X, Z = numpy.random.randn(N, Q), numpy.random.randn(M, Q)
kern = GPy.kern.Linear(Q, ARD=True) + GPy.kern.RBF(Q, ARD=True)
# dK_dX is the legacy kernel gradient API; newer GPy versions expose gradients_X instead
grad = GradientChecker(kern.K,
                       lambda x: 2*kern.dK_dX(numpy.ones((1, 1)), x),
                       x0=X.copy(),
                       names='X')
grad.checkgrad(verbose=1)
grad.randomize()
grad.checkgrad(verbose=1)
log_likelihood()[source]
class HessianChecker(f, df, ddf, x0, names=None, *args, **kwargs)[source]

Bases: GPy.models.gradient_checker.GradientChecker

Parameters:
  • f – Function (only used for numerical hessian gradient)
  • df – Gradient of function to check
  • ddf – Analytical gradient function
  • x0 ([array-like] | array-like | float | int) – Initial guess for inputs x (if it has a shape (a,b) this will be reflected in the parameter names). Can be a list of arrays, if f takes a list of arrays. This list will be passed to f and df in the same order as given here. If only one argument, make sure not to pass a list!
  • names – Names to print, when performing gradcheck. If a list was passed to x0 a list of names with the same length is expected.
  • args – Arguments passed as f(x, *args, **kwargs) and df(x, *args, **kwargs)
checkgrad(target_param=None, verbose=False, step=1e-06, tolerance=0.001, block_indices=None, plot=False)[source]

Overwrite checkgrad method to check whole block instead of looping through

Shows diagnostics using matshow instead

Parameters:
  • verbose (bool) – If True, print a “full” checking of each parameter
  • step (float (default 1e-6)) – The size of the step around which to linearise the objective
  • tolerance (float (default 1e-3)) – the tolerance allowed (see note)
Note:
The gradient is considered correct if the ratio of the analytical and numerical gradients is within <tolerance> of unity.
checkgrad_block(analytic_hess, numeric_hess, verbose=False, step=1e-06, tolerance=0.001, block_indices=None, plot=False)[source]

Checkgrad a block matrix

class SkewChecker(df, ddf, dddf, x0, names=None, *args, **kwargs)[source]

Bases: GPy.models.gradient_checker.HessianChecker

Parameters:
  • df – gradient of function
  • ddf – Gradient of function to check (hessian)
  • dddf – Analytical gradient function (third derivative)
  • x0 ([array-like] | array-like | float | int) – Initial guess for inputs x (if it has a shape (a,b) this will be reflected in the parameter names). Can be a list of arrays if f takes a list of arrays; this list will be passed to f and df in the same order as given here. If there is only one argument, make sure not to pass a list.
  • names – Names to print when performing gradcheck. If a list was passed to x0, a list of names with the same length is expected.
  • args – Arguments passed as f(x, *args, **kwargs) and df(x, *args, **kwargs)
checkgrad(target_param=None, verbose=False, step=1e-06, tolerance=0.001, block_indices=None, plot=False, super_plot=False)[source]

Gradient checker that just checks each hessian individually

super_plot will plot the hessian wrt every parameter, plot will just do the first one

at_least_one_element(x)[source]
flatten_if_needed(x)[source]
get_shape(x)[source]

GPy.models.ibp_lfm module

class IBPLFM(X, Y, input_dim=2, output_dim=1, rank=1, Gamma=None, num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='IBP for LFM', alpha=2.0, beta=2.0, connM=None, tau=None, mpi_comm=None, normalizer=False, variational_prior=None, **kwargs)[source]

Bases: GPy.core.sparse_gp_mpi.SparseGP_MPI

Indian Buffet Process for Latent Force Models

Parameters:
  • Y (np.ndarray| GPy.likelihood instance) – observed data (np.ndarray) or GPy.likelihood
  • X (np.ndarray) – input data (np.ndarray) [X:values, X:index], index refers to the number of the output
  • input_dim (int) – latent dimensionality
  • rank – number of latent functions

get_Zp_gradients(Zp)[source]

Get the gradients of the posterior distribution of Zp in its specific form.

parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

set_Zp_gradients(Zp, Zp_grad)[source]

Set the gradients of the posterior distribution of Zp in its specific form.

class IBPPosterior(binary_prob, tau=None, name='Sensitivity space', *a, **kw)[source]

Bases: GPy.core.parameterization.parameterized.Parameterized

The IBP distribution for variational approximations.

binary_prob : the probability of including a latent function over an output.

set_gradients(grad)[source]
class IBPPrior(rank, alpha=2.0, name='IBPPrior', **kw)[source]

Bases: GPy.core.parameterization.variational.VariationalPrior

KL_divergence(variational_posterior)[source]
update_gradients_KL(variational_posterior)[source]

updates the gradients for mean and variance in place

class VarDTC_minibatch_IBPLFM(batchsize=None, limit=3, mpi_comm=None)[source]

Bases: GPy.inference.latent_function_inference.var_dtc_parallel.VarDTC_minibatch

Modifications of VarDTC_minibatch for IBP LFM

gatherPsiStat(kern, X, Z, Y, beta, Zp)[source]
inference_likelihood(kern, X, Z, likelihood, Y, Zp)[source]

The first phase of inference: Compute: log-likelihood, dL_dKmm

Cached intermediate results: Kmm, KmmInv,

inference_minibatch(kern, X, Z, likelihood, Y, Zp)[source]

The second phase of inference: Computing the derivatives over a minibatch of Y Compute: dL_dpsi0, dL_dpsi1, dL_dpsi2, dL_dthetaL return a flag showing whether it reached the end of Y (isEnd)

update_gradients(model, mpi_comm=None)[source]

GPy.models.input_warped_gp module

class InputWarpedGP(X, Y, kernel=None, normalizer=False, warping_function=None, warping_indices=None, Xmin=None, Xmax=None, epsilon=None)[source]

Bases: GPy.core.gp.GP

Input Warped GP

This defines a GP model that applies a warping function to the input. By default, it uses Kumar warping (the CDF of the Kumaraswamy distribution)

X : array_like, shape = (n_samples, n_features) for input data

Y : array_like, shape = (n_samples, 1) for output data

kernel : object, optional
An instance of kernel function defined in GPy.kern Default to Matern 32
warping_function : object, optional
An instance of warping function defined in GPy.util.input_warping_functions Default to KumarWarping
warping_indices : list of int, optional
A list of indices of the features in X that should be warped. It is used in the Kumar warping function
normalizer : bool, optional
A bool indicating whether to normalize the output
Xmin : list of float, optional
The minimum values for every feature in X. It is used in the Kumar warping function
Xmax : list of float, optional
The maximum values for every feature in X. It is used in the Kumar warping function
epsilon : float, optional
We normalize X to [0+e, 1-e]. If not given, the default value defined in the KumarWarping function is used
X_untransformed : array_like, shape = (n_samples, n_features)
A copy of original input X
X_warped : array_like, shape = (n_samples, n_features)
Input data after warping
warping_function : object, optional
An instance of warping function defined in GPy.util.input_warping_functions Default to KumarWarping

Kumar warping uses the CDF of Kumaraswamy distribution. More on the Kumaraswamy distribution can be found at the wiki page: https://en.wikipedia.org/wiki/Kumaraswamy_distribution

Snoek, J.; Swersky, K.; Zemel, R. S. & Adams, R. P. Input Warping for Bayesian Optimization of Non-stationary Functions preprint arXiv:1402.0929, 2014
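
For illustration, a minimal usage sketch with toy data (the data, the Matern 3/2 kernel choice and the prediction points are assumptions for this example, not part of the original documentation):

import numpy as np
import GPy

# toy regression data: 50 samples, 2 input features
X = np.random.rand(50, 2)
Y = np.sin(3 * X[:, :1]) + 0.05 * np.random.randn(50, 1)

# Kumar warping of the inputs is applied by default
m = GPy.models.InputWarpedGP(X, Y, kernel=GPy.kern.Matern32(2))
m.optimize()                                  # fit kernel and warping hyperparameters
mean, var = m.predict(np.random.rand(5, 2))   # predictions at new inputs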

log_likelihood()[source]

Compute the marginal log likelihood

For input warping, just use the normal GP log likelihood

parameters_changed()[source]

Update the gradients of parameters for warping function

This method is called when there are new values of the parameters for the warping function, the kernel and the other parameters of a normal GP

predict(Xnew)[source]

Prediction on the new data

Xnew : array_like, shape = (n_samples, n_features)
The test data.
mean : array_like, shape = (n_samples, output.dim)
Posterior mean at the location of Xnew
var : array_like, shape = (n_samples, 1)
Posterior variance at the location of Xnew
transform_data(X, test_data=False)[source]

Apply warping_function to some Input data

X : array_like, shape = (n_samples, n_features)

test_data: bool, optional
Default to False, should set to True when transforming test data

GPy.models.mrd module

class MRD(Ylist, input_dim, X=None, X_variance=None, initx='PCA', initz='permute', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihoods=None, name='mrd', Ynames=None, normalizer=False, stochastic=False, batchsize=10)[source]

Bases: GPy.models.bayesian_gplvm_minibatch.BayesianGPLVMMiniBatch

!WARNING: This is bleeding edge code and still in development. Functionality may change fundamentally during development!

Apply MRD to all given datasets Y in Ylist.

Y_i in [n x p_i]

If Ylist is a dictionary, the keys of the dictionary are the names, and the values are the different datasets to compare.

The samples n in the datasets need to match up, whereas the dimensionality p_d can differ.

Parameters:
  • Ylist ([array-like]) – List of datasets to apply MRD on
  • input_dim (int) – latent dimensionality
  • X (array-like) – mean of starting latent space q in [n x q]
  • X_variance (array-like) – variance of starting latent space q in [n x q]
  • initx (['concat'|'single'|'random']) –

    initialisation method for the latent space :

    • ’concat’ - PCA on concatenation of all datasets
    • ’single’ - Concatenation of PCA on datasets, respectively
    • ’random’ - Random draw from a Normal(0,1)
  • initz ('permute'|'random') – initialisation method for inducing inputs
  • num_inducing – number of inducing inputs to use
  • Z – initial inducing inputs
  • kernel ([GPy.kernels.kernels] | GPy.kernels.kernels | None (default)) – list of kernels or kernel to copy for each output
  • inference_method (GPy.inference.latent_function_inference) – InferenceMethodList of inferences, or one inference method for all
  • likelihoods (likelihoods) – the likelihoods to use
  • name (str) – the name of this model
  • Ynames ([str]) – the names for the datasets given, must be of equal length as Ylist or None
  • normalizer (bool|Norm) – how to normalize the data?
  • stochastic (bool) – should this model be using stochastic gradient descent over the dimensions?
  • batchsize (bool|[bool]) – either one batchsize for all, or one batchsize per dataset
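
As a hedged illustration, a minimal sketch of fitting MRD to two toy views that share the same samples (the random data, latent dimensionality and optimizer settings are assumptions for this example only):

import numpy as np
import GPy

# two views: same number of samples (40), different dimensionality
Y1 = np.random.randn(40, 8)
Y2 = np.random.randn(40, 5)

m = GPy.models.MRD([Y1, Y2], input_dim=3, num_inducing=10,
                   Ynames=['view1', 'view2'])
m.optimize(messages=False, max_iters=100)
m.plot_scales()    # per-view ARD weights (see plot_scales below)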

factorize_space(threshold=0.005, printOut=False, views=None)[source]

Given a trained MRD model, this function looks at the optimized ARD weights (lengthscales) and decides which part of the latent space is shared across views or private, according to a threshold. The threshold is applied after all weights are normalized so that the maximum value is 1.

log_likelihood()[source]

The log marginal likelihood of the model, \(p(\mathbf{y})\), this is the objective function of the model being optimised

parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

plot_latent(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', predict_kwargs={}, scatter_kwargs=None, **imshow_kwargs)[source]

See plotting.matplot_dep.dim_reduction_plots.plot_latent. If predict_kwargs is None, this plots the latent space for the 0th dataset (and kernel); otherwise give predict_kwargs=dict(Yindex='index') to plot only the latent space of the dataset with that index.

plot_scales(titles=None, fig_kwargs={}, **kwargs)[source]

Plot input sensitivity for all datasets, to see which input dimensions are significant for which dataset.

Parameters:titles – titles for axes of datasets

kwargs go into plot_ARD for each kernel.

predict(Xnew, full_cov=False, Y_metadata=None, kern=None, Yindex=0)[source]

Prediction for data set Yindex[default=0]. This predicts the output mean and variance for the dataset given in Ylist[Yindex]

GPy.models.multioutput_gp module

class MultioutputGP(X_list, Y_list, kernel_list, likelihood_list, name='multioutputgp', kernel_cross_covariances={}, inference_method=None)[source]

Bases: GPy.core.gp.GP

Gaussian process model for using observations from multiple likelihoods and different kernels

Parameters:
  • X_list – input observations in a list, one per likelihood
  • Y_list – output observations in a list, one per likelihood
  • kernel_list – kernels in a list, one per likelihood
  • likelihood_list – likelihoods in a list
  • kernel_cross_covariances – cross covariances between different likelihoods. See class MultioutputKern for more
  • inference_method – the LatentFunctionInference inference method to use for this GP
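
A hedged construction sketch with two toy outputs, separate kernels and Gaussian likelihoods (the data and kernel choices are illustrative assumptions); prediction then proceeds through the predict method documented below:

import numpy as np
import GPy

X1 = np.random.rand(30, 1); Y1 = np.sin(X1) + 0.05 * np.random.randn(30, 1)
X2 = np.random.rand(20, 1); Y2 = np.cos(X2) + 0.05 * np.random.randn(20, 1)

m = GPy.models.MultioutputGP(X_list=[X1, X2], Y_list=[Y1, Y2],
                             kernel_list=[GPy.kern.RBF(1), GPy.kern.Matern32(1)],
                             likelihood_list=[GPy.likelihoods.Gaussian(),
                                              GPy.likelihoods.Gaussian()])
m.optimize()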

log_predictive_density(x_test, y_test, Y_metadata=None)[source]

Calculation of the log predictive density

Parameters:
  • x_test ((Nx1) array) – test locations (x_{*})
  • y_test ((Nx1) array) – test observations (y_{*})
  • Y_metadata – metadata associated with the test points
predict(Xnew, full_cov=False, Y_metadata=None, kern=None, likelihood=None, include_likelihood=True)[source]

Predict the function(s) at the new point(s) Xnew. This includes the likelihood variance added to the predicted underlying function (usually referred to as f).

In order to predict without adding in the likelihood give include_likelihood=False, or refer to self.predict_noiseless().

Parameters:
  • Xnew (np.ndarray (Nnew x self.input_dim)) – The points at which to make a prediction
  • full_cov (bool) – whether to return the full covariance matrix, or just the diagonal
  • Y_metadata – metadata about the predicting point to pass to the likelihood
  • kern – The kernel to use for prediction (defaults to the model kern). This is useful for examining e.g. subprocesses.
  • include_likelihood (bool) – Whether or not to add likelihood noise to the predicted underlying latent function f.
Returns:

(mean, var):
mean: posterior mean, a Numpy array, Nnew x self.input_dim
var: posterior variance, a Numpy array, Nnew x 1 if full_cov=False, Nnew x Nnew otherwise

If full_cov and self.input_dim > 1, the return shape of var is Nnew x Nnew x self.input_dim. If self.input_dim == 1, the return shape is Nnew x Nnew. This is to allow for different normalizations of the output dimensions.

Note: If you want the predictive quantiles (e.g. 95% confidence interval) use predict_quantiles.

predict_noiseless(Xnew, full_cov=False, Y_metadata=None, kern=None)[source]

Convenience function to predict the underlying function of the GP (often referred to as f) without adding the likelihood variance on the prediction function.

This is most likely what you want to use for your predictions.

Parameters:
  • Xnew (np.ndarray (Nnew x self.input_dim)) – The points at which to make a prediction
  • full_cov (bool) – whether to return the full covariance matrix, or just the diagonal
  • Y_metadata – metadata about the predicting point to pass to the likelihood
  • kern – The kernel to use for prediction (defaults to the model kern). This is useful for examining e.g. subprocesses.
Returns:

(mean, var):

mean: posterior mean, a Numpy array, Nnew x self.input_dim
var: posterior variance, a Numpy array, Nnew x 1 if full_cov=False, Nnew x Nnew otherwise

If full_cov and self.input_dim > 1, the return shape of var is Nnew x Nnew x self.input_dim. If self.input_dim == 1, the return shape is Nnew x Nnew. This is to allow for different normalizations of the output dimensions.

Note: If you want the predictive quantiles (e.g. 95% confidence interval) use predict_quantiles.

predict_quantiles(X, quantiles=(2.5, 97.5), Y_metadata=None, kern=None, likelihood=None)[source]

Get the predictive quantiles around the prediction at X

Parameters:
  • X (np.ndarray (Xnew x self.input_dim)) – The points at which to make a prediction
  • quantiles (tuple) – tuple of quantiles, default is (2.5, 97.5) which is the 95% interval
  • kern – optional kernel to use for prediction
Returns:

list of quantiles for each X and predictive quantiles for interval combination

Return type:

[np.ndarray (Xnew x self.output_dim), np.ndarray (Xnew x self.output_dim)]

predictive_gradients(Xnew, kern=None)[source]

Compute the derivatives of the predicted latent function with respect to X*. Given a set of points X* (size [N*, Q]) at which to predict, compute the derivatives of the mean and variance. The resulting arrays are sized:

dmu_dX* – [N*, Q, D], where D is the number of outputs in this GP (usually one).
Note that this is not the same as computing the mean and variance of the derivative of the function!
dv_dX* – [N*, Q] (since all outputs have the same variance)
Parameters:X (np.ndarray (Xnew x self.input_dim)) – The points at which to get the predictive gradients
Returns:dmu_dX, dv_dX
Return type:[np.ndarray (N*, Q ,D), np.ndarray (N*,Q) ]
set_XY(X=None, Y=None)[source]

Set the input / output data of the model. This is useful if we wish to change our existing data but maintain the same model

Parameters:
  • X (np.ndarray) – input observations
  • Y (np.ndarray) – output observations

GPy.models.one_vs_all_classification module

class OneVsAllClassification(X, Y, kernel=None, Y_metadata=None, messages=True)[source]

Bases: object

Gaussian Process classification: One vs all

This is a thin wrapper around the models.GPClassification class, with a set of sensible defaults

Parameters:
  • X – input observations
  • Y – observed values, can be None if likelihood is not None
  • kernel – a GPy kernel, defaults to rbf

Note

Multiple independent outputs are not allowed

GPy.models.one_vs_all_sparse_classification module

class OneVsAllSparseClassification(X, Y, kernel=None, Y_metadata=None, messages=True, num_inducing=10)[source]

Bases: object

Gaussian Process classification: One vs all

This is a thin wrapper around the models.GPClassification class, with a set of sensible defaults

Parameters:
  • X – input observations
  • Y – observed values, can be None if likelihood is not None
  • kernel – a GPy kernel, defaults to rbf

Note

Multiple independent outputs are not allowed

GPy.models.sparse_gp_classification module

class SparseGPClassification(X, Y=None, likelihood=None, kernel=None, Z=None, num_inducing=10, Y_metadata=None, mean_function=None, inference_method=None, normalizer=False)[source]

Bases: GPy.core.sparse_gp.SparseGP

Sparse Gaussian Process model for classification

This is a thin wrapper around the sparse_GP class, with a set of sensible defaults

Parameters:
  • X – input observations
  • Y – observed values
  • likelihood – a GPy likelihood, defaults to Bernoulli
  • kernel – a GPy kernel, defaults to rbf+white
  • inference_method (GPy.inference.latent_function_inference.LatentFunctionInference) – Latent function inference to use, defaults to EPDTC
  • normalize_X (False|True) – whether to normalize the input data before computing (predictions will be in original scales)
  • normalize_Y (False|True) – whether to normalize the output data before computing (predictions will be in original scales)
Return type:

model object
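
An illustrative sketch with toy binary labels (the data and the number of inducing points are assumptions for this example):

import numpy as np
import GPy

X = np.random.rand(100, 1)
Y = (X > 0.5).astype(float)       # binary labels in {0, 1}

m = GPy.models.SparseGPClassification(X, Y, num_inducing=10)
m.optimize()
probs, _ = m.predict(np.linspace(0, 1, 5)[:, None])   # predictive class probabilities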

static from_dict(input_dict, data=None)[source]

Instantiate a SparseGPClassification object using the information in input_dict (built by the to_dict method).

Parameters:data (tuple(np.ndarray, np.ndarray)) – It is used to provide X and Y for the case when the model was saved using save_data=False in to_dict method.
static from_sparse_gp(sparse_gp)[source]
save_model(output_filename, compress=True, save_data=True)[source]

Method to serialize the model.

Parameters:
  • output_filename (string) – Output file
  • compress (boolean) – If true compress the file using zip
  • save_data (boolean) – if true, it serializes the training data (self.X and self.Y)
to_dict(save_data=True)[source]

Store the object into a json serializable dictionary

Parameters:save_data (boolean) – if true, it adds the data self.X and self.Y to the dictionary
Return dict:json serializable dictionary containing the needed information to instantiate the object
class SparseGPClassificationUncertainInput(X, X_variance, Y, kernel=None, Z=None, num_inducing=10, Y_metadata=None, normalizer=None)[source]

Bases: GPy.core.sparse_gp.SparseGP

Sparse Gaussian Process model for classification with uncertain inputs.

This is a thin wrapper around the sparse_GP class, with a set of sensible defaults

Parameters:
  • X (np.ndarray (num_data x input_dim)) – input observations
  • X_variance (np.ndarray (num_data x input_dim)) – The uncertainty in the measurements of X (Gaussian variance, optional)
  • Y – observed values
  • kernel – a GPy kernel, defaults to rbf+white
  • Z (np.ndarray (num_inducing x input_dim) | None) – inducing inputs (optional, see note)
  • num_inducing (int) – number of inducing points (ignored if Z is passed, see note)
Return type:

model object

Note

If no Z array is passed, num_inducing (default 10) points are selected from the data. Otherwise num_inducing is ignored

Note

Multiple independent outputs are allowed using columns of Y
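
A minimal sketch assuming a fixed Gaussian input noise level for every data point (toy data; illustrative only):

import numpy as np
import GPy

X = np.random.rand(80, 1)
X_variance = 0.01 * np.ones_like(X)    # per-point input uncertainty (Gaussian variance)
Y = (X > 0.5).astype(float)

m = GPy.models.SparseGPClassificationUncertainInput(X, X_variance, Y, num_inducing=10)
m.optimize()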

parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

GPy.models.sparse_gp_coregionalized_regression module

class SparseGPCoregionalizedRegression(X_list, Y_list, Z_list=[], kernel=None, likelihoods_list=None, num_inducing=10, X_variance=None, name='SGPCR', W_rank=1, kernel_name='coreg')[source]

Bases: GPy.core.sparse_gp.SparseGP

Sparse Gaussian Process model for heteroscedastic multioutput regression

This is a thin wrapper around the SparseGP class, with a set of sensible defaults

Parameters:
  • X_list (list of numpy arrays) – list of input observations corresponding to each output
  • Y_list (list of numpy arrays) – list of observed values related to the different noise models
  • Z_list (empty list | list of numpy arrays) – list of inducing inputs (optional)
  • kernel (None | GPy.kernel defaults) – a GPy kernel multiplied by a coregionalization kernel (kernel ** Coregionalize); defaults to RBF ** Coregionalize
  • num_inducing (integer | list of integers) – number of inducing inputs, defaults to 10 per output (ignored if Z_list is not empty)
  • name (string) – model name
  • W_rank (integer) – rank of the coregionalization parameter matrix W (see the coregionalize kernel documentation)
  • kernel_name (string) – name of the kernel
  • likelihoods_list – a list of likelihoods, defaults to a list of Gaussian likelihoods
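
As a hedged sketch, fitting two related toy outputs with the default coregionalized kernel and Gaussian likelihoods (the data are illustrative assumptions):

import numpy as np
import GPy

X1 = np.random.rand(40, 1); Y1 = np.sin(X1) + 0.05 * np.random.randn(40, 1)
X2 = np.random.rand(30, 1); Y2 = np.sin(X2) + 0.3 + 0.05 * np.random.randn(30, 1)

m = GPy.models.SparseGPCoregionalizedRegression(X_list=[X1, X2], Y_list=[Y1, Y2],
                                                num_inducing=10)
m.optimize()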

GPy.models.sparse_gp_minibatch module

class SparseGPMiniBatch(X, Y, Z, kernel, likelihood, inference_method=None, name='sparse gp', Y_metadata=None, normalizer=False, missing_data=False, stochastic=False, batchsize=1)[source]

Bases: GPy.core.sparse_gp.SparseGP

A general purpose sparse GP model, allowing missing data and stochastic optimization across dimensions.

This model allows (approximate) inference using variational DTC or FITC (Gaussian likelihoods) as well as non-conjugate sparse methods based on these.

Parameters:
  • X (np.ndarray (num_data x input_dim)) – inputs
  • likelihood (GPy.likelihood.(Gaussian | EP | Laplace)) – a likelihood instance, containing the observed data
  • kernel (a GPy.kern.kern instance) – the kernel (covariance function). See link kernels
  • X_variance (np.ndarray (num_data x input_dim) | None) – The uncertainty in the measurements of X (Gaussian variance)
  • Z (np.ndarray (num_inducing x input_dim)) – inducing inputs
  • num_inducing (int) – Number of inducing points (optional, default 10. Ignored if Z is not None)
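
A construction sketch assuming a Gaussian likelihood and explicitly chosen inducing inputs (toy data; illustrative only; the class is imported via the module path documented in this section):

import numpy as np
import GPy
from GPy.models.sparse_gp_minibatch import SparseGPMiniBatch

X = np.random.rand(100, 1)
Y = np.sin(6 * X) + 0.05 * np.random.randn(100, 1)
Z = np.random.rand(10, 1)          # inducing inputs

m = SparseGPMiniBatch(X, Y, Z, kernel=GPy.kern.RBF(1),
                      likelihood=GPy.likelihoods.Gaussian())
m.optimize(max_iters=100)
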
has_uncertain_inputs()[source]
optimize(optimizer=None, start=None, **kwargs)[source]

Optimize the model using self.log_likelihood and self.log_likelihood_gradient, as well as self.priors. kwargs are passed to the optimizer. They can be:

Parameters:
  • max_iters (int) – maximum number of function evaluations
  • messages (bool) – whether to display during optimisation
  • optimizer (string) – which optimizer to use (defaults to self.preferred_optimizer); a range of optimisers can be found in GPy.inference.optimization, they include 'scg', 'lbfgs', 'tnc'.
  • ipython_notebook (bool) – whether to use ipython notebook widgets or not.
  • clear_after_finish (bool) – if in ipython notebook, we can clear the widgets after optimization.
parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

GPy.models.sparse_gp_regression module

class SparseGPRegression(X, Y, kernel=None, Z=None, num_inducing=10, X_variance=None, mean_function=None, normalizer=None, mpi_comm=None, name='sparse_gp')[source]

Bases: GPy.core.sparse_gp_mpi.SparseGP_MPI

Gaussian Process model for regression

This is a thin wrapper around the SparseGP class, with a set of sensible defaults

Parameters:
  • X – input observations
  • X_variance – input uncertainties, one per input X
  • Y – observed values
  • kernel – a GPy kernel, defaults to rbf+white
  • Z (np.ndarray (num_inducing x input_dim) | None) – inducing inputs (optional, see note)
  • num_inducing (int) – number of inducing points (ignored if Z is passed, see note)
Return type:

model object

Note

If no Z array is passed, num_inducing (default 10) points are selected from the data. Otherwise num_inducing is ignored

Note

Multiple independent outputs are allowed using columns of Y
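
A minimal usage sketch with toy 1-D data (the data and prediction points are illustrative assumptions):

import numpy as np
import GPy

X = np.linspace(0, 10, 200)[:, None]
Y = np.sin(X) + 0.1 * np.random.randn(200, 1)

m = GPy.models.SparseGPRegression(X, Y, num_inducing=10)
m.optimize()
mean, var = m.predict(np.array([[2.5], [7.5]]))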

parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

GPy.models.sparse_gp_regression_md module

class SparseGPRegressionMD(X, Y, indexD, kernel=None, Z=None, num_inducing=10, normalizer=None, mpi_comm=None, individual_Y_noise=False, name='sparse_gp')[source]

Bases: GPy.core.sparse_gp_mpi.SparseGP_MPI

Sparse Gaussian Process Regression with Missing Data

This model targets the use case in which there are multiple output dimensions (different dimensions are assumed to be independent, following the same GP prior) and each output dimension is observed at a different set of inputs. The model takes a different data format: the input and output observations of all the output dimensions are stacked together into two matrices. An extra array is used to indicate the index of the output dimension for each data point. The output dimensions are indexed using integers from 0 to D-1, assuming there are D output dimensions.

Parameters:
  • X (numpy.ndarray) – input observations.
  • Y (numpy.ndarray) – output observations, each column corresponding to an output dimension.
  • indexD (numpy.ndarray) – the array containing the index of output dimension for each data point
  • kernel (GPy.kern.Kern or None) – a GPy kernel for the GP of individual output dimensions (defaults to RBF)
  • Z (numpy.ndarray or None) – inducing inputs
  • num_inducing ((int, int)) – a tuple (M, Mr). M is the number of inducing points for GP of individual output dimensions. Mr is the number of inducing points for the latent space.
  • individual_Y_noise (boolean) – whether individual output dimensions have their own noise variance or not
  • name (str) – the name of the model
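
To illustrate the stacked data format described above, a hedged sketch with two output dimensions observed at different inputs (toy data; the import via the module path and all values are assumptions for this example):

import numpy as np
from GPy.models.sparse_gp_regression_md import SparseGPRegressionMD

# output dimension 0 observed at 30 inputs, dimension 1 at 20 different inputs
X1 = np.random.rand(30, 1); Y1 = np.sin(X1)
X2 = np.random.rand(20, 1); Y2 = np.cos(X2)

X = np.vstack([X1, X2])
Y = np.vstack([Y1, Y2])
indexD = np.hstack([np.zeros(30, dtype=int), np.ones(20, dtype=int)])

m = SparseGPRegressionMD(X, Y, indexD)
m.optimize()
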
parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

GPy.models.sparse_gplvm module

class SparseGPLVM(Y, input_dim, X=None, kernel=None, init='PCA', num_inducing=10)[source]

Bases: GPy.models.sparse_gp_regression.SparseGPRegression

Sparse Gaussian Process Latent Variable Model

Parameters:
  • Y (np.ndarray) – observed data
  • input_dim (int) – latent dimensionality
  • init ('PCA'|'random') – initialisation method for the latent space
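
A minimal sketch learning a 2-D latent space from toy high-dimensional observations (data and settings are illustrative assumptions):

import numpy as np
import GPy

Y = np.random.randn(60, 12)      # 60 observations, 12 observed dimensions

m = GPy.models.SparseGPLVM(Y, input_dim=2, num_inducing=10)
m.optimize(max_iters=200)
latent = m.X                     # optimised latent positions (60 x 2)
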
parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

plot_latent(labels=None, which_indices=None, resolution=50, ax=None, marker='o', s=40, fignum=None, plot_inducing=True, legend=True, plot_limits=None, aspect='auto', updates=False, predict_kwargs={}, imshow_kwargs={})[source]

Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!

If you want fine-grained control, use the specific plotting functions supplied in the model.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using

GPy.models.ss_gplvm module

class IBPPosterior(means, variances, binary_prob, tau=None, sharedX=False, name='latent space')[source]

Bases: GPy.core.parameterization.variational.SpikeAndSlabPosterior

The SpikeAndSlab distribution for variational approximations.

binary_prob : the probability of the distribution on the slab part.

set_gradients(grad)[source]
class IBPPrior(input_dim, alpha=2.0, name='IBPPrior', **kw)[source]

Bases: GPy.core.parameterization.variational.VariationalPrior

KL_divergence(variational_posterior)[source]
update_gradients_KL(variational_posterior)[source]

updates the gradients for mean and variance in place

class SLVMPosterior(means, variances, binary_prob, tau=None, name='latent space')[source]

Bases: GPy.core.parameterization.variational.SpikeAndSlabPosterior

The SpikeAndSlab distribution for variational approximations.

binary_prob : the probability of the distribution on the slab part.

set_gradients(grad)[source]
class SLVMPrior(input_dim, alpha=1.0, beta=1.0, Z=None, name='SLVMPrior', **kw)[source]

Bases: GPy.core.parameterization.variational.VariationalPrior

KL_divergence(variational_posterior)[source]
update_gradients_KL(variational_posterior)[source]

updates the gradients for mean and variance in place

class SSGPLVM(Y, input_dim, X=None, X_variance=None, Gamma=None, init='PCA', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='Spike_and_Slab GPLVM', group_spike=False, IBP=False, SLVM=False, alpha=2.0, beta=2.0, connM=None, tau=None, mpi_comm=None, pi=None, learnPi=False, normalizer=False, sharedX=False, variational_prior=None, **kwargs)[source]

Bases: GPy.core.sparse_gp_mpi.SparseGP_MPI

Spike-and-Slab Gaussian Process Latent Variable Model

Parameters:
  • Y (np.ndarray| GPy.likelihood instance) – observed data (np.ndarray) or GPy.likelihood
  • input_dim (int) – latent dimensionality
  • init ('PCA'|'random') – initialisation method for the latent space
get_X_gradients(X)[source]

Get the gradients of the posterior distribution of X in its specific form.

input_sensitivity()[source]

Returns the sensitivity for each dimension of this model

parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in the GP class this method re-performs inference, recalculating the posterior and log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

plot_inducing(which_indices=None, legend=False, plot_limits=None, marker=None, projection='2d', **kwargs)

Plot a scatter plot of the inducing inputs.

Parameters:
  • which_indices ([int]) – which input dimensions to plot against each other
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • marker (str) – marker to use [default is custom arrow like]
  • kwargs – the kwargs for the scatter plots
  • projection (str) – for now 2d or 3d projection (other projections can be implemented, see developer documentation)
plot_latent(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • resolution (int) – the resolution at which we predict the magnification factor
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • updates (bool) – if possible, make interactive updates using the specific library you are using
  • kern (Kern) – the kernel to use for prediction
  • marker (str) – markers to use - cycle if more labels than markers are given
  • num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
  • imshow_kwargs – the kwargs for the imshow (magnification factor)
  • scatter_kwargs – the kwargs for the scatter plots
plot_scatter(labels=None, which_indices=None, legend=True, plot_limits=None, marker='<>^vsd', num_samples=1000, projection='2d', **kwargs)

Plot a scatter plot of the latent space.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • marker (str) – markers to use - cycle if more labels than markers are given
  • kwargs – the kwargs for the scatter plots
plot_steepest_gradient_map(output_labels=None, data_labels=None, which_indices=None, resolution=15, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, annotation_kwargs=None, scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • resolution (int) – the resolution at which we predict the magnification factor
  • legend (bool) – whether to plot the legend on the figure; if an integer, plot that many columns in the legend
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • updates (bool) – if possible, make interactive updates using the specific library you are using
  • kern (Kern) – the kernel to use for prediction
  • marker (str) – markers to use - cycle if more labels than markers are given
  • num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
  • imshow_kwargs – the kwargs for the imshow (magnification factor)
  • annotation_kwargs – the kwargs for the annotation plot
  • scatter_kwargs – the kwargs for the scatter plots
sample_W(nSamples, raw_samples=False)[source]

Sample the loading matrix if the kernel is linear.

set_X_gradients(X, X_grad)[source]

Set the gradients of the posterior distribution of X in its specific form.

GPy.models.ss_mrd module

The Manifold Relevance Determination model with the spike-and-slab prior

class IBPPrior_SSMRD(nModels, input_dim, alpha=2.0, tau=None, name='IBPPrior', **kw)[source]

Bases: GPy.core.parameterization.variational.VariationalPrior

KL_divergence(variational_posterior)[source]
update_gradients_KL(variational_posterior)[source]

updates the gradients for mean and variance in place

class SSMRD(Ylist, input_dim, X=None, X_variance=None, Gammas=None, initx='PCA_concat', initz='permute', num_inducing=10, Zs=None, kernels=None, inference_methods=None, likelihoods=None, group_spike=True, pi=0.5, name='ss_mrd', Ynames=None, mpi_comm=None, IBP=False, alpha=2.0, taus=None)[source]

Bases: GPy.core.model.Model

log_likelihood()[source]
optimize(optimizer=None, start=None, **kwargs)[source]

Optimize the model using self.log_likelihood and self.log_likelihood_gradient, as well as self.priors.

kwargs are passed to the optimizer. They can be:

Parameters:
  • max_iters (int) – maximum number of function evaluations
  • optimizer (string) – which optimizer to use (defaults to self.preferred_optimizer)
  • messages (bool) – whether to display messages during optimisation

Valid optimizers are:
  • ‘scg’: scaled conjugate gradient method, recommended for stability.
    See also GPy.inference.optimization.scg
  • ‘fmin_tnc’: truncated Newton method (see scipy.optimize.fmin_tnc)
  • ‘simplex’: the Nelder-Mead simplex method (see scipy.optimize.fmin),
  • ‘lbfgsb’: the l-bfgs-b method (see scipy.optimize.fmin_l_bfgs_b),
  • ‘lbfgs’: the bfgs method (see scipy.optimize.fmin_bfgs),
  • ‘sgd’: stochastic gradient descent (see scipy.optimize.sgd). For experts only!
parameters_changed()[source]

This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer

optimizer_array

Array for the optimizer to work on. This array always lives in the space for the optimizer. Thus, it is untransformed, going from Transformations.

Setting this array, will make sure the transformed parameters for this model will be set accordingly. It has to be set with an array, retrieved from this method, as e.g. fixing will resize the array.

The optimizer should only interfere with this array, such that transformations are secured.

class SpikeAndSlabPrior_SSMRD(nModels, pi=0.5, learnPi=False, group_spike=True, variance=1.0, name='SSMRDPrior', **kw)[source]

Bases: GPy.core.parameterization.variational.SpikeAndSlabPrior

KL_divergence(variational_posterior)[source]
update_gradients_KL(variational_posterior)[source]

updates the gradients for mean and variance in place

GPy.models.state_space module

GPy.models.state_space_cython module

GPy.models.state_space_main module

Main functionality for state-space inference.

class AddMethodToClass(func=None, tp='staticmethod')[source]

Bases: object

func: function to add
tp: string
Type of the method: normal, staticmethod, classmethod

class ContDescrStateSpace[source]

Bases: GPy.models.state_space_main.DescreteStateSpace

Class for continuous-discrete Kalman filter. State equation is continuous while measurement equation is discrete.

d x(t)/dt = F x(t) + L q,  where q ~ N(0, Qc)
y_{t_k} = H_{k} x_{t_k} + r_{k},  where r_{k} ~ N(0, R_{k})
class AQcompute_batch_Python(F, L, Qc, dt, compute_derivatives=False, grad_params_no=None, P_inf=None, dP_inf=None, dF=None, dQc=None)[source]

Bases: GPy.models.state_space_main.Q_handling_Python

Class for calculating matrices A, Q, dA, dQ of the discrete Kalman Filter from the matrices F, L, Qc, P_inf, dF, dQc, dP_inf of the continuous state equation. dt - time steps.

It has the same interface as AQcompute_once.

It computes matrices for all time steps. This object is used when there are not so many (controlled by an internal variable) different time steps and storing all the matrices does not take too much memory.

Since all the matrices are computed all together, this object can be used in smoother without repeating the computations.

Constructor. All necessary parameters are passed here and stored in the object.

F, L, Qc, P_inf : matrices
Parameters of corresponding continuous state model
dt: array
All time steps
compute_derivatives: bool
Whether to calculate derivatives
dP_inf, dF, dQc: 3D array
Derivatives if they are required

Nothing

Ak(k, m, P)[source]

function (k, m, P). Returns the Jacobian of the dynamic function; it is passed into p_a.

k: iteration number, starts at 0
m: point where the Jacobian is evaluated
P: parameter for the Jacobian, usually the covariance matrix.
Q_inverse(k, p_largest_cond_num, p_regularization_type)[source]

Inverts the Q matrix and regularizes the inverse. Regularization is useful when the original matrix is badly conditioned. The function is currently used only in SparseGP code.

k: int
Iteration number.
p_largest_cond_num: float
Largest condition number allowed for the inverted matrix. If the condition number is smaller than this, no regularization happens.
p_regularization_type: int (1 or 2)
Regularization type.
Type 1: 1/(S[k] + regularizer); type 2: S[k]/(S^2[k] + regularizer), where the regularizer is computed internally.
Q_srk(k)[source]

Square root of the noise matrix Q

Qk(k)[source]

function (k). Returns noise matrix of dynamic model on iteration k. k (iteration number). starts at 0

dAk(k)[source]

function (k). Returns the derivative of A on iteration k. k (iteration number). starts at 0

dQk(k)[source]

function (k). Returns the derivative of Q on iteration k. k (iteration number). starts at 0

f_a(k, m, A)[source]

Dynamic model

reset(compute_derivatives=False)[source]

For reusing this object e.g. in smoother computation. It makes sense because the necessary matrices have already been computed for all time steps.

return_last()[source]

Function returns last available matrices.

class AQcompute_once(F, L, Qc, dt, compute_derivatives=False, grad_params_no=None, P_inf=None, dP_inf=None, dF=None, dQc=None)[source]

Bases: GPy.models.state_space_main.Q_handling_Python

Class for calculating matrices A, Q, dA, dQ of the discrete Kalman Filter from the matrices F, L, Qc, P_inf, dF, dQc, dP_inf of the continuous state equation. dt - time steps.

It has the same interface as AQcompute_batch.

It computes matrices for only one time step. This object is used when there are many different time steps and storing matrices for each of them would take too much memory.

Constructor. All necessary parameters are passed here and stored in the object.

F, L, Qc, P_inf : matrices
Parameters of corresponding continuous state model
dt: array
All time steps
compute_derivatives: bool
Whether to calculate derivatives
dP_inf, dF, dQc: 3D array
Derivatives if they are required

Nothing

Ak(k, m, P)[source]

function (k, m, P). Returns the Jacobian of the dynamic function; it is passed into p_a.

k: iteration number, starts at 0
m: point where the Jacobian is evaluated
P: parameter for the Jacobian, usually the covariance matrix.
Q_inverse(k, p_largest_cond_num, p_regularization_type)[source]

Inverts the Q matrix and regularizes the inverse. Regularization is useful when the original matrix is badly conditioned. The function is currently used only in SparseGP code.

k: int
Iteration number.
p_largest_cond_num: float
Largest condition number allowed for the inverted matrix. If the condition number is smaller than this, no regularization happens.
p_regularization_type: int (1 or 2)
Regularization type.
Type 1: 1/(S[k] + regularizer); type 2: S[k]/(S^2[k] + regularizer), where the regularizer is computed internally.
Q_srk(k)[source]

Check square root, maybe rewriting for Spectral decomposition is needed. Square root of the noise matrix Q

Qk(k)[source]

function (k). Returns noise matrix of dynamic model on iteration k. k (iteration number). starts at 0

dAk(k)[source]

function (k). Returns the derivative of A on iteration k. k (iteration number). starts at 0

dQk(k)[source]

function (k). Returns the derivative of Q on iteration k. k (iteration number). starts at 0

f_a(k, m, A)[source]

Dynamic model

reset(compute_derivatives)[source]

For reusing this object e.g. in smoother computation. Actually, this object can not be reused because it computes the matrices on every iteration. But this method is written for keeping the same interface with the class AQcompute_batch.

return_last()[source]

Function returns last computed matrices.

classmethod cont_discr_kalman_filter(F, L, Qc, p_H, p_R, P_inf, X, Y, index=None, m_init=None, P_init=None, p_kalman_filter_type='regular', calc_log_likelihood=False, calc_grad_log_likelihood=False, grad_params_no=0, grad_calc_params=None)[source]

This function implements the continuous-discrete Kalman Filter algorithm. These notations for the State-Space model are assumed:

d/dt x(t) = F * x(t) + L * w(t),  where w(t) ~ N(0, Qc)
y_{k} = H_{k} * x_{k} + r_{k},  where r_{k} ~ N(0, R_{k})

Returns estimated filter distributions x_{k} ~ N(m_{k}, P(k))

1) The function generally does not modify the passed parameters. If it does, that is an error. There are several exceptions: scalars can be promoted to a matrix, and in some rare cases the shapes of the derivative matrices may be changed; this is ignored for now.

2) Copies of F,L,Qc are created in memory because they may be used later in smoother. References to copies are kept in “AQcomp” object return parameter.

3) The function supports “multiple time series mode”, which means that exactly the same State-Space model is used to filter several sets of measurements. In this case the third dimension of Y should contain these measurements; Log_likelihood and Grad_log_likelihood then have the corresponding dimensions.

4) Calculation of Grad_log_likelihood is not supported if matrices H or R change over time (with index k). (later may be changed)

5) Measurements may include missing values. In this case the update step is not done for that measurement. (later may be changed)

F: (state_dim, state_dim) matrix
F in the model.
L: (state_dim, noise_dim) matrix
L in the model.
Qc: (noise_dim, noise_dim) matrix
Q_c in the model.
p_H: scalar, matrix (measurement_dim, state_dim) , 3D array
H_{k} in the model. If matrix then H_{k} = H - constant. If it is 3D array then H_{k} = p_H[:,:, index[2,k]]
p_R: scalar, square symmetric matrix, 3D array
R_{k} in the model. If matrix then R_{k} = R - constant. If it is 3D array then R_{k} = p_R[:,:, index[3,k]]
P_inf: (state_dim, state_dim) matrix
State variance matrix at infinity.
X: 1D array
Time points of measurements. Needed for converting the continuous problem to the discrete one.
Y: matrix or vector or 3D array
Data. If Y is matrix then samples are along 0-th dimension and features along the 1-st. If 3D array then third dimension correspond to “multiple time series mode”.
index: vector
Which indices (on the 3rd dimension) from arrays p_H, p_R to use on every time step. If this parameter is None then it is assumed that p_H, p_R do not change over time and indices are not needed. index[0,:] corresponds to H, index[1,:] corresponds to R. If index.shape[0] == 1, it is assumed that the indices for all matrices are the same.
m_init: vector or matrix
Initial distribution mean. If None it is assumed to be zero. For “multiple time series mode” it is a matrix whose second dimension corresponds to the different time series. In the regular case (“one time series mode”) it is a vector.
P_init: square symmetric matrix or scalar
Initial covariance of the states. If the parameter is scalar then it is assumed that the initial covariance matrix is the unit matrix multiplied by this scalar. If None the unit matrix is used instead. “multiple time series mode” does not affect it, since it does not affect anything related to state variances.
p_kalman_filter_type: string, one of (‘regular’, ‘svd’)
Which Kalman Filter is used. Regular or SVD. SVD is more numerically stable; in particular, covariance matrices are guaranteed to be positive semi-definite. However, ‘svd’ works slower, especially for small data, due to the SVD call overhead.
calc_log_likelihood: boolean
Whether to calculate marginal likelihood of the state-space model.
calc_grad_log_likelihood: boolean
Whether to calculate gradient of the marginal likelihood of the state-space model. If true then “grad_calc_params” parameter must provide the extra parameters for gradient calculation.
grad_params_no: int
If previous parameter is true, then this parameters gives the total number of parameters in the gradient.
grad_calc_params: dictionary
Dictionary with derivatives of model matrices with respect to parameters “dF”, “dL”, “dQc”, “dH”, “dR”, “dm_init”, “dP_init”. They can be None, in this case zero matrices (no dependence on parameters) is assumed. If there is only one parameter then third dimension is automatically added.
M: (no_steps+1,state_dim) matrix or (no_steps+1,state_dim, time_series_no) 3D array
Filter estimates of the state means. In the extra step the initial value is included. In the “multiple time series mode” the third dimension corresponds to the different time series.
P: (no_steps+1, state_dim, state_dim) 3D array
Filter estimates of the state covariances. In the extra step the initial value is included.

log_likelihood: double or (1, time_series_no) 3D array.

If the parameter calc_log_likelihood was set to true, return logarithm of marginal likelihood of the state-space model. If the parameter was false, return None. In the “multiple time series mode” it is a vector providing log_likelihood for each time series.
grad_log_likelihood: column vector or (grad_params_no, time_series_no) matrix
If calc_grad_log_likelihood is true, return gradient of log likelihood with respect to parameters. It returns it column wise, so in “multiple time series mode” gradients for each time series is in the corresponding column.
AQcomp: object
Contains some pre-computed values for converting the continuous model into a discrete one. It can be used later in the smoothing phase.
classmethod cont_discr_rts_smoother(state_dim, filter_means, filter_covars, p_dynamic_callables=None, X=None, F=None, L=None, Qc=None)[source]

Continuous-discrete Rauch–Tung–Striebel (RTS) smoother.

This function implements the Rauch–Tung–Striebel (RTS) smoother algorithm based on the results of _cont_discr_kalman_filter_raw.

Model:
d/dt x(t) = F * x(t) + L * w(t); w(t) ~ N(0, Qc) y_{k} = H_{k} * x_{k} + r_{k}; r_{k-1} ~ N(0, R_{k})

Returns estimated smoother distributions x_{k} ~ N(m_{k}, P(k))

filter_means: (no_steps+1,state_dim) matrix or (no_steps+1,state_dim, time_series_no) 3D array
Results of the Kalman Filter means estimation.
filter_covars: (no_steps+1, state_dim, state_dim) 3D array
Results of the Kalman Filter covariance estimation.
Dynamic_callables: object or None
Object from the filter phase which provides functions for computing A, Q, dA, dQ for the discrete model from the continuous model.
X, F, L, Qc: matrices
If AQcomp is None, these matrices are used to create this object from scratch.
M: (no_steps+1,state_dim) matrix
Smoothed estimates of the state means
P: (no_steps+1,state_dim, state_dim) 3D array
Smoothed estimates of the state covariances
static lti_sde_to_descrete(F, L, Qc, dt, compute_derivatives=False, grad_params_no=None, P_inf=None, dP_inf=None, dF=None, dQc=None)[source]

Linear Time-Invariant Stochastic Differential Equation (LTI SDE):

dx(t) = F x(t) dt + L d eta,  where

x(t): (vector) stochastic process
eta: (vector) Brownian motion process
F, L: (time invariant) matrices of corresponding dimensions
Qc: covariance of the noise.

This function rewrites it into the corresponding state-space form:

x_{k} = A_{k} * x_{k-1} + q_{k-1}; q_{k-1} ~ N(0, Q_{k-1})

TODO: this function can be redone to “preprocess dataset”, when close time points are handled properly (with a rounding parameter) and values are averaged accordingly.

F,L: LTI SDE matrices of corresponding dimensions

Qc: matrix (n,n)
Covariance between different dimensions of the noise eta. n is the dimensionality of the noise.
dt: double or iterable
Time difference used on this iteration. If dt is iterable, then A and Q_noise are computed for every unique dt
compute_derivatives: boolean
Whether derivatives of A and Q are required.
grad_params_no: int
Number of gradient parameters

P_inf: (state_dim, state_dim) matrix

dP_inf

dF: 3D array
Derivatives of F
dQc: 3D array
Derivatives of Qc
dR: 3D array
Derivatives of R
A: matrix
A_{k}. Because we have an LTI SDE, only dt can affect the matrix difference for different k.
Q_noise: matrix
Covariance matrix of (vector) q_{k-1}. Only dt can affect the matrix difference for different k.
reconstruct_index: array
If dt was iterable, three-dimensional arrays A and Q_noise are returned. The third dimension of these arrays corresponds to the unique dt's. This reconstruct_index contains the indices of the original dt's in the unique dt sequence. A[:,:, reconstruct_index[5]] is the matrix A of the 6th (indices start from zero) dt in the original sequence.
dA: 3D array
Derivatives of A
dQ: 3D array
Derivatives of Q
class DescreteStateSpace[source]

Bases: object

This class implements state-space inference for linear and non-linear state-space models.

Linear models:
x_{k} = A_{k} * x_{k-1} + q_{k-1};    q_{k-1} ~ N(0, Q_{k-1})
y_{k} = H_{k} * x_{k} + r_{k};        r_{k} ~ N(0, R_{k})

Nonlinear models:
x_{k} = f_a(k, x_{k-1}, A_{k}) + q_{k-1};    q_{k-1} ~ N(0, Q_{k-1})
y_{k} = f_h(k, x_{k}, H_{k}) + r_{k};        r_{k} ~ N(0, R_{k})

Here f_a and f_h are functions of k (iteration number), x_{k-1} or x_{k} (state value on a certain iteration), and A_{k} and H_{k} are the Jacobian matrices of f_a and f_h respectively. In the linear case they are exactly A_{k} and H_{k}.

Currently two nonlinear Gaussian filter algorithms are implemented: Extended Kalman Filter (EKF), Statistically linearized Filter (SLF), which implementations are very similar.

classmethod extended_kalman_filter(p_state_dim, p_a, p_f_A, p_f_Q, p_h, p_f_H, p_f_R, Y, m_init=None, P_init=None, calc_log_likelihood=False)[source]

Extended Kalman Filter

p_state_dim: integer

p_a: if None, the function from the linear model is assumed; no nonlinearity in the dynamics is assumed.

function (k, x_{k-1}, A_{k}). Dynamic function. k: iteration number, x_{k-1}: previous state, A_{k}: Jacobian matrix of f_a. In the linear case it is exactly A_{k}.

p_f_A: matrix or function. If a matrix is given, a function which returns this matrix is assumed (see this parameter's description in the kalman_filter function). Otherwise:

function (k, m, P) returning the Jacobian of the dynamic function; it is passed into p_a.

k: iteration number, m: point where the Jacobian is evaluated, P: parameter for the Jacobian, usually the covariance matrix.

p_f_Q: matrix or function. If a matrix is given, a function which returns this matrix is assumed (see this parameter's description in the kalman_filter function). Otherwise:

function (k). Returns the noise matrix of the dynamic model on iteration k. k: iteration number.

p_h: if None, the function from the linear measurement model is assumed (no nonlinearity in the measurement). Otherwise:

function (k, x_{k}, H_{k}). Measurement function. k: iteration number, x_{k}: current state, H_{k}: Jacobian matrix of f_h; in the linear case it is exactly H_{k}.

p_f_H: matrix or function. If a matrix is given, a function which returns this matrix is assumed. Otherwise:
function (k, m, P) returning the Jacobian of the measurement function; it is passed into p_h. k: iteration number, m: point where the Jacobian is evaluated, P: parameter for the Jacobian, usually the covariance matrix.
p_f_R: matrix or function. If a matrix is given, a function which returns this matrix is assumed. Otherwise:
function (k). Returns the noise matrix of the measurement equation on iteration k. k: iteration number.
Y: matrix or vector
Data. If Y is matrix then samples are along 0-th dimension and features along the 1-st. May have missing values.
m_init: vector
Initial distribution mean. If None it is assumed to be zero
P_init: square symmetric matrix or scalar
Initial covariance of the states. If the parameter is scalar then it is assumed that initial covariance matrix is unit matrix multiplied by this scalar. If None the unit matrix is used instead.
calc_log_likelihood: boolean
Whether to calculate marginal likelihood of the state-space model.
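For orientation, one step of the extended Kalman filter for the model above proceeds as follows. This is a generic NumPy sketch of the algorithm, not the GPy call signature; f_a, f_h and their Jacobian functions are assumed to be supplied by the user:

import numpy as np

def ekf_step(m, P, y, f_a, f_A, Q, f_h, f_H, R):
    """One EKF iteration: predict through the dynamic model, then update with y."""
    # Prediction: propagate the mean through f_a and the covariance through its Jacobian
    A = f_A(m)
    m_pred = f_a(m)
    P_pred = A @ P @ A.T + Q

    if y is None:                              # missing measurement: skip the update step
        return m_pred, P_pred

    # Update: linearize the measurement model around the predicted mean
    H = f_H(m_pred)
    v = y - f_h(m_pred)                        # innovation
    S = H @ P_pred @ H.T + R                   # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    return m_pred + K @ v, P_pred - K @ S @ K.T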
classmethod kalman_filter(p_A, p_Q, p_H, p_R, Y, index=None, m_init=None, P_init=None, p_kalman_filter_type='regular', calc_log_likelihood=False, calc_grad_log_likelihood=False, grad_params_no=None, grad_calc_params=None)[source]

This function implements the basic Kalman Filter algorithm. The following notation for the state-space model is assumed:

x_{k} = A_{k} * x_{k-1} + q_{k-1};   q_{k-1} ~ N(0, Q_{k-1})
y_{k} = H_{k} * x_{k} + r_{k};       r_{k} ~ N(0, R_{k})

Returns estimated filter distributions x_{k} ~ N(m_{k}, P(k))

1) The function generally does not modify the passed parameters; if it does, that is an error. There are several exceptions: scalars can be expanded into matrices, and in some rare cases the shapes of the derivative matrices may be changed; this is ignored for now.

2) Copies of p_A, p_Q and index are created in memory to be used later in the smoother. References to the copies are kept in the “matrs_for_smoother” return parameter.

3) The function supports a “multiple time series mode”, which means that exactly the same state-space model is used to filter several sets of measurements. In this case the third dimension of Y should index these state-space measurements, and Log_likelihood and Grad_log_likelihood then have the corresponding dimensions.

4) Calculation of Grad_log_likelihood is not supported if the matrices A, Q, H or R change over time. (This may be changed later.)

5) Measurements may include missing values. In this case the update step is not done for that measurement. (This may be changed later.)

p_A: scalar, square matrix, 3D array
A_{k} in the model. If a matrix, then A_{k} = A is constant. If a 3D array, then A_{k} = p_A[:,:, index[0,k]]
p_Q: scalar, square symmetric matrix, 3D array
Q_{k-1} in the model. If a matrix, then Q_{k-1} = Q is constant. If a 3D array, then Q_{k-1} = p_Q[:,:, index[1,k]]
p_H: scalar, matrix (measurement_dim, state_dim), 3D array
H_{k} in the model. If a matrix, then H_{k} = H is constant. If a 3D array, then H_{k} = p_H[:,:, index[2,k]]
p_R: scalar, square symmetric matrix, 3D array
R_{k} in the model. If a matrix, then R_{k} = R is constant. If a 3D array, then R_{k} = p_R[:,:, index[3,k]]
Y: matrix or vector or 3D array
Data. If Y is a matrix then samples are along the 0-th dimension and features along the 1-st. If a 3D array, the third dimension corresponds to the “multiple time series mode”.
index: vector
Which indices (on the 3-rd dimension) from the arrays p_A, p_Q, p_H, p_R to use on every time step. If this parameter is None then it is assumed that p_A, p_Q, p_H, p_R do not change over time and indices are not needed. index[0,:] corresponds to A, index[1,:] to Q, index[2,:] to H, index[3,:] to R. If index.shape[0] == 1, it is assumed that the indices for all matrices are the same.
m_init: vector or matrix
Initial distribution mean. If None it is assumed to be zero. For the “multiple time series mode” it is a matrix whose second dimension corresponds to the different time series. In the regular case (“one time series mode”) it is a vector.
P_init: square symmetric matrix or scalar
Initial covariance of the states. If the parameter is a scalar then it is assumed that the initial covariance matrix is the unit matrix multiplied by this scalar. If None the unit matrix is used instead. The “multiple time series mode” does not affect it, since it does not affect anything related to state variances.
calc_log_likelihood: boolean
Whether to calculate marginal likelihood of the state-space model.
calc_grad_log_likelihood: boolean
Whether to calculate gradient of the marginal likelihood of the state-space model. If true then “grad_calc_params” parameter must provide the extra parameters for gradient calculation.
grad_params_no: int
If the previous parameter is true, then this parameter gives the total number of parameters in the gradient.
grad_calc_params: dictionary
Dictionary with derivatives of the model matrices with respect to the parameters: “dA”, “dQ”, “dH”, “dR”, “dm_init”, “dP_init”. They can be None; in this case zero matrices (no dependence on parameters) are assumed. If there is only one parameter then the third dimension is added automatically.
M: (no_steps+1, state_dim) matrix or (no_steps+1, state_dim, time_series_no) 3D array
Filter estimates of the state means. The initial value is included as an extra step. In the “multiple time series mode” the third dimension corresponds to the different time series.
P: (no_steps+1, state_dim, state_dim) 3D array
Filter estimates of the state covariances. The initial value is included as an extra step.
log_likelihood: double or (1, time_series_no) array.
If the parameter calc_log_likelihood was set to true, returns the logarithm of the marginal likelihood of the state-space model; otherwise returns None. In the “multiple time series mode” it is a vector providing the log_likelihood for each time series.
grad_log_likelihood: column vector or (grad_params_no, time_series_no) matrix
If calc_grad_log_likelihood is true, returns the gradient of the log likelihood with respect to the parameters. It is returned column-wise, so in the “multiple time series mode” the gradient for each time series is in the corresponding column.
matrs_for_smoother: dict
Dictionary with model functions for the smoother. The intrinsic model functions are computed in this function and returned for convenient use in the smoother. They are: ‘p_a’, ‘p_f_A’, ‘p_f_Q’. The dictionary contains the same fields.
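The recursion implemented by this method is the textbook Kalman filter. A compact NumPy sketch for time-invariant A, Q, H, R, without the missing-data, multiple-time-series and gradient machinery described above (illustrative only):

import numpy as np

def kalman_filter_basic(A, Q, H, R, Y, m0, P0):
    """Filter x_k = A x_{k-1} + q_{k-1}, y_k = H x_k + r_k.
    Y has shape (no_steps, measurement_dim); the initial values are kept as an extra step."""
    M, P = [m0], [P0]
    m, C = m0, P0
    for y in Y:
        # Prediction step
        m = A @ m
        C = A @ C @ A.T + Q
        # Update step
        v = y - H @ m                          # innovation
        S = H @ C @ H.T + R                    # innovation covariance
        K = C @ H.T @ np.linalg.inv(S)         # Kalman gain
        m = m + K @ v
        C = C - K @ S @ K.T
        M.append(m); P.append(C)
    return np.array(M), np.array(P)            # shapes (no_steps+1, state_dim) and (no_steps+1, state_dim, state_dim)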
classmethod rts_smoother(state_dim, p_dynamic_callables, filter_means, filter_covars)[source]

This function implements the Rauch–Tung–Striebel (RTS) smoother algorithm based on the results of kalman_filter_raw. The notation is the same:

x_{k} = A_{k} * x_{k-1} + q_{k-1};   q_{k-1} ~ N(0, Q_{k-1})
y_{k} = H_{k} * x_{k} + r_{k};       r_{k} ~ N(0, R_{k})

Returns estimated smoother distributions x_{k} ~ N(m_{k}, P(k))

p_a: function (k, x_{k-1}, A_{k}). Dynamic function.
k: iteration number, starts at 0. x_{k-1}: state from the previous step. A_{k}: Jacobian matrix of f_a; in the linear case it is exactly A_{k}.
p_f_A: function (k, m, P) returning the Jacobian of the dynamic function; it is passed into p_a.
k: iteration number, starts at 0. m: point where the Jacobian is evaluated. P: parameter for the Jacobian, usually the covariance matrix.
p_f_Q: function (k). Returns the noise matrix of the dynamic model on iteration k.
k: iteration number, starts at 0.
filter_means: (no_steps+1,state_dim) matrix or (no_steps+1,state_dim, time_series_no) 3D array
Results of the Kalman Filter means estimation.
filter_covars: (no_steps+1, state_dim, state_dim) 3D array
Results of the Kalman Filter covariance estimation.
M: (no_steps+1, state_dim) matrix
Smoothed estimates of the state means
P: (no_steps+1, state_dim, state_dim) 3D array
Smoothed estimates of the state covariances
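The backward pass itself is short; a NumPy sketch for time-invariant A and Q, operating on the (no_steps+1, ...) filter output described above (illustrative only; the GPy implementation works through the dynamic callables instead):

import numpy as np

def rts_smoother_basic(A, Q, filter_means, filter_covars):
    """Rauch-Tung-Striebel backward recursion over Kalman filter results."""
    M = filter_means.copy()
    P = filter_covars.copy()
    for k in range(len(M) - 2, -1, -1):
        m_pred = A @ filter_means[k]                         # one-step-ahead prediction
        P_pred = A @ filter_covars[k] @ A.T + Q
        G = filter_covars[k] @ A.T @ np.linalg.inv(P_pred)   # smoother gain
        M[k] = filter_means[k] + G @ (M[k + 1] - m_pred)
        P[k] = filter_covars[k] + G @ (P[k + 1] - P_pred) @ G.T
    return M, P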
class DescreteStateSpaceMeta[source]

Bases: type

Substitutes the necessary methods from Cython.

After this substitution the class object is created.

Dynamic_Callables_Class

alias of GPy.models.state_space_main.Dynamic_Callables_Python

class Dynamic_Callables_Python[source]

Bases: object

Ak(k, m, P)[source]

function (k, m, P) returning the Jacobian of the dynamic function; it is passed into p_a.

k: iteration number, starts at 0. m: point where the Jacobian is evaluated. P: parameter for the Jacobian, usually the covariance matrix.
Q_srk(k)[source]

function (k). Returns the square root of noise matrix of dynamic model on iteration k.

k (iteration number). starts at 0

This function is implemented to use SVD prediction step.

Qk(k)[source]
function (k). Returns noise matrix of dynamic model on iteration k.
k (iteration number). starts at 0
dAk(k)[source]
function (k). Returns the derivative of A on iteration k.
k (iteration number). starts at 0
dQk(k)[source]
function (k). Returns the derivative of Q on iteration k.
k (iteration number). starts at 0
f_a(k, m, A)[source]
p_a: function (k, x_{k-1}, A_{k}). Dynamic function.
k (iteration number), starts at 0 x_{k-1} State from the previous step A_{k} Jacobian matrices of f_a. In the linear case it is exactly A_{k}.
reset(compute_derivatives=False)[source]

Return the state of this object to the beginning of iteration (to k eq. 0).

Measurement_Callables_Class

alias of GPy.models.state_space_main.Measurement_Callables_Python

class Measurement_Callables_Python[source]

Bases: object

Hk(k, m_pred, P_pred)[source]
function (k, m, P) returning the Jacobian of the measurement function; it is passed into p_h.
k: iteration number, starts at 0. m: point where the Jacobian is evaluated. P: parameter for the Jacobian, usually the covariance matrix.
R_isrk(k)[source]
function (k). Returns the square root of the noise matrix of
measurement equation on iteration k. k (iteration number). starts at 0

This function is implemented to use SVD prediction step.

Rk(k)[source]
function (k). Returns noise matrix of measurement equation
on iteration k. k (iteration number). starts at 0
dHk(k)[source]
function (k). Returns the derivative of H on iteration k.
k (iteration number). starts at 0
dRk(k)[source]
function (k). Returns the derivative of R on iteration k.
k (iteration number). starts at 0
f_h(k, m_pred, Hk)[source]
function (k, x_{k}, H_{k}). Measurement function.
k (iteration number), starts at 0 x_{k} state H_{k} Jacobian matrices of f_h. In the linear case it is exactly H_{k}.
reset(compute_derivatives=False)[source]

Return the state of this object to the beginning of iteration (to k eq. 0)

Q_handling_Class

alias of GPy.models.state_space_main.Q_handling_Python

class Q_handling_Python(Q, index, Q_time_var_index, unique_Q_number, dQ=None)[source]

Bases: GPy.models.state_space_main.Dynamic_Callables_Python

Q - array with noise covariances on various steps; the result of preprocessing the noise input.
index - for each step of the Kalman filter, contains the corresponding index in the array.
Q_time_var_index - another index into the array Q. Computed earlier and passed here.
unique_Q_number - number of unique noise matrices below which square roots are cached and above which they are computed each time.
dQ: 3D array[:, :, param_num]
Derivative of Q. The derivative is supported only when Q does not change over time.
Object which has two necessary functions:
f_R(k) inv_R_square_root(k)
Q_srk(k)[source]
function (k). Returns the square root of noise matrix of dynamic model
on iteration k.

k (iteration number). starts at 0

This function is implemented to use SVD prediction step.

Qk(k)[source]
function (k). Returns noise matrix of dynamic model on iteration k.
k (iteration number). starts at 0
dQk(k)[source]

function (k). Returns the derivative of Q on iteration k. k (iteration number). starts at 0

R_handling_Class

alias of GPy.models.state_space_main.R_handling_Python

class R_handling_Python(R, index, R_time_var_index, unique_R_number, dR=None)[source]

Bases: GPy.models.state_space_main.Measurement_Callables_Python

The class handles the noise matrix R.

R - array with noise on various steps. The result of preprocessing
the noise input.
index - for each step of Kalman filter contains the corresponding index
in the array.
R_time_var_index - another index in the array R. Computed earlier and
is passed here.
unique_R_number - number of unique noise matrices below which square
roots are cached and above which they are computed each time.
dR: 3D array[:, :, param_num]
Derivative of R. The derivative is supported only when R does not change over time.
Object which has two necessary functions:
f_R(k) inv_R_square_root(k)
R_isrk(k)[source]

Function returns the inverse square root of R matrix on step k.

Rk(k)[source]

function (k). Returns noise matrix of measurement equation on iteration k. k (iteration number). starts at 0

dRk(k)[source]

function (k). Returns the derivative of R on iteration k. k (iteration number). starts at 0

Std_Dynamic_Callables_Class

alias of GPy.models.state_space_main.Std_Dynamic_Callables_Python

class Std_Dynamic_Callables_Python(A, A_time_var_index, Q, index, Q_time_var_index, unique_Q_number, dA=None, dQ=None)[source]

Bases: GPy.models.state_space_main.Q_handling_Python

Ak(k, m_pred, P_pred)[source]
function (k, m, P) returning the Jacobian of the dynamic function; it is passed into p_a.
k: iteration number, starts at 0. m: point where the Jacobian is evaluated. P: parameter for the Jacobian, usually the covariance matrix.
dAk(k)[source]

function (k). Returns the derivative of A on iteration k. k (iteration number). starts at 0

f_a(k, m, A)[source]

f_a: function (k, x_{k-1}, A_{k}). Dynamic function. k (iteration number), starts at 0 x_{k-1} State from the previous step A_{k} Jacobian matrices of f_a. In the linear case it is exactly A_{k}.

reset(compute_derivatives=False)[source]

Return the state of this object to the beginning of iteration (to k eq. 0)

Std_Measurement_Callables_Class

alias of GPy.models.state_space_main.Std_Measurement_Callables_Python

class Std_Measurement_Callables_Python(H, H_time_var_index, R, index, R_time_var_index, unique_R_number, dH=None, dR=None)[source]

Bases: GPy.models.state_space_main.R_handling_Python

Hk(k, m_pred, P_pred)[source]
function (k, m, P) returning the Jacobian of the measurement function; it is passed into p_h.
k: iteration number, starts at 0. m: point where the Jacobian is evaluated. P: parameter for the Jacobian, usually the covariance matrix.
dHk(k)[source]

function (k). Returns the derivative of H on iteration k. k (iteration number). starts at 0

f_h(k, m, H)[source]
function (k, x_{k}, H_{k}). Measurement function.
k (iteration number), starts at 0 x_{k} state H_{k} Jacobian matrices of f_h. In the linear case it is exactly H_{k}.
class Struct[source]

Bases: object

balance_matrix(A)[source]

Balance a matrix, i.e. find a similarity transformation of the original matrix A, A = T * bA * T^{-1}, such that the norms of the columns of bA and of the rows of bA are as close as possible. This is usually used as a preprocessing step in eigenvalue calculation routines. It is also useful for state-space models.

See also:
[1] Beresford N. Parlett and Christian Reinsch (1969). Balancing
a matrix for calculation of eigenvalues and eigenvectors. Numerische Mathematik, 13(4): 293-304.
A: square matrix
Matrix to be balanced
bA: matrix
Balanced matrix
T: matrix
Left part of the similarity transformation
T_inv: matrix
Right part of the similarity transformation.
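SciPy ships a comparable routine, scipy.linalg.matrix_balance, which can be used to illustrate the similarity relation above (a sketch; this is not the GPy helper itself):

import numpy as np
from scipy.linalg import matrix_balance

A = np.array([[1.0, 2.0, 0.0],
              [9.0, 1.0, 0.01],
              [1.0, 2.0, 31.4]])

bA, T = matrix_balance(A)                           # balanced matrix and diagonal/permutation transform
assert np.allclose(T @ bA @ np.linalg.inv(T), A)    # A = T * bA * T^{-1}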
balance_ss_model(F, L, Qc, H, Pinf, P0, dF=None, dQc=None, dPinf=None, dP0=None)[source]

Balances the state-space model for better numerical stability.

This is based on the following:

dx/dt = F x + L w
y = H x

Let T z = x, which gives

dz/dt = inv(T) F T z + inv(T) L w
y = H T z
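Equivalently, the change of variables x = T z rescales the model matrices; a small sketch of the substitution (the names are illustrative):

import numpy as np

def balance_ss(F, L, H, T):
    """Apply x = T z to dx/dt = F x + L w, y = H x."""
    T_inv = np.linalg.inv(T)
    F_b = T_inv @ F @ T          # dz/dt = inv(T) F T z + inv(T) L w
    L_b = T_inv @ L
    H_b = H @ T                  # y = H T z
    return F_b, L_b, H_b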
matrix_exponent(M)[source]

The function computes the matrix exponential and handles some special cases.

GPy.models.state_space_model module

class StateSpace(X, Y, kernel=None, noise_var=1.0, kalman_filter_type='regular', use_cython=False, balance=False, name='StateSpace')[source]

Bases: GPy.core.model.Model

balance: bool. Whether or not to balance the model as a whole.
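A minimal usage sketch, assuming a one-dimensional time input and a kernel with a state-space (SDE) representation; whether GPy.kern.sde_Matern32 is available depends on the GPy build:

import numpy as np
import GPy

X = np.linspace(0, 10, 200)[:, None]                     # time points
Y = np.sin(X) + 0.1 * np.random.randn(*X.shape)          # noisy observations

kernel = GPy.kern.sde_Matern32(input_dim=1)              # assumed: a kernel with an SDE form
m = GPy.models.StateSpace(X, Y, kernel=kernel, noise_var=0.1)
m.optimize()
mean, var = m.predict(np.linspace(0, 12, 50)[:, None])   # filtering/smoothing-based prediction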

log_likelihood()[source]
parameters_changed()[source]

Parameters have now changed

plot(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, samples_likelihood=0, lower=2.5, upper=97.5, plot_data=True, plot_inducing=True, plot_density=False, predict_kw=None, projection='2d', legend=True, **kwargs)

Convenience function for plotting the fit of a GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

If you want fine-grained control, use the specific plotting functions supplied in the model.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
  • samples_likelihood (int) – the number of samples to draw from the GP and apply the likelihood noise. This is usually not what you want!
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • projection ({2d|3d}) – plot in 2d or 3d?
  • legend (bool) – convenience, whether to put a legend on the plot or not.
plot_confidence(lower=2.5, upper=97.5, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', label='gp confidence', predict_kw=None, **kwargs)

Plot the confidence interval between the percentiles lower and upper. E.g. the 95% confidence interval is [2.5, 97.5]. Note: Only implemented for one dimension!

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • which_data_ycols (array-like) – which columns of the output y (!) to plot (array-like or list of ints)
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
plot_data(which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **plot_kwargs)
Plot the training data
  • For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.

Can plot only part of the data using which_data_rows and which_data_ycols.

Parameters:
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
  • projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
  • label (str) – the label for the plot
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns: list of plots created.

plot_data_error(which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **error_kwargs)

Plot the training data input error.

For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.

Can plot only part of the data using which_data_rows and which_data_ycols.

Parameters:
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
  • projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • label (str) – the label for the plot
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns: list of plots created.

plot_density(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=35, label='gp density', predict_kw=None, **kwargs)

Plot the confidence interval between the percentiles lower and upper. E.g. the 95% confidence interval is [2.5, 97.5]. Note: Only implemented for one dimension!

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
plot_errorbars_trainset(which_data_rows='all', which_data_ycols='all', fixed_inputs=None, plot_raw=False, apply_link=False, label=None, projection='2d', predict_kw=None, **plot_kwargs)

Plot the errorbars of the GP likelihood on the training data. These are the errorbars after the appropriate approximations according to the likelihood are done.

This also works for heteroscedastic likelihoods.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • which_data_ycols – when the data has several columns (independent outputs), only plot these
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • predict_kwargs (dict) – kwargs for the prediction used to predict the right quantiles.
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_f(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)

Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!

If you want fine-grained control, use the specific plotting functions supplied in the model.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_latent(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)

Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!

If you want fine-grained control, use the specific plotting functions supplied in the model.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_mean(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=20, projection='2d', label='gp mean', predict_kw=None, **kwargs)

Plot the mean of the GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
  • levels (int) – for 2D plotting, the number of contour levels to use
  • projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
  • label (str) – the label for the plot.
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
plot_noiseless(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)

Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!

If you want fine-grained control, use the specific plotting functions supplied in the model.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_samples(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=True, apply_link=False, visible_dims=None, which_data_ycols='all', samples=3, projection='2d', label='gp_samples', predict_kw=None, **kwargs)

Plot the mean of the GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
  • plot_raw (bool) – plot the latent function (usually denoted f) only? This is usually what you want!
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • levels (int) – for 2D plotting, the number of contour levels to use
predict(Xnew=None, filteronly=False, include_likelihood=True, balance=None, **kw)[source]

balance: bool. Whether or not to balance the model as a whole.

predict_quantiles(Xnew=None, quantiles=(2.5, 97.5), balance=None, **kw)[source]

balance: bool. Whether or not to balance the model as a whole.

GPy.models.state_space_setup module

This module is intended for the setup of the state_space_main module. It is needed because of the way the state_space_main module is connected with the Cython code.

GPy.models.tp_regression module

class TPRegression(X, Y, kernel=None, deg_free=5.0, normalizer=None, mean_function=None, name='TP regression')[source]

Bases: GPy.core.model.Model

Student-t Process model for regression, as presented in

Shah, A., Wilson, A. and Ghahramani, Z., 2014, April. Student-t processes as alternatives to Gaussian processes. In Artificial Intelligence and Statistics (pp. 877-885).
Parameters:
  • X – input observations
  • Y – observed values
  • kernel – a GPy kernel, defaults to rbf
  • deg_free – initial value for the degrees of freedom hyperparameter
  • normalizer (Norm) –

    [False]

    Normalize Y with the given norm. If normalizer is False, no normalization will be done. If it is None, Gaussian normalization (GaussianNorm) is used.

Note

Multiple independent outputs are allowed using columns of Y
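A minimal usage sketch (the data and kernel choice are illustrative):

import numpy as np
import GPy

X = np.random.uniform(-3.0, 3.0, (30, 1))
Y = np.sin(X) + 0.2 * np.random.randn(30, 1)

m = GPy.models.TPRegression(X, Y, kernel=GPy.kern.RBF(1), deg_free=5.0)
m.optimize()                                    # optimizes the kernel parameters and deg_free
Xnew = np.linspace(-4, 4, 100)[:, None]
mu, var = m.predict(Xnew)
lower, upper = m.predict_quantiles(Xnew)        # 95% predictive interval by default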

log_likelihood()[source]

The log marginal likelihood of the model, \(p(\mathbf{y})\); this is the objective function of the model being optimised.

parameters_changed()[source]

Method that is called upon any changes to Param variables within the model. In particular in this class this method re-performs inference, recalculating the posterior, log marginal likelihood and gradients of the model

Warning

This method is not designed to be called manually, the framework is set up to automatically call this method upon changes to parameters, if you call this method yourself, there may be unexpected consequences.

plot(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, samples_likelihood=0, lower=2.5, upper=97.5, plot_data=True, plot_inducing=True, plot_density=False, predict_kw=None, projection='2d', legend=True, **kwargs)

Convenience function for plotting the fit of a GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

If you want fine-grained control, use the specific plotting functions supplied in the model.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
  • samples_likelihood (int) – the number of samples to draw from the GP and apply the likelihood noise. This is usually not what you want!
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • projection ({2d|3d}) – plot in 2d or 3d?
  • legend (bool) – convenience, whether to put a legend on the plot or not.
plot_confidence(lower=2.5, upper=97.5, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', label='gp confidence', predict_kw=None, **kwargs)

Plot the confidence interval between the percentiles lower and upper. E.g. the 95% confidence interval is [2.5, 97.5]. Note: Only implemented for one dimension!

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • which_data_ycols (array-like) – which columns of the output y (!) to plot (array-like or list of ints)
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
plot_data(which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **plot_kwargs)
Plot the training data
  • For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.

Can plot only part of the data using which_data_rows and which_data_ycols.

Parameters:
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
  • projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
  • label (str) – the label for the plot
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns: list of plots created.

plot_data_error(which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **error_kwargs)

Plot the training data input error.

For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.

Can plot only part of the data using which_data_rows and which_data_ycols.

Parameters:
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
  • projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • label (str) – the label for the plot
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns: list of plots created.

plot_density(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=35, label='gp density', predict_kw=None, **kwargs)

Plot the confidence interval between the percentiles lower and upper. E.g. the 95% confidence interval is [2.5, 97.5]. Note: Only implemented for one dimension!

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
plot_errorbars_trainset(which_data_rows='all', which_data_ycols='all', fixed_inputs=None, plot_raw=False, apply_link=False, label=None, projection='2d', predict_kw=None, **plot_kwargs)

Plot the errorbars of the GP likelihood on the training data. These are the errorbars after the appropriate approximations according to the likelihood are done.

This also works for heteroscedastic likelihoods.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • which_data_ycols – when the data has several columns (independent outputs), only plot these
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • predict_kwargs (dict) – kwargs for the prediction used to predict the right quantiles.
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_f(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)

Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!

If you want fine-grained control, use the specific plotting functions supplied in the model.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_latent(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)

Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!

If you want fine-grained control, use the specific plotting functions supplied in the model.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_mean(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=20, projection='2d', label='gp mean', predict_kw=None, **kwargs)

Plot the mean of the GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
  • levels (int) – for 2D plotting, the number of contour levels to use
  • projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
  • label (str) – the label for the plot.
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
plot_noiseless(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)

Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!

If you want fine-grained control, use the specific plotting functions supplied in the model.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_samples(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=True, apply_link=False, visible_dims=None, which_data_ycols='all', samples=3, projection='2d', label='gp_samples', predict_kw=None, **kwargs)

Plot the mean of the GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
  • plot_raw (bool) – plot the latent function (usually denoted f) only? This is usually what you want!
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • levels (int) – for 2D plotting, the number of contour levels to use
posterior_samples(X, size=10, full_cov=False, Y_metadata=None, likelihood=None, **predict_kwargs)[source]

Samples the posterior GP at the points X, equivalent to posterior_samples_f due to the absence of a likelihood.

posterior_samples_f(X, size=10, full_cov=True, **predict_kwargs)[source]

Samples the posterior TP at the points X.

Parameters:
  • X (np.ndarray (Nnew x self.input_dim)) – The points at which to take the samples.
  • size (int.) – the number of a posteriori samples.
  • full_cov (bool.) – whether to return the full covariance matrix, or just the diagonal.
Returns:

fsim: set of simulations

Return type:

np.ndarray (D x N x samples) (if D==1 we flatten out the first dimension)

predict(Xnew, full_cov=False, kern=None, **kwargs)[source]

Predict the function(s) at the new point(s) Xnew. For Student-t processes, this method is equivalent to predict_noiseless as no likelihood is included in the model.

predict_noiseless(Xnew, full_cov=False, kern=None)[source]

Predict the underlying function f at the new point(s) Xnew.

Parameters:
  • Xnew (np.ndarray (Nnew x self.input_dim)) – The points at which to make a prediction
  • full_cov (bool) – whether to return the full covariance matrix, or just the diagonal
  • kern – The kernel to use for prediction (defaults to the model kern).
Returns:

(mean, var):

mean: posterior mean, a Numpy array, Nnew x self.input_dim
var: posterior variance, a Numpy array, Nnew x 1 if full_cov=False, Nnew x Nnew otherwise

If full_cov and self.input_dim > 1, the return shape of var is Nnew x Nnew x self.input_dim. If self.input_dim == 1, the return shape is Nnew x Nnew. This is to allow for different normalizations of the output dimensions.

predict_quantiles(X, quantiles=(2.5, 97.5), kern=None, **kwargs)[source]

Get the predictive quantiles around the prediction at X

Parameters:
  • X (np.ndarray (Xnew x self.input_dim)) – The points at which to make a prediction
  • quantiles (tuple) – tuple of quantiles, default is (2.5, 97.5) which is the 95% interval
  • kern – optional kernel to use for prediction
Returns:

list of quantiles for each X and predictive quantiles for interval combination

Return type:

[np.ndarray (Xnew x self.output_dim), np.ndarray (Xnew x self.output_dim)]
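A minimal sketch of calling these prediction methods, assuming a toy GPy.models.GPRegression model; the synthetic data and the model choice are illustrative assumptions only:

import numpy as np
import GPy

# Toy 1D data (assumed for illustration).
X = np.random.uniform(0., 10., (50, 1))
Y = np.sin(X) + 0.1 * np.random.randn(50, 1)

m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(1))
m.optimize()

Xnew = np.linspace(0., 10., 200)[:, None]
mean, var = m.predict(Xnew)                  # predictive mean and variance
f_mean, f_var = m.predict_noiseless(Xnew)    # latent function f only
lower, upper = m.predict_quantiles(Xnew, quantiles=(2.5, 97.5))  # 95% interval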

set_X(X)[source]

Set the input data of the model

Parameters:X (np.ndarray) – input observations
set_XY(X, Y)[source]

Set the input / output data of the model. This is useful if we wish to change our existing data but maintain the same model

Parameters:
  • X (np.ndarray) – input observations
  • Y (np.ndarray or ObsAr) – output observations
set_Y(Y)[source]

Set the output data of the model

Parameters:Y (np.ndarray or ObsArray) – output observations
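For example, a hedged sketch continuing from the fitted model m above; X_new and Y_new are hypothetical replacement arrays:

m.set_XY(X_new, Y_new)   # swap in new data while keeping the kernel and hyperparameters
m.optimize()             # optionally re-optimize for the new data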

GPy.models.warped_gp module

class WarpedGP(X, Y, kernel=None, warping_function=None, warping_terms=3, normalizer=False)[source]

Bases: GPy.core.gp.GP

This defines a GP Regression model that applies a warping function to the output.
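A minimal sketch of constructing and fitting a warped GP, assuming 1D arrays X and Y like those in the earlier prediction sketch; the choice of three warping terms is arbitrary:

m = GPy.models.WarpedGP(X, Y, kernel=GPy.kern.RBF(1), warping_terms=3)
m.optimize()
m.plot_warping()   # inspect the learned warping function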

log_likelihood()[source]

Notice we add the jacobian of the warping function here.

log_predictive_density(x_test, y_test, Y_metadata=None)[source]

Calculation of the log predictive density. Notice we add the jacobian of the warping function here.

Parameters:
  • x_test ((Nx1) array) – test locations (x_{*})
  • y_test ((Nx1) array) – test observations (y_{*})
  • Y_metadata – metadata associated with the test points
parameters_changed()[source]

Notice that we update the warping function gradients here.

plot_warping()[source]
predict(Xnew, kern=None, pred_init=None, Y_metadata=None, median=False, deg_gauss_hermite=20, likelihood=None)[source]

Prediction results depend on:

  • The value of the self.predict_in_warped_space flag
  • The median flag passed as argument

The likelihood keyword is never used, it is just to follow the plotting API.

predict_quantiles(X, quantiles=(2.5, 97.5), Y_metadata=None, likelihood=None, kern=None)[source]

Get the predictive quantiles around the prediction at X

Parameters:
  • X (np.ndarray (Xnew x self.input_dim)) – The points at which to make a prediction
  • quantiles (tuple) – tuple of quantiles, default is (2.5, 97.5) which is the 95% interval
Returns:

list of quantiles for each X and predictive quantiles for interval combination

Return type:

[np.ndarray (Xnew x self.input_dim), np.ndarray (Xnew x self.input_dim)]

set_XY(X=None, Y=None)[source]

Set the input / output data of the model. This is useful if we wish to change our existing data but maintain the same model

Parameters:
  • X (np.ndarray) – input observations
  • Y (np.ndarray) – output observations
transform_data()[source]

GPy.kern package

Introduction

In terms of Gaussian Processes, a kernel is a function that specifies the degree of similarity between variables given their relative positions in parameter space. If known variables x and x’ are close together then observed variables y and y’ may also be similar, depending on the kernel function and its parameters. Note: this may be too simple a definition for the broad range of kernels available in GPy.

GPy.kern.src.kern.Kern is a generic kernel object inherited by more specific, end-user kernels used in models. It provides methods that specific kernels should generally have such as GPy.kern.src.kern.Kern.K to compute the value of the kernel, GPy.kern.src.kern.Kern.add to combine kernels and numerous functions providing information on kernel gradients.

There are several inherited types of kernel that provide a basis for specific end user kernels:

Inheritance diagram of GPy.kern.src.kern.Kern, GPy.kern.src.static, GPy.kern.src.stationary, GPy.kern.src.kern.CombinationKernel, GPy.kern.src.brownian, GPy.kern.src.linear, GPy.kern.src.standard_periodic

e.g. the archetype GPy.kern.RBF does not inherit directly from GPy.kern.src.kern.Kern, but from GPy.kern.src.stationary.

Inheritance diagram of GPy.kern.src.kern.Kern, GPy.kern.RBF
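As a hedged sketch of the typical workflow (the toy data array X is an assumption), a kernel is instantiated, evaluated with K, and combined with other kernels via addition or multiplication:

import numpy as np
import GPy

X = np.random.randn(20, 1)             # assumed toy inputs
k = GPy.kern.RBF(input_dim=1, variance=1.0, lengthscale=2.0)
K = k.K(X)                              # 20 x 20 covariance matrix
Kdiag = k.Kdiag(X)                      # its diagonal
k_sum = k + GPy.kern.Linear(1)          # an Add kernel
k_prod = k * GPy.kern.Linear(1)         # a Prod kernel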

Subpackages

GPy.kern.src package
Subpackages
GPy.kern.src.psi_comp package
class PSICOMP(*a, **kw)[source]

Bases: paramz.core.pickleable.Pickleable

psiDerivativecomputations(kern, dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, qX)[source]
psicomputations(kern, Z, qX, return_psi2_n=False)[source]
class PSICOMP_Linear(*a, **kw)[source]

Bases: GPy.kern.src.psi_comp.PSICOMP

psiDerivativecomputations(kern, dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
psicomputations(kern, Z, variational_posterior, return_psi2_n=False)[source]
class PSICOMP_RBF(*a, **kw)[source]

Bases: GPy.kern.src.psi_comp.PSICOMP

psiDerivativecomputations(kern, dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
psicomputations(kern, Z, variational_posterior, return_psi2_n=False)[source]
Submodules
GPy.kern.src.psi_comp.gaussherm module

An approximated psi-statistics implementation based on Gauss-Hermite Quadrature

class PSICOMP_GH(degree=11, cache_K=True)[source]

Bases: GPy.kern.src.psi_comp.PSICOMP

comp_K(Z, qX)[source]
psiDerivativecomputations(kern, dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, qX)[source]
psicomputations(kern, Z, qX, return_psi2_n=False)[source]
GPy.kern.src.psi_comp.linear_psi_comp module

The package for the Psi statistics computation of the linear kernel for Bayesian GPLVM

psiDerivativecomputations(dL_dpsi0, dL_dpsi1, dL_dpsi2, variance, Z, variational_posterior)[source]
psicomputations(variance, Z, variational_posterior, return_psi2_n=False)[source]

Compute psi-statistics for ss-linear kernel

GPy.kern.src.psi_comp.rbf_psi_comp module

The module for psi-statistics for RBF kernel

psiDerivativecomputations(dL_dpsi0, dL_dpsi1, dL_dpsi2, variance, lengthscale, Z, variational_posterior)[source]
psicomputations(variance, lengthscale, Z, variational_posterior, return_psi2_n=False)[source]
GPy.kern.src.psi_comp.rbf_psi_gpucomp module

The module for psi-statistics for RBF kernel

class PSICOMP_RBF_GPU(threadnum=256, blocknum=30, GPU_direct=False)[source]

Bases: GPy.kern.src.psi_comp.PSICOMP_RBF

get_dimensions(Z, variational_posterior)[source]
psiDerivativecomputations(kern, dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
psicomputations(kern, Z, variational_posterior, return_psi2_n=False)[source]
reset_derivative()[source]
sync_params(lengthscale, Z, mu, S)[source]
GPy.kern.src.psi_comp.sslinear_psi_comp module

The package for the Psi statistics computation of the linear kernel for SSGPLVM

psiDerivativecomputations(dL_dpsi0, dL_dpsi1, dL_dpsi2, variance, Z, variational_posterior)[source]
psicomputations(variance, Z, variational_posterior, return_psi2_n=False)[source]

Compute psi-statistics for ss-linear kernel

GPy.kern.src.psi_comp.ssrbf_psi_comp module

The package for the psi statistics computation

psiDerivativecomputations(dL_dpsi0, dL_dpsi1, dL_dpsi2, variance, lengthscale, Z, variational_posterior)[source]
psicomputations(variance, lengthscale, Z, variational_posterior)[source]

Z: MxQ, mu: NxQ, S: NxQ, gamma: NxQ

GPy.kern.src.psi_comp.ssrbf_psi_gpucomp module

The module for psi-statistics for RBF kernel for Spike-and-Slab GPLVM

class PSICOMP_SSRBF_GPU(threadnum=128, blocknum=15, GPU_direct=False)[source]

Bases: GPy.kern.src.psi_comp.PSICOMP_RBF

get_dimensions(Z, variational_posterior)[source]
psiDerivativecomputations(kern, dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
psicomputations(kern, Z, variational_posterior, return_psi2_n=False)[source]

Z: MxQ, mu: NxQ, S: NxQ

reset_derivative()[source]
sync_params(lengthscale, Z, mu, S, gamma)[source]
Submodules
GPy.kern.src.ODE_UY module
class ODE_UY(input_dim, variance_U=3.0, variance_Y=1.0, lengthscale_U=1.0, lengthscale_Y=1.0, active_dims=None, name='ode_uy')[source]

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

Compute the diagonal of the covariance matrix associated to X.

update_gradients_full(dL_dK, X, X2=None)[source]

derivative of the covariance matrix with respect to the parameters.

GPy.kern.src.ODE_UYC module
class ODE_UYC(input_dim, variance_U=3.0, variance_Y=1.0, lengthscale_U=1.0, lengthscale_Y=1.0, ubias=1.0, active_dims=None, name='ode_uyc')[source]

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

Compute the diagonal of the covariance matrix associated to X.

update_gradients_full(dL_dK, X, X2=None)[source]

derivative of the covariance matrix with respect to the parameters.

GPy.kern.src.ODE_st module
class ODE_st(input_dim, a=1.0, b=1.0, c=1.0, variance_Yx=3.0, variance_Yt=1.5, lengthscale_Yx=1.5, lengthscale_Yt=1.5, active_dims=None, name='ode_st')[source]

Bases: GPy.kern.src.kern.Kern

Kernel resulting from a first order ODE with an OU driving GP

Parameters:
  • input_dim (int) – the number of input dimensions; has to be equal to one
  • varianceU (float) – variance of the driving GP
  • lengthscaleU (float) – lengthscale of the driving GP (sqrt(3)/lengthscaleU)
  • varianceY (float) – ‘variance’ of the transfer function
  • lengthscaleY (float) – ‘lengthscale’ of the transfer function (1/lengthscaleY)
Return type:

kernel object

K(X, X2=None)[source]

Compute the covariance matrix between X and X2.

Kdiag(X)[source]

Compute the diagonal of the covariance matrix associated to X.

update_gradients_full(dL_dK, X, X2=None)[source]

derivative of the covariance matrix with respect to the parameters.

GPy.kern.src.ODE_t module
class ODE_t(input_dim, a=1.0, c=1.0, variance_Yt=3.0, lengthscale_Yt=1.5, ubias=1.0, active_dims=None, name='ode_st')[source]

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]

Compute the covariance matrix between X and X2.

Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
update_gradients_full(dL_dK, X, X2=None)[source]

derivative of the covariance matrix with respect to the parameters.

GPy.kern.src.add module
class Add(subkerns, name='sum')[source]

Bases: GPy.kern.src.kern.CombinationKernel

Add the given list of kernels together and propagate gradients through them.

This kernel will take over the active dims of its subkernels passed in.

NOTE: The subkernels will be copies of the original kernels, to prevent unexpected behavior.
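A small hedged example: summing two kernels with the + operator yields an Add kernel, and the summands become (copies of) its parts. Restricting the computation via which_parts is shown under the stated assumption about its usage:

k1 = GPy.kern.RBF(1)
k2 = GPy.kern.Bias(1)
k_sum = k1 + k2                              # an Add kernel; k1 and k2 are copied into k_sum.parts
K_full = k_sum.K(X)                          # X as in the kernel sketch above
K_rbf_only = k_sum.K(X, which_parts=k_sum.parts[:1])  # only the first part (assumed usage)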

K(X, X2=None, which_parts=None)[source]

Add all kernels together. If a list of parts (of this kernel!) which_parts is given, only the parts of the list are taken to compute the covariance.

Kdiag(X, which_parts=None)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]

Compute the gradient of the objective function with respect to X.

Parameters:
  • dL_dK (np.ndarray (num_samples x num_inducing)) – An array of gradients of the objective function with respect to the covariance function.
  • X (np.ndarray (num_samples x input_dim)) – Observed data inputs
  • X2 (np.ndarray (num_inducing x input_dim)) – Observed data inputs (optional, defaults to X)
gradients_XX(dL_dK, X, X2)[source]
\[\frac{\partial^2 L}{\partial X\partial X_2} = \frac{\partial L}{\partial K}\frac{\partial^2 K}{\partial X\partial X_2}\]

gradients_XX_diag(dL_dKdiag, X)[source]

The diagonal of the second derivative w.r.t. X and X2

gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

gradients_Z_expectations(dL_psi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.

gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel

input_sensitivity(summarize=True)[source]

If summarize is true, we want to get the summarized view of the sensitivities, otherwise put everything into an array with shape (#kernels, input_dim) in the order of appearance of the kernels in the parameterized object.

psi0(Z, variational_posterior)[source]
\[\psi_0 = \sum_{i=0}^{n}E_{q(X)}[k(X_i, X_i)]\]

psi1(Z, variational_posterior)[source]
\[\psi_1^{n,m} = E_{q(X)}[k(X_n, Z_m)]\]

psi2(Z, variational_posterior)[source]
\[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]
\[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2

sde()[source]

Support adding kernels for sde representation

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
update_gradients_diag(dL_dK, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.basis_funcs module
class BasisFuncKernel(input_dim, variance=1.0, active_dims=None, ARD=False, name='basis func kernel')[source]

Bases: GPy.kern.src.kern.Kern

Abstract superclass for kernels with explicit basis functions for use in GPy.

This class does NOT automatically add an offset to the design matrix phi!

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X, X2=None)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
concatenate_offset(X)[source]

Convenience function to add an offset column to phi. You can use this function to add an offset (bias on y axis) to phi in your custom self._phi(X).

parameters_changed()[source]

This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer

phi(X)[source]
posterior_inf(X=None, posterior=None)[source]

Do the posterior inference on the parameters given this kernel's functions and the model posterior, which has to be a GPy posterior, usually found at m.posterior, if m is a GPy model. If not given we search for the highest parent to be a model, containing the posterior, and for X accordingly.

update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

class ChangePointBasisFuncKernel(input_dim, changepoint, variance=1.0, active_dims=None, ARD=False, name='changepoint')[source]

Bases: GPy.kern.src.basis_funcs.BasisFuncKernel

The basis function has a changepoint. That is, it is constant, jumps at a single point (given as changepoint) and is constant again. You can give multiple changepoints. The changepoints are calculated using np.where(self.X < self.changepoint, -1, 1).
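A hedged sketch (assuming the class is exported at the GPy.kern level; otherwise import it from GPy.kern.src.basis_funcs): a step at x = 5 combined with an RBF kernel, using the toy X, Y from the earlier examples.

k = GPy.kern.ChangePointBasisFuncKernel(input_dim=1, changepoint=5.0) + GPy.kern.RBF(1)
m = GPy.models.GPRegression(X, Y, k)
m.optimize()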

class DomainKernel(input_dim, start, stop, variance=1.0, active_dims=None, ARD=False, name='constant_domain')[source]

Bases: GPy.kern.src.basis_funcs.LinearSlopeBasisFuncKernel

Create a constant plateau of correlation between start and stop and zero elsewhere. This is a constant shift of the outputs along the y-axis in the range from start to stop.

class LinearSlopeBasisFuncKernel(input_dim, start, stop, variance=1.0, active_dims=None, ARD=False, name='linear_segment')[source]

Bases: GPy.kern.src.basis_funcs.BasisFuncKernel

A linear segment transformation. The segments start at start, are then linear to stop and constant again. The segments are normalized, so that they have exactly as much mass above as below the origin.

Start and stop can be tuples or lists of starts and stops. Behaviour of start stop is as np.where(X<start) would do.

class LogisticBasisFuncKernel(input_dim, centers, variance=1.0, slope=1.0, active_dims=None, ARD=False, ARD_slope=True, name='logistic')[source]

Bases: GPy.kern.src.basis_funcs.BasisFuncKernel

Create a series of logistic basis functions with the given centers. The slopes are fitted to the data. The number of centers determines the number of logistic functions.

parameters_changed()[source]

This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

class PolynomialBasisFuncKernel(input_dim, degree, variance=1.0, active_dims=None, ARD=True, name='polynomial_basis')[source]

Bases: GPy.kern.src.basis_funcs.BasisFuncKernel

A polynomial basis function kernel: the basis functions are powers of the input up to the given degree.

GPy.kern.src.brownian module
class Brownian(input_dim=1, variance=1.0, active_dims=None, name='Brownian')[source]

Bases: GPy.kern.src.kern.Kern

Brownian motion in 1D only.

Negative times are treated as a separate (backwards!) Brownian motion.

Parameters:
  • input_dim (int) – the number of input dimensions
  • variance (float) –
K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.coregionalize module
class Coregionalize(input_dim, output_dim, rank=1, W=None, kappa=None, active_dims=None, name='coregion')[source]

Bases: GPy.kern.src.kern.Kern

Covariance function for intrinsic/linear coregionalization models

This covariance has the form:

\[\mathbf{B} = \mathbf{W}\mathbf{W}^\intercal + \mathrm{diag}(\kappa)\]

An intrinsic/linear coregionalization covariance function of the form:

\[k_2(x, y)=\mathbf{B} k(x, y)\]

It is obtained as the tensor product between a covariance function k(x, y) and B.

Parameters:
  • output_dim (int) – number of outputs to coregionalize
  • rank (int) – number of columns of the W matrix (this parameter is ignored if parameter W is not None)
  • W (numpy array of dimensionality (num_outputs, W_columns)) – a low rank matrix that determines the correlations between the different outputs, together with kappa it forms the coregionalization matrix B
  • kappa (numpy array of dimensionality (output_dim, )) – a vector which allows the outputs to behave independently
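A hedged sketch of the usual construction: the coregionalization kernel acts on an extra index column of the inputs and is multiplied with a data kernel acting on the remaining column(s). The stacked (x, output_index) input layout is an assumption for the example.

k_data = GPy.kern.RBF(1, active_dims=[0])
k_coreg = GPy.kern.Coregionalize(input_dim=1, output_dim=2, rank=1, active_dims=[1])
k_icm = k_data * k_coreg    # intrinsic coregionalization model kernel over (x, output index)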
K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

parameters_changed()[source]

This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer

to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.coregionalize_cython module
GPy.kern.src.diff_kern module
class DiffKern(base_kern, dimension)[source]

Bases: GPy.kern.src.kern.Kern

Diff kernel is a thin wrapper for using partial derivatives of kernels as kernels. E.g. in combination with the Multioutput kernel this allows the user to train GPs with observations of the latent function and latent function derivatives. NOTE: DiffKern only works when used with the Multioutput kernel. Do not use the kernel as a standalone.

The parameters the kernel needs are:

  • ‘base_kern’: a member of the Kern class that is used for observations
  • ‘dimension’: integer that indicates in which dimension the partial derivative observations are

K(X, X2=None, dimX2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
dK_dX2_wrap(X, X2)[source]
dK_dX_wrap(X, X2)[source]
gradients_X(dL_dK, X, X2)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X2(dL_dK, X, X2)[source]
parameters_changed()[source]

This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer

reset_gradients()[source]
update_gradients_dK_dX(dL_dK, X, X2=None)[source]
update_gradients_dK_dX2(dL_dK, X, X2=None)[source]
update_gradients_diag(dL_dK_diag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None, dimX2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

gradient
GPy.kern.src.eq_ode1 module
class EQ_ODE1(input_dim=2, output_dim=1, rank=1, W=None, lengthscale=None, decay=None, active_dims=None, name='eq_ode1')[source]

Bases: GPy.kern.src.kern.Kern

Covariance function for first order differential equation driven by an exponentiated quadratic covariance.

The outputs of this kernel have the form

\[\frac{\text{d}y_j}{\text{d}t} = \sum_{i=1}^R w_{j,i} u_i(t-\delta_j) - d_j y_j(t)\]

where \(R\) is the rank of the system, \(w_{j,i}\) is the sensitivity of the \(j\)th output to the \(i\)th latent function, \(d_j\) is the decay rate of the \(j\)th output and \(u_i(t)\) are independent latent Gaussian processes governed by an exponentiated quadratic covariance.

Parameters:
  • output_dim (int) – number of outputs driven by latent function.
  • W (ndarray (output_dim x rank)) – sensitivities of each output to the latent driving function.
  • rank (int) – If rank is greater than 1 then there are assumed to be a total of rank latent forces independently driving the system, each with identical covariance.
  • decay (array of length output_dim) – decay rates for the first order system.
  • delay (array of length output_dim) – delay between latent force and output response.
  • kappa (array of length output_dim) – diagonal term that allows each latent output to have an independent component to the response.
K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

lnDifErf(z1, z2)[source]
GPy.kern.src.eq_ode2 module
class EQ_ODE2(input_dim=2, output_dim=1, rank=1, W=None, lengthscale=None, C=None, B=None, active_dims=None, name='eq_ode2')[source]

Bases: GPy.kern.src.kern.Kern

Covariance function for second order differential equation driven by an exponentiated quadratic covariance.

The outputs of this kernel have the form

\[\frac{\text{d}^2y_j(t)}{\text{d}t^2} + C_j \frac{\text{d}y_j(t)}{\text{d}t} + B_j y_j(t) = \sum_{i=1}^R w_{j,i} u_i(t)\]

where \(R\) is the rank of the system, \(w_{j,i}\) is the sensitivity of the \(j\)th output to the \(i\)th latent function, and the latent functions \(u_i(t)\) are independent Gaussian processes governed by an exponentiated quadratic covariance.

Parameters:
  • output_dim (int) – number of outputs driven by latent function.
  • W (ndarray (output_dim x rank)) – sensitivities of each output to the latent driving function.
  • rank (int) – If rank is greater than 1 then there are assumed to be a total of rank latent forces independently driving the system, each with identical covariance.
  • C (array of length output_dim) – damper constant for the second order system.
  • B (array of length output_dim) – spring constant for the second order system.
K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.grid_kerns module
class GridKern(input_dim, variance, lengthscale, ARD, active_dims, name, originalDimensions, useGPU=False)[source]

Bases: GPy.kern.src.stationary.Stationary

dKd_dLen(X, dimension, lengthscale, X2=None)[source]

Derivative of Kernel function wrt lengthscale applied on inputs X and X2. In the stationary case there is an inner function depending on the distances from X to X2, called r.

dKd_dLen(X, X2) = dKdLen_of_r((X-X2)**2)

dKd_dVar(X, X2=None)[source]

Derivative of Kernel function wrt variance applied on inputs X and X2. In the stationary case there is an inner function depending on the distances from X to X2, called r.

dKd_dVar(X, X2) = dKdVar_of_r((X-X2)**2)

class GridRBF(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='gridRBF', originalDimensions=1, useGPU=False)[source]

Bases: GPy.kern.src.grid_kerns.GridKern

Similar to the regular RBF kernel but supplemented with methods required for Gaussian grid regression. Radial Basis Function kernel, aka squared-exponential, exponentiated quadratic or Gaussian kernel:

\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg)\]
K_of_r(r)[source]
dK_dr(r)[source]
dKdLen_of_r(r, dimCheck, lengthscale)[source]

Compute the derivative of the kernel for a dimension wrt the lengthscale. The computation of the derivative changes when the lengthscale corresponds to the dimension of the kernel whose derivative is being computed.

dKdVar_of_r(r)[source]

Compute derivative of kernel wrt variance

GPy.kern.src.independent_outputs module
class Hierarchical(kernels, name='hierarchy')[source]

Bases: GPy.kern.src.kern.CombinationKernel

A kernel which can represent a simple hierarchical model.

See Hensman et al 2013, “Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters” http://www.biomedcentral.com/1471-2105/14/252

To construct this kernel, you must pass a list of kernels. The first kernel will be assumed to be the ‘base’ kernel, and will be computed everywhere. For every additional kernel, we assume another layer in the hierarchy, with a corresponding column of the input matrix which indexes which function the data are in at that level.

For more, see the ipython notebook documentation on Hierarchical covariances.

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

class IndependentOutputs(kernels, index_dim=-1, name='independ')[source]

Bases: GPy.kern.src.kern.CombinationKernel

A kernel which can represent several independent functions. This kernel ‘switches off’ parts of the matrix where the output indexes are different.

The index of the functions is given by the last column in the input X; the rest of the columns of X are passed to the underlying kernel for computation (in blocks).

Parameters:kernels – either a kernel, or a list of kernels to work with. If it is a list of kernels, the indices in the index_dim index the kernels you gave!
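A hedged sketch: the last column of X carries the function index, and a single RBF kernel is shared by both independent functions. The augmented-input layout is an assumption for the example.

X_aug = np.hstack([X, np.random.randint(0, 2, (X.shape[0], 1))])  # append index column (assumed layout)
k = GPy.kern.IndependentOutputs(GPy.kern.RBF(1))
K = k.K(X_aug)   # covariance is switched off between different function indices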

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.integral module
class Integral(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]

Bases: GPy.kern.src.kern.Kern

Integral kernel between…

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

I’ve used the fact that we call this method for K_ff when finding the covariance as a hack so I know if I should return K_ff or K_xx. In this case we’re returning K_ff!! $K_{ff}^{post} = K_{ff} - K_{fx} K_{xx}^{-1} K_{xf}$

dk_dl(t, tprime, l)[source]
g(z)[source]
h(z)[source]
k_ff(t, tprime, l)[source]
k_xf(t, tprime, l)[source]
k_xx(t, tprime, l)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.integral_limits module
class Integral_Limits(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]

Bases: GPy.kern.src.kern.Kern

Integral kernel. This kernel allows 1d histogram or binned data to be modelled. The outputs are the counts in each bin. The inputs (on two dimensions) are the start and end points of each bin. The kernel’s predictions are the latent function which might have generated those binned results.
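A hedged sketch for binned 1D data. Whether the first input column holds the bin end or the bin start is an assumption here, and the class is assumed to be exported at the GPy.kern level (otherwise import it from GPy.kern.src.integral_limits).

X_bins = np.array([[1., 0.], [2., 1.], [3., 2.]])   # (end, start) of each bin -- ordering assumed
Y_counts = np.array([[3.], [7.], [4.]])             # counts per bin
k = GPy.kern.Integral_Limits(input_dim=2)
m = GPy.models.GPRegression(X_bins, Y_counts, k)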

K(X, X2=None)[source]
Note: We have a latent function and an output function. We want to be able to find:
  • the covariance between values of the output function
  • the covariance between values of the latent function
  • the “cross covariance” between values of the output function and the latent function

This method is used by GPy to either get the covariance between the outputs (K_xx) or to get the cross covariance between the latent function and the outputs (K_xf). We take advantage of the places where this function is used:

  • if X2 is None, then we know that the items being compared (to get the covariance for) are going to be both from the OUTPUT FUNCTION.
  • if X2 is not None, then we know that the items being compared are from two different sets (the OUTPUT FUNCTION and the LATENT FUNCTION).

If we want the covariance between values of the LATENT FUNCTION, we take advantage of the fact that we only need that when we do prediction, and this only calls Kdiag (not K). So the covariance between LATENT FUNCTIONS is available from Kdiag.

Kdiag(X)[source]

I’ve used the fact that we call this method during prediction (instead of K). When we do prediction we want to know the covariance between LATENT FUNCTIONS (K_ff) (as that’s probably what the user wants). $K_{ff}^{post} = K_{ff} - K_{fx} K_{xx}^{-1} K_{xf}$

dk_dl(t, tprime, s, sprime, l)[source]
g(z)[source]
h(z)[source]
k_ff(t, tprime, l)[source]

Doesn’t need s or sprime as we’re looking at the ‘derivatives’, so no domains over which to integrate are required

k_xf(t, tprime, s, l)[source]

Covariance between the gradient (latent value) and the actual (observed) value.

Note that sprime isn’t actually used in this expression, presumably because the ‘primes’ are the gradient (latent) values which don’t involve an integration, and thus there is no domain over which they’re integrated, just a single value that we want.

k_xx(t, tprime, s, sprime, l)[source]

Covariance between observed values.

s and t are one domain of the integral (i.e. the integral between s and t) sprime and tprime are another domain of the integral (i.e. the integral between sprime and tprime)

We’re interested in how correlated these two integrals are.

Note: We’ve not multiplied by the variance, this is done in K.

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.kern module
class CombinationKernel(kernels, name, extra_dims=[], link_parameters=True)[source]

Bases: GPy.kern.src.kern.Kern

Abstract super class for combination kernels. A combination kernel combines (a list of) kernels and works on those. Examples are the HierarchicalKernel or Add and Prod kernels.

Parameters:
  • kernels (list) – List of kernels to combine (can be only one element)
  • name (str) – name of the combination kernel
  • extra_dims (array-like) – if needed extra dimensions for the combination kernel to work on
input_sensitivity(summarize=True)[source]

If summarize is true, we want to get the summarized view of the sensitivities, otherwise put everything into an array with shape (#kernels, input_dim) in the order of appearance of the kernels in the parameterized object.

parts
class Kern(input_dim, active_dims, name, useGPU=False, *a, **kw)[source]

Bases: GPy.core.parameterization.parameterized.Parameterized

The base class for a kernel: a positive definite function which forms the covariance function (kernel) of a GP.

input_dim:

is the number of dimensions to work on. Make sure to give the tight dimensionality of inputs. You most likely want this to be the integer telling the number of input dimensions of the kernel.

active_dims:

is the active dimensions of the inputs X we will work on. All kernels will get sliced Xes as inputs, if _all_dims_active is not None. Only positive integers are allowed in active_dims! If active_dims is None, slicing is switched off and all X will be passed through as given.
Parameters:
  • input_dim (int) – the number of input dimensions to the function
  • active_dims (array-like|None) – list of indices on which dimensions this kernel works on, or none if no slicing

Do not instantiate.
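Kern itself is abstract, but its slicing behaviour can be illustrated with a concrete subclass; a hedged sketch with an assumed two-column input:

X2d = np.random.randn(5, 2)                      # assumed two-column input
k = GPy.kern.RBF(input_dim=1, active_dims=[0])   # the kernel only sees column 0
K = k.K(X2d)                                     # X2d is sliced to X2d[:, [0]] internally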

K(X, X2)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
add(other, name='sum')[source]

Add another kernel to this one.

Parameters:other (GPy.kern) – the other kernel to be added
static from_dict(input_dict)[source]

Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. If it is needed, please override _build_from_input_dict instead.

Parameters:input_dict (dict) – Dictionary with all the information needed to instantiate the object.
get_most_significant_input_dimensions(which_indices=None)[source]

Determine which dimensions should be plotted

Returns the top three most significant input dimensions

If there are fewer than three dimensions, the non-existing dimensions are labeled as None, so for a 1-dimensional input this returns (0, None, None).

Parameters:which_indices (int or tuple(int,int) or tuple(int,int,int)) – force the indices to be the given indices.
gradients_X(dL_dK, X, X2)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
gradients_XX(dL_dK, X, X2, cov=True)[source]
\[\frac{\partial^2 L}{\partial X\partial X_2} = \frac{\partial L}{\partial K}\frac{\partial^2 K}{\partial X\partial X_2}\]
gradients_XX_diag(dL_dKdiag, X, cov=True)[source]

The diagonal of the second derivative w.r.t. X and X2

gradients_X_X2(dL_dK, X, X2)[source]
gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

gradients_Z_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior, psi0=None, psi1=None, psi2=None)[source]

Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.

gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel

input_sensitivity(summarize=True)[source]

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

plot(*args, **kwargs)
plot_ARD(filtering=None, legend=False, canvas=None, **kwargs)

If an ARD kernel is present, plot a bar representation using matplotlib

Parameters:
  • fignum – figure number of the plot
  • filtering (list of names to use for ARD plot) – list of names, which to use for plotting ARD parameters. Only kernels which match names in the list of names in filtering will be used for plotting.
plot_covariance(x=None, label=None, plot_limits=None, visible_dims=None, resolution=None, projection='2d', levels=20, **kwargs)

Plot a kernel covariance w.r.t. another x.

Parameters:
  • x (array-like) – the value to use for the other kernel argument (kernels are a function of two variables!)
  • plot_limits (Either (xmin, xmax) for 1D or (xmin, xmax, ymin, ymax) / ((xmin, xmax), (ymin, ymax)) for 2D) – the range over which to plot the kernel
  • visible_dims (array-like) – input dimensions (!) to use for x. Make sure to select 2 or less dimensions to plot.
  • projection ({2d|3d}) – What projection shall we use to plot the kernel?
  • levels (int) – for 2D projection, how many levels for the contour plot to use?
  • kwargs – valid kwargs for your specific plotting library
Resolution:

the resolution of the lines used in plotting. for 2D this defines the grid for kernel evaluation.

prod(other, name='mul')[source]

Multiply two kernels (either on the same space, or on the tensor product of the input space).

Parameters:other (GPy.kern) – the other kernel to be multiplied
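For example, a hedged sketch; this is equivalent to the * operator:

k_prod = GPy.kern.RBF(1).prod(GPy.kern.Linear(1))   # same as GPy.kern.RBF(1) * GPy.kern.Linear(1)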
psi0(Z, variational_posterior)[source]
\[\psi_0 = \sum_{i=0}^{n}E_{q(X)}[k(X_i, X_i)]\]
psi1(Z, variational_posterior)[source]
\[\psi_1^{n,m} = E_{q(X)}[k(X_n, Z_m)]\]
psi2(Z, variational_posterior)[source]
\[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
psi2n(Z, variational_posterior)[source]
\[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2

reset_gradients()[source]
to_dict()[source]
update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.kernel_slice_operations module

Created on 11 Mar 2014

@author: @mzwiessele

This module provides a meta class for the kernels. The meta class is for slicing the inputs (X, X2) for the kernels, before K (or any other method involving X) gets called. The _all_dims_active of a kernel decides which dimensions the kernel works on.

class KernCallsViaSlicerMeta[source]

Bases: paramz.parameterized.ParametersChangedMeta

put_clean(dct, name, func)[source]
GPy.kern.src.linear module
class Linear(input_dim, variances=None, ARD=False, active_dims=None, name='linear')[source]

Bases: GPy.kern.src.kern.Kern

Linear kernel

\[k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i x_iy_i\]
Parameters:
  • input_dim (int) – the number of input dimensions
  • variances (array or list of the appropriate size (or float if there is only one variance parameter)) – the vector of variances \(\sigma^2_i\)
  • ARD (Boolean) – Auto Relevance Determination. If False, the kernel has only one variance parameter sigma^2, otherwise there is one variance parameter per dimension.
Return type:

kernel object
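A hedged sketch showing the ARD variant, which carries one variance per input dimension:

k = GPy.kern.Linear(input_dim=2, ARD=True)
print(k.variances)                   # two variance parameters, one per input dimension
K = k.K(np.random.randn(10, 2))      # assumed toy inputs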

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_XX(dL_dK, X, X2=None)[source]

Given the derivative of the objective K(dL_dK), compute the second derivative of K wrt X and X2:

returns the full covariance matrix [QxQ] of the input dimension for each pair of vectors, thus the returned array is of shape [NxNxQxQ].

\[\frac{\partial^2 K}{\partial X_2^2} = - \frac{\partial^2 K}{\partial X\partial X_2}\]

Returns:
dL2_dXdX2: [NxMxQxQ] for X [NxQ] and X2 [MxQ] (X2 is X if X2 is None). Thus, we return the second derivative in X2.
gradients_XX_diag(dL_dKdiag, X)[source]

The diagonal of the second derivative w.r.t. X and X2

gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

gradients_Z_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.

gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel

input_sensitivity(summarize=True)[source]

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

psi0(Z, variational_posterior)[source]
\[\psi_0 = \sum_{i=0}^{n}E_{q(X)}[k(X_i, X_i)]\]

psi1(Z, variational_posterior)[source]
\[\psi_1^{n,m} = E_{q(X)}[k(X_n, Z_m)]\]

psi2(Z, variational_posterior)[source]
\[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]
\[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2

to_dict()[source]
update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

class LinearFull(input_dim, rank, W=None, kappa=None, active_dims=None, name='linear_full')[source]

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.mlp module
class MLP(input_dim, variance=1.0, weight_variance=1.0, bias_variance=1.0, ARD=False, active_dims=None, name='mlp')[source]

Bases: GPy.kern.src.kern.Kern

Multi layer perceptron kernel (also known as arc sine kernel or neural network kernel)

\[k(x,y) = \sigma^{2}\frac{2}{\pi } \text{asin} \left ( \frac{ \sigma_w^2 x^\top y+\sigma_b^2}{\sqrt{\sigma_w^2x^\top x + \sigma_b^2 + 1}\sqrt{\sigma_w^2 y^\top y + \sigma_b^2 +1}} \right )\]
Parameters:
  • input_dim (int) – the number of input dimensions
  • variance (float) – the variance \(\sigma^2\)
  • weight_variance (array or list of the appropriate size (or float if there is only one weight variance parameter)) – the vector of the variances of the prior over input weights in the neural network \(\sigma^2_w\)
  • bias_variance – the variance of the prior over bias parameters \(\sigma^2_b\)
  • ARD (Boolean) – Auto Relevance Determination. If equal to “False”, the kernel is isotropic (ie. one weight variance parameter sigma^2_w), otherwise there is one weight variance parameter per dimension.
Return type:

Kernpart object
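A hedged sketch of using the MLP kernel in a regression model, with X and Y as in the earlier toy example:

k = GPy.kern.MLP(input_dim=1, variance=1.0, weight_variance=2.0, bias_variance=0.5)
m = GPy.models.GPRegression(X, Y, k)
m.optimize()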

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

Compute the diagonal of the covariance matrix for X.

gradients_X(dL_dK, X, X2)[source]

Derivative of the covariance matrix with respect to X

gradients_X_X2(dL_dK, X, X2)[source]

Derivative of the covariance matrix with respect to X

gradients_X_diag(dL_dKdiag, X)[source]

Gradient of diagonal of covariance with respect to X

update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Derivative of the covariance with respect to the parameters.

GPy.kern.src.multidimensional_integral_limits module
class Multidimensional_Integral_Limits(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]

Bases: GPy.kern.src.kern.Kern

Integral kernel, can include limits on each integral value. This kernel allows an n-dimensional histogram or binned data to be modelled. The outputs are the counts in each bin. The inputs are the start and end points of each bin: Pairs of inputs act as the limits on each bin. So inputs 4 and 5 provide the start and end values of each bin in the 3rd dimension. The kernel’s predictions are the latent function which might have generated those binned results.

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

I’ve used the fact that we call this method for K_ff when finding the covariance as a hack so I know if I should return K_ff or K_xx. In this case we’re returning K_ff!! $K_{ff}^{post} = K_{ff} - K_{fx} K_{xx}^{-1} K_{xf}$

calc_K_xx_wo_variance(X)[source]

Calculates K_xx without the variance term

dk_dl(t, tprime, s, sprime, l)[source]
g(z)[source]
h(z)[source]
k_ff(t, tprime, l)[source]

Doesn’t need s or sprime as we’re looking at the ‘derivatives’, so no domains over which to integrate are required

k_xf(t, tprime, s, l)[source]

Covariance between the gradient (latent value) and the actual (observed) value.

Note that sprime isn’t actually used in this expression, presumably because the ‘primes’ are the gradient (latent) values which don’t involve an integration, and thus there is no domain over which they’re integrated, just a single value that we want.

k_xx(t, tprime, s, sprime, l)[source]

Covariance between observed values.

s and t are one domain of the integral (i.e. the integral between s and t) sprime and tprime are another domain of the integral (i.e. the integral between sprime and tprime)

We’re interested in how correlated these two integrals are.

Note: We’ve not multiplied by the variance, this is done in K.

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.multioutput_derivative_kern module
class KernWrapper(fk, fug, fg, base_kern)[source]

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

gradient
class MultioutputDerivativeKern(kernels, cross_covariances={}, name='MultioutputDerivativeKern')[source]

Bases: GPy.kern.src.multioutput_kern.MultioutputKern

Multioutput derivative kernel is a meta class for combining different kernels for multioutput GPs. The multioutput derivative kernel is only a thin wrapper around the Multioutput kernel, so that the user does not have to define cross covariances.

GPy.kern.src.multioutput_kern module
class MultioutputKern(kernels, cross_covariances={}, name='MultioutputKern')[source]

Bases: GPy.kern.src.kern.CombinationKernel

Multioutput kernel is a meta class for combining different kernels for multioutput GPs.

As an example let us have inputs x1 for output 1 with covariance k1 and x2 for output 2 with covariance k2. In addition, we need to define the cross covariances k12(x1,x2) and k21(x2,x1). Then the kernel becomes: k([x1,x2],[x1,x2]) = [k1(x1,x1) k12(x1, x2); k21(x2, x1), k2(x2,x2)]

For the kernel, the kernels of outputs are given as list in param “kernels” and cross covariances are given in param “cross_covariances” as a dictionary of tuples (i,j) as keys. If no cross covariance is given, it defaults to zero, as in k12(x1,x2)=0.

In the cross covariance dictionary, the value needs to be a struct with elements:

  • ‘kernel’: a member of Kernel class that stores the hyper parameters to be updated when optimizing the GP
  • ‘K’: function defining the cross covariance
  • ‘update_gradients_full’: a function to be used for updating gradients
  • ‘gradients_X’: gives a gradient of the cross covariance with respect to the first input
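A hedged sketch: two outputs with their own kernels and the default zero cross covariance (assuming the class is exported as GPy.kern.MultioutputKern; otherwise import it from GPy.kern.src.multioutput_kern):

k = GPy.kern.MultioutputKern(kernels=[GPy.kern.RBF(1), GPy.kern.Matern32(1)])
# With zero cross covariances, k([x1,x2],[x1,x2]) is block diagonal: [k1(x1,x1) 0; 0 k2(x2,x2)]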

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

reset_gradients()[source]
update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

class ZeroKern[source]

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

gradient
GPy.kern.src.periodic module
class Periodic(input_dim, variance, lengthscale, period, n_freq, lower, upper, active_dims, name)[source]

Bases: GPy.kern.src.kern.Kern

Parameters:
  • variance (float) – the variance of the Matern kernel
  • lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
  • period (float) – the period
  • n_freq (int) – the number of frequencies considered for the periodic subspace
Return type:

kernel object

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
class PeriodicExponential(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_exponential')[source]

Bases: GPy.kern.src.periodic.Periodic

Kernel of the periodic subspace (up to a given frequency) of a exponential (Matern 1/2) RKHS.

Only defined for input_dim=1.

Gram_matrix()[source]
parameters_changed()[source]

This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See :py:function:paramz.param.Observable.add_observer

update_gradients_full(dL_dK, X, X2=None)[source]

derivative of the covariance matrix with respect to the parameters (shape is N x num_inducing x num_params)

class PeriodicMatern32(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_Matern32')[source]

Bases: GPy.kern.src.periodic.Periodic

Kernel of the periodic subspace (up to a given frequency) of a Matern 3/2 RKHS. Only defined for input_dim=1.

Parameters:
  • input_dim (int) – the number of input dimensions
  • variance (float) – the variance of the Matern kernel
  • lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
  • period (float) – the period
  • n_freq (int) – the number of frequencies considered for the periodic subspace
Return type:

kernel object

Gram_matrix()[source]
parameters_changed()[source]

This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See :py:function:paramz.param.Observable.add_observer

update_gradients_full(dL_dK, X, X2)[source]

derivative of the covariance matrix with respect to the parameters (shape is num_data x num_inducing x num_params)

class PeriodicMatern52(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_Matern52')[source]

Bases: GPy.kern.src.periodic.Periodic

Kernel of the periodic subspace (up to a given frequency) of a Matern 5/2 RKHS. Only defined for input_dim=1.

Parameters:
  • input_dim (int) – the number of input dimensions
  • variance (float) – the variance of the Matern kernel
  • lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
  • period (float) – the period
  • n_freq (int) – the number of frequencies considered for the periodic subspace
Return type:

kernel object

Gram_matrix()[source]
parameters_changed()[source]

This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See :py:function:paramz.param.Observable.add_observer

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.poly module
class Poly(input_dim, variance=1.0, scale=1.0, bias=1.0, order=3.0, active_dims=None, name='poly')[source]

Bases: GPy.kern.src.kern.Kern

Polynomial kernel

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.prod module
class Prod(kernels, name='mul')[source]

Bases: GPy.kern.src.kern.CombinationKernel

Computes the product of 2 kernels

Parameters:k1, k2 – the kernels to multiply
Return type:kernel object
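As an illustrative sketch (standard GPy usage), a product kernel is normally created with the * operator, which builds a Prod instance internally:

import GPy

k = GPy.kern.RBF(1) * GPy.kern.Matern32(1)   # k is a Prod kernel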
K(X, X2=None, which_parts=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X, which_parts=None)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

input_sensitivity(summarize=True)[source]

If summarize is true, we want to get the summarized view of the sensitivities; otherwise put everything into an array with shape (#kernels, input_dim) in the order of appearance of the kernels in the parameterized object.

sde()[source]
sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

dkron(A, dA, B, dB, operation='prod')[source]

Function computes the derivative of Kronecker product A*B (or Kronecker sum A+B).

A: 2D matrix
    Some matrix
dA: 3D (or 2D) matrix
    Derivatives of A
B: 2D matrix
    Some matrix
dB: 3D (or 2D) matrix
    Derivatives of B
operation: str, ‘prod’ or ‘sum’
    Which operation is considered. If the operation is ‘sum’, it is assumed that A and B are square matrices.
Output:
dC: 3D matrix
    Derivative of the Kronecker product A*B (or Kronecker sum A+B)
numpy_invalid_op_as_exception(func)[source]

A decorator that allows catching numpy invalid operations as exceptions (the default behaviour is raising warnings).

GPy.kern.src.rbf module
class RBF(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='rbf', useGPU=False, inv_l=False)[source]

Bases: GPy.kern.src.stationary.Stationary

Radial Basis Function kernel, aka squared-exponential, exponentiated quadratic or Gaussian kernel:

\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg)\]
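For illustration (a minimal sketch with assumed example values), an RBF kernel with one lengthscale per input dimension can be constructed by enabling ARD:

import GPy

# Two-dimensional inputs, separate lengthscales per dimension (ARD).
kern = GPy.kern.RBF(input_dim=2, variance=1.0, lengthscale=[1.0, 0.5], ARD=True)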
K_of_r(r)[source]
dK2_dXdX2(X, X2, dimX, dimX2)[source]
dK2_dlengthscaledX(X, X2, dimX)[source]
dK2_dlengthscaledX2(X, X2, dimX2)[source]
dK2_drdr(r)[source]
dK2_drdr_diag()[source]

Second order derivative of K in r_{i,i}. The diagonal entries are always zero, so we do not give it here.

dK2_dvariancedX(X, X2, dim)[source]
dK2_dvariancedX2(X, X2, dim)[source]
dK3_dlengthscaledXdX2(X, X2, dimX, dimX2)[source]
dK3_dvariancedXdX2(X, X2, dim, dimX2)[source]
dK_dX(X, X2, dimX)[source]
dK_dX2(X, X2, dimX2)[source]
dK_dr(r)[source]
dK_dvariance(X, X2)[source]
get_one_dimensional_kernel(dim)[source]

Specially intended for Grid regression.

gradients_Z_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.

gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel

parameters_changed()[source]

This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See :py:function:paramz.param.Observable.add_observer

psi0(Z, variational_posterior)[source]
\[\psi_0 = \sum_{i=0}^{n}E_{q(X)}[k(X_i, X_i)]\]

psi1(Z, variational_posterior)[source]
\[\psi_1^{n,m} = E_{q(X)}[k(X_n, Z_m)]\]

psi2(Z, variational_posterior)[source]
\[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]
\[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2

spectrum(omega)[source]
to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
update_gradients_diag(dL_dKdiag, X)[source]

Given the derivative of the objective with respect to the diagonal of the covariance matrix, compute the derivative wrt the parameters of this kernel and store it in the <parameter>.gradient field.

See also update_gradients_full

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2=None)[source]

Given the derivative of the objective wrt the covariance matrix (dL_dK), compute the gradient wrt the parameters of this kernel, and store in the parameters object as e.g. self.variance.gradient

GPy.kern.src.sde_brownian module

Classes in this module enhance Brownian motion covariance function with the Stochastic Differential Equation (SDE) functionality.

class sde_Brownian(input_dim=1, variance=1.0, active_dims=None, name='Brownian')[source]

Bases: GPy.kern.src.brownian.Brownian

This class provides extra functionality for transferring this covariance function into SDE form.

Brownian motion kernel:

\[k(x,y) = \sigma^2 \min(x,y)\]
sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

GPy.kern.src.sde_linear module

Classes in this module enhance Linear covariance function with the Stochastic Differential Equation (SDE) functionality.

class sde_Linear(input_dim, X, variances=None, ARD=False, active_dims=None, name='linear')[source]

Bases: GPy.kern.src.linear.Linear

This class provides extra functionality for transferring this covariance function into SDE form.

Linear kernel:

\[k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i x_i y_i\]

The init method is modified because one extra parameter is required: X, the points on the X axis.

sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

GPy.kern.src.sde_matern module

Classes in this module enhance Matern covariance functions with the Stochastic Differential Equation (SDE) functionality.

class sde_Matern32(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat32')[source]

Bases: GPy.kern.src.stationary.Matern32

This class provides extra functionality for transferring this covariance function into SDE form.

Matern 3/2 kernel:

\[k(r) = \sigma^2 (1 + \sqrt{3} r) \exp(- \sqrt{3} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]

sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

class sde_Matern52(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat52')[source]

Bases: GPy.kern.src.stationary.Matern52

This class provides extra functionality for transferring this covariance function into SDE form.

Matern 5/2 kernel:

\[k(r) = \sigma^2 \left(1 + \sqrt{5} r + \frac{5}{3}r^2\right) \exp(- \sqrt{5} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]

sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

GPy.kern.src.sde_standard_periodic module

Classes in this module enhance the standard periodic covariance function with the Stochastic Differential Equation (SDE) functionality.

class sde_StdPeriodic(*args, **kwargs)[source]

Bases: GPy.kern.src.standard_periodic.StdPeriodic

This class provides extra functionality for transferring this covariance function into SDE form.

Standard Periodic kernel:

\[k(x,y) = \theta_1 \exp \left[ - \frac{1}{2} \sum_{i=1}^{\text{input_dim}} \left( \frac{\sin\big( \frac{\pi}{\lambda_i} (x_i - y_i) \big)}{l_i} \right)^2 \right]\]

Init constructor.

Two optional extra parameters are added in addition to the ones in the StdPeriodic kernel.

Parameters:
  • approx_order (int) – approximation order for the RBF covariance. (Default 7)
  • balance (bool) – Whether to balance this kernel separately. (Default False.) The model has a separate parameter for balancing.
sde()[source]

Return the state space representation of the standard periodic covariance.

! Note: one must constrain the lengthscale not to drop below 0.2 (independently of the approximation order). Below this, Bessel functions of the first kind become NaN. Rescaling the time variable might help.

! Note: the period must also not be set too low, because then the gradients wrt the wavelength become unstable. This might depend on the data; for a test example with 300 data points the lower limit is 0.15.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

seriescoeff(m=6, lengthScale=1.0, magnSigma2=1.0, true_covariance=False)[source]

Calculate the coefficients q_j^2 for the covariance function approximation:

\[k(\tau) = \sum_{j=0}^{+\infty} q_j^2 \cos(j\omega_0 \tau)\]

Reference is:

[1] Arno Solin and Simo Särkkä (2014). Explicit link between periodic
covariance functions and state space models. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS 2014). JMLR: W&CP, volume 33.
Note! Only the infinite approximation (through Bessel function)
is currently implemented.
m: int
Degree of approximation. Default 6.
lengthScale: float
Length scale parameter in the kernel
magnSigma2:float
Multiplier in front of the kernel.
coeffs: array(m+1)
Covariance series coefficients
coeffs_dl: array(m+1)
Derivatives of the coefficients with respect to lengthscale.
GPy.kern.src.sde_static module

Classes in this module enhance Static covariance functions with the Stochastic Differential Equation (SDE) functionality.

class sde_Bias(input_dim, variance=1.0, active_dims=None, name='bias')[source]

Bases: GPy.kern.src.static.Bias

This class provides extra functionality for transferring this covariance function into SDE form.

Bias kernel:

\[k(x,y) = \alpha\]
sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

class sde_White(input_dim, variance=1.0, active_dims=None, name='white')[source]

Bases: GPy.kern.src.static.White

This class provides extra functionality for transferring this covariance function into SDE form.

White kernel:

\[k(x,y) = \alpha\,\delta(x-y)\]
sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

GPy.kern.src.sde_stationary module

Classes in this module enhance several stationary covariance functions with the Stochastic Differential Equation (SDE) functionality.

class sde_Exponential(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Exponential')[source]

Bases: GPy.kern.src.stationary.Exponential

This class provides extra functionality for transferring this covariance function into SDE form.

Exponential kernel:

\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r \bigg) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]

sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

class sde_RBF(*args, **kwargs)[source]

Bases: GPy.kern.src.rbf.RBF

This class provides extra functionality for transferring this covariance function into SDE form.

Radial Basis Function kernel:

\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]

Init constructor.

Two optional extra parameters are added in addition to the ones in the RBF kernel.

Parameters:
  • approx_order (int) – approximation order for the RBF covariance. (Default 10)
  • balance (bool) – Whether to balance this kernel separately. (Default True.) The model has a separate parameter for balancing.
sde()[source]

Return the state space representation of the covariance.

Note! For sparse GP inference, too small or too high values of the lengthscale lead to instabilities. This is because Qc is too high or too low and P_inf is not full rank. This effect depends on the approximation order. For N = 10 the lengthscale must be in (0.8, 8); for other N, tests must be conducted. N=6: (0.06, 31). The variance should be within reasonable bounds as well, but its dependence is linear.

The above facts do not take into account regularization.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

class sde_RatQuad(input_dim, variance=1.0, lengthscale=None, power=2.0, ARD=False, active_dims=None, name='RatQuad')[source]

Bases: GPy.kern.src.stationary.RatQuad

This class provides extra functionality for transferring this covariance function into SDE form.

Rational Quadratic kernel:

\[k(r) = \sigma^2 \bigg( 1 + \frac{r^2}{2} \bigg)^{- \alpha} \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]

sde()[source]

Return the state space representation of the covariance.

GPy.kern.src.spline module
class Spline(input_dim, variance=1.0, c=1.0, active_dims=None, name='spline')[source]

Bases: GPy.kern.src.kern.Kern

Linear spline kernel. You need to specify two parameters: the variance and c. The variance is defined in powers of 10; thus specifying -2 means 10^-2. The parameter c allows one to define the stiffness of the spline fit; a very stiff spline equals linear regression. See https://www.youtube.com/watch?v=50Vgw11qn0o starting at minute 1:17:28. Lit: Wahba, 1990
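A minimal usage sketch (the data below is illustrative, assuming the constructor documented above):

import numpy as np
from GPy.kern.src.spline import Spline
import GPy

X = np.linspace(0, 1, 50)[:, None]
Y = np.sin(8 * X) + 0.1 * np.random.randn(50, 1)

kern = Spline(input_dim=1, variance=1.0, c=1.0)
m = GPy.models.GPRegression(X, Y, kern)
m.optimize()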

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.splitKern module

A new kernel

class DEtime(kernel, idx_p, Xp, index_dim=-1, name='DiffGenomeKern')[source]

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

class SplitKern(kernel, Xp, index_dim=-1, name='SplitKern')[source]

Bases: GPy.kern.src.kern.CombinationKernel

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

class SplitKern_cross(kernel, Xp, name='SplitKern_cross')[source]

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.standard_periodic module

The standard periodic kernel which mentioned in:

[1] Gaussian Processes for Machine Learning, C. E. Rasmussen, C. K. I. Williams. The MIT Press, 2005.

[2] Introduction to Gaussian processes. D. J. C. MacKay. In C. M. Bishop, editor, Neural Networks and Machine Learning, pages 133-165. Springer, 1998.

class StdPeriodic(input_dim, variance=1.0, period=None, lengthscale=None, ARD1=False, ARD2=False, active_dims=None, name='std_periodic', useGPU=False)[source]

Bases: GPy.kern.src.kern.Kern

Standard periodic kernel

\[k(x,y) = \theta_1 \exp \left[ - \frac{1}{2} \sum_{i=1}^{\text{input_dim}} \left( \frac{\sin\big( \frac{\pi}{T_i} (x_i - y_i) \big)}{l_i} \right)^2 \right]\]

Parameters:
  • input_dim (int) – the number of input dimensions
  • variance (float) – the variance \(\theta_1\) in the formula above
  • period (array or list of the appropriate size (or float if there is only one period parameter)) – the vector of periods \(T_i\). If None then 1.0 is assumed.
  • lengthscale (array or list of the appropriate size (or float if there is only one lengthscale parameter)) – the vector of lengthscales \(l_i\). If None then 1.0 is assumed.
  • ARD1 (Boolean) – Auto Relevance Determination with respect to period. If equal to “False” one single period parameter \(T_i\) is assumed for all dimensions, otherwise there is one period parameter per dimension.
  • ARD2 (Boolean) – Auto Relevance Determination with respect to lengthscale. If equal to “False” one single lengthscale parameter \(l_i\) is assumed for all dimensions, otherwise there is one lengthscale parameter per dimension.
  • active_dims (array or list of the appropriate size) – indices of dimensions which are used in the computation of the kernel
  • name (String) – name of the kernel for output
  • useGPU (Boolean) – whether or not to use GPU
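A minimal construction sketch (assumed example values, one-dimensional input):

import numpy as np
import GPy

kern = GPy.kern.StdPeriodic(input_dim=1, variance=1.0, period=2*np.pi, lengthscale=0.5)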

K(X, X2=None)[source]

Compute the covariance matrix between X and X2.

Kdiag(X)[source]

Compute the diagonal of the covariance matrix associated to X.

gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

input_sensitivity(summarize=True)[source]

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

parameters_changed()[source]

This function acts as a callback for each optimization iteration. If an optimization step was successful, this callback function is called so that any precomputations for the kernel can be updated.

to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
update_gradients_diag(dL_dKdiag, X)[source]

derivative of the diagonal of the covariance matrix with respect to the parameters.

update_gradients_full(dL_dK, X, X2=None)[source]

derivative of the covariance matrix with respect to the parameters.

GPy.kern.src.static module
class Bias(input_dim, variance=1.0, active_dims=None, name='bias')[source]

Bases: GPy.kern.src.static.Static

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
psi2(Z, variational_posterior)[source]
\[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]
\[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2

to_dict()[source]
update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

class Fixed(input_dim, covariance_matrix, variance=1.0, active_dims=None, name='fixed')[source]

Bases: GPy.kern.src.static.Static

Parameters:
  • input_dim (int) – the number of input dimensions
  • variance (float) – the variance of the kernel
K(X, X2)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
psi2(Z, variational_posterior)[source]
\[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]
\[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2

update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

class Precomputed(input_dim, covariance_matrix, variance=1.0, active_dims=None, name='precomputed')[source]

Bases: GPy.kern.src.static.Fixed

Class for precomputed kernels, indexed by columns in X

Usage example:

import numpy as np
from GPy.models import GPClassification
from GPy.kern import Precomputed
from sklearn.cross_validation import LeaveOneOut

n = 10
d = 100
X = np.arange(n).reshape((n,1))              # column vector of indices
y = 2*np.random.binomial(1,0.5,(n,1))-1
X0 = np.random.randn(n,d)
k = np.dot(X0,X0.T)
kern = Precomputed(1,k)                      # k is a n x n covariance matrix

cv = LeaveOneOut(n)
ypred = y.copy()
for train, test in cv:
    m = GPClassification(X[train], y[train], kernel=kern)
    m.optimize()
    ypred[test] = 2*(m.predict(X[test])[0]>0.5)-1
Parameters:
  • input_dim (int) – the number of input dimensions
  • variance (float) – the variance of the kernel
K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

class Static(input_dim, variance, active_dims, name)[source]

Bases: GPy.kern.src.kern.Kern

Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_XX(dL_dK, X, X2=None)[source]
\[\frac{\partial^2 L}{\partial X\partial X_2} = \frac{\partial L}{\partial K}\frac{\partial^2 K}{\partial X\partial X_2}\]

gradients_XX_diag(dL_dKdiag, X, cov=False)[source]

The diagonal of the second derivative w.r.t. X and X2

gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

gradients_Z_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.

gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel

input_sensitivity(summarize=True)[source]

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

psi0(Z, variational_posterior)[source]
\[\psi_0 = \sum_{i=0}^{n}E_{q(X)}[k(X_i, X_i)]\]

psi1(Z, variational_posterior)[source]
\[\psi_1^{n,m} = E_{q(X)}[k(X_n, Z_m)]\]

psi2(Z, variational_posterior)[source]
\[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

class White(input_dim, variance=1.0, active_dims=None, name='white')[source]

Bases: GPy.kern.src.static.Static

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
psi2(Z, variational_posterior)[source]
\[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]
\[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2

to_dict()[source]
update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

class WhiteHeteroscedastic(input_dim, num_data, variance=1.0, active_dims=None, name='white_hetero')[source]

Bases: GPy.kern.src.static.Static

A heteroscedastic White kernel (nugget/noise). It defines one variance (nugget) per input sample.

Prediction excludes any noise learnt by this Kernel, so be careful using this kernel.

You can plot the errors learnt by this kernel by something similar as: plt.errorbar(m.X, m.Y, yerr=2*np.sqrt(m.kern.white.variance))
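A minimal sketch (assumed example data) of adding per-datapoint noise to a regression model; num_data must match the number of training samples:

import numpy as np
import GPy
from GPy.kern.src.static import WhiteHeteroscedastic

X = np.random.rand(20, 1)
Y = np.sin(6 * X) + 0.05 * np.random.randn(20, 1)

# One noise variance per training point, on top of an RBF kernel.
kern = GPy.kern.RBF(1) + WhiteHeteroscedastic(input_dim=1, num_data=X.shape[0])
m = GPy.models.GPRegression(X, Y, kern)
m.optimize()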

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
psi2(Z, variational_posterior)[source]
\[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]
\[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2

to_dict()[source]
update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.stationary module
class Cosine(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Cosine')[source]

Bases: GPy.kern.src.stationary.Stationary

Cosine Covariance function

\[k(r) = \sigma^2 \cos(r)\]
K_of_r(r)[source]
dK_dr(r)[source]
class ExpQuad(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='ExpQuad')[source]

Bases: GPy.kern.src.stationary.Stationary

The Exponentiated quadratic covariance function.

\[k(r) = \sigma^2 \exp(- 0.5 r^2)\]
notes::
  • This is exactly the same as the RBF covariance function, but the RBF implementation also has some features for doing variational kernels (the psi-statistics).
K_of_r(r)[source]
dK_dr(r)[source]
to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
class ExpQuadCosine(input_dim, variance=1.0, lengthscale=None, period=1.0, ARD=False, active_dims=None, name='ExpQuadCosine')[source]

Bases: GPy.kern.src.stationary.Stationary

Exponentiated quadratic multiplied by cosine covariance function (spectral mixture kernel).

\[k(r) = \sigma^2 \exp(-2\pi^2r^2)\cos(2\pi r/T)\]
K_of_r(r)[source]
dK_dr(r)[source]
update_gradients_diag(dL_dKdiag, X)[source]

Given the derivative of the objective with respect to the diagonal of the covariance matrix, compute the derivative wrt the parameters of this kernel and store it in the <parameter>.gradient field.

See also update_gradients_full

update_gradients_full(dL_dK, X, X2=None)[source]

Given the derivative of the objective wrt the covariance matrix (dL_dK), compute the gradient wrt the parameters of this kernel, and store in the parameters object as e.g. self.variance.gradient

class Exponential(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Exponential')[source]

Bases: GPy.kern.src.stationary.Stationary

K_of_r(r)[source]
dK_dr(r)[source]
to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
class Matern32(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat32')[source]

Bases: GPy.kern.src.stationary.Stationary

Matern 3/2 kernel:

\[k(r) = \sigma^2 (1 + \sqrt{3} r) \exp(- \sqrt{3} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
Gram_matrix(F, F1, F2, lower, upper)[source]

Return the Gram matrix of the vector of functions F with respect to the RKHS norm. The use of this function is limited to input_dim=1.

Parameters:
  • F (np.array) – vector of functions
  • F1 (np.array) – vector of derivatives of F
  • F2 (np.array) – vector of second derivatives of F
  • lower,upper (floats) – boundaries of the input domain
K_of_r(r)[source]
dK_dr(r)[source]
sde()[source]

Return the state space representation of the covariance.

to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
class Matern52(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat52')[source]

Bases: GPy.kern.src.stationary.Stationary

Matern 5/2 kernel:

\[k(r) = \sigma^2 (1 + \sqrt{5} r + \frac53 r^2) \exp(- \sqrt{5} r)\]
Gram_matrix(F, F1, F2, F3, lower, upper)[source]

Return the Gram matrix of the vector of functions F with respect to the RKHS norm. The use of this function is limited to input_dim=1.

Parameters:
  • F (np.array) – vector of functions
  • F1 (np.array) – vector of derivatives of F
  • F2 (np.array) – vector of second derivatives of F
  • F3 (np.array) – vector of third derivatives of F
  • lower,upper (floats) – boundaries of the input domain
K_of_r(r)[source]
dK_dr(r)[source]
to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
class OU(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='OU')[source]

Bases: GPy.kern.src.stationary.Stationary

OU kernel:

\[k(r) = \sigma^2 \exp(- r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
K_of_r(r)[source]
dK_dr(r)[source]
to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
class RatQuad(input_dim, variance=1.0, lengthscale=None, power=2.0, ARD=False, active_dims=None, name='RatQuad')[source]

Bases: GPy.kern.src.stationary.Stationary

Rational Quadratic Kernel

\[k(r) = \sigma^2 \bigg( 1 + \frac{r^2}{2} \bigg)^{- \alpha}\]
K_of_r(r)[source]
dK_dr(r)[source]
to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
update_gradients_diag(dL_dKdiag, X)[source]

Given the derivative of the objective with respect to the diagonal of the covariance matrix, compute the derivative wrt the parameters of this kernel and store it in the <parameter>.gradient field.

See also update_gradients_full

update_gradients_full(dL_dK, X, X2=None)[source]

Given the derivative of the objective wrt the covariance matrix (dL_dK), compute the gradient wrt the parameters of this kernel, and store in the parameters object as e.g. self.variance.gradient

class Sinc(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Sinc')[source]

Bases: GPy.kern.src.stationary.Stationary

Sinc Covariance function

\[k(r) = \sigma^2 \operatorname{sinc}(\pi r)\]
K_of_r(r)[source]
dK_dr(r)[source]
class Stationary(input_dim, variance, lengthscale, ARD, active_dims, name, useGPU=False)[source]

Bases: GPy.kern.src.kern.Kern

Stationary kernels (covariance functions).

Stationary covariance functions depend only on r, where r is defined as

\[r(x, x') = \sqrt{ \sum_{q=1}^Q (x_q - x'_q)^2 }\]

The covariance function k(x, x’) can then be written as k(r).

In this implementation, r is scaled by the lengthscales parameter(s):

\[r(x, x') = \sqrt{ \sum_{q=1}^Q \frac{(x_q - x'_q)^2}{\ell_q^2} }.\]

By default, there’s only one lengthscale; separate lengthscales for each dimension can be enabled by setting ARD=True.

To implement a stationary covariance function using this class, one need only define the covariance function k(r) and its derivative:

def K_of_r(self, r):
    return foo

def dK_dr(self, r):
    return bar

The lengthscale(s) and variance parameters are added to the structure automatically.
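As a concrete sketch of this pattern (the class name MyOU and the exponential form are illustrative, not an existing GPy kernel), a stationary covariance can be implemented by subclassing Stationary and supplying only K_of_r and dK_dr:

import numpy as np
from GPy.kern.src.stationary import Stationary

class MyOU(Stationary):
    """Illustrative kernel k(r) = variance * exp(-r)."""
    def __init__(self, input_dim, variance=1.0, lengthscale=None, ARD=False,
                 active_dims=None, name='my_ou'):
        super(MyOU, self).__init__(input_dim, variance, lengthscale, ARD,
                                   active_dims, name)

    def K_of_r(self, r):
        # covariance as a function of the scaled distance r
        return self.variance * np.exp(-r)

    def dK_dr(self, r):
        # derivative of K wrt r, used by the gradient machinery of Stationary
        return -self.variance * np.exp(-r)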

Thanks to @strongh: In Stationary, a covariance function is defined in GPy as stationary when it depends only on the l2-norm |x_1 - x_2 |. However this is the typical definition of isotropy, while stationarity is usually a bit more relaxed. The more common version of stationarity is that the covariance is a function of x_1 - x_2 (See e.g. R&W first paragraph of section 4.1).

K(X, X2=None)[source]

Kernel function applied on inputs X and X2. In the stationary case there is an inner function depending on the distances from X to X2, called r.

K(X, X2) = K_of_r((X-X2)**2)

K_of_r(r)[source]
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
dK2_drdr(r)[source]
dK2_drdr_diag()[source]

Second order derivative of K in r_{i,i}. The diagonal entries are always zero, so we do not give it here.

dK2_drdr_via_X(X, X2)[source]
dK_dr(r)[source]
dK_dr_via_X(X, X2)[source]

compute the derivative of K wrt X going through X

dgradients2_dXdX2(X, X2, dimX, dimX2)[source]
dgradients_dX(X, X2, dimX)[source]
dgradients_dX2(X, X2, dimX2)[source]
get_one_dimensional_kernel(dimensions)[source]

Specially intended for the grid regression case. For a given covariance kernel, this method returns the corresponding kernel for a single dimension. The resulting values can then be used in the algorithm for reconstructing the full covariance matrix.

gradients_X(dL_dK, X, X2=None)[source]

Given the derivative of the objective wrt K (dL_dK), compute the derivative wrt X

gradients_XX(dL_dK, X, X2=None)[source]

Given the derivative of the objective K(dL_dK), compute the second derivative of K wrt X and X2:

Returns the full covariance matrix [QxQ] of the input dimension for each pair of vectors, thus the returned array is of shape [NxNxQxQ].

\[\frac{\partial^2 K}{\partial X_2^2} = - \frac{\partial^2 K}{\partial X\partial X_2}\]

Returns:
    dL2_dXdX2: [NxMxQxQ] in the cov=True case, or [NxMxQ] in the cov=False case, for X [NxQ] and X2 [MxQ] (X2 is X if X2 is None). Thus, we return the second derivative in X2.
gradients_XX_diag(dL_dK_diag, X)[source]

Given the derivative of the objective dL_dK, compute the second derivative of K wrt X:

\[\frac{\partial^2 K}{\partial X\partial X}\]

Returns:
    dL2_dXdX: [NxQxQ]
gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

input_sensitivity(summarize=True)[source]

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

reset_gradients()[source]
update_gradients_diag(dL_dKdiag, X)[source]

Given the derivative of the objective with respect to the diagonal of the covariance matrix, compute the derivative wrt the parameters of this kernel and store it in the <parameter>.gradient field.

See also update_gradients_full

update_gradients_direct(dL_dVar, dL_dLen)[source]

Specially intended for the Grid regression case. Given the computed log likelihood derivatives, update the corresponding kernel and likelihood gradients. Useful for when gradients have been computed a priori.

update_gradients_full(dL_dK, X, X2=None, reset=True)[source]

Given the derivative of the objective wrt the covariance matrix (dL_dK), compute the gradient wrt the parameters of this kernel, and store in the parameters object as e.g. self.variance.gradient

GPy.kern.src.stationary_cython module
GPy.kern.src.symbolic module
GPy.kern.src.symmetric module
class Symmetric(base_kernel, transform, symmetry_type='even')[source]

Bases: GPy.kern.src.kern.Kern

Symmetric kernel that models a function with even or odd symmetry:

For even symmetry we have:

\[f(x) = f(Ax)\]

we then model the function as:

\[f(x) = g(x) + g(Ax)\]

the corresponding kernel is:

\[k(x, x') + k(Ax, x') + k(x, Ax') + k(Ax, Ax')\]

For odd symmetry we have:

\[f(x) = -f(Ax)\]

it does this by modelling:

\[f(x) = g(x) - g(Ax)\]

with kernel

\[k(x, x') - k(Ax, x') - k(x, Ax') + k(Ax, Ax')\]

where k(x, x’) is the kernel of g(x)

Parameters:
  • base_kernel – kernel to make symmetric
  • transform – transformation matrix describing symmetry plane, A in equations above
  • symmetry_type – ‘odd’ or ‘even’ depending on the symmetry needed
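For illustration (a one-dimensional input reflected about the origin; the transform below is an assumption made for the example), an even-symmetric RBF can be constructed as:

import numpy as np
import GPy
from GPy.kern.src.symmetric import Symmetric

base = GPy.kern.RBF(1)
A = -np.eye(1)                               # reflection x -> -x (the symmetry transform)
sym_kern = Symmetric(base, A, symmetry_type='even')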
K(X, X2)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

update_gradients_diag(dL_dK, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.trunclinear module
class TruncLinear(input_dim, variances=None, delta=None, ARD=False, active_dims=None, name='linear')[source]

Bases: GPy.kern.src.kern.Kern

Truncated Linear kernel

\[k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i \max(0, x_i y_i - \sigma_q)\]
Parameters:
  • input_dim (int) – the number of input dimensions
  • variances (array or list of the appropriate size (or float if there is only one variance parameter)) – the vector of variances \(\sigma^2_i\)
  • ARD (Boolean) – Auto Relevance Determination. If False, the kernel has only one variance parameter sigma^2, otherwise there is one variance parameter per dimension.
Return type:

kernel object

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

input_sensitivity()[source]

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

class TruncLinear_inf(input_dim, interval, variances=None, ARD=False, active_dims=None, name='linear')[source]

Bases: GPy.kern.src.kern.Kern

Truncated Linear kernel

\[k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i \max(0, x_i y_i - \sigma_q)\]
Parameters:
  • input_dim (int) – the number of input dimensions
  • variances (array or list of the appropriate size (or float if there is only one variance parameter)) – the vector of variances \(\sigma^2_i\)
  • ARD (Boolean) – Auto Relevance Determination. If False, the kernel has only one variance parameter sigma^2, otherwise there is one variance parameter per dimension.
Return type:

kernel object

K(X, X2=None)[source]

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]
Parameters:
  • X – the first set of inputs to the kernel
  • X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
gradients_X(dL_dK, X, X2=None)[source]
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

input_sensitivity()[source]

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]

Set the gradients of all parameters when doing full (N) inference.

GPy.likelihoods package

Introduction

The likelihood is \(p(y|f,X)\), which describes how well we predict target values given inputs \(X\) and our latent function \(f\) (\(y\) without noise). The marginal likelihood \(p(y|X)\) is the same, except that we marginalize out the latent function \(f\). The importance of likelihoods in Gaussian Processes is in determining the ‘best’ values of kernel and noise hyperparameters to relate known, observed and unobserved data. The purpose of optimizing a model (e.g. GPy.models.GPRegression) is to determine the ‘best’ hyperparameters, i.e. those that minimize the negative log marginal likelihood.
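As a small sketch of this workflow (standard GPy usage; the data is illustrative), optimizing a regression model adjusts the hyperparameters so as to maximize the log marginal likelihood:

import numpy as np
import GPy

X = np.linspace(0, 1, 30)[:, None]
Y = np.sin(6 * X) + 0.1 * np.random.randn(30, 1)

m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(1))
print(m.log_likelihood())    # log marginal likelihood before optimization
m.optimize()
print(m.log_likelihood())    # typically higher after optimization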

Inheritance diagram of GPy.likelihoods.likelihood, GPy.likelihoods.mixed_noise.MixedNoise

Most likelihood classes inherit directly from GPy.likelihoods.likelihood, although an intermediary class GPy.likelihoods.mixed_noise.MixedNoise is used by GPy.likelihoods.multioutput_likelihood.

Submodules

GPy.likelihoods.bernoulli module

class Bernoulli(gp_link=None)[source]

Bases: GPy.likelihoods.likelihood.Likelihood

Bernoulli likelihood

\[p(y_{i}|\lambda(f_{i})) = \lambda(f_{i})^{y_{i}}(1-\lambda(f_{i}))^{1-y_{i}}\]

Note

Y takes values in either {-1, 1} or {0, 1}. link function should have the domain [0, 1], e.g. probit (default) or Heaviside
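A minimal usage sketch (assumed example data; standard GPy classification setup) with the default probit link:

import numpy as np
import GPy

X = np.random.rand(40, 1)
Y = (X > 0.5).astype(float)                  # binary labels in {0, 1}

m = GPy.models.GPClassification(X, Y, kernel=GPy.kern.RBF(1))
m.optimize()
probs, _ = m.predict(X)                      # predictive probability of class 1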

d2logpdf_dlink2(inv_link_f, y, Y_metadata=None)[source]

Hessian at y, given inv_link_f, w.r.t inv_link_f the hessian will be 0 unless i == j i.e. second derivative logpdf at y given inverse link of f_i and inverse link of f_j w.r.t inverse link of f_i and inverse link of f_j.

\[\frac{d^{2}\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)^{2}} = \frac{-y_{i}}{\lambda(f)^{2}} - \frac{(1-y_{i})}{(1-\lambda(f))^{2}}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables inverse link of f.
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in bernoulli
Returns:

Diagonal of log hessian matrix (second derivative of log likelihood evaluated at points inverse link of f).

Return type:

Nx1 array

Note

Will return the diagonal of the hessian, since everywhere else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on the inverse link of f_i, not on the inverse link of f_(j!=i)).

d3logpdf_dlink3(inv_link_f, y, Y_metadata=None)[source]

Third order derivative log-likelihood function at y given inverse link of f w.r.t inverse link of f

\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)^{3}} = \frac{2y_{i}}{\lambda(f)^{3}} - \frac{2(1-y_{i})}{(1-\lambda(f))^{3}}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables passed through inverse link of f.
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in bernoulli
Returns:

third derivative of log likelihood evaluated at points inverse_link(f)

Return type:

Nx1 array

Gradient of the pdf at y, given inverse link of f w.r.t inverse link of f.

\[\frac{d\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{y_{i}}{\lambda(f_{i})} - \frac{(1 - y_{i})}{(1 - \lambda(f_{i}))}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables inverse link of f.
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in bernoulli
Returns:

gradient of log likelihood evaluated at points inverse link of f.

Return type:

Nx1 array

exact_inference_gradients(dL_dKdiag, Y_metadata=None)[source]

Log Likelihood function given inverse link of f.

\[\ln p(y_{i}|\lambda(f_{i})) = y_{i}\log\lambda(f_{i}) + (1-y_{i})\log (1-\lambda(f_{i}))\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables inverse link of f.
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in bernoulli
Returns:

log likelihood evaluated at points inverse link of f.

Return type:

float

moments_match_ep(Y_i, tau_i, v_i, Y_metadata_i=None)[source]

Moments match of the marginal approximation in EP algorithm

Parameters:
  • i – number of observation (int)
  • tau_i – precision of the cavity distribution (float)
  • v_i – mean/variance of the cavity distribution (float)

Likelihood function given inverse link of f.

\[p(y_{i}|\lambda(f_{i})) = \lambda(f_{i})^{y_{i}}(1-\lambda(f_{i}))^{1-y_{i}}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables inverse link of f.
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in bernoulli
Returns:

likelihood evaluated for this point

Return type:

float

predictive_mean(mu, variance, Y_metadata=None)[source]

Quadrature calculation of the predictive mean: E(Y_star|Y) = E( E(Y_star|f_star, Y) )

Parameters:
  • mu – mean of posterior
  • sigma – standard deviation of posterior
predictive_quantiles(mu, var, quantiles, Y_metadata=None)[source]

Get the “quantiles” of the binary labels (Bernoulli draws). All the quantiles must be either 0 or 1, since those are the only values a draw can take!

predictive_variance(mu, variance, pred_mean, Y_metadata=None)[source]

Approximation to the predictive variance: V(Y_star)

The following variance decomposition (law of total variance) is used: V(Y_star) = E( V(Y_star|f_star) ) + V( E(Y_star|f_star) )

Parameters:
  • mu – mean of posterior
  • sigma – standard deviation of posterior
Predictive_mean:
 

output’s predictive mean, if None _predictive_mean function will be called.

samples(gp, Y_metadata=None)[source]

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:gp – latent variable
to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
variational_expectations(Y, m, v, gh_points=None, Y_metadata=None)[source]

Use Gauss-Hermite quadrature to compute

E_{p(f)}[ log p(y|f) ],  d/dm E_{p(f)}[ log p(y|f) ],  d/dv E_{p(f)}[ log p(y|f) ]

where p(f) is a Gaussian with mean m and variance v. The shapes of Y, m and v should match.

If no gh_points are passed, we construct them using default options.

GPy.likelihoods.binomial module

class Binomial(gp_link=None)[source]

Bases: GPy.likelihoods.likelihood.Likelihood

Binomial likelihood

\[p(y_{i}|\lambda(f_{i})) = \binom{N}{y_{i}}\lambda(f_{i})^{y_{i}}(1-\lambda(f_{i}))^{N-y_{i}}\]

Note

Y is expected to contain the number of successes, with the corresponding number of trials N supplied through Y_metadata[‘trials’]. The link function should map to [0, 1], e.g. probit (default) or Heaviside
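A hedged sketch (synthetic data, illustrative names) of supplying the number of trials through Y_metadata['trials'], with the Laplace approximation handling the non-Gaussian likelihood:

import numpy as np
import GPy

X = np.linspace(0., 10., 30)[:, None]
trials = np.full((30, 1), 20)                                  # 20 trials per row
Y = np.random.binomial(20, 0.4, size=(30, 1)).astype(float)    # successes per row

model = GPy.core.GP(X, Y,
                    kernel=GPy.kern.RBF(1),
                    likelihood=GPy.likelihoods.Binomial(),
                    inference_method=GPy.inference.latent_function_inference.Laplace(),
                    Y_metadata={'trials': trials})
model.optimize()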

d2logpdf_dlink2(inv_link_f, y, Y_metadata=None)[source]

Hessian at y, given inv_link_f, w.r.t inv_link_f the hessian will be 0 unless i == j i.e. second derivative logpdf at y given inverse link of f_i and inverse link of f_j w.r.t inverse link of f_i and inverse link of f_j.

\[\frac{d^{2}\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)^{2}} = \frac{-y_{i}}{\lambda(f)^{2}} - \frac{(N-y_{i})}{(1-\lambda(f))^{2}}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables inverse link of f.
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in binomial
Returns:

Diagonal of log hessian matrix (second derivative of log likelihood evaluated at points inverse link of f).

Return type:

Nx1 array

Note

Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on inverse link of f_i not on inverse link of f_(j!=i)

d3logpdf_dlink3(inv_link_f, y, Y_metadata=None)[source]

Third order derivative log-likelihood function at y given inverse link of f w.r.t inverse link of f

\[\frac{d^{3}\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)^{3}} = \frac{2y_{i}}{\lambda(f)^{3}} - \frac{2(N-y_{i})}{(1-\lambda(f))^{3}}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables inverse link of f.
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in binomial
Returns:

Third derivative of log likelihood evaluated at points inverse link of f.

Return type:

Nx1 array

Note

Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on inverse link of f_i not on inverse link of f_(j!=i)

Gradient of the log pdf at y, given inverse link of f, w.r.t inverse link of f.

\[\frac{d\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{y_{i}}{\lambda(f)} - \frac{(N-y_{i})}{(1-\lambda(f))}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables inverse link of f.
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata must contain ‘trials’
Returns:

gradient of log likelihood evaluated at points inverse link of f.

Return type:

Nx1 array

exact_inference_gradients(dL_dKdiag, Y_metadata=None)[source]

Log Likelihood function given inverse link of f.

\[\ln p(y_{i}|\lambda(f_{i})) = y_{i}\log\lambda(f_{i}) + (N-y_{i})\log (1-\lambda(f_{i}))\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables inverse link of f.
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata must contain ‘trials’
Returns:

log likelihood evaluated at points inverse link of f.

Return type:

float

moments_match_ep(obs, tau, v, Y_metadata_i=None)[source]

Calculation of moments using quadrature

Parameters:
  • obs – observed output
  • tau – cavity distribution 1st natural parameter (precision)
  • v – cavity distribution 2nd natural parameter (mu*precision)

Likelihood function given inverse link of f.

\[p(y_{i}|\lambda(f_{i})) = \binom{N}{y_{i}}\lambda(f_{i})^{y_{i}}(1-\lambda(f_{i}))^{N-y_{i}}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables inverse link of f.
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata must contain ‘trials’
Returns:

likelihood evaluated for this point

Return type:

float

samples(gp, Y_metadata=None, **kw)[source]

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:gp – latent variable
variational_expectations(Y, m, v, gh_points=None, Y_metadata=None)[source]

Use Gauss-Hermite quadrature to compute

E_{p(f)}[ log p(y|f) ],  d/dm E_{p(f)}[ log p(y|f) ],  d/dv E_{p(f)}[ log p(y|f) ]

where p(f) is a Gaussian with mean m and variance v. The shapes of Y, m and v should match.

If no gh_points are passed, we construct them using default options.

GPy.likelihoods.exponential module

class Exponential(gp_link=None)[source]

Bases: GPy.likelihoods.likelihood.Likelihood

Exponential likelihood. Y is expected to take non-negative values.

\[p(y_{i}|\lambda(f_{i})) = \lambda(f_{i})\exp(-y_{i}\lambda(f_{i}))\]

d2logpdf_dlink2(link_f, y, Y_metadata=None)[source]

Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j

\[\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)} = -\frac{1}{\lambda(f_{i})^{2}}\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in exponential distribution
Returns:

Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)

Return type:

Nx1 array

Note

Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))

d3logpdf_dlink3(link_f, y, Y_metadata=None)[source]

Third order derivative log-likelihood function at y given link(f) w.r.t link(f)

\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = \frac{2}{\lambda(f_{i})^{3}}\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in exponential distribution
Returns:

third derivative of likelihood evaluated at points f

Return type:

Nx1 array

Gradient of the log likelihood function at y, given link(f) w.r.t link(f)

\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{1}{\lambda(f)} - y_{i}\]
Parameters:
  • link_f (Nx1 array) – latent variables (f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in exponential distribution
Returns:

gradient of likelihood evaluated at points

Return type:

Nx1 array

Log Likelihood Function given link(f)

\[\ln p(y_{i}|\lambda(f_{i})) = \ln \lambda(f_{i}) - y_{i}\lambda(f_{i})\]
Parameters:
  • link_f (Nx1 array) – latent variables (link(f))
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in exponential distribution
Returns:

likelihood evaluated for this point

Return type:

float

Likelihood function given link(f)

\[p(y_{i}|\lambda(f_{i})) = \lambda(f_{i})\exp (-y\lambda(f_{i}))\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in exponential distribution
Returns:

likelihood evaluated for this point

Return type:

float

samples(gp, Y_metadata=None)[source]

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:gp – latent variable

GPy.likelihoods.gamma module

class Gamma(gp_link=None, beta=1.0)[source]

Bases: GPy.likelihoods.likelihood.Likelihood

Gamma likelihood

\[\begin{split}p(y_{i}|\lambda(f_{i})) = \frac{\beta^{\alpha_{i}}}{\Gamma(\alpha_{i})}y_{i}^{\alpha_{i}-1}e^{-\beta y_{i}}\\ \alpha_{i} = \beta y_{i}\end{split}\]
d2logpdf_dlink2(link_f, y, Y_metadata=None)[source]

Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j

\[\begin{split}\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)} = -\beta^{2}\frac{d\Psi(\alpha_{i})}{d\alpha_{i}}\\ \alpha_{i} = \beta y_{i}\end{split}\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in gamma distribution
Returns:

Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)

Return type:

Nx1 array

Note

Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))

d3logpdf_dlink3(link_f, y, Y_metadata=None)[source]

Third order derivative log-likelihood function at y given link(f) w.r.t link(f)

\[\begin{split}\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = -\beta^{3}\frac{d^{2}\Psi(\alpha_{i})}{d\alpha_{i}}\\ \alpha_{i} = \beta y_{i}\end{split}\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in gamma distribution
Returns:

third derivative of likelihood evaluated at points f

Return type:

Nx1 array

Gradient of the log likelihood function at y, given link(f) w.r.t link(f)

\[\begin{split}\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \beta (\log \beta y_{i}) - \Psi(\alpha_{i})\beta\\ \alpha_{i} = \beta y_{i}\end{split}\]
Parameters:
  • link_f (Nx1 array) – latent variables (f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in gamma distribution
Returns:

gradient of likelihood evaluated at points

Return type:

Nx1 array

Log Likelihood Function given link(f)

\[\begin{split}\ln p(y_{i}|\lambda(f_{i})) = \alpha_{i}\log \beta - \log \Gamma(\alpha_{i}) + (\alpha_{i} - 1)\log y_{i} - \beta y_{i}\\ \alpha_{i} = \beta y_{i}\end{split}\]
Parameters:
  • link_f (Nx1 array) – latent variables (link(f))
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in gamma distribution
Returns:

likelihood evaluated for this point

Return type:

float

Likelihood function given link(f)

\[\begin{split}p(y_{i}|\lambda(f_{i})) = \frac{\beta^{\alpha_{i}}}{\Gamma(\alpha_{i})}y_{i}^{\alpha_{i}-1}e^{-\beta y_{i}}\\ \alpha_{i} = \beta y_{i}\end{split}\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in gamma distribution
Returns:

likelihood evaluated for this point

Return type:

float

GPy.likelihoods.gaussian module

A lot of this code assumes that the link function is the identity.

I think the Laplace code is okay, but I’m quite sure that the EP moments will only work if the link is the identity.

Furthermore, exact Gaussian inference can only be done with the identity link, so we should assert this for all calls which rely on it.

James 11/12/13

class Gaussian(gp_link=None, variance=1.0, name='Gaussian_noise')[source]

Bases: GPy.likelihoods.likelihood.Likelihood

Gaussian likelihood

\[\ln p(y_{i}|\lambda(f_{i})) = -\frac{N \ln 2\pi}{2} - \frac{\ln |K|}{2} - \frac{(y_{i} - \lambda(f_{i}))^{T}\sigma^{-2}(y_{i} - \lambda(f_{i}))}{2}\]
Parameters:
  • variance – variance value of the Gaussian distribution
  • N (int) – Number of data points
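GPy.models.GPRegression builds this likelihood for you, but the same model can be assembled explicitly; a minimal sketch (synthetic data, illustrative names):

import numpy as np
import GPy

X = np.random.rand(25, 1)
Y = np.sin(6 * X) + 0.05 * np.random.randn(25, 1)

lik = GPy.likelihoods.Gaussian(variance=0.05)      # initial noise variance
model = GPy.core.GP(X, Y,
                    kernel=GPy.kern.RBF(1),
                    likelihood=lik,
                    inference_method=GPy.inference.latent_function_inference.ExactGaussianInference())
model.optimize()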
betaY(Y, Y_metadata=None)[source]
d2logpdf_dlink2(link_f, y, Y_metadata=None)[source]

Hessian at y, given link_f, w.r.t link_f. i.e. second derivative logpdf at y given link(f_i) link(f_j) w.r.t link(f_i) and link(f_j)

The hessian will be 0 unless i == j

\[\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}f} = -\frac{1}{\sigma^{2}}\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in gaussian
Returns:

Diagonal of log hessian matrix (second derivative of log likelihood evaluated at points link(f))

Return type:

Nx1 array

Note

Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))

d2logpdf_dlink2_dtheta(f, y, Y_metadata=None)[source]
d2logpdf_dlink2_dvar(link_f, y, Y_metadata=None)[source]

Gradient of the hessian (d2logpdf_dlink2) w.r.t variance parameter (noise_variance)

\[\frac{d}{d\sigma^{2}}(\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)}) = \frac{1}{\sigma^{4}}\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in gaussian
Returns:

derivative of log hessian evaluated at points link(f_i) and link(f_j) w.r.t variance parameter

Return type:

Nx1 array

d3logpdf_dlink3(link_f, y, Y_metadata=None)[source]

Third order derivative log-likelihood function at y given link(f) w.r.t link(f)

\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = 0\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in gaussian
Returns:

third derivative of log likelihood evaluated at points link(f)

Return type:

Nx1 array

Gradient of the log pdf at y, given link(f), w.r.t link(f)

\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{1}{\sigma^{2}}(y_{i} - \lambda(f_{i}))\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in gaussian
Returns:

gradient of log likelihood evaluated at points link(f)

Return type:

Nx1 array

Derivative of the dlogpdf_dlink w.r.t variance parameter (noise_variance)

\[\frac{d}{d\sigma^{2}}(\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)}) = \frac{1}{\sigma^{4}}(-y_{i} + \lambda(f_{i}))\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in gaussian
Returns:

derivative of log likelihood evaluated at points link(f) w.r.t variance parameter

Return type:

Nx1 array

Gradient of the log-likelihood function at y given link(f), w.r.t variance parameter (noise_variance)

\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\sigma^{2}} = -\frac{N}{2\sigma^{2}} + \frac{(y_{i} - \lambda(f_{i}))^{2}}{2\sigma^{4}}\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in gaussian
Returns:

derivative of log likelihood evaluated at points link(f) w.r.t variance parameter

Return type:

float

ep_gradients(Y, cav_tau, cav_v, dL_dKdiag, Y_metadata=None, quad_mode='gk', boost_grad=1.0)[source]
exact_inference_gradients(dL_dKdiag, Y_metadata=None)[source]
gaussian_variance(Y_metadata=None)[source]
log_predictive_density(y_test, mu_star, var_star, Y_metadata=None)[source]

assumes independence

Log likelihood function given link(f)

\[\ln p(y_{i}|\lambda(f_{i})) = -\frac{N \ln 2\pi}{2} - \frac{\ln |K|}{2} - \frac{(y_{i} - \lambda(f_{i}))^{T}\sigma^{-2}(y_{i} - \lambda(f_{i}))}{2}\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in gaussian
Returns:

log likelihood evaluated for this point

Return type:

float

moments_match_ep(data_i, tau_i, v_i, Y_metadata_i=None)[source]

Moments match of the marginal approximation in EP algorithm

Parameters:
  • i – number of observation (int)
  • tau_i – precision of the cavity distribution (float)
  • v_i – mean/variance of the cavity distribution (float)

Likelihood function given link(f)

\[p(y_{i}|\lambda(f_{i})) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(y_{i} - \lambda(f_{i}))^{2}}{2\sigma^{2}}\right)\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in gaussian
Returns:

likelihood evaluated for this point

Return type:

float

predictive_mean(mu, sigma)[source]

Quadrature calculation of the predictive mean: E(Y_star|Y) = E( E(Y_star|f_star, Y) )

Parameters:
  • mu – mean of posterior
  • sigma – standard deviation of posterior
predictive_quantiles(mu, var, quantiles, Y_metadata=None)[source]
predictive_values(mu, var, full_cov=False, Y_metadata=None)[source]

Compute the mean and variance of the predictive distribution.

Parameters:
  • mu – mean of the latent variable, f, of posterior
  • var – variance of the latent variable, f, of posterior
  • full_cov (Boolean) – whether to use the full covariance or just the diagonal
predictive_variance(mu, sigma, predictive_mean=None)[source]

Approximation to the predictive variance: V(Y_star)

The following variance decomposition (law of total variance) is used: V(Y_star) = E( V(Y_star|f_star) ) + V( E(Y_star|f_star) )

Parameters:
  • mu – mean of posterior
  • sigma – standard deviation of posterior
Predictive_mean:
 

output’s predictive mean, if None _predictive_mean function will be called.

samples(gp, Y_metadata=None)[source]

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:gp – latent variable
to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
update_gradients(grad)[source]
variational_expectations(Y, m, v, gh_points=None, Y_metadata=None)[source]

Use Gauss-Hermite quadrature to compute

E_{p(f)}[ log p(y|f) ],  d/dm E_{p(f)}[ log p(y|f) ],  d/dv E_{p(f)}[ log p(y|f) ]

where p(f) is a Gaussian with mean m and variance v. The shapes of Y, m and v should match.

If no gh_points are passed, we construct them using default options.

class HeteroscedasticGaussian(Y_metadata, gp_link=None, variance=1.0, name='het_Gauss')[source]

Bases: GPy.likelihoods.gaussian.Gaussian
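A heteroscedastic Gaussian likelihood gives each observation its own noise variance, indexed through Y_metadata. A hedged sketch (synthetic data, illustrative names) via the convenience model GPy.models.GPHeteroscedasticRegression, which constructs this likelihood internally:

import numpy as np
import GPy

X = np.random.rand(30, 1)
Y = np.sin(6 * X) + 0.05 * np.random.randn(30, 1)

model = GPy.models.GPHeteroscedasticRegression(X, Y, GPy.kern.RBF(1))
model.optimize()            # one noise variance parameter per data point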

exact_inference_gradients(dL_dKdiag, Y_metadata=None)[source]
gaussian_variance(Y_metadata=None)[source]
predictive_quantiles(mu, var, quantiles, Y_metadata=None)[source]
predictive_values(mu, var, full_cov=False, Y_metadata=None)[source]

Compute the mean and variance of the predictive distribution.

Parameters:
  • mu – mean of the latent variable, f, of posterior
  • var – variance of the latent variable, f, of posterior
  • full_cov (Boolean) – whether to use the full covariance or just the diagonal

GPy.likelihoods.likelihood module

class Likelihood(gp_link, name)[source]

Bases: GPy.core.parameterization.parameterized.Parameterized

Likelihood base class, used to define p(y|f).

All instances use _inverse_ link functions, which can be swapped out. It is expected that inheriting classes define a default inverse link function

To use this class, inherit and define missing functionality.

Inheriting classes must implement:

pdf_link : a bound method which turns the output of the link function into the pdf
logpdf_link : the logarithm of the above

To enable use with EP, inheriting classes must define:
TODO: a suitable derivative function for any parameters of the class

It is also desirable to define:
moments_match_ep : a function to compute the EP moments. If this isn’t defined, the moments will be computed using 1D quadrature.

To enable use with the Laplace approximation, inheriting classes must define:
Some derivative functions AS TODO

For exact Gaussian inference, define JH TODO
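A minimal, hypothetical sketch of that contract: only pdf_link and logpdf_link are implemented here, relying on the quadrature-based defaults of the base class for everything else (Laplace or EP support would additionally need the derivative methods listed above; the exponential observation model is purely illustrative):

import numpy as np
from GPy.likelihoods.likelihood import Likelihood
from GPy.likelihoods import link_functions

class MyExponential(Likelihood):
    """Illustrative likelihood with rate lambda(f); lambda(f) = exp(f) by default."""
    def __init__(self, gp_link=None):
        if gp_link is None:
            gp_link = link_functions.Log()    # inverse link keeps lambda(f) positive
        super(MyExponential, self).__init__(gp_link, name='my_exponential')

    def pdf_link(self, link_f, y, Y_metadata=None):
        return link_f * np.exp(-y * link_f)

    def logpdf_link(self, link_f, y, Y_metadata=None):
        return np.log(link_f) - y * link_f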

MCMC_pdf_samples(fNew, num_samples=1000, starting_loc=None, stepsize=0.1, burn_in=1000, Y_metadata=None)[source]

Simple implementation of Metropolis sampling algorithm

Will run a parallel chain for each input dimension (treats each f independently); thus it assumes f*_1 is independent of f*_2, etc.

Parameters:
  • num_samples – Number of samples to take
  • fNew – f at which to sample around
  • starting_loc – Starting locations of the independent chains (usually the conditional_mean of the likelihood), often link_f
  • stepsize – Stepsize for the normal proposal distribution (will need modifying)
  • burn_in – number of samples to use for burn-in (will need modifying)
  • Y_metadata – Y_metadata for pdf
conditional_mean(gp)[source]

The mean of the random variable conditioned on one value of the GP

conditional_variance(gp)[source]

The variance of the random variable conditioned on one value of the GP

d2logpdf_df2(*args, **kwargs)
d2logpdf_df2_dtheta(f, y, Y_metadata=None)[source]

TODO: Doc strings

d2logpdf_dlink2(inv_link_f, y, Y_metadata=None)[source]
d2logpdf_dlink2_dtheta(inv_link_f, y, Y_metadata=None)[source]
d3logpdf_df3(*args, **kwargs)
d3logpdf_dlink3(inv_link_f, y, Y_metadata=None)[source]
dlogpdf_df(f, y, Y_metadata=None)[source]

Evaluates the link function link(f), then computes the derivative of the log likelihood using it. Uses Faà di Bruno’s formula for the chain rule.

\[\frac{d\log p(y|\lambda(f))}{df} = \frac{d\log p(y|\lambda(f))}{d\lambda(f)}\frac{d\lambda(f)}{df}\]
Parameters:
  • f (Nx1 array) – latent variables f
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata (not used here)
Returns:

derivative of log likelihood evaluated for this point

Return type:

1xN array

dlogpdf_df_dtheta(f, y, Y_metadata=None)[source]

TODO: Doc strings

dlogpdf_dtheta(f, y, Y_metadata=None)[source]

TODO: Doc strings

ep_gradients(Y, cav_tau, cav_v, dL_dKdiag, Y_metadata=None, quad_mode='gk', boost_grad=1.0)[source]
exact_inference_gradients(dL_dKdiag, Y_metadata=None)[source]
static from_dict(input_dict)[source]

Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. If needed, override _build_from_input_dict instead.

Parameters:input_dict (dict) – Dictionary with all the information needed to instantiate the object.
integrate_gh(Y, mu, sigma, Y_metadata_i=None, gh_points=None)[source]
integrate_gk(Y, mu, sigma, Y_metadata_i=None)[source]
log_predictive_density(y_test, mu_star, var_star, Y_metadata=None)[source]

Calculation of the log predictive density

Parameters:
  • y_test ((Nx1) array) – test observations (y_{*})
  • mu_star ((Nx1) array) – predictive mean of gaussian p(f_{*}|mu_{*}, var_{*})
  • var_star ((Nx1) array) – predictive variance of gaussian p(f_{*}|mu_{*}, var_{*})
log_predictive_density_sampling(y_test, mu_star, var_star, Y_metadata=None, num_samples=1000)[source]

Calculation of the log predictive density via sampling

Parameters:
  • y_test ((Nx1) array) – test observations (y_{*})
  • mu_star ((Nx1) array) – predictive mean of gaussian p(f_{*}|mu_{*}, var_{*})
  • var_star ((Nx1) array) – predictive variance of gaussian p(f_{*}|mu_{*}, var_{*})
  • num_samples (int) – num samples of p(f_{*}|mu_{*}, var_{*}) to take
logpdf(f, y, Y_metadata=None)[source]

Evaluates the link function link(f) then computes the log likelihood (log pdf) using it

Parameters:
  • f (Nx1 array) – latent variables f
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata (not used here)
Returns:

log likelihood evaluated for this point

Return type:

float

logpdf_sum(f, y, Y_metadata=None)[source]

Convenience function that can be overridden for likelihoods where this sum can be computed more efficiently

moments_match_ep(obs, tau, v, Y_metadata_i=None)[source]

Calculation of moments using quadrature

Parameters:
  • obs – observed output
  • tau – cavity distribution 1st natural parameter (precision)
  • v – cavity distribution 2nd natural parameter (mu*precision)
pdf(f, y, Y_metadata=None)[source]

Evaluates the link function link(f) then computes the likelihood (pdf) using it

Parameters:
  • f (Nx1 array) – latent variables f
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata (not used here)
Returns:

likelihood evaluated for this point

Return type:

float

predictive_mean(mu, variance, Y_metadata=None)[source]

Quadrature calculation of the predictive mean: E(Y_star|Y) = E( E(Y_star|f_star, Y) )

Parameters:
  • mu – mean of posterior
  • sigma – standard deviation of posterior
predictive_quantiles(mu, var, quantiles, Y_metadata=None)[source]
predictive_values(mu, var, full_cov=False, Y_metadata=None)[source]

Compute the mean and variance of the predictive distribution.

Parameters:
  • mu – mean of the latent variable, f, of posterior
  • var – variance of the latent variable, f, of posterior
  • full_cov (Boolean) – whether to use the full covariance or just the diagonal
predictive_variance(mu, variance, predictive_mean=None, Y_metadata=None)[source]

Approximation to the predictive variance: V(Y_star)

The following variance decomposition (law of total variance) is used: V(Y_star) = E( V(Y_star|f_star) ) + V( E(Y_star|f_star) )

Parameters:
  • mu – mean of posterior
  • sigma – standard deviation of posterior
Predictive_mean:
 

output’s predictive mean, if None _predictive_mean function will be called.

request_num_latent_functions(Y)[source]

The likelihood should infer how many latent functions are needed for the likelihood

Default is the number of outputs

samples(gp, Y_metadata=None, samples=1)[source]

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:
  • gp – latent variable
  • samples – number of samples to take for each f location
to_dict()[source]
update_gradients(partial)[source]
variational_expectations(Y, m, v, gh_points=None, Y_metadata=None)[source]

Use Gauss-Hermite quadrature to compute

E_{p(f)}[ log p(y|f) ],  d/dm E_{p(f)}[ log p(y|f) ],  d/dv E_{p(f)}[ log p(y|f) ]

where p(f) is a Gaussian with mean m and variance v. The shapes of Y, m and v should match.

If no gh_points are passed, we construct them using default options.
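The quadrature idea itself is simple; a standalone numpy sketch (not GPy's internal implementation) of approximating E_{N(m,v)}[ log p(y|f) ] with Gauss-Hermite nodes:

import numpy as np

def gh_expectation(logpdf, m, v, degree=20):
    # E_{N(m, v)}[ g(f) ] ~= sum_k w_k * g(m + sqrt(2 v) * x_k) / sqrt(pi)
    x, w = np.polynomial.hermite.hermgauss(degree)
    f = m + np.sqrt(2.0 * v) * x
    return np.sum(w * logpdf(f)) / np.sqrt(np.pi)

# Example: Gaussian log density log N(y=0.3 | f, 0.5), averaged over f ~ N(0, 1)
logpdf = lambda f: -0.5 * np.log(2 * np.pi * 0.5) - 0.5 * (0.3 - f) ** 2 / 0.5
print(gh_expectation(logpdf, m=0.0, v=1.0))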

GPy.likelihoods.loggaussian module

class LogGaussian(gp_link=None, sigma=1.0)[source]

Bases: GPy.likelihoods.likelihood.Likelihood

\[p(y_{i}|f_{i}, z_{i}) = \prod_{i=1}^{n} \left(\frac{r y^{r-1}}{\exp(f(x_{i}))}\right)^{1-z_i} \left(1 + \left(\frac{y}{\exp(f(x_{i}))}\right)^{r}\right)^{z_i-2}\]
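A hedged sketch (synthetic positive-valued data, illustrative names) of passing per-observation censoring indicators through Y_metadata['censored'] (assumed here to be 1 for censored and 0 for fully observed), using the Laplace approximation:

import numpy as np
import GPy
from GPy.likelihoods.loggaussian import LogGaussian

X = np.linspace(0., 5., 25)[:, None]
Y = np.exp(0.5 * X + 0.2 * np.random.randn(25, 1))          # positive observations
censored = (np.random.rand(25, 1) < 0.2).astype(float)      # ~20% censored

model = GPy.core.GP(X, Y,
                    kernel=GPy.kern.RBF(1),
                    likelihood=LogGaussian(),
                    inference_method=GPy.inference.latent_function_inference.Laplace(),
                    Y_metadata={'censored': censored})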
d2logpdf_dlink2(link_f, y, Y_metadata=None)[source]

Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j

\[\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)

Return type:

Nx1 array

Note

Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))

d2logpdf_dlink2_dtheta(f, y, Y_metadata=None)[source]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in gaussian
Returns:

derivative of log likelihood evaluated at points link(f) w.r.t variance parameter

Return type:

Nx1 array

d2logpdf_dlink2_dvar(link_f, y, Y_metadata=None)[source]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in gaussian
Returns:

derivative of log likelihood evaluated at points link(f) w.r.t variance parameter

Return type:

Nx1 array

d3logpdf_dlink3(link_f, y, Y_metadata=None)[source]

Gradient of the log-likelihood function at y given f, w.r.t shape parameter

\[\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

derivative of likelihood evaluated at points f w.r.t variance parameter

Return type:

float

Derivative of the logpdf w.r.t link(f)

Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

likelihood evaluated for this point

Return type:

float
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in gaussian
Returns:

derivative of log likelihood evaluated at points link(f) w.r.t variance parameter

Return type:

Nx1 array

Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in gaussian
Returns:

derivative of log likelihood evaluated at points link(f) w.r.t variance parameter

Return type:

Nx1 array

Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata not used in gaussian
Returns:

derivative of log likelihood evaluated at points link(f) w.r.t variance parameter

Return type:

Nx1 array

Gradient of the log-likelihood function at y given f, w.r.t variance parameter

\[\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

derivative of likelihood evaluated at points f w.r.t variance parameter

Return type:

float

Parameters:
  • link_f (Nx1 array) – latent variables (link(f))
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

likelihood evaluated for this point

Return type:

float

Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

likelihood evaluated for this point

Return type:

float

samples(gp, Y_metadata=None)[source]

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:gp – latent variable
update_gradients(grads)[source]

Pull out the gradients, be careful as the order must match the order in which the parameters are added

GPy.likelihoods.loglogistic module

class LogLogistic(gp_link=None, r=1.0)[source]

Bases: GPy.likelihoods.likelihood.Likelihood

\[p(y_{i}|f_{i}, z_{i}) = \prod_{i=1}^{n} \left(\frac{r y^{r-1}}{\exp(f(x_{i}))}\right)^{1-z_i} \left(1 + \left(\frac{y}{\exp(f(x_{i}))}\right)^{r}\right)^{z_i-2}\]
d2logpdf_dlink2(link_f, y, Y_metadata=None)[source]

Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j

\[\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)

Return type:

Nx1 array

Note

Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))

d2logpdf_dlink2_dr(inv_link_f, y, Y_metadata=None)[source]

Gradient of the hessian (d2logpdf_dlink2) w.r.t shape parameter

\[\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

derivative of hessian evaluated at points f and f_j w.r.t variance parameter

Return type:

Nx1 array

d2logpdf_dlink2_dtheta(f, y, Y_metadata=None)[source]
d3logpdf_dlink3(link_f, y, Y_metadata=None)[source]

Third order derivative log-likelihood function at y given link(f) w.r.t link(f)

\[\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

third derivative of likelihood evaluated at points f

Return type:

Nx1 array

Gradient of the log likelihood function at y, given link(f) w.r.t link(f)

\[\]
Parameters:
  • link_f (Nx1 array) – latent variables (f)
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

gradient of likelihood evaluated at points

Return type:

Nx1 array

Derivative of the dlogpdf_dlink w.r.t shape parameter

\[\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables inv_link_f
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

derivative of likelihood evaluated at points f w.r.t variance parameter

Return type:

Nx1 array

Gradient of the log-likelihood function at y given f, w.r.t shape parameter

\[\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

derivative of likelihood evaluated at points f w.r.t variance parameter

Return type:

float

Log Likelihood Function given link(f)

\[\]
Parameters:
  • link_f (Nx1 array) – latent variables (link(f))
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

likelihood evaluated for this point

Return type:

float

Likelihood function given link(f)

\[\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

likelihood evaluated for this point

Return type:

float

samples(gp, Y_metadata=None)[source]

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:gp – latent variable
update_gradients(grads)[source]

Pull out the gradients, be careful as the order must match the order in which the parameters are added

GPy.likelihoods.mixed_noise module

class MixedNoise(likelihoods_list, name='mixed_noise')[source]

Bases: GPy.likelihoods.likelihood.Likelihood

betaY(Y, Y_metadata)[source]
exact_inference_gradients(dL_dKdiag, Y_metadata)[source]
gaussian_variance(Y_metadata)[source]
predictive_quantiles(mu, var, quantiles, Y_metadata)[source]
predictive_values(mu, var, full_cov=False, Y_metadata=None)[source]

Compute the mean and variance of the predictive distribution.

Parameters:
  • mu – mean of the latent variable, f, of posterior
  • var – variance of the latent variable, f, of posterior
  • full_cov (Boolean) – whether to use the full covariance or just the diagonal
predictive_variance(mu, sigma, Y_metadata)[source]

Approximation to the predictive variance: V(Y_star)

The following variance decomposition (law of total variance) is used: V(Y_star) = E( V(Y_star|f_star) ) + V( E(Y_star|f_star) )

Parameters:
  • mu – mean of posterior
  • sigma – standard deviation of posterior
Predictive_mean:
 

output’s predictive mean, if None _predictive_mean function will be called.

samples(gp, Y_metadata)[source]

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:gp – latent variable
to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
update_gradients(gradients)[source]

GPy.likelihoods.multioutput_likelihood module

class MultioutputLikelihood(likelihoods_list, name='multioutput_likelihood')[source]

Bases: GPy.likelihoods.mixed_noise.MixedNoise

MultioutputLikelihood combines different likelihoods for multioutput models, where different outputs have different observation models.

As input, the likelihood takes the list of likelihoods to use. The ‘output_index’ entry of Y_metadata connects each observation to its likelihood, as sketched below.
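A hedged sketch (illustrative shapes and names) of how the list of likelihoods and the 'output_index' metadata fit together:

import numpy as np
import GPy
from GPy.likelihoods.multioutput_likelihood import MultioutputLikelihood

# first output modelled with Gaussian noise, second with a Bernoulli model
lik = MultioutputLikelihood([GPy.likelihoods.Gaussian(), GPy.likelihoods.Bernoulli()])

# rows 0-9 belong to output 0, rows 10-19 to output 1
output_index = np.vstack([np.zeros((10, 1)), np.ones((10, 1))]).astype(int)
Y_metadata = {'output_index': output_index}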

d2logpdf_df2(f, y, Y_metadata)[source]
d2logpdf_df2_dtheta(f, y, Y_metadata=None)[source]

TODO: Doc strings

d2logpdf_dlink2(inv_link_f, y, Y_metadata=None)[source]
d3logpdf_df3(f, y, Y_metadata=None)[source]
d3logpdf_dlink3(inv_link_f, y, Y_metadata=None)[source]
dlogpdf_df(f, y, Y_metadata)[source]

Evaluates the link function link(f), then computes the derivative of the log likelihood using it. Uses Faà di Bruno’s formula for the chain rule.

\[\frac{d\log p(y|\lambda(f))}{df} = \frac{d\log p(y|\lambda(f))}{d\lambda(f)}\frac{d\lambda(f)}{df}\]
Parameters:
  • f (Nx1 array) – latent variables f
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in student t distribution - not used
Returns:

derivative of log likelihood evaluated for this point

Return type:

1xN array

dlogpdf_df_dtheta(f, y, Y_metadata=None)[source]

TODO: Doc strings

dlogpdf_dtheta(f, y, Y_metadata=None)[source]

TODO: Doc strings

ep_gradients(Y, cav_tau, cav_v, dL_dKdiag, Y_metadata=None, quad_mode='gk', boost_grad=1.0)[source]
exact_inference_gradients(dL_dKdiag, Y_metadata)[source]
log_predictive_density(y_test, mu_star, var_star, Y_metadata=None)[source]

Calculation of the log predictive density

Parameters:
  • y_test ((Nx1) array) – test observations (y_{*})
  • mu_star ((Nx1) array) – predictive mean of gaussian p(f_{*}|mu_{*}, var_{*})
  • var_star ((Nx1) array) – predictive variance of gaussian p(f_{*}|mu_{*}, var_{*})
logpdf(f, y, Y_metadata=None)[source]

Evaluates the link function link(f) then computes the log likelihood (log pdf) using it

Parameters:
  • f (Nx1 array) – latent variables f
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in student t distribution - not used
Returns:

log likelihood evaluated for this point

Return type:

float

moments_match_ep(data_i, tau_i, v_i, Y_metadata_i)[source]

Calculation of moments using quadrature

Parameters:
  • obs – observed output
  • tau – cavity distribution 1st natural parameter (precision)
  • v – cavity distribution 2nd natural parameter (mu*precision)
pdf(f, y, Y_metadata=None)[source]

Evaluates the link function link(f) then computes the likelihood (pdf) using it

Parameters:
  • f (Nx1 array) – latent variables f
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in student t distribution - not used
Returns:

likelihood evaluated for this point

Return type:

float

predictive_values(mu, var, full_cov=False, Y_metadata=None)[source]

Compute the mean and variance of the predictive distribution.

Parameters:
  • mu – mean of the latent variable, f, of posterior
  • var – variance of the latent variable, f, of posterior
  • full_cov (Boolean) – whether to use the full covariance or just the diagonal
predictive_variance(mu, sigma, Y_metadata)[source]

Approximation to the predictive variance: V(Y_star)

The following variance decomposition (law of total variance) is used: V(Y_star) = E( V(Y_star|f_star) ) + V( E(Y_star|f_star) )

Parameters:
  • mu – mean of posterior
  • sigma – standard deviation of posterior
Predictive_mean:
 

output’s predictive mean, if None _predictive_mean function will be called.

GPy.likelihoods.poisson module

class Poisson(gp_link=None)[source]

Bases: GPy.likelihoods.likelihood.Likelihood

Poisson likelihood

\[p(y_{i}|\lambda(f_{i})) = \frac{\lambda(f_{i})^{y_{i}}}{y_{i}!}e^{-\lambda(f_{i})}\]

Note

Y is expected to take values in {0,1,2,…}
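A hedged sketch (synthetic counts, illustrative names) of count regression with this likelihood under the Laplace approximation:

import numpy as np
import GPy

X = np.linspace(0., 5., 40)[:, None]
rate = np.exp(np.sin(X) + 1.0)
Y = np.random.poisson(rate).astype(float)          # counts in {0, 1, 2, ...}

model = GPy.core.GP(X, Y,
                    kernel=GPy.kern.RBF(1),
                    likelihood=GPy.likelihoods.Poisson(),
                    inference_method=GPy.inference.latent_function_inference.Laplace())
model.optimize()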

conditional_mean(gp)[source]

The mean of the random variable conditioned on one value of the GP

conditional_variance(gp)[source]

The variance of the random variable conditioned on one value of the GP

d2logpdf_dlink2(link_f, y, Y_metadata=None)[source]

Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j

\[\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)} = \frac{-y_{i}}{\lambda(f_{i})^{2}}\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in poisson distribution
Returns:

Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)

Return type:

Nx1 array

Note

Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))

d3logpdf_dlink3(link_f, y, Y_metadata=None)[source]

Third order derivative log-likelihood function at y given link(f) w.r.t link(f)

\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = \frac{2y_{i}}{\lambda(f_{i})^{3}}\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in poisson distribution
Returns:

third derivative of likelihood evaluated at points f

Return type:

Nx1 array

Gradient of the log likelihood function at y, given link(f) w.r.t link(f)

\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{y_{i}}{\lambda(f_{i})} - 1\]
Parameters:
  • link_f (Nx1 array) – latent variables (f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in poisson distribution
Returns:

gradient of likelihood evaluated at points

Return type:

Nx1 array

Log Likelihood Function given link(f)

\[\ln p(y_{i}|\lambda(f_{i})) = -\lambda(f_{i}) + y_{i}\log \lambda(f_{i}) - \log y_{i}!\]
Parameters:
  • link_f (Nx1 array) – latent variables (link(f))
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in poisson distribution
Returns:

likelihood evaluated for this point

Return type:

float

Likelihood function given link(f)

\[p(y_{i}|\lambda(f_{i})) = \frac{\lambda(f_{i})^{y_{i}}}{y_{i}!}e^{-\lambda(f_{i})}\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in poisson distribution
Returns:

likelihood evaluated for this point

Return type:

float

samples(gp, Y_metadata=None)[source]

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:gp – latent variable

GPy.likelihoods.student_t module

class StudentT(gp_link=None, deg_free=5, sigma2=2)[source]

Bases: GPy.likelihoods.likelihood.Likelihood

Student T likelihood

For nomenclature see Bayesian Data Analysis 2003, p. 576

\[p(y_{i}|\lambda(f_{i})) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\Gamma\left(\frac{v}{2}\right)\sqrt{v\pi\sigma^{2}}}\left(1 + \frac{1}{v}\left(\frac{(y_{i} - f_{i})^{2}}{\sigma^{2}}\right)\right)^{-\frac{v+1}{2}}\]
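A hedged sketch (synthetic data with a few gross outliers, illustrative names) of robust regression with Student-t noise under the Laplace approximation:

import numpy as np
import GPy

X = np.linspace(0., 10., 50)[:, None]
Y = np.sin(X) + 0.1 * np.random.randn(50, 1)
Y[::10] += 3.0                                     # inject a few outliers

model = GPy.core.GP(X, Y,
                    kernel=GPy.kern.RBF(1),
                    likelihood=GPy.likelihoods.StudentT(deg_free=4.0, sigma2=0.1),
                    inference_method=GPy.inference.latent_function_inference.Laplace())
model.optimize()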
conditional_mean(gp)[source]

The mean of the random variable conditioned on one value of the GP

conditional_variance(gp)[source]

The variance of the random variable conditioned on one value of the GP

d2logpdf_dlink2(inv_link_f, y, Y_metadata=None)[source]

Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j

\[\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)} = \frac{(v+1)((y_{i}-\lambda(f_{i}))^{2} - \sigma^{2}v)}{((y_{i}-\lambda(f_{i}))^{2} + \sigma^{2}v)^{2}}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables inv_link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in student t distribution
Returns:

Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)

Return type:

Nx1 array

Note

Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))

d2logpdf_dlink2_dtheta(f, y, Y_metadata=None)[source]
d2logpdf_dlink2_dv(inv_link_f, y, Y_metadata=None)[source]
d2logpdf_dlink2_dvar(inv_link_f, y, Y_metadata=None)[source]

Gradient of the hessian (d2logpdf_dlink2) w.r.t variance parameter (t_noise)

\[\frac{d}{d\sigma^{2}}(\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}f}) = \frac{v(v+1)(\sigma^{2}v - 3(y_{i} - \lambda(f_{i}))^{2})}{(\sigma^{2}v + (y_{i} - \lambda(f_{i}))^{2})^{3}}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in student t distribution
Returns:

derivative of hessian evaluated at points f and f_j w.r.t variance parameter

Return type:

Nx1 array

d3logpdf_dlink3(inv_link_f, y, Y_metadata=None)[source]

Third order derivative log-likelihood function at y given link(f) w.r.t link(f)

\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = \frac{-2(v+1)\left((y_{i} - \lambda(f_{i}))^{3} - 3(y_{i} - \lambda(f_{i}))\,\sigma^{2} v\right)}{\left((y_{i} - \lambda(f_{i}))^{2} + \sigma^{2} v\right)^{3}}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in student t distribution
Returns:

third derivative of likelihood evaluated at points f

Return type:

Nx1 array

Gradient of the log likelihood function at y, given link(f) w.r.t link(f)

\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{(v+1)(y_{i}-\lambda(f_{i}))}{(y_{i}-\lambda(f_{i}))^{2} + \sigma^{2}v}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables (f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in student t distribution
Returns:

gradient of likelihood evaluated at points

Return type:

Nx1 array

Derivative of the dlogpdf_dlink w.r.t variance parameter (t_noise)

\[\frac{d}{d\sigma^{2}}\left(\frac{d \ln p(y_{i}|\lambda(f_{i}))}{df}\right) = \frac{-2\sigma v(v + 1)(y_{i}-\lambda(f_{i}))}{\left((y_{i}-\lambda(f_{i}))^{2} + \sigma^{2} v\right)^{2}}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables inv_link_f
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in student t distribution
Returns:

derivative of likelihood evaluated at points f w.r.t variance parameter

Return type:

Nx1 array

Gradient of the log-likelihood function at y given f, w.r.t variance parameter (t_noise)

\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\sigma^{2}} = \frac{v((y_{i} - \lambda(f_{i}))^{2} - \sigma^{2})}{2\sigma^{2}(\sigma^{2}v + (y_{i} - \lambda(f_{i}))^{2})}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in student t distribution
Returns:

derivative of likelihood evaluated at points f w.r.t variance parameter

Return type:

float

Log Likelihood Function given link(f)

\[\ln p(y_{i}|\lambda(f_{i})) = \ln \Gamma\left(\frac{v+1}{2}\right) - \ln \Gamma\left(\frac{v}{2}\right) - \ln \sqrt{v \pi\sigma^{2}} - \frac{v+1}{2}\ln \left(1 + \frac{1}{v}\left(\frac{(y_{i} - \lambda(f_{i}))^{2}}{\sigma^{2}}\right)\right)\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables (link(f))
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in student t distribution
Returns:

likelihood evaluated for this point

Return type:

float

Likelihood function given link(f)

\[p(y_{i}|\lambda(f_{i})) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\Gamma\left(\frac{v}{2}\right)\sqrt{v\pi\sigma^{2}}}\left(1 + \frac{1}{v}\left(\frac{(y_{i} - \lambda(f_{i}))^{2}}{\sigma^{2}}\right)\right)^{-\frac{v+1}{2}}\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in student t distribution
Returns:

likelihood evaluated for this point

Return type:

float

predictive_mean(mu, sigma, Y_metadata=None)[source]

Quadrature calculation of the predictive mean: E(Y_star|Y) = E( E(Y_star|f_star, Y) )

Parameters:
  • mu – mean of posterior
  • sigma – standard deviation of posterior
predictive_variance(mu, variance, predictive_mean=None, Y_metadata=None)[source]

Approximation to the predictive variance: V(Y_star)

The following variance decomposition (law of total variance) is used: V(Y_star) = E( V(Y_star|f_star) ) + V( E(Y_star|f_star) )

Parameters:
  • mu – mean of posterior
  • sigma – standard deviation of posterior
Predictive_mean:
 

output’s predictive mean, if None _predictive_mean function will be called.

samples(gp, Y_metadata=None)[source]

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:gp – latent variable
update_gradients(grads)[source]

Pull out the gradients, be careful as the order must match the order in which the parameters are added

GPy.likelihoods.weibull module

class Weibull(gp_link=None, beta=1.0)[source]

Bases: GPy.likelihoods.likelihood.Likelihood

Implementation of the Weibull likelihood function.

d2logpdf_dlink2(link_f, y, Y_metadata=None)[source]

Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j

\[\begin{split}\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)} = -\beta^{2}\frac{d\Psi(\alpha_{i})}{d\alpha_{i}}\\ \alpha_{i} = \beta y_{i}\end{split}\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in gamma distribution
Returns:

Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)

Return type:

Nx1 array

Note

Will return diagonal of hessian, since every where else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i) not on link(f_(j!=i))

d2logpdf_dlink2_dr(link_f, y, Y_metadata=None)[source]

Derivative of the hessian of the log likelihood w.r.t the r shape parameter.

Parameters:
  • link_f
  • y
  • Y_metadata
Returns:

d2logpdf_dlink2_dtheta(f, y, Y_metadata=None)[source]
Parameters:
  • f
  • y
  • Y_metadata
Returns:

d3logpdf_dlink3(link_f, y, Y_metadata=None)[source]

Third order derivative log-likelihood function at y given link(f) w.r.t link(f)

\[\begin{split}\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = -\beta^{3}\frac{d^{2}\Psi(\alpha_{i})}{d\alpha_{i}}\\ \alpha_{i} = \beta y_{i}\end{split}\]
Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in gamma distribution
Returns:

third derivative of likelihood evaluated at points f

Return type:

Nx1 array

d3logpdf_dlink3_dr(link_f, y, Y_metadata=None)[source]
Parameters:
  • link_f
  • y
  • Y_metadata
Returns:

Gradient of the log likelihood function at y, given link(f) w.r.t link(f)

\[\begin{split}\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \beta (\log \beta y_{i}) - \Psi(\alpha_{i})\beta\\ \alpha_{i} = \beta y_{i}\end{split}\]
Parameters:
  • link_f (Nx1 array) – latent variables (f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in gamma distribution
Returns:

gradient of likelihood evaluated at points

Return type:

Nx1 array

First order derivative of the log likelihood w.r.t the r shape parameter

Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in gamma distribution
Returns:

derivative of log likelihood evaluated at points f, w.r.t the shape parameter

Return type:

Nx1 array

Parameters:
  • f
  • y
  • Y_metadata
Returns:

Gradient of the log-likelihood function at y given f, w.r.t shape parameter

\[\]
Parameters:
  • inv_link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:

derivative of likelihood evaluated at points f w.r.t variance parameter

Return type:

float

Parameters:
  • f
  • y
  • Y_metadata
Returns:

exact_inference_gradients(dL_dKdiag, Y_metadata=None)[source]

Log Likelihood Function given link(f)

\[\begin{split}\ln p(y_{i}|\lambda(f_{i})) = \alpha_{i}\log \beta - \log \Gamma(\alpha_{i}) + (\alpha_{i} - 1)\log y_{i} - \beta y_{i}\\ \alpha_{i} = \beta y_{i}\end{split}\]
Parameters:
  • link_f (Nx1 array) – latent variables (link(f))
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in poisson distribution
Returns:

likelihood evaluated for this point

Return type:

float

Likelihood function given link(f)

Parameters:
  • link_f (Nx1 array) – latent variables link(f)
  • y (Nx1 array) – data
  • Y_metadata – Y_metadata which is not used in weibull distribution
Returns:

likelihood evaluated for this point

Return type:

float

samples(gp, Y_metadata=None)[source]

Returns a set of samples of observations conditioned on a given value of latent variable f.

Parameters:gp – latent variable
update_gradients(grads)[source]

Pull out the gradients, be careful as the order must match the order in which the parameters are added

GPy.mappings package

Submodules

GPy.mappings.additive module

class Additive(mapping1, mapping2)[source]

Bases: GPy.core.mapping.Mapping

Mapping based on adding two existing mappings together.

\[f(\mathbf{x}_*) = f_1(\mathbf{x}_*) + f_2(\mathbf{x}_*)\]
Parameters:
  • mapping1 (GPy.mappings.Mapping) – first mapping to add together.
  • mapping2 (GPy.mappings.Mapping) – second mapping to add together.
f(X)[source]
gradients_X(dL_dF, X)[source]
update_gradients(dL_dF, X)[source]

GPy.mappings.compound module

class Compound(mapping1, mapping2)[source]

Bases: GPy.core.mapping.Mapping

Mapping based on passing one mapping through another

\[f(\mathbf{x}) = f_2(f_1(\mathbf{x}))\]
Parameters:
  • mapping1 (GPy.mappings.Mapping) – first mapping
  • mapping2 (GPy.mappings.Mapping) – second mapping
f(X)[source]
gradients_X(dL_dF, X)[source]
update_gradients(dL_dF, X)[source]

GPy.mappings.constant module

class Constant(input_dim, output_dim, value=0.0, name='constmap')[source]

Bases: GPy.core.mapping.Mapping

A constant mapping.

\[F(\mathbf{x}) = c\]
Parameters:
  • input_dim (int) – dimension of input.
  • output_dim (int) – dimension of output.
  • value – the value of this constant mapping

f(X)[source]
gradients_X(dL_dF, X)[source]
to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
update_gradients(dL_dF, X)[source]

GPy.mappings.identity module

class Identity(input_dim, output_dim, name='identity')[source]

Bases: GPy.core.mapping.Mapping

A mapping that does nothing!

f(X)[source]
gradients_X(dL_dF, X)[source]
to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
update_gradients(dL_dF, X)[source]

GPy.mappings.kernel module

class Kernel(input_dim, output_dim, Z, kernel, name='kernmap')[source]

Bases: GPy.core.mapping.Mapping

Mapping based on a kernel/covariance function.

\[f(\mathbf{x}) = \sum_i \alpha_i k(\mathbf{z}_i, \mathbf{x})\]

or for multiple outputs

\[f_i(\mathbf{x}) = \sum_j \alpha_{i,j} k(\mathbf{z}_j, \mathbf{x})\]
Parameters:
  • input_dim (int) – dimension of input.
  • output_dim (int) – dimension of output.
  • Z (ndarray) – input observations containing \(\mathbf{Z}\)
  • kernel (GPy.kern.kern) – a GPy kernel, defaults to GPy.kern.RBF
f(X)[source]
gradients_X(dL_dF, X)[source]
update_gradients(dL_dF, X)[source]

GPy.mappings.linear module

class Linear(input_dim, output_dim, name='linmap')[source]

Bases: GPy.core.mapping.Mapping

A Linear mapping.

\[F(\mathbf{x}) = \mathbf{A} \mathbf{x}\]
Parameters:
  • input_dim (int) – dimension of input.
  • output_dim (int) – dimension of output.
f(X)[source]
gradients_X(dL_dF, X)[source]
to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
update_gradients(dL_dF, X)[source]
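Mappings are most commonly used as parametric mean functions; a hedged sketch (synthetic data, illustrative names) adding a Linear mapping to a GPRegression model:

import numpy as np
import GPy

X = np.random.rand(30, 1)
Y = 2.0 * X + 0.1 * np.random.randn(30, 1)

mean_fn = GPy.mappings.Linear(input_dim=1, output_dim=1)
model = GPy.models.GPRegression(X, Y, kernel=GPy.kern.RBF(1), mean_function=mean_fn)
model.optimize()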

GPy.mappings.mlp module

class MLP(input_dim=1, output_dim=1, hidden_dim=3, name='mlpmap')[source]

Bases: GPy.core.mapping.Mapping

Mapping based on a multi-layer perceptron neural network model, with a single hidden layer

f(X)[source]
gradients_X(dL_dF, X)[source]
update_gradients(dL_dF, X)[source]

GPy.mappings.mlpext module

class MLPext(input_dim=1, output_dim=1, hidden_dims=[3], prior=None, activation='tanh', name='mlpmap')[source]

Bases: GPy.core.mapping.Mapping

Mapping based on a multi-layer perceptron neural network model, with multiple hidden layers. Activation function is applied to all hidden layers. The output is a linear combination of the last layer features, i.e. the last layer is linear.

Parameters:
  • input_dim – number of input dimensions
  • output_dim – number of output dimensions
  • hidden_dims – list of hidden sizes of hidden layers
  • prior – variance of Gaussian prior on all variables. If None, no prior is used (default: None)
  • activation – choose activation function. Allowed values are ‘tanh’ and ‘sigmoid’
  • name
f(X)[source]
fix_parameters()[source]

Helper function that fixes all parameters

gradients_X(dL_dF, X)[source]
unfix_parameters()[source]

Helper function that unfixes all parameters

update_gradients(dL_dF, X)[source]

GPy.mappings.piecewise_linear module

class PiecewiseLinear(input_dim, output_dim, values, breaks, name='piecewise_linear')[source]

Bases: GPy.core.mapping.Mapping

A piecewise-linear mapping.

The parameters of this mapping are the positions and values of the function where it is broken (self.breaks, self.values).

Outside the range of the breaks, the function is assumed to have gradient 1

f(X)[source]
gradients_X(dL_dF, X)[source]
parameters_changed()[source]

This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See :py:function:paramz.param.Observable.add_observer

update_gradients(dL_dF, X)[source]

GPy.examples package

Introduction

The examples in this package usually depend on pods, so make sure you have it installed before running them. The easiest way to do this is to run pip install pods. pods enables access to the 3rd party data required for most of the examples.

The examples are executable and self-contained workflows in that they have their own source data, create their own models, kernels and other objects as needed, execute optimisation as required, and display output.

Viewing the source code of each example will clarify the steps taken in its execution, and may provide inspiration for developing user-specific applications of GPy.
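
Most example functions return the fitted model, so a typical session looks like the sketch below (toy_rbf_1d is documented under GPy.examples.regression further down; plot=False avoids the need for a plotting backend):

import GPy

# run one of the regression examples; most of them return the fitted model
m = GPy.examples.regression.toy_rbf_1d(optimize=True, plot=False)
print(m)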

Submodules

GPy.examples.classification module

Gaussian Processes classification examples

crescent_data(model_type='Full', num_inducing=10, seed=10000, kernel=None, optimize=True, plot=True)[source]

Run a Gaussian process classification on the crescent data. The demonstration calls the basic GP classification model and uses EP to approximate the likelihood.

Parameters:
  • model_type – type of model to fit [‘Full’, ‘FITC’, ‘DTC’].
  • num_inducing (int) – number of inducing variables (only used for ‘FITC’ or ‘DTC’).
  • seed (int) – seed value for data generation.
  • kernel (a GPy kernel) – kernel to use in the model
oil(num_inducing=50, max_iters=100, kernel=None, optimize=True, plot=True)[source]

Run a Gaussian process classification on the three phase oil data. The demonstration calls the basic GP classification model and uses EP to approximate the likelihood.

sparse_toy_linear_1d_classification(num_inducing=10, seed=10000, optimize=True, plot=True)[source]

Sparse 1D classification example

Parameters:seed (int) – seed value for data generation (default is 4).
sparse_toy_linear_1d_classification_uncertain_input(num_inducing=10, seed=10000, optimize=True, plot=True)[source]

Sparse 1D classification example

Parameters:seed (int) – seed value for data generation (default is 4).
toy_heaviside(seed=10000, max_iters=100, optimize=True, plot=True)[source]

Simple 1D classification example using a Heaviside GP transformation

Parameters:seed (int) – seed value for data generation (default is 4).
toy_linear_1d_classification(seed=10000, optimize=True, plot=True)[source]

Simple 1D classification example using EP approximation

Parameters:seed (int) – seed value for data generation (default is 4).
toy_linear_1d_classification_laplace(seed=10000, optimize=True, plot=True)[source]

Simple 1D classification example using Laplace approximation

Parameters:seed (int) – seed value for data generation (default is 4).

GPy.examples.dimensionality_reduction module

bcgplvm_linear_stick(kernel=None, optimize=True, verbose=True, plot=True)[source]
bcgplvm_stick(kernel=None, optimize=True, verbose=True, plot=True)[source]
bgplvm_oil(optimize=True, verbose=1, plot=True, N=200, Q=7, num_inducing=40, max_iters=1000, **k)[source]
bgplvm_simulation(optimize=True, verbose=1, plot=True, plot_sim=False, max_iters=20000.0)[source]
bgplvm_simulation_missing_data(optimize=True, verbose=1, plot=True, plot_sim=False, max_iters=20000.0, percent_missing=0.1, d=13)[source]
bgplvm_simulation_missing_data_stochastics(optimize=True, verbose=1, plot=True, plot_sim=False, max_iters=20000.0, percent_missing=0.1, d=13, batchsize=2)[source]
bgplvm_test_model(optimize=False, verbose=1, plot=False, output_dim=200, nan=False)[source]

Model for testing purposes. Samples from a GP with an RBF kernel and learns the samples with a new kernel. Normally not for optimization, just for model checking.

brendan_faces(optimize=True, verbose=True, plot=True)[source]
cmu_mocap(subject='35', motion=['01'], in_place=True, optimize=True, verbose=True, plot=True)[source]
gplvm_oil_100(optimize=True, verbose=1, plot=True)[source]
gplvm_simulation(optimize=True, verbose=1, plot=True, plot_sim=False, max_iters=20000.0)[source]
mrd_simulation(optimize=True, verbose=True, plot=True, plot_sim=True, **kw)[source]
mrd_simulation_missing_data(optimize=True, verbose=True, plot=True, plot_sim=True, **kw)[source]
olivetti_faces(optimize=True, verbose=True, plot=True)[source]
robot_wireless(optimize=True, verbose=True, plot=True)[source]
sparse_gplvm_oil(optimize=True, verbose=0, plot=True, N=100, Q=6, num_inducing=15, max_iters=50)[source]
ssgplvm_oil(optimize=True, verbose=1, plot=True, N=200, Q=7, num_inducing=40, max_iters=1000, **k)[source]
ssgplvm_simulation(optimize=True, verbose=1, plot=True, plot_sim=False, max_iters=20000.0, useGPU=False)[source]
ssgplvm_simulation_linear()[source]
stick(kernel=None, optimize=True, verbose=True, plot=True)[source]
stick_bgplvm(model=None, optimize=True, verbose=True, plot=True)[source]

Interactive visualisation of the Stick Man data from Ohio State University with the Bayesian GPLVM.

stick_play(range=None, frame_rate=15, optimize=False, verbose=True, plot=True)[source]
swiss_roll(optimize=True, verbose=1, plot=True, N=1000, num_inducing=25, Q=4, sigma=0.2)[source]

GPy.examples.non_gaussian module

boston_example(optimize=True, plot=True)[source]
student_t_approx(optimize=True, plot=True)[source]

Example of regressing with a student t likelihood using Laplace

GPy.examples.regression module

Gaussian Processes regression examples

coregionalization_sparse(optimize=True, plot=True)[source]

A simple demonstration of coregionalization on two sinusoidal functions using sparse approximations.

coregionalization_toy(optimize=True, plot=True)[source]

A simple demonstration of coregionalization on two sinusoidal functions.

epomeo_gpx(max_iters=200, optimize=True, plot=True)[source]

Perform Gaussian process regression on the latitude and longitude data from the Mount Epomeo runs. Requires gpxpy to be installed on your system to load in the data.

multioutput_gp_with_derivative_observations(plot=True)[source]
multiple_optima(gene_number=937, resolution=80, model_restarts=10, seed=10000, max_iters=300, optimize=True, plot=True)[source]

Show an example of a multimodal error surface for Gaussian process regression. Gene 939 has bimodal behaviour where the noisy mode is higher.

olympic_100m_men(optimize=True, plot=True)[source]

Run a standard Gaussian process regression on the Rogers and Girolami olympics data.

olympic_marathon_men(optimize=True, plot=True)[source]

Run a standard Gaussian process regression on the Olympic marathon data.

parametric_mean_function(max_iters=100, optimize=True, plot=True)[source]

A linear mean function with parameters that we’ll learn alongside the kernel

robot_wireless(max_iters=100, kernel=None, optimize=True, plot=True)[source]

Predict the location of a robot given wireless signal strength readings.

silhouette(max_iters=100, optimize=True, plot=True)[source]

Predict the pose of a figure given a silhouette. This is a task from Agarwal and Triggs 2004 ICML paper.

simple_mean_function(max_iters=100, optimize=True, plot=True)[source]

The simplest possible mean function. No parameters, just a simple sinusoid.

sparse_GP_regression_1D(num_samples=400, num_inducing=5, max_iters=100, optimize=True, plot=True, checkgrad=False)[source]

Run a 1D example of a sparse GP regression.

sparse_GP_regression_2D(num_samples=400, num_inducing=50, max_iters=100, optimize=True, plot=True, nan=False)[source]

Run a 2D example of a sparse GP regression.

toy_ARD(max_iters=1000, kernel_type='linear', num_samples=300, D=4, optimize=True, plot=True)[source]
toy_ARD_sparse(max_iters=1000, kernel_type='linear', num_samples=300, D=4, optimize=True, plot=True)[source]
toy_poisson_rbf_1d_laplace(optimize=True, plot=True)[source]

Run a simple demonstration of a standard Gaussian process fitting it to data sampled from an RBF covariance.

toy_rbf_1d(optimize=True, plot=True)[source]

Run a simple demonstration of a standard Gaussian process fitting it to data sampled from an RBF covariance.

toy_rbf_1d_50(optimize=True, plot=True)[source]

Run a simple demonstration of a standard Gaussian process fitting it to data sampled from an RBF covariance.

uncertain_inputs_sparse_regression(max_iters=200, optimize=True, plot=True)[source]

Run a 1D example of a sparse GP regression with uncertain inputs.

warped_gp_cubic_sine(max_iters=100, plot=True)[source]

A test replicating the cubic sine regression problem from Snelson’s paper.

GPy.examples.state_space module

state_space_example()[source]

GPy.util package

Introduction

A variety of utility functions including matrix operations and quick access to test datasets.

Submodules

GPy.util.block_matrices module

block_dot(A, B, diagonal=False)[source]

Element-wise dot product on block matrices:

   [ A11 A12 ]   [ B11 B12 ]   [ A11.B11 A12.B12 ]
   [ A21 A22 ] o [ B21 B22 ] = [ A21.B21 A22.B22 ]

Note

If any block of A or B is stored as a 1d vector, it is assumed to denote a diagonal matrix, and a more efficient dot product using numpy broadcasting (i.e. A11*B11) is used for it.

get_block_shapes(B)[source]
get_block_shapes_3d(B)[source]
get_blocks(A, blocksizes)[source]
get_blocks_3d(A, blocksizes, pagesizes=None)[source]

Given a 3d matrix, make a block matrix, where the first and second dimensions are blocked according to blocksizes, and the pages are blocked using pagesizes

unblock(B)[source]

GPy.util.choleskies module

backprop_gradient(dL, L)

Given the derivative of an objective fn with respect to the Cholesky factor L, compute the derivative with respect to the original matrix K, defined as

K = LL^T

where L was obtained by Cholesky decomposition

flat_to_triang(flat_mat)
indexes_to_fix_for_low_rank(rank, size)[source]

Work out which indexes of the flattened array should be fixed if we want the Cholesky factor to represent a low-rank matrix

multiple_dpotri(Ls)[source]
safe_root(N)[source]
triang_to_cov(L)[source]
triang_to_flat(L)

GPy.util.choleskies_cython module

GPy.util.classification module

conf_matrix(p, labels, names=['1', '0'], threshold=0.5, show=True)[source]

Returns error rate and true/false positives in a binary classification problem.

  • Actual classes are displayed by column.
  • Predicted classes are displayed by row.

Parameters:
  • p – array of class ‘1’ probabilities.
  • labels – array of actual classes.
  • names – list of class names, defaults to [‘1’,’0’].
  • threshold – probability value used to decide the class.
  • show (False|True) – whether the matrix should be shown or not
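
A minimal sketch with hypothetical toy values, scoring predicted class probabilities against true labels (the return value bundles the error rate and the true/false positives, as described above):

import numpy as np
from GPy.util.classification import conf_matrix

p = np.array([0.9, 0.2, 0.7, 0.4])        # predicted probabilities of class '1'
labels = np.array([1, 0, 1, 1])           # actual classes
result = conf_matrix(p, labels, show=False)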

GPy.util.cluster_with_offset module

cluster(data, inputs, verbose=False)[source]

Clusters data

Using the new offset model, this method uses a greedy algorithm to cluster the data. It starts with all the data points in separate clusters and tests whether combining them increases the overall log-likelihood (LL). It then iteratively joins pairs of clusters which cause the greatest increase in the LL, until no join increases the LL.

arguments:
  • inputs – the ‘X’s in a list, one item per cluster
  • data – the ‘Y’s in a list, one item per cluster

returns a list of the clusters.

get_log_likelihood(inputs, data, clust)[source]

Get the LL of a combined set of clusters, ignoring time series offsets.

Get the log likelihood of a cluster without worrying about the fact that different time series are offset. We use it here mainly for those cases in which we only have one cluster to get the log likelihood of.

arguments:
  • inputs – the ‘X’s in a list, one item per cluster
  • data – the ‘Y’s in a list, one item per cluster
  • clust – list of clusters to use

returns a tuple: log likelihood and the offset (which is always zero for this model)

get_log_likelihood_offset(inputs, data, clust)[source]

Get the log likelihood of a combined set of clusters, fitting the offsets

arguments:
  • inputs – the ‘X’s in a list, one item per cluster
  • data – the ‘Y’s in a list, one item per cluster
  • clust – list of clusters to use

returns a tuple: log likelihood and the offset

GPy.util.config module

GPy.util.datasets module

authorize_download(dataset_name=None)[source]

Check with the user that they are happy with the terms and conditions for the data set.

boston_housing(data_set='boston_housing')[source]
boxjenkins_airline(data_set='boxjenkins_airline', num_train=96)[source]
brendan_faces(data_set='brendan_faces')[source]
cifar10_patches(data_set='cifar-10')[source]

The Canadian Institute for Advanced Research 10 (CIFAR-10) image data set. Code for loading in this data is taken from Boris Babenko’s blog post, original code available here: http://bbabenko.tumblr.com/post/86756017649/learning-low-level-vision-feautres-in-10-lines-of-code

cmu_mocap(subject, train_motions, test_motions=[], sample_every=4, data_set='cmu_mocap')[source]

Load a given subject’s training and test motions from the CMU motion capture data.

cmu_mocap_35_walk_jog(data_set='cmu_mocap')[source]

Load CMU subject 35’s walking and jogging motions, the same data that was used by Taylor, Roweis and Hinton at NIPS 2007, but without their preprocessing. Also used by Lawrence at AISTATS 2007.

cmu_mocap_49_balance(data_set='cmu_mocap')[source]

Load CMU subject 49’s one legged balancing motion that was used by Alvarez, Luengo and Lawrence at AISTATS 2009.

cmu_urls_files(subj_motions, messages=True)[source]

Find which resources are missing on the local disk for the requested CMU motion capture motions.

creep_data(data_set='creep_rupture')[source]

Brun and Yoshida’s metal creep rupture data.

crescent_data(num_data=200, seed=10000)[source]

Data set formed from a mixture of four Gaussians. In each class two of the Gaussians are elongated at right angles to each other and offset to form an approximation to the crescent data that is popular in semi-supervised learning as a toy problem.

Parameters:
  • num_data (int) – number of data to be sampled (default is 200).
  • seed (int) – random seed to be used for data generation.
data_available(dataset_name=None)[source]

Check if the data set is available on the local machine already.

data_details_return(data, data_set)[source]

Update the data component of the data dictionary with details drawn from the data_resources.

decampos_digits(data_set='decampos_characters', which_digits=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9])[source]
della_gatta_TRP63_gene_expression(data_set='della_gatta', gene_number=None)[source]
download_data(dataset_name=None)[source]

Check with the user that they are happy with the terms and conditions for the data set, then download it.

download_rogers_girolami_data(data_set='rogers_girolami_data')[source]
download_url(url, store_directory, save_name=None, messages=True, suffix='')[source]

Download a file from a url and save it to disk.

drosophila_knirps(data_set='drosophila_protein')[source]
drosophila_protein(data_set='drosophila_protein')[source]
football_data(season='1314', data_set='football_data')[source]

Football data from English games since 1993. This downloads data from football-data.co.uk for the given season.

fruitfly_tomancak(data_set='fruitfly_tomancak', gene_number=None)[source]
global_average_temperature(data_set='global_temperature', num_train=1000, refresh_data=False)[source]

Data downloaded from Google trends for given query terms.

Warning, if you use this function multiple times in a row you get blocked due to terms of service violations. The function will cache the result of your query, if you wish to refresh an old query set refresh_data to True.

The function is inspired by this notebook: http://nbviewer.ipython.org/github/sahuguet/notebooks/blob/master/GoogleTrends%20meet%20Notebook.ipynb

hapmap3(data_set='hapmap3')[source]

The HapMap phase three SNP dataset - 1184 samples out of 11 populations.

SNP_matrix (A) encoding [see Paschou et al. 2007 (PCA-Correlated SNPs…)]: Let (B1,B2) be the alphabetically sorted bases which occur in the j-th SNP, then

\[A_{ij} = \begin{cases} 1, & \text{iff } SNP_{ij}=(B1,B1)\\ 0, & \text{iff } SNP_{ij}=(B1,B2)\\ -1, & \text{iff } SNP_{ij}=(B2,B2) \end{cases}\]

The SNP data and the meta information (such as iid, sex and phenotype) are stored in the dataframe datadf, index is the Individual ID, with following columns for metainfo:

  • family_id -> Family ID
  • paternal_id -> Paternal ID
  • maternal_id -> Maternal ID
  • sex -> Sex (1=male; 2=female; other=unknown)
  • phenotype -> Phenotype (-9, or 0 for unknown)
  • population -> Population string (e.g. ‘ASW’ - ‘YRI’)
  • rest are SNP rs (ids)

More information is given in infodf:

  • Chromosome:
    • autosomal chromosomes -> 1-22
    • X X chromosome -> 23
    • Y Y chromosome -> 24
    • XY Pseudo-autosomal region of X -> 25
    • MT Mitochondrial -> 26
  • Relative Position (to Chromosome) [base pairs]
isomap_faces(num_samples=698, data_set='isomap_face_data')[source]
lee_yeast_ChIP(data_set='lee_yeast_ChIP')[source]
mauna_loa(data_set='mauna_loa', num_train=545, refresh_data=False)[source]
oil(data_set='three_phase_oil_flow')[source]

The three phase oil data from Bishop and James (1993).

oil_100(seed=10000, data_set='three_phase_oil_flow')[source]
olivetti_faces(data_set='olivetti_faces')[source]
olivetti_glasses(data_set='olivetti_glasses', num_training=200, seed=10000)[source]
olympic_100m_men(data_set='rogers_girolami_data')[source]
olympic_100m_women(data_set='rogers_girolami_data')[source]
olympic_200m_men(data_set='rogers_girolami_data')[source]
olympic_200m_women(data_set='rogers_girolami_data')[source]
olympic_400m_men(data_set='rogers_girolami_data')[source]
olympic_400m_women(data_set='rogers_girolami_data')[source]
olympic_marathon_men(data_set='olympic_marathon_men')[source]
olympic_sprints(data_set='rogers_girolami_data')[source]

All olympics sprint winning times for multiple output prediction.

osu_run1(data_set='osu_run1', sample_every=4)[source]
prompt_user(prompt)[source]

Ask user for agreeing to data set licenses.

pumadyn(seed=10000, data_set='pumadyn-32nm')[source]
reporthook(a, b, c)[source]
ripley_synth(data_set='ripley_prnn_data')[source]
robot_wireless(data_set='robot_wireless')[source]
sample_class(f)[source]
silhouette(data_set='ankur_pose_data')[source]
simulation_BGPLVM()[source]
singlecell(data_set='singlecell')[source]
singlecell_rna_seq_deng(dataset='singlecell_deng')[source]
singlecell_rna_seq_islam(dataset='singlecell_islam')[source]
sod1_mouse(data_set='sod1_mouse')[source]
spellman_yeast(data_set='spellman_yeast')[source]
spellman_yeast_cdc15(data_set='spellman_yeast')[source]
swiss_roll(num_samples=3000, data_set='swiss_roll')[source]
swiss_roll_1000()[source]
swiss_roll_generated(num_samples=1000, sigma=0.0)[source]
toy_linear_1d_classification(seed=10000)[source]
toy_rbf_1d(seed=10000, num_samples=500)[source]

Samples values of a function from an RBF covariance with very small noise for inputs uniformly distributed between -1 and 1.

Parameters:
  • seed (int) – seed to use for random sampling.
  • num_samples (int) – number of samples to sample in the function (default 500).
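
A brief sketch of the usual loader workflow (assuming, as for most loaders in this module, that the returned dictionary carries 'X' and 'Y' entries):

import GPy

data = GPy.util.datasets.toy_rbf_1d()
X, Y = data['X'], data['Y']          # assumed dictionary keys
m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(1))
m.optimize()
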
toy_rbf_1d_50(seed=10000)[source]
xw_pen(data_set='xw_pen')[source]

GPy.util.debug module

The module for some general debug tools

checkFinite(arr, name=None)[source]
checkFullRank(m, tol=1e-10, name=None, force_check=False)[source]

GPy.util.decorators module

silence_errors(f)[source]

This wraps a function and silences numpy errors that happen during its execution. After the function has exited, it restores the previous state of the warnings.

GPy.util.diag module

add(A, b, offset=0)[source]

Add b to the view of A in place (!). Returns modified A. Broadcasting is allowed, thus b can be scalar.

if offset is not zero, make sure b is of right shape!

Parameters:
  • A (ndarray) – 2 dimensional array
  • b (ndarray-like) – either one dimensional or scalar
  • offset (int) – same as in view.
Return type:

view of A, which is adjusted inplace

divide(A, b, offset=0)[source]

Divide the view of A by b in place (!). Returns modified A. Broadcasting is allowed, thus b can be scalar.

if offset is not zero, make sure b is of right shape!

Parameters:
  • A (ndarray) – 2 dimensional array
  • b (ndarray-like) – either one dimensional or scalar
  • offset (int) – same as in view.
Return type:

view of A, which is adjusted inplace

multiply(A, b, offset=0)

Multiply the view of A by b in place (!). Returns modified A. Broadcasting is allowed, thus b can be scalar.

if offset is not zero, make sure b is of right shape!

Parameters:
  • A (ndarray) – 2 dimensional array
  • b (ndarray-like) – either one dimensional or scalar
  • offset (int) – same as in view.
Return type:

view of A, which is adjusted inplace

offdiag_view(A, offset=0)[source]
subtract(A, b, offset=0)[source]

Subtract b from the view of A in place (!). Returns modified A. Broadcasting is allowed, thus b can be scalar.

if offset is not zero, make sure b is of right shape!

Parameters:
  • A (ndarray) – 2 dimensional array
  • b (ndarray-like) – either one dimensional or scalar
  • offset (int) – same as in view.
Return type:

view of A, which is adjusted inplace

times(A, b, offset=0)[source]

Multiply the view of A by b in place (!). Returns modified A. Broadcasting is allowed, thus b can be scalar.

if offset is not zero, make sure b is of right shape!

Parameters:
  • A (ndarray) – 2 dimensional array
  • b (ndarray-like) – either one dimensional or scalar
  • offset (int) – same as in view.
Return type:

view of A, which is adjusted inplace

view(A, offset=0)[source]

Get a view on the diagonal elements of a 2D array.

This is actually a view (!) on the diagonal of the array, so you can in-place adjust the view.

Parameters:
  • A (ndarray) – 2 dimensional numpy array
  • offset (int) – view offset to give back (negative entries allowed)
Return type:ndarray view of diag(A)

>>> import numpy as np
>>> X = np.arange(9).reshape(3,3)
>>> view(X)
array([0, 4, 8])
>>> d = view(X)
>>> d += 2
>>> view(X)
array([ 2,  6, 10])
>>> view(X, offset=-1)
array([3, 7])
>>> subtract(X, 3, offset=-1)
array([[ 2,  1,  2],
       [ 0,  6,  5],
       [ 6,  4, 10]])

GPy.util.functions module

clip_exp(x)[source]
differfln(x0, x1)[source]
logistic(x)[source]
logisticln(x)[source]
normcdf(x)[source]
normcdfln(x)[source]

GPy.util.gpu_init module

The package for scikits.cuda initialization

Global variables: initSuccess providing CUBLAS handle: cublas_handle

closeGPU()[source]

GPy.util.initialization module

Created on 24 Feb 2014

@author: maxz

initialize_latent(init, input_dim, Y)[source]

GPy.util.input_warping_functions module

class IdentifyWarping[source]

Bases: GPy.util.input_warping_functions.InputWarpingFunction

The identity warping function, for testing

f(X, test_data=False)[source]
fgrad_X(X)[source]
update_grads(X, dL_dW)[source]
class InputWarpingFunction(name)[source]

Bases: GPy.core.parameterization.parameterized.Parameterized

Abstract class for input warping functions

f(X, test=False)[source]
fgrad_x(X)[source]
update_grads(X, dL_dW)[source]
class InputWarpingTest[source]

Bases: GPy.util.input_warping_functions.InputWarpingFunction

The identity warping function, for testing

f(X, test_data=False)[source]
fgrad_X(X)[source]
update_grads(X, dL_dW)[source]
class KumarWarping(X, warping_indices=None, epsilon=None, Xmin=None, Xmax=None)[source]

Bases: GPy.util.input_warping_functions.InputWarpingFunction

Kumar Warping for input data

X : array_like, shape = (n_samples, n_features)
The input data that is going to be warped
warping_indices: list of int, optional
The features that are going to be warped Default to warp all the features
epsilon: float, optional
Used to normalize input data to [0+e, 1-e]. Default to 1e-6
Xmin : list of float, Optional
The min values for each feature defined by users Default to the train minimum
Xmax : list of float, Optional
The max values for each feature defined by users Default to the train maximum
warping_indices: list of int
The features that are going to be warped Default to warp all the features
warping_dim: int
The number of features to be warped
Xmin : list of float
The min values for each feature defined by users Default to the train minimum
Xmax : list of float
The max values for each feature defined by users Default to the train maximum
epsilon: float
Used to normalize input data to [0+e, 1-e]. Default to 1e-6
X_normalized : array_like, shape = (n_samples, n_features)
The normalized training X
scaling : list of float, length = n_features in X
Defined as 1.0 / (self.Xmax - self.Xmin)
params : list of Param
The list of all the parameters used in Kumar Warping
num_parameters: int
The number of parameters used in Kumar Warping
f(X, test_data=False)[source]

Apply warping_function to some Input data

X : array_like, shape = (n_samples, n_features)

test_data: bool, optional
Default to False, should set to True when transforming test data
X_warped : array_like, shape = (n_samples, n_features)
The warped input data

f(x) = 1 - (1 - x^a)^b

fgrad_X(X)[source]

Compute the gradient of warping function with respect to X

X : array_like, shape = (n_samples, n_features)
The location to compute gradient
grad : array_like, shape = (n_samples, n_features)
The gradient for every location at X

grad = a * b * x ^(a-1) * (1 - x^a)^(b-1)

update_grads(X, dL_dW)[source]

Update the gradients of marginal log likelihood with respect to the parameters of warping function

X : array_like, shape = (n_samples, n_features)
The input BEFORE warping
dL_dW : array_like, shape = (n_samples, n_features)
The gradient of marginal log likelihood with respect to the Warped input

Let w = f(x), the input after warping; then

dW_da = b * (1 - x^a)^(b - 1) * x^a * ln(x)
dW_db = - (1 - x^a)^b * ln(1 - x^a)
dL_da = dL_dW * dW_da
dL_db = dL_dW * dW_db
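
A small sketch using only the methods documented above; in practice the warping is usually handed to a model (e.g. GPy.models.InputWarpedGP) rather than called directly:

import numpy as np
from GPy.util.input_warping_functions import KumarWarping

X = np.random.rand(20, 2)
warp = KumarWarping(X)        # warp every feature by default
X_warped = warp.f(X)          # warped inputs, same shape as X
grad = warp.fgrad_X(X)        # gradient of the warping at each input location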

GPy.util.linalg module

DSYR(*args, **kwargs)[source]
DSYR_blas(A, x, alpha=1.0)[source]

Performs a symmetric rank-1 update operation: A <- A + alpha * np.dot(x,x.T)

Parameters:
  • A – Symmetric NxN np.array
  • x – Nx1 np.array
  • alpha – scalar
DSYR_numpy(A, x, alpha=1.0)[source]

Performs a symmetric rank-1 update operation: A <- A + alpha * np.dot(x,x.T)

Parameters:
  • A – Symmetric NxN np.array
  • x – Nx1 np.array
  • alpha – scalar
backsub_both_sides(L, X, transpose='left')[source]

Return L^-T * X * L^-1, assuming X is symmetric and L is a lower Cholesky factor

dpotri(A, lower=1)[source]

Wrapper for lapack dpotri function

DPOTRI - compute the inverse of a real symmetric positive
definite matrix A using the Cholesky factorization A = U**T*U or A = L*L**T computed by DPOTRF
Parameters:
  • A – Matrix A
  • lower – is matrix lower (true) or upper (false)
Returns:

A inverse

dpotrs(A, B, lower=1)[source]

Wrapper for lapack dpotrs function

Parameters:
  • A – Matrix A
  • B – Matrix B
  • lower – is matrix lower (true) or upper (false)

dtrtri(L)[source]

Inverts a Cholesky lower triangular matrix

Parameters:L – lower triangular matrix
Return type:inverse of L
dtrtrs(A, B, lower=1, trans=0, unitdiag=0)[source]

Wrapper for lapack dtrtrs function

DTRTRS solves a triangular system of the form

A * X = B or A**T * X = B,

where A is a triangular matrix of order N, and B is an N-by-NRHS matrix. A check is made to verify that A is nonsingular.

Parameters:
  • A – Matrix A(triangular)
  • B – Matrix B
  • lower – is matrix lower (true) or upper (false)
Returns:

Solution to A * X = B or A**T * X = B
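
A common pattern, sketched below (assuming the wrapper returns the underlying LAPACK (solution, info) pair), is to pair dtrtrs with jitchol to solve K x = y through the Cholesky factor:

import numpy as np
from GPy.util.linalg import jitchol, dtrtrs

K = np.array([[2.0, 0.5], [0.5, 1.0]])
y = np.array([[1.0], [2.0]])

L = jitchol(K)                            # lower Cholesky factor (with jitter if needed)
tmp, _ = dtrtrs(L, y, lower=1)            # solve L * tmp = y
x, _ = dtrtrs(L, tmp, lower=1, trans=1)   # solve L^T * x = tmp, so K * x = y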

force_F_ordered(A)[source]

Return an F-ordered version of A, assuming A is triangular.

force_F_ordered_symmetric(A)[source]

Return an F-ordered version of A, assuming A is symmetric.

ij_jlk_to_ilk(A, B)[source]

Faster version of einsum ‘ij,jlk->ilk’

ijk_jlk_to_il(A, B)[source]

Faster version of einsum einsum(‘ijk,jlk->il’, A,B)

ijk_ljk_to_ilk(A, B)[source]

Faster version of einsum np.einsum(‘ijk,ljk->ilk’, A, B)

I.e A.dot(B.T) for every dimension

jitchol(A, maxtries=5)[source]
mdot(*args)[source]

Multiply all the arguments using matrix product rules. The output is equivalent to multiplying the arguments one by one from left to right using dot(). Precedence can be controlled by creating tuples of arguments, for instance mdot(a,((b,c),d)) computes a*((b*c)*d). Note that this means the output of dot(a,b) and mdot(a,b) will differ if a or b is a pure tuple of numbers.

multiple_pdinv(A)[source]
Parameters:A – A DxDxN numpy array (each A[:,:,i] is pd)
Rval invs:the inverses of A
Rtype invs:np.ndarray
Rval hld:0.5* the log of the determinants of A
Rtype hld:np.array
pca(Y, input_dim)[source]

Principal component analysis: maximum likelihood solution by SVD

Parameters:
  • Y – NxD np.array of data
  • input_dim – int, dimension of projection
Rval X:
  • Nxinput_dim np.array of dimensionality reduced data
Rval W:
  • input_dimxD mapping from X to Y
pddet(A)[source]

Determinant of a positive definite matrix; only symmetric matrices are supported.

pdinv(A, *args)[source]
Parameters:A – A DxD pd numpy array
Rval Ai:the inverse of A
Rtype Ai:np.ndarray
Rval L:the Cholesky decomposition of A
Rtype L:np.ndarray
Rval Li:the Cholesky decomposition of Ai
Rtype Li:np.ndarray
Rval logdet:the log of the determinant of A
Rtype logdet:float64
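
A sketch of typical use, following the four return values documented above:

import numpy as np
from GPy.util.linalg import pdinv

K = np.array([[2.0, 0.5], [0.5, 1.0]])
Ai, L, Li, logdet = pdinv(K)   # inverse, Cholesky of K, Cholesky of the inverse, log|K|
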
ppca(Y, Q, iterations=100)[source]

EM implementation for probabilistic pca.

Parameters:
  • Y (array-like) – Observed Data
  • Q (int) – Dimensionality for reduced array
  • iterations (int) – number of iterations for EM
symmetrify(A, upper=False)[source]

Take the square matrix A and make it symmetric by copying elements from the lower half to the upper half.

works IN PLACE.

note: tries to use cython, falls back to a slower numpy version

tdot(*args, **kwargs)[source]
tdot_blas(mat, out=None)[source]

returns np.dot(mat, mat.T), but faster for large 2D arrays of doubles.

tdot_numpy(mat, out=None)[source]
trace_dot(a, b)[source]

Efficiently compute the trace of the matrix product of a and b

GPy.util.linalg_cython module

GPy.util.linalg_gpu module

GPy.util.ln_diff_erfs module

ln_diff_erfs(x1, x2, return_sign=False)[source]

Compute the log of the difference of two erfs in a numerically stable manner.

Parameters:
  • x1 (ndarray) – argument of the positive erf
  • x2 (ndarray) – argument of the negative erf
Returns:

tuple containing (log(abs(erf(x1) - erf(x2))), sign(erf(x1) - erf(x2)))

Based on MATLAB code that was written by Antti Honkela and modified by David Luengo and originally derived from code by Neil Lawrence.

GPy.util.misc module

blockify_dhess_dtheta(func)[source]
blockify_hessian(func)[source]
blockify_third(func)[source]
chain_1(df_dg, dg_dx)[source]

Generic chaining function for first derivative

\[\frac{d(f . g)}{dx} = \frac{df}{dg} \frac{dg}{dx}\]
chain_2(d2f_dg2, dg_dx, df_dg, d2g_dx2)[source]

Generic chaining function for second derivative

\[\frac{d^{2}(f . g)}{dx^{2}} = \frac{d^{2}f}{dg^{2}}(\frac{dg}{dx})^{2} + \frac{df}{dg}\frac{d^{2}g}{dx^{2}}\]
chain_3(d3f_dg3, dg_dx, d2f_dg2, d2g_dx2, df_dg, d3g_dx3)[source]

Generic chaining function for third derivative

\[\frac{d^{3}(f . g)}{dx^{3}} = \frac{d^{3}f}{dg^{3}}(\frac{dg}{dx})^{3} + 3\frac{d^{2}f}{dg^{2}}\frac{dg}{dx}\frac{d^{2}g}{dx^{2}} + \frac{df}{dg}\frac{d^{3}g}{dx^{3}}\]
kmm_init(X, m=10)[source]

This is the same initialization algorithm that is used in Kmeans++. It’s quite simple and very useful to initialize the locations of the inducing points in sparse GPs.

Parameters:
  • X – data
  • m – number of inducing points
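
For example (a sketch; the returned array is assumed to hold the m selected locations), initialising inducing inputs for a sparse GP:

import numpy as np
from GPy.util.misc import kmm_init

X = np.random.rand(500, 2)
Z = kmm_init(X, m=10)   # kmeans++-style choice of 10 inducing locations from X
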
linear_grid(D, n=100, min_max=(-100, 100))[source]

Creates a D-dimensional grid of n linearly spaced points

Parameters:
  • D – dimension of the grid
  • n – number of points
  • min_max – (min, max) list
opt_wrapper(m, **kwargs)[source]

This function wraps the optimization procedure of a GPy object so that optimize() is pickleable (necessary for multiprocessing).

param_to_array(*param)[source]

Convert an arbitrary number of parameters to :class:ndarray class objects. This is for converting parameter objects to numpy arrays, when using scipy.weave.inline routine. In scipy.weave.blitz there is no automatic array detection (even when the array inherits from :class:ndarray)

safe_cube(f)[source]
safe_exp(f)[source]
safe_quad(f)[source]
safe_square(f)[source]
safe_three_times(f)[source]

GPy.util.mocap module

class acclaim_skeleton(file_name=None)[source]

Bases: GPy.util.mocap.skeleton

get_child_xyz(ind, channels)[source]
load_channels(file_name)[source]
load_skel(file_name)[source]

Loads an ASF file into a skeleton structure.

Parameters:file_name – The file name to load in.
read_bonedata(fid)[source]

Read bone data from an acclaim skeleton file stream.

read_channels(fid)[source]

Read channels from an acclaim file.

read_documentation(fid)[source]

Read documentation from an acclaim skeleton file stream.

read_hierarchy(fid)[source]

Read hierarchy information from acclaim skeleton file stream.

read_line(fid)[source]

Read a line from a file stream and check it isn’t either empty or commented before returning.

read_root(fid)[source]

Read the root node from an acclaim skeleton file stream.

read_skel(fid)[source]

Loads an acclaim skeleton format from a file stream.

read_units(fid)[source]

Read units from an acclaim skeleton file stream.

resolve_indices(index, start_val)[source]

Get indices for the skeleton from the channels when loading in channel data.

save_channels(file_name, channels)[source]
set_rotation_matrices()[source]

Set the meta information at each vertex to contain the correct matrices C and Cinv as prescribed by the rotations and rotation orders.

to_xyz(channels)[source]
writ_channels(fid, channels)[source]
class skeleton[source]

Bases: GPy.util.mocap.tree

connection_matrix()[source]
finalize()[source]

After loading in a skeleton ensure parents are correct, vertex orders are correct and rotation matrices are correct.

smooth_angle_channels(channels)[source]

Remove discontinuities in angle channels so that they don’t cause artifacts in algorithms that rely on the smoothness of the functions.

to_xyz(channels)[source]
class tree[source]

Bases: object

branch_str(index, indent='')[source]
find_children()[source]

Take a tree and set the children according to the parents.

Takes a tree structure which lists the parents of each vertex and computes the children for each vertex and places them in.

find_parents()[source]

Take a tree and set the parents according to the children

Takes a tree structure which lists the children of each vertex and computes the parents for each vertex and places them in.

find_root()[source]

Finds the index of the root node of the tree.

get_index_by_id(id)[source]

Give the index associated with a given vertex id.

get_index_by_name(name)[source]

Give the index associated with a given vertex name.

order_vertices()[source]

Order vertices in the graph such that parents always have a lower index than children.

swap_vertices(i, j)[source]

Swap two vertices in the tree structure array. swap_vertex swaps the location of two vertices in a tree structure array.

Parameters:
  • tree – the tree for which two vertices are to be swapped.
  • i – the index of the first vertex to be swapped.
  • j – the index of the second vertex to be swapped.
Rval tree:

the tree structure with the two vertex locations swapped.

class vertex(name, id, parents=[], children=[], meta={})[source]

Bases: object

load_text_data(dataset, directory, centre=True)[source]

Load in a data set of marker points from the Ohio State University C3D motion capture files (http://accad.osu.edu/research/mocap/mocap_data.htm).

parse_text(file_name)[source]

Parse data from Ohio State University text mocap files (http://accad.osu.edu/research/mocap/mocap_data.htm).

read_connections(file_name, point_names)[source]

Read a file detailing which markers should be connected to which for motion capture data.

rotation_matrix(xangle, yangle, zangle, order='zxy', degrees=False)[source]

Compute the rotation matrix for an angle in each direction. This is a helper function for computing the rotation matrix for a given set of angles in a given order.

Parameters:
  • xangle – rotation for x-axis.
  • yangle – rotation for y-axis.
  • zangle – rotation for z-axis.
  • order – the order for the rotations.

GPy.util.multioutput module

ICM(input_dim, num_outputs, kernel, W_rank=1, W=None, kappa=None, name='ICM')[source]

Builds a kernel for an Intrinsic Coregionalization Model

Input_dim:

Input dimensionality (does not include dimension of indices)

Num_outputs:

Number of outputs

Parameters:
  • kernel (a GPy kernel) – kernel that will be multiplied by the coregionalize kernel (matrix B).
  • W_rank (integer) – rank of the coregionalization parameters ‘W’
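
A typical construction, sketched below following the standard GPy coregionalization workflow: build the ICM kernel over two outputs and hand lists of inputs/outputs to a coregionalized regression model.

import numpy as np
import GPy

X1 = np.random.rand(40, 1); Y1 = np.sin(6 * X1) + 0.05 * np.random.randn(40, 1)
X2 = np.random.rand(30, 1); Y2 = np.cos(6 * X2) + 0.05 * np.random.randn(30, 1)

K = GPy.util.multioutput.ICM(input_dim=1, num_outputs=2, kernel=GPy.kern.RBF(1))
m = GPy.models.GPCoregionalizedRegression([X1, X2], [Y1, Y2], kernel=K)
m.optimize()
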
LCM(input_dim, num_outputs, kernels_list, W_rank=1, name='ICM')[source]

Builds a kernel for a Linear Coregionalization Model

Input_dim:

Input dimensionality (does not include dimension of indices)

Num_outputs:

Number of outputs

Parameters:
  • kernels_list (list of GPy kernels) – kernels that will be multiplied by the coregionalize kernel (matrix B).
  • W_rank (integer) – rank of the coregionalization parameters ‘W’
Private(input_dim, num_outputs, kernel, output, kappa=None, name='X')[source]

Builds a kernel for an Intrinsic Coregionalization Model

Input_dim:

Input dimensionality

Num_outputs:

Number of outputs

Parameters:
  • kernel (a GPy kernel) – kernel that will be multiplied by the coregionalize kernel (matrix B).
  • W_rank (integer) – rank of the coregionalization parameters ‘W’
build_XY(input_list, output_list=None, index=None)[source]
build_likelihood(Y_list, noise_index, likelihoods_list=None)[source]
get_slices(input_list)[source]
index_to_slices(index)[source]

take a numpy array of integers (index) and return a nested list of slices such that the slices describe the start, stop points for each integer in the index.

e.g.

>>> index = np.asarray([0,0,0,1,1,1,2,2,2])
returns
>>> [[slice(0,3,None)],[slice(3,6,None)],[slice(6,9,None)]]

or, a more complicated example

>>> index = np.asarray([0,0,1,1,0,2,2,2,1,1])
returns
>>> [[slice(0,2,None),slice(4,5,None)],[slice(2,4,None),slice(8,10,None)],[slice(5,8,None)]]

GPy.util.netpbmfile module

Read image data from, and write image data to, Netpbm files.

This implementation follows the Netpbm format specifications at http://netpbm.sourceforge.net/doc/. No gamma correction is performed.

The following image formats are supported: PBM (bi-level), PGM (grayscale), PPM (color), PAM (arbitrary), XV thumbnail (RGB332, read-only).

Author:Christoph Gohlke
Organization:Laboratory for Fluorescence Dynamics, University of California, Irvine
Version:2013.01.18
Requirements
Examples
>>> im1 = numpy.array([[0, 1],[65534, 65535]], dtype=numpy.uint16)
>>> imsave('_tmp.pgm', im1)
>>> im2 = imread('_tmp.pgm')
>>> assert numpy.all(im1 == im2)
class NetpbmFile(arg=None, **kwargs)[source]

Bases: object

Read and write Netpbm PAM, PBM, PGM, PPM, files.

Initialize instance from filename, open file, or numpy array.

asarray(copy=True, cache=False, **kwargs)[source]

Return image data from file as numpy array.

close()[source]

Close open file. Future asarray calls might fail.

write(arg, **kwargs)[source]

Write instance to file.

imread(filename, *args, **kwargs)[source]

Return image data from Netpbm file as numpy array.

args and kwargs are arguments to NetpbmFile.asarray().

>>> image = imread('_tmp.pgm')
imsave(filename, data, maxval=None, pam=False)[source]

Write image data to Netpbm file.

>>> image = numpy.array([[0, 1],[65534, 65535]], dtype=numpy.uint16)
>>> imsave('_tmp.pgm', image)

GPy.util.normalizer module

Created on Aug 27, 2014

@author: Max Zwiessele

class Standardize[source]

Bases: GPy.util.normalizer._Norm

inverse_covariance(covariance)[source]

Convert scaled covariance to unscaled.

Args:
covariance - numpy array of shape (n, n)
Returns:
covariance - numpy array of shape (n, n, m) where m is the number of outputs
inverse_mean(X)[source]

Project the normalized object X into space of Y

inverse_variance(var)[source]
normalize(Y)[source]

Project Y into normalized space

scale_by(Y)[source]

Use data matrix Y as normalization space to work in.

scaled()[source]

Whether this Norm object has been initialized.

to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
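
A small sketch using only the methods listed above:

import numpy as np
from GPy.util.normalizer import Standardize

Y = 5.0 + 3.0 * np.random.randn(100, 2)
norm = Standardize()
norm.scale_by(Y)                # learn mean and standard deviation from Y
Yn = norm.normalize(Y)          # zero mean, unit variance
Y_back = norm.inverse_mean(Yn)  # map back into the original space of Y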

GPy.util.parallel module

The module of tools for parallelization (MPI)

divide_data(datanum, rank, size)[source]
get_id_within_node(comm=None)[source]
optimize_parallel(model, optimizer=None, messages=True, max_iters=1000, outpath='.', interval=100, name=None, **kwargs)[source]

GPy.util.pca module

Created on 10 Sep 2012

@author: Max Zwiessele @copyright: Max Zwiessele 2012

class PCA(X)[source]

Bases: object

PCA module with automatic primal/dual determination.

center(X)[source]

Center X in PCA space.

plot_2d(X, labels=None, s=20, marker='o', dimensions=(0, 1), ax=None, colors=None, fignum=None, cmap=None, **kwargs)[source]

Plot the given dimensions against each other in PC space, with the given labels. Labels can be any sequence of labels of length X.shape[0]. Labels can be drawn with a subsequent call to legend()

plot_fracs(Q=None, ax=None, fignum=None)[source]

Plot fractions of Eigenvalues sorted in descending order.

project(X, Q=None)[source]

Project X into PCA space, defined by the Q highest eigenvalues. Y = X dot V
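
A brief sketch of the intended workflow:

import numpy as np
from GPy.util.pca import PCA

Y = np.random.randn(100, 5)
p = PCA(Y)                 # automatic primal/dual determination
X2 = p.project(Y, Q=2)     # coordinates on the two leading principal components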

GPy.util.quad_integrate module

The file for utilities related to integration by quadrature methods - will contain implementation for gaussian-kronrod integration.

getSubs(Subs, XK, NK=1)[source]
quadgk_int(f, fmin=-inf, fmax=inf, difftol=0.1)[source]

Integrate f from fmin to fmax by the substitution x = r / (1-r**2): as r goes from -1 to 1, x goes from -inf to inf. The interval for the quadgk function is from -1 to +1, so we transform the space from (-inf, inf) to (-1, 1).

quadvgk(feval, fmin, fmax, tol1=1e-05, tol2=1e-05)[source]

Numpy implementation making use of the code here: http://se.mathworks.com/matlabcentral/fileexchange/18801-quadvgk We use Gauss-Kronrod integration, already used in GPstuff, for evaluating one-dimensional integrals. This is vectorised quadrature, which means that several functions can be evaluated at the same time over a grid of points.

GPy.util.squashers module

sigmoid(x)[source]
single_softmax(x)[source]
softmax(x)[source]

GPy.util.subarray_and_sorting module

Module author: Max Zwiessele <ibinbei@gmail.com>

common_subarrays(X, axis=0)[source]

Find common subarrays of 2 dimensional X, where axis is the axis to apply the search over. Common subarrays are returned as a dictionary of <subarray, [index]> pairs, where the subarray is a tuple representing the subarray and the index is the index for the subarray in X, where index is the index to the remaining axis.

Parameters:
  • X (np.ndarray) – 2d array to check for common subarrays in
  • axis (int) – axis to apply subarray detection over.

When axis is 0, rows are compared; otherwise, columns are compared.

In a 2d array:

>>> import numpy as np
>>> X = np.zeros((3,6), dtype=bool)
>>> X[[1,1,1],[0,4,5]] = 1; X[1:,[2,3]] = 1
>>> X
array([[False, False, False, False, False, False],
       [ True, False,  True,  True,  True,  True],
       [False, False,  True,  True, False, False]], dtype=bool)
>>> d = common_subarrays(X,axis=1)
>>> len(d)
3
>>> X[:, d[tuple(X[:,0])]]
array([[False, False, False],
       [ True,  True,  True],
       [False, False, False]], dtype=bool)
>>> d[tuple(X[:,4])] == d[tuple(X[:,0])] == [0, 4, 5]
True
>>> d[tuple(X[:,1])]
[1]

GPy.util.univariate_Gaussian module

cdfNormal(z)[source]

Robust implementations of cdf of a standard normal.

@see [[https://github.com/mseeger/apbsint/blob/master/src/eptools/potentials/SpecfunServices.h original implementation]] in C from Matthias Seeger.

derivLogCdfNormal(z)[source]

Robust implementations of derivative of the log cdf of a standard normal.

@see [[https://github.com/mseeger/apbsint/blob/master/src/eptools/potentials/SpecfunServices.h original implementation]] in C from Matthias Seeger.

inv_std_norm_cdf(x)[source]

Inverse cumulative standard Gaussian distribution. Based on Winitzki, S. (2008).

logCdfNormal(z)[source]

Robust implementations of log cdf of a standard normal.

@see [[https://github.com/mseeger/apbsint/blob/master/src/eptools/potentials/SpecfunServices.h original implementation]] in C from Matthias Seeger.
logPdfNormal(z)[source]

Robust implementations of log pdf of a standard normal.

@see [[https://github.com/mseeger/apbsint/blob/master/src/eptools/potentials/SpecfunServices.h original implementation]] in C from Matthias Seeger.
std_norm_pdf(x)[source]

GPy.util.warping_functions module

class IdentityFunction(closed_inverse=True)[source]

Bases: GPy.util.warping_functions.WarpingFunction

Identity warping function. This is for testing and sanity check purposes and should not be used in practice. The closed_inverse flag should only be set to False for debugging and testing purposes.

f(y)[source]

function transformation y is a list of values (GP training data) of shape [N, 1]

fgrad_y(y)[source]

gradient of f w.r.t. y

fgrad_y_psi(y, return_covar_chain=False)[source]

gradient of f w.r.t. y and psi

update_grads(Y_untransformed, Kiy)[source]
class LogFunction(closed_inverse=True)[source]

Bases: GPy.util.warping_functions.WarpingFunction

Easy wrapper for applying a fixed log warping function to positive-only values. The closed_inverse flag should only be set to False for debugging and testing purposes.

f(y)[source]

function transformation y is a list of values (GP training data) of shape [N, 1]

fgrad_y(y)[source]

gradient of f w.r.t. y

fgrad_y_psi(y, return_covar_chain=False)[source]

gradient of f w.r.t. y and psi

update_grads(Y_untransformed, Kiy)[source]
class TanhFunction(n_terms=3, initial_y=None)[source]

Bases: GPy.util.warping_functions.WarpingFunction

This is the function proposed in Snelson et al.: A sum of tanh functions with linear trends outside the range. Notice the term ‘d’, which scales the linear trend.

n_terms specifies the number of tanh terms to be used

f(y)[source]

Transform y with f using parameter vector psi psi = [[a,b,c]]

\(f(y) = d\,y + \sum_{i=1}^{n_{terms}} a_i \tanh(b_i (y + c_i))\)

fgrad_y(y, return_precalc=False)[source]

gradient of f w.r.t. y ([N x 1])

Returns:Nx1 vector of derivatives, unless return_precalc is true, then it also returns the precomputed stuff

fgrad_y_psi(y, return_covar_chain=False)[source]

gradient of f w.r.t. y and psi

Returns:NxIx4 tensor of partial derivatives
update_grads(Y_untransformed, Kiy)[source]
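
A sketch of how these warping functions are typically consumed (assuming GPy.models.WarpedGP exposes the warping_function argument):

import numpy as np
import GPy
from GPy.util.warping_functions import TanhFunction

X = np.random.rand(50, 1)
Y = np.exp(np.sin(6 * X)) + 0.05 * np.random.rand(50, 1)   # positive, skewed targets

m = GPy.models.WarpedGP(X, Y, kernel=GPy.kern.RBF(1),
                        warping_function=TanhFunction(n_terms=2))
m.optimize()
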
class WarpingFunction(name)[source]

Bases: GPy.core.parameterization.parameterized.Parameterized

abstract function for warping z = f(y)

f(y, psi)[source]

function transformation y is a list of values (GP training data) of shape [N, 1]

f_inv(z, max_iterations=250, y=None)[source]

Calculate the numerical inverse of f. This should be overridden for specific warping functions where the inverse can be found in closed form.

Parameters:max_iterations – maximum number of N.R. iterations
fgrad_y(y, psi)[source]

gradient of f w.r.t. y

fgrad_y_psi(y, psi)[source]

gradient of f w.r.t. y and psi

plot(xmin, xmax)[source]

GPy.plotting package

Introduction

GPy.plotting effectively extends models based on GPy.core.gp.GP (and other classes) by adding methods to plot useful charts. ‘matplotlib’, ‘plotly’ (online) and ‘plotly’ (offline) are supported. The methods in GPy.plotting (and child classes GPy.plotting.gpy_plot and GPy.plotting.matplot_dep) are not intended to be called directly, but rather are ‘injected’ into other classes (notably GPy.core.gp.GP). Documentation describing plots is best found associated with the model being plotted e.g. GPy.core.gp.GP.plot_confidence.

change_plotting_library(lib, **kwargs)[source]
inject_plotting()[source]
plotting_library()[source]
show(figure, **kwargs)[source]

Show the specific plotting library figure, returned by add_to_canvas().

kwargs are the plotting library specific options for showing/drawing a figure.
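
For example (a sketch): pick a backend, fit a model, and show the figure returned by the injected plot method.

import numpy as np
import GPy

GPy.plotting.change_plotting_library('matplotlib')

X = np.random.rand(30, 1)
Y = np.sin(6 * X) + 0.05 * np.random.randn(30, 1)
m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(1))
m.optimize()

fig = m.plot()            # plotting method injected into the GP model
GPy.plotting.show(fig)    # display via the active backend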

Subpackages

GPy.plotting.gpy_plot package
Submodules
GPy.plotting.gpy_plot.data_plots module
plot_data(self, which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **plot_kwargs)[source]
Plot the training data.

For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.

Can plot only part of the data using which_data_rows and which_data_ycols.

Parameters:
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
  • projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
  • label (str) – the label for the plot
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns list:

of plots created.

plot_data_error(self, which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **error_kwargs)[source]

Plot the training data input error.

For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.

Can plot only part of the data using which_data_rows and which_data_ycols.

Parameters:
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
  • projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • label (str) – the label for the plot
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
Returns list:

of plots created.

plot_errorbars_trainset(self, which_data_rows='all', which_data_ycols='all', fixed_inputs=None, plot_raw=False, apply_link=False, label=None, projection='2d', predict_kw=None, **plot_kwargs)[source]

Plot the errorbars of the GP likelihood on the training data. These are the errorbars after the appropriate approximations according to the likelihood are done.

This also works for heteroscedastic likelihoods.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • which_data_ycols – when the data has several columns (independent outputs), only plot these
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • predict_kwargs (dict) – kwargs for the prediction used to predict the right quantiles.
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_inducing(self, visible_dims=None, projection='2d', label='inducing', legend=True, **plot_kwargs)[source]

Plot the inducing inputs of a sparse gp model

Parameters:
  • visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
  • plot_kwargs (kwargs) – keyword arguments for the plotting library
GPy.plotting.gpy_plot.gp_plots module
plot(self, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, samples_likelihood=0, lower=2.5, upper=97.5, plot_data=True, plot_inducing=True, plot_density=False, predict_kw=None, projection='2d', legend=True, **kwargs)[source]

Convenience function for plotting the fit of a GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

If you want fine grained control use the specific plotting functions supplied in the model.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
  • samples_likelihood (int) – the number of samples to draw from the GP and apply the likelihood noise. This is usually not what you want!
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • projection ({2d|3d}) – plot in 2d or 3d?
  • legend (bool) – convenience, whether to put a legend on the plot or not.
plot_confidence(self, lower=2.5, upper=97.5, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', label='gp confidence', predict_kw=None, **kwargs)[source]

Plot the confidence interval between the percentiles lower and upper. E.g. the 95% confidence interval corresponds to percentiles 2.5 and 97.5. Note: Only implemented for one dimension!

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • which_data_ycols (array-like) – which columns of the output y (!) to plot (array-like or list of ints)
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
plot_density(self, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=35, label='gp density', predict_kw=None, **kwargs)[source]

Plot the density of the GP as a gradient over many stacked confidence levels (controlled by levels). Note: Only implemented for one dimension!

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
  • levels (int) – the number of levels in the density (a number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
plot_f(self, plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)[source]

Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!

If you want fine-grained control use the specific plotting functions supplied in the model.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [default:200]
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
  • which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
  • levels (int) – the number of levels in the density (a number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
  • samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
  • lower (float) – the lower percentile to plot
  • upper (float) – the upper percentile to plot
  • plot_data (bool) – plot the data into the plot?
  • plot_inducing (bool) – plot inducing inputs?
  • plot_density (bool) – plot density instead of the confidence interval?
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
  • plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_mean(self, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=20, projection='2d', label='gp mean', predict_kw=None, **kwargs)[source]

Plot the mean of the GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
  • plot_raw (bool) – plot the latent function (usually denoted f) only?
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
  • levels (int) – for 2D plotting, the number of contour levels to use
  • projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
  • label (str) – the label for the plot.
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
plot_samples(self, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=True, apply_link=False, visible_dims=None, which_data_ycols='all', samples=3, projection='2d', label='gp_samples', predict_kw=None, **kwargs)[source]

Plot samples drawn from the GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
  • resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
  • plot_raw (bool) – plot the latent function (usually denoted f) only? This is usually what you want!
  • apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
  • visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
  • which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
  • predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=<specific kernel>) in here
  • levels (int) – for 2D plotting, the number of contour levels to use
GPy.plotting.gpy_plot.inference_plots module
plot_optimizer(optimizer, **kwargs)[source]
plot_sgd_traces(optimizer)[source]
GPy.plotting.gpy_plot.kernel_plots module
plot_ARD(kernel, filtering=None, legend=False, canvas=None, **kwargs)[source]

If an ARD kernel is present, plot a bar representation using matplotlib

Parameters:
  • fignum – figure number of the plot
  • filtering (list of names to use for ARD plot) – list of names, which to use for plotting ARD parameters. Only kernels which match names in the list of names in filtering will be used for plotting.
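
A hedged sketch of how plot_ARD is typically reached from a fitted model (the three-dimensional toy data and the ARD kernel are assumptions for illustration):

import numpy as np
import GPy

X = np.random.randn(50, 3)
Y = np.sin(X[:, :1]) + 0.05 * np.random.randn(50, 1)

m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(input_dim=3, ARD=True))
m.optimize()

# bar plot of the per-dimension relevances of the ARD kernel
m.kern.plot_ARD()
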
plot_covariance(kernel, x=None, label=None, plot_limits=None, visible_dims=None, resolution=None, projection='2d', levels=20, **kwargs)[source]

Plot a kernel covariance w.r.t. another x.

Parameters:
  • x (array-like) – the value to use for the other kernel argument (kernels are a function of two variables!)
  • plot_limits (Either (xmin, xmax) for 1D or (xmin, xmax, ymin, ymax) / ((xmin, xmax), (ymin, ymax)) for 2D) – the range over which to plot the kernel
  • visible_dims (array-like) – input dimensions (!) to use for x. Make sure to select 2 or less dimensions to plot.
  • projection ({2d|3d}) – What projection shall we use to plot the kernel?
  • levels (int) – for 2D projection, how many levels for the contour plot to use?
  • kwargs – valid kwargs for your specific plotting library
Resolution:

The resolution of the lines used in plotting. For 2D this defines the grid for kernel evaluation.
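
A short, hedged sketch of plotting a kernel covariance (the Matern32 kernel and the plot limits are illustrative; the kernel's plot method, injected by the plotting module, is assumed to map onto plot_covariance above):

import GPy

k = GPy.kern.Matern32(input_dim=1, lengthscale=0.5)
# plot k(x, x') over [-3, 3]; x=None uses the default reference point for x'
k.plot(plot_limits=(-3, 3))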

GPy.plotting.gpy_plot.latent_plots module
plot_latent(self, labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', scatter_kwargs=None, **imshow_kwargs)[source]

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • resolution (int) – the resolution at which we predict the magnification factor
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • updates (bool) – if possible, make interactive updates using the specific library you are using
  • kern (Kern) – the kernel to use for prediction
  • marker (str) – markers to use - cycle if more labels than markers are given
  • num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
  • imshow_kwargs – the kwargs for the imshow (magnification factor)
  • scatter_kwargs – the kwargs for the scatter plots
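
A hedged sketch of plot_latent on a latent variable model (the toy data, model choice and latent dimensionality are assumptions for illustration):

import numpy as np
import GPy

Y = np.random.randn(40, 5)          # toy high-dimensional observations
m = GPy.models.GPLVM(Y, input_dim=2)
m.optimize(messages=False)

# grey-scale posterior density plus a scatter of the two selected latent dimensions
m.plot_latent(which_indices=(0, 1))
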
plot_latent_inducing(self, which_indices=None, legend=False, plot_limits=None, marker=None, projection='2d', **kwargs)[source]

Plot a scatter plot of the inducing inputs.

Parameters:
  • which_indices ([int]) – which input dimensions to plot against each other
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • marker (str) – marker to use [default is custom arrow like]
  • kwargs – the kwargs for the scatter plots
  • projection (str) – for now 2d or 3d projection (other projections can be implemented, see developer documentation)
plot_latent_scatter(self, labels=None, which_indices=None, legend=True, plot_limits=None, marker='<>^vsd', num_samples=1000, projection='2d', **kwargs)[source]

Plot a scatter plot of the latent space.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • marker (str) – markers to use - cycle if more labels than markers are given
  • kwargs – the kwargs for the scatter plots
plot_magnification(self, labels=None, which_indices=None, resolution=60, marker='<>^vsd', legend=True, plot_limits=None, updates=False, mean=True, covariance=True, kern=None, num_samples=1000, scatter_kwargs=None, plot_scatter=True, **imshow_kwargs)[source]

Plot the magnification factor of the GP on the inputs. This is the density of the GP as a gray scale.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • resolution (int) – the resolution at which we predict the magnification factor
  • marker (str) – markers to use - cycle if more labels than markers are given
  • legend (bool) – whether to plot the legend on the figure
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • updates (bool) – if possible, make interactive updates using the specific library you are using
  • mean (bool) – use the mean of the Wishart embedding for the magnification factor
  • covariance (bool) – use the covariance of the Wishart embedding for the magnification factor
  • kern (Kern) – the kernel to use for prediction
  • num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
  • imshow_kwargs – the kwargs for the imshow (magnification factor)
  • kwargs – the kwargs for the scatter plots
plot_steepest_gradient_map(self, output_labels=None, data_labels=None, which_indices=None, resolution=15, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, annotation_kwargs=None, scatter_kwargs=None, **imshow_kwargs)[source]

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:
  • labels (array-like) – a label for each data point (row) of the inputs
  • which_indices ((int, int)) – which input dimensions to plot against each other
  • resolution (int) – the resolution at which we predict the magnification factor
  • legend (bool) – whether to plot the legend on the figure; if an int is given, use that many columns in the legend
  • plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
  • updates (bool) – if possible, make interactive updates using the specific library you are using
  • kern (Kern) – the kernel to use for prediction
  • marker (str) – markers to use - cycle if more labels than markers are given
  • num_samples (int) – the number of samples to plot maximally. We do a stratified subsample from the labels, if the number of samples (in X) is higher than num_samples.
  • imshow_kwargs – the kwargs for the imshow (magnification factor)
  • annotation_kwargs – the kwargs for the annotation plot
  • scatter_kwargs – the kwargs for the scatter plots
GPy.plotting.gpy_plot.plot_util module
find_best_layout_for_subplots(num_subplots)[source]
get_fixed_dims(fixed_inputs)[source]

Work out the fixed dimensions from the fixed_inputs list of tuples.
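
For example (a small illustrative call; the return is assumed to be an array of the fixed dimension indices, as the description suggests):

from GPy.plotting.gpy_plot.plot_util import get_fixed_dims

# tuples of (dimension, value) -> indices of the fixed dimensions
get_fixed_dims([(1, 0.0), (3, 2.5)])   # expected to give array([1, 3])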

get_free_dims(model, visible_dims, fixed_dims)[source]

Work out which input dimensions to use for plotting (1D or 2D).

The visible_dims are the dimensions chosen to be shown; the fixed_dims are the dimensions held fixed.

The free_dims are then the visible dims without the fixed dims.

get_which_data_rows(model, which_data_rows)[source]

Helper to get the data rows to plot.

get_which_data_ycols(model, which_data_ycols)[source]

Helper to get the data columns to plot.

get_x_y_var(model)[source]

Extract the data from a model: X the inputs, X_variance the variance of the inputs (default: None) and Y the outputs.

If (X, X_variance, Y) is given, this just returns.

Returns:(X, X_variance, Y)
helper_for_plot_data(self, X, plot_limits, visible_dims, fixed_inputs, resolution)[source]

Figure out the data, free_dims and create an Xgrid for the prediction.

This is only implemented for two dimensions for now!

helper_predict_with_model(self, Xgrid, plot_raw, apply_link, percentiles, which_data_ycols, predict_kw, samples=0)[source]

Make the right decisions for prediction with a model based on the standard arguments of plotting.

This is quite complex and will take a while to understand, so do not change anything in here lightly!!!

in_ipynb()[source]
scatter_label_generator(labels, X, visible_dims, marker=None)[source]
subsample_X(X, labels, num_samples=1000)[source]

Stratified subsampling if labels are given. Due to rounding errors you might get small differences between num_samples and the size of the returned subsampled X.

update_not_existing_kwargs(to_update, update_from)[source]

This function updates the keyword arguments from update_from in to_update, only if the keys are not already set in to_update.

This is used to update kwargs with values from the default dicts.
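
For example (illustrative values only):

from GPy.plotting.gpy_plot.plot_util import update_not_existing_kwargs

defaults = dict(color='#3465a4', linewidth=2)
user_kwargs = dict(color='red')
# only 'linewidth' is copied over; the user-supplied 'color' is left untouched
update_not_existing_kwargs(user_kwargs, defaults)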

x_frame1D(X, plot_limits=None, resolution=None)[source]

Internal helper function for making plots, returns a set of input values to plot as well as lower and upper limits
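
A small sketch, assuming the helper returns the test grid together with the computed lower and upper limits (as the description above suggests):

import numpy as np
from GPy.plotting.gpy_plot.plot_util import x_frame1D

X = np.random.uniform(0., 1., (20, 1))
# 200 evenly spaced inputs spanning slightly beyond the data range
Xnew, xmin, xmax = x_frame1D(X, plot_limits=None, resolution=200)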

x_frame2D(X, plot_limits=None, resolution=None)[source]

Internal helper function for making plots, returns a set of input values to plot as well as lower and upper limits

GPy.plotting.matplot_dep package
Subpackages
GPy.plotting.matplot_dep.controllers package
Submodules
GPy.plotting.matplot_dep.controllers.axis_event_controller module

Created on 24 Jul 2013

@author: maxz

class AxisChangedController(ax, update_lim=None)[source]

Bases: GPy.plotting.matplot_dep.controllers.axis_event_controller.AxisEventController

Buffered control of axis limit changes

Constructor

extent(lim)[source]
lim_changed(axlim, savedlim)[source]
update(ax)[source]
xlim_changed(ax)[source]
ylim_changed(ax)[source]
class AxisEventController(ax)[source]

Bases: object

activate()[source]
deactivate()[source]
xlim_changed(ax)[source]
ylim_changed(ax)[source]
class BufferedAxisChangedController(ax, plot_function, plot_limits, resolution=50, update_lim=None, **kwargs)[source]

Bases: GPy.plotting.matplot_dep.controllers.axis_event_controller.AxisChangedController

Buffered axis changed controller. Controls the buffer and handles update events for when the axes changed.

Plot updates happen after the first reload (the first plot uses the given plot limits; after that the limits are buffered).

Parameters:
  • plot_function (function) – function to use for creating image for plotting (return ndarray-like) plot_function gets called with (2D!) Xtest grid if replotting required
  • plot_limits – beginning plot limits [xmin, ymin, xmax, ymax]
  • kwargs – additional kwargs are for pyplot.imshow(**kwargs)
get_grid(buffered=True)[source]
recompute_X(buffered=True)[source]
update(ax)[source]
update_view(view, X, xmin, xmax, ymin, ymax)[source]
GPy.plotting.matplot_dep.controllers.imshow_controller module

Created on 24 Jul 2013

@author: maxz

class ImAnnotateController(ax, plot_function, plot_limits, resolution=20, update_lim=0.99, imshow_kwargs=None, **kwargs)[source]

Bases: GPy.plotting.matplot_dep.controllers.imshow_controller.ImshowController

Parameters:
  • plot_function (function) – function to use for creating image for plotting (return ndarray-like) plot_function gets called with (2D!) Xtest grid if replotting required
  • plot_limits – beginning plot limits [xmin, ymin, xmax, ymax]
  • text_props – kwargs for pyplot.text(**text_props)
  • kwargs – additional kwargs are for pyplot.imshow(**kwargs)
update_view(view, X, xmin, xmax, ymin, ymax)[source]
class ImshowController(ax, plot_function, plot_limits, resolution=50, update_lim=0.9, **kwargs)[source]

Bases: GPy.plotting.matplot_dep.controllers.axis_event_controller.BufferedAxisChangedController

Parameters:
  • plot_function (function) – function to use for creating image for plotting (return ndarray-like) plot_function gets called with (2D!) Xtest grid if replotting required
  • plot_limits – beginning plot limits [xmin, ymin, xmax, ymax]
  • kwargs – additional kwargs are for pyplot.imshow(**kwargs)
update_view(view, X, xmin, xmax, ymin, ymax)[source]
Submodules
GPy.plotting.matplot_dep.base_plots module
ax_default(fignum, ax)[source]
fewerXticks(ax=None, divideby=2)[source]
gperrors(x, mu, lower, upper, edgecol=None, ax=None, fignum=None, **kwargs)[source]
gpplot(x, mu, lower, upper, edgecol='#3300FF', fillcol='#33CCFF', ax=None, fignum=None, **kwargs)[source]
gradient_fill(x, percentiles, ax=None, fignum=None, **kwargs)[source]
meanplot(x, mu, color='#3300FF', ax=None, fignum=None, linewidth=2, **kw)[source]
removeRightTicks(ax=None)[source]
removeUpperTicks(ax=None)[source]
x_frame1D(X, plot_limits=None, resolution=None)[source]

Internal helper function for making plots, returns a set of input values to plot as well as lower and upper limits

x_frame2D(X, plot_limits=None, resolution=None)[source]

Internal helper function for making plots, returns a set of input values to plot as well as lower and upper limits

GPy.plotting.matplot_dep.defaults module
GPy.plotting.matplot_dep.img_plots module

The module contains the tools for plotting 2D image visualizations

plot_2D_images(figure, arr, symmetric=False, pad=None, zoom=None, mode=None, interpolation='nearest')[source]
GPy.plotting.matplot_dep.mapping_plots module
plot_mapping(self, plot_limits=None, which_data='all', which_parts='all', resolution=None, levels=20, samples=0, fignum=None, ax=None, fixed_inputs=[], linecol='#204a87')[source]
Plots the mapping associated with the model.
  • In one dimension, the function is plotted.
  • In two dimensions, a contour-plot shows the function
  • In higher dimensions, we’ve not implemented this yet !TODO!

Can plot only part of the data and part of the posterior functions using which_data and which_parts

Parameters:
  • plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
  • which_data ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
  • which_parts ('all', or list of bools) – which of the kernel functions to plot (additively)
  • resolution (int) – the number of intervals to sample the GP on. Defaults to 200 in 1D and 50 (a 50x50 grid) in 2D
  • levels (int) – number of levels to plot in a contour plot.
  • samples (int) – the number of a posteriori samples to plot
  • fignum (figure number) – figure to plot on.
  • ax (axes handle) – axes to plot on.
  • fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input index i should be set to value v.
  • linecol – color of line to plot.
  • levels – for 2D plotting, the number of contour levels to use. If ax is None, a new figure is created.
GPy.plotting.matplot_dep.maps module
apply_bbox(sf, ax)[source]

Use bbox as xlim and ylim in ax

bbox_match(sf, bbox, inside_only=True)[source]

Return the geometry and attributes of a shapefile that lie within (or intersect) a bounding box

Parameters:
  • sf (shapefile object) – shapefile
  • bbox (list of floats [x_min,y_min,x_max,y_max]) – bounding box
Inside_only:

True if the objects returned are those that lie within the bbox and False if the objects returned are any that intersect the bbox

new_shape_string(sf, name, regex, field=2, type=None)[source]
plot(shape_records, facecolor='w', edgecolor='k', linewidths=0.5, ax=None, xlims=None, ylims=None)[source]

Plot the geometry of a shapefile

Parameters:
  • shape_records (ShapeRecord object (output of a shapeRecords() method)) – geometry and attributes list
  • facecolor – color to be used to fill in polygons
  • edgecolor – color to be used for lines
  • ax (axes handle) – axes to plot on.
plot_bbox(sf, bbox, inside_only=True)[source]

Plot the geometry of a shapefile within a bbox

Parameters:
  • sf (shapefile object) – shapefile
  • bbox (list of floats [x_min,y_min,x_max,y_max]) – bounding box
Inside_only:

True if the objects returned are those that lie within the bbox and False if the objects returned are any that intersect the bbox

plot_string_match(sf, regex, field, **kwargs)[source]

Plot the geometry of a shapefile whose fields match a regular expression given

Parameters:sf (shapefile object) – shapefile
Regex:regular expression to match
Field:field number to be matched with the regex
string_match(sf, regex, field=2)[source]

Return the geometry and attributes of a shapefile whose fields match a regular expression given

Parameters:sf (shapefile object) – shapefile
Regex:regular expression to match
Field:field number to be matched with the regex
GPy.plotting.matplot_dep.plot_definitions module
class MatplotlibPlots[source]

Bases: GPy.plotting.abstract_plotting_library.AbstractPlottingLibrary

add_to_canvas(ax, plots, legend=False, title=None, **kwargs)[source]

Add plots to the canvas; plots is either a dictionary with the plots as the items or a list of plots.

The kwargs are plotting library specific kwargs!

E.g. in matplotlib this does not have to do anything to add stuff, but we set the legend and title.

!This function returns the updated canvas!

Parameters:
  • title – the title of the plot
  • legend – whether to plot a legend or not
annotation_heatmap(ax, X, annotation, extent=None, label=None, imshow_kwargs=None, **annotation_kwargs)[source]

Plot an annotation heatmap. That is like an imshow, but put the text of the annotation inside the cells of the heatmap (centered).

Parameters:
  • canvas – the canvas to plot on
  • annotation (array-like) – the annotation labels for the heatmap
  • extent ([horizontal_min,horizontal_max,vertical_min,vertical_max]) – the extent of where to place the heatmap
  • label (str) – the label for the heatmap
Returns:

a list of both the heatmap and annotation plots [heatmap, annotation], or the interactive update object (alone)

annotation_heatmap_interact(ax, plot_function, extent, label=None, resolution=15, imshow_kwargs=None, **annotation_kwargs)[source]

if plot_function is not None, return an interactive updated heatmap, which updates on axis events, so that one can zoom in and out and the heatmap gets updated. See the matplotlib implementation in matplot_dep.controllers.

the plot_function returns a pair (X, annotation) to plot, when called with a new input X (which would be the grid, which is visible on the plot right now)

Parameters:
  • canvas – the canvas to plot on
  • annotation (array-like) – the annotation labels for the heatmap
  • extent ([horizontal_min,horizontal_max,vertical_min,vertical_max]) – the extent of where to place the heatmap
  • label (str) – the label for the heatmap
  • plot_function – the function, which generates new data for given input locations X
  • resolution (int) – the resolution of the interactive plot redraw - this is only needed when giving a plot_function
Returns:

a list of both the heatmap and annotation plots [heatmap, annotation], or the interactive update object (alone)

barplot(ax, x, height, width=0.8, bottom=0, color='#3465a4', label=None, **kwargs)[source]

Plot a vertical bar plot centered at x with the given height and width of the bars. The y level is at bottom.

the kwargs are plotting library specific kwargs!

Parameters:
  • x (array-like) – the center points of the bars
  • height (array-like) – the height of the bars
  • width (array-like) – the width of the bars
  • bottom (array-like) – the start y level of the bars
  • kwargs – kwargs for the specific library you are using.
contour(ax, X, Y, C, levels=20, label=None, **kwargs)[source]

Make a contour plot at (X, Y) with heights/colors stored in C on the canvas.

if Z is not None: make 3d contour plot at (X, Y, Z) with heights/colors stored in C on the canvas.

the kwargs are plotting library specific kwargs!

figure(rows=1, cols=1, gridspec_kwargs={}, tight_layout=True, **kwargs)[source]

Get a new figure with nrows and ncolumns subplots. Does not initialize the canvases yet.

There are individual kwargs for the individual plotting libraries to use.

fill_between(ax, X, lower, upper, color='#3465a4', label=None, **kwargs)[source]

Fill along the xaxis between lower and upper.

the kwargs are plotting library specific kwargs!

fill_gradient(canvas, X, percentiles, color='#3465a4', label=None, **kwargs)[source]

Plot a gradient (in alpha values) for the given percentiles.

the kwargs are plotting library specific kwargs!

imshow(ax, X, extent=None, label=None, vmin=None, vmax=None, **imshow_kwargs)[source]

Show the image stored in X on the canvas.

The origin of the image show is (0,0), such that X[0,0] gets plotted at [0,0] of the image!

the kwargs are plotting library specific kwargs!

imshow_interact(ax, plot_function, extent, label=None, resolution=None, vmin=None, vmax=None, **imshow_kwargs)[source]

This function is optional!

Create an imshow controller to stream the image returned by the plot_function. There is an imshow controller written for matplotlib, which updates the imshow on changes in axis.

The origin of the image show is (0,0), such that X[0,0] gets plotted at [0,0] of the image!

the kwargs are plotting library specific kwargs!

new_canvas(figure=None, row=1, col=1, projection='2d', xlabel=None, ylabel=None, zlabel=None, title=None, xlim=None, ylim=None, zlim=None, **kwargs)[source]

Return a canvas and updated kwargs for your plotting library.

if figure is not None, create a canvas in the figure at subplot position (col, row).

This method does two things: it creates an empty canvas and updates the kwargs (deletes the unnecessary kwargs) for further usage in normal plotting.

the kwargs are plotting library specific kwargs!

Parameters:projection ({'2d'|'3d'}) – The projection to use.

E.g. in matplotlib this means it deletes references to ax, as plotting is done on the axis itself and is not a kwarg.

Parameters:
  • xlabel – the label to put on the xaxis
  • ylabel – the label to put on the yaxis
  • zlabel – the label to put on the zaxis (if plotting in 3d)
  • title – the title of the plot
  • legend – if True, plot a legend; if an int is given, use that many rows in the legend
  • xlim ((float, float)) – the limits for the xaxis
  • ylim ((float, float)) – the limits for the yaxis
  • zlim ((float, float)) – the limits for the zaxis (if plotting in 3d)
plot(ax, X, Y, Z=None, color=None, label=None, **kwargs)[source]

Make a line plot of Y against X (Y = f(X)) on the canvas. If Z is not None, plot in 3d!

the kwargs are plotting library specific kwargs!

plot_axis_lines(ax, X, color='#a40000', label=None, **kwargs)[source]

Plot lines at the bottom (lower boundary of yaxis) of the axis at input location X.

If X is two dimensional, plot in 3d and connect the axis lines to the bottom of the Z axis.

the kwargs are plotting library specific kwargs!

scatter(ax, X, Y, Z=None, color='#3465a4', label=None, marker='o', **kwargs)[source]

Make a scatter plot between X and Y on the canvas given.

the kwargs are plotting library specific kwargs!

Parameters:
  • canvas – the plotting librarys specific canvas to plot on.
  • X (array-like) – the inputs to plot.
  • Y (array-like) – the outputs to plot.
  • Z (array-like) – the Z level to plot (if plotting 3d).
  • c (array-like) – the colorlevel for each point.
  • vmin (float) – minimum colorscale
  • vmax (float) – maximum colorscale
  • kwargs – the specific kwargs for your plotting library
show_canvas(ax, **kwargs)[source]

Draw/Plot the canvas given.

surface(ax, X, Y, Z, color=None, label=None, **kwargs)[source]

Plot a surface for 3d plotting for the inputs (X, Y, Z).

the kwargs are plotting library specific kwargs!

xerrorbar(ax, X, Y, error, color='#a40000', label=None, **kwargs)[source]

Make an errorbar along the xaxis for points at (X,Y) on the canvas. if error is two dimensional, the lower error is error[:,0] and the upper error is error[:,1]

the kwargs are plotting library specific kwargs!

yerrorbar(ax, X, Y, error, color='#a40000', label=None, **kwargs)[source]

Make errorbars along the yaxis on the canvas given. if error is two dimensional, the lower error is error[0, :] and the upper error is error[1, :]

the kwargs are plotting library specific kwargs!

GPy.plotting.matplot_dep.priors_plots module
plot(prior)[source]
univariate_plot(prior)[source]
GPy.plotting.matplot_dep.ssgplvm module

The module plotting results for SSGPLVM

class SSGPLVM_plot(model, imgsize)[source]

Bases: object

plot_inducing()[source]
GPy.plotting.matplot_dep.svig_plots module
plot(model, ax=None, fignum=None, Z_height=None, **kwargs)[source]
plot_traces(model)[source]
GPy.plotting.matplot_dep.util module
align_subplot_array(axes, xlim=None, ylim=None)[source]

Make all of the axes in the array have the same limits and turn off unnecessary ticks. Use plt.subplots() to get an array of axes.

align_subplots(N, M, xlim=None, ylim=None)[source]

make all of the subplots have the same limits, turn off unnecessary ticks

fewerXticks(ax=None, divideby=2)[source]
fixed_inputs(model, non_fixed_inputs, fix_routine='median', as_list=True, X_all=False)[source]

Convenience function for returning fixed_inputs where the other inputs are fixed using fix_routine.

Parameters:
  • model (Model) – model
  • non_fixed_inputs (list) – dimensions of the non-fixed inputs
  • fix_routine (string) – fixing routine to use: 'mean', 'median', 'zero'
  • as_list (boolean) – if True, returns a list of tuples (dimension, fixed_val); otherwise creates the corresponding X matrix
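
A hedged usage sketch (the toy two-dimensional model is an illustrative assumption):

import numpy as np
import GPy
from GPy.plotting.matplot_dep.util import fixed_inputs

X = np.random.uniform(-1., 1., (30, 2))
Y = np.sin(X[:, :1]) + 0.05 * np.random.randn(30, 1)
m = GPy.models.GPRegression(X, Y)

# hold every input dimension except dimension 0 at its median value,
# then plot a 1D slice of the model along dimension 0
fi = fixed_inputs(m, non_fixed_inputs=[0], fix_routine='median')
m.plot(fixed_inputs=fi, visible_dims=[0])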

legend_ontop(ax, mode='expand', ncol=3, fontdict=None)[source]
removeRightTicks(ax=None)[source]
removeUpperTicks(ax=None)[source]
GPy.plotting.matplot_dep.variational_plots module
plot(parameterized, fignum=None, ax=None, colors=None, figsize=(12, 6))[source]

Plot latent space X in 1D:

  • if fig is given, create input_dim subplots in fig and plot in these
  • if ax is given plot input_dim 1D latent space plots of X into each axis
  • if neither fig nor ax is given create a figure with fignum and plot in there
colors:
colors of the different latent space dimensions (one per input_dim)
plot_SpikeSlab(parameterized, fignum=None, ax=None, colors=None, side_by_side=True)[source]

Plot latent space X in 1D:

  • if fig is given, create input_dim subplots in fig and plot in these
  • if ax is given plot input_dim 1D latent space plots of X into each axis
  • if neither fig nor ax is given create a figure with fignum and plot in there
colors:
colors of the different latent space dimensions (one per input_dim)
GPy.plotting.matplot_dep.visualize module
class data_show(vals)[source]

Bases: object

The data_show class is a base class which describes how to visualize a particular data set. For example, motion capture data can be plotted as a stick figure, or images are shown using imshow. This class enables latent to data visualizations for the GP-LVM.

close()[source]
modify(vals)[source]
class image_show(vals, axes=None, dimensions=(16, 16), transpose=False, order='C', invert=False, scale=False, palette=[], preset_mean=0.0, preset_std=1.0, select_image=0, cmap=None)[source]

Bases: GPy.plotting.matplot_dep.visualize.matplotlib_show

Show a data vector as an image. This visualizer reshapes the output vector and displays it as an image.

Parameters:
  • vals – the values of the output to display.
  • axes (axes handle) – the axes to show the output on.
  • dimensions (tuple) – the dimensions that the image needs to be transposed to for display.
  • transpose – whether to transpose the image before display.
  • order (string) – whether the array is in Fortran ordering (‘F’) or Python ordering (‘C’). Default is Python (‘C’).
  • invert (bool) – whether to invert the pixels or not (default False).
  • palette – a palette to use for the image.
  • preset_mean (double) – the preset mean of a scaled image.
  • preset_std (double) – the preset standard deviation of a scaled image.
  • cmap (matplotlib.cm) – the colormap for image visualization
modify(vals)[source]
set_image(vals)[source]
class lvm(vals, model, data_visualize, latent_axes=None, sense_axes=None, latent_index=[0, 1], disable_drag=False)[source]

Bases: GPy.plotting.matplot_dep.visualize.matplotlib_show

Visualize a latent variable model

Parameters:
  • model – the latent variable model to visualize.
  • data_visualize (visualize.data_show type.) – the object used to visualize the data which has been modelled.
  • latent_axes – the axes where the latent visualization should be plotted.
modify(vals)[source]

When latent values are modified, update the latent representation and also update the output visualization.

on_click(event)[source]
on_enter(event)[source]
on_leave(event)[source]
on_move(event)[source]
show_sensitivities()[source]
class lvm_dimselect(vals, model, data_visualize, latent_axes=None, sense_axes=None, latent_index=[0, 1], labels=None)[source]

Bases: GPy.plotting.matplot_dep.visualize.lvm

A visualizer for latent variable models which allows selection of the latent dimensions to use by clicking on a bar chart of their length scales.

For an example of the visualizer’s use try:

GPy.examples.dimensionality_reduction.BGPVLM_oil()

on_click(event)[source]
on_leave(event)[source]
class lvm_subplots(vals, Model, data_visualize, latent_axes=None, sense_axes=None)[source]

Bases: GPy.plotting.matplot_dep.visualize.lvm

latent_axes is a np array of dimension np.ceil(input_dim/2), one for each pair of the latent dimensions.

class matplotlib_show(vals, axes=None)[source]

Bases: GPy.plotting.matplot_dep.visualize.data_show

the matplotlib_show class is a base class for all visualization methods that use matplotlib. It is initialized with an axis. If the axis is set to None it creates a figure window.

close()[source]
class mocap_data_show(vals, axes=None, connect=None, color='b')[source]

Bases: GPy.plotting.matplot_dep.visualize.matplotlib_show

Base class for visualizing motion capture data.

draw_edges()[source]
draw_vertices()[source]
finalize_axes()[source]
finalize_axes_modify()[source]
initialize_axes(boundary=0.05)[source]

Set up the axes with the right limits and scaling.

initialize_axes_modify()[source]
modify(vals)[source]
process_values()[source]
class mocap_data_show_vpython(vals, scene=None, connect=None, radius=0.1)[source]

Bases: GPy.plotting.matplot_dep.visualize.vpython_show

Base class for visualizing motion capture data using visual module.

draw_edges()[source]
draw_vertices()[source]
modify(vals)[source]
modify_edges()[source]
modify_vertices()[source]
pos_axis(i, j)[source]
process_values()[source]
class skeleton_show(vals, skel, axes=None, padding=0, color='b')[source]

Bases: GPy.plotting.matplot_dep.visualize.mocap_data_show

data_show class for visualizing motion capture data encoded as a skeleton with angles.

Parameters:
  • vals (np.array) – set of modeled angles to use for printing in the axis when it’s first created.
  • skel (mocap.skeleton object) – skeleton object that has the parameters of the motion capture skeleton associated with it.
  • padding (int)

process_values()[source]

Takes a set of angles and converts them to the x,y,z coordinates in the internal representation of the class, ready for plotting.

Parameters:vals – the values that are being modelled.
wrap_around(lim, connect)[source]
class stick_show(vals, connect=None, axes=None)[source]

Bases: GPy.plotting.matplot_dep.visualize.mocap_data_show

Show a three dimensional point cloud as a figure. Connect elements of the figure together using the matrix connect.

process_values()[source]

Convert vector of values into a matrix for use as a 3-D point cloud.

class vector_show(vals, axes=None)[source]

Bases: GPy.plotting.matplot_dep.visualize.matplotlib_show

A base visualization class that just shows a data vector as a plot of vector elements alongside their indices.

modify(vals)[source]
class vpython_show(vals, scene=None)[source]

Bases: GPy.plotting.matplot_dep.visualize.data_show

the vpython_show class is a base class for all visualization methods that use vpython to display. It is initialized with a scene. If the scene is set to None it creates a scene window.

close()[source]
data_play(Y, visualizer, frame_rate=30)[source]

Play a data set using the data_show object given.

Y:the data set to be visualized.
Parameters:visualizer (data_show) – the data_show object used to visualize the data

Example usage:

This example loads in the CMU mocap database (http://mocap.cs.cmu.edu) subject number 35 motion number 01. It then plays it using the mocap_show visualize object.

data = GPy.util.datasets.cmu_mocap(subject='35', train_motions=['01'])
Y = data['Y']
Y[:, 0:3] = 0.   # Make figure walk in place
visualize = GPy.util.visualize.skeleton_show(Y[0, :], data['skel'])
GPy.util.visualize.data_play(Y, visualize)
GPy.plotting.plotly_dep package
Submodules
GPy.plotting.plotly_dep.defaults module
GPy.plotting.plotly_dep.plot_definitions module

Submodules

GPy.plotting.Tango module

currentDark()[source]
currentLight()[source]
currentMedium()[source]
hex2rgb(hexcolor)[source]
nextDark()[source]
nextLight()[source]
nextMedium()[source]
reset()[source]

GPy.plotting.abstract_plotting_library module

class AbstractPlottingLibrary[source]

Bases: object

Set the defaults dictionary in the _defaults variable:

E.g. for matplotlib we define a file defaults.py and

set the dictionary of it here:

from . import defaults
_defaults = defaults.__dict__
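
A minimal sketch of this pattern for a hypothetical new backend (MyBackend and its behaviour are assumptions; only the methods a backend actually needs have to be filled in):

from GPy.plotting.abstract_plotting_library import AbstractPlottingLibrary

class MyBackend(AbstractPlottingLibrary):
    # per-backend default keyword arguments, analogous to matplotlib's defaults.py
    _defaults = {}

    def figure(self, nrows, ncols, **kwargs):
        raise NotImplementedError("create a figure with nrows x ncols subplots here")

    def new_canvas(self, figure=None, col=1, row=1, projection='2d', **kwargs):
        raise NotImplementedError("return a canvas and the cleaned-up kwargs here")
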
add_to_canvas(canvas, plots, legend=True, title=None, **kwargs)[source]

Add plots to the canvas; plots is either a dictionary with the plots as the items or a list of plots.

The kwargs are plotting library specific kwargs!

E.g. in matplotlib this does not have to do anything to add stuff, but we set the legend and title.

!This function returns the updated canvas!

Parameters:
  • title – the title of the plot
  • legend – whether to plot a legend or not
annotation_heatmap(canvas, X, annotation, extent, label=None, **kwargs)[source]

Plot an annotation heatmap. That is like an imshow, but put the text of the annotation inside the cells of the heatmap (centered).

Parameters:
  • canvas – the canvas to plot on
  • annotation (array-like) – the annotation labels for the heatmap
  • extent ([horizontal_min,horizontal_max,vertical_min,vertical_max]) – the extent of where to place the heatmap
  • label (str) – the label for the heatmap
Returns:

a list of both the heatmap and annotation plots [heatmap, annotation], or the interactive update object (alone)

annotation_heatmap_interact(canvas, plot_function, extent, label=None, resolution=15, **kwargs)[source]

if plot_function is not None, return an interactive updated heatmap, which updates on axis events, so that one can zoom in and out and the heatmap gets updated. See the matplotlib implementation in matplot_dep.controllers.

the plot_function returns a pair (X, annotation) to plot, when called with a new input X (which would be the grid, which is visible on the plot right now)

Parameters:
  • canvas – the canvas to plot on
  • annotation (array-like) – the annotation labels for the heatmap
  • extent ([horizontal_min,horizontal_max,vertical_min,vertical_max]) – the extent of where to place the heatmap
  • label (str) – the label for the heatmap
  • plot_function – the function, which generates new data for given input locations X
  • resolution (int) – the resolution of the interactive plot redraw - this is only needed when giving a plot_function
Returns:

a list of both the heatmap and annotation plots [heatmap, annotation], or the interactive update object (alone)

barplot(canvas, x, height, width=0.8, bottom=0, color=None, label=None, **kwargs)[source]

Plot a vertical bar plot centered at x with the given height and width of the bars. The y level is at bottom.

the kwargs are plotting library specific kwargs!

Parameters:
  • x (array-like) – the center points of the bars
  • height (array-like) – the height of the bars
  • width (array-like) – the width of the bars
  • bottom (array-like) – the start y level of the bars
  • kwargs – kwargs for the specific library you are using.
contour(canvas, X, Y, C, Z=None, color=None, label=None, **kwargs)[source]

Make a contour plot at (X, Y) with heights/colors stored in C on the canvas.

if Z is not None: make 3d contour plot at (X, Y, Z) with heights/colors stored in C on the canvas.

the kwargs are plotting library specific kwargs!

figure(nrows, ncols, **kwargs)[source]

Get a new figure with nrows and ncolumns subplots. Does not initialize the canvases yet.

There are individual kwargs for the individual plotting libraries to use.

fill_between(canvas, X, lower, upper, color=None, label=None, **kwargs)[source]

Fill along the xaxis between lower and upper.

the kwargs are plotting library specific kwargs!

fill_gradient(canvas, X, percentiles, color=None, label=None, **kwargs)[source]

Plot a gradient (in alpha values) for the given percentiles.

the kwargs are plotting library specific kwargs!

imshow(canvas, X, extent=None, label=None, vmin=None, vmax=None, **kwargs)[source]

Show the image stored in X on the canvas.

The origin of the image show is (0,0), such that X[0,0] gets plotted at [0,0] of the image!

the kwargs are plotting library specific kwargs!

imshow_interact(canvas, plot_function, extent=None, label=None, vmin=None, vmax=None, **kwargs)[source]

This function is optional!

Create an imshow controller to stream the image returned by the plot_function. There is an imshow controller written for matplotlib, which updates the imshow on changes in axis.

The origin of the image show is (0,0), such that X[0,0] gets plotted at [0,0] of the image!

the kwargs are plotting library specific kwargs!

new_canvas(figure=None, col=1, row=1, projection='2d', xlabel=None, ylabel=None, zlabel=None, title=None, xlim=None, ylim=None, zlim=None, **kwargs)[source]

Return a canvas and updated kwargs for your plotting library.

if figure is not None, create a canvas in the figure at subplot position (col, row).

This method does two things: it creates an empty canvas and updates the kwargs (deletes the unnecessary kwargs) for further usage in normal plotting.

the kwargs are plotting library specific kwargs!

Parameters:projection ({'2d'|'3d'}) – The projection to use.

E.g. in matplotlib this means it deletes references to ax, as plotting is done on the axis itself and is not a kwarg.

Parameters:
  • xlabel – the label to put on the xaxis
  • ylabel – the label to put on the yaxis
  • zlabel – the label to put on the zaxis (if plotting in 3d)
  • title – the title of the plot
  • legend – if True, plot a legend; if an int is given, use that many rows in the legend
  • xlim ((float, float)) – the limits for the xaxis
  • ylim ((float, float)) – the limits for the yaxis
  • zlim ((float, float)) – the limits for the zaxis (if plotting in 3d)
plot(cavas, X, Y, Z=None, color=None, label=None, **kwargs)[source]

Make a line plot of Y against X (Y = f(X)) on the canvas. If Z is not None, plot in 3d!

the kwargs are plotting library specific kwargs!

plot_axis_lines(ax, X, color=None, label=None, **kwargs)[source]

Plot lines at the bottom (lower boundary of yaxis) of the axis at input location X.

If X is two dimensional, plot in 3d and connect the axis lines to the bottom of the Z axis.

the kwargs are plotting library specific kwargs!

scatter(canvas, X, Y, Z=None, color=None, vmin=None, vmax=None, label=None, **kwargs)[source]

Make a scatter plot between X and Y on the canvas given.

the kwargs are plotting library specific kwargs!

Parameters:
  • canvas – the plotting librarys specific canvas to plot on.
  • X (array-like) – the inputs to plot.
  • Y (array-like) – the outputs to plot.
  • Z (array-like) – the Z level to plot (if plotting 3d).
  • c (array-like) – the colorlevel for each point.
  • vmin (float) – minimum colorscale
  • vmax (float) – maximum colorscale
  • kwargs – the specific kwargs for your plotting library
show_canvas(canvas, **kwargs)[source]

Draw/Plot the canvas given.

surface(canvas, X, Y, Z, color=None, label=None, **kwargs)[source]

Plot a surface for 3d plotting for the inputs (X, Y, Z).

the kwargs are plotting library specific kwargs!

xerrorbar(canvas, X, Y, error, color=None, label=None, **kwargs)[source]

Make an errorbar along the xaxis for points at (X,Y) on the canvas. if error is two dimensional, the lower error is error[:,0] and the upper error is error[:,1]

the kwargs are plotting library specific kwargs!

yerrorbar(canvas, X, Y, error, color=None, label=None, **kwargs)[source]

Make errorbars along the yaxis on the canvas given. if error is two dimensional, the lower error is error[0, :] and the upper error is error[1, :]

the kwargs are plotting library specific kwargs!

defaults

GPy.inference.optimization package

Submodules

GPy.inference.optimization.stochastics module

class SparseGPMissing(model, batchsize=1)[source]

Bases: GPy.inference.optimization.stochastics.StochasticStorage

Here we want to loop over all dimensions every time. Thus, we can just make sure the loop goes over self.d every time. We try to group batches which look the same, which speeds up calculations significantly.

class SparseGPStochastics(model, batchsize=1, missing_data=True)[source]

Bases: GPy.inference.optimization.stochastics.StochasticStorage

For the sparse gp we need to store the dimension we are in, and the indices corresponding to those

do_stochastics()[source]

Update the internal state to the next batch of the stochastic descent algorithm.

reset()[source]

Reset the state of this stochastics generator.

class StochasticStorage(model)[source]

Bases: object

This is a container for holding the stochastic parameters, such as subset indices or step length and so on.

self.d has to be a list of lists: [dimension indices, nan indices for those dimensions] so that the minibatches can be used as efficiently as possible.

Initialize this stochastic container using the given model

do_stochastics()[source]

Update the internal state to the next batch of the stochastic descent algorithm.

reset()[source]

Reset the state of this stochastics generator.

GPy.inference.latent_function_inference package

Introduction

Certain GPy.models can be instantiated with an inference_method. This submodule contains objects that can be assigned to inference_method.

Inference over Gaussian process latent functions

In all our GP models, the consistency property means that we have a Gaussian prior over a finite set of points f. This prior is:

\[N(f | 0, K)\]

where \(K\) is the kernel matrix.

We also have a likelihood (see GPy.likelihoods) which defines how the data are related to the latent function: \(p(y | f)\). If the likelihood is also a Gaussian, the inference over \(f\) is tractable (see GPy.inference.latent_function_inference.exact_gaussian_inference).

If the likelihood object is something other than Gaussian, then exact inference is not tractable. We then resort to a Laplace approximation (GPy.inference.latent_function_inference.laplace) or expectation propagation (GPy.inference.latent_function_inference.expectation_propagation).

The inference methods return a Posterior instance, which is a simple structure which contains a summary of the posterior. The model classes can then use this posterior object for making predictions, optimizing hyper-parameters, etc.
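
A hedged sketch of assigning an inference_method explicitly (the toy classification data and the Bernoulli likelihood are illustrative assumptions; with a non-Gaussian likelihood an approximate method such as the Laplace approximation is needed):

import numpy as np
import GPy
from GPy.inference.latent_function_inference.laplace import Laplace

X = np.random.uniform(-3., 3., (30, 1))
Y = (np.sin(X) + 0.3 * np.random.randn(30, 1) > 0).astype(float)   # binary labels

m = GPy.core.GP(X, Y,
                kernel=GPy.kern.RBF(input_dim=1),
                likelihood=GPy.likelihoods.Bernoulli(),
                inference_method=Laplace())
m.optimize()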

class InferenceMethodList[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference, list

on_optimization_end()[source]

This function gets called, just after the optimization loop ended.

on_optimization_start()[source]

This function gets called, just before the optimization loop to start.

class LatentFunctionInference[source]

Bases: object

static from_dict(input_dict)[source]

Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. If needed, please override _build_from_input_dict instead.

Parameters:input_dict (dict) – Dictionary with all the information needed to instantiate the object.
on_optimization_end()[source]

This function gets called, just after the optimization loop ended.

on_optimization_start()[source]

This function gets called, just before the optimization loop to start.

to_dict()[source]

Submodules

GPy.inference.latent_function_inference.dtc module

class DTC[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference

An object for inference when the likelihood is Gaussian, but we want to do sparse inference.

The function self.inference returns a Posterior object, which summarizes the posterior.

NB. It’s not recommended to use this function! It’s here for historical purposes.

inference(kern, X, Z, likelihood, Y, mean_function=None, Y_metadata=None)[source]
class vDTC[source]

Bases: object

inference(kern, X, Z, likelihood, Y, mean_function=None, Y_metadata=None)[source]

GPy.inference.latent_function_inference.exact_gaussian_inference module

class ExactGaussianInference[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference

An object for inference when the likelihood is Gaussian.

The function self.inference returns a Posterior object, which summarizes the posterior.

For efficiency, we sometimes work with the Cholesky of Y*Y.T. To save repeatedly recomputing this, we cache it.

LOO(kern, X, Y, likelihood, posterior, Y_metadata=None, K=None)[source]

Leave one out error as found in “Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models” Vehtari et al. 2014.

inference(kern, X, likelihood, Y, mean_function=None, Y_metadata=None, K=None, variance=None, Z_tilde=None)[source]

Returns a Posterior class containing essential quantities of the posterior

to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object

GPy.inference.latent_function_inference.exact_studentt_inference module

class ExactStudentTInference[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference

An object for inference of student-t processes (not for GP with student-t likelihood!).

The function self.inference returns a StudentTPosterior object, which summarizes the posterior.

inference(kern, X, Y, nu, mean_function=None, K=None)[source]

GPy.inference.latent_function_inference.expectation_propagation module

class EP(epsilon=1e-06, eta=1.0, delta=1.0, always_reset=False, max_iters=inf, ep_mode='alternated', parallel_updates=False, loading=False)[source]

Bases: GPy.inference.latent_function_inference.expectation_propagation.EPBase, GPy.inference.latent_function_inference.exact_gaussian_inference.ExactGaussianInference

The expectation-propagation algorithm. For nomenclature see Rasmussen & Williams 2006.

Parameters:
  • epsilon (float) – Convergence criterion, maximum squared difference allowed between mean updates to stop iterations (float)
  • eta (float64) – parameter for fractional EP updates.
  • delta (float64) – damping EP updates factor.
  • always_reset – setting to always reset the approximation at the beginning of every inference call.
  • max_iters (int) – maximum number of EP iterations.
  • ep_mode (string) – either “nested” (EP is run every time the hyperparameters change) or “alternated” (EP is run at the beginning, then the hyperparameters are optimized).
  • parallel_updates (boolean) – if True, the parameters of the sites are updated in parallel.
  • loading (boolean) – if True, prevents the EP parameters from changing; hack used when loading a serialized model.
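
A short sketch of constructing the EP object with non-default settings (the chosen values are illustrative assumptions; the surrounding model set-up would be as in the Laplace sketch above):

from GPy.inference.latent_function_inference.expectation_propagation import EP

# re-run EP whenever the hyperparameters change, with damped fractional updates
inf = EP(ep_mode='nested', eta=0.9, delta=0.5, max_iters=100)
# inf can then be passed as inference_method to a GP with a non-Gaussian likelihood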

expectation_propagation(mean_prior, K, Y, likelihood, Y_metadata)[source]
inference(kern, X, likelihood, Y, mean_function=None, Y_metadata=None, precision=None, K=None)[source]

Returns a Posterior class containing essential quantities of the posterior

to_dict()[source]

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:json serializable dictionary containing the needed information to instantiate the object
class EPBase(epsilon=1e-06, eta=1.0, delta=1.0, always_reset=False, max_iters=inf, ep_mode='alternated', parallel_updates=False, loading=False)[source]

Bases: object

The expectation-propagation algorithm. For nomenclature see Rasmussen & Williams 2006.

Parameters:
  • epsilon (float) – Convergence criterion, maximum squared difference allowed between mean updates to stop iterations (float)
  • eta (float64) – parameter for fractional EP updates.
  • delta (float64) – damping EP updates factor.
  • always_reset – setting to always reset the approximation at the beginning of every inference call.
  • max_iters (int) – maximum number of EP iterations.
  • ep_mode (string) – either “nested” (EP is run every time the hyperparameters change) or “alternated” (EP is run at the beginning, then the hyperparameters are optimized).
  • parallel_updates (boolean) – if True, the parameters of the sites are updated in parallel.
  • loading (boolean) – if True, prevents the EP parameters from changing; hack used when loading a serialized model.

on_optimization_end()[source]
on_optimization_start()[source]
reset()[source]
class EPDTC(epsilon=1e-06, eta=1.0, delta=1.0, always_reset=False, max_iters=inf, ep_mode='alternated', parallel_updates=False, loading=False)[source]

Bases: GPy.inference.latent_function_inference.expectation_propagation.EPBase, GPy.inference.latent_function_inference.var_dtc.VarDTC

The expectation-propagation algorithm. For nomenclature see Rasmussen & Williams 2006.

Parameters:
  • epsilon (float) – convergence criterion: the maximum squared difference allowed between mean updates before iterations stop
  • eta (float) – parameter for fractional EP updates
  • delta (float) – damping factor for EP updates
  • always_reset (boolean) – if True, the approximation is reset at the beginning of every inference call
  • max_iters (int) – maximum number of EP iterations
  • ep_mode (string) – either “nested” (EP is re-run every time the hyperparameters change) or “alternated” (EP is run at the beginning and the hyperparameters are then optimized)
  • parallel_updates (boolean) – if True, the parameters of the sites are updated in parallel
  • loading (boolean) – if True, the EP parameters are kept fixed; a workaround used when loading a serialized model

expectation_propagation(Kmm, Kmn, Y, likelihood, Y_metadata)[source]
inference(kern, X, Z, likelihood, Y, mean_function=None, Y_metadata=None, Lm=None, dL_dKmm=None, psi0=None, psi1=None, psi2=None)[source]
to_dict()[source]

Convert the object into a JSON-serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict: JSON-serializable dictionary containing the information needed to instantiate the object
class cavityParams(num_data)[source]

Bases: object

static from_dict(input_dict)[source]
to_dict()[source]

Convert the object into a JSON-serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict: JSON-serializable dictionary containing the information needed to instantiate the object
class gaussianApproximation(v, tau)[source]

Bases: object

static from_dict(input_dict)[source]
to_dict()[source]

Convert the object into a JSON-serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict: JSON-serializable dictionary containing the information needed to instantiate the object
class marginalMoments(num_data)[source]

Bases: object

class posteriorParams(mu, Sigma, L=None)[source]

Bases: GPy.inference.latent_function_inference.expectation_propagation.posteriorParamsBase

static from_dict(input_dict)[source]
to_dict()[source]

Convert the object into a JSON-serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict: JSON-serializable dictionary containing the information needed to instantiate the object
class posteriorParamsBase(mu, Sigma_diag)[source]

Bases: object

class posteriorParamsDTC(mu, Sigma_diag)[source]

Bases: GPy.inference.latent_function_inference.expectation_propagation.posteriorParamsBase

static from_dict(input_dict)[source]
to_dict()[source]

Convert the object into a JSON-serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict: JSON-serializable dictionary containing the information needed to instantiate the object

GPy.inference.latent_function_inference.fitc module

class FITC[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference

An object for inference when the likelihood is Gaussian, but we want to do sparse inference.

The function self.inference returns a Posterior object, which summarizes the posterior.

inference(kern, X, Z, likelihood, Y, mean_function=None, Y_metadata=None)[source]
const_jitter = 1e-06

GPy.inference.latent_function_inference.gaussian_grid_inference module

class GaussianGridInference[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference

An object for inference when the likelihood is Gaussian and inputs are on a grid.

The function self.inference returns a GridPosterior object, which summarizes the posterior.

inference(kern, X, likelihood, Y, Y_metadata=None)[source]

Returns a GridPosterior class containing essential quantities of the posterior

kron_mvprod(A, b)[source]
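
Grid inference is efficient because the covariance over a Cartesian grid factorizes as a Kronecker product of per-dimension covariances, so matrix-vector products with the full covariance never require forming it. The following NumPy sketch illustrates that identity; it is an illustrative re-implementation, not the library's kron_mvprod, whose argument conventions may differ:

import numpy as np

def kron_mvprod_sketch(As, b):
    # Compute (A_1 kron A_2 kron ... kron A_D) @ b without forming the
    # Kronecker product: contract each per-dimension matrix with its axis.
    shape = [A.shape[0] for A in As]
    T = b.reshape(shape)
    for d, A in enumerate(As):
        T = np.tensordot(A, T, axes=([1], [d]))  # contract A with axis d of T
        T = np.moveaxis(T, 0, d)                 # put the new axis back at position d
    return T.reshape(-1)

# Sanity check against the explicit Kronecker product on a small example
A1, A2 = np.random.rand(3, 3), np.random.rand(4, 4)
b = np.random.rand(12)
assert np.allclose(kron_mvprod_sketch([A1, A2], b), np.kron(A1, A2) @ b)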

GPy.inference.latent_function_inference.grid_posterior module

class GridPosterior(alpha_kron=None, QTs=None, Qs=None, V_kron=None)[source]

Bases: object

Specially intended for the grid regression case: an object to represent a Gaussian posterior over latent function values, p(f|D).

The purpose of this class is to serve as an interface between the inference schemes and the model classes.

alpha_kron

QTs

array of transposed eigenvectors resulting from the decomposition of the single-dimension covariance matrices

Qs

array of eigenvectors resulting from the decomposition of the single-dimension covariance matrices

V_kron

Kronecker product of the eigenvalues resulting from the decomposition of the single-dimension covariance matrices

alpha

GPy.inference.latent_function_inference.inferenceX module

class InferenceX(model, Y, name='inferenceX', init='L2')[source]

Bases: GPy.core.model.Model

The model class for inferring new X given new Y (replacing “do_test_latent” in Bayesian GPLVM). It is a tiny inference model created from the original GP model. The kernel, likelihood (only Gaussian is supported at the moment) and posterior distribution are taken from the original model. For regression models and the GPLVM, a point estimate of the latent variable X is inferred; for Bayesian GPLVM, the variational posterior of X is inferred. X is inferred through gradient-based optimization of the inference model.

Parameters:
  • model (GPy.core.Model) – the GPy model used in inference
  • Y (numpy.ndarray) – the new observed data for inference
  • init ('L2', 'NCC' or 'rand') – the distance metric on Y used to initialize X from the nearest neighbour.
compute_dL()[source]
log_likelihood()[source]
parameters_changed()[source]

This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer.

infer_newX(model, Y_new, optimize=True, init='L2')[source]

Infer the distribution of X for the new observed data Y_new.

Parameters:
  • model (GPy.core.Model) – the GPy model used in inference
  • Y_new (numpy.ndarray) – the new observed data for inference
  • optimize (boolean) – whether to optimize the location of new X (True by default)
Returns:

a tuple containing the estimated posterior distribution of X and the model that optimizes X

Return type:

(GPy.core.parameterization.variational.VariationalPosterior, GPy.core.Model)
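
A rough usage sketch (the Bayesian GPLVM setup here is illustrative, not part of the reference): given a trained latent variable model, infer_newX returns the inferred X for new observations together with the small inference model it optimized.

import numpy as np
import GPy
from GPy.inference.latent_function_inference.inferenceX import infer_newX

# Train a Bayesian GPLVM on some (toy) high-dimensional observations
Y_train = np.random.randn(100, 12)
m = GPy.models.BayesianGPLVM(Y_train, input_dim=2)
m.optimize(messages=False)

# Infer the variational posterior over X for previously unseen observations
Y_new = np.random.randn(5, 12)
X_new, inference_model = infer_newX(m, Y_new, optimize=True, init='L2')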

GPy.inference.latent_function_inference.laplace module

class Laplace[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference

Laplace Approximation

Find the mode \(\hat{f}\) of the unnormalised posterior and the Hessian at this point (using Newton-Raphson).

LOO(kern, X, Y, likelihood, posterior, Y_metadata=None, K=None, f_hat=None, W=None, Ki_W_i=None)[source]

Leave one out log predictive density as found in “Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models” Vehtari et al. 2014.

inference(kern, X, likelihood, Y, mean_function=None, Y_metadata=None)[source]

Returns a Posterior class containing essential quantities of the posterior

mode_computations(f_hat, Ki_f, K, Y, likelihood, kern, Y_metadata)[source]

At the mode, compute the hessian and effective covariance matrix.

Returns:
  • logZ – approximation to the marginal likelihood
  • woodbury_inv – variable required for calculating the approximation to the covariance matrix
  • dL_dthetaL – array of derivatives (1 x num_kernel_params)
  • dL_dthetaL – array of derivatives (1 x num_likelihood_params)
rasm_mode(K, Y, likelihood, Ki_f_init, Y_metadata=None, *args, **kwargs)[source]

Rasmussen’s numerically stable mode finding. For nomenclature see Rasmussen & Williams 2006. Influenced by GPML (BSD) code; all errors are our own.

Parameters:
  • K (NxN matrix) – covariance matrix evaluated at locations X
  • Y (np.ndarray) – The data
  • likelihood (a GPy.likelihood object) – the likelihood of the latent function value for the given data
  • Ki_f_init (np.ndarray) – the initial guess at the mode
  • Y_metadata (np.ndarray | None) – information about the data, e.g. which likelihood to take from a multi-likelihood object
Returns:

f_hat, the mode on which to make the Laplace approximation

Return type:

np.ndarray
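
As a hedged sketch of how the Laplace approximation is typically used (constructor details are assumptions and may vary between GPy versions), a non-Gaussian likelihood can be paired with Laplace inference in a plain GP model:

import numpy as np
import GPy
from GPy.inference.latent_function_inference.laplace import Laplace

# Toy Poisson count regression: a non-Gaussian likelihood handled via Laplace
X = np.linspace(0, 10, 60)[:, None]
Y = np.random.poisson(np.exp(np.sin(X))).astype(float)

m = GPy.core.GP(X, Y,
                kernel=GPy.kern.RBF(1),
                likelihood=GPy.likelihoods.Poisson(),
                inference_method=Laplace())
m.optimize()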

class LaplaceBlock[source]

Bases: GPy.inference.latent_function_inference.laplace.Laplace

Laplace Approximation

Find the mode \(\hat{f}\) of the unnormalised posterior and the Hessian at this point (using Newton-Raphson).

mode_computations(f_hat, Ki_f, K, Y, likelihood, kern, Y_metadata)[source]

At the mode, compute the hessian and effective covariance matrix.

Returns:
  • logZ – approximation to the marginal likelihood
  • woodbury_inv – variable required for calculating the approximation to the covariance matrix
  • dL_dthetaL – array of derivatives (1 x num_kernel_params)
  • dL_dthetaL – array of derivatives (1 x num_likelihood_params)
rasm_mode(K, Y, likelihood, Ki_f_init, Y_metadata=None, *args, **kwargs)[source]

Rasmussen’s numerically stable mode finding. For nomenclature see Rasmussen & Williams 2006. Influenced by GPML (BSD) code; all errors are our own.

Parameters:
  • K (NxN matrix) – covariance matrix evaluated at locations X
  • Y (np.ndarray) – The data
  • likelihood (a GPy.likelihood object) – the likelihood of the latent function value for the given data
  • Ki_f_init (np.ndarray) – the initial guess at the mode
  • Y_metadata (np.ndarray | None) – information about the data, e.g. which likelihood to take from a multi-likelihood object
Returns:

f_hat, the mode on which to make the Laplace approximation

Return type:

np.ndarray

warning_on_one_line(message, category, filename, lineno, file=None, line=None)[source]

GPy.inference.latent_function_inference.pep module

class PEP(alpha)[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference

Sparse Gaussian processes using Power Expectation Propagation for regression: alpha ≈ 0 gives VarDTC and alpha = 1 gives FITC

Reference: A Unifying Framework for Sparse Gaussian Process Approximation using Power Expectation Propagation, https://arxiv.org/abs/1605.07066

inference(kern, X, Z, likelihood, Y, mean_function=None, Y_metadata=None)[source]
const_jitter = 1e-06
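
A minimal sketch of Power-EP in use, assuming the GPy.core.SparseGP constructor accepts an inference_method argument (an assumption; check your version). alpha interpolates between the VarDTC-like and FITC-like limits described above.

import numpy as np
import GPy
from GPy.inference.latent_function_inference.pep import PEP

X = np.random.rand(200, 1)
Y = np.sin(6 * X) + 0.1 * np.random.randn(200, 1)
Z = np.random.rand(20, 1)                      # inducing inputs

m = GPy.core.SparseGP(X, Y, Z,
                      kernel=GPy.kern.RBF(1),
                      likelihood=GPy.likelihoods.Gaussian(),
                      inference_method=PEP(alpha=0.5))  # between VarDTC and FITC
m.optimize()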

GPy.inference.latent_function_inference.posterior module

class Posterior(woodbury_chol=None, woodbury_vector=None, K=None, mean=None, cov=None, K_chol=None, woodbury_inv=None, prior_mean=0)[source]

Bases: object

An object to represent a Gaussian posterior over latent function values, p(f|D). This may be computed exactly for Gaussian likelihoods, or approximated for non-Gaussian likelihoods.

The purpose of this class is to serve as an interface between the inference schemes and the model classes. The model class can make predictions for the function at any new point x_* by integrating over this posterior.

woodbury_chol : a lower triangular matrix L that satisfies posterior_covariance = K - K L^{-T} L^{-1} K
woodbury_vector : a matrix (or vector, as an Nx1 matrix) M which satisfies posterior_mean = K M
K : the prior covariance (required for lazy computation of various quantities)
mean : the posterior mean
cov : the posterior covariance

Not all of the above need to be supplied! You must supply:

K or K_chol (for lazy computation)

You may supply either:

woodbury_chol and woodbury_vector

Or:

mean and cov

Of course, you can supply more than that, but this class will lazily compute all other quantities on demand.

covariance_between_points(kern, X, X1, X2)[source]

Computes the posterior covariance between points.

Parameters:
  • kern – GP kernel
  • X – current input observations
  • X1 – some input observations
  • X2 – other input observations
K_chol

Cholesky of the prior covariance K

covariance

Posterior covariance $$ K_{xx} - K_{xx} W_{xx}^{-1} K_{xx}, \quad W_{xx} := \texttt{Woodbury inv} $$

mean

Posterior mean $$ K_{xx} v, \quad v := \texttt{Woodbury vector} $$

precision

Inverse of posterior covariance

woodbury_chol

Return $L_{W}$, the lower triangular Cholesky factor of the Woodbury matrix: $$ L_{W} L_{W}^{\top} = W^{-1}, \quad W^{-1} := \texttt{Woodbury inv} $$

woodbury_inv

The inverse of the Woodbury matrix. In the Gaussian likelihood case it is defined as $$ (K_{xx} + \Sigma_{xx})^{-1}, \quad \Sigma_{xx} := \texttt{Likelihood.variance / Approximate likelihood covariance} $$

woodbury_vector

The Woodbury vector. In the Gaussian likelihood case it is defined as $$ (K_{xx} + \Sigma)^{-1} Y, \quad \Sigma := \texttt{Likelihood.variance / Approximate likelihood covariance} $$
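
A small sketch of the two construction routes described above (the numerical values are toy placeholders); whichever pair is supplied, the remaining quantities are derived lazily when first accessed:

import numpy as np
from GPy.inference.latent_function_inference.posterior import Posterior

K = np.eye(3)                 # prior covariance at the training inputs
mu = np.zeros((3, 1))         # posterior mean
S = 0.5 * np.eye(3)           # posterior covariance

# Route 1: supply the posterior moments directly
post_from_moments = Posterior(mean=mu, cov=S, K=K)

# Route 2: supply the Woodbury quantities (toy values here)
L = np.linalg.cholesky(np.eye(3))
v = np.ones((3, 1))
post_from_woodbury = Posterior(woodbury_chol=L, woodbury_vector=v, K=K)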

class PosteriorEP(woodbury_chol=None, woodbury_vector=None, K=None, mean=None, cov=None, K_chol=None, woodbury_inv=None, prior_mean=0)[source]

Bases: GPy.inference.latent_function_inference.posterior.Posterior

woodbury_chol : a lower triangular matrix L that satisfies posterior_covariance = K - K L^{-T} L^{-1} K
woodbury_vector : a matrix (or vector, as an Nx1 matrix) M which satisfies posterior_mean = K M
K : the prior covariance (required for lazy computation of various quantities)
mean : the posterior mean
cov : the posterior covariance

Not all of the above need to be supplied! You must supply:

K or K_chol (for lazy computation)

You may supply either:

woodbury_chol and woodbury_vector

Or:

mean and cov

Of course, you can supply more than that, but this class will lazily compute all other quantities on demand.

class PosteriorExact(woodbury_chol=None, woodbury_vector=None, K=None, mean=None, cov=None, K_chol=None, woodbury_inv=None, prior_mean=0)[source]

Bases: GPy.inference.latent_function_inference.posterior.Posterior

woodbury_chol : a lower triangular matrix L that satisfies posterior_covariance = K - K L^{-T} L^{-1} K
woodbury_vector : a matrix (or vector, as an Nx1 matrix) M which satisfies posterior_mean = K M
K : the prior covariance (required for lazy computation of various quantities)
mean : the posterior mean
cov : the posterior covariance

Not all of the above need to be supplied! You must supply:

K or K_chol (for lazy computation)

You may supply either:

woodbury_chol and woodbury_vector

Or:

mean and cov

Of course, you can supply more than that, but this class will lazily compute all other quantities on demand.

class StudentTPosterior(deg_free, **kwargs)[source]

Bases: GPy.inference.latent_function_inference.posterior.PosteriorExact

GPy.inference.latent_function_inference.svgp module

class SVGP[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference

inference(q_u_mean, q_u_chol, kern, X, Z, likelihood, Y, mean_function=None, Y_metadata=None, KL_scale=1.0, batch_scale=1.0)[source]

GPy.inference.latent_function_inference.var_dtc module

class VarDTC(limit=1)[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference

An object for inference when the likelihood is Gaussian, but we want to do sparse inference.

The function self.inference returns a Posterior object, which summarizes the posterior.

For efficiency, we sometimes work with the Cholesky factor of Y*Y.T. To avoid repeatedly recomputing it, we cache it.

get_VVTfactor(Y, prec)[source]
inference(kern, X, Z, likelihood, Y, Y_metadata=None, mean_function=None, precision=None, Lm=None, dL_dKmm=None, psi0=None, psi1=None, psi2=None, Z_tilde=None)[source]
set_limit(limit)[source]
const_jitter = 1e-08
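
For context, a typical entry point is the sparse regression model, which (as far as this sketch assumes) uses VarDTC-style inference by default for a Gaussian likelihood with inducing points:

import numpy as np
import GPy

X = np.random.rand(500, 1)
Y = np.sin(10 * X) + 0.05 * np.random.randn(500, 1)

# Sparse GP regression with 25 inducing points
m = GPy.models.SparseGPRegression(X, Y, kernel=GPy.kern.RBF(1), num_inducing=25)
m.optimize()
mean, var = m.predict(np.linspace(0, 1, 100)[:, None])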

GPy.inference.latent_function_inference.var_dtc_parallel module

class VarDTC_minibatch(batchsize=None, limit=3, mpi_comm=None)[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference

An object for inference when the likelihood is Gaussian, but we want to do sparse inference.

The function self.inference returns a Posterior object, which summarizes the posterior.

For efficiency, we sometimes work with the Cholesky factor of Y*Y.T. To avoid repeatedly recomputing it, we cache it.

gatherPsiStat(kern, X, Z, Y, beta, uncertain_inputs)[source]
inference_likelihood(kern, X, Z, likelihood, Y)[source]

The first phase of inference: Compute: log-likelihood, dL_dKmm

Cached intermediate results: Kmm, KmmInv,

inference_minibatch(kern, X, Z, likelihood, Y)[source]

The second phase of inference: computing the derivatives over a minibatch of Y. Compute: dL_dpsi0, dL_dpsi1, dL_dpsi2, dL_dthetaL. Returns a flag showing whether it reached the end of Y (isEnd).

set_limit(limit)[source]
const_jitter = 1e-08
update_gradients(model, mpi_comm=None)[source]
update_gradients_sparsegp(model, mpi_comm=None)[source]

GPy.inference.latent_function_inference.var_gauss module

class VarGauss(alpha, beta)[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference

The Variational Gaussian Approximation Revisited

Reference: Opper, M. and Archambeau, C. (2009). “The Variational Gaussian Approximation Revisited”. Neural Computation, pp. 786–792.

Parameters:
  • alpha (GPy.core.Param) – variational parameter
  • beta (GPy.core.Param) – variational parameter
inference(kern, X, likelihood, Y, mean_function=None, Y_metadata=None, Z=None)[source]

GPy.inference.latent_function_inference.vardtc_md module

class VarDTC_MD[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference

The VarDTC inference method for sparse GP with missing data (GPy.models.SparseGPRegressionMD)

gatherPsiStat(kern, X, Z, Y, beta, uncertain_inputs)[source]
inference(kern, X, Z, likelihood, Y, indexD, output_dim, Y_metadata=None, Lm=None, dL_dKmm=None, Kuu_sigma=None)[source]

The first phase of inference: Compute: log-likelihood, dL_dKmm

Cached intermediate results: Kmm, KmmInv,

const_jitter = 1e-06

GPy.inference.latent_function_inference.vardtc_svi_multiout module

class PosteriorMultioutput(LcInvMLrInvT, LcInvScLcInvT, LrInvSrLrInvT, Lr, Lc, kern_r, Xr, Zr)[source]

Bases: object

class VarDTC_SVI_Multiout[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference

The VarDTC inference method for Multi-output GP regression (GPy.models.GPMultioutRegression)

gatherPsiStat(kern, X, Z, uncertain_inputs)[source]
get_YYTfactor(Y)[source]
get_trYYT(Y)[source]
inference(kern_r, kern_c, Xr, Xc, Zr, Zc, likelihood, Y, qU_mean, qU_var_r, qU_var_c)[source]

The SVI-VarDTC inference

const_jitter = 1e-06

GPy.inference.latent_function_inference.vardtc_svi_multiout_miss module

class VarDTC_SVI_Multiout_Miss[source]

Bases: GPy.inference.latent_function_inference.LatentFunctionInference

The VarDTC inference method for Multi-output GP regression with missing data (GPy.models.GPMultioutRegressionMD)

gatherPsiStat(kern, X, Z, uncertain_inputs)[source]
get_YYTfactor(Y)[source]
get_trYYT(Y)[source]
inference(kern_r, kern_c, Xr, Xc, Zr, Zc, likelihood, Y, qU_mean, qU_var_r, qU_var_c, indexD, output_dim)[source]

The SVI-VarDTC inference

inference_d(d, beta, Y, indexD, grad_dict, mid_res, uncertain_inputs_r, uncertain_inputs_c, Mr, Mc)[source]
const_jitter = 1e-06

GPy.inference.mcmc package

Submodules

GPy.inference.mcmc.hmc module

class HMC(model, M=None, stepsize=0.1)[source]

Bases: object

An implementation of Hybrid Monte Carlo (HMC) for GPy models

Initialize an object for HMC sampling. Note that the status of the model (model parameters) will be changed during sampling.

Parameters:
  • model (GPy.core.Model) – the GPy model that will be sampled
  • M (numpy.ndarray) – the mass matrix (an identity matrix by default)
  • stepsize (float) – the step size for HMC sampling
sample(num_samples=1000, hmc_iters=20)[source]

Sample the (unfixed) model parameters.

Parameters:
  • num_samples (int) – the number of samples to draw (1000 by default)
  • hmc_iters (int) – the number of leap-frog iterations (20 by default)
Returns:

the array of parameter samples, of size N x P (N: the number of samples, P: the number of parameters to sample)

Return type:

numpy.ndarray
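
A short sketch of HMC sampling of a model's hyperparameters (the priors placed on the sampled parameters, and the specific prior choices below, are illustrative assumptions):

import numpy as np
import GPy

X = np.random.rand(30, 1)
Y = np.sin(6 * X) + 0.1 * np.random.randn(30, 1)
m = GPy.models.GPRegression(X, Y)

# Place priors on the parameters we want to sample
m.kern.lengthscale.set_prior(GPy.priors.Gamma.from_EV(1., 10.))
m.kern.variance.set_prior(GPy.priors.Gamma.from_EV(1., 10.))
m.likelihood.variance.set_prior(GPy.priors.Gamma.from_EV(1., 10.))

hmc = GPy.inference.mcmc.HMC(m, stepsize=5e-2)
samples = hmc.sample(num_samples=300, hmc_iters=20)   # array of shape (300, P)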

class HMC_shortcut(model, M=None, stepsize_range=[1e-06, 0.1], groupsize=5, Hstd_th=[1e-05, 3.0])[source]

Bases: object

sample(m_iters=1000, hmc_iters=20)[source]

GPy.inference.mcmc.samplers module

class Metropolis_Hastings(model, cov=None)[source]

Bases: object

Metropolis-Hastings sampling, with tuning according to Gelman et al.

new_chain(start=None)[source]
predict(function, args)[source]

Make a prediction for the function, to which we will pass the additional arguments

sample(Ntotal=10000, Nburn=1000, Nthin=10, tune=True, tune_throughout=False, tune_interval=400)[source]
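
Similarly, a hedged sketch of the Metropolis-Hastings sampler, based only on the signatures listed above (whether new_chain must be called explicitly before sample is an assumption):

import numpy as np
import GPy
from GPy.inference.mcmc.samplers import Metropolis_Hastings

X = np.random.rand(30, 1)
Y = np.sin(6 * X) + 0.1 * np.random.randn(30, 1)
m = GPy.models.GPRegression(X, Y)
m.kern.lengthscale.set_prior(GPy.priors.Gamma.from_EV(1., 10.))

mh = Metropolis_Hastings(m)   # cov=None leaves the proposal covariance at its default
mh.new_chain()                # start a fresh chain (assumption: explicit call needed)
mh.sample(Ntotal=5000, Nburn=500, Nthin=10, tune=True)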