Welcome to MultivariateStats’s documentation¶
MultivariateStats.jl is a Julia package for multivariate statistical analysis. It provides a rich set of useful analysis techniques, such as PCA, CCA, LDA, PLS, etc.
Linear Least Square and Ridge Regression¶
The package provides functions to perform Linear Least Square and Ridge Regression.
Linear Least Square¶
Linear least squares finds linear combination(s) of given variables that fit the responses by minimizing the squared error between them. This can be formulated as the following optimization problem:

    \underset{\mathbf{a}}{\mathrm{minimize}} \ \frac{1}{2} \|\mathbf{y} - (\mathbf{X} \mathbf{a} + b)\|^2

Sometimes, the coefficient matrix is given in a transposed form, in which case the optimization is modified as:

    \underset{\mathbf{a}}{\mathrm{minimize}} \ \frac{1}{2} \|\mathbf{y} - (\mathbf{X}^T \mathbf{a} + b)\|^2
The package provides llsq to solve these problems:
-
llsq(X, y; ...)¶ Solve the linear least square problem formulated above.
Here, y can be either a vector, or a matrix where each column is a response vector.

This function accepts two keyword arguments:

- trans: whether to use the transposed form. (default is false)
- bias: whether to include the bias term b. (default is true)

The function returns the solution a. In particular, when y is a vector (matrix), a is also a vector (matrix). If bias is true, then the returned array is augmented as [a; b].
Examples
For a single response vector y (without using bias):
using MultivariateStats, Statistics
# prepare data
X = rand(1000, 3)              # feature matrix
a0 = rand(3)                   # ground truths
y = X * a0 + 0.1 * randn(1000) # generate response
# solve using llsq
a = llsq(X, y; bias=false)
# do prediction
yp = X * a
# measure the error
rmse = sqrt(mean(abs2.(y - yp)))
println("rmse = $rmse")
For a single response vector y (using bias):
# prepare data
X = rand(1000, 3)
a0, b0 = rand(3), rand()
y = X * a0 .+ b0 .+ 0.1 * randn(1000)
# solve using llsq
sol = llsq(X, y)
# extract results
a, b = sol[1:end-1], sol[end]
# do prediction
yp = X * a .+ b
For a matrix Y comprised of multiple columns:
# prepare data
X = rand(1000, 3)
A0, b0 = rand(3, 5), rand(1, 5)
Y = (X * A0 .+ b0) + 0.1 * randn(1000, 5)
# solve using llsq
sol = llsq(X, Y)
# extract results
A, b = sol[1:end-1,:], sol[end:end,:]
# do prediction
Yp = X * A .+ b
Ridge Regression¶
Compared to linear least square, Ridge Regression uses an additional quadratic term to regularize the problem:

    \underset{\mathbf{a}}{\mathrm{minimize}} \ \frac{1}{2} \|\mathbf{y} - (\mathbf{X} \mathbf{a} + b)\|^2 + \frac{1}{2} \mathbf{a}^T \mathbf{Q} \mathbf{a}

The transposed form:

    \underset{\mathbf{a}}{\mathrm{minimize}} \ \frac{1}{2} \|\mathbf{y} - (\mathbf{X}^T \mathbf{a} + b)\|^2 + \frac{1}{2} \mathbf{a}^T \mathbf{Q} \mathbf{a}
The package provides ridge to solve these problems:
-
ridge(X, y, r; ...)¶ Solve the ridge regression problem formulated above.
Here, y can be either a vector, or a matrix where each column is a response vector.

The argument r gives the quadratic regularization matrix Q, which can be in either of the following forms:

- r is a real scalar, then Q is considered to be r * eye(n), where n is the dimension of a.
- r is a real vector, then Q is considered to be diagm(r).
- r is a real symmetric matrix, then Q is simply considered to be r.

This function accepts two keyword arguments:

- trans: whether to use the transposed form. (default is false)
- bias: whether to include the bias term b. (default is true)
The function returns the solution a. In particular, when y is a vector (matrix), a is also a vector (matrix). If bias is true, then the returned array is augmented as [a; b].
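Example. The following sketch mirrors the llsq example above, with a scalar regularization coefficient (0.1 here is an arbitrary choice) and synthetic data:

using MultivariateStats
# prepare data
X = rand(1000, 3)
a0, b0 = rand(3), rand()
y = X * a0 .+ b0 .+ 0.1 * randn(1000)
# solve using ridge with Q = 0.1 * I
sol = ridge(X, y, 0.1)
# extract results
a, b = sol[1:end-1], sol[end]
# do prediction
yp = X * a .+ b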
Data Whitening¶
A whitening transformation is a decorrelation transformation that transforms a set of random variables into a set of new random variables with identity covariance (uncorrelated with unit variances).
In particular, suppose a random vector has covariance \mathbf{C}, then a whitening transform \mathbf{W} is one that satisfies:

    \mathbf{W}^T \mathbf{C} \mathbf{W} = \mathbf{I}

Note that \mathbf{W} is generally not unique. In particular, if \mathbf{W} is a whitening transform, so is any of its rotations \mathbf{W} \mathbf{R} with \mathbf{R}^T \mathbf{R} = \mathbf{I}.
Whitening¶
The package uses Whitening defined below to represent a whitening transform:
struct Whitening{T<:AbstractFloat}
mean::Vector{T} # mean vector (can be empty to indicate zero mean), of length d
W::Matrix{T} # the transform coefficient matrix, of size (d, d)
end
An instance of Whitening can be constructed by Whitening(mean, W).
There are several functions to access the properties of a whitening transform f:
-
indim(f)¶ Get the input dimension, i.e
d.
-
outdim(f)¶ Get the out dimension, i.e
d.
-
mean(f)¶ Get the mean vector.
Note: if f.mean is empty, this function returns a zero vector of length d.
-
transform(f, x)¶ Apply the whitening transform to a vector or a matrix with samples in columns, as W' * (x - mean).
Data Analysis¶
Given a dataset, one can use the fit method to estimate a whitening transform.
-
fit(Whitening, X; ...)¶ Estimate a whitening transform from the data given in X. Here, X should be a matrix, whose columns give the samples.

This function returns an instance of Whitening.

Keyword Arguments:

- regcoef: The regularization coefficient. The covariance will be regularized as follows when regcoef is positive: C + (eigmax(C) * regcoef) * eye(d). (default is zero(T))
- mean: The mean vector, which can be either of:
  - 0: the input data has already been centralized
  - nothing: this function will compute the mean
  - a pre-computed mean vector
  (default is nothing)

Note: This function internally relies on cov_whitening to derive the transformation W. The function cov_whitening itself is also a useful function.
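Example. A minimal sketch with synthetic data (the dimensions are arbitrary); the whitened data should have approximately identity covariance:

using MultivariateStats, Statistics
# 5-dimensional observations in columns
X = randn(5, 1000)
# estimate a whitening transform
f = fit(Whitening, X)
# apply it to the data
Z = transform(f, X)
cov(Z, dims=2)   # ≈ identity matrix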
-
cov_whitening(C)¶ Derive the whitening transform coefficient matrix W given the covariance matrix C. Here, C can be either a square matrix, or an instance of Cholesky.

Internally, this function solves the whitening transform using Cholesky factorization. The rationale is as follows: let \mathbf{C} = \mathbf{U}^T \mathbf{U} and \mathbf{W} = \mathbf{U}^{-1}, then \mathbf{W}^T \mathbf{C} \mathbf{W} = \mathbf{I}.

Note: The return matrix W is an upper triangular matrix.
-
cov_whitening(C, regcoef) Derive a whitening transform based on a regularized covariance, as
C + (eigmax(C) * regcoef) * eye(d).
In addition, the package also provides cov_whitening!, in which the input matrix C will be overwritten during computation. This can be more efficient when C is no longer used.
-
invsqrtm(C)¶ Compute inv(sqrtm(C)) through symmetric eigenvalue decomposition.
Principal Component Analysis¶
Principal Component Analysis (PCA) derives an orthogonal projection to convert a given set of observations to linearly uncorrelated variables, called principal components.
This package defines a PCA type to represent a PCA model, and provides a set of methods to access the properties.
Properties¶
Let M be an instance of PCA, d be the dimension of observations, and p be the output dimension (i.e the dimension of the principal subspace)
-
indim(M)¶ Get the input dimension
d, i.e the dimension of the observation space.
-
outdim(M)¶ Get the output dimension
p, i.e the dimension of the principal subspace.
-
mean(M)¶ Get the mean vector (of length
d).
-
projection(M)¶ Get the projection matrix (of size
(d, p)). Each column of the projection matrix corresponds to a principal component. The principal components are arranged in descending order of the corresponding variances.
-
principalvars(M)¶ The variances of principal components.
-
tprincipalvar(M)¶ The total variance of principal components, which is equal to
sum(principalvars(M)).
-
tresidualvar(M)¶ The total residual variance.
-
tvar(M)¶ The total observation variance, which is equal to
tprincipalvar(M) + tresidualvar(M).
-
principalratio(M)¶ The ratio of variance preserved in the principal subspace, which is equal to
tprincipalvar(M) / tvar(M).
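For instance, these properties can be queried from a fitted model (a small sketch with random data; the dimensions and the pratio value are arbitrary):

using MultivariateStats
X = randn(5, 200)            # 200 observations in columns
M = fit(PCA, X; pratio=0.9)
indim(M), outdim(M)          # input and output dimensions
principalvars(M)             # variance of each principal component
principalratio(M)            # fraction of total variance preserved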
Transformation and Construction¶
Given a PCA model M, one can use it to transform observations into principal components, as

    \mathbf{y} = \mathbf{P}^T (\mathbf{x} - \boldsymbol{\mu})

or use it to reconstruct (approximately) the observations from principal components, as

    \tilde{\mathbf{x}} = \mathbf{P} \mathbf{y} + \boldsymbol{\mu}

Here, \mathbf{P} is the projection matrix.
The package provides methods to do so:
-
transform(M, x)¶ Transform observations
x into principal components.

Here, x can be either a vector of length d or a matrix where each column is an observation.
-
reconstruct(M, y)¶ Approximately reconstruct observations from the principal components given in
y.

Here, y can be either a vector of length p or a matrix where each column gives the principal components for an observation.
Data Analysis¶
One can use the fit method to perform PCA over a given dataset.
-
fit(PCA, X; ...)¶ Perform PCA over the data given in a matrix X. Each column of X is an observation.

This method returns an instance of PCA.

Keyword arguments:

Let (d, n) = size(X) be respectively the input dimension and the number of observations.

- method: The choice of methods:
  - :auto: use :cov when d < n, or :svd otherwise
  - :cov: based on the covariance matrix
  - :svd: based on SVD of the input data
  (default is :auto)
- maxoutdim: Maximum output dimension. (default is min(d, n))
- pratio: The ratio of variances preserved in the principal subspace. (default is 0.99)
- mean: The mean vector, which can be either of:
  - 0: the input data has already been centralized
  - nothing: this function will compute the mean
  - a pre-computed mean vector
  (default is nothing)

Notes:

- The output dimension p depends on both maxoutdim and pratio, as follows. Suppose the first k principal components preserve at least pratio of the total variance, while the first k-1 preserve less than pratio; then the actual output dimension will be min(k, maxoutdim).
- This function calls pcacov or pcasvd internally, depending on the choice of method.
Example:
using MultivariateStats
# suppose Xtr and Xte are training and testing data matrix,
# with each observation in a column
# train a PCA model
M = fit(PCA, Xtr; maxoutdim=100)
# apply PCA model to testing set
Yte = transform(M, Xte)
# reconstruct testing observations (approximately)
Xr = reconstruct(M, Yte)
Example with iris dataset and plotting:
using MultivariateStats, RDatasets, Plots
plotly() # using plotly for 3D-interactive graphing
# load iris dataset
iris = dataset("datasets", "iris")
# split half to training set
Xtr = Matrix(iris[1:2:end,1:4])'
Xtr_labels = Vector(iris[1:2:end,5])
# split other half to testing set
Xte = Matrix(iris[2:2:end,1:4])'
Xte_labels = Vector(iris[2:2:end,5])
# suppose Xtr and Xte are training and testing data matrix,
# with each observation in a column
# train a PCA model, allowing up to 3 dimensions
M = fit(PCA, Xtr; maxoutdim=3)
# apply PCA model to testing set
Yte = transform(M, Xte)
# reconstruct testing observations (approximately)
Xr = reconstruct(M, Yte)
# group results by testing set labels for color coding
setosa = Yte[:,Xte_labels.=="setosa"]
versicolor = Yte[:,Xte_labels.=="versicolor"]
virginica = Yte[:,Xte_labels.=="virginica"]
# visualize first 3 principal components in 3D interactive plot
p = scatter(setosa[1,:],setosa[2,:],setosa[3,:],marker=:circle,linewidth=0)
scatter!(versicolor[1,:],versicolor[2,:],versicolor[3,:],marker=:circle,linewidth=0)
scatter!(virginica[1,:],virginica[2,:],virginica[3,:],marker=:circle,linewidth=0)
plot!(p,xlabel="PC1",ylabel="PC2",zlabel="PC3")
Core Algorithms¶
Two algorithms are implemented in this package: pcacov and pcasvd.
-
pcacov(C, mean; ...)¶ Compute PCA based on eigenvalue decomposition of a given covariance matrix
C.

Parameters:

- C – The covariance matrix.
- mean – The mean vector of original samples, which can be a vector of length d, or an empty vector Float64[] indicating a zero mean.
Returns: The resultant PCA model.
Note: This function accepts two keyword arguments:
maxoutdim and pratio.
-
pcasvd(Z, mean, tw; ...)¶ Compute PCA based on singular value decomposition of a centralized sample matrix
Z.

Parameters:

- Z – provides centralized samples.
- mean – The mean vector of the original samples, which can be a vector of length d, or an empty vector Float64[] indicating a zero mean.
Returns: The resultant PCA model.
Note: This function accepts two keyword arguments:
maxoutdim and pratio.
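As a sketch of direct use of pcacov (the data here is synthetic; mean and cov come from the Statistics standard library):

using MultivariateStats, Statistics
X = randn(5, 200)                 # observations in columns
mu = vec(mean(X, dims=2))         # mean vector of length d
C = cov(X, dims=2)                # sample covariance matrix
# build a PCA model directly from the covariance matrix
M = pcacov(C, mu; maxoutdim=3)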
Probabilistic Principal Component Analysis¶
Probabilistic Principal Component Analysis (PPCA) represents a constrained form of the Gaussian distribution in which the number of free parameters can be restricted while still allowing the model to capture the dominant correlations in a data set. It is expressed as the maximum likelihood solution of a probabilistic latent variable model [BSHP06].
This package defines a PPCA type to represent a probabilistic PCA model, and provides a set of methods to access the properties.
Properties¶
Let M be an instance of PPCA, d be the dimension of observations, and p be the output dimension (i.e the dimension of the principal subspace)
-
indim(M)¶ Get the input dimension
d, i.e the dimension of the observation space.
-
outdim(M)¶ Get the output dimension
p, i.e the dimension of the principal subspace.
-
mean(M)¶ Get the mean vector (of length
d).
-
projection(M)¶ Get the projection matrix (of size
(d, p)). Each column of the projection matrix corresponds to a principal component. The principal components are arranged in descending order of the corresponding variances.
-
loadings(M)¶ The factor loadings matrix (of size
(d, p)).
-
var(M)¶ The total residual variance.
Transformation and Construction¶
Given a probabilistic PCA model M, one can use it to transform observations into latent variables, as

    \mathbf{z} = (\mathbf{W}^T \mathbf{W} + \sigma^2 \mathbf{I})^{-1} \mathbf{W}^T (\mathbf{x} - \boldsymbol{\mu})

or use it to reconstruct (approximately) the observations from latent variables, as

    \tilde{\mathbf{x}} = \mathbf{W} \mathbb{E}[\mathbf{z}] + \boldsymbol{\mu}

Here, \mathbf{W} is the factor loadings or weight matrix.
The package provides methods to do so:
-
transform(M, x)¶ Transform observations
x into latent variables.

Here, x can be either a vector of length d or a matrix where each column is an observation.
-
reconstruct(M, z)¶ Approximately reconstruct observations from the latent variable given in
z.

Here, z can be either a vector of length p or a matrix where each column gives the latent variables for an observation.
Data Analysis¶
One can use the fit method to perform probabilistic PCA over a given dataset.
-
fit(PPCA, X; ...)¶ Perform probabilistic PCA over the data given in a matrix X. Each column of X is an observation.

This method returns an instance of PPCA.

Keyword arguments:

Let (d, n) = size(X) be respectively the input dimension and the number of observations.

- method: The choice of methods:
  - :ml: use maximum likelihood version of probabilistic PCA
  - :em: use EM version of probabilistic PCA
  - :bayes: use Bayesian PCA
  (default is :ml)
- maxoutdim: Maximum output dimension. (default is d-1)
- mean: The mean vector, which can be either of:
  - 0: the input data has already been centralized
  - nothing: this function will compute the mean
  - a pre-computed mean vector
  (default is nothing)
- tol: Convergence tolerance. (default is 1.0e-6)
- tot: Maximum number of iterations. (default is 1000)

Notes:

- This function calls ppcaml, ppcaem or bayespca internally, depending on the choice of method.
Example:
using MultivariateStats
# suppose Xtr and Xte are training and testing data matrix,
# with each observation in a column
# train a probabilistic PCA model
M = fit(PPCA, Xtr; maxoutdim=100)
# apply PPCA model to testing set
Yte = transform(M, Xte)
# reconstruct testing observations (approximately)
Xr = reconstruct(M, Yte)
Core Algorithms¶
Three algorithms are implemented in this package: ppcaml, ppcaem, and bayespca.
-
ppcaml(Z, mean, tw; ...)¶ Compute probabilistic PCA using the maximum likelihood formulation for a centralized sample matrix Z.

Parameters:

- Z – provides centralized samples.
- mean – The mean vector of the original samples, which can be a vector of length d, or an empty vector Float64[] indicating a zero mean.
Returns: The resultant PPCA model.
Note: This function accepts two keyword arguments:
maxoutdim and tol.
-
ppcaem(S, mean, n; ...)¶ Compute probabilistic PCA based on expectation-maximization algorithm for a given sample covariance matrix
S.

Parameters:

- S – The sample covariance matrix.
- mean – The mean vector of original samples, which can be a vector of length d, or an empty vector Float64[] indicating a zero mean.
- n – The number of observations.
Returns: The resultant PPCA model.
Note: This function accepts three keyword arguments: maxoutdim, tol, and tot.
-
bayespca(S, mean, n; ...)¶ Compute probabilistic PCA based on Bayesian algorithm for a given sample covariance matrix
S.

Parameters:

- S – The sample covariance matrix.
- mean – The mean vector of original samples, which can be a vector of length d, or an empty vector Float64[] indicating a zero mean.
- n – The number of observations.
Returns: The resultant PPCA model.
Note: This function accepts three keyword arguments: maxoutdim, tol, and tot.

Additional notes:

- The function uses the maxoutdim parameter as an upper bound when it automatically determines the latent space dimensionality.
Kernel Principal Component Analysis¶
Kernel Principal Component Analysis (kernel PCA) is an extension of principal component analysis (PCA) using techniques of kernel methods. Using a kernel, the originally linear operations of PCA are performed in a reproducing kernel Hilbert space.
This package defines a KernelPCA type to represent a kernel PCA model, and provides a set of methods to access the properties.
Properties¶
Let M be an instance of KernelPCA, d be the dimension of observations, and p be the output dimension (i.e the dimension of the principal subspace)
-
indim(M)¶ Get the input dimension
d, i.e the dimension of the observation space.
-
outdim(M)¶ Get the output dimension
p, i.e the dimension of the principal subspace.
-
projection(M)¶ Get the projection matrix (of size
(n, p)). Each column of the projection matrix corresponds to an eigenvector, and n is the number of observations. The principal components are arranged in descending order of the corresponding eigenvalues.
-
principalvars(M)¶ The variances of principal components.
Transformation and Construction¶
The package provides methods to transform observations into principal components and to reconstruct (approximately) observations from them:
-
transform(M, x)¶ Transform observations
x into principal components.

Here, x can be either a vector of length d or a matrix where each column is an observation.
-
reconstruct(M, y)¶ Approximately reconstruct observations from the principal components given in
y.

Here, y can be either a vector of length p or a matrix where each column gives the principal components for an observation.
Data Analysis¶
One can use the fit method to perform kernel PCA over a given dataset.
-
fit(KernelPCA, X; ...)¶ Perform kernel PCA over the data given in a matrix X. Each column of X is an observation.

This method returns an instance of KernelPCA.

Keyword arguments:

Let (d, n) = size(X) be respectively the input dimension and the number of observations.

- kernel: The kernel function. This function accepts two vector arguments x and y, and returns a scalar value. (default is (x,y)->x'y)
- solver: The choice of solver:
  - :eig: uses eigfact
  - :eigs: uses eigs (always used for sparse data)
  (default is :eig)
- maxoutdim: Maximum output dimension. (default is min(d, n))
- inverse: Whether to perform calculation for inverse transform for non-precomputed kernels. (default is false)
- β: Hyperparameter of the ridge regression that learns the inverse transform (when inverse is true). (default is 1.0)
- tol: Convergence tolerance for the eigs solver. (default is 0.0)
- maxiter: Maximum number of iterations for the eigs solver. (default is 300)
Kernels¶
List of the commonly used kernels:

- (x,y)->x'y : Linear
- (x,y)->(x'y+c)^d : Polynomial
- (x,y)->exp(-γ*norm(x-y)^2.0) : Radial basis function (RBF)
Example:
using MultivariateStats
# suppose Xtr and Xte are training and testing data matrix,
# with each observation in a column
# train a kernel PCA model
M = fit(KernelPCA, Xtr; maxoutdim=100, inverse=true)
# apply kernel PCA model to testing set
Yte = transform(M, Xte)
# reconstruct testing observations (approximately)
Xr = reconstruct(M, Yte)
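A non-default kernel is passed through the kernel keyword. For instance, a sketch with an RBF kernel (the width γ is an arbitrary choice, and Xtr/Xte are as in the example above):

using MultivariateStats, LinearAlgebra
γ = 1.0
rbf = (x, y) -> exp(-γ * norm(x - y)^2)
M = fit(KernelPCA, Xtr; kernel=rbf, maxoutdim=5)
Yte = transform(M, Xte)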
Canonical Correlation Analysis¶
Canonical Correlation Analysis (CCA) is a statistical analysis technique to identify correlations between two sets of variables. Given two vector variables X and Y, it finds two projections, one for each, to transform them to a common space with maximum correlations.
The package defines a CCA type to represent a CCA model, and provides a set of methods to access the properties.
Properties¶
Let M be an instance of CCA, dx be the dimension of X, dy the dimension of Y, and p the output dimension (i.e the dimension of the common space).
-
xindim(M)¶ Get the dimension of
X, the first set of variables.
-
yindim(M)¶ Get the dimension of
Y, the second set of variables.
-
outdim(M)¶ Get the output dimension, i.e that of the common space.
-
xmean(M)¶ Get the mean vector of
X (of length dx).
-
ymean(M)¶ Get the mean vector of
Y (of length dy).
-
xprojection(M)¶ Get the projection matrix for
X (of size (dx, p)).
-
yprojection(M)¶ Get the projection matrix for
Y (of size (dy, p)).
-
correlations(M)¶ The correlations of the projected components (a vector of length
p).
Transformation¶
Given a CCA model, one can transform observations from both spaces into the common space, as

    \mathbf{z}_x = \mathbf{P}_x^T (\mathbf{x} - \boldsymbol{\mu}_x)
    \mathbf{z}_y = \mathbf{P}_y^T (\mathbf{y} - \boldsymbol{\mu}_y)

Here, P_x and P_y are the projection matrices for X and Y, and μ_x and μ_y are the corresponding mean vectors.
This package provides methods to do so:
-
xtransform(M, x)¶ Transform observations in the X-space to the common space.
Here,
x can be either a vector of length dx or a matrix where each column is an observation.
-
ytransform(M, y)¶ Transform observations in the Y-space to the common space.
Here,
y can be either a vector of length dy or a matrix where each column is an observation.
Data Analysis¶
One can use the fit method to perform CCA over given datasets.
-
fit(CCA, X, Y; ...)¶ Perform CCA over the data given in matrices X and Y. Each column of X and Y is an observation. X and Y should have the same number of columns (denoted by n below).

This method returns an instance of CCA.

Keyword arguments:

- method: The choice of methods:
  - :cov: based on covariance matrices
  - :svd: based on SVD of the input data
  (default is :svd)
- outdim: The output dimension, i.e the dimension of the common space. (default is min(dx, dy, n))
- mean: The mean vector, which can be either of:
  - 0: the input data has already been centralized
  - nothing: this function will compute the mean
  - a pre-computed mean vector
  (default is nothing)

Notes: This function calls ccacov or ccasvd internally, depending on the choice of method.
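Example. A minimal sketch with two synthetic views generated from a common latent signal (the dimensions and noise level are arbitrary):

using MultivariateStats
# a 3-dimensional latent signal, observations in columns
Z = randn(3, 500)
X = randn(6, 3) * Z + 0.1 * randn(6, 500)
Y = randn(4, 3) * Z + 0.1 * randn(4, 500)
# fit CCA with a 3-dimensional common space
M = fit(CCA, X, Y; outdim=3)
# project both views into the common space
Zx = xtransform(M, X)
Zy = ytransform(M, Y)
correlations(M)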
Core Algorithms¶
Two algorithms are implemented in this package: ccacov and ccasvd.
-
ccacov(Cxx, Cyy, Cxy, xmean, ymean, p)¶ Compute CCA based on analysis of the given covariance matrices, using generalized eigenvalue decomposition.
Parameters:

- Cxx – The covariance matrix of X.
- Cyy – The covariance matrix of Y.
- Cxy – The covariance matrix between X and Y.
- xmean – The mean vector of the original samples of X, which can be a vector of length dx, or an empty vector Float64[] indicating a zero mean.
- ymean – The mean vector of the original samples of Y, which can be a vector of length dy, or an empty vector Float64[] indicating a zero mean.
- p – The output dimension, i.e the dimension of the common space.

Returns: The resultant CCA model.
-
ccasvd(Zx, Zy, xmean, ymean, p)¶ Compute CCA based on singular value decomposition of centralized sample matrices
Zx and Zy.

Parameters:

- Zx – The centralized sample matrix for X.
- Zy – The centralized sample matrix for Y.
- xmean – The mean vector of the original samples of X, which can be a vector of length dx, or an empty vector Float64[] indicating a zero mean.
- ymean – The mean vector of the original samples of Y, which can be a vector of length dy, or an empty vector Float64[] indicating a zero mean.
- p – The output dimension, i.e the dimension of the common space.

Returns: The resultant CCA model.
Classical Multidimensional Scaling¶
In general, Multidimensional Scaling (MDS) refers to techniques that transform samples into a lower-dimensional space while preserving the inter-sample distances as well as possible.
Overview of Classical MDS¶
Classical MDS is a specific technique in this family that accomplishes the embedding in two steps:
Convert the distance matrix to a Gram matrix.
This conversion is based on the following relations between a distance matrix D and a Gram matrix G:

    \mathbf{D}^{\circ 2} = \mathbf{g} \mathbf{1}^T + \mathbf{1} \mathbf{g}^T - 2 \mathbf{G}

Here, \mathbf{D}^{\circ 2} indicates the element-wise square of D, and g is the vector of diagonal elements of G. This relation is itself based on the following decomposition of squared Euclidean distance:

    \| \mathbf{x} - \mathbf{y} \|^2 = \| \mathbf{x} \|^2 + \| \mathbf{y} \|^2 - 2 \mathbf{x}^T \mathbf{y}
Perform eigenvalue decomposition of the Gram matrix to derive the coordinates.
Functions¶
This package provides functions related to classical MDS.
-
gram2dmat(G)¶ Convert a Gram matrix
G to a distance matrix.
-
gram2dmat!(D, G) Convert a Gram matrix
G to a distance matrix, and write the results to D.
-
dmat2gram(D)¶ Convert a distance matrix
D to a Gram matrix.
-
dmat2gram!(G, D) Convert a distance matrix
D to a Gram matrix, and write the results to G.
-
classical_mds(D, p[, dowarn=true])¶ Perform classical MDS. This function derives a
p-dimensional embedding based on a given distance matrix D.

It returns a coordinate matrix of size (p, n), where each column is the coordinates for an observation.

Note: The Gramian derived from D may have nonpositive or degenerate eigenvalues. The subspace of nonpositive eigenvalues is projected out of the MDS solution so that the strain function is minimized in a least-squares sense. If the smallest remaining eigenvalue that is used for the MDS is degenerate, then the solution is not unique, as any linear combination of degenerate eigenvectors will also yield an MDS solution with the same strain value. By default, warnings are emitted if either situation is detected, which can be suppressed with dowarn=false.

If the MDS uses an eigenspace of dimension m less than p, then the MDS coordinates will be padded with p-m zeros each.

Reference:

@inbook{Borg2005,
  Author = {Ingwer Borg and Patrick J. F. Groenen},
  Title = {Modern Multidimensional Scaling: Theory and Applications},
  Edition = {2},
  Year = {2005},
  Chapter = {12},
  Doi = {10.1007/0-387-28981-X},
  Pages = {201--268},
  Series = {Springer Series in Statistics},
  Publisher = {Springer},
}
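Example. A small sketch that embeds a synthetic point configuration (the dimensions are arbitrary); the distance matrix is built by hand here, but any symmetric distance matrix with a zero diagonal can be used:

using MultivariateStats
X = randn(5, 10)                  # 10 points in 5-dimensional space
n = size(X, 2)
# pairwise Euclidean distance matrix
D = [sqrt(sum(abs2, X[:, i] - X[:, j])) for i in 1:n, j in 1:n]
# derive a 2-dimensional embedding; Y has size (2, n)
Y = classical_mds(D, 2)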
Linear Discriminant Analysis¶
Linear Discriminant Analysis (LDA) is a statistical method that finds a linear combination of features to separate observations of two classes.
Note: Please refer to Multi-class Linear Discriminant Analysis for methods that can discriminate between multiple classes.
Overview of LDA¶
Suppose the samples in the positive and negative classes respectively have means \boldsymbol{\mu}_p and \boldsymbol{\mu}_n, and covariances \mathbf{C}_p and \mathbf{C}_n. Then based on Fisher’s Linear Discriminant Criteria, the optimal projection direction can be expressed as:

    \mathbf{w} = \alpha (\mathbf{C}_p + \mathbf{C}_n)^{-1} (\boldsymbol{\mu}_p - \boldsymbol{\mu}_n)

Here α is an arbitrary non-negative coefficient.
Linear Discriminant¶
A linear discriminant functional can be written as

    f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b

Here, w is the coefficient vector, and b is the bias constant.
This package uses the LinearDiscriminant type, defined as below, to capture a linear discriminant functional:
struct LinearDiscriminant <: Discriminant
w::Vector{Float64}
b::Float64
end
This type comes with several methods. Let f be an instance of LinearDiscriminant
-
length(f)¶ Get the length of the coefficient vector.
-
evaluate(f, x)¶ Evaluate the linear discriminant value, i.e
w'x + b.

When x is a vector, it returns a real value; when x is a matrix with samples in columns, it returns a vector of length size(x, 2).
-
predict(f, x)¶ Make prediction. It returns
true iff evaluate(f, x) is positive.
Data Analysis¶
The package provides several functions to perform Linear Discriminant Analysis.
-
ldacov(Cp, Cn, μp, μn)¶ Performs LDA given covariances and mean vectors.
Parameters: - Cp – The covariance matrix of the positive class.
- Cn – The covariance matrix of the negative class.
- μp – The mean vector of the positive class.
- μn – The mean vector of the negative class.
Returns: The resultant linear discriminant functional of type
LinearDiscriminant.

Note: The coefficient vector is scaled such that w'μp + b = 1 and w'μn + b = -1.
-
ldacov(C, μp, μn) Performs LDA given a covariance matrix and both mean vectors.
Parameters:

- C – The pooled covariance matrix (i.e (Cp + Cn)/2)
- μp – The mean vector of the positive class.
- μn – The mean vector of the negative class.
Returns: The resultant linear discriminant functional of type
LinearDiscriminant.

Note: The coefficient vector is scaled such that w'μp + b = 1 and w'μn + b = -1.
-
fit(LinearDiscriminant, Xp, Xn)¶ Performs LDA given both positive and negative samples.
Parameters: - Xp – The sample matrix of the positive class.
- Xn – The sample matrix of the negative class.
Returns: The resultant linear discriminant functional of type
LinearDiscriminant.
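Example. A minimal sketch with two synthetic Gaussian classes (the shift of the positive class is arbitrary):

using MultivariateStats
# positive and negative samples, observations in columns
Xp = randn(5, 100) .+ 2.0
Xn = randn(5, 100)
# fit a linear discriminant
f = fit(LinearDiscriminant, Xp, Xn)
# evaluate discriminant values and predict on a new sample
evaluate(f, Xn)        # a vector of discriminant values, one per column
predict(f, Xp[:, 1])   # true if the sample is classified as positive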
Multi-class Linear Discriminant Analysis¶
Multi-class LDA is a generalization of standard two-class LDA that can handle an arbitrary number of classes.
Overview¶
Multi-class LDA is based on the analysis of two scatter matrices: within-class scatter matrix and between-class scatter matrix.
Given a set of samples \mathbf{x}_1, \ldots, \mathbf{x}_n, and their class labels y_1, \ldots, y_n:

The within-class scatter matrix is defined as:

    \mathbf{S}_w = \sum_{i=1}^n (\mathbf{x}_i - \boldsymbol{\mu}_{y_i}) (\mathbf{x}_i - \boldsymbol{\mu}_{y_i})^T

Here, \boldsymbol{\mu}_k is the sample mean of the k-th class.

The between-class scatter matrix is defined as:

    \mathbf{S}_b = \sum_{k=1}^m n_k (\boldsymbol{\mu}_k - \boldsymbol{\mu}) (\boldsymbol{\mu}_k - \boldsymbol{\mu})^T

Here, m is the number of classes, \boldsymbol{\mu} is the overall sample mean, and n_k is the number of samples in the k-th class.

Then, multi-class LDA can be formulated as an optimization problem to find a set of linear combinations (with coefficients \mathbf{w}) that maximizes the ratio of the between-class scattering to the within-class scattering, as

    \hat{\mathbf{w}} = \arg\max_{\mathbf{w}} \frac{\mathbf{w}^T \mathbf{S}_b \mathbf{w}}{\mathbf{w}^T \mathbf{S}_w \mathbf{w}}

The solution is given by the following generalized eigenvalue problem:

    \mathbf{S}_b \mathbf{w} = \lambda \mathbf{S}_w \mathbf{w}    (1)

Generally, at most m - 1 generalized eigenvectors are useful to discriminate between m classes.
When the dimensionality is high, it may not be feasible to construct the scatter matrices explicitly. In such cases, see SubspaceLDA below.
Normalization by number of observations¶
An alternative definition of the within- and between-class scatter matrices normalizes for the number of observations in each group:

    \mathbf{S}_w^* = n \sum_{k=1}^m \frac{1}{n_k} \sum_{i \mid y_i = k} (\mathbf{x}_i - \boldsymbol{\mu}_k) (\mathbf{x}_i - \boldsymbol{\mu}_k)^T

    \mathbf{S}_b^* = n \sum_{k=1}^m (\boldsymbol{\mu}_k - \boldsymbol{\mu}^*) (\boldsymbol{\mu}_k - \boldsymbol{\mu}^*)^T

where

    \boldsymbol{\mu}^* = \frac{1}{m} \sum_{k=1}^m \boldsymbol{\mu}_k

is the mean of the class means.
This definition can sometimes be more useful when looking for directions which discriminate among clusters containing widely-varying numbers of observations.
Multi-class LDA¶
The package defines a MulticlassLDA type to represent a multi-class LDA model, as:
mutable struct MulticlassLDA
proj::Matrix{Float64}
pmeans::Matrix{Float64}
stats::MulticlassLDAStats
end
Here, proj is the projection matrix, pmeans is the projected means of all classes, stats is an instance of MulticlassLDAStats that captures all statistics computed to train the model (which we will discuss later).
Several methods are provided to access properties of the LDA model. Let M be an instance of MulticlassLDA:
-
indim(M)¶ Get the input dimension (i.e the dimension of the observation space).
-
outdim(M)¶ Get the output dimension (i.e the dimension of the transformed features).
-
projection(M)¶ Get the projection matrix (of size
d x p).
-
mean(M)¶ Get the overall sample mean vector (of length
d).
-
classmeans(M)¶ Get the matrix comprised of class-specific means as columns (of size
(d, m)).
-
classweights(M)¶ Get the weights of individual classes (a vector of length
m). If the samples are not weighted, the weight equals the number of samples of each class.
-
withinclass_scatter(M)¶ Get the within-class scatter matrix (of size
(d, d)).
-
betweenclass_scatter(M)¶ Get the between-class scatter matrix (of size
(d, d)).
-
transform(M, x)¶ Transform input sample(s) in
x to the output space. Here, x can be either a sample vector or a matrix comprised of samples in columns.

In the practice of classification, one can transform testing samples using this transform method, and compare them with M.pmeans.
Data Analysis¶
One can use fit to perform multi-class LDA over a set of data:
-
fit(MulticlassLDA, nc, X, y; ...)¶ Perform multi-class LDA over a given data set.
Parameters:

- nc – the number of classes
- X – the matrix of input samples, of size (d, n). Each column in X is an observation.
- y – the vector of class labels, of length n. Each element of y must be an integer between 1 and nc.

Returns: The resultant multi-class LDA model, of type MulticlassLDA.

Keyword arguments:

- method: The choice of methods:
  - :gevd: based on generalized eigenvalue decomposition
  - :whiten: first derive a whitening transform from Sw and then solve the problem based on eigenvalue decomposition of the whitened Sb
  (default is :gevd)
- outdim: The output dimension, i.e the dimension of the transformed space. (default is min(d, nc-1))
- regcoef: The regularization coefficient. A positive value regcoef * eigmax(Sw) is added to the diagonal of Sw to improve numerical stability. (default is 1.0e-6)

Note: The resultant projection matrix P satisfies:

    \mathbf{P}^T (\mathbf{S}_w + \kappa \mathbf{I}) \mathbf{P} = \mathbf{I}

Here, \kappa equals regcoef * eigmax(Sw). The columns of P are arranged in descending order of the corresponding generalized eigenvalues.

Note that MulticlassLDA does not currently support the normalized version using \mathbf{S}_w^* and \mathbf{S}_b^* (see SubspaceLDA below).
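Example. A minimal sketch with three synthetic classes (the class offsets and dimensions are arbitrary):

using MultivariateStats
# three classes in 5 dimensions, observations in columns
X = hcat(randn(5, 100), randn(5, 100) .+ 2.0, randn(5, 100) .- 2.0)
y = repeat(1:3, inner=100)
# fit a multi-class LDA model with 2 discriminant directions
M = fit(MulticlassLDA, 3, X, y; outdim=2)
# project observations onto the discriminant subspace
Z = transform(M, X)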
Task Functions¶
The multi-class LDA consists of several steps:
- Compute statistics, such as class means, scatter matrices, etc.
- Solve the projection matrix.
- Construct the model.
Sometimes, it is useful to only perform one of these tasks. The package exposes several functions for this purpose:
-
multiclass_lda_stats(nc, X, y)¶ Compute statistics required to train a multi-class LDA.
Parameters: - nc – the number of classes
- X – the matrix of input samples.
- y – the vector of class labels.
This function returns an instance of MulticlassLDAStats, defined as below, that captures all relevant statistics.

mutable struct MulticlassLDAStats
    dim::Int                    # sample dimensions
    nclasses::Int               # number of classes
    cweights::Vector{Float64}   # class weights
    tweight::Float64            # total sample weight
    mean::Vector{Float64}       # overall sample mean
    cmeans::Matrix{Float64}     # class-specific means
    Sw::Matrix{Float64}         # within-class scatter matrix
    Sb::Matrix{Float64}         # between-class scatter matrix
end
This type has the following constructor. Under certain circumstances, one might collect statistics in other ways and want to directly construct this instance.
-
MulticlassLDAStats(cweights, mean, cmeans, Sw, Sb)¶ Construct an instance of type
MulticlassLDAStats.Parameters: - cweights – the class weights, a vector of length
m. - mean – the overall sample mean, a vector of length
d. - cmeans – the class-specific sample means, a matrix of size
(d, m). - Sw – the within-class scatter matrix, a matrix of size
(d, d). - Sb – the between-class scatter matrix, a matrix of size
(d, d).
- cweights – the class weights, a vector of length
-
multiclass_lda(S; ...)¶ Perform multi-class LDA based on given statistics. Here
S is an instance of MulticlassLDAStats.

This function accepts the following keyword arguments (as above): method, outdim, and regcoef.
-
mclda_solve(Sb, Sw, method, p, regcoef)¶ Solve the projection matrix given both scatter matrices.
Parameters: - Sb – the between-class scatter matrix.
- Sw – the within-class scatter matrix.
- method – the choice of method, which can be either
:gevd or :whiten.
- p – output dimension.
- regcoef – regularization coefficient.
-
mclda_solve!(Sb, Sw, method, p, regcoef) Solve the projection matrix given both scatter matrices.
Note: In this function,
Sb and Sw will be overwritten (saving some space).
Subspace LDA¶
The package also defines a SubspaceLDA type to represent a
multi-class LDA model for high-dimensional spaces. MulticlassLDA,
because it stores the scatter matrices, is not well-suited for
high-dimensional data. For example, if you are performing LDA on
images, and each image has 10^6 pixels, then the scatter matrices
would contain 10^12 elements, far too many to store
directly. SubspaceLDA calculates the projection direction without
the intermediary of the scatter matrices, by focusing on the subspace
that lies within the span of the within-class scatter. This also
serves to regularize the computation.
struct SubspaceLDA{T<:AbstractFloat}
projw::Matrix{T} # P, project down to the subspace spanned by within-class scatter
projLDA::Matrix{T} # L, LDA directions in the projected subspace
λ::Vector{T}
cmeans::Matrix{T}
cweights::Vector{Int}
end
This supports all the same methods as MulticlassLDA, with the
exception of the functions that return a scatter matrix. The overall
projection is represented as a factorization P*L, where P'*x
projects data points to the subspace spanned by the within-class
scatter, and L is the LDA projection in the subspace. The
projection directions w (the columns of projection(M)) satisfy
the equation

    \mathbf{P}^T \mathbf{S}_b \mathbf{w} = \lambda \mathbf{P}^T \mathbf{S}_w \mathbf{w}
When P is of full rank (e.g., if there are more data points than
dimensions), then this equation guarantees that
Eq. (1) will also hold.
SubspaceLDA also supports the normalized version of LDA via the normalize keyword:
M = fit(SubspaceLDA, X, label; normalize=true)
would perform LDA using the equivalent of \mathbf{S}_w^* and \mathbf{S}_b^*.
Independent Component Analysis¶
Independent Component Analysis (ICA) is a computational technique for separating a multivariate signal into additive subcomponents, with the assumption that the subcomponents are non-Gaussian and independent from each other.
There are multiple algorithms for ICA. Currently, this package implements the Fast ICA algorithm.
ICA¶
The package uses a type ICA, defined below, to represent an ICA model:
mutable struct ICA{T<:Real}
mean::Vector{T} # mean vector, of length m (or empty to indicate zero mean)
W::Matrix{T} # component coefficient matrix, of size (m, k)
end
Note: Each column of W here corresponds to an independent component.
Several methods are provided to work with ICA. Let M be an instance of ICA:
-
indim(M)¶ Get the input dimension, i.e the number of observed mixtures.
-
outdim(M)¶ Get the output dimension, i.e the number of independent components.
-
mean(M)¶ Get the mean vector.
Note: if
M.mean is empty, this function returns a zero vector of length indim(M).
-
transform(M, x)¶ Transform
x to the output space to extract independent components, as W' * (x - mean).
Data Analysis¶
One can use fit to perform ICA over a given data set.
-
fit(ICA, X, k; ...)¶ Perform ICA over the data set given in
X.

Parameters:

- X – The data matrix, of size (m, n). Each row corresponds to a mixed signal, while each column corresponds to an observation (e.g. all signal values at a particular time step).
- k – The number of independent components to recover.

Returns: The resultant ICA model, an instance of type ICA.

Note: If do_whiten is true, the returned W satisfies W' * C * W = I, where C is the covariance of X; otherwise W is orthonormal, i.e. W' * W = I.

Keyword Arguments:

- alg: The choice of algorithm (must be :fastica). (default is :fastica)
- fun: The approx neg-entropy functor. It can be obtained using the function icagfun. Now, it accepts the following values:
  - icagfun(:tanh)
  - icagfun(:tanh, a)
  - icagfun(:gaus)
  (default is icagfun(:tanh))
- do_whiten: Whether to perform pre-whitening. (default is true)
- maxiter: Maximum number of iterations. (default is 100)
- tol: Tolerable change of W at convergence. (default is 1.0e-6)
- mean: The mean vector, which can be either of:
  - 0: the input data has already been centralized
  - nothing: this function will compute the mean
  - a pre-computed mean vector
  (default is nothing)
- winit: Initial guess of W, which should be either of:
  - empty matrix: the function will perform random initialization
  - a matrix of size (k, k) (when do_whiten)
  - a matrix of size (m, k) (when !do_whiten)
  (default is zeros(0,0))
- verbose: Whether to display iteration information. (default is false)
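Example. A minimal sketch that unmixes two synthetic source signals (the sources and the mixing matrix are arbitrary):

using MultivariateStats
n = 1000
t = range(0, 10; length=n)
# two non-Gaussian sources, one per row
S = vcat(sin.(2t)', sign.(cos.(5t))')
# mix them into two observed signals (rows are mixtures, columns are observations)
A = [1.0 0.5; 0.3 1.0]
X = A * S
# recover 2 independent components
M = fit(ICA, X, 2)
Z = transform(M, X)    # estimated sources, up to sign, scale and permutation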
Core Algorithms¶
The package also exports functions of the core algorithms. Sometimes, it can be more efficient to directly invoke them instead of going through the fit interface.
-
fastica!(W, X, fun, maxiter, tol, verbose) Invoke the Fast ICA algorithm.
Parameters:

- W – The initial un-mixing matrix, of size (m, k). The function updates this matrix in place.
- X – The data matrix, of size (m, n). This matrix is input only, and won’t be modified.
- fun – The approximate neg-entropy functor, which can be obtained using icagfun (see above).
- maxiter – Maximum number of iterations.
- tol – Tolerable change of W at convergence.
- verbose – Whether to display iteration information.

Returns: The updated W.

Note: The number of components is inferred from W as size(W, 2).
Factor Analysis¶
Factor Analysis (FA) is a linear-Gaussian latent variable model that is closely related to probabilistic PCA. In contrast to the probabilistic PCA model, the covariance of conditional distribution of the observed variable given the latent variable is diagonal rather than isotropic [BSHP06].
This package defines a FactorAnalysis type to represent a factor analysis model, and provides a set of methods to access the properties.
Properties¶
Let M be an instance of FactorAnalysis, d be the dimension of observations, and p be the output dimension (i.e the dimension of the principal subspace)
-
indim(M)¶ Get the input dimension
d, i.e the dimension of the observation space.
-
outdim(M)¶ Get the output dimension
p, i.e the dimension of the principal subspace.
-
mean(M)¶ Get the mean vector (of length
d).
-
projection(M)¶ Get the projection matrix (of size
(d, p)). Each column of the projection matrix corresponds to a principal component. The principal components are arranged in descending order of the corresponding variances.
-
loadings(M)¶ The factor loadings matrix (of size
(d, p)).
-
cov(M)¶ The diagonal covariance matrix.
Transformation and Construction¶
Given a factor analysis model M, one can use it to transform observations into latent variables, as

    \mathbf{z} = \mathbf{W}^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})

or use it to reconstruct (approximately) the observations from latent variables, as

    \tilde{\mathbf{x}} = \boldsymbol{\Sigma} \mathbf{W} (\mathbf{W}^T \mathbf{W})^{-1} \mathbf{z} + \boldsymbol{\mu}

Here, \mathbf{W} is the factor loadings or weight matrix, and \boldsymbol{\Sigma} is the covariance matrix.
The package provides methods to do so:
-
transform(M, x)¶ Transform observations
x into latent variables.

Here, x can be either a vector of length d or a matrix where each column is an observation.
-
reconstruct(M, z)¶ Approximately reconstruct observations from the latent variable given in
z.

Here, z can be either a vector of length p or a matrix where each column gives the latent variables for an observation.
Data Analysis¶
One can use the fit method to perform factor analysis over a given dataset.
-
fit(FactorAnalysis, X; ...)¶ Perform factor analysis over the data given in a matrix X. Each column of X is an observation.

This method returns an instance of FactorAnalysis.

Keyword arguments:

Let (d, n) = size(X) be respectively the input dimension and the number of observations.

- method: The choice of methods:
  - :em: use EM version of factor analysis
  - :cm: use CM version of factor analysis
  (default is :cm)
- maxoutdim: Maximum output dimension. (default is d-1)
- mean: The mean vector, which can be either of:
  - 0: the input data has already been centralized
  - nothing: this function will compute the mean
  - a pre-computed mean vector
  (default is nothing)
- tol: Convergence tolerance. (default is 1.0e-6)
- tot: Maximum number of iterations. (default is 1000)
- η: Variance lower bound. (default is 1.0e-6)

Notes:

- This function calls facm or faem internally, depending on the choice of method.
Example:
using MultivariateStats
# suppose Xtr and Xte are training and testing data matrix,
# with each observation in a column
# train a FactorAnalysis model
M = fit(FactorAnalysis, Xtr; maxoutdim=100)
# apply FactorAnalysis model to testing set
Yte = transform(M, Xte)
# reconstruct testing observations (approximately)
Xr = reconstruct(M, Yte)
Core Algorithms¶
Two algorithms are implemented in this package: faem and facm.
-
faem(S, mean, n; ...)¶ Perform factor analysis using an expectation-maximization algorithm for a given sample covariance matrix
S [RUBN82].

Parameters:

- S – The sample covariance matrix.
- mean – The mean vector of original samples, which can be a vector of length d, or an empty vector Float64[] indicating a zero mean.
- n – The number of observations.
Returns: The resultant FactorAnalysis model.
Note: This function accepts three keyword arguments: maxoutdim, tol, and tot.
-
facm(S, mean, n; ...)¶ Perform factor analysis using a fast conditional maximization algorithm for a given sample covariance matrix S [ZHAO08].

Parameters:

- S – The sample covariance matrix.
- mean – The mean vector of original samples, which can be a vector of length d, or an empty vector Float64[] indicating a zero mean.
- n – The number of observations.
Returns: The resultant FactorAnalysis model.
Note: This function accepts four keyword arguments: maxoutdim, tol, tot, and η.
Notes:
All methods implemented in this package adopt the column-major convention of JuliaStats: in a data matrix, each column corresponds to a sample/observation, while each row corresponds to a feature (variable or attribute).