Welcome to SALSA’s documentation!¶
SALSA (Software Lab for Advanced Machine Learning with Stochastic Algorithms) is a native Julia implementation of stochastic algorithms for:
- linear and non-linear Support Vector Machines
- sparse linear modelling
SALSA is an open source project available on GitHub under the GPLv3 license.
Installation¶
The SALSA package can be installed from the Julia command line with Pkg.add("SALSA"), or by running the same command directly with the Julia executable: julia -e 'Pkg.add("SALSA")'.
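For reference, the two equivalent installation commands are:
Pkg.add("SALSA")              # from the Julia REPL
julia -e 'Pkg.add("SALSA")'   # from the system shell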
Mathematical background¶
The SALSA package aims at stochastically learning a classifier or regressor via the Regularized Empirical Risk Minimization [Vapnik1992] framework. We approach a family of well-known Machine Learning problems of the type:

$$\min_{w} \; \frac{1}{n} \sum_{i=1}^{n} \ell(w; \xi_i) + \lambda \Omega(w),$$

where $\xi_i = (x_i, y_i)$ is given as a pair of input-output variables and belongs to a set $\mathcal{S} = \{\xi_i\}_{i=1}^{n}$ of independent observations, the loss function $\ell(w; \xi_i)$ measures the disagreement between the true target $y_i$ and the model prediction, and the regularization term $\Omega(w)$ penalizes the complexity of the model $w$. Because of the i.i.d. assumption and a fixed computational budget, we draw $\xi_i$ uniformly from $\mathcal{S}$ at most max_iter times. Online passes and optimization with the full dataset are available too. The package includes stochastic algorithms for linear and non-linear Support Vector Machines [Boser1992] and sparse linear modelling [Hastie2015].
Particular choices of the loss function $\ell$ include (but are not restricted to) the HINGE, LOGISTIC, LEAST_SQUARES, SQUARED_HINGE, PINBALL and MODIFIED_HUBER losses described in the Loss Functions section.
Particular choices of the regularization term $\Omega(w)$ are:
- $l_1$-regularization, i.e. $\Omega(w) = \|w\|_1$
- $l_2$-regularization, i.e. $\Omega(w) = \frac{1}{2}\|w\|_2^2$
- reweighted $l_1$-regularization
- reweighted $l_2$-regularization
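As a concrete instance, combining the HINGE loss with $l_2$-regularization recovers the linear SVM objective solved by PEGASOS:

$$\min_{w} \; \frac{1}{n} \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i \langle w, x_i \rangle\bigr) + \frac{\lambda}{2} \|w\|_2^2.$$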
References¶
SALSA stems from the following algorithmic approaches:
- Pegasos: S. Shalev-Shwartz, Y. Singer, N. Srebro, Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, in: Proceedings of the 24th international conference on Machine learning, ICML ’07, New York, NY, USA, 2007, pp. 807–814.
- RDA: L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, J. Mach. Learn. Res. 11 (2010), pp. 2543–2596.
- Adaptive RDA: J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res. 12 (2011), pp. 2121–2159.
- Reweighted RDA: V. Jumutc, J.A.K. Suykens, Reweighted stochastic learning, Neurocomputing Special Issue - ISNN2014, 2015. (In Press)
Dependencies¶
- MLBase: to support generic Machine Learning routines
- StatsBase: to support generic routines from Statistics
- Distances: to support distance metrics between vectors
- Distributions: to support sampling from various distributions
- DataFrames: to support and process files instead of in-memory matrices
- Clustering: to support Stochastic K-means Clustering (experimental feature)
- ProgressMeter: to support progress bars and ETA of different routines
| [Vapnik1992] | Vapnik, Vladimir. “Principles of risk minimization for learning theory”, In Advances in neural information processing systems (NIPS), pp. 831-838. 1992. |
| [Boser1992] | Boser, B., Guyon, I., Vapnik, V. “A training algorithm for optimal margin classifiers”, In Proceedings of the fifth annual workshop on Computational learning theory - COLT‘92., pp. 144-152, 1992. |
| [Hastie2015] | Hastie T., Tibshirani R., Wainwright M. Statistical Learning with Sparsity: The Lasso and Generalizations, Chapman & Hall/CRC Monographs on Statistics & Applied Probability, 2015. |
Data Preprocessing¶
This part of the package provides a simple set of preprocessing utilities.
Data Normalization¶
-
mapstd(X)¶ Normalizes each column of X to zero mean and unit standard deviation. Returns the normalized matrix X together with the extracted column-wise means and standard deviations.
using SALSA
mapstd([0 1; -1 2]) # --> ([0.707107 -0.707107; -0.707107 0.707107], [-0.5 1.5], [0.707107 0.707107])
-
mapstd(X, mean, std) Normalizes each column of X to the specified column-wise mean and std. Returns the normalized matrix X.
using SALSA
mapstd([0 1; -1 2], [-0.5 1.5], [0.707107 0.707107]) # --> [0.707107 -0.707107; -0.707107 0.707107]
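A typical workflow combining the two methods above (a minimal sketch with hypothetical test samples): extract the means and standard deviations from the training data and re-use them to normalize unseen data.
using SALSA
Xtrain = [0 1.0; -1 2]
Xtest  = [0.5 1.5; -0.5 0.5]            # hypothetical test samples
Xtrain_n, mean_, std_ = mapstd(Xtrain)  # normalize and extract statistics
Xtest_n = mapstd(Xtest, mean_, std_)    # apply the same statistics to the test set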
Sparse Data Preparation¶
-
make_sparse(tuples[, sizes, delim])¶ Creates a SparseMatrixCSC object from a matrix of tuples Matrix{ASCIIString} containing index:value pairs. The index and value in each pair can be separated by the delim character, e.g. :. The user can optionally specify the final dimensions of the SparseMatrixCSC object as the sizes tuple.
Parameters:
- tuples – matrix of tuples Matrix{ASCIIString} containing index:value pairs
- sizes – optional tuple of final dimensions, e.g. (100000,10) (empty by default)
- delim – optional character separating index and value in each cell of tuples, default is ":"
Returns: SparseMatrixCSC object.
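A minimal sketch with hypothetical data, assuming the index in every index:value cell refers to a column of the resulting sparse matrix and each row of tuples corresponds to one sample:
using SALSA
tuples = ASCIIString["1:0.5" "3:1.2"; "2:0.7" "4:0.1"]
A = make_sparse(tuples)           # dimensions inferred from the data
B = make_sparse(tuples, (2, 10))  # explicitly sized 2×10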
Data Management¶
-
DelimitedFile(name, header, delim)¶ Creates a wrapper around any delimited file which can be passed to the low-level routines, for instance pegasos_alg(). A DelimitedFile will be processed in the online mode regardless of the online_pass==0 flag passed to the low-level routines.
Parameters:
- name – file name
- header – flag indicating if a header is present
- delim – delimiting character
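A minimal sketch (the file name is hypothetical): wrapping a headerless comma-separated file so that it is streamed by the low-level solvers instead of being loaded into memory.
using SALSA
dfile = DelimitedFile("large_dataset.csv", false, ',')  # name, header flag, delimiter
# dfile can now be passed wherever X is expected, e.g. to pegasos_alg()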
Classification¶
A classification example is explained using the SALSA package on the Ripley data set. The SALSA package provides many different options for stochastically learning a classification model.
This part of the documentation describes the function salsa and the SALSAModel type which accompanies and complements it. The package provides full-stack functionality including cross-validation of all model- and algorithm-related hyperparameters.
Knowledge agnostic usage¶
-
salsa(X, Y[, Xtest])¶ Create a linear classification model with the predicted output $\hat{y} = \mathrm{sign}(\langle w, x \rangle + b)$ based on data given in X and labeling specified in Y. Optionally evaluate it on Xtest. Data should be given in row-wise format (one sample per row). The classification model is embedded into the returned model as model.output. The choice of different algorithms, loss functions and modes is explained further on this page.
using SALSA, MAT, Base.Test
srand(1234)
ripley = matread(joinpath(Pkg.dir("SALSA"), "data", "ripley.mat"))
model = salsa(ripley["X"], ripley["Y"], ripley["Xt"]) # --> SALSAModel(...)
@test_approx_eq_eps mean(ripley["Yt"] .== model.output.Ytest) 0.89 0.01
-
salsa(mode, algorithm, loss, X, Y, Xtest) Create a classification model with the specified choice of algorithm, mode and loss function.
Parameters:
- mode – LINEAR vs. NONLINEAR mode specifies whether to use a simple linear classification model or to apply the Nyström method for approximating the feature map before proceeding with the learning scheme
- algorithm – stochastic algorithm to learn a classification model, e.g. PEGASOS, L1RDA etc.
- loss – loss function to use when learning a classification model, e.g. HINGE, LOGISTIC etc.
- X – training data (samples) represented by Matrix or SparseMatrixCSC
- Y – training labels
- Xtest – test data for out-of-sample evaluation
Returns: SALSAModel object.
using SALSA, MAT, Base.Test
srand(1234)
ripley = matread(joinpath(Pkg.dir("SALSA"), "data", "ripley.mat"))
model = salsa(LINEAR, PEGASOS, HINGE, ripley["X"], ripley["Y"], ripley["Xt"])
@test_approx_eq_eps mean(ripley["Yt"] .== model.output.Ytest) 0.89 0.01
Model-based usage¶
-
salsa(X, Y, model, Xtest) Create a classification model based on the provided model and input data.
Parameters:
- X – training data (samples) represented by Matrix or SparseMatrixCSC
- Y – training labels
- Xtest – test data for out-of-sample evaluation
- model – model of type SALSAModel{L <: Loss, A <: Algorithm, M <: Mode, K <: Kernel}, which can be summarized as follows (with default values for named parameters):
  - mode::Type{M}: mode used to learn the model: LINEAR vs. NONLINEAR (mandatory parameter)
  - algorithm::A: algorithm used to learn the model, e.g. PEGASOS (mandatory parameter)
  - loss_function::Type{L}: type of the loss function used to learn the model, e.g. HINGE (mandatory parameter)
  - kernel::Type{K} = RBFKernel: kernel used in NONLINEAR mode to compute the Nyström approximation
  - global_opt::GlobalOpt = CSA(): global optimization technique for tuning hyperparameters
  - subset_size::Float64 = 5e-1: subset size used in NONLINEAR mode to compute the Nyström approximation
  - max_cv_iter::Int = 1000: maximal number of iterations (budget) for any algorithm in training CV
  - max_iter::Int = 1000: maximal number of iterations (budget) for any algorithm in final training
  - max_cv_k::Int = 1: maximal number of data points used to compute the loss derivative in training CV
  - max_k::Int = 1: maximal number of data points used to compute the loss derivative in final training
  - online_pass::Int = 0: if > 0 we are in the online learning setting going through the entire dataset online_pass times
  - normalized::Bool = true: normalize data (extracting mean and std) before passing it to CV and final learning
  - process_labels::Bool = true: process labels to comply with binary (-1 vs. 1) or multi-class classification encoding
  - tolerance::Float64 = 1e-5: the criterion evaluated for early stopping (online_pass==0)
  - sparsity_cv::Float64 = 2e-1: sparsity weight in the combined cross-validation/sparsity criterion used for the RDA-type algorithms
  - validation_criterion = MISCLASS(): validation criterion used to verify the generalization capabilities of the model in cross-validation
Returns: SALSAModel object with model.output of type OutputModel structured as follows:
- dfunc::Function: loss function derived from the type specified in loss_function::Type{L} (above)
- alg_params::Vector: vector of model- and algorithm-specific hyperparameters obtained via cross-validation
- X_mean::Matrix: row vector of extracted column-wise means of the input X if normalized::Bool = true
- X_std::Matrix: row vector of extracted column-wise standard deviations of the input X if normalized::Bool = true
- mode::M: mode used to learn the model: LINEAR vs. NONLINEAR
- w: found solution vector (matrix)
- b: found solution offset (bias)
using SALSA, MAT, Base.Test
srand(1234)
ripley = matread(joinpath(Pkg.dir("SALSA"), "data", "ripley.mat"))
model = SALSAModel(NONLINEAR, R_L1RDA(), HINGE, global_opt=CSA())
model = salsa(ripley["X"], ripley["Y"], model, ripley["Xt"])
@test_approx_eq_eps mean(ripley["Yt"] .== model.output.Ytest) 0.895 0.01
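A hedged variation of the snippet above (re-using the loaded ripley data): keep a linear PEGASOS/HINGE setup but validate with the Area Under ROC Curve instead of the default MISCLASS criterion.
model = SALSAModel(LINEAR, PEGASOS(), HINGE, validation_criterion=AUC(100))
model = salsa(ripley["X"], ripley["Y"], model, ripley["Xt"])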
Regression¶
A regression example for the SALSA package is explained using the sinc(x) = sin(x)./x function.
This part of the documentation describes the function salsa and the SALSAModel type for the regression case. This use case is supported by the Fixed-Size approach [FS2010] and the Nyström approximation with the specific LEAST_SQUARES() loss function and the cross-validation criterion mse() (mean squared error).
using SALSA, Base.Test
srand(1234)
sinc(x) = sin(x)./x
X = linspace(0.1,20,100)''
Xtest = linspace(0.11,19.9,100)''
y = sinc(X)
model = SALSAModel(NONLINEAR, SIMPLE_SGD(), LEAST_SQUARES,
validation_criterion=MSE(), process_labels=false)
model = salsa(X, y, model, Xtest)
@test_approx_eq_eps mse(sinc(Xtest), model.output.Ytest) 0.05 0.01
By taking a look at the code snippet above we can notice a major difference from the Classification example. The model is equipped with the NONLINEAR mode and the LEAST_SQUARES loss function, while the cross-validation criterion is given by MSE. Another important model-related parameter is process_labels, which should be set to false in order to switch into regression mode. These four essential components unambiguously define a regression problem solved stochastically by the SALSA package.
| [FS2010] | De Brabanter K., De Brabanter J., Suykens J.A.K., De Moor B., “Optimized Fixed-Size Kernel Models for Large Data Sets”, Computational Statistics & Data Analysis, vol. 54, no. 6, Jun. 2010, pp. 1484-1504. |
Clustering¶
A clustering example is explained for the SALSA package on the Iris dataset [UCI2010].
This part of the documentation describes the function salsa and the SALSAModel type for the clustering case. This use case is supported by particular choices of loss functions and distance metrics applied within the Regularized K-Means approach [JS2015] and the cross-validation criterion SILHOUETTE (Silhouette index).
using SALSA, Clustering, Distances, MLBase, Base.Test
Xf = readcsv(joinpath(Pkg.dir("SALSA"), "data", "iris.data.csv"))
Y = convert(Array{Int}, Xf[:,end])
k_clusters = length(unique(Y))
dY = Array{Int}(length(Y))
X = Xf[:,1:end-1]
srand(1234)
algorithm = RK_MEANS(k_clusters)
model = SALSAModel(LINEAR, algorithm, LEAST_SQUARES,
validation_criterion=SILHOUETTE(),
global_opt=DS([-1]), process_labels=false,
cv_gen = Nullable{CrossValGenerator}(Kfold(length(Y),3)))
model = salsa(X, dY, model, X)
mappings = model.output.Ytest
By taking a close look at the code snippet above we can notice that we use a special type of algorithm, RK_MEANS(), which implements the approach in [JS2015]. By instantiating RK_MEANS(k_clusters) we provide the maximum number of clusters to be extracted. Learning of the individual prototype vectors will be repeated algorithm.max_iter times after re-partitioning of the dataset X (by default algorithm.max_iter==20). The default choice of the loss function is LEAST_SQUARES and the distance metric is Euclidean() [1]. This corresponds to the original setting of the unregularized K-Means approach. Please refer to the Algorithms section and the RK_MEANS() function for more details regarding which combinations of loss functions and metrics are supported.
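A hedged follow-up to the snippet above (assuming mappings holds one cluster assignment per sample): compare the extracted clusters with the true Iris labels using the Rand index from Clustering.jl.
using Clustering
println(randindex(Y, round(Int, vec(mappings))))  # adjusted Rand index and related agreement measures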
| [UCI2010] | Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. |
| [JS2015] | (1, 2) Jumutc V., Suykens J.A.K., “Regularized and Sparse Stochastic K-Means for Distributed Large-Scale Clustering”, Internal Report 15-126, ESAT-SISTA, KU Leuven (Leuven, Belgium), 2015. |
Footnotes
| [1] | metric types are defined in Distances.jl package |
Loss Functions¶
This part of the package provides a description and mathematical background of the implemented loss functions. Every loss function can be supplied to salsa subroutines either directly (see salsa()) or passed within SALSAModel. In the definitions below $\ell(y, p)$ stands for the loss function evaluated at the true label $y$ and a prediction $p$.
-
HINGE()¶ Defines an implementation of the Hinge Loss function, i.e. $\ell(y, p) = \max(0,\, 1 - yp)$.
-
LOGISTIC()¶ Defines an implementation of the Logistic Loss function, i.e. $\ell(y, p) = \log(1 + e^{-yp})$.
-
LEAST_SQUARES()¶ Defines an implementation of the Least Squares Loss function, i.e. $\ell(y, p) = \frac{1}{2}(p - y)^2$.
-
SQUARED_HINGE()¶ Defines an implementation of the Squared Hinge Loss function, i.e. $\ell(y, p) = \max(0,\, 1 - yp)^2$.
-
PINBALL()¶ Defines an implementation of the Pinball (Quantile) Loss function, i.e.
$$\ell(y, p) = \begin{cases} 1 - yp, & \text{if } yp \le 1, \\ \tau\,(yp - 1), & \text{otherwise.} \end{cases}$$
If the PINBALL loss is selected, the $\tau$ parameter will be tuned by the built-in cross-validation routines.
-
MODIFIED_HUBER()¶ Defines an implementation of the Modified Huber Loss function, i.e.
$$\ell(y, p) = \begin{cases} -4yp, & \text{if } yp < -1, \\ \max(0,\, 1 - yp)^2, & \text{otherwise.} \end{cases}$$
-
loss_derivative(type)¶ Defines a derivative of the loss function. One can pass any type of the loss function, e.g. HINGE, or an entire algorithm, for instance RK_MEANS().
Parameters: type – type of the loss function, e.g. HINGE, or an entire algorithm
Returns: Function which calculates a derivative at the current iterate $w_t$, the subsampled data points and the corresponding labels
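A minimal sketch of obtaining the derivative object that the low-level solvers below expect (the HINGE and RK_MEANS choices are merely illustrative):
using SALSA
dfunc = loss_derivative(HINGE)           # derivative of the hinge loss
dfunc_rk = loss_derivative(RK_MEANS(3))  # derivative implied by an entire algorithm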
Algorithms¶
This part of the package provides a description, API and references for the implemented core algorithmic schemes (solvers) available in the SALSA package. Every algorithm can be supplied as a type to salsa subroutines either directly (see salsa()) or passed within SALSAModel. Please refer to the Classification section for examples. Another available API consists of direct calls to the algorithmic schemes. The latter is the most primitive and basic way of using the SALSA package.
Available high-level API¶
-
PEGASOS()¶ Defines an implementation (see pegasos_alg()) of Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, which solves the $l_2$-regularized problem defined in the Mathematical background section.
-
L1RDA()¶ Defines an implementation (see l1rda_alg()) of the l1-Regularized Dual Averaging solver, which solves the $l_1$-regularized problem defined in the Mathematical background section.
-
ADA_L1RDA()¶ Defines an implementation (see adaptive_l1rda_alg()) of the Adaptive l1-Regularized Dual Averaging solver, which solves the $l_1$-regularized problem defined in the Mathematical background section in an adaptive way [1].
-
R_L1RDA()¶ Defines an implementation (see reweighted_l1rda_alg()) of the Reweighted l1-Regularized Dual Averaging solver, which approximates the $l_0$-regularized problem in a limit.
-
R_L2RDA()¶ Defines an implementation (see reweighted_l2rda_alg()) of the Reweighted l2-Regularized Dual Averaging solver, which approximates the $l_0$-regularized problem in a limit.
-
SIMPLE_SGD()¶ Defines an implementation (see sgd_alg()) of the unconstrained Stochastic Gradient Descent scheme, which solves the $l_2$-regularized problem defined in the Mathematical background section.
-
RK_MEANS(support_alg, k_clusters, max_iter, metric)¶ Defines an implementation (see stochastic_rk_means()) of the Regularized Stochastic K-Means approach [JS2015]. Please refer to the Clustering section for examples.
Parameters:
- support_alg – underlying support algorithm, e.g. PEGASOS
- k_clusters – number of clusters to be extracted
- max_iter – maximum number of outer iterations
- metric – metric to evaluate distances to centroids [2]
The selected metric unambiguously defines the loss function used to learn centroids. Currently supported metrics are:
- Euclidean(), which is complemented by the LEAST_SQUARES() loss function
- CosineDist(), which is complemented by the HINGE() loss function
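A hedged sketch of plugging these high-level algorithm types into SALSAModel (the loss choices are merely illustrative):
using SALSA
model_l1  = SALSAModel(LINEAR, L1RDA(), HINGE)         # sparse linear model via l1-RDA
model_ada = SALSAModel(LINEAR, ADA_L1RDA(), LOGISTIC)  # adaptive l1-RDA with the logistic loss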
Available low-level API¶
-
pegasos_alg(dfunc, X, Y, λ, k, max_iter, tolerance[, online_pass=0, train_idx=[]])¶
Parameters:
- dfunc – supplied loss function derivative (see loss_derivative())
- X – training data (samples are stacked row-wise) represented by Matrix, SparseMatrixCSC or DelimitedFile()
- Y – training labels corresponding to X
- λ – trade-off hyperparameter
- k – sampling size at each iteration $t$
- max_iter – maximum number of iterations (budget)
- tolerance – early stopping threshold, i.e. $\|w_{t+1} - w_t\| \le$ tolerance
- online_pass – number of online passes through data, online_pass=0 indicates a default stochastic mode instead of an online mode
- train_idx – subset of indices from X used to learn a model ($w$, $b$)
Returns: $w$, $b$
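A hedged sketch of calling the solver directly; the hyperparameter values (λ=1e-2, k=1) and the assumption that the solver returns the pair (w, b) described above are illustrative, not prescriptive:
using SALSA, MAT
srand(1234)
ripley = matread(joinpath(Pkg.dir("SALSA"), "data", "ripley.mat"))
dfunc = loss_derivative(HINGE)
w, b = pegasos_alg(dfunc, ripley["X"], ripley["Y"], 1e-2, 1, 1000, 1e-5)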
-
sgd_alg(dfunc, X, Y, λ, k, max_iter, tolerance[, online_pass=0, train_idx=[]])¶
Parameters:
- dfunc – supplied loss function derivative (see loss_derivative())
- X – training data (samples are stacked row-wise) represented by Matrix, SparseMatrixCSC or DelimitedFile()
- Y – training labels corresponding to X
- λ – trade-off hyperparameter
- k – sampling size at each iteration $t$
- max_iter – maximum number of iterations (budget)
- tolerance – early stopping threshold, i.e. $\|w_{t+1} - w_t\| \le$ tolerance
- online_pass – number of online passes through data, online_pass=0 indicates a default stochastic mode instead of an online mode
- train_idx – subset of indices from X used to learn a model ($w$, $b$)
Returns: $w$, $b$
-
l1rda_alg(dfunc, X, Y, λ, γ, ρ, k, max_iter, tolerance[, online_pass=0, train_idx=[]])¶
Parameters:
- dfunc – supplied loss function derivative (see loss_derivative())
- X – training data (samples are stacked row-wise) represented by Matrix, SparseMatrixCSC or DelimitedFile()
- Y – training labels corresponding to X
- λ – trade-off hyperparameter
- γ – hyperparameter involved in elastic-net regularization
- ρ – hyperparameter involved in elastic-net regularization
- k – sampling size at each iteration $t$
- max_iter – maximum number of iterations (budget)
- tolerance – early stopping threshold, i.e. $\|w_{t+1} - w_t\| \le$ tolerance
- online_pass – number of online passes through data, online_pass=0 indicates a default stochastic mode instead of an online mode
- train_idx – subset of indices from X used to learn a model ($w$, $b$)
Returns: $w$, $b$
-
adaptive_l1rda_alg(dfunc, X, Y, λ, γ, ρ, k, max_iter, tolerance[, online_pass=0, train_idx=[]])¶
Parameters:
- dfunc – supplied loss function derivative (see loss_derivative())
- X – training data (samples are stacked row-wise) represented by Matrix, SparseMatrixCSC or DelimitedFile()
- Y – training labels corresponding to X
- λ – trade-off hyperparameter
- γ – hyperparameter involved in elastic-net regularization
- ρ – hyperparameter involved in elastic-net regularization
- k – sampling size at each iteration $t$
- max_iter – maximum number of iterations (budget)
- tolerance – early stopping threshold, i.e. $\|w_{t+1} - w_t\| \le$ tolerance
- online_pass – number of online passes through data, online_pass=0 indicates a default stochastic mode instead of an online mode
- train_idx – subset of indices from X used to learn a model ($w$, $b$)
Returns: $w$, $b$
-
reweighted_l1rda_alg(dfunc, X, Y, λ, γ, ρ, ɛ, max_iter, tolerance[, online_pass=0, train_idx=[]])¶
Parameters:
- dfunc – supplied loss function derivative (see loss_derivative())
- X – training data (samples are stacked row-wise) represented by Matrix, SparseMatrixCSC or DelimitedFile()
- Y – training labels corresponding to X
- λ – trade-off hyperparameter
- γ – hyperparameter involved in the reweighted formulation of the regularization term
- ρ – hyperparameter involved in the reweighted formulation of the regularization term
- ɛ – reweighting hyperparameter
- k – sampling size at each iteration $t$
- max_iter – maximum number of iterations (budget)
- tolerance – early stopping threshold, i.e. $\|w_{t+1} - w_t\| \le$ tolerance
- online_pass – number of online passes through data, online_pass=0 indicates a default stochastic mode instead of an online mode
- train_idx – subset of indices from X used to learn a model ($w$, $b$)
Returns: $w$, $b$
-
reweighted_l2rda_alg(dfunc, X, Y, λ, ɛ, varɛ, max_iter, tolerance[, online_pass=0, train_idx=[]])¶
Parameters:
- dfunc – supplied loss function derivative (see loss_derivative())
- X – training data (samples are stacked row-wise) represented by Matrix, SparseMatrixCSC or DelimitedFile()
- Y – training labels corresponding to X
- λ – trade-off hyperparameter
- ɛ – reweighting hyperparameter
- varɛ – sparsification hyperparameter
- k – sampling size at each iteration $t$
- max_iter – maximum number of iterations (budget)
- tolerance – early stopping threshold, i.e. $\|w_{t+1} - w_t\| \le$ tolerance
- online_pass – number of online passes through data, online_pass=0 indicates a default stochastic mode instead of an online mode
- train_idx – subset of indices from X used to learn a model ($w$, $b$)
Returns: $w$, $b$
-
stochastic_rk_means(X, rk_means, alg_params, max_iter, tolerance[, online_pass=0, train_idx=[]])¶
Parameters:
- X – training data (samples are stacked row-wise) represented by Matrix, SparseMatrixCSC or DelimitedFile()
- rk_means – algorithm defined by RK_MEANS()
- alg_params – hyperparameters of the supporting algorithm in rk_means.support_alg
- k – sampling size at each iteration $t$
- max_iter – maximum number of iterations (budget)
- tolerance – early stopping threshold, i.e. $\|w_{t+1} - w_t\| \le$ tolerance
- online_pass – number of online passes through data, online_pass=0 indicates a default stochastic mode instead of an online mode
- train_idx – subset of indices from X used to learn the model
Returns: cluster prototype vectors (centroids)
Footnotes
| [1] | adaptation is taken with respect to observed (sub)gradients of the loss function |
| [2] | metric types are defined in Distances.jl package |
Model Tuning¶
This part of the package provides a simple API for model-tuning routines.
-
gen_cross_validate(evalfun, n, model)¶ Performs in parallel a generic cross-validation (CV) routine defined in evalfun with the splitting specified in model.cv_gen.
Parameters:
- evalfun – function to evaluate
- n – total number of data points (instances) used to create a Kfold CV generator if model.cv_gen is undefined (null)
- model – SALSAModel which contains the cv_gen field of type Nullable{CrossValGenerator} [1] or the model.output.cv_folds field containing predefined indices for each fold
Returns: an average of the evalfun evaluations.
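A minimal sketch of supplying a predefined CV generator (here a 5-fold split over a hypothetical 250 samples) through the cv_gen field consumed by the cross-validation routines:
using SALSA, MLBase
model = SALSAModel(LINEAR, PEGASOS(), HINGE,
                   cv_gen = Nullable{CrossValGenerator}(Kfold(250, 5)))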
-
misclass(y, yhat)¶ Calculate the misclassification rate as $\frac{1}{n}\sum_{i=1}^{n} I(y_i \ne \hat{y}_i)$.
-
mse(y, yhat)¶ Calculate the mean squared error as $\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
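A minimal sketch evaluating hypothetical predictions with the two criteria defined above:
using SALSA
y    = [1, -1, 1, 1]
yhat = [1, -1, -1, 1]
misclass(y, yhat)  # --> 0.25, one of four labels differs
mse(y, yhat)       # --> 1.0, mean of the squared differences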
Footnotes
| [1] | wrapper around the type defined in MLBase.jl package |
Nyström Approximation¶
While linear techniques operating in the primal (input) space are able to achieve good generalization capabilities in some specific application areas, one cannot, in general, approximate more complex or highly nonlinear functions with a linear model. We apply the Fixed-Size approach [FS2010] and the Nyström approximation [WS2001] to approximate a kernel-induced feature map with a higher-dimensional explicit and approximate feature vector.
We select prototype vectors (a small working sample of size $m \ll n$) and construct, for instance, an RBF kernel matrix $K \in \mathbb{R}^{m \times m}$ with
$$K_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right).$$
By following the approach in [WS2001], an expression for the entries of the approximated feature map $\hat{\Phi}(x): \mathbb{R}^d \to \mathbb{R}^m$ is given by
$$\hat{\Phi}_i(x) = \frac{1}{\sqrt{\lambda_i}} \sum_{t=1}^{m} u_{ti}\, k(x_t, x),$$
where $\lambda_i$ and $u_i$ denote the i-th eigenvalue and the i-th eigenvector of $K$.
Available API¶
-
AFEm(Xs, kernel, X)¶ Performs Automatic Feature Extraction (AFE) by the Nyström method [WS2001] using a subsample Xs of X. We restrict kernel <: Kernel to be a subclass of Kernel, for instance RBFKernel.
Parameters:
- Xs – subset which is used to construct the kernel matrix $K$
- kernel – kernel function, e.g. RBFKernel(), used to construct the kernel matrix
- X – full dataset
Returns: new dataset derived from stacking together the approximated feature maps $\hat{\Phi}(x_i)$ for every row $x_i$ of X
-
entropy_subset(X, kernel, subset_size)¶ Performs maximization of the quadratic Rényi Entropy by selecting representative points from X, which can be supplied to AFEm as the Xs subset.
Parameters:
- X – full dataset
- kernel – kernel function, e.g. RBFKernel(), used to construct the kernel matrix over which the Rényi Entropy is computed
- subset_size – number of representative data points
Available Kernel Functions¶
-
LinearKernel()¶ Defines an implementation of the Linear Kernel, i.e. $k(x, y) = \langle x, y \rangle$.
-
PolynomialKernel()¶ Defines an implementation of the Polynomial Kernel, i.e. $k(x, y) = (\langle x, y \rangle + \tau)^d$.
-
RBFKernel()¶ Defines an implementation of the Radial Basis Function (RBF) Kernel, i.e. $k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$.
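A hedged sketch of selecting a non-default kernel for the NONLINEAR mode (recall that the kernel field of SALSAModel expects a kernel type, with RBFKernel as the default):
using SALSA
model = SALSAModel(NONLINEAR, PEGASOS(), HINGE, kernel = LinearKernel)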
| [WS2001] | (1, 2, 3) Williams C. and Seeger M., “Using the Nyström method to speed up kernel machines”, in Proceedings of the 14th Annual Conference on Neural Information Processing (NIPS), pp. 682-688, 2001. |
Examples & notebooks¶
Prerequisites¶
Please refer to the Julia downloads page for installing the Julia language and all dependencies. The instructions for installing the SALSA package can be found here. Some additional plotting and data management packages might be required to run the examples below (like Gadfly, MAT or DataFrames). If you prefer Python-style notebooks, please refer to Project Jupyter and the IJulia package for instructions. In this section we provide code snippets which can be easily copied into the Julia console or a Jupyter notebook. Please find an explanation of the examples and functional IJulia notebooks online.
Advanced Classification¶
This example provides a use-case for nonlinear classification using Nyström approximation and Area Under ROC Curve (with 100 thresholds) as a cross-validation criterion.
using SALSA, MAT
ripley = matread(joinpath(Pkg.dir("SALSA"), "data", "ripley.mat")); srand(123);
model = SALSAModel(NONLINEAR, PEGASOS(), LOGISTIC, validation_criterion=AUC(100));
model = salsa(ripley["X"], ripley["Y"], model, ripley["Xt"]);
range1 = linspace(-1.5,1.5,200);
range2 = linspace(-0.5,1.5,200);
grid = [[i j] for i in range1, j in range2];
Xgrid = foldl(vcat, grid);
Xtest = ripley["Xt"];
yhat = model.output.Ytest;
yplot = map_predict_latent(model,Xgrid);
yplot = yplot - minimum(yplot);
yplot = 2*(yplot ./ maximum(yplot)) - 1;
using DataFrames
df = DataFrame();
df[:X] = Xgrid[:,1][:];
df[:Y] = Xgrid[:,2][:];
df[:class] = yplot[:];
using Gadfly
set_default_plot_size(20cm, 20cm);
plot(layer(x=Xtest[yhat.>0,1], y=Xtest[yhat.>0,2], Geom.point, Theme(default_color=colorant"orange")),
layer(x=Xtest[yhat.<0,1], y=Xtest[yhat.<0,2], Geom.point, Theme(default_color=colorant"black")),
layer(df, x="X", y="Y", color="class", Geom.rectbin))
Advanced Regression¶
This example provides a use-case for regression using the Nyström approximation and mse() (Mean Squared Error) as a criterion in the Leave-One-Out cross-validation defined in the MLBase.jl package.
using SALSA, MLBase
sinc(x) = sin(x)./x;
X = linspace(0.1,20,100)'';
Xtest = linspace(0.1,20,200)'';
Y = sinc(X);
srand(1234);
model = SALSAModel(NONLINEAR, PEGASOS(), LEAST_SQUARES,
cv_gen=Nullable{CrossValGenerator}(LOOCV(100)),
validation_criterion=MSE(), process_labels=false, subset_size=5.0);
model = salsa(X, Y, model, Xtest);
using Gadfly
set_default_plot_size(20cm, 20cm);
plot(layer(x=Xtest[:], y=sinc(Xtest), Geom.point),
layer(x=Xtest[:], y=model.output.Ytest, Geom.line, Theme(default_color=colorant"orange")))