Welcome to Orange3-Recommendation!

Orange3-Recommendation is a Python library that extends Orange3 to include support for recommender systems. The code is open source and available on GitHub.

Installation

Orange3-Recommendation has a couple of prerequisites that need to be installed first, but once these are met, the rest of the requirements are handled automatically by the installer.

Prerequisites

Python3 + pip

Orange3-Recommendation currently requires Python 3 to run. (Note: the algorithms have been designed using Numpy, Scipy and Scikit-learn, so in principle they could work with Python 2.7; however, due to dependencies related to Orange3 and its data.Table class, Python 3 must be used.)

Numpy, Scikit-learn and Orange3

The required dependencies to build the software are Numpy >= 1.9.0, Scikit-Learn >= 0.16 and Orange3.

This is handled automatically by the installer, so you don’t need to install anything else.

Install

This package uses distutils, which is the default way of installing Python modules. To install into your home directory, use:

python setup.py install --user

To install for all users on Unix/Linux:

python setup.py build
sudo python setup.py install

For development mode use:

python setup.py develop

Widget usage

After the installation, the widgets from this add-on are registered with Orange. To run Orange from the terminal use:

python3 -m Orange.canvas

The new widgets appear in the toolbox bar under the Recommendation section.

Tutorial

For a more visual tutorial, go to Biolab’s blog.

Input data

This section describes how to load the data in Orange3-Recommendation.

Data format

Orange can read files in its native tab-delimited format, or can load data from any of the major standard spreadsheet file types, like CSV and Excel. The native format starts with a header row with feature (column) names. The second header row gives the attribute type, which can be continuous, discrete, string or time. The third header line contains meta information to identify dependent features (class), irrelevant features (ignore) or meta features (meta). Here are the first few lines of the data set ratings.tab:

tid      user        movie       score
string   discrete    discrete    continuous
meta     row=1       col=1       class
1        Breza       HarrySally  2
2        Dana        Cvetje      5
3        Cene        Prometheus  5
4        Ksenija     HarrySally  4
5        Albert      Matrix      4
...

The third row is mandatory for this kind of dataset, so that the loader knows which attributes correspond to the users (row=1) and which to the items (col=1). For big datasets, users and items must be specified as continuous attributes for efficiency reasons. Here are the first few lines of the data set MovieLens100K.tab:

user            movie         score         tid
continuous      continuous    continuous    time
row=1           col=1         class         meta
196             242           3             881250949
186             302           3             891717742
22              377           1             878887116
244             51            2             880606923
166             346           1             886397596
298             474           4             884182806
...
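
If you need to check programmatically which columns were flagged as users and items, the key=value flags from the third header row are, to the best of our knowledge, kept in each variable's attributes dictionary after loading. The snippet below is a small sketch under that assumption (it is not part of the library's documented API).

import Orange

data = Orange.data.Table("movielens100k.tab")

# The row=1/col=1 flags are assumed to end up in variable.attributes,
# e.g. {'row': '1'} for the user column and {'col': '1'} for the item column.
for var in data.domain.attributes:
    flags = getattr(var, "attributes", {})
    if "row" in flags:
        print("user column:", var.name)
    if "col" in flags:
        print("item column:", var.name)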

Loading data

Datasets can be loaded as follows:

import Orange
data = Orange.data.Table("ratings.tab")

Several toy datasets are included in the add-on: ratings.tab, movielens100k.tab, binary_data.tab, epinions_train.tab, epinions_test.tab, and a few more.

Getting started

Rating pairs (user, item)

Let’s presume that we want to load a dataset, train a model on it and predict the ratings for its first three (user, item) pairs:

 import Orange
 from orangecontrib.recommendation import BRISMFLearner

 # Load data and train the model
 data = Orange.data.Table('movielens100k.tab')
 learner = BRISMFLearner(num_factors=15, num_iter=25, learning_rate=0.07, lmbda=0.1)
 recommender = learner(data)

 # Make predictions
 prediction = recommender(data[:3])
 print(prediction)
 >>>
 [ 3.79505151  3.75096513  1.293013 ]

The first three lines of code import the Orange module and the BRISMF factorization model, and load the MovieLens100K dataset. In the next lines we instantiate the model (learner = BRISMFLearner(...)) and fit it with the loaded data.

Finally, we predict the ratings for the first three pairs (user, item) in the loaded dataset.
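
To see which (user, item) pair each prediction refers to, you can read the user and item columns of the same rows directly from the table. A minimal sketch, reusing data and recommender from the example above and assuming the column names of movielens100k.tab shown earlier:

# Pair each predicted rating with the user and movie ids of its row.
for row, value in zip(data[:3], recommender(data[:3])):
    print("user=%s  movie=%s  predicted=%.3f" % (row["user"], row["movie"], value))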

Recommend items for a set of users

Now we want to get all the predictions (all items) for a set of users:

import numpy as np
indices_users = np.array([4, 12, 36])
prediction = recommender.predict_items(indices_users)
print(prediction)
>>>
[[ 1.34743879  4.61513578  3.90757263 ...,  3.03535099  4.08221699 4.26139511]
 [ 1.16652757  4.5516808   3.9867497  ...,  2.94690548  3.67274108 4.1868596 ]
 [ 2.74395768  4.04859096  4.04553826 ...,  3.22923456  3.69682699 4.95043435]]

This time, we fill an array with the indices of the users for whom we want predictions over all the items.

If we only want the first k elements as output (not to be confused with the k best items), we add the parameter top=INTEGER to the call:

prediction = recommender.predict_items(indices_users, top=2)
print(prediction)
>>>
[[ 1.34743879  4.61513578]
 [ 1.16652757  4.5516808]
 [ 2.74395768  4.04859096]]
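
If what you actually want are the N best-scored items for each user, you can sort the full prediction matrix yourself. A short numpy sketch, reusing recommender and indices_users from the example above:

import numpy as np

indices_users = np.array([4, 12, 36])
scores = recommender.predict_items(indices_users)  # shape: (n_users, n_items)

# Column indices of the 5 highest-scored items per user, best first.
best_items = np.argsort(-scores, axis=1)[:, :5]
print(best_items)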

Evaluation

Finally, we want to know which of a list of recommenders performs better on our dataset. Therefore, we perform cross-validation over a list of learners.

The first thing we need to do is to make a list of all the learners that we want to cross-validate.

from orangecontrib.recommendation import (GlobalAvgLearner,
                                          ItemAvgLearner,
                                          UserAvgLearner,
                                          UserItemBaselineLearner)
global_avg = GlobalAvgLearner()
items_avg = ItemAvgLearner()
users_avg = UserAvgLearner()
useritem_baseline = UserItemBaselineLearner()
brismf = BRISMFLearner(num_factors=15, num_iter=25, learning_rate=0.07, lmbda=0.1)
learners = [global_avg, items_avg, users_avg, useritem_baseline, brismf]

Once we have the list of learners and the data loaded, we score the methods. In this case we score the recommendations with two goodness-of-fit measures, which are printed afterwards. To measure the error of the scoring, you can use any of the functions defined in Orange.evaluation.

res = Orange.evaluation.CrossValidation(data, learners, k=5)
rmse = Orange.evaluation.RMSE(res)
r2 = Orange.evaluation.R2(res)

print("Learner  RMSE  R2")
for i in range(len(learners)):
    print("{:8s} {:.2f} {:5.2f}".format(learners[i].name, rmse[i], r2[i]))
>>>
Learner                   RMSE  R2
  - Global average        1.13 -0.00
  - Item average          1.03  0.16
  - User average          1.04  0.14
  - User-Item Baseline    0.98  0.25
  - BRISMF                0.96  0.28
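
Any other scorer defined in Orange.evaluation can be applied to the same results object. For instance, a sketch that also reports the mean absolute error (MAE), reusing res, rmse, r2 and learners from above:

mae = Orange.evaluation.MAE(res)

print("Learner  RMSE  MAE  R2")
for i in range(len(learners)):
    print("{:8s} {:.2f} {:.2f} {:5.2f}".format(learners[i].name, rmse[i], mae[i], r2[i]))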

Frequently Asked Questions

Do I need to know how to program?

Not at all. This library can be installed in Orange3 in such a way that you only need to drag and drop widgets to build your pipeline.

_images/example_latent_factor_models.png

Why is there no widget for the ranking models?

Short answer: Currently Orange3 does not support ranking.

Long answer: This problem is related to how Orange3 works internally. For a given sample X, it expects a single value Y to be returned. The reason behind this is related to “safety”, as most regression and classification models return just one value.

In ranking problems, multiple results are returned. Therefore, Orange3 treats the output as the output of a classification, returning the maximum value in the sequence.

Is the library prepared for big data?

Not really. From the very beginning we have focused on building something easy to use, oriented mostly towards educational purposes and research.

This doesn’t mean that you cannot run big datasets. For instance, you can train BRISMF with the Netflix dataset in 30-40 min. But if you plan to do so, we recommend using other alternatives that are highly optimized for that purpose.

Why are the algorithms not implemented in C/C++?

I refer back to the answer above. We want to speed up the code as much as we can while keeping its readability and flexibility at the highest possible level, as well as having as few dependencies as possible.

Therefore, in order to achieve this, we try to cache as many accesses and operations as we can (keeping the space cost in mind), and we also try to vectorize everything we can.

Why don’t you use Cython or Numba?

As said before, readability and flexibility are a top priority. Cython is not as simple to read as Numpy vectorized operations, and Numba can present dependency problems on some computers.

Can I contribute to the library?

Yes, please! Indeed, you don’t have to worry about the widgets or the documentation if you don’t want to. The only requirement for adding a new model is that it passes all the tests.

Fork and contribute as much as you want! https://github.com/biolab/orange3-recommendation

Baselines

_images/user-item-baseline.svg

This widget includes four basic baseline models: Global average, User average, Item average and User-Item baseline.

Signals

Inputs:

  • Data

    Data set.

  • Preprocessor

    Preprocessed data.

Outputs:

  • Learner

    The selected learner in the widget.

  • Predictor

    Trained recommender. Signal Predictor sends the output signal only if input Data is present.

Description

  • Global average:

Computes the average of all ratings and uses it to make predictions.

  • User average:

    Takes the average rating value of a user to make predictions.

  • Item average:

    Takes the average rating value of an item to make predictions.

  • User-Item baseline:

    Takes the bias of users and items plus the global average to make predictions.

Example

Below is a simple workflow showing how to use both the Predictor and the Learner output. For the Predictor we input the prediction model into Predictions widget and view the results in Data Table. For Learner we can compare different learners in Test&Score widget.

_images/example_baselines.png

BRISMF

_images/brismf.svg

Matrix factorization with explicit ratings, learning is performed by stochastic gradient descent.

Signals

Inputs:

  • Data

    Data set.

  • Preprocessor

    Preprocessed data.

Outputs:

  • Learner

The learning algorithm with the supplied parameters.

  • Predictor

    Trained recommender. Signal Predictor sends the output signal only if input Data is present.

  • P

    Latent features of the users

  • Q

    Latent features of the items

Description

The BRISMF widget uses a biased regularized algorithm to factorize a matrix into two low-rank matrices, as explained in Y. Koren, R. Bell, C. Volinsky, Matrix Factorization Techniques for Recommender Systems. IEEE Computer Society, 2009.

Example

Below is a simple workflow showing how to use both the Predictor and the Learner output. For the Predictor we input the prediction model into Predictions widget and view the results in Data Table. For Learner we can compare different learners in Test&Score widget.

_images/example_latent_factor_models1.png

SVD++

_images/svdplusplus.svg

Matrix factorization model which makes use of implicit feedback information.

Signals

Inputs:

  • Data

    Data set.

  • Preprocessor

    Preprocessed data.

  • Feedback information

    Implicit feedback information. Optional, if None (default), it will be inferred from the ratings.

Outputs:

  • Learner

    The learning algorithm with the supplied parameters.

  • Predictor

    Trained recommender. Signal Predictor sends the output signal only if input Data is present.

  • P

    Latent features of the users.

  • Q

    Latent features of the items.

  • Y

    Latent features of the implicit information.

Description

The SVD++ widget uses a biased regularized algorithm which makes use of implicit feedback information to factorize a matrix into three low-rank matrices, as explained in Y. Koren, Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model.

Example

Below is a simple workflow showing how to use both the Predictor and the Learner output. For the Predictor we input the prediction model into Predictions widget and view the results in Data Table. For Learner we can compare different learners in Test&Score widget.

_images/example_latent_factor_models1.png

TrustSVD

_images/trustsvd.svg

Trust-based matrix factorization, which extends SVD++ with trust information.

Signals

Inputs:

  • Data

    Data set.

  • Preprocessor

    Preprocessed data.

  • Trust information

Trust information. The weights of the connections can be integer or float (binary relations can be represented by 0 or 1).

Outputs:

  • Learner

    The learning algorithm with the supplied parameters.

  • Predictor

    Trained recommender. Signal Predictor sends the output signal only if input Data is present.

  • P

    Latent features of the users.

  • Q

    Latent features of the items.

  • Y

    Latent features of the implicit information.

  • W

    Latent features of the trust information.

Description

The TrustSVD widget uses a biased regularized algorithm which makes use of implicit feedback and trust information to factorize a matrix into four low-rank matrices, as explained in Guibing Guo, Jie Zhang, Neil Yorke-Smith, TrustSVD: Collaborative Filtering with Both the Explicit and Implicit Influence of User Trust and of Item Ratings.

Example

Below is a simple workflow showing how to use both the Predictor and the Learner output. For the Predictor we input the prediction model into Predictions widget and view the results in Data Table. For Learner we can compare different learners in Test&Score widget.

_images/example_latent_factor_models1.png

Baselines (recommendation)

Global Average

Global Average uses the average rating value of all ratings to make predictions.

\[\hat { r }_{ ui } = \mu\]

Example

 import Orange
 from orangecontrib.recommendation import GlobalAvgLearner

 # Load data and train the model
 data = Orange.data.Table('movielens100k.tab')
 learner = GlobalAvgLearner()
 recommender = learner(data)

 prediction = recommender(data[:3])
 print(prediction)
 >>>
 [ 3.52986  3.52986  3.52986]
class orangecontrib.recommendation.GlobalAvgLearner(preprocessors=None, verbose=False)[source]

Global Average

This model takes the average rating value of all ratings to make predictions.

Attributes:
verbose: boolean or int, optional
Prints information about the process according to the verbosity level. Values: False (verbose=0), True (verbose=1) and INTEGER
fit_storage(data)[source]

Fit the model according to the given training data.

Args:
data: Orange.data.Table
Returns:
self: object
Returns self.

User Average

User Average uses the average rating value of a user to make predictions.

\[\hat { r }_{ ui } = \mu_{u}\]

Example

import Orange
from orangecontrib.recommendation import UserAvgLearner

# Load data and train the model
data = Orange.data.Table('movielens100k.tab')
learner = UserAvgLearner()
recommender = learner(data)

# Make predictions
prediction = recommender(data[:3])
print(prediction)
>>>
[ 3.61538462  3.41304348  3.3515625 ]
class orangecontrib.recommendation.UserAvgLearner(preprocessors=None, verbose=False)[source]

User average

This model takes the average rating value of a user to make predictions.

Attributes:
verbose: boolean or int, optional
Prints information about the process according to the verbosity level. Values: False (verbose=0), True (verbose=1) and INTEGER
fit_storage(data)[source]

Fit the model according to the given training data.

Args:
data: Orange.data.Table
Returns:
self: object
Returns self.

Item Average

Item Average uses the average rating value of an item to make predictions.

\[\hat { r }_{ ui } = \mu_{i}\]

Example

import Orange
from orangecontrib.recommendation import ItemAvgLearner

# Load data and train the model
data = Orange.data.Table('movielens100k.tab')
learner = ItemAvgLearner()
recommender = learner(data)

# Make predictions
prediction = recommender(data[:3])
print(prediction)
>>>
[ 3.99145299  4.16161616  2.15384615]
class orangecontrib.recommendation.ItemAvgLearner(preprocessors=None, verbose=False)[source]

Item average

This model takes the average rating value of an item to make predictions.

Attributes:
verbose: boolean or int, optional
Prints information about the process according to the verbosity level. Values: False (verbose=0), True (verbose=1) and INTEGER
fit_storage(data)[source]

Fit the model according to the given training data.

Args:
data: Orange.data.Table
Returns:
self: object
Returns self.

User-Item Baseline

User-Item Baseline takes the bias of users and items plus the global average to make predictions.

\[\hat { r }_{ ui } = \mu + b_{u} + b_{i}\]
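
These terms can be computed directly from a small ratings matrix: \(\mu\) is the global mean, \(b_u\) the difference between a user’s mean rating and \(\mu\), and \(b_i\) the difference between an item’s mean rating and \(\mu\) (the library’s exact bias estimates may differ slightly). A toy numpy sketch, independent of the library:

import numpy as np

# Toy dense ratings matrix: rows = users, columns = items (0 = unrated).
R = np.array([[5, 3, 0],
              [4, 0, 1],
              [0, 2, 5]], dtype=float)
rated = R > 0

mu = R[rated].mean()                                         # global average
b_u = np.array([R[u, rated[u]].mean() - mu for u in range(R.shape[0])])
b_i = np.array([R[rated[:, i], i].mean() - mu for i in range(R.shape[1])])

# User-Item Baseline prediction for user 0 on item 2:
u, i = 0, 2
print(mu + b_u[u] + b_i[i])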

Example

 import Orange
 from orangecontrib.recommendation import UserItemBaselineLearner

 # Load data and train the model
 data = Orange.data.Table('movielens100k.tab')
 learner = UserItemBaselineLearner()
 recommender = learner(data)

 # Make predictions
 prediction = recommender(data[:3])
 print(prediction)
 >>>
 [ 4.07697761  4.04479964  1.97554865]
class orangecontrib.recommendation.UserItemBaselineLearner(preprocessors=None, verbose=False)[source]

User-Item baseline

This model takes the bias of users and items plus the global average to make predictions.

Attributes:
verbose: boolean or int, optional
Prints information about the process according to the verbosity level. Values: False (verbose=0), True (verbose=1) and INTEGER
fit_storage(data)[source]

Fit the model according to the given training data.

Args:
data: Orange.data.Table
Returns:
self: object
Returns self.

Rating (recommendation)

BRISMF

BRISMF (Biased Regularized Incremental Simultaneous Matrix Factorization) is a factorization-based algorithm for large-scale recommendation systems.

The basic idea is to factorize a very sparse matrix into two low-rank matrices which represent user and item factors. This can be done by using an iterative approach to minimize a loss function.

A user’s predictions are defined as follows:

\[\hat { r }_{ ui }=\mu +b_{ u }+b_{ i }+{ q }_{ i }^{ T }{ p }_{ u }\]

We learn the values of involved parameters by minimizing the regularized squared error function associated with:

\[\min _{ p*,q*,b* }{ \sum _{ (u,i\in k) }^{ }{ { ({ r }_{ ui }-\mu -b_{ u }-b_{ i }-{ q }_{ i }^{ T }{ p }_{ u }) }^{ 2 }+\lambda ({ { { b }_{ u } }^{ 2 }+{ { b }_{ i } }^{ 2 }+\left\| { p }_{ u } \right\| }^{ 2 }+{ \left\| q_{ i } \right\| }^{ 2 }) } }\]
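
To make the notation concrete, the predicted rating is just the dot product between one row of the user-factor matrix and one row of the item-factor matrix, plus the biases. A toy numpy sketch of \(\hat{r}_{ui}\), independent of the library:

import numpy as np

rng = np.random.RandomState(0)
num_users, num_items, num_factors = 4, 6, 3

# Toy parameters: global mean, user/item biases and latent factor matrices.
mu = 3.5
b_u = rng.normal(0, 0.1, num_users)
b_i = rng.normal(0, 0.1, num_items)
P = rng.normal(0, 0.1, (num_users, num_factors))  # user factors p_u
Q = rng.normal(0, 0.1, (num_items, num_factors))  # item factors q_i

u, i = 1, 4
r_hat = mu + b_u[u] + b_i[i] + Q[i].dot(P[u])
print(r_hat)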

Example

import Orange
from orangecontrib.recommendation import BRISMFLearner

# Load data and train the model
data = Orange.data.Table('movielens100k.tab')
learner = BRISMFLearner(num_factors=15, num_iter=25, learning_rate=0.07, lmbda=0.1)
recommender = learner(data)

# Make predictions
prediction = recommender(data[:3])
print(prediction)
>>>
[ 3.79505151  3.75096513  1.293013 ]
class orangecontrib.recommendation.BRISMFLearner(num_factors=5, num_iter=25, learning_rate=0.07, bias_learning_rate=None, lmbda=0.1, bias_lmbda=None, min_rating=None, max_rating=None, optimizer=None, preprocessors=None, verbose=False, random_state=None, callback=None)[source]

BRISMF: Biased Regularized Incremental Simultaneous Matrix Factorization

This model uses stochastic gradient descent to find two low-rank matrices: user-feature matrix and item-feature matrix.

Attributes:
num_factors: int, optional
The number of latent factors.
num_iter: int, optional
The number of passes over the training data (aka epochs).
learning_rate: float, optional
The learning rate controlling the size of update steps (general).
bias_learning_rate: float, optional
The learning rate controlling the size of the bias update steps. If None (default), bias_learning_rate = learning_rate
lmbda: float, optional
Controls the importance of the regularization term (general). Avoids overfitting by penalizing the magnitudes of the parameters.
bias_lmbda: float, optional
Controls the importance of the bias regularization term. If None (default), bias_lmbda = lmbda
min_rating: float, optional
Defines the lower bound for the predictions. If None (default), ratings won’t be bounded.
max_rating: float, optional
Defines the upper bound for the predictions. If None (default), ratings won’t be bounded.
optimizer: Optimizer, optional
Set the optimizer for SGD. If None (default), classical SGD will be applied.
verbose: boolean or int, optional
Prints information about the process according to the verbosity level. Values: False (verbose=0), True (verbose=1) and INTEGER
random_state: int, optional
Set the seed for the numpy random generator to make the random numbers predictable. This is a debugging feature.
callback: callable
Method that receives the current iteration as an argument.
fit_storage(data)[source]

Fit the model according to the given training data.

Args:
data: Orange.data.Table
Returns:
self: object
Returns self.

SVD++

SVD++ is a matrix factorization model which makes use of implicit feedback information.

A user’s predictions are defined as follows:

\[\hat { r }_{ ui } = \mu + b_u + b_i + \left(p_u + \frac{1}{\sqrt{|N(u)|}}\sum_{j\in N(u)} y_j \right)^T q_i\]

We learn the values of involved parameters by minimizing the regularized squared error function associated with:

\[\begin{split}\begin{split} \min _{ p*,q*,y*,b* }&{\sum _{ (u,i\in k) }{ { ({ r }_{ ui }-\mu -b_{ u }-b_{ i }-{ q }_{ i }^{ T }\left( p_{ u }+\frac { 1 }{ \sqrt { |N(u)| } } \sum _{ j\in N(u) } y_{ j } \right) ) }^{ 2 }}} \\ &+\lambda ({ { { b }_{ u } }^{ 2 }+{ { b }_{ i } }^{ 2 }+\left\| { p }_{ u } \right\| }^{ 2 }+{ \left\| q_{ i } \right\| }^{ 2 }+\sum _{ j\in N(u) }{ \left\| y_{ j } \right\| } ^{ 2 }) \end{split}\end{split}\]
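
Compared to BRISMF, the only change in the prediction is the implicit-feedback term: the user factor \(p_u\) is augmented with the normalized sum of the \(y_j\) vectors of the items the user has interacted with. A toy numpy sketch, independent of the library:

import numpy as np

rng = np.random.RandomState(0)
num_users, num_items, num_factors = 4, 6, 3

mu = 3.5
b_u = rng.normal(0, 0.1, num_users)
b_i = rng.normal(0, 0.1, num_items)
P = rng.normal(0, 0.1, (num_users, num_factors))  # user factors p_u
Q = rng.normal(0, 0.1, (num_items, num_factors))  # item factors q_i
Y = rng.normal(0, 0.1, (num_items, num_factors))  # implicit feedback factors y_j

u, i = 1, 4
N_u = [0, 2, 4]                                   # items user u has interacted with
implicit = Y[N_u].sum(axis=0) / np.sqrt(len(N_u))
r_hat = mu + b_u[u] + b_i[i] + Q[i].dot(P[u] + implicit)
print(r_hat)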

Example

import Orange
from orangecontrib.recommendation import SVDPlusPlusLearner

# Load data and train the model
data = Orange.data.Table('movielens100k.tab')
learner = SVDPlusPlusLearner(num_factors=15, num_iter=25, learning_rate=0.07, lmbda=0.1)
recommender = learner(data)

# Make predictions
prediction = recommender(data[:3])
print(prediction)
class orangecontrib.recommendation.SVDPlusPlusLearner(num_factors=5, num_iter=25, learning_rate=0.01, bias_learning_rate=None, lmbda=0.1, bias_lmbda=None, min_rating=None, max_rating=None, feedback=None, optimizer=None, preprocessors=None, verbose=False, random_state=None, callback=None)[source]

SVD++ matrix factorization

This model uses stochastic gradient descent to find three low-rank matrices: user-feature matrix, item-feature matrix and feedback-feature matrix.

Attributes:
num_factors: int, optional
The number of latent factors.
num_iter: int, optional
The number of passes over the training data (aka epochs).
learning_rate: float, optional
The learning rate controlling the size of update steps (general).
bias_learning_rate: float, optional
The learning rate controlling the size of the bias update steps. If None (default), bias_learning_rate = learning_rate
lmbda: float, optional
Controls the importance of the regularization term (general). Avoids overfitting by penalizing the magnitudes of the parameters.
bias_lmbda: float, optional
Controls the importance of the bias regularization term. If None (default), bias_lmbda = lmbda
min_rating: float, optional
Defines the lower bound for the predictions. If None (default), ratings won’t be bounded.
max_rating: float, optional
Defines the upper bound for the predictions. If None (default), ratings won’t be bounded.
feedback: Orange.data.Table
Implicit feedback information. If None (default), implicit information will be inferred from the ratings (e.g.: item rated, means items seen).
optimizer: Optimizer, optional
Set the optimizer for SGD. If None (default), classical SGD will be applied.
verbose: boolean or int, optional
Prints information about the process according to the verbosity level. Values: False (verbose=0), True (verbose=1) and INTEGER
random_state: int, optional
Set the seed for the numpy random generator to make the random numbers predictable. This is a debugging feature.
callback: callable
Method that receives the current iteration as an argument.
fit_storage(data)[source]

Fit the model according to the given training data.

Args:
data: Orange.data.Table
Returns:
self: object
Returns self.

TrustSVD

TrustSVD is a trust-based matrix factorization model which extends SVD++ with trust information.

A user’s predictions are defined as follows:

\[\hat { r }_{ ui }=\mu +b_{ u }+b_{ i }+{ q_{ i } }^{ \top }\left( p_{ u }+{ \left| { I }_{ u } \right| }^{ -\frac { 1 }{ 2 } }\sum _{ i\in { I }_{ u } } y_{ i }+{ \left| { T }_{ u } \right| }^{ -\frac { 1 }{ 2 } }\sum _{ v\in { T }_{ u } } w_{ v } \right)\]

We learn the values of involved parameters by minimizing the regularized squared error function associated with:

\[\begin{split}\begin{split} \mathcal{L} &=\frac { 1 }{ 2 } \sum _{ u }{ \sum _{ j\in { I }_{ u } }{ { ({ \hat { r } }_{ u,j } -{ r }_{ u,j }) }^{ 2 } } } + \frac { { \lambda }_{ t } }{ 2 } \sum _{ u }{ \sum _{ v\in { T }_{ u } }{ { ( { \hat { t } }_{ u,v } -{ t }_{ u,v }) }^{ 2 } } } \\ &+\frac { { \lambda } }{ 2 } \sum _{ u }^{ }{ { \left| { I }_{ u } \right| }^{ -\frac { 1 }{ 2 } }{ b }_{ u }^{ 2 } } +\frac { { \lambda } }{ 2 } \sum _{ j }{ { \left| { U }_{ j } \right| }^{ -\frac { 1 }{ 2 } }{ b }_{ j }^{ 2 } } \\ &+\sum _{ u }^{ }{ (\frac { { \lambda } }{ 2 } { \left| { I }_{ u } \right| }^{ -\frac { 1 }{ 2 } }+\frac { { \lambda }_{ t } }{ 2 } { \left| { T }_{ u } \right| }^{ -\frac { 1 }{ 2 } }{ )\left\| { p }_{ u } \right\| }_{ F }^{ 2 } } \\ &+\frac { { \lambda } }{ 2 } \sum _{ j }{ { \left| { U }_{ j } \right| }^{ -\frac { 1 }{ 2 } }{ \left\| { q }_{ j } \right\| }_{ F }^{ 2 } } +\frac { { \lambda } }{ 2 } \sum _{ i }{ { \left| { U }_{ i } \right| }^{ -\frac { 1 }{ 2 } }{ \left\| { y }_{ i } \right\| }_{ F }^{ 2 } } \\ &+\frac { { \lambda } }{ 2 } { \left| { T }_{ v }^{ + } \right| }^{ -\frac { 1 }{ 2 } }{ \left\| { w }_{ v } \right\| }_{ F }^{ 2 } \end{split}\end{split}\]
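
In the prediction, the only addition with respect to SVD++ is the trust term: the normalized sum of the \(w_v\) vectors of the users trusted by \(u\). A compact numpy sketch of \(\hat{r}_{ui}\), independent of the library:

import numpy as np

rng = np.random.RandomState(0)
num_users, num_items, num_factors = 5, 6, 3

mu, b_u, b_i = 3.5, rng.normal(0, 0.1, num_users), rng.normal(0, 0.1, num_items)
P = rng.normal(0, 0.1, (num_users, num_factors))  # user factors p_u
Q = rng.normal(0, 0.1, (num_items, num_factors))  # item factors q_i
Y = rng.normal(0, 0.1, (num_items, num_factors))  # implicit feedback factors y_i
W = rng.normal(0, 0.1, (num_users, num_factors))  # trustee factors w_v

u, i = 1, 4
I_u = [0, 2, 4]                                   # items rated by user u
T_u = [0, 3]                                      # users trusted by user u
implicit = Y[I_u].sum(axis=0) / np.sqrt(len(I_u))
trust = W[T_u].sum(axis=0) / np.sqrt(len(T_u))
print(mu + b_u[u] + b_i[i] + Q[i].dot(P[u] + implicit + trust))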

Example

 import Orange
 from orangecontrib.recommendation import TrustSVDLearner

 # Load data and train the model
 ratings = Orange.data.Table('filmtrust/ratings.tab')
 trust = Orange.data.Table('filmtrust/trust.tab')
 learner = TrustSVDLearner(num_factors=15, num_iter=25, learning_rate=0.07,
                           lmbda=0.1, social_lmbda=0.05, trust=trust)
 recommender = learner(ratings)

 # Make predictions
 prediction = recommender(ratings[:3])
 print(prediction)
class orangecontrib.recommendation.TrustSVDLearner(num_factors=5, num_iter=25, learning_rate=0.07, bias_learning_rate=None, lmbda=0.1, bias_lmbda=None, social_lmbda=0.05, min_rating=None, max_rating=None, trust=None, optimizer=None, preprocessors=None, verbose=False, random_state=None, callback=None)[source]

Trust-based matrix factorization

This model uses stochastic gradient descent to find four low-rank matrices: user-feature matrix, item-feature matrix, feedback-feature matrix and trustee-feature matrix.

Attributes:
num_factors: int, optional
The number of latent factors.
num_iter: int, optional
The number of passes over the training data (aka epochs).
learning_rate: float, optional
The learning rate controlling the size of update steps (general).
bias_learning_rate: float, optional
The learning rate controlling the size of the bias update steps. If None (default), bias_learning_rate = learning_rate
lmbda: float, optional
Controls the importance of the regularization term (general). Avoids overfitting by penalizing the magnitudes of the parameters.
bias_lmbda: float, optional
Controls the importance of the bias regularization term. If None (default), bias_lmbda = lmbda
social_lmbda: float, optional
Controls the importance of the trust regularization term.
min_rating: float, optional
Defines the lower bound for the predictions. If None (default), ratings won’t be bounded.
max_rating: float, optional
Defines the upper bound for the predictions. If None (default), ratings won’t be bounded.
feedback: Orange.data.Table
Implicit feedback information. If None (default), implicit information will be inferred from the ratings (e.g.: item rated, means items seen).
trust: Orange.data.Table
Social trust information.
optimizer: Optimizer, optional
Set the optimizer for SGD. If None (default), classical SGD will be applied.
verbose: boolean or int, optional
Prints information about the process according to the verbosity level. Values: False (verbose=0), True (verbose=1) and INTEGER
random_state: int seed, optional
Set the seed for the numpy random generator to make the random numbers predictable. This is a debugging feature.
callback: callable
Method that receives the current iteration as an argument.
fit_storage(data)[source]

Fit the model according to the given training data.

Args:
data: Orange.data.Table
Returns:
self: object
Returns self.

Ranking (recommendation)

CLiMF

CLiMF (Collaborative Less-is-More Filtering) is used in scenarios with binary relevance data. Hence, it’s focused on improving top-k recommendations through ranking by directly maximizing the Mean Reciprocal Rank (MRR).

Following a technique similar to other iterative approaches, the two low-rank matrices can be randomly initialized and then optimized through a training loss like this:

\[F(U,V) = \sum _{ i=1 }^{ M }{ \sum _{ j=1 }^{ N }{ { Y }_{ ij }[\ln{\quad g({ U }_{ i }^{ T }V_{ j })} +\sum _{ k=1 }^{ N }{ \ln { (1-{ Y }_{ ik }g({ U }_{ i }^{ T }V_{ k }-{ U }_{ i }^{ T }V_{ j })) } } ] } } -\frac { \lambda }{ 2 } ({ \left\| U \right\| }^{ 2 }+{ \left\| V \right\| }^{ 2 })\]

Note: Orange3 currently does not support ranking operations. Therefore, this model can be used neither in cross-validation nor in the prediction module available in Orange3.
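
For reference, the Mean Reciprocal Rank is the average over users of 1/rank of the first relevant item in that user’s ranked list. A small numpy sketch of the metric itself, independent of the library’s compute_mrr:

import numpy as np

def mean_reciprocal_rank(scores, relevant):
    # scores: (n_users, n_items) predicted scores
    # relevant: (n_users, n_items) binary relevance matrix
    ranking = np.argsort(-scores, axis=1)          # items sorted best-first per user
    rr = []
    for u in range(scores.shape[0]):
        hits = np.flatnonzero(relevant[u, ranking[u]])
        rr.append(1.0 / (hits[0] + 1) if len(hits) else 0.0)
    return np.mean(rr)

scores = np.array([[0.2, 0.9, 0.5],
                   [0.8, 0.1, 0.3]])
relevant = np.array([[0, 0, 1],
                     [1, 0, 0]])
print(mean_reciprocal_rank(scores, relevant))      # (1/2 + 1/1) / 2 = 0.75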

Example

 import Orange
 import numpy as np
 from orangecontrib.recommendation import CLiMFLearner

 # Load data
 data = Orange.data.Table('epinions_train.tab')

 # Train recommender
 learner = CLiMFLearner(num_factors=10, num_iter=10, learning_rate=0.0001, lmbda=0.001)
 recommender = learner(data)

 # Load test dataset
 testdata = Orange.data.Table('epinions_test.tab')

 # Sample users
 num_users = len(recommender.U)
 num_samples = min(num_users, 1000)  # max. number to sample
 users_sampled = np.random.choice(np.arange(num_users), num_samples)

 # Compute Mean Reciprocal Rank (MRR)
 mrr, _ = recommender.compute_mrr(data=testdata, users=users_sampled)
 print('MRR: %.4f' % mrr)
 >>>
 MRR: 0.3975
class orangecontrib.recommendation.CLiMFLearner(num_factors=5, num_iter=25, learning_rate=0.0001, lmbda=0.001, preprocessors=None, optimizer=None, verbose=False, random_state=None, callback=None)[source]

CLiMF: Collaborative Less-is-More Filtering Matrix Factorization

This model uses stochastic gradient descent to find two low-rank matrices: user-feature matrix and item-feature matrix.

CLiMF is a matrix factorization for scenarios with binary relevance data when only a few (k) items are recommended to individual users. It improves top-k recommendations through ranking by directly maximizing the Mean Reciprocal Rank (MRR).

Attributes:
num_factors: int, optional
The number of latent factors.
num_iter: int, optional
The number of passes over the training data (aka epochs).
learning_rate: float, optional
The learning rate controlling the size of update steps (general).
lmbda: float, optional
Controls the importance of the regularization term (general). Avoids overfitting by penalizing the magnitudes of the parameters.
optimizer: Optimizer, optional
Set the optimizer for SGD. If None (default), classical SGD will be applied.
verbose: boolean or int, optional
Prints information about the process according to the verbosity level. Values: False (verbose=0), True (verbose=1) and INTEGER
random_state: int, optional
Set the seed for the numpy random generator to make the random numbers predictable. This is a debugging feature.

callback: callable
Method that receives the current iteration as an argument.
fit_storage(data)[source]

Fit the model according to the given training data.

Args:
data: Orange.data.Table
Returns:
self: object
Returns self.

Optimizers (recommendation.optimizers)

The classes presented in this section are optimizers to modify the SGD updates during the training of a model.

The update functions control the learning rate during the SGD optimization (a usage sketch follows the list below).

  • SGD: Stochastic Gradient Descent (SGD) updates
  • Momentum: Stochastic Gradient Descent (SGD) updates with momentum
  • NesterovMomentum: Stochastic Gradient Descent (SGD) updates with Nesterov momentum
  • AdaGrad: AdaGrad updates
  • RMSProp: Scale learning rates by dividing with the moving average of the root mean squared (RMS) gradients.
  • AdaDelta: Scale learning rates by the ratio of accumulated gradients to accumulated step sizes; see [4] and the notes for further description.
  • Adam: Adam updates implemented as in [5].
  • Adamax: Adamax updates implemented as in [6].
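
Any of these optimizers can be passed to a learner through its optimizer parameter. A sketch with BRISMF and RMSProp (the parameter values here are illustrative, not tuned):

import Orange
from orangecontrib.recommendation import BRISMFLearner
from orangecontrib.recommendation.optimizers import RMSProp

data = Orange.data.Table('movielens100k.tab')

# Replace the default plain SGD updates with RMSProp-scaled updates.
learner = BRISMFLearner(num_factors=15, num_iter=25, learning_rate=0.07,
                        lmbda=0.1, optimizer=RMSProp(rho=0.9))
recommender = learner(data)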

Stochastic Gradient Descent

This is the default optimizer in all models.

class orangecontrib.recommendation.optimizers.SGD(learning_rate=1.0)[source]

Stochastic Gradient Descent (SGD) updates

Generates update expressions of the form:

  • param := param - learning_rate * gradient
Args:
learning_rate: float, optional
The learning rate controlling the size of update steps
update(grads, params, indices=None)[source]

SGD updates

Args:
grads: array
List of gradient expressions
params: array
The variables to generate update expressions for
indices: array, optional
Indices in params to update

Momentum

class orangecontrib.recommendation.optimizers.Momentum(learning_rate=1.0, momentum=0.9)[source]

Stochastic Gradient Descent (SGD) updates with momentum

Generates update expressions of the form:

  • velocity := momentum * velocity - learning_rate * gradient
  • param := param + velocity
Args:
learning_rate: float
The learning rate controlling the size of update steps
momentum: float, optional
The amount of momentum to apply. Higher momentum results in smoothing over more update steps. Defaults to 0.9.
Notes:
Higher momentum also results in larger update steps. To counter that, you can optionally scale your learning rate by 1 - momentum.
See Also:
apply_momentum: Generic function applying momentum to updates
nesterov_momentum: Nesterov’s variant of SGD with momentum
update(grads, params, indices=None)[source]

Momentum updates

Args:
grads: array
List of gradient expressions
params: array
The variables to generate update expressions for
indices: array
Indices in params to update

Nesterov’s Accelerated Gradient

class orangecontrib.recommendation.optimizers.NesterovMomentum(learning_rate=1.0, momentum=0.9)[source]

Stochastic Gradient Descent (SGD) updates with Nesterov momentum

Generates update expressions of the form:

  • param_ahead := param + momentum * velocity
  • velocity := momentum * velocity - learning_rate * gradient_ahead
  • param := param + velocity

In order to make the update look similar to vanilla SGD, this can be written as:

  • v_prev := velocity
  • velocity := momentum * velocity - learning_rate * gradient
  • param := -momentum * v_prev + (1 + momentum) * velocity
Args:
learning_rate : float
The learning rate controlling the size of update steps
momentum: float, optional
The amount of momentum to apply. Higher momentum results in smoothing over more update steps. Defaults to 0.9.
Notes:

Higher momentum also results in larger update steps. To counter that, you can optionally scale your learning rate by 1 - momentum.

The classic formulation of Nesterov momentum (or Nesterov accelerated gradient) requires the gradient to be evaluated at the predicted next position in parameter space. Here, we use the formulation described at https://github.com/lisa-lab/pylearn2/pull/136#issuecomment-10381617, which allows the gradient to be evaluated at the current parameters.

See Also:
apply_nesterov_momentum: Function applying momentum to updates
update(grads, params, indices=None)[source]

NAG updates

Args:
grads: array
List of gradient expressions
params: array
The variables to generate update expressions for
indices: array, optional
Indices in params to update
Returns:
updates: list of float
Variables updated with the gradients

AdaGradient

class orangecontrib.recommendation.optimizers.AdaGrad(learning_rate=1.0, epsilon=1e-06)[source]

AdaGrad updates

Scale learning rates by dividing with the square root of accumulated squared gradients. See [1] for further description.

  • param := param - learning_rate * gradient
Args:
learning_rate : float or symbolic scalar
The learning rate controlling the size of update steps
epsilon: float or symbolic scalar
Small value added for numerical stability
Notes:

Using step size eta Adagrad calculates the learning rate for feature i at time step t as:

\[\eta_{t,i} = \frac{\eta} {\sqrt{\sum^t_{t^\prime} g^2_{t^\prime,i}+\epsilon}} g_{t,i}\]

as such the learning rate is monotonically decreasing.

Epsilon is not included in the typical formula, see [2].
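
As a concrete illustration of this monotonic decay, the toy numpy loop below applies the AdaGrad rule to a single parameter vector and prints the shrinking per-feature step sizes (independent of the library’s AdaGrad class):

import numpy as np

eta, eps = 1.0, 1e-6
param = np.array([1.0, -2.0])
accum = np.zeros_like(param)                 # running sum of squared gradients

for t in range(5):
    grad = 2 * param                         # gradient of f(x) = x[0]**2 + x[1]**2
    accum += grad ** 2
    step = eta / np.sqrt(accum + eps)        # per-feature learning rate
    param -= step * grad
    print("t=%d  effective step=%s" % (t, step))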

References:
[1]Duchi, J., Hazan, E., & Singer, Y. (2011): Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 12:2121-2159.
[2]Chris Dyer: Notes on AdaGrad. http://www.ark.cs.cmu.edu/cdyer/adagrad.pdf
update(grads, params, indices=None)[source]

AdaGrad updates

Args:
grads: array
List of gradient expressions
params: array
The variables to generate update expressions for
indices: array, optional
Indices in params to update

RMSProp

class orangecontrib.recommendation.optimizers.RMSProp(learning_rate=1.0, rho=0.9, epsilon=1e-06)[source]

Scale learning rates by dividing with the moving average of the root mean squared (RMS) gradients. See [3] for further description.

Args:
learning_rate: float
The learning rate controlling the size of update steps
rho: float
Gradient moving average decay factor
epsilon: float
Small value added for numerical stability
Notes:

rho should be between 0 and 1. A value of rho close to 1 will decay the moving average slowly and a value close to 0 will decay the moving average fast.

Using the step size \(\eta\) and a decay factor \(\rho\) the learning rate \(\eta_t\) is calculated as:

\[\begin{split}r_t &= \rho r_{t-1} + (1-\rho)*g^2\\ \eta_t &= \frac{\eta}{\sqrt{r_t + \epsilon}}\end{split}\]
References:
[3]Tieleman, T. and Hinton, G. (2012): Neural Networks for Machine Learning, Lecture 6.5 - rmsprop. Coursera. http://www.youtube.com/watch?v=O3sxAc4hxZU (formula @5:20)
update(grads, params, indices=None)[source]

RMSProp updates

Args:
grads: array
List of gradient expressions
params: array
The variables to generate update expressions for
indices: array, optional
Indices in params to update

AdaDelta

class orangecontrib.recommendation.optimizers.AdaDelta(learning_rate=1.0, rho=0.95, epsilon=1e-06)[source]

Scale learning rates by the ratio of accumulated gradients to accumulated step sizes; see [4] and the notes for further description.

Args:
learning_rate: float
The learning rate controlling the size of update steps
rho: float
Squared gradient moving average decay factor
epsilon: float
Small value added for numerical stability
Notes:

rho should be between 0 and 1. A value of rho close to 1 will decay the moving average slowly and a value close to 0 will decay the moving average fast.

rho = 0.95 and epsilon=1e-6 are suggested in the paper and reported to work for multiple datasets (MNIST, speech).

In the paper, no learning rate is considered (so learning_rate=1.0). Probably best to keep it at this value. epsilon is important for the very first update (so the numerator does not become 0).

Using the step size eta and a decay factor rho the learning rate is calculated as:

\[\begin{split}r_t &= \rho r_{t-1} + (1-\rho)*g^2\\ \eta_t &= \eta \frac{\sqrt{s_{t-1} + \epsilon}} {\sqrt{r_t + \epsilon}}\\ s_t &= \rho s_{t-1} + (1-\rho)*(\eta_t*g)^2\end{split}\]
References:
[4](1, 2) Zeiler, M. D. (2012): ADADELTA: An Adaptive Learning Rate Method. arXiv Preprint arXiv:1212.5701.
update(grads, params, indices=None)[source]

AdaDelta updates

Args:
grads: array
List of gradient expressions
params: array
The variables to generate update expressions for
indices: array, optional
Indices in params to update

Adam

class orangecontrib.recommendation.optimizers.Adam(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08)[source]

Adam updates implemented as in [5].

Args:
learning_rate : float
The learning rate controlling the size of update steps
beta_1 : float
Exponential decay rate for the first moment estimates.
beta_2 : float
Exponential decay rate for the second moment estimates.
epsilon : float
Constant for numerical stability.
Notes:
The paper [5] includes an additional hyperparameter lambda. This is only needed to prove convergence of the algorithm and has no practical use; it is therefore omitted here.
References:
[5](1, 2, 3) Kingma, Diederik, and Jimmy Ba (2014): Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.
update(grads, params, indices=None)[source]

Adam updates

Args:
grads: array
List of gradient expressions
params: array
The variables to generate update expressions for
indices: array, optional
Indices of parameters (‘params’) to update. If None (default), all parameters will be updated.
Returns:
updates: list of float
Variables updated with the gradients

Adamax

class orangecontrib.recommendation.optimizers.Adamax(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08)[source]

Adamax updates implemented as in [6]. This is a variant of the Adam algorithm based on the infinity norm.

Args:
learning_rate : float
The learning rate controlling the size of update steps
beta_1 : float
Exponential decay rate for the first moment estimates.
beta_2 : float
Exponential decay rate for the second moment estimates.
epsilon : float
Constant for numerical stability.
References:
[6](1, 2) Kingma, Diederik, and Jimmy Ba (2014): Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.
update(grads, params, indices=None)[source]

Adamax updates

Args:
grads: array
List of gradient expressions
params: array
The variables to generate update expressions for
indices: array, optional
Indices of parameters (‘params’) to update. If None (default), all parameters will be updated.
Returns:
updates: list of float
Variables updated with the gradients

Benchmarks

Rating

All these results refer to the training phase.

FilmTrust

Additional information:

  • Loading time: 0.748s
  • Training dataset: users=1,508; items=2,071; ratings=35,497; sparsity: 1.14%
  • No optimizers for SGD used.
Algorithm RMSE MAE Train time Settings
Global Average 0.919 0.715 0.000s -
Item Average 0.861 0.674 0.000s -
User Average 0.785 0.606 0.000s -
User-Item Baseline 0.738 0.566 0.001s -
BRISMF 0.712 0.551 0.820s/iter num_factors=10; num_iter=15, learning_rate=0.01; lmbda=0.1
SVD++ 0.707 0.546 1.974s/iter num_factors=10; num_iter=15; learning_rate=0.01; lmbda=0.1;
TrustSVD 0.677 0.520 3.604s/iter num_factors=10; num_iter=15; learning_rate=0.01; lmbda=0.12; social_lmbda=0.9

MovieLens100K

Additional information:

  • Loading time: 0.748s
  • Training dataset: users=943; items=1,682; ratings=100,000; sparsity: 6.30%
  • No optimizers for SGD used.
Algorithm RMSE MAE Train time Settings
Global Average 1.126 0.945 0.001s -
Item Average 1.000 0.799 0.001s -
User Average 1.031 0.826 0.001s -
User-Item Baseline 0.938 0.738 0.001s -
BRISMF 0.810 0.642 2.027s/iter num_factors=15; num_iter=15, learning_rate=0.07; lmbda=0.1
SVD++ 0.823 0.648 7.252s/iter num_factors=15; num_iter=15; learning_rate=0.02; bias_learning_rate=0.01; lmbda=0.1; bias_lmbda=0.007

MovieLens1M

Additional information:

  • Loading time: 5.144s
  • Training dataset: users=6,040; items=3,706; ratings=1,000,209; sparsity: 4.47%
  • No optimizers for SGD used.
Algorithm RMSE MAE Train time Settings
Global Average 1.117 0.934 0.010s -
Item Average 0.975 0.779 0.018s -
User Average 1.028 0.823 0.021s -
User-Item Baseline 0.924 0.727 0.027s -
BRISMF 0.886 0.704 19.757s/iter num_factors=15; num_iter=15, learning_rate=0.07; lmbda=0.1
SVD++ 0.858 0.677 98.249s/iter num_factors=15; num_iter=15; learning_rate=0.02; bias_learning_rate=0.01; lmbda=0.1; bias_lmbda=0.007

MovieLens10M

Additional information:

  • Loading time: 55.312s
  • Training dataset: users=71,567; items=10,681; ratings=10,000,054; sparsity: 1.308%
  • No optimizers for SGD used.
Algorithm RMSE MAE Train time Settings
Global Average 1.060 0.856 0.150s -
Item Average 0.942 0.737 0.271s -
User Average 0.970 0.763 0.293s -
User-Item Baseline 0.877 0.677 0.393s -
BRISMF - - 230.656s/iter num_factors=15; num_iter=15, learning_rate=0.07; lmbda=0.1

Ranking

Epinions

Additional information:

  • Loading time (training dataset): 0.094s
  • Loading time (test dataset): 1.392s
  • Training dataset: users=4,718; items=49,288; ratings=23,590; sparsity: 0.0101%
  • Testing dataset: users=4,718; items=49,288; ratings=322,445; sparsity: 0.1386%
  • No optimizers for SGD used.
Algorithm MRR (train) MRR (test) Train time Settings
CLiMF 0.0758 0.3975 1.323s/iter num_factors=10; num_iter=10; learning_rate=0.0001; lmbda=0.001
