Welcome to Surprise’ documentation!¶
Surprise is an easytouse Python scikit for recommender systems.
If you’re new to Surprise, we invite you to take a look at the Getting Started guide, where you’ll find a series of tutorials illustrating all you can do with Surprise. You can also check out the FAQ for many usecase example. For installation guidelines, please refer to the project page.
Any kind of feedback/criticism would be greatly appreciated (software design, documentation, improvement ideas, spelling mistakes, etc...). Please feel free to contribute and send pull requests (see GitHub page)!
Getting Started¶
Basic usage¶
Surprise has a set of builtin algorithms and datasets for you to play with. In its simplest form, it takes about four lines of code to evaluate the performance of an algorithm:
from surprise import SVD
from surprise import Dataset
from surprise import evaluate, print_perf
# Load the movielens100k dataset (download it if needed),
# and split it into 3 folds for crossvalidation.
data = Dataset.load_builtin('ml100k')
data.split(n_folds=3)
# We'll use the famous SVD algorithm.
algo = SVD()
# Evaluate performances of our algorithm on the dataset.
perf = evaluate(algo, data, measures=['RMSE', 'MAE'])
print_perf(perf)
If Surprise cannot find the
movielens100k dataset, it will
offer to download it and will store it under the .surprise_data
folder in
your home directory. The split()
method automatically splits the
dataset into 3 folds and the evaluate()
function runs the crossvalidation procedure and compute some accuracy
measures.
Load a custom dataset¶
You can of course use a custom dataset. Surprise offers two ways of loading a custom dataset:
 you can either specify a single file with all the ratings and
use the
split ()
method to perform crossvalidation ;  or if your dataset is already split into predefined folds, you can specify a list of files for training and testing.
Either way, you will need to define a Reader
object for Surprise to be able to
parse the file(s).
We’ll see how to handle both cases with the movielens100k dataset. Of course this is a builtin dataset, but we will act as if it were not.
Load an entire dataset¶
# path to dataset file
file_path = os.path.expanduser('~/.surprise_data/ml100k/ml100k/u.data')
# As we're loading a custom dataset, we need to define a reader. In the
# movielens100k dataset, each line has the following format:
# 'user item rating timestamp', separated by '\t' characters.
reader = Reader(line_format='user item rating timestamp', sep='\t')
data = Dataset.load_from_file(file_path, reader=reader)
data.split(n_folds=5)
Note
Actually, as the Movielens100k dataset is builtin, Surprise provides with a proper reader so in this case, we could have just created the reader like this:
reader = Reader('ml100k')
For more details about readers and how to use them, see the Reader
class
documentation.
Load a dataset with predefined folds¶
# path to dataset folder
files_dir = os.path.expanduser('~/.surprise_data/ml100k/ml100k/')
# This time, we'll use the builtin reader.
reader = Reader('ml100k')
# folds_files is a list of tuples containing file paths:
# [(u1.base, u1.test), (u2.base, u2.test), ... (u5.base, u5.test)]
train_file = files_dir + 'u%d.base'
test_file = files_dir + 'u%d.test'
folds_files = [(train_file % i, test_file % i) for i in (1, 2, 3, 4, 5)]
data = Dataset.load_from_folds(folds_files, reader=reader)
Of course, nothing prevents you from only loading a single file for training
and a single file for testing. However, the folds_files
parameter still
needs to be a list
.
Advanced usage¶
We will here get a little deeper on what can Surprise do for you.
Tune algorithm parameters with GridSearch¶
The evaluate()
function gives us the
results on one set of parameters given to the algorithm. If the user wants
to try the algorithm on a different set of parameters, the
GridSearch
class comes to the rescue.
Given a dict
of parameters, this
class exhaustively tries all the combination of parameters and helps get the
best combination for an accuracy measurement. It is analogous to
GridSearchCV from scikitlearn.
For instance, suppose that we want to tune the parameters of the
SVD
. Some of
the parameters of this algorithm are n_epochs
, lr_all
and reg_all
.
Thus we define a parameters grid as follows
param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
'reg_all': [0.4, 0.6]}
Next we define a GridSearch
instance
and give it the class
SVD
as an
algorithm, and param_grid
. We will compute both the
RMSE and FCP values for all the combination. Thus the following definition:
grid_search = GridSearch(SVD, param_grid, measures=['RMSE', 'FCP'])
Now that GridSearch
instance is ready,
we can evaluate the algorithm on any data with the
GridSearch.evaluate()
method,
exactly like with the regular
evaluate()
function:
data = Dataset.load_builtin('ml100k')
data.split(n_folds=3)
grid_search.evaluate(data)
Everything is ready now to read the results. For example, we get the best RMSE and FCP scores and parameters as follows:
# best RMSE score
print(grid_search.best_score['RMSE'])
# >>> 0.96117566386
# combination of parameters that gave the best RMSE score
print(grid_search.best_params['RMSE'])
# >>> {'reg_all': 0.4, 'lr_all': 0.005, 'n_epochs': 10}
# best FCP score
print(grid_search.best_score['FCP'])
# >>> 0.702279736531
# combination of parameters that gave the best FCP score
print(grid_search.best_params['FCP'])
# >>> {'reg_all': 0.6, 'lr_all': 0.005, 'n_epochs': 10}
For further analysis, we can easily read all the results in a pandas
DataFrame
as follows:
import pandas as pd # noqa
results_df = pd.DataFrame.from_dict(grid_search.cv_results)
print(results_df)
Manually iterate over folds¶
We have so far used the evaluate()
function that does all the hard work for us. If you want to have better control
on your experiments, you can use the folds()
generator of your dataset, and then the
train()
and
test()
methods
of your algorithm on each of the folds:
data = Dataset.load_builtin('ml100k')
data.split(n_folds=3)
algo = BaselineOnly()
for trainset, testset in data.folds():
# train and test algorithm.
algo.train(trainset)
predictions = algo.test(testset)
# Compute and print Root Mean Squared Error
rmse = accuracy.rmse(predictions, verbose=True)
Train on a whole trainset and specifically query for predictions¶
We will here review how to get a prediction for specified users and items. In the mean time, we will also review how to train on a whole dataset, without performing crossvalidation (i.e. there is no test set).
The latter is pretty straightforward: all you need is to load a dataset, and
the build_full_trainset()
method to build the
trainset
and train you algorithm:
data = Dataset.load_builtin('ml100k')
# Retrieve the trainset.
trainset = data.build_full_trainset()
# Build an algorithm, and train it.
algo = KNNBasic()
algo.train(trainset)
Now, there’s no way we could call the test()
method, because we
have no testset. But you can still get predictions for the users and items you
want.
Let’s say you’re interested in user 196 and item 302 (make sure they’re in the
trainset!), and you know that the true rating \(r_{ui} = 4\). All you need
is call the predict()
method:
uid = str(196) # raw user id (as in the ratings file). They are **strings**!
iid = str(302) # raw item id (as in the ratings file). They are **strings**!
# get a prediction for specific users and items.
pred = algo.predict(uid, iid, r_ui=4, verbose=True)
If the predict()
method is called
with user or item ids that were not part of the trainset, it’s up to the
algorithm to decide if it still can make a prediction or not. If it can’t,
predict()
will still predict the mean of all ratings \(\mu\).
Note
Raw ids are ids as defined in a rating file. They can be strings, numbers, or
whatever (but are still represented as strings). On trainset creation, each
raw id is mapped to a unique integer called inner id, which is a lot more
suitable for Surprise to
manipulate. Conversions between raw and inner ids can be done using the
to_inner_uid()
,
to_inner_iid()
,
to_raw_uid()
, and
to_raw_iid()
methods of the
trainset
.
Obviously, it is perfectly fine to use the predict()
method directly
during a crossvalidation process. It’s then up to you to ensure that the user
and item ids are present in the trainset though.
Command line usage¶
Surprise can also be used from the command line, for example:
surprise algo SVD params "{'n_epochs': 5, 'verbose': True}" loadbuiltin ml100k nfolds 3
See detailed usage by running:
surprise h
Using prediction algorithms¶
Surprise provides with a bunch of builtin algorithms. All algorithms derive
from the AlgoBase
base class, where are implemented some key methods (e.g. predict
, train
and test
). You can find the
details of each of these in the prediction_algorithms
package documentation.
Every algorithm is part of the global Surprise namespace, so you only need to import their names from the Surprise package, for example:
from surprise import KNNBasic
algo = KNNBasic()
Some of these algorithms may use baseline estimates, some may use a similarity measure. We will here review how to configure the way baselines and similarities are computed.
Baselines estimates configuration¶
Note
This section only applies to algorithms (or similarity measures) that try to minimize the following regularized squared error (or equivalent):
For algorithms using baselines in another objective function (e.g. the
SVD
algorithm), the baseline configuration is done differently and is specific to
each algorithm. Please refer to their own documentation.
First of all, if you do not want to configure the way baselines are computed, you don’t have to: the default parameters will do just fine. If you do want to well... This is for you.
You may want to read section 2.1 of [Kor10] to get a good idea of what are baseline estimates.
Baselines can be estimated in two different ways:
 Using Stochastic Gradient Descent (SGD).
 Using Alternating Least Squares (ALS).
You can configure the way baselines are computed using the bsl_options
parameter passed at the creation of an algorithm. This parameter is a
dictionary for which the key 'method'
indicates the method to use. Accepted
values are 'als'
(default) and 'sgd'
. Depending on its value, other
options may be set. For ALS:
'reg_i'
: The regularization parameter for items. Corresponding to \(\lambda_2\) in [Kor10]. Default is10
.'reg_u'
: The regularization parameter for users. Corresponding to \(\lambda_3\) in [Kor10]. Default is15
.'n_epochs'
: The number of iteration of the ALS procedure. Default is10
. Note that in [Kor10], what is described is a single iteration ALS process.
And for SGD:
'reg'
: The regularization parameter of the cost function that is optimized, corresponding to \(\lambda_1\) and then \(\lambda_5\) in [Kor10] Default is0.02
.'learning_rate'
: The learning rate of SGD, corresponding to \(\gamma\) in [Kor10]. Default is0.005
.'n_epochs'
: The number of iteration of the SGD procedure. Default is 20.
Note
For both procedures (ALS and SGD), user and item biases (\(b_u\) and \(b_i\)) are initialized to zero.
Usage examples:
print('Using ALS')
bsl_options = {'method': 'als',
'n_epochs': 5,
'reg_u': 12,
'reg_i': 5
}
algo = BaselineOnly(bsl_options=bsl_options)
print('Using SGD')
bsl_options = {'method': 'sgd',
'learning_rate': .00005,
}
algo = BaselineOnly(bsl_options=bsl_options)
Note that some similarity measures may use baselines, such as the
pearson_baseline
similarity.
Configuration works just the same, whether the baselines are used in the actual
prediction \(\hat{r}_{ui}\) or not:
bsl_options = {'method': 'als',
'n_epochs': 20,
}
sim_options = {'name': 'pearson_baseline'}
algo = KNNBasic(bsl_options=bsl_options, sim_options=sim_options)
This leads us to similarity measure configuration, which we will review right now.
Similarity measure configuration¶
Many algorithms use a similarity measure to estimate a rating. The way they can
be configured is done in a similar fashion as for baseline ratings: you just
need to pass a sim_options
argument at the creation of an algorithm. This
argument is a dictionary with the following (all optional) keys:
'name'
: The name of the similarity to use, as defined in thesimilarities
module. Default is'MSD'
.'user_based'
: Whether similarities will be computed between users or between items. This has a huge impact on the performance of a prediction algorithm. Default isTrue
.'min_support'
: The minimum number of common items (when'user_based'
is'True'
) or minimum number of common users (when'user_based'
is'False'
) for the similarity not to be zero. Simply put, if \(I_{uv} < \text{min_support}\) then \(\text{sim}(u, v) = 0\). The same goes for items.'shrinkage'
: Shrinkage parameter to apply (only relevant forpearson_baseline
similarity). Default is 100.
Usage examples:
sim_options = {'name': 'cosine',
'user_based': False # compute similarities between items
}
algo = KNNBasic(sim_options=sim_options)
sim_options = {'name': 'pearson_baseline',
'shrinkage': 0 # no shrinkage
}
algo = KNNBasic(sim_options=sim_options)
See also
The similarities
module.
How to build your own prediction algorithm¶
This page describes how to build a custom prediction algorithm using Surprise.
The basics¶
Want to get your hands dirty? Cool.
Creating your own prediction algorithm is pretty simple: an algorithm is
nothing but a class derived from AlgoBase
that has an estimate
method. This is the method that is called by the predict()
method. It takes
in an inner user id, an inner item id (see this note), and returns the estimated rating \(\hat{r}_{ui}\):
from surprise import AlgoBase
from surprise import Dataset
from surprise import evaluate
class MyOwnAlgorithm(AlgoBase):
def __init__(self):
# Always call base method before doing anything.
AlgoBase.__init__(self)
def estimate(self, u, i):
return 3
data = Dataset.load_builtin('ml100k')
algo = MyOwnAlgorithm()
evaluate(algo, data)
This algorithm is the dumbest we could have thought of: it just predicts a rating of 3, regardless of users and items.
If you want to store additional information about the prediction, you can also return a dictionary with given details:
def estimate(self, u, i):
details = {'info1' : 'That was',
'info2' : 'easy stuff :)'}
return 3, details
This dictionary will be stored in the prediction
as the details
field and can be used for later analysis.
The train
method¶
Now, let’s make a slightly cleverer algorithm that predicts the average of all
the ratings of the trainset. As this is a constant value that does not depend
on current user or item, we would rather compute it once and for all. This can
be done by defining the train
method:
class MyOwnAlgorithm(AlgoBase):
def __init__(self):
# Always call base method before doing anything.
AlgoBase.__init__(self)
def train(self, trainset):
# Here again: call base method before doing anything.
AlgoBase.train(self, trainset)
# Compute the average rating. We might as well use the
# trainset.global_mean attribute ;)
self.the_mean = np.mean([r for (_, _, r) in
self.trainset.all_ratings()])
def estimate(self, u, i):
return self.the_mean
The train
method is called by the evaluate
function at each fold of a crossvalidation
process, (but you can also call it yourself).
Before doing anything, you should call the base class train()
method.
The trainset
attribute¶
Once the base class train()
method has returned,
all the info you need about the current training set (rating values, etc...) is
stored in the self.trainset
attribute. This is a Trainset
object that has many attributes and methods of
interest for prediction.
To illustrate its usage, let’s make an algorithm that predicts an average between the mean of all ratings, the mean rating of the user and the mean rating for the item:
def estimate(self, u, i):
sum_means = self.trainset.global_mean
div = 1
if self.trainset.knows_user(u):
sum_means += np.mean([r for (_, r) in self.trainset.ur[u]])
div += 1
if self.trainset.knows_item(i):
sum_means += np.mean([r for (_, r) in self.trainset.ir[i]])
div += 1
return sum_means / div
Note that it would have been a better idea to compute all the user means in the
train
method, thus avoiding the same computations multiple times.
When the prediction is impossible¶
It’s up to your algorithm to decide if it can or cannot yield a prediction. If
the prediction is impossible, then you can raise the
PredictionImpossible
exception.
You’ll need to import it first):
from surprise import PredictionImpossible
This exception will be caught by the predict()
method, and the
estimation \(\hat{r}_{ui}\) will be set to the global mean of all ratings
\(\mu\).
Using similarities and baselines¶
Should your algorithm use a similarity measure or baseline estimates, you’ll
need to accept bsl_options
and sim_options
as parameters to the
__init__
method, and pass them along to the Base class. See how to use
these parameters in the Using prediction algorithms section.
Methods compute_baselines()
and
compute_similarities()
can
be called in the train
method (or anywhere else).
class MyOwnAlgorithm(AlgoBase):
def __init__(self, sim_options={}, bsl_options={}):
AlgoBase.__init__(self, sim_options=sim_options,
bsl_options=bsl_options)
def train(self, trainset):
AlgoBase.train(self, trainset)
# Compute baselines and similarities
self.bu, self.bi = self.compute_baselines()
self.sim = self.compute_similarities()
def estimate(self, u, i):
if not (self.trainset.knows_user(u) and self.trainset.knows_item(i)):
raise PredictionImpossible('User and/or item is unkown.')
# Compute similarities between u and v, where v describes all other
# users that have also rated item i.
neighbors = [(v, self.sim[u, v]) for (v, r) in self.trainset.ir[i]]
# Sort these neighbors by similarity
neighbors = sorted(neighbors, key=lambda x: x[1], reverse=True)
print('The 3 nearest neighbors of user', str(u), 'are:')
for v, sim_uv in neighbors[:3]:
print('user {0:} with sim {1:1.2f}'.format(v, sim_uv))
# ... Aaaaand return the baseline estimate anyway ;)
bsl = self.trainset.global_mean + self.bu[u] + self.bi[i]
return bsl
Feel free to explore the prediction_algorithms package source to get an idea of what can be done.
Notation standards, References¶
In the documentation, you will find the following notation:
 \(R\) : the set of all ratings.
 \(R_{train}\), \(R_{test}\) and \(\hat{R}\) denote the training set, the test set, and the set of predicted ratings.
 \(U\) : the set of all users. \(u\) and \(v\) denotes users.
 \(I\) : the set of all items. \(i\) and \(j\) denotes items.
 \(U_i\) : the set of all users that have rated item \(i\).
 \(U_{ij}\) : the set of all users that have rated both items \(i\) and \(j\).
 \(I_u\) : the set of all items rated by user \(u\).
 \(I_{uv}\) : the set of all items rated by both users \(u\) and \(v\).
 \(r_{ui}\) : the true rating of user \(u\) for item \(i\).
 \(\hat{r}_{ui}\) : the estimated rating of user \(u\) for item \(i\).
 \(b_{ui}\) : the baseline rating of user \(u\) for item \(i\).
 \(\mu\) : the mean of all ratings.
 \(\mu_u\) : the mean of all ratings given by user \(u\).
 \(\mu_i\) : the mean of all ratings given to item \(i\).
 \(N_i^k(u)\) : the \(k\) nearest neighbors of user \(u\) that
have rated item \(i\). This set is computed using a
similarity metric
.  \(N_u^k(i)\) : the \(k\) nearest neighbors of item \(i\) that
are rated by user \(u\). This set is computed using a
similarity metric
.
References
Here are the papers used as references in the documentation. Links to pdf files where added when possible. A simple Google search should lead you easily to the missing ones :)
[GM05]  Thomas George and Srujana Merugu. A scalable collaborative filtering framework based on coclustering. 2005. URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.113.6458&rep=rep1&type=pdf. 
[Kor08]  Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. 2008. URL: http://www.cs.rochester.edu/twiki/pub/Main/HarpSeminar/Factorization_Meets_the_Neighborhood_a_Multifaceted_Collaborative_Filtering_Model.pdf. 
[Kor10]  Yehuda Koren. Factor in the neighbors: scalable and accurate collaborative filtering. 2010. URL: http://courses.ischool.berkeley.edu/i290dm/s11/SECURE/a1koren.pdf. 
[KBV09]  Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. 2009. 
[LS01]  Daniel D. Lee and H. Sebastian Seung. Algorithms for nonnegative matrix factorization. 2001. URL: http://papers.nips.cc/paper/1861algorithmsfornonnegativematrixfactorization.pdf. 
[LM07]  Daniel Lemire and Anna Maclachlan. Slope one predictors for online ratingbased collaborative filtering. 2007. URL: http://arxiv.org/abs/cs/0702144. 
[LZXZ14]  Xin Luo, Mengchu Zhou, Yunni Xia, and Qinsheng Zhu. An efficient nonnegative matrix factorizationbased approach to collaborative filtering for recommender systems. 2014. 
[RRSK10]  Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor. Recommender Systems Handbook. 1st edition, 2010. 
[SM08]  Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. 2008. URL: http://papers.nips.cc/paper/3208probabilisticmatrixfactorization.pdf. 
[ZWFM96]  Sheng Zhang, Weihong Wang, James Ford, and Fillia Makedon. Learning from incomplete ratings using nonnegative matrix factorization. 1996. URL: http://www.siam.org/meetings/sdm06/proceedings/059zhangs2.pdf. 
FAQ¶
You will find here the Frequently Asked Questions, as well as some other usecase examples that are not part of the User Guide.
How to get the topN recommendations for each user¶
Here is an example where we retrieve retrieve the top10 items with highest rating prediction for each user in the MovieLens100k dataset. We first train an SVD algorithm on the whole dataset, and then predict all the ratings for the pairs (user, item) that are not in the training set. We then retrieve the top10 prediction for each user.
from collections import defaultdict
from surprise import SVD
from surprise import Dataset
def get_top_n(predictions, n=10):
'''Return the topN recommendation for each user from a set of predictions.
Args:
predictions(list of Prediction objects): The list of predictions, as
returned by the test method of an algorithm.
n(int): The number of recommendation to output for each user. Default
is 10.
Returns:
A dict where keys are user (raw) ids and values are lists of tuples:
[(raw item id, rating estimation), ...] of size n.
'''
# First map the predictions to each user.
top_n = defaultdict(list)
for uid, iid, true_r, est, _ in predictions:
top_n[uid].append((iid, est))
# Then sort the predictions for each user and retrieve the k highest ones.
for uid, user_ratings in top_n.items():
user_ratings.sort(key=lambda x: x[1], reverse=True)
top_n[uid] = user_ratings[:n]
return top_n
# First train an SVD algorithm on the movielens dataset.
data = Dataset.load_builtin('ml100k')
trainset = data.build_full_trainset()
algo = SVD()
algo.train(trainset)
# Than predict ratings for all pairs (u, i) that are NOT in the training set.
testset = trainset.build_anti_testset()
predictions = algo.test(testset)
top_n = get_top_n(predictions, n=10)
# Print the recommended items for each user
for uid, user_ratings in top_n.items():
print(uid, [iid for (iid, _) in user_ratings])
How to get the k nearest neighbors of a user (or item)¶
You can use the get_neighbors()
methods of
the algorithm object. This is only relevant for algorithms that use a
similarity measure, such as the kNN algorithms.
Here is an example where we retrieve the 10 nearest neighbors of the movie Toy Story from the MovieLens100k dataset. The output is:
The 10 nearest neighbors of Toy Story are:
Beauty and the Beast (1991)
Raiders of the Lost Ark (1981)
That Thing You Do! (1996)
Lion King, The (1994)
Craft, The (1996)
Liar Liar (1997)
Aladdin (1992)
Cool Hand Luke (1967)
Winnie the Pooh and the Blustery Day (1968)
Indiana Jones and the Last Crusade (1989)
There’s a lot of boilerplate because of the conversions between movie names and
their raw/inner ids (see this note), but it all boils
down to the use of get_neighbors()
:
import os
import io # needed because of weird encoding of u.item file
from surprise import KNNBaseline
from surprise import Dataset
def read_item_names():
"""Read the u.item file from MovieLens 100k dataset and return two
mappings to convert raw ids into movie names and movie names into raw ids.
"""
file_name = (os.path.expanduser('~') +
'/.surprise_data/ml100k/ml100k/u.item')
rid_to_name = {}
name_to_rid = {}
with io.open(file_name, 'r', encoding='ISO88591') as f:
for line in f:
line = line.split('')
rid_to_name[line[0]] = line[1]
name_to_rid[line[1]] = line[0]
return rid_to_name, name_to_rid
# First, train the algortihm to compute the similarities between items
data = Dataset.load_builtin('ml100k')
trainset = data.build_full_trainset()
sim_options = {'name': 'pearson_baseline', 'user_based': False}
algo = KNNBaseline(sim_options=sim_options)
algo.train(trainset)
# Read the mappings raw id <> movie name
rid_to_name, name_to_rid = read_item_names()
# Retieve inner id of the movie Toy Story
toy_story_raw_id = name_to_rid['Toy Story (1995)']
toy_story_inner_id = algo.trainset.to_inner_iid(toy_story_raw_id)
# Retrieve inner ids of the nearest neighbors of Toy Story.
toy_story_neighbors = algo.get_neighbors(toy_story_inner_id, k=10)
# Convert inner ids of the neighbors into names.
toy_story_neighbors = (algo.trainset.to_raw_iid(inner_id)
for inner_id in toy_story_neighbors)
toy_story_neighbors = (rid_to_name[rid]
for rid in toy_story_neighbors)
print()
print('The 10 nearest neighbors of Toy Story are:')
for movie in toy_story_neighbors:
print(movie)
Naturally, the same can be done for users with minor modifications.
How to serialize an algorithm¶
Prediction algorithms can be serialized and loaded back using the dump()
and load()
functions. Here
is a small example where the SVD algorithm is trained on a dataset and
serialized. It is then reloaded and can be used again for making predictions:
import os
from surprise import SVD
from surprise import Dataset
from surprise import dump
data = Dataset.load_builtin('ml100k')
trainset = data.build_full_trainset()
algo = SVD()
algo.train(trainset)
# Compute predictions of the 'original' algorithm.
predictions = algo.test(trainset.build_testset())
# Dump algorithm and reload it.
file_name = os.path.expanduser('~/dump_file')
dump.dump(file_name, algo=algo)
_, loaded_algo = dump.load(file_name)
# We now ensure that the algo is still the same by checking the predictions.
predictions_loaded_algo = loaded_algo.test(trainset.build_testset())
assert predictions == predictions_loaded_algo
print('Predictions are the same')
Algorithms can be serialized along with their predictions, so that can be further analyzed or compared with other algorithms, using pandas dataframes. Some examples are given in the two following notebooks:
Can I use my own dataset with Surprise¶
Yes, you can. See the user guide.
How to tune an algorithm parameters¶
You can tune the parameters of an algorithm with the GridSearch
class as described here. After the tuning, you may want to have an
unbiased estimate of your algorithm performances.
How to get accuracy measures on the training set¶
You can use the build_testset()
method of the Trainset
object to build a testset that can be then used
with the test()
method:
from surprise import Dataset
from surprise import SVD
from surprise import accuracy
data = Dataset.load_builtin('ml100k')
algo = SVD()
trainset = data.build_full_trainset()
algo.train(trainset)
testset = trainset.build_testset()
predictions = algo.test(testset)
# RMSE should be low as we are biased
accuracy.rmse(predictions, verbose=True) # ~ 0.68 (which is low)
Check out the example file for more usage examples.
How to save some data for unbiased accuracy estimation¶
If your goal is to tune the parameters of an algorithm, you may want to spare a bit of data to have an unbiased estimation of its performances. For instance you may want to split your data into two sets A and B. A is used for parameter tuning using grid search, and B is used for unbiased estimation. This can be done as follows:
import random
from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise import GridSearch
# Load the full dataset.
data = Dataset.load_builtin('ml100k')
raw_ratings = data.raw_ratings
# shuffle ratings if you want
random.shuffle(raw_ratings)
# A = 90% of the data, B = 10% of the data
threshold = int(.9 * len(raw_ratings))
A_raw_ratings = raw_ratings[:threshold]
B_raw_ratings = raw_ratings[threshold:]
data.raw_ratings = A_raw_ratings # data is now the set A
data.split(n_folds=3)
# Select your best algo with grid search.
print('Grid Search...')
param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005]}
grid_search = GridSearch(SVD, param_grid, measures=['RMSE'], verbose=0)
grid_search.evaluate(data)
algo = grid_search.best_estimator['RMSE']
# retrain on the whole set A
trainset = data.build_full_trainset()
algo.train(trainset)
# Compute biased accuracy on A
predictions = algo.test(trainset.build_testset())
print('Biased accuracy on A,', end=' ')
accuracy.rmse(predictions)
# Compute unbiased accuracy on B
testset = data.construct_testset(B_raw_ratings) # testset is now the set B
predictions = algo.test(testset)
print('Unbiased accuracy on B,', end=' ')
accuracy.rmse(predictions)
prediction_algorithms package¶
The prediction_algorithms
package includes the prediction algorithms
available for recommendation.
The available prediction algorithms are:
random_pred.NormalPredictor 
Algorithm predicting a random rating based on the distribution of the training set, which is assumed to be normal. 
baseline_only.BaselineOnly 
Algorithm predicting the baseline estimate for given user and item. 
knns.KNNBasic 
A basic collaborative filtering algorithm. 
knns.KNNWithMeans 
A basic collaborative filtering algorithm, taking into account the mean ratings of each user. 
knns.KNNBaseline 
A basic collaborative filtering algorithm taking into account a baseline rating. 
matrix_factorization.SVD 
The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize. 
matrix_factorization.SVDpp 
The SVD++ algorithm, an extension of SVD taking into account implicit ratings. 
matrix_factorization.NMF 
A collaborative filtering algorithm based on Nonnegative Matrix Factorization. 
slope_one.SlopeOne 
A simple yet accurate collaborative filtering algorithm. 
co_clustering.CoClustering 
A collaborative filtering algorithm based on coclustering. 
You may want to check the notation standards before diving into the formulas.
The algorithm base class¶
The surprise.prediction_algorithms.algo_base
module defines the base
class AlgoBase
from which every single prediction algorithm has to
inherit.

class
surprise.prediction_algorithms.algo_base.
AlgoBase
(**kwargs)¶ Abstract class where is defined the basic behavior of a prediction algorithm.
Keyword Arguments: baseline_options (dict, optional) – If the algorithm needs to compute a baseline estimate, the baseline_options
parameter is used to configure how they are computed. See Baselines estimates configuration for usage.
compute_baselines
()¶ Compute users and items baselines.
The way baselines are computed depends on the
bsl_options
parameter passed at the creation of the algorithm (see Baselines estimates configuration).This method is only relevant for algorithms using
Pearson baseline similarty
or theBaselineOnly
algorithm.Returns: A tuple (bu, bi)
, which are users and items baselines.

compute_similarities
()¶ Build the similarity matrix.
The way the similarity matrix is computed depends on the
sim_options
parameter passed at the creation of the algorithm (see Similarity measure configuration).This method is only relevant for algorithms using a similarity measure, such as the kNN algorithms.
Returns: The similarity matrix.

get_neighbors
(iid, k)¶ Return the
k
nearest neighbors ofiid
, which is the inner id of a user or an item, depending on theuser_based
field ofsim_options
(see Similarity measure configuration).As the similarities are computed on the basis of a similarity measure, this method is only relevant for algorithms using a similarity measure, such as the kNN algorithms.
For a usage example, see the FAQ.
Parameters:  iid (int) – The (inner) id of the user (or item) for which we want the nearest neighbors. See this note.
 k (int) – The number of neighbors to retrieve.
Returns: The list of the
k
(inner) ids of the closest users (or items) toiid
.

predict
(uid, iid, r_ui=None, clip=True, verbose=False)¶ Compute the rating prediction for given user and item.
The
predict
method converts raw ids to inner ids and then calls theestimate
method which is defined in every derived class. If the prediction is impossible (for whatever reason), the prediction is set to the global mean of all ratings.Parameters:  uid – (Raw) id of the user. See this note.
 iid – (Raw) id of the item. See this note.
 r_ui (float) – The true rating \(r_{ui}\). Optional, default is
None
.  clip (bool) – Whether to clip the estimation into the rating scale.
For example, if \(\hat{r}_{ui}\) is \(5.5\) while the
rating scale is \([1, 5]\), then \(\hat{r}_{ui}\) is
set to \(5\). Same goes if \(\hat{r}_{ui} < 1\).
Default is
True
.  verbose (bool) – Whether to print details of the prediction. Default is False.
Returns: A
Prediction
object containing: The (raw) user id
uid
.  The (raw) item id
iid
.  The true rating
r_ui
(\(\hat{r}_{ui}\)).  The estimated rating (\(\hat{r}_{ui}\)).
 Some additional details about the prediction that might be useful for later analysis.

test
(testset, verbose=False)¶ Test the algorithm on given testset, i.e. estimate all the ratings in the given testset.
Parameters:  testset – A test set, as returned by the
folds()
method or by thebuild_testset()
method.  verbose (bool) – Whether to print details for each predictions. Default is False.
Returns: A list of
Prediction
objects that contains all the estimated ratings. testset – A test set, as returned by the

train
(trainset)¶ Train an algorithm on a given training set.
This method is called by every derived class as the first basic step for training an algorithm. It basically just initializes some internal structures and set the self.trainset attribute.
Parameters: trainset ( Trainset
) – A training set, as returned by thefolds
method.

The predictions module¶
The surprise.prediction_algorithms.predictions
module defines the
Prediction
named tuple and the PredictionImpossible
exception.

class
surprise.prediction_algorithms.predictions.
Prediction
¶ A named tuple for storing the results of a prediction.
It’s wrapped in a class, but only for documentation and printing purposes.
Parameters:

exception
surprise.prediction_algorithms.predictions.
PredictionImpossible
¶ Exception raised when a prediction is impossible.
When raised, the estimation \(\hat{r}_{ui}\) is set to the global mean of all ratings \(\mu\).
Basic algorithms¶
These are basic algorithms that do not do much work but that are still useful for comparing accuracies.

class
surprise.prediction_algorithms.random_pred.
NormalPredictor
¶ Bases:
surprise.prediction_algorithms.algo_base.AlgoBase
Algorithm predicting a random rating based on the distribution of the training set, which is assumed to be normal.
The prediction \(\hat{r}_{ui}\) is generated from a normal distribution \(\mathcal{N}(\hat{\mu}, \hat{\sigma}^2)\) where \(\hat{\mu}\) and \(\hat{\sigma}\) are estimated from the training data using Maximum Likelihood Estimation:
\[\begin{split}\hat{\mu} &= \frac{1}{R_{train}} \sum_{r_{ui} \in R_{train}} r_{ui}\\\\ \hat{\sigma} &= \sqrt{\sum_{r_{ui} \in R_{train}} \frac{(r_{ui}  \hat{\mu})^2}{R_{train}}}\end{split}\]

class
surprise.prediction_algorithms.baseline_only.
BaselineOnly
(bsl_options={})¶ Bases:
surprise.prediction_algorithms.algo_base.AlgoBase
Algorithm predicting the baseline estimate for given user and item.
\(\hat{r}_{ui} = b_{ui} = \mu + b_u + b_i\)
If user \(u\) is unknown, then the bias \(b_u\) is assumed to be zero. The same applies for item \(i\) with \(b_i\).
See section 2.1 of [Kor10] for details.
Parameters: bsl_options (dict) – A dictionary of options for the baseline estimates computation. See Baselines estimates configuration for accepted options.
kNN inspired algorithms¶
These are algorithms that are directly derived from a basic nearest neighbors approach.
Note
For each of these algorithms, the actual number of neighbors that are
aggregated to compute an estimation is necessarily less than or equal to
\(k\). First, there might just not exist enough neighbors and second, the
sets \(N_i^k(u)\) and \(N_u^k(i)\) only include neighbors for which
the similarity measure is positive. It would make no sense to aggregate
ratings from users (or items) that are negatively correlated. For a given
prediction, the actual number of neighbors can be retrieved in the
'actual_k'
field of the details
dictionary of the prediction
.
You may want to read the User Guide
on how to configure the sim_options
parameter.

class
surprise.prediction_algorithms.knns.
KNNBasic
(k=40, min_k=1, sim_options={}, **kwargs)¶ Bases:
surprise.prediction_algorithms.knns.SymmetricAlgo
A basic collaborative filtering algorithm.
The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = \frac{ \sum\limits_{v \in N^k_i(u)} \text{sim}(u, v) \cdot r_{vi}} {\sum\limits_{v \in N^k_i(u)} \text{sim}(u, v)}\]or
\[\hat{r}_{ui} = \frac{ \sum\limits_{j \in N^k_u(i)} \text{sim}(i, j) \cdot r_{uj}} {\sum\limits_{j \in N^k_u(j)} \text{sim}(i, j)}\]depending on the
user_based
field of thesim_options
parameter.Parameters:  k (int) – The (max) number of neighbors to take into account for
aggregation (see this note). Default is
40
.  min_k (int) – The minimum number of neighbors to take into account for
aggregation. If there are not enough neighbors, the prediction is
set the the global mean of all ratings. Default is
1
.  sim_options (dict) – A dictionary of options for the similarity measure. See Similarity measure configuration for accepted options.
 k (int) – The (max) number of neighbors to take into account for
aggregation (see this note). Default is

class
surprise.prediction_algorithms.knns.
KNNWithMeans
(k=40, min_k=1, sim_options={}, **kwargs)¶ Bases:
surprise.prediction_algorithms.knns.SymmetricAlgo
A basic collaborative filtering algorithm, taking into account the mean ratings of each user.
The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = \mu_u + \frac{ \sum\limits_{v \in N^k_i(u)} \text{sim}(u, v) \cdot (r_{vi}  \mu_v)} {\sum\limits_{v \in N^k_i(u)} \text{sim}(u, v)}\]or
\[\hat{r}_{ui} = \mu_i + \frac{ \sum\limits_{j \in N^k_u(i)} \text{sim}(i, j) \cdot (r_{uj}  \mu_j)} {\sum\limits_{j \in N^k_u(i)} \text{sim}(i, j)}\]depending on the
user_based
field of thesim_options
parameter.Parameters:  k (int) – The (max) number of neighbors to take into account for
aggregation (see this note). Default is
40
.  min_k (int) – The minimum number of neighbors to take into account for
aggregation. If there are not enough neighbors, the neighbor
aggregation is set to zero (so the prediction ends up being
equivalent to the mean \(\mu_u\) or \(\mu_i\)). Default is
1
.  sim_options (dict) – A dictionary of options for the similarity measure. See Similarity measure configuration for accepted options.
 k (int) – The (max) number of neighbors to take into account for
aggregation (see this note). Default is

class
surprise.prediction_algorithms.knns.
KNNBaseline
(k=40, min_k=1, sim_options={}, bsl_options={})¶ Bases:
surprise.prediction_algorithms.knns.SymmetricAlgo
A basic collaborative filtering algorithm taking into account a baseline rating.
The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = b_{ui} + \frac{ \sum\limits_{v \in N^k_i(u)} \text{sim}(u, v) \cdot (r_{vi}  b_{vi})} {\sum\limits_{v \in N^k_i(u)} \text{sim}(u, v)}\]or
\[\hat{r}_{ui} = b_{ui} + \frac{ \sum\limits_{j \in N^k_u(i)} \text{sim}(i, j) \cdot (r_{uj}  b_{uj})} {\sum\limits_{j \in N^k_u(j)} \text{sim}(i, j)}\]depending on the
user_based
field of thesim_options
parameter. For the best predictions, use thepearson_baseline
similarity measure.This algorithm corresponds to formula (3), section 2.2 of [Kor10].
Parameters:  k (int) – The (max) number of neighbors to take into account for
aggregation (see this note). Default is
40
.  min_k (int) – The minimum number of neighbors to take into account for
aggregation. If there are not enough neighbors, the neighbor
aggregation is set to zero (so the prediction ends up being
equivalent to the baseline). Default is
1
.  sim_options (dict) – A dictionary of options for the similarity
measure. See Similarity measure configuration for accepted
options. It is recommended to use the
pearson_baseline
similarity measure.  bsl_options (dict) – A dictionary of options for the baseline estimates computation. See Baselines estimates configuration for accepted options.
 k (int) – The (max) number of neighbors to take into account for
aggregation (see this note). Default is
Matrix Factorizationbased algorithms¶

class
surprise.prediction_algorithms.matrix_factorization.
SVD
¶ Bases:
surprise.prediction_algorithms.algo_base.AlgoBase
The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize. When baselines are not used, this is equivalent to Probabilistic Matrix Factorization [SM08] (see note below).
The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = \mu + b_u + b_i + q_i^Tp_u\]If user \(u\) is unknown, then the bias \(b_u\) and the factors \(p_u\) are assumed to be zero. The same applies for item \(i\) with \(b_i\) and \(q_i\).
For details, see equation (5) from [KBV09]. See also [RRSK10], section 5.3.1.
To estimate all the unknown, we minimize the following regularized squared error:
\[\sum_{r_{ui} \in R_{train}} \left(r_{ui}  \hat{r}_{ui} \right)^2 + \lambda\left(b_i^2 + b_u^2 + q_i^2 + p_u^2\right)\]The minimization is performed by a very straightforward stochastic gradient descent:
\[\begin{split}b_u &\leftarrow b_u &+ \gamma (e_{ui}  \lambda b_u)\\ b_i &\leftarrow b_i &+ \gamma (e_{ui}  \lambda b_i)\\ p_u &\leftarrow p_u &+ \gamma (e_{ui} \cdot q_i  \lambda p_u)\\ q_i &\leftarrow q_i &+ \gamma (e_{ui} \cdot p_u  \lambda q_i)\end{split}\]where \(e_{ui} = r_{ui}  \hat{r}_{ui}\). These steps are performed over all the ratings of the trainset and repeated
n_epochs
times. Baselines are initialized to0
. User and item factors are randomly initialized according to a normal distribution, which can be tuned using theinit_mean
andinit_std_dev
parameters.You also have control over the learning rate \(\gamma\) and the regularization term \(\lambda\). Both can be different for each kind of parameter (see below). By default, learning rates are set to
0.005
and regularization terms are set to0.02
.Note
You can choose to use an unbiased version of this algorithm, simply predicting:
\[\hat{r}_{ui} = q_i^Tp_u\]This is equivalent to Probabilistic Matrix Factorization ([SM08], section 2) and can be achieved by setting the
biased
parameter toFalse
.Parameters:  n_factors – The number of factors. Default is
100
.  n_epochs – The number of iteration of the SGD procedure. Default is
20
.  biased (bool) – Whether to use baselines (or biases). See note above. Default is
True
.  init_mean – The mean of the normal distribution for factor vectors
initialization. Default is
0
.  init_std_dev – The standard deviation of the normal distribution for
factor vectors initialization. Default is
0.1
.  lr_all – The learning rate for all parameters. Default is
0.005
.  reg_all – The regularization term for all parameters. Default is
0.02
.  lr_bu – The learning rate for \(b_u\). Takes precedence over
lr_all
if set. Default isNone
.  lr_bi – The learning rate for \(b_i\). Takes precedence over
lr_all
if set. Default isNone
.  lr_pu – The learning rate for \(p_u\). Takes precedence over
lr_all
if set. Default isNone
.  lr_qi – The learning rate for \(q_i\). Takes precedence over
lr_all
if set. Default isNone
.  reg_bu – The regularization term for \(b_u\). Takes precedence
over
reg_all
if set. Default isNone
.  reg_bi – The regularization term for \(b_i\). Takes precedence
over
reg_all
if set. Default isNone
.  reg_pu – The regularization term for \(p_u\). Takes precedence
over
reg_all
if set. Default isNone
.  reg_qi – The regularization term for \(q_i\). Takes precedence
over
reg_all
if set. Default isNone
.  verbose – If
True
, prints the current epoch. Default isFalse
.
 n_factors – The number of factors. Default is

class
surprise.prediction_algorithms.matrix_factorization.
SVDpp
¶ Bases:
surprise.prediction_algorithms.algo_base.AlgoBase
The SVD++ algorithm, an extension of
SVD
taking into account implicit ratings.The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = \mu + b_u + b_i + q_i^T\left(p_u + I_u^{\frac{1}{2}} \sum_{j \in I_u}y_j\right)\]Where the \(y_j\) terms are a new set of item factors that capture implicit ratings. Here, an implicit rating describes the fact that a user \(u\) rated an item \(j\), regardless of the rating value.
If user \(u\) is unknown, then the bias \(b_u\) and the factors \(p_u\) are assumed to be zero. The same applies for item \(i\) with \(b_i\), \(q_i\) and \(y_i\).
For details, see section 4 of [Kor08]. See also [RRSK10], section 5.3.1.
Just as for
SVD
, the parameters are learned using a SGD on the regularized squared error objective.Baselines are initialized to
0
. User and item factors are randomly initialized according to a normal distribution, which can be tuned using theinit_mean
andinit_std_dev
parameters.You have control over the learning rate \(\gamma\) and the regularization term \(\lambda\). Both can be different for each kind of parameter (see below). By default, learning rates are set to
0.005
and regularization terms are set to0.02
.Parameters:  n_factors – The number of factors. Default is
20
.  n_epochs – The number of iteration of the SGD procedure. Default is
20
.  init_mean – The mean of the normal distribution for factor vectors
initialization. Default is
0
.  init_std_dev – The standard deviation of the normal distribution for
factor vectors initialization. Default is
0.1
.  lr_all – The learning rate for all parameters. Default is
0.007
.  reg_all – The regularization term for all parameters. Default is
0.02
.  lr_bu – The learning rate for \(b_u\). Takes precedence over
lr_all
if set. Default isNone
.  lr_bi – The learning rate for \(b_i\). Takes precedence over
lr_all
if set. Default isNone
.  lr_pu – The learning rate for \(p_u\). Takes precedence over
lr_all
if set. Default isNone
.  lr_qi – The learning rate for \(q_i\). Takes precedence over
lr_all
if set. Default isNone
.  lr_yj – The learning rate for \(y_j\). Takes precedence over
lr_all
if set. Default isNone
.  reg_bu – The regularization term for \(b_u\). Takes precedence
over
reg_all
if set. Default isNone
.  reg_bi – The regularization term for \(b_i\). Takes precedence
over
reg_all
if set. Default isNone
.  reg_pu – The regularization term for \(p_u\). Takes precedence
over
reg_all
if set. Default isNone
.  reg_qi – The regularization term for \(q_i\). Takes precedence
over
reg_all
if set. Default isNone
.  reg_yj – The regularization term for \(y_j\). Takes precedence
over
reg_all
if set. Default isNone
.  verbose – If
True
, prints the current epoch. Default isFalse
.
 n_factors – The number of factors. Default is

class
surprise.prediction_algorithms.matrix_factorization.
NMF
¶ Bases:
surprise.prediction_algorithms.algo_base.AlgoBase
A collaborative filtering algorithm based on Nonnegative Matrix Factorization.
This algorithm is very similar to
SVD
. The prediction \(\hat{r}_{ui}\) is set as:\[\hat{r}_{ui} = q_i^Tp_u,\]where user and item factors are kept positive. Our implementation follows that suggested in [LZXZ14], which is equivalent to [ZWFM96] in its nonregularized form. Both are direct applications of NMF for dense matrices [LS01].
The optimization procedure is a (regularized) stochastic gradient descent with a specific choice of step size that ensures nonnegativity of factors, provided that their initial values are also positive.
At each step of the SGD procedure, the factors \(f\) or user \(u\) and item \(i\) are updated as follows:
\[\begin{split}p_{uf} &\leftarrow p_{uf} &\cdot \frac{\sum_{i \in I_u} q_{if} \cdot r_{ui}}{\sum_{i \in I_u} q_{if} \cdot \hat{r_{ui}} + \lambda_u I_u p_{uf}}\\ q_{if} &\leftarrow q_{if} &\cdot \frac{\sum_{u \in U_i} p_{uf} \cdot r_{ui}}{\sum_{u \in U_i} p_{uf} \cdot \hat{r_{ui}} + \lambda_i U_i q_{if}}\\\end{split}\]where \(\lambda_u\) and \(\lambda_i\) are regularization parameters.
This algorithm is highly dependent on initial values. User and item factors are uniformly initialized between
init_low
andinit_high
. Change them at your own risks!A biased version is available by setting the
biased
parameter toTrue
. In this case, the prediction is set as\[\hat{r}_{ui} = \mu + b_u + b_i + q_i^Tp_u,\]still ensuring positive factors. Baselines are optimized in the same way as in the
SVD
algorithm. While yielding better accuracy, the biased version seems highly prone to overfitting so you may want to reduce the number of factors (or increase regularization).Parameters:  n_factors – The number of factors. Default is
15
.  n_epochs – The number of iteration of the SGD procedure. Default is
50
.  biased (bool) – Whether to use baselines (or biases). Default is
False
.  reg_pu – The regularization term for users \(\lambda_u\). Default is
0.06
.  reg_qi – The regularization term for items \(\lambda_i\). Default is
0.06
.  reg_bu – The regularization term for \(b_u\). Only relevant for
biased version. Default is
0.02
.  reg_bi – The regularization term for \(b_i\). Only relevant for
biased version. Default is
0.02
.  lr_bu – The learning rate for \(b_u\). Only relevant for biased
version. Default is
0.005
.  lr_bi – The learning rate for \(b_i\). Only relevant for biased
version. Default is
0.005
.  init_low – Lower bound for random initialization of factors. Must be
greater than
0
to ensure nonnegative factors. Default is0
.  init_high – Higher bound for random initialization of factors. Default
is
1
.  verbose – If
True
, prints the current epoch. Default isFalse
.
 n_factors – The number of factors. Default is
Slope One¶

class
surprise.prediction_algorithms.slope_one.
SlopeOne
¶ Bases:
surprise.prediction_algorithms.algo_base.AlgoBase
A simple yet accurate collaborative filtering algorithm.
This is a straightforward implementation of the SlopeOne algorithm [LM07].
The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = \mu_u + \frac{1}{ R_i(u)} \sum\limits_{j \in R_i(u)} \text{dev}(i, j),\]where \(R_i(u)\) is the set of relevant items, i.e. the set of items \(j\) rated by \(u\) that also have at least one common user with \(i\). \(\text{dev}_(i, j)\) is defined as the average difference between the ratings of \(i\) and those of \(j\):
\[\text{dev}(i, j) = \frac{1}{ U_{ij}}\sum\limits_{u \in U_{ij}} r_{ui}  r_{uj}\]
Coclustering¶

class
surprise.prediction_algorithms.co_clustering.
CoClustering
¶ Bases:
surprise.prediction_algorithms.algo_base.AlgoBase
A collaborative filtering algorithm based on coclustering.
This is a straightforward implementation of [GM05].
Basically, users and items are assigned some clusters \(C_u\), \(C_i\), and some coclusters \(C_{ui}\).
The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = \overline{C_{ui}} + (\mu_u  \overline{C_u}) + (\mu_i  \overline{C_i}),\]where \(\overline{C_{ui}}\) is the average rating of cocluster \(C_{ui}\), \(\overline{C_u}\) is the average rating of \(u\)‘s cluster, and \(\overline{C_i}\) is the average rating of \(i\)‘s cluster. If the user is unknown, the prediction is \(\hat{r}_{ui} = \mu_i\). If the item is unknown, the prediction is \(\hat{r}_{ui} = \mu_u\). If both the user and the item are unknown, the prediction is \(\hat{r}_{ui} = \mu\).
Clusters are assigned using a straightforward optimization method, much like kmeans.
Parameters:  n_cltr_u (int) – Number of user clusters. Default is
3
.  n_cltr_i (int) – Number of item clusters. Default is
3
.  n_epochs (int) – Number of iteration of the optimization loop. Default is
20
.  verbose (bool) – If True, the current epoch will be printed. Default is
False
.
 n_cltr_u (int) – Number of user clusters. Default is
similarities module¶
The similarities
module includes tools to
compute similarity metrics between users or items. You may need to refer to the
Notation standards, References page. See also the
Similarity measure configuration section of the User Guide.
Available similarity measures:
cosine 
Compute the cosine similarity between all pairs of users (or items). 
msd 
Compute the Mean Squared Difference similarity between all pairs of users (or items). 
pearson 
Compute the Pearson correlation coefficient between all pairs of users (or items). 
pearson_baseline 
Compute the (shrunk) Pearson correlation coefficient between all pairs of users (or items) using baselines for centering instead of means. 

surprise.similarities.
cosine
()¶ Compute the cosine similarity between all pairs of users (or items).
Only common users (or items) are taken into account. The cosine similarity is defined as:
\[\text{cosine_sim}(u, v) = \frac{ \sum\limits_{i \in I_{uv}} r_{ui} \cdot r_{vi}} {\sqrt{\sum\limits_{i \in I_{uv}} r_{ui}^2} \cdot \sqrt{\sum\limits_{i \in I_{uv}} r_{vi}^2} }\]or
\[\text{cosine_sim}(i, j) = \frac{ \sum\limits_{u \in U_{ij}} r_{ui} \cdot r_{uj}} {\sqrt{\sum\limits_{u \in U_{ij}} r_{ui}^2} \cdot \sqrt{\sum\limits_{u \in U_{ij}} r_{uj}^2} }\]depending on the
user_based
field ofsim_options
(see Similarity measure configuration).For details on cosine similarity, see on Wikipedia.

surprise.similarities.
msd
()¶ Compute the Mean Squared Difference similarity between all pairs of users (or items).
Only common users (or items) are taken into account. The Mean Squared Difference is defined as:
\[\text{msd}(u, v) = \frac{1}{I_{uv}} \cdot \sum\limits_{i \in I_{uv}} (r_{ui}  r_{vi})^2\]or
\[\text{msd}(i, j) = \frac{1}{U_{ij}} \cdot \sum\limits_{u \in U_{ij}} (r_{ui}  r_{uj})^2\]depending on the
user_based
field ofsim_options
(see Similarity measure configuration).The MSDsimilarity is then defined as:
\[\begin{split}\text{msd_sim}(u, v) &= \frac{1}{\text{msd}(u, v) + 1}\\ \text{msd_sim}(i, j) &= \frac{1}{\text{msd}(i, j) + 1}\end{split}\]The \(+ 1\) term is just here to avoid dividing by zero.
For details on MSD, see third definition on Wikipedia.

surprise.similarities.
pearson
()¶ Compute the Pearson correlation coefficient between all pairs of users (or items).
Only common users (or items) are taken into account. The Pearson correlation coefficient can be seen as a meancentered cosine similarity, and is defined as:
\[\text{pearson_sim}(u, v) = \frac{ \sum\limits_{i \in I_{uv}} (r_{ui}  \mu_u) \cdot (r_{vi}  \mu_{v})} {\sqrt{\sum\limits_{i \in I_{uv}} (r_{ui}  \mu_u)^2} \cdot \sqrt{\sum\limits_{i \in I_{uv}} (r_{vi}  \mu_{v})^2} }\]or
\[\text{pearson_sim}(i, j) = \frac{ \sum\limits_{u \in U_{ij}} (r_{ui}  \mu_i) \cdot (r_{uj}  \mu_{j})} {\sqrt{\sum\limits_{u \in U_{ij}} (r_{ui}  \mu_i)^2} \cdot \sqrt{\sum\limits_{u \in U_{ij}} (r_{uj}  \mu_{j})^2} }\]depending on the
user_based
field ofsim_options
(see Similarity measure configuration).Note: if there are no common users or items, similarity will be 0 (and not 1).
For details on Pearson coefficient, see Wikipedia.

surprise.similarities.
pearson_baseline
()¶ Compute the (shrunk) Pearson correlation coefficient between all pairs of users (or items) using baselines for centering instead of means.
The shrinkage parameter helps to avoid overfitting when only few ratings are available (see Similarity measure configuration).
The Pearsonbaseline correlation coefficient is defined as:
\[\text{pearson_baseline_sim}(u, v) = \hat{\rho}_{uv} = \frac{ \sum\limits_{i \in I_{uv}} (r_{ui}  b_{ui}) \cdot (r_{vi}  b_{vi})} {\sqrt{\sum\limits_{i \in I_{uv}} (r_{ui}  b_{ui})^2} \cdot \sqrt{\sum\limits_{i \in I_{uv}} (r_{vi}  b_{vi})^2}}\]or
\[\text{pearson_baseline_sim}(i, j) = \hat{\rho}_{ij} = \frac{ \sum\limits_{u \in U_{ij}} (r_{ui}  b_{ui}) \cdot (r_{uj}  b_{uj})} {\sqrt{\sum\limits_{u \in U_{ij}} (r_{ui}  b_{ui})^2} \cdot \sqrt{\sum\limits_{u \in U_{ij}} (r_{uj}  b_{uj})^2}}\]The shrunk Pearsonbaseline correlation coefficient is then defined as:
\[ \begin{align}\begin{aligned}\text{pearson_baseline_shrunk_sim}(u, v) &= \frac{I_{uv}  1} {I_{uv}  1 + \text{shrinkage}} \cdot \hat{\rho}_{uv}\\\text{pearson_baseline_shrunk_sim}(i, j) &= \frac{U_{ij}  1} {U_{ij}  1 + \text{shrinkage}} \cdot \hat{\rho}_{ij}\end{aligned}\end{align} \]Obviously, a shrinkage parameter of 0 amounts to no shrinkage at all.
Note: here again, if there are no common users/items, similarity will be 0 (and not 1).
Motivations for such a similarity measure can be found on the Recommender System Handbook, section 5.4.1.
accuracy module¶
The surprise.accuracy
module provides with tools for computing accuracy
metrics on a set of predictions.
Available accuracy metrics:
rmse 
Compute RMSE (Root Mean Squared Error). 
mae 
Compute MAE (Mean Absolute Error). 
fcp 
Compute FCP (Fraction of Concordant Pairs). 

surprise.accuracy.
fcp
(predictions, verbose=True)¶ Compute FCP (Fraction of Concordant Pairs).
Computed as described in paper Collaborative Filtering on Ordinal User Feedback by Koren and Sill, section 5.2.
Parameters:  predictions (
list
ofPrediction
) – A list of predictions, as returned by thetest()
method.  verbose – If True, will print computed value. Default is
True
.
Returns: The Fraction of Concordant Pairs.
Raises: ValueError
– Whenpredictions
is empty. predictions (

surprise.accuracy.
mae
(predictions, verbose=True)¶ Compute MAE (Mean Absolute Error).
\[\text{MAE} = \frac{1}{\hat{R}} \sum_{\hat{r}_{ui} \in \hat{R}}r_{ui}  \hat{r}_{ui}\]Parameters:  predictions (
list
ofPrediction
) – A list of predictions, as returned by thetest()
method.  verbose – If True, will print computed value. Default is
True
.
Returns: The Mean Absolute Error of predictions.
Raises: ValueError
– Whenpredictions
is empty. predictions (

surprise.accuracy.
rmse
(predictions, verbose=True)¶ Compute RMSE (Root Mean Squared Error).
\[\text{RMSE} = \sqrt{\frac{1}{\hat{R}} \sum_{\hat{r}_{ui} \in \hat{R}}(r_{ui}  \hat{r}_{ui})^2}.\]Parameters:  predictions (
list
ofPrediction
) – A list of predictions, as returned by thetest()
method.  verbose – If True, will print computed value. Default is
True
.
Returns: The Root Mean Squared Error of predictions.
Raises: ValueError
– Whenpredictions
is empty. predictions (
dataset module¶
the dataset
module defines some tools for managing datasets.
Users may use both builtin and userdefined datasets (see the Getting Started page for examples). Right now, three builtin datasets are available:
 The movielens100k dataset.
 The movielens1m dataset.
 The Jester dataset 2.
Builtin datasets can all be loaded (or downloaded if you haven’t already)
using the Dataset.load_builtin()
method. For each builtin dataset,
Surprise also provide predefined readers
which are useful if
you want to use a custom dataset that has the same format as a builtin one.
Summary:
Dataset.load_builtin 
Load a builtin dataset. 
Dataset.load_from_file 
Load a dataset from a (custom) file. 
Dataset.load_from_folds 
Load a dataset where folds (for crossvalidation) are predefined by some files. 
Dataset.folds 
Generator function to iterate over the folds of the Dataset. 
DatasetAutoFolds.split 
Split the dataset into folds for future crossvalidation. 
Reader 
The Reader class is used to parse a file containing ratings. 
Trainset 
A trainset contains all useful data that constitutes a training set. 

class
surprise.dataset.
Dataset
(reader)¶ Base class for loading datasets.
Note that you should never instantiate the
Dataset
class directly (same goes for its derived classes), but instead use one of the three available methods for loading datasets.
folds
()¶ Generator function to iterate over the folds of the Dataset.
See User Guide for usage.
Yields: tuple – Trainset
and testset of current fold.

classmethod
load_builtin
(name=u'ml100k')¶ Load a builtin dataset.
If the dataset has not already been loaded, it will be downloaded and saved. You will have to split your dataset using the
split
method. See an example in the User Guide.Parameters: name ( string
) – The name of the builtin dataset to load. Accepted values are ‘ml100k’, ‘ml1m’, and ‘jester’. Default is ‘ml100k’.Returns: A Dataset
object.Raises: ValueError
– If thename
parameter is incorrect.

classmethod
load_from_file
(file_path, reader)¶ Load a dataset from a (custom) file.
Use this if you want to use a custom dataset and all of the ratings are stored in one file. You will have to split your dataset using the
split
method. See an example in the User Guide.Parameters:  file_path (
string
) – The path to the file containing ratings.  reader (
Reader
) – A reader to read the file.
 file_path (

classmethod
load_from_folds
(folds_files, reader)¶ Load a dataset where folds (for crossvalidation) are predefined by some files.
The purpose of this method is to cover a common use case where a dataset is already split into predefined folds, such as the movielens100k dataset which defines files u1.base, u1.test, u2.base, u2.test, etc... It can also be used when you don’t want to perform crossvalidation but still want to specify your training and testing data (which comes down to 1fold crossvalidation anyway). See an example in the User Guide.
Parameters:  folds_files (
iterable
oftuples
) – The list of the folds. A fold is a tuple of the form(path_to_train_file, path_to_test_file)
.  reader (
Reader
) – A reader to read the files.
 folds_files (


class
surprise.dataset.
DatasetAutoFolds
(ratings_file=None, reader=None)¶ A derived class from
Dataset
for which folds (for crossvalidation) are not predefined. (Or for when there are no folds at all).
build_full_trainset
()¶ Do not split the dataset into folds and just return a trainset as is, built from the whole dataset.
User can then query for predictions, as shown in the User Guide.
Returns: The Trainset
.

split
(n_folds=5, shuffle=True)¶ Split the dataset into folds for future crossvalidation.
If you forget to call
split()
, the dataset will be automatically shuffled and split for 5folds crossvalidation.You can obtain repeatable splits over your all your experiments by seeding the RNG:
import random random.seed(my_seed) # call this before you call split!
Parameters:  n_folds (
int
) – The number of folds.  shuffle (
bool
) – Whether to shuffle ratings before splitting. IfFalse
, folds will always be the same each time the experiment is run. Default isTrue
.
 n_folds (


class
surprise.dataset.
Reader
(name=None, line_format=None, sep=None, rating_scale=(1, 5), skip_lines=0)¶ The Reader class is used to parse a file containing ratings.
Such a file is assumed to specify only one rating per line, and each line needs to respect the following structure:
user ; item ; rating ; [timestamp]
where the order of the fields and the separator (here ‘;’) may be arbitrarily defined (see below). brackets indicate that the timestamp field is optional.
Parameters:  name (
string
, optional) – If specified, a Reader for one of the builtin datasets is returned and any other parameter is ignored. Accepted values are ‘ml100k’, ‘ml1m’, and ‘jester’. Default isNone
.  line_format (
string
) – The fields names, in the order at which they are encountered on a line. Example:'item user rating'
.  sep (char) – the separator between fields. Example :
';'
.  rating_scale (
tuple
, optional) – The rating scale used for every rating. Default is(1, 5)
.  skip_lines (
int
, optional) – Number of lines to skip at the beginning of the file. Default is0
.
 name (

class
surprise.dataset.
Trainset
(ur, ir, n_users, n_items, n_ratings, rating_scale, offset, raw2inner_id_users, raw2inner_id_items)¶ A trainset contains all useful data that constitutes a training set.
It is used by the
train()
method of every prediction algorithm. You should not try to built such an object on your own but rather use theDataset.folds()
method or theDatasetAutoFolds.build_full_trainset()
method.
ur
¶ defaultdict
oflist
– The users ratings. This is a dictionary containing lists of tuples of the form(item_inner_id, rating)
. The keys are user inner ids.

ir
¶ defaultdict
oflist
– The items ratings. This is a dictionary containing lists of tuples of the form(user_inner_id, rating)
. The keys are item inner ids.

n_users
¶ Total number of users \(U\).

n_items
¶ Total number of items \(I\).

n_ratings
¶ Total number of ratings \(R_{train}\).

rating_scale
¶ tuple – The minimum and maximal rating of the rating scale.

global_mean
¶ The mean of all ratings \(\mu\).

all_items
()¶ Generator function to iterate over all items.
Yields: Inner id of items.

all_ratings
()¶ Generator function to iterate over all ratings.
Yields: A tuple (uid, iid, rating)
where ids are inner ids.

all_users
()¶ Generator function to iterate over all users.
Yields: Inner id of users.

build_anti_testset
()¶ Return a list of ratings that can be used as a testset in the
test()
method.The ratings are all the ratings that are not in the trainset, i.e. all the ratings \(r_{ui}\) where the user \(u\) is known, the item \(i\) is known, but the rating \(r_{ui}\) is not in the trainset. As \(r_{ui}\) is unknown, it is assumed to be equal to the mean of all ratings
global_mean
.

build_testset
()¶ Return a list of ratings that can be used as a testset in the
test()
method.The ratings are all the ratings that are in the trainset, i.e. all the ratings returned by the
all_ratings()
generator. This is useful in cases where you want to to test your algorithm on the trainset.

global_mean
Return the mean of all ratings.
It’s only computed once.

knows_item
(iid)¶ Indicate if the item is part of the trainset.
An item is part of the trainset if the item was rated at least once.
Parameters: iid (int) – The (inner) item id. See this note. Returns: True
if item is part of the trainset, elseFalse
.

knows_user
(uid)¶ Indicate if the user is part of the trainset.
A user is part of the trainset if the user has at least one rating.
Parameters: uid (int) – The (inner) user id. See this note. Returns: True
if user is part of the trainset, elseFalse
.

to_inner_iid
(riid)¶ Convert an item raw id to an inner id.
See this note.
Parameters: riid (str) – The item raw id. Returns: The item inner id. Return type: int Raises: ValueError
– When item is not part of the trainset.

to_inner_uid
(ruid)¶ Convert a user raw id to an inner id.
See this note.
Parameters: ruid (str) – The user raw id. Returns: The user inner id. Return type: int Raises: ValueError
– When user is not part of the trainset.

evaluate module¶
The evaluate
module defines the evaluate()
function and
GridSearch
class

class
surprise.evaluate.
GridSearch
(algo_class, param_grid, measures=[u'rmse', u'mae'], verbose=1)¶ The
GridSearch
class, used to evaluate the performance of an algorithm on various combinations of parameters, and extract the best combination. It is analogous to GridSearchCV from scikitlearn.See User Guide for usage.
Parameters:  algo_class (
AlgoBase
) – A class object of of the algorithm to evaluate.  param_grid (dict) – The dictionary has algo_class parameters as keys (string) and list of parameters as the desired values to try. All combinations will be evaluated with desired algorithm.
 measures (list of string) – The performance measures to compute. Allowed names are function
names as defined in the
accuracy
module. Default is['rmse', 'mae']
.  verbose (int) – Level of verbosity. If
0
, nothing is printed. If1
, accuracy measures for each parameters combination are printed, with combination values. If2
, folds accuracy values are also printed. Default is1
.

cv_results
¶ dict of arrays – A dict that contains all parameters and accuracy information for each combination. Can be imported into a pandas DataFrame.

best_estimator
¶ dict of AlgoBase – Using an accuracy measure as key, get the estimator that gave the best accuracy results for the chosen measure.

best_score
¶ dict of floats – Using an accuracy measure as key, get the best score achieved for that measure.

best_params
¶ dict of dicts – Using an accuracy measure as key, get the parameters combination that gave the best accuracy results for the chosen measure.

best_index
¶ dict of ints – Using an accuracy measure as key, get the index that can be used with cv_results_ that achieved the highest accuracy for that measure.
 algo_class (

surprise.evaluate.
evaluate
(algo, data, measures=[u'rmse', u'mae'], with_dump=False, dump_dir=None, verbose=1)¶ Evaluate the performance of the algorithm on given data.
Depending on the nature of the
data
parameter, it may or may not perform cross validation.Parameters:  algo (
AlgoBase
) – The algorithm to evaluate.  data (
Dataset
) – The dataset on which to evaluate the algorithm.  measures (list of string) – The performance measures to compute. Allowed
names are function names as defined in the
accuracy
module. Default is['rmse', 'mae']
.  with_dump (bool) – If True, the predictions and the algorithm will be
dumped for later further analysis at each fold (see FAQ). The file names will be set as:
'<date><algorithm name><fold number>'
. Default isFalse
.  dump_dir (str) – The directory where to dump to files. Default is
'~/.surprise_data/dumps/'
.  verbose (int) – Level of verbosity. If 0, nothing is printed. If 1 (default), accuracy measures for each folds are printed, with a final summary. If 2, every prediction is printed.
Returns: A dictionary containing measures as keys and lists as values. Each list contains one entry per fold.
 algo (
dump module¶
The dump
module defines the dump()
function.

surprise.dump.
dump
(file_name, predictions=None, algo=None, verbose=0)¶ A basic wrapper around Pickle to serialize a list of prediction and/or an algorithm on drive.
What is dumped is a dictionary with keys
'predictions'
and'algo'
.Parameters:  file_name (str) – The name (with full path) specifying where to dump the predictions.
 predictions (list of
Prediction
) – The predictions to dump.  algo (
Algorithm
, optional) – The algorithm to dump.  verbose (int) – Level of verbosity. If
1
, then a message indicates that the dumping went successfully. Default is0
.

surprise.dump.
load
(file_name)¶ A basic wrapper around Pickle to deserialize a list of prediction and/or an algorithm that were dumped on drive using
dump()
.Parameters: file_name (str) – The path of the file from which the algorithm is to be loaded Returns: A tuple (predictions, algo)
wherepredictions
is a list ofPrediction
objects andalgo
is anAlgorithm
object. Depending on what was dumped, some of these may beNone
.