OpenRec documentation¶
OpenRec is an open-source and modular library for neural network-inspired recommendation algorithms
modules package¶
Module¶
-
class
openrec.legacy.modules.
Module
(train=True, l2_reg=None, scope=None, reuse=False)¶ The Module class is the OpenRec base abstraction for modules. A module belongs to one of three categories (extractions, fusions, or interactions), depending on its functionality (see [1] for details).
Parameters: - train (bool, optional) – An indicator for training or serving phase.
- l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
- scope (str, optional) – Scope for module variables.
- reuse (bool, optional) – Whether or not to reuse module variables.
Notes
The module abstraction is used to construct recommenders. It should be extended by all module implementations. During initialization, the functions self._build_shared_graph, self._build_training_graph, and self._build_serving_graph are called as follows.
A module implementation should follow the two steps below:
- Build computational graphs. Override the self._build_shared_graph(), self._build_training_graph(), and/or self._build_serving_graph() functions to build the training/serving computational graphs.
- Define a loss and an output list. Define a loss (self._loss) to be included in training and an output list of Tensorflow tensors (self._outputs).
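The two steps above can be sketched without Tensorflow. The stand-in base class below is hypothetical (it is not openrec.legacy.modules.Module). It assumes that the shared graph is built first, followed by either the training or the serving graph depending on train, and it uses plain Python numbers in place of tensors.

```python
class SketchModule:
    """Hypothetical stand-in for the Module base class (not the OpenRec code)."""

    def __init__(self, train=True):
        self._train = train
        self._loss = 0.0
        self._outputs = []
        # Assumed call order: shared graph first, then the phase-specific graph.
        self._build_shared_graph()
        if self._train:
            self._build_training_graph()
        else:
            self._build_serving_graph()

    def _build_shared_graph(self):
        pass

    def _build_training_graph(self):
        pass

    def _build_serving_graph(self):
        pass

    def get_loss(self):
        return self._loss

    def get_outputs(self):
        return self._outputs


class SquaredError(SketchModule):
    """Toy module: outputs a prediction and, in training, a squared-error loss."""

    def __init__(self, prediction, label=None, train=True):
        self._prediction = prediction
        self._label = label
        super().__init__(train=train)

    def _build_shared_graph(self):
        # The output list is available in both phases.
        self._outputs = [self._prediction]

    def _build_training_graph(self):
        # The loss is only defined for training.
        self._loss = (self._prediction - self._label) ** 2


train_module = SquaredError(prediction=0.5, label=1.0, train=True)
serve_module = SquaredError(prediction=0.5, train=False)
print(train_module.get_loss(), serve_module.get_outputs())
```

Keeping the output definition in the shared graph means the serving instance exposes the same outputs without ever needing a label.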
References
[1] Yang, L., Bagdasaryan, E., Gruenstein, J., Hsieh, C., and Estrin, D., 2018, June. OpenRec: A Modular Framework for Extensible and Adaptable Recommendation Algorithms. In Proceedings of WSDM‘18, February 5-9, 2018, Marina Del Rey, CA, USA. -
_build_serving_graph
()¶ Build serving-specific computational graphs (may be overridden).
-
_build_shared_graph
()¶ Build shared computational graphs across training and serving (may be overridden).
-
_build_training_graph
()¶ Build training-specific computational graphs (may be overridden).
-
get_loss
()¶ Retrieve the training loss.
Returns: Training loss
Return type: float or Tensor
-
get_outputs
()¶ Retrieve the output list of Tensorflow tensors.
Returns: An output list of Tensorflow tensors
Return type: list
Extractions¶
Extraction¶
-
class
openrec.legacy.modules.extractions.
Extraction
(train=True, l2_reg=None, scope=None, reuse=False)¶ A direct inheritance of the Module.
Look Up¶
-
class
openrec.legacy.modules.extractions.
LookUp
(embed, ids=None, scope=None, reuse=False)¶ The LookUp module maps (embeds) input ids into fixed representations. The representations are not updated during training. The module outputs a tensor with shape shape(ids) + [embedding dimensionality].
Parameters: - embed (numpy array) – Fixed embedding matrix.
- ids (Tensorflow tensor, optional) – List of ids to retrieve embeddings. If None, the whole embedding matrix is returned.
- scope (str, optional) – Scope for module variables.
- reuse (bool, optional) – Whether or not to reuse module variables.
-
_build_shared_graph
()¶ Build shared computational graphs across training and serving (may be overridden).
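The output shape rule, shape(ids) + [embedding dimensionality], can be illustrated with nested Python lists standing in for tensors. The helper names lookup and shape are hypothetical and only serve this sketch; the module itself operates on Tensorflow tensors.

```python
def lookup(embed, ids):
    """Return embed[i] for every id, preserving the nesting of ids."""
    if ids is None:
        return embed  # no ids given: the whole embedding matrix is returned
    if isinstance(ids, int):
        return embed[ids]
    return [lookup(embed, i) for i in ids]


def shape(nested):
    """Shape of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


# Fixed embedding matrix: 4 unique ids, embedding dimensionality 3.
embed = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
ids = [[0, 2], [3, 1]]  # shape [2, 2]
out = lookup(embed, ids)
print(shape(ids), "+ [3] ->", shape(out))
```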
Identity Mapping¶
-
class
openrec.legacy.modules.extractions.
IdentityMapping
(value, scope=None, reuse=False)¶ The IdentityMapping module executes an identity function.
Parameters: - value (Tensorflow tensor) – Input tensor
- scope (str, optional) – Scope for module variables.
- reuse (bool, optional) – Whether or not to reuse module variables.
-
_build_shared_graph
()¶ Build shared computational graphs across training and serving (may be overridden).
Latent Factor¶
-
class
openrec.legacy.modules.extractions.
LatentFactor
(shape, init='normal', ids=None, l2_reg=None, scope=None, reuse=False)¶ The LatentFactor module maps (embeds) input ids into latent representations. The module outputs a tensor with shape shape(ids) + [embedding dimensionality].
Parameters: - shape (list) – Shape of the embedding matrix, i.e. [number of unique ids, embedding dimensionality].
- init (str, optional) – Embedding initialization. ‘zero’ or ‘normal’ (default).
- ids (Tensorflow tensor, optional) – List of ids to retrieve embeddings. If None, the whole embedding matrix is returned.
- l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
- scope (str, optional) – Scope for module variables.
- reuse (bool, optional) – Whether or not to reuse module variables.
-
_build_shared_graph
()¶ Build shared computational graphs across training and serving (may be overridden).
-
censor_l2_norm_op
(censor_id_list=None, max_norm=1)¶ Limit the norm of embeddings.
Parameters: - censor_id_list (list or Tensorflow tensor) – list of embeddings to censor (indexed by ids).
- max_norm (float, optional) – Maximum norm.
Returns: An operator for post-training execution.
Return type: Tensorflow operator
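The exact update performed by censor_l2_norm_op is not spelled out here. As a minimal sketch of one common way to limit embedding norms, the hypothetical helper below rescales each selected row whose L2 norm exceeds max_norm so that its norm becomes max_norm, and leaves all other rows untouched. It works on plain Python lists rather than Tensorflow variables.

```python
import math


def censor_l2_norm(embedding, censor_id_list, max_norm=1.0):
    """Rescale the selected rows in place so their L2 norm is at most max_norm."""
    for i in censor_id_list:
        row = embedding[i]
        norm = math.sqrt(sum(x * x for x in row))
        if norm > max_norm:
            scale = max_norm / norm
            embedding[i] = [x * scale for x in row]
    return embedding


embedding = [[3.0, 4.0], [0.3, 0.4], [6.0, 8.0]]
# Only rows 0 and 1 are censored; row 2 is not in the list and keeps its norm of 10.
censor_l2_norm(embedding, censor_id_list=[0, 1], max_norm=1.0)
print(embedding)
```

Restricting the update to the ids seen in the last batch avoids touching the whole embedding matrix after every training step.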
Multi Layer FC¶
-
class
openrec.legacy.modules.extractions.
MultiLayerFC
(in_tensor, dims, relu_in=False, relu_mid=True, relu_out=False, dropout_in=None, dropout_mid=None, dropout_out=None, bias_in=True, bias_mid=True, bias_out=True, batch_norm=False, train=True, l2_reg=None, scope=None, reuse=False)¶ The MultiLayerFC module implements multi-layer perceptrons with ReLU as non-linear activation functions. Each layer is often referred to as a fully-connected layer.
Parameters: - in_tensor (Tensorflow tensor) – An input tensor with shape [*, feature dimensionality].
- dims (list) – Specify the feature size of each layer’s outputs. For example, setting dims=[512, 256, 128] creates three fully-connected layers with output shapes [*, 512], [*, 256], and [*, 128], respectively.
- relu_in (bool, optional) – Whether or not to add ReLU to the input tensor.
- relu_mid (bool, optional) – Whether or not to add ReLU to the outputs of intermediate layers.
- relu_out (bool, optional) – Whether or not to add ReLU to the final output tensor.
- dropout_in (float, optional) – Dropout rate for the input tensor. If None, no dropout is used for the input tensor.
- dropout_mid (float, optional) – Dropout rate for the outputs of intermediate layers. If None, no dropout is used for the intermediate outputs.
- dropout_out (float, optional) – Dropout rate for the outputs of the final layer. If None, no dropout is used for the final outputs.
- bias_in (bool, optional) – Whether or not to add bias to the input tensor.
- bias_mid (bool, optional) – Whether or not to add bias to the outputs of intermediate layers.
- bias_out (bool, optional) – Whether or not to add bias to the final output tensor.
- batch_norm (bool, optional) – Whether or not to add batch normalization [1] to each layer’s outputs.
- train (bool, optional) – An indicator for training or serving phase.
- l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
- scope (str, optional) – Scope for module variables.
- reuse (bool, optional) – Whether or not to reuse module variables.
References
[1] Ioffe, S. and Szegedy, C., 2015, June. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456).
-
_build_shared_graph
()¶ Build shared computational graphs across training and serving (may be overridden).
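To make the roles of dims, relu_mid, and relu_out concrete, here is a minimal pure-Python sketch of a stack of fully-connected layers applied to a single feature vector. The functions dense and mlp_forward are hypothetical and are not the MultiLayerFC implementation; relu_in, dropout, the bias switches, and batch normalization are left out for brevity.

```python
def relu(vector):
    return [max(0.0, x) for x in vector]


def dense(vector, weights, bias):
    """One fully-connected layer; weights has shape [fan_in][fan_out]."""
    return [
        sum(vector[i] * weights[i][j] for i in range(len(vector))) + bias[j]
        for j in range(len(bias))
    ]


def mlp_forward(vector, layers, relu_mid=True, relu_out=False):
    """Apply each (weights, bias) pair; ReLU placement follows the two flags."""
    for index, (weights, bias) in enumerate(layers):
        vector = dense(vector, weights, bias)
        is_last = index == len(layers) - 1
        if (is_last and relu_out) or (not is_last and relu_mid):
            vector = relu(vector)
    return vector


# Two layers, i.e. dims=[2, 1] for a 2-dimensional input.
layers = [
    ([[1.0, -1.0], [1.0, -1.0]], [0.0, 0.0]),
    ([[1.0], [1.0]], [-3.0]),
]
print(mlp_forward([1.0, 1.0], layers))
print(mlp_forward([1.0, 1.0], layers, relu_out=True))
```

With relu_out left at False, the final layer can produce negative values, which is what a downstream loss expecting raw logits would need.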
SDAE¶
-
class
openrec.legacy.modules.extractions.
SDAE
(in_tensor, dims, dropout=None, l2_reconst=1.0, train=True, l2_reg=None, scope=None, reuse=False)¶ The SDAE module implements Stacked Denoising Autoencoders [bn]. It outputs SDAE’s bottleneck representations (i.e., the encoder outputs).
Parameters: - in_tensor (Tensorflow tensor) – An input tensor with shape [*, feature dimensionality]
- dims (list) – Specify the feature size of each encoding layer’s outputs. For example, setting dims=[512, 256, 128] creates a three-layer encoder with output shapes [*, 512], [*, 256], and [*, 128], and a two-layer decoder with output shapes [*, 256] and [*, 512].
- dropout (float, optional) – Dropout rate for the input tensor. If None, no dropout is used for the input tensor.
- l2_reconst (float, optional) – Weight for reconstruction loss.
- train (bool, optional) – An indicator for training or serving phase.
- l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
- scope (str, optional) – Scope for module variables.
- reuse (bool, optional) – Whether or not to reuse module variables.
References
[bn] Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y. and Manzagol, P.A., 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec), pp.3371-3408.
-
_build_shared_graph
()¶ Build shared computational graphs across training and serving (may be overridden).
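The relationship between dims and the encoder/decoder layer sizes described for the dims parameter can be written as a small helper. The function sdae_layer_sizes is hypothetical; it only mirrors the example given in the parameter description and says nothing about how the module builds its weights.

```python
def sdae_layer_sizes(dims):
    """Return (encoder output sizes, decoder output sizes) for a given dims list."""
    encoder = list(dims)
    # The decoder mirrors the encoder, skipping the bottleneck size itself.
    decoder = list(reversed(dims[:-1]))
    return encoder, decoder


encoder, decoder = sdae_layer_sizes([512, 256, 128])
print("encoder:", encoder, "decoder:", decoder, "bottleneck:", encoder[-1])
```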
TemporalLatentFactor¶
-
class
openrec.legacy.modules.extractions.
TemporalLatentFactor
(shape, mlp_dims, ids, init='normal', mlp_pretrain=True, l2_reg=None, train=True, scope=None, reuse=False)¶
-
_build_shared_graph
()¶ Build shared computational graphs across training and serving (may be overridden).
-
_build_training_graph
()¶ Build training-specific computational graphs (may be overridden).
-
forward_update_embeddings
(sess)¶ Retrieve update node.
-
pretrain_mlp_as_identity
(sess)¶
Fusions¶
Fusion¶
-
class
openrec.legacy.modules.fusions.
Fusion
(train=True, l2_reg=None, scope=None, reuse=False)¶ A direct inheritance of the Module.
Concat¶
-
class
openrec.legacy.modules.fusions.
Concat
(module_list, axis=1, scope=None, reuse=False)¶ The Concat module outputs the concatenation of the outputs from multiple modules.
Parameters: - module_list (list) – The list of modules.
- scope (str, optional) – Scope for module variables.
- reuse (bool, optional) – Whether or not to reuse module variables.
-
_build_shared_graph
()¶ Build shared computational graphs across training and serving (may be overridden).
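As a minimal illustration of concatenation along axis=1, the hypothetical helper below joins the rows of several 2-D outputs, represented as nested Python lists rather than Tensorflow tensors.

```python
def concat_axis1(outputs):
    """Concatenate 2-D outputs (lists of rows) along axis 1."""
    num_rows = len(outputs[0])
    return [sum((out[r] for out in outputs), []) for r in range(num_rows)]


user_repr = [[1.0, 2.0], [3.0, 4.0]]  # shape [2, 2]
extra_repr = [[5.0], [6.0]]           # shape [2, 1]
fused = concat_axis1([user_repr, extra_repr])
print(fused)  # shape [2, 3]
```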
Average¶
-
class
openrec.legacy.modules.fusions.
Average
(module_list, weight=1.0, scope=None, reuse=False)¶ The Average module outputs the element-wise average of the outputs from multiple modules.
Parameters: - module_list (list) – The list of modules.
- weight (float) – A value elementwise multiplied to module outputs.
- scope (str, optional) – Scope for module variables.
- reuse (bool, optional) – Whether or not to reuse module variables.
-
_build_shared_graph
()¶ Build shared computational graphs across training and serving (may be overridden).
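A sketch of an element-wise average over several module outputs follows. It assumes that weight simply scales the averaged result; the description of weight does not pin down the exact order of operations, so treat this as one plausible reading. The helper name weighted_average is hypothetical.

```python
def weighted_average(outputs, weight=1.0):
    """Element-wise average of equally shaped 2-D outputs, scaled by weight."""
    count = len(outputs)
    return [
        [weight * sum(values) / count for values in zip(*rows)]
        for rows in zip(*outputs)
    ]


first = [[1.0, 2.0], [3.0, 4.0]]
second = [[3.0, 6.0], [5.0, 8.0]]
print(weighted_average([first, second]))
print(weighted_average([first, second], weight=0.5))
```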
Interactions¶
Interaction¶
-
class
openrec.legacy.modules.interactions.
Interaction
(train=True, l2_reg=None, scope=None, reuse=False)¶ A direct inheritance of the Module.
PairwiseLog¶
-
class
openrec.legacy.modules.interactions.
PairwiseLog
(user, item=None, item_bias=None, p_item=None, p_item_bias=None, n_item=None, n_item_bias=None, train=None, scope=None, reuse=False)¶ The PairwiseLog module minimizes the pairwise logarithm loss [bpr] as follows (regularization and bias terms are not included):
\[\min \sum_{(i, p, n)} -\ln \sigma (u_i^T v_p - u_i^T v_n)\]
where \(u_i\) denotes the representation for user \(i\); \(v_p\) and \(v_n\) denote representations for positive item \(p\) and negative item \(n\), respectively.
Parameters: - user (Tensorflow tensor) – Representations for users involved in the interactions. Shape: [number of interactions, dimensionality of user representations].
- item (Tensorflow tensor, required for testing) – Representations for items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
- item_bias (Tensorflow tensor, required for testing) – Biases for items involved in the interactions. Shape: [number of interactions, 1].
- p_item (Tensorflow tensor, required for training) – Representations for positive items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
- p_item_bias (Tensorflow tensor, required for training) – Biases for positive items involved in the interactions. Shape: [number of interactions, 1].
- n_item (Tensorflow tensor, required for training) – Representations for negative items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
- n_item_bias (Tensorflow tensor, required for training) – Biases for negative items involved in the interactions. Shape: [number of interactions, 1].
- train (bool, optional) – An indicator for training or serving phase.
- scope (str, optional) – Scope for module variables.
- reuse (bool, optional) – Whether or not to reuse module variables.
References
[bpr] Rendle, S., Freudenthaler, C., Gantner, Z. and Schmidt-Thieme, L., 2009, June. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence (pp. 452-461). AUAI Press. -
_build_serving_graph
()¶ Build serving-specific computational graphs (may be overridden).
-
_build_training_graph
()¶ Build training-specific computational graphs (may be overridden).
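The pairwise logarithm loss defined for PairwiseLog can be checked numerically with a few lines of plain Python. This is a sketch of the formula only, with regularization and bias terms omitted as in the definition; the function names are hypothetical and the code does not reproduce the module's Tensorflow graph.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softplus(x):
    """Numerically stable ln(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_log_loss(users, p_items, n_items):
    """Sum over (user, positive, negative) triples of -ln(sigmoid(u.v_p - u.v_n))."""
    total = 0.0
    for u, v_p, v_n in zip(users, p_items, n_items):
        margin = dot(u, v_p) - dot(u, v_n)
        total += softplus(-margin)  # -ln(sigmoid(m)) equals ln(1 + exp(-m))
    return total


tied = pairwise_log_loss([[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 0.0]])
separated = pairwise_log_loss([[1.0, 0.0]], [[2.0, 0.0]], [[0.0, 0.0]])
print(tied, separated)
```

The loss shrinks as the positive item scores further above the negative one, which is the ranking behavior the objective is meant to encourage.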
PairwiseEuDist¶
-
class
openrec.legacy.modules.interactions.
PairwiseEuDist
(user, item=None, item_bias=None, p_item=None, p_item_bias=None, n_item=None, n_item_bias=None, weights=1.0, margin=1.0, train=None, scope=None, reuse=False)¶ The PairwiseEuDist module minimizes the weighted pairwise euclidean distance-based hinge loss [cml] as follows (regularization and bias terms are not included):
\[\min \sum_{(i, p, n)} w_{ip} [m + \lVert c(u_i)-c(v_p) \rVert^2 - \lVert c(u_i)-c(v_n) \rVert^2]_+\]
where \(c(x) = \frac{x}{\max(\lVert x \rVert, 1.0)}\); \(u_i\) denotes the representation for user \(i\); \(v_p\) and \(v_n\) denote representations for positive item \(p\) and negative item \(n\), respectively.
Parameters: - user (Tensorflow tensor) – Representations for users involved in the interactions. Shape: [number of interactions, dimensionality of user representations].
- item (Tensorflow tensor, required for testing) – Representations for items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
- item_bias (Tensorflow tensor, required for testing) – Biases for items involved in the interactions. Shape: [number of interactions, 1].
- p_item (Tensorflow tensor, required for training) – Representations for positive items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
- p_item_bias (Tensorflow tensor, required for training) – Biases for positive items involved in the interactions. Shape: [number of interactions, 1].
- n_item (Tensorflow tensor, required for training) – Representations for negative items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
- n_item_bias (Tensorflow tensor, required for training) – Biases for negative items involved in the interactions. Shape: [number of interactions, 1].
- weights (Tensorflow tensor, optional) – Weights \(w\). Shape: [number of interactions, 1].
- margin (float, optional) – Margin \(m\). Default to 1.0.
- train (bool, optional) – An indicator for training or serving phase.
- scope (str, optional) – Scope for module variables.
- reuse (bool, optional) – Whether or not to reuse module variables.
References
[cml] Hsieh, C.K., Yang, L., Cui, Y., Lin, T.Y., Belongie, S. and Estrin, D., 2017, April. Collaborative metric learning. In Proceedings of the 26th International Conference on World Wide Web (pp. 193-201). International World Wide Web Conferences Steering Committee. -
_build_serving_graph
()¶ Build serving-specific computational graphs (may be overridden).
-
_build_training_graph
()¶ Build training-specific computational graphs (may be overridden).
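The weighted hinge loss defined for PairwiseEuDist, including the censoring function c(x), can be sketched in plain Python as follows. Bias and regularization terms are omitted as in the definition, the function names are hypothetical, and the code is an illustration of the formula rather than the module's implementation.

```python
import math


def censor(x):
    """c(x) = x / max(||x||, 1.0): vectors outside the unit ball are pulled onto it."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / max(norm, 1.0) for v in x]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairwise_eu_dist_loss(users, p_items, n_items, weights=None, margin=1.0):
    """Sum of w * [m + d(u, v_p)^2 - d(u, v_n)^2]_+ over all triples."""
    if weights is None:
        weights = [1.0] * len(users)
    total = 0.0
    for u, v_p, v_n, w in zip(users, p_items, n_items, weights):
        u, v_p, v_n = censor(u), censor(v_p), censor(v_n)
        hinge = margin + squared_distance(u, v_p) - squared_distance(u, v_n)
        total += w * max(0.0, hinge)
    return total


satisfied = pairwise_eu_dist_loss([[0.0, 0.0]], [[0.0, 0.0]], [[1.0, 0.0]])
violated = pairwise_eu_dist_loss([[0.0, 0.0]], [[1.0, 0.0]], [[0.0, 0.0]])
print(satisfied, violated)
```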
PointwiseGeCE¶
-
class
openrec.legacy.modules.interactions.
PointwiseGeCE
(user, item, item_bias, l2_reg=None, labels=None, train=None, scope=None, reuse=False)¶ The PointwiseGeCE module minimizes the cross entropy classification loss with generalized dot product as logits. The generalized dot-product [ncf] between user representation \(u_i\) and item representation \(v_j\) is defined as:
\[h^T(u_i \odot v_j)\]
where \(\odot\) denotes the element-wise product of two vectors, and \(h\) denotes learnable model parameters.
Parameters: - user (Tensorflow tensor) – Representations for users involved in the interactions. Shape: [number of interactions, dimensionality of user representations].
- item (Tensorflow tensor) – Representations for items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
- item_bias (Tensorflow tensor) – Biases for items involved in the interactions. Shape: [number of interactions, 1].
- labels (Tensorflow tensor, required for training) – Ground-truth labels for the interactions. Shape: [number of interactions, ].
- l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
- train (bool, optional) – An indicator for training or serving phase.
- scope (str, optional) – Scope for module variables.
- reuse (bool, optional) – Whether or not to reuse module variables.
References
[ncf] He, X., Liao, L., Zhang, H., Nie, L., Hu, X. and Chua, T.S., 2017, April. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web (pp. 173-182). International World Wide Web Conferences Steering Committee.
-
_build_serving_graph
()¶ Build serving-specific computational graphs (may be overridden).
-
_build_training_graph
()¶ Build training-specific computational graphs (may be overridden).
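The generalized dot product used by PointwiseGeCE can be computed by hand as shown below. The sketch assumes binary labels, a sigmoid cross-entropy loss, and that the item bias is added to the logit; these details are assumptions made for illustration and are not confirmed by the description above. All function names are hypothetical.

```python
import math


def generalized_dot(u, v, h):
    """h^T (u * v), where * is the element-wise product."""
    return sum(h_k * u_k * v_k for h_k, u_k, v_k in zip(h, u, v))


def sigmoid_cross_entropy(logit, label):
    """Numerically stable binary cross entropy on a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


u = [1.0, 2.0]
v = [3.0, 4.0]
h = [1.0, 1.0]  # with h set to all ones, this reduces to the ordinary dot product
item_bias = 0.0

logit = generalized_dot(u, v, h) + item_bias
print(logit, sigmoid_cross_entropy(logit, 1.0), sigmoid_cross_entropy(0.0, 1.0))
```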
PointwiseGeMLPCE¶
-
class
openrec.legacy.modules.interactions.
PointwiseGeMLPCE
(user_mlp, user_ge, item_mlp, item_ge, item_bias, dims, labels=None, dropout=None, alpha=0.5, l2_reg=None, train=None, scope=None, reuse=False)¶ The PointwiseGeMLPCE module minimizes the cross entropy classification loss. The logits are calculated as follows [ncf] (Bias term is not included).
\[\alpha h^T(u_i^{ge} \odot v_j^{ge}) + (1 - \alpha)\mathrm{MLP}([u_i^{mlp}, v_j^{mlp}])\]
Parameters: - user_mlp (Tensorflow tensor) – \(u^{mlp}\) for users involved in the interactions. Shape: [number of interactions, dimensionality of \(u^{mlp}\)].
- user_ge (Tensorflow tensor) – \(u^{ge}\) for users involved in the interactions. Shape: [number of interactions, dimensionality of \(u^{ge}\)].
- item_mlp (Tensorflow tensor) – \(v^{mlp}\) for items involved in the interactions. Shape: [number of interactions, dimensionality of \(v^{mlp}\)].
- item_ge (Tensorflow tensor) – \(v^{ge}\) for items involved in the interactions. Shape: [number of interactions, dimensionality of \(v^{ge}\)].
- item_bias (Tensorflow tensor) – Biases for items involved in the interactions. Shape: [number of interactions, 1].
- dims (Numpy array) – Specify the size of the MLP (openrec.legacy.modules.extractions.MultiLayerFC).
- labels (Tensorflow tensor, required for training) – Ground-truth labels for the interactions. Shape: [number of interactions, ].
- dropout (float, optional) – Dropout rate for MLP (intermediate layers only).
- alpha (float, optional) – Value of \(\alpha\). Defaults to 0.5.
- l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
- train (bool, optional) – An indicator for training or serving phase.
- scope (str, optional) – Scope for module variables.
- reuse (bool, optional) – Whether or not to reuse module variables.
-
_build_serving_graph
()¶ Build serving-specific computational graphs (may be overridden).
-
_build_shared_graph
()¶ Build shared computational graphs across training and serving (may be overridden).
-
_build_training_graph
()¶ Build training-specific computational graphs (may be overridden).
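The logit used by PointwiseGeMLPCE blends a generalized dot product with an MLP applied to concatenated representations. The sketch below illustrates only that blend; the mlp argument is a hypothetical callable standing in for a trained multi-layer perceptron, and the bias term is omitted as in the formula.

```python
def ge_mlp_logit(u_ge, v_ge, u_mlp, v_mlp, h, mlp, alpha=0.5):
    """alpha * h^T(u_ge * v_ge) + (1 - alpha) * MLP([u_mlp, v_mlp])."""
    ge_term = sum(h_k * a * b for h_k, a, b in zip(h, u_ge, v_ge))
    mlp_term = mlp(u_mlp + v_mlp)  # list concatenation stands in for [u, v]
    return alpha * ge_term + (1.0 - alpha) * mlp_term


def toy_mlp(features):
    """Hypothetical stand-in for an MLP with a single scalar output."""
    return sum(features)


blended = ge_mlp_logit([1.0, 2.0], [3.0, 4.0], [1.0], [2.0], [1.0, 1.0], toy_mlp)
ge_only = ge_mlp_logit([1.0, 2.0], [3.0, 4.0], [1.0], [2.0], [1.0, 1.0], toy_mlp, alpha=1.0)
mlp_only = ge_mlp_logit([1.0, 2.0], [3.0, 4.0], [1.0], [2.0], [1.0, 1.0], toy_mlp, alpha=0.0)
print(blended, ge_only, mlp_only)
```

Setting alpha to 1.0 or 0.0 recovers the pure generalized dot product or the pure MLP score, which is a convenient way to sanity-check either branch in isolation.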
PointwiseMLPCE¶
-
class
openrec.legacy.modules.interactions.
PointwiseMLPCE
(user, item, dims, item_bias=None, extra=None, l2_reg=None, labels=None, dropout=None, train=None, batch_serving=True, scope=None, reuse=False)¶ The PointwiseMLPCE module minimizes the cross entropy classification loss with outputs of a Multi-Layer Perceptron (MLP) as logits. The inputs to the MLP are the concatenation between user and item representations [ncf].
Parameters: - user (Tensorflow tensor) – Representations for users involved in the interactions. Shape: [number of interactions, dimensionality of user representations].
- item (Tensorflow tensor) – Representations for items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
- dims (Numpy array) – Specify the size of the MLP (openrec.legacy.modules.extractions.MultiLayerFC).
- item_bias (Tensorflow tensor, optional) – Biases for items involved in the interactions. Shape: [number of interactions, 1].
- extra (Tensorflow tensor, optional) – Representations for context involved in the interactions. Shape: [number of interactions, dimensionality of context representations].
- l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
- labels (Tensorflow tensor, required for training) – Ground-truth labels for the interactions. Shape: [number of interactions, ].
- dropout (float, optional) – Dropout rate for MLP (intermediate layers only).
- train (bool, optional) – An indicator for training or serving phase.
- batch_serving (bool, optional) – An indicator for batch serving / pointwise serving.
- scope (str, optional) – Scope for module variables.
- reuse (bool, optional) – Whether or not to reuse module variables.
-
_build_serving_graph
()¶ Build serving-specific computational graphs (may be overridden).
-
_build_training_graph
()¶ Build training-specific computational graphs (may be overridden).
PointwiseMSE¶
-
class
openrec.legacy.modules.interactions.
PointwiseMSE
(user, item, item_bias, labels=None, a=1.0, b=1.0, sigmoid=False, train=True, batch_serving=True, scope=None, reuse=False)¶ The PointwiseMSE module minimizes the pointwise mean-squared error [ctm] as follows (regularization terms are not included):
\[\min \sum_{ij}c_{ij}(r_{ij} - u_i^T v_j)^2\]
where \(u_i\) and \(v_j\) are representations for user \(i\) and item \(j\), respectively; \(c_{ij}=a\) if \(r_{ij}=1\), otherwise \(c_{ij}=b\).
Parameters: - user (Tensorflow tensor) – Representations for users involved in the interactions. Shape: [number of interactions, dimensionality of user representations].
- item (Tensorflow tensor) – Representations for items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
- item_bias (Tensorflow tensor) – Biases for items involved in the interactions. Shape: [number of interactions, 1].
- labels (Tensorflow tensor, required for training) – Ground-truth labels for the interactions. Shape: [number of interactions, ].
- a (float, optional) – The value of \(c_{ij}\) if \(r_{ij}=1\).
- b (float, optional) – The value of \(c_{ij}\) if \(r_{ij}=0\).
- sigmoid (bool, optional) – Normalize the dot products, i.e., sigmoid(\(u_i^T v_j\)).
- train (bool, optional) – An indicator for training or serving phase.
- batch_serving (bool, optional) – If True, the model calculates scores for all users against all items and returns scores with shape [len(user), len(item)]. Otherwise, it returns scores for the specified user-item pairs (requires len(user)==len(item)).
- scope (str, optional) – Scope for module variables.
- reuse (bool, optional) – Whether or not to reuse module variables.
References
[ctm] Wang, C. and Blei, D.M., 2011, August. Collaborative topic modeling for recommending scientific articles. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 448-456). ACM. -
_build_serving_graph
()¶ Build serving-specific computational graphs (may be overridden).
-
_build_training_graph
()¶ Build training-specific computational graphs (may be overridden).
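The confidence-weighted squared error and the two serving shapes described for PointwiseMSE can be sketched in plain Python. Item biases and the optional sigmoid are left out, the function names are hypothetical, and the code illustrates the formula and the batch_serving shapes rather than the module's Tensorflow graph.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pointwise_mse_loss(users, items, labels, a=1.0, b=1.0):
    """Sum of c * (r - u.v)^2 with c = a for positive labels and c = b otherwise."""
    total = 0.0
    for u, v, r in zip(users, items, labels):
        confidence = a if r == 1 else b
        total += confidence * (r - dot(u, v)) ** 2
    return total


def batch_scores(users, items):
    """batch_serving=True: every user against every item, shape [len(user), len(item)]."""
    return [[dot(u, v) for v in items] for u in users]


def pair_scores(users, items):
    """batch_serving=False: one score per aligned (user, item) pair."""
    assert len(users) == len(items)
    return [dot(u, v) for u, v in zip(users, items)]


users = [[1.0, 0.0], [0.0, 1.0]]
items = [[0.5, 0.0], [0.0, 0.5]]
loss = pointwise_mse_loss(users, items, labels=[1, 0], a=1.0, b=0.01)
print(loss, batch_scores(users, items), pair_scores(users, items))
```

Choosing b much smaller than a down-weights unobserved interactions, so the model is penalized less for scoring an unseen item highly than for missing an observed one.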
openrec.legacy.recommenders package¶
Recommender¶
-
class
openrec.legacy.recommenders.
Recommender
(batch_size, max_user, max_item, extra_interactions_funcs=[], extra_fusions_funcs=[], test_batch_size=None, l2_reg=None, opt='SGD', lr=None, init_dict=None, sess_config=None)¶ The Recommender is the OpenRec abstraction [1] for recommendation algorithms.
Parameters: - batch_size (int) – Training batch size. The structure of a training instance varies across recommenders.
- max_user (int) – Maximum number of users in the recommendation system.
- max_item (int) – Maximum number of items in the recommendation system.
- extra_interactions_funcs (list, optional) – List of functions to build extra interaction modules.
- extra_fusions_funcs (list, optional) – List of functions to build extra fusion modules.
- test_batch_size (int, optional) – Batch size for testing and serving. The structure of a testing/serving instance varies across recommenders.
- l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
- opt ('SGD'(default) or 'Adam', optional) – Optimization algorithm, SGD: Stochastic Gradient Descent.
- init_dict (dict, optional) – Key-value pairs for initial parameter values.
- sess_config (tensorflow.ConfigProto(), optional) – Tensorflow session configuration.
Notes
The recommender abstraction defines the procedures to build a recommendation computational graph and exposes interfaces for training and evaluation. During training, the self.train function should be called with a batch_data input for each batch:
recommender_instance.train(batch_data)
During testing/serving, the serve function should be called with a batch_data input:
recommender_instance.serve(batch_data)
A recommender contains four major components: inputs, extractions, fusions, and interactions. The figure below shows the order in which each related function is called. The train parameter in each function is used to build different computational graphs for training and serving.
A new recommender class should inherit from the Recommender class. Follow the steps below to override the corresponding functions. To keep a recommender easily extensible, it is NOT recommended to override the functions self._build_inputs, self._build_fusions, and self._build_interactions.
- Define inputs. Override the functions self._build_user_inputs, self._build_item_inputs, and self._build_extra_inputs to define inputs for users’, items’, and contextual data sources, respectively. An input should be defined using the input function as follows.
self._add_input(name='input_name', dtype='float32', shape=data_shape, train=True)
- Define input mappings. Override the function self._input_mappings to feed a batch_data into the defined inputs. The mapping should be specified using a Python dict where a key corresponds to an input object retrieved by self._get_input(input_name, train=train), and a value corresponds to a batch_data value.
- Define extraction modules. Override the functions self._build_user_extractions, self._build_item_extractions, and self._build_extra_extractions to define extraction modules for users, items, and extra contexts, respectively. Use self._add_module to construct a module, and self._get_input / self._get_module to retrieve an existing input/module.
- Define fusion modules. Override the function self._build_default_fusions to build fusion modules. Custom functions can also be used as long as they are included in the input extra_fusions_funcs list. Use self._add_module to construct a module, and self._get_input / self._get_module to retrieve an existing input/module.
- Define interaction modules. Override the function self._build_default_interactions to build interaction modules. Custom functions can also be used as long as they are included in the input extra_interactions_funcs list. Use self._add_module to construct a module, and self._get_input / self._get_module to retrieve an existing input/module.
When train==False, a variable named self._scores should be defined for user-item scores. A score is higher if the item should be ranked higher in the recommendation list.
References
[1] Yang, L., Bagdasaryan, E., Gruenstein, J., Hsieh, C., and Estrin, D., 2018, June. OpenRec: A Modular Framework for Extensible and Adaptable Recommendation Algorithms. In Proceedings of WSDM‘18, February 5-9, 2018, Marina Del Rey, CA, USA. -
_add_input
(name, dtype='float32', shape=None, train=True)¶ Add an input - overwrite if name exists.
Parameters: - name (str) – The input name.
- dtype (str) – Data type: “float16”, “float32” (default), “float64”, “int8”, “int16”, “int32”, “int64”, “bool”, “string” or “none”.
- shape (list or tuple) – Input shape.
- train (bool) – Specify training or serving graph.
-
_add_module
(name, module, train_loss=None, train=True)¶ Add a module - overwrite if name exists.
Parameters: - name (str) – Module name.
- module (Module) – Module instance.
- train_loss (bool, optional) – Whether or not to include the output loss in the training loss (Default: include losses from all modules).
- train (bool, optional) – Specify the computational graph (train/serving) to add the module.
-
_build_default_fusions
(train=True)¶ Build default fusion modules (may be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_default_interactions
(train=True)¶ Build default interaction modules (may be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_extra_extractions
(train=True)¶ Build extraction modules for contextual data sources (may be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_extra_inputs
(train=True)¶ Build inputs for contextual data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_extractions
(train=True)¶ Call sub-functions to build extractions (do NOT override).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_fusions
(train=True)¶ Call sub-functions to build fusions (do NOT override).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_inputs
(train=True)¶ Call sub-functions to build inputs (do NOT override).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_interactions
(train=True)¶ Call sub-functions to build interactions (do NOT override).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_item_extractions
(train=True)¶ Build extraction modules for items’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_item_inputs
(train=True)¶ Build inputs for items’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_optimizer
()¶ Build an optimizer for model training.
-
_build_post_training_graph
()¶ Build post-training graph (do NOT override).
-
_build_post_training_ops
()¶ Build post-training operators (may be overridden).
Returns: A list of Tensorflow operators.
Return type: list
-
_build_serving_graph
()¶ Call sub-functions to build serving graph (do NOT override).
-
_build_training_graph
()¶ Call sub-functions to build training graph (do NOT override).
-
_build_user_extractions
(train=True)¶ Build extraction modules for users’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_user_inputs
(train=True)¶ Build inputs for users’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_get_input
(name, train=True)¶ Retrieve an input.
Parameters: - name (str) – Input name.
- train (bool) – Specify training or serving graph.
Returns: The input specified by the name and the train flag.
Return type: Tensorflow placeholder
-
_get_module
(name, train=True)¶ Retrieve a module.
Parameters: - name (str) – The module name.
- train (bool) – Specify training or serving graph.
Returns: The module specified by the name and the train flag.
Return type: Module
-
_grad_post_processing
(grad_var_list)¶ Post-process gradients before updating variables.
Parameters: grad_var_list (list) – A list of tuples (gradients, variable).
Returns: A list of updated tuples (updated gradients, variables).
Return type: list
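As an illustration of the list-of-tuples contract, the sketch below clips every gradient value to a fixed range and passes the variables through unchanged. Gradient clipping is only one possible post-processing step, chosen here as an example; the function name clip_gradients and the clip_value argument are hypothetical, and plain Python lists stand in for Tensorflow tensors.

```python
def clip_gradients(grad_var_list, clip_value=1.0):
    """Return new (gradient, variable) tuples with gradients clipped to [-clip_value, clip_value]."""
    processed = []
    for grad, var in grad_var_list:
        clipped = [max(-clip_value, min(clip_value, g)) for g in grad]
        processed.append((clipped, var))
    return processed


grad_var_list = [([0.5, -3.0], "weights"), ([2.0], "bias")]
print(clip_gradients(grad_var_list, clip_value=1.0))
```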
-
_initialize
(init_dict)¶ Initialize model parameters (do NOT override).
Parameters: init_dict (dict) – Key-value pairs for initial parameter values.
-
_input
(dtype='float32', shape=None, name=None)¶ Define an input for the recommender.
Parameters: - dtype (str) – Data type: “float16”, “float32”, “float64”, “int8”, “int16”, “int32”, “int64”, “bool”, or “string”.
- shape (list or tuple) – Input shape.
- name (str) – Name of the input.
Returns: Defined tensorflow placeholder.
Return type: Tensorflow placeholder
-
_input_mappings
(batch_data, train)¶ Define mappings from input training batch to defined inputs.
Parameters: - batch_data (dict) – A training batch.
- train (bool) – An indicator for training or serving phase.
Returns: The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.
Return type: dict
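The mapping described above can be sketched as follows. This is an illustration only: the strings stand in for the placeholder objects a recommender would define, and the input names and `batch_data` keys (`user_id`, `p_item_id`, `n_item_id`) are assumptions, not names confirmed by this page.

```python
# Hypothetical stand-ins for the placeholders defined for each phase.
TRAIN_INPUTS = {
    "user_id": "train/user_id",
    "p_item_id": "train/p_item_id",
    "n_item_id": "train/n_item_id",
}
SERVE_INPUTS = {"user_id": "serve/user_id"}


def input_mappings(batch_data, train):
    """Map each defined input (key) to the matching batch_data value."""
    inputs = TRAIN_INPUTS if train else SERVE_INPUTS
    return {placeholder: batch_data[key] for key, placeholder in inputs.items()}


batch = {"user_id": [0, 1], "p_item_id": [5, 2], "n_item_id": [7, 9]}
print(input_mappings(batch, train=True))
print(input_mappings(batch, train=False))
```

A dictionary of this shape is what a Tensorflow session would typically receive as its feed dictionary.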
-
compute_module_loss
(name, batch_data, train=True)¶ Compute the loss of a module, specified by the name and the train flag.
Parameters: - name (str) – The module name.
- batch_data (dict) – A batch of training or serving data.
- train (bool) – Specify the computational graph (train/serving) to compute loss.
Returns: The loss of the specified module.
Return type: Numpy array
-
compute_module_outputs
(name, batch_data, train=True)¶ Compute the outputs of a module, specified by the name and the train flag.
Parameters: - name (str) – The module name.
- batch_data (dict) – A batch of training or serving data.
- train (bool) – Specify the computational graph (train/serving) to compute outputs.
Returns: The outputs of the specified module.
Return type: A list of Numpy arrays
-
load
(load_dir)¶ Load a saved model from disk.
Parameters: load_dir (str) – Path to the saved model.
-
save
(save_dir, step)¶ Save a trained model to disk.
Parameters: - save_dir (str) – Path to save the model.
- step (int) – Training step.
-
serve
(batch_data)¶ Evaluate the model with an input batch_data.
Parameters: batch_data (dict) – A batch of testing or serving data.
-
train
(batch_data)¶ Train the model with an input batch_data.
Parameters: batch_data (dict) – A batch of training data.
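The split between functions marked "do NOT override" and those marked "should be overridden" follows a template-method pattern. The toy class below sketches that pattern under stated assumptions: it is not the OpenRec implementation, `_build_phase` is a hypothetical helper, and the order in which the hooks are called is assumed.

```python
class SketchRecommender:
    """Toy stand-in for a recommender base class (not the OpenRec code)."""

    def __init__(self):
        self.built = []  # records which hooks ran, and for which phase
        self._build_training_graph()
        self._build_serving_graph()

    # Fixed drivers: subclasses leave these alone.
    def _build_training_graph(self):
        self._build_phase(train=True)

    def _build_serving_graph(self):
        self._build_phase(train=False)

    def _build_phase(self, train):
        # Hypothetical helper; the call order here is an assumption.
        self._build_user_inputs(train=train)
        self._build_item_inputs(train=train)

    # Hooks: subclasses override these.
    def _build_user_inputs(self, train=True):
        pass

    def _build_item_inputs(self, train=True):
        pass


class ToyRecommender(SketchRecommender):
    def _build_user_inputs(self, train=True):
        self.built.append(("user_inputs", train))

    def _build_item_inputs(self, train=True):
        self.built.append(("item_inputs", train))


model = ToyRecommender()
print(model.built)
```

The subclass only fills in the hooks; the drivers decide when each hook runs and with which `train` flag.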
BPR¶
-
class
openrec.legacy.recommenders.
BPR
(batch_size, max_user, max_item, dim_embed, test_batch_size=None, l2_reg=None, opt='SGD', lr=None, init_dict=None, sess_config=None)¶ Pure Bayesian Personalized Ranking (BPR) [1] based recommender.
Parameters: - batch_size (int) – Training batch size. Each training instance consists of a user, a positive item, and a negative item.
- max_user (int) – Maximum number of users in the recommendation system.
- max_item (int) – Maximum number of items in the recommendation system.
- dim_embed (int) – Dimensionality of the user/item embedding.
- test_batch_size (int, optional) – Batch size for testing and serving. Each testing/serving batch consists of a user.
- l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
- opt ('SGD' (default) or 'Adam', optional) – Optimization algorithm. SGD: Stochastic Gradient Descent.
- lr (float, optional) – Initial learning rate.
- init_dict (dict, optional) – Key-value pairs for initial parameter values.
- sess_config (tensorflow.ConfigProto(), optional) – Tensorflow session configuration.
Notes
The BPR recommender is trained on users’ implicit feedback signals (e.g., clicks and views). Items that a user has clicked or viewed are treated as positive items; all other items are treated as negative items. The pure BPR recommender does not consider any auxiliary signals.
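To make the (user, positive item, negative item) training instance concrete, here is a minimal sketch of the pairwise BPR objective from [1] for a single triple. Scoring by the dot product of user and item embeddings is an assumption of this example (the matrix factorization variant of BPR), and `bpr_loss` is a hypothetical helper rather than an OpenRec function.

```python
import math


def bpr_loss(user_vec, pos_item_vec, neg_item_vec):
    """Pairwise BPR loss, -log(sigmoid(score_pos - score_neg)), for one triple."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    diff = dot(user_vec, pos_item_vec) - dot(user_vec, neg_item_vec)
    # -log(sigmoid(d)) equals log(1 + exp(-d)); pick the branch that keeps
    # the argument of exp() non-positive so large values cannot overflow.
    if diff > 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


user = [1.0, 0.0]
print(bpr_loss(user, [2.0, 0.0], [0.5, 0.0]))  # positive item scored higher
print(bpr_loss(user, [0.5, 0.0], [2.0, 0.0]))  # negative item scored higher
```

The loss shrinks as the positive item is scored further above the negative item, which is the ranking behaviour the recommender is trained towards.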
References
[1] Rendle, S., Freudenthaler, C., Gantner, Z. and Schmidt-Thieme, L., 2009, June. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (pp. 452-461). AUAI Press.
-
_build_default_interactions
(train=True)¶ Build default interaction modules (may be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_item_extractions
(train=True)¶ Build extraction modules for items’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_item_inputs
(train=True)¶ Build inputs for items’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_serving_graph
()¶ Call sub-functions to build serving graph (do NOT override).
-
_build_user_extractions
(train=True)¶ Build extraction modules for users’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_user_inputs
(train=True)¶ Build inputs for users’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_input_mappings
(batch_data, train)¶ Define mappings from input training batch to defined inputs.
Parameters: - batch_data (dict) – A training batch.
- train (bool) – An indicator for training or serving phase.
Returns: The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.
Return type: dict
VisualBPR¶
-
class
openrec.legacy.recommenders.
VisualBPR
(batch_size, max_user, max_item, dim_embed, dims, item_f_source, test_batch_size=None, item_serving_size=None, dropout_rate=None, l2_reg=None, l2_reg_mlp=None, opt='Adam', sess_config=None)¶ -
_build_default_fusions
(train=True)¶ Build default fusion modules (may be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_item_extractions
(train=True)¶ Build extraction modules for items’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_item_inputs
(train=True)¶ Build inputs for items’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_grad_post_processing
(grad_var_list)¶ Post-process gradients before updating variables.
Parameters: grad_var_list (list) – A list of tuples (gradients, variable). Returns: A list of updated tuples (updated gradients, variables). Return type: list
-
_input_mappings
(batch_data, train)¶ Define mappings from input training batch to defined inputs.
Parameters: - batch_data (dict) – A training batch.
- train (bool) – An indicator for training or serving phase.
Returns: The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.
Return type: dict
-
CDL¶
-
class
openrec.legacy.recommenders.
CDL
(batch_size, max_user, max_item, dim_embed, item_f, dims, dropout=None, test_batch_size=None, item_serving_size=None, l2_reg=None, l2_reg_mlp=None, l2_reconst=None, opt='SGD', sess_config=None)¶ -
_build_default_fusions
(train=True)¶ Build default fusion modules (may be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_item_extractions
(train=True)¶ Build extraction modules for items’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_item_inputs
(train=True)¶ Build inputs for items’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_input_mappings
(batch_data, train)¶ Define mappings from input training batch to defined inputs.
Parameters: - batch_data (dict) – A training batch.
- train (bool) – An indicator for training or serving phase.
Returns: The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.
Return type: dict
-
CML¶
-
class
openrec.legacy.recommenders.
CML
(batch_size, max_user, max_item, dim_embed, test_batch_size=None, l2_reg=None, opt='SGD', lr=None, init_dict=None, sess_config=None)¶ -
_build_interactions
(train=True)¶ Call sub-functions to build interactions (do NOT override).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_post_training_ops
()¶ Build post-training operators (may be overridden).
Returns: A list of Tensorflow operators. Return type: list
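This page does not say which operators CML builds here. One plausible reading, assuming the usual collaborative metric learning constraint that embeddings stay inside a unit ball, is a norm-clipping step applied after each training update. The sketch below shows that step on plain Python lists; `clip_to_unit_ball` and `max_norm` are hypothetical names, and the whole example is an assumption rather than a description of the library's code.

```python
import math


def clip_to_unit_ball(embeddings, max_norm=1.0):
    """Rescale every embedding whose L2 norm exceeds max_norm back onto the ball."""
    clipped = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > max_norm:
            vec = [x * max_norm / norm for x in vec]
        clipped.append(list(vec))
    return clipped


embeddings = [[3.0, 4.0], [0.3, 0.4]]
print(clip_to_unit_ball(embeddings))
```

Vectors already inside the ball are returned unchanged, so the step only affects embeddings that have drifted past the norm limit.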
-
Visual CML¶
-
class
openrec.legacy.recommenders.
VisualCML
(batch_size, max_user, max_item, dim_embed, dims, item_f_source, test_batch_size=None, item_serving_size=None, dropout_rate=None, l2_reg=None, l2_reg_mlp=None, opt='Adam', sess_config=None)¶ -
_build_default_interactions
(train=True)¶ Build default interaction modules (may be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
ConcatVisualBPR¶
-
class
openrec.legacy.recommenders.
ConcatVisualBPR
(batch_size, max_user, max_item, dim_embed, dim_ve, item_f_source, item_serving_size=None, l2_reg=None, sess_config=None)¶ -
_build_default_fusions
(train=True)¶ Build default fusion modules (may be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_item_extractions
(train=True)¶ Build extraction modules for items’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_item_inputs
(train=True)¶ Build inputs for items’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_grad_post_processing
(grad_var_list)¶ Post-process gradients before updating variables.
Parameters: grad_var_list (list) – A list of tuples (gradients, variable). Returns: A list of updated tuples (updated gradients, variables). Return type: list
-
_input_mappings
(batch_data, train)¶ Define mappings from input training batch to defined inputs.
Parameters: - batch_data (dict) – A training batch.
- train (bool) – An indicator for training or serving phase.
Returns: The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.
Return type: dict
-
PMF¶
-
class
openrec.legacy.recommenders.
PMF
(batch_size, dim_embed, max_user, max_item, test_batch_size=None, l2_reg=None, opt='SGD', sess_config=None)¶ -
_build_default_interactions
(train=True)¶ Build default interaction modules (may be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_extra_inputs
(train=True)¶ Build inputs for contextual data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_item_extractions
(train=True)¶ Build extraction modules for items’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_item_inputs
(train=True)¶ Build inputs for items’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_serving_graph
()¶ Call sub-functions to build serving graph (do NOT override).
-
_build_user_extractions
(train=True)¶ Build extraction modules for users’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_user_inputs
(train=True)¶ Build inputs for users’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_input_mappings
(batch_data, train)¶ Define mappings from input training batch to defined inputs.
Parameters: - batch_data (dict) – A training batch.
- train (bool) – An indicator for training or serving phase.
Returns: The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.
Return type: dict
-
User PMF¶
-
class
openrec.legacy.recommenders.
UserPMF
(batch_size, max_user, max_item, dim_embed, dims, user_f_source, test_batch_size=None, item_serving_size=None, dropout_rate=None, l2_reg=None, l2_reg_mlp=None, opt='SGD', sess_config=None)¶ -
_build_default_fusions
(train=True)¶ Build default fusion modules (may be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_user_extractions
(train=True)¶ Build extraction modules for users’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_user_inputs
(train=True)¶ Build inputs for users’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_input_mappings
(batch_data, train)¶ Define mappings from input training batch to defined inputs.
Parameters: - batch_data (dict) – A training batch.
- train (bool) – An indicator for training or serving phase.
Returns: The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.
Return type: dict
-
VisualPMF¶
-
class
openrec.legacy.recommenders.
VisualPMF
(batch_size, max_user, max_item, dim_embed, dims, item_f_source, test_batch_size=None, item_serving_size=None, dropout_rate=None, l2_reg=None, l2_reg_mlp=None, opt='SGD', sess_config=None)¶ -
_build_default_fusions
(train=True)¶ Build default fusion modules (may be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_item_extractions
(train=True)¶ Build extraction modules for items’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_item_inputs
(train=True)¶ Build inputs for items’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_input_mappings
(batch_data, train)¶ Define mappings from input training batch to defined inputs.
Parameters: - batch_data (dict) – A training batch.
- train (bool) – An indicator for training or serving phase.
Returns: The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.
Return type: dict
-
User Visual PMF¶
-
class
openrec.legacy.recommenders.
UserVisualPMF
(batch_size, max_user, max_item, dim_embed, dims_user, dims_item, user_f_source, item_f_source, test_batch_size=None, item_serving_size=None, dropout_rate=None, l2_reg=None, l2_reg_mlp=None, opt='SGD', sess_config=None)¶ -
_build_default_fusions
(train=True)¶ Build default fusion modules (may be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_user_extractions
(train=True)¶ Build extraction modules for users’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_build_user_inputs
(train=True)¶ Build inputs for users’ data sources (should be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
_input_mappings
(batch_data, train)¶ Define mappings from input training batch to defined inputs.
Parameters: - batch_data (dict) – A training batch.
- train (bool) – An indicator for training or serving phase.
Returns: The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.
Return type: dict
-
VisualGMF¶
-
class
openrec.legacy.recommenders.
VisualGMF
(batch_size, max_user, max_item, dim_embed, dims, item_f_source, test_batch_size=None, item_serving_size=None, dropout_rate=None, l2_reg=None, l2_reg_mlp=None, opt='SGD', sess_config=None)¶ -
_build_default_interactions
(train=True)¶ Build default interaction modules (may be overridden).
Parameters: train (bool) – An indicator for training or serving phase.
-
utils package¶
openrec.legacy.utils.dataset module¶
-
class
openrec.legacy.utils.dataset.
Dataset
(raw_data, max_user, max_item, name='dataset')¶ Bases:
object
The Dataset class stores a sequence of data points for training or evaluation.
Parameters: - raw_data (numpy structured array) – Input raw data.
- max_user (int) – Maximum number of users in the recommendation system.
- max_item (int) – Maximum number of items in the recommendation system.
- name (str) – Name of the dataset.
Notes
The Dataset class expects raw_data as a numpy structured array, where each row represents a data point and contains at least two keys:
- user_id: the user involved in the interaction.
- item_id: the item involved in the interaction.
raw_data might contain other keys, such as timestamp and location, depending on the use case of the recommendation system. Users should be uniquely and numerically indexed from 0 to total_number_of_users - 1. Items should be indexed likewise.
-
max_item
()¶ Maximum number of items.
Returns: Maximum number of items. Return type: int
-
max_user
()¶ Maximum number of users.
Returns: Maximum number of users. Return type: int
-
shuffle
()¶ Shuffle the dataset entries.
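The raw_data layout described in the Dataset notes can be sketched with NumPy as follows. The interaction values and the extra `timestamp` field are made up for illustration, and deriving `max_user` and `max_item` as the largest id plus one is an assumption that follows from the 0-based indexing requirement.

```python
import numpy as np

# Each row is one interaction; user_id and item_id are the two required keys.
raw_data = np.array(
    [
        (0, 2, 1500000000),
        (0, 5, 1500000100),
        (1, 2, 1500000200),
        (2, 0, 1500000300),
    ],
    dtype=[("user_id", np.int32), ("item_id", np.int32), ("timestamp", np.int64)],
)

# Ids are indexed from 0, so the counts are the largest id plus one (assumption).
max_user = int(raw_data["user_id"].max()) + 1
max_item = int(raw_data["item_id"].max()) + 1

print(raw_data.dtype.names)
print(max_user, max_item)
```

An array like this, together with the two counts, matches the constructor arguments `(raw_data, max_user, max_item, name='dataset')` listed for the Dataset class.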
openrec.legacy.utils.implicit_dataset module¶
-
class
openrec.legacy.utils.implicit_dataset.
ImplicitDataset
(raw_data, max_user, max_item, name='dataset')¶ Bases:
openrec.legacy.utils.dataset.Dataset
The ImplicitDataset class stores and parses a sequence of user implicit feedback for training or evaluation. It extends the functionality of the Dataset class.
Parameters: - raw_data (numpy structured array) – Input raw data. Other legacy formats (e.g., sparse matrix) are supported but not recommended.
- max_user (int) – Maximum number of users in the recommendation system.
- max_item (int) – Maximum number of items in the recommendation system.
- name (str) – Name of the dataset.
Notes
The ImplicitDataset class parses the input raw_data into structured dictionaries (consumed by samplers or the model trainer). This class expects raw_data as a numpy structured array, where each row represents a data point and contains at least two keys:
- user_id: the user involved in the interaction.
- item_id: the item involved in the interaction.
raw_data might contain other keys, such as timestamp and location, depending on the use case of the recommendation system. Users should be uniquely and numerically indexed from 0 to total_number_of_users - 1. Items should be indexed likewise.
-
contain_item
(item_id)¶ Check whether or not an item is involved in any interaction.
Parameters: item_id (int) – target item id. Returns: A boolean indicator Return type: bool
-
contain_user
(user_id)¶ Check whether or not a user is involved in any interaction.
Parameters: user_id (int) – target user id. Returns: A boolean indicator Return type: bool
-
get_interactions_by_item_gb_user
(item_id)¶ Retrieve the interactions (grouped by user ids) that involve a specific item.
Parameters: item_id (int) – target item id. Returns: Users that have interacted with the given item. Return type: list
-
get_interactions_by_user_gb_item
(user_id)¶ Retrieve the interactions (grouped by item ids) that involve a specific user.
Parameters: user_id (int) – target user id. Returns: Items that the given user has interacted with. Return type: list
-
get_unique_item_list
()¶ Retrieve a list of unique item ids.
Returns: A list of unique item ids. Return type: numpy array
-
get_unique_user_list
()¶ Retrieve a list of unique user ids.
Returns: A list of unique user ids. Return type: numpy array
-
unique_item_count
()¶ Number of unique items.
Returns: Number of unique items. Return type: int
-
unique_user_count
()¶ Number of unique users.
Returns: Number of unique users. Return type: int
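The "structured dictionaries" mentioned in the ImplicitDataset notes can be pictured as two lookups, one grouped by user and one grouped by item. The pure-Python sketch below illustrates that idea under stated assumptions: `group_interactions` is a hypothetical helper, rows are plain dictionaries instead of structured-array records, and the library's internal layout may differ.

```python
from collections import defaultdict


def group_interactions(rows):
    """Index interaction rows by user (then item) and by item (then user)."""
    by_user = defaultdict(dict)
    by_item = defaultdict(dict)
    for row in rows:
        user, item = row["user_id"], row["item_id"]
        by_user[user][item] = row
        by_item[item][user] = row
    return by_user, by_item


rows = [
    {"user_id": 0, "item_id": 2},
    {"user_id": 0, "item_id": 5},
    {"user_id": 1, "item_id": 2},
]
by_user, by_item = group_interactions(rows)

print(sorted(by_user[0]))   # items that user 0 interacted with
print(sorted(by_item[2]))   # users that interacted with item 2
print(1 in by_user, 9 in by_user)  # membership checks, as in contain_user
print(len(by_user), len(by_item))  # unique user and item counts
```

With lookups like these, the membership checks, per-user and per-item retrievals, and unique counts documented above each reduce to a single dictionary operation.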
implicit_model_trainer module¶
-
class
openrec.legacy.implicit_model_trainer.
ImplicitModelTrainer
(batch_size, test_batch_size, train_dataset, model, sampler, item_serving_size=None, eval_save_prefix=None)¶ Bases:
object
The ImplicitModelTrainer class implements the logic for basic recommender training and evaluation using users’ implicit feedback.
Parameters: - batch_size (int) – Training batch size.
- test_batch_size (int) – Test/Evaluation batch size (number of users per testing batch).
- train_dataset (Dataset) – Dataset for model training.
- model (Recommender) – The target recommender.
- sampler (Sampler) – The sampler for model training.
- item_serving_size (int, optional) – Test/Evaluation batch size (number of items per testing batch).
Notes
The function train should be called for model training and evaluation.
-
train
(num_itr, display_itr, eval_datasets=[], evaluators=[], num_negatives=None, seed=10)¶ Train and evaluate a recommender.
Parameters: - num_itr (int) – Total number of training iterations.
- display_itr (int) – Evaluation/testing period (in iterations).
- eval_datasets (list of Dataset) – A list of datasets for evaluation/testing.
- evaluators (list of Evaluator) – A list of evaluators for evaluation/testing.
- num_negatives (int, optional) – If specified, the given number of items that each user has NOT interacted with will be sampled (as negative items) for evaluation.