OpenRec documentation

OpenRec is an open-source, modular library for neural network-inspired recommendation algorithms.

modules package

Module

class openrec.legacy.modules.Module(train=True, l2_reg=None, scope=None, reuse=False)

The Module class is the OpenRec abstraction for the building blocks of a recommender. A module belongs to one of three categories, namely extractions, fusions, or interactions, depending on its functionality (see [1] for details).

Parameters:
  • train (bool, optional) – An indicator for training or serving phase.
  • l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
  • scope (str, optional) – Scope for module variables.
  • reuse (bool, optional) – Whether or not to reuse module variables.

Notes

The module abstraction is used to construct recommenders. It should be extended by all module implementations. During initialization, functions self._build_shared_graph, self._build_training_graph, and self._build_serving_graph are called as follows.

Figure: The structure of the module abstraction.

A module implementation should follow the two steps below:

  • Build computational graphs. Override self._build_shared_graph(), self._build_training_graph(), and/or self._build_serving_graph() functions to build training/serving computational graphs.
  • Define a loss and an output list. Define a loss (self._loss) to be included in training and an output list of Tensorflow tensors (self._outputs).
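
The two steps above can be sketched without TensorFlow. The stand-in below is not OpenRec code: SketchModule and ScaleByTwo are hypothetical names, plain Python numbers replace tensors, and the build order (shared graph first, then the training or serving graph depending on train) is an assumption inferred from the description above.

```python
class SketchModule:
    """Hypothetical stand-in for the module abstraction (not the OpenRec class)."""

    def __init__(self, train=True):
        self._train = train
        self._loss = 0.0    # step 2: loss to be included in training
        self._outputs = []  # step 2: output list
        # Assumed call order: shared graph first, then the phase-specific graph.
        self._build_shared_graph()
        if self._train:
            self._build_training_graph()
        else:
            self._build_serving_graph()

    def _build_shared_graph(self):
        pass

    def _build_training_graph(self):
        pass

    def _build_serving_graph(self):
        pass

    def get_loss(self):
        return self._loss

    def get_outputs(self):
        return self._outputs


class ScaleByTwo(SketchModule):
    """Toy module: doubles its input and, in training, penalizes large outputs."""

    def __init__(self, value, train=True):
        self._value = value  # set before the base class triggers the build
        super().__init__(train=train)

    def _build_shared_graph(self):
        # Step 1: shared computation, used in both phases.
        self._outputs.append(2.0 * self._value)

    def _build_training_graph(self):
        # Step 1 (training only) and step 2: define the loss.
        self._loss = self._outputs[0] ** 2


train_module = ScaleByTwo(3.0, train=True)
serve_module = ScaleByTwo(3.0, train=False)
```

Both instances expose the same outputs through get_outputs(), while only the training instance defines a non-zero loss.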

References

[1] Yang, L., Bagdasaryan, E., Gruenstein, J., Hsieh, C., and Estrin, D., 2018, June. OpenRec: A Modular Framework for Extensible and Adaptable Recommendation Algorithms. In Proceedings of WSDM '18, February 5-9, 2018, Marina Del Rey, CA, USA.
_build_serving_graph()

Build serving-specific computational graphs (may be overridden).

_build_shared_graph()

Build shared computational graphs across training and serving (may be overridden).

_build_training_graph()

Build training-specific computational graphs (may be overridden).

get_loss()

Retrieve the training loss.

Returns: Training loss
Return type: float or Tensor
get_outputs()

Retrieve the output list of Tensorflow tensors.

Returns: An output list of Tensorflow tensors
Return type: list

Extractions

Extraction

class openrec.legacy.modules.extractions.Extraction(train=True, l2_reg=None, scope=None, reuse=False)

Inherits directly from Module.

Look Up

class openrec.legacy.modules.extractions.LookUp(embed, ids=None, scope=None, reuse=False)

The LookUp module maps (embeds) input ids into fixed representations. The representations are not updated during training. The module outputs a tensor with shape shape(ids) + [embedding dimensionality].

Parameters:
  • embed (numpy array) – Fixed embedding matrix.
  • ids (Tensorflow tensor, optional) – List of ids to retrieve embeddings. If None, the whole embedding matrix is returned.
  • scope (str, optional) – Scope for module variables.
  • reuse (bool, optional) – Whether or not to reuse module variables.
_build_shared_graph()

Build shared computational graphs across training and serving (may be overridden).
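
The output shape rule, shape(ids) + [embedding dimensionality], can be illustrated with nested Python lists in place of tensors. This is only a sketch: lookup and shape are hypothetical helpers, not OpenRec functions.

```python
def lookup(embed, ids=None):
    """Map ids (an int or arbitrarily nested lists of ints) to rows of embed."""
    if ids is None:
        return [list(row) for row in embed]  # the whole embedding matrix
    if isinstance(ids, int):
        return list(embed[ids])
    return [lookup(embed, sub_ids) for sub_ids in ids]


def shape(nested):
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


embed = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]  # 3 ids, dimensionality 2
ids = [[0, 2], [1, 1]]                        # shape [2, 2]
vectors = lookup(embed, ids)
```

Here shape(vectors) is shape(ids) followed by the embedding dimensionality, and lookup(embed) with no ids returns the full matrix.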

Identity Mapping

class openrec.legacy.modules.extractions.IdentityMapping(value, scope=None, reuse=False)

The IdentityMapping module executes an identity function.

Parameters:
  • value (Tensorflow tensor) – Input tensor
  • scope (str, optional) – Scope for module variables.
  • reuse (bool, optional) – Whether or not to reuse module variables.
_build_shared_graph()

Build shared computational graphs across training and serving (may be overridden).

Latent Factor

class openrec.legacy.modules.extractions.LatentFactor(shape, init='normal', ids=None, l2_reg=None, scope=None, reuse=False)

The LatentFactor module maps (embeds) input ids into latent representations. The module outputs a tensor with shape shape(ids) + [embedding dimensionality].

Parameters:
  • shape (list) – Shape of the embedding matrix, i.e. [number of unique ids, embedding dimensionality].
  • init (str, optional) – Embedding initialization. ‘zero’ or ‘normal’ (default).
  • ids (Tensorflow tensor, optional) – List of ids to retrieve embeddings. If None, the whole embedding matrix is returned.
  • l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
  • scope (str, optional) – Scope for module variables.
  • reuse (bool, optional) – Whether or not to reuse module variables.
_build_shared_graph()

Build shared computational graphs across training and serving (may be overridden).

censor_l2_norm_op(censor_id_list=None, max_norm=1)

Limit the norm of embeddings.

Parameters:
  • censor_id_list (list or Tensorflow tensor) – List of embeddings to censor (indexed by ids).
  • max_norm (float, optional) – Maximum norm.
Returns:

An operator for post-training execution.

Return type:

Tensorflow operator
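
As a rough illustration of limiting embedding norms, the sketch below rescales selected rows so that their L2 norm does not exceed max_norm. It is generic norm clipping on Python lists, under the assumption that this matches the intent of the operator; censor_l2_norm is a hypothetical helper and the exact behavior of the OpenRec operator may differ.

```python
import math


def censor_l2_norm(embedding, censor_id_list, max_norm=1.0):
    """Return a copy of embedding whose listed rows have L2 norm <= max_norm."""
    censored = [list(row) for row in embedding]
    for idx in censor_id_list:
        norm = math.sqrt(sum(x * x for x in censored[idx]))
        if norm > max_norm:
            scale = max_norm / norm
            censored[idx] = [x * scale for x in censored[idx]]
    return censored


embedding = [[3.0, 4.0], [0.3, 0.4], [6.0, 8.0]]
# Only rows 0 and 1 are censored; row 2 is not in the id list.
censored = censor_l2_norm(embedding, censor_id_list=[0, 1], max_norm=1.0)
```

Row 0 is scaled down to unit norm, row 1 is already within the limit, and row 2 is left untouched because its id was not listed.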

Multi Layer FC

class openrec.legacy.modules.extractions.MultiLayerFC(in_tensor, dims, relu_in=False, relu_mid=True, relu_out=False, dropout_in=None, dropout_mid=None, dropout_out=None, bias_in=True, bias_mid=True, bias_out=True, batch_norm=False, train=True, l2_reg=None, scope=None, reuse=False)

The MultiLayerFC module implements multi-layer perceptrons with ReLU as the non-linear activation function. Each layer is often referred to as a fully-connected layer.

Parameters:
  • in_tensor (Tensorflow tensor) – An input tensor with shape [*, feature dimensionality]
  • dims (list) – Specify the feature size of each layer’s outputs. For example, setting dims=[512, 256, 128] creates three fully-connected layers with output shapes [*, 512], [*, 256], and [*, 128], respectively.
  • relu_in (bool, optional) – Whether or not to add ReLU to the input tensor.
  • relu_mid (bool, optional) – Whether or not to add ReLU to the outputs of intermediate layers.
  • relu_out (bool, optional) – Whether or not to add ReLU to the final output tensor.
  • dropout_in (float, optional) – Dropout rate for the input tensor. If None, no dropout is used for the input tensor.
  • dropout_mid (float, optional) – Dropout rate for the outputs of intermediate layers. If None, no dropout is used for the intermediate outputs.
  • dropout_out (float, optional) – Dropout rate for the outputs of the final layer. If None, no dropout is used for the final outputs.
  • bias_in (bool, optional) – Whether or not to add bias to the input tensor.
  • bias_mid (bool, optional) – Whether or not to add bias to the outputs of intermediate layers.
  • bias_out (bool, optional) – Whether or not to add bias to the final output tensor.
  • batch_norm (bool, optional) – Whether or not to add batch normalization [1] to each layer’s outputs.
  • train (bool, optional) – An indicator for training or serving phase.
  • l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
  • scope (str, optional) – Scope for module variables.
  • reuse (bool, optional) – Whether or not to reuse module variables.

References

[1] Ioffe, S. and Szegedy, C., 2015, June. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456).
_build_shared_graph()

Build shared computational graphs across training and serving (may be overridden).
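
To make the role of dims, relu_mid, and relu_out concrete, the sketch below derives the size of each fully-connected layer and where ReLU would be applied. The helper names and the input dimensionality of 1024 are assumptions for illustration; no TensorFlow graph is built.

```python
def fc_layer_sizes(in_dim, dims):
    """Return the (input size, output size) pair of each fully-connected layer."""
    sizes = []
    prev = in_dim
    for out_dim in dims:
        sizes.append((prev, out_dim))
        prev = out_dim
    return sizes


def relu_after_layer(num_layers, relu_mid=True, relu_out=False):
    """Whether ReLU follows each layer: relu_mid for intermediate, relu_out for the last."""
    return [relu_mid if i < num_layers - 1 else relu_out for i in range(num_layers)]


dims = [512, 256, 128]
sizes = fc_layer_sizes(in_dim=1024, dims=dims)
relu_flags = relu_after_layer(len(dims))
```

With the default flags, the two intermediate layers are followed by ReLU and the final layer is left linear.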

SDAE

class openrec.legacy.modules.extractions.SDAE(in_tensor, dims, dropout=None, l2_reconst=1.0, train=True, l2_reg=None, scope=None, reuse=False)

The SDAE module implements Stacked Denoising Autoencoders [bn]. It outputs SDAE’s bottleneck representations (i.e., the encoder outputs).

Parameters:
  • in_tensor (Tensorflow tensor) – An input tensor with shape [*, feature dimensionality]
  • dims (list) – Specify the feature size of each encoding layer’s outputs. For example, setting dims=[512, 256, 128] creates a three-layer encoder with output shapes [*, 512], [*, 256], and [*, 128], and a two-layer decoder with output shapes [*, 256] and [*, 512].
  • dropout (float, optional) – Dropout rate for the input tensor. If None, no dropout is used for the input tensor.
  • l2_reconst (float, optional) – Weight for reconstruction loss.
  • train (bool, optional) – An indicator for training or serving phase.
  • l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
  • scope (str, optional) – Scope for module variables.
  • reuse (bool, optional) – Whether or not to reuse module variables.

References

[bn] Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y. and Manzagol, P.A., 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec), pp.3371-3408.
_build_shared_graph()

Build shared computational graphs across training and serving (may be overridden).
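
The relation between dims and the encoder/decoder layer sizes described for the dims parameter can be written down directly. This is a minimal sketch of that bookkeeping only; sdae_layer_sizes is a hypothetical helper, not part of OpenRec.

```python
def sdae_layer_sizes(dims):
    """Return (encoder output sizes, decoder output sizes) for the given dims."""
    encoder = list(dims)
    # The decoder mirrors the encoder, skipping the bottleneck size itself.
    decoder = list(reversed(dims[:-1]))
    return encoder, decoder


encoder, decoder = sdae_layer_sizes([512, 256, 128])
bottleneck = encoder[-1]  # size of the representation the module outputs
```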

TemporalLatentFactor

class openrec.legacy.modules.extractions.TemporalLatentFactor(shape, mlp_dims, ids, init='normal', mlp_pretrain=True, l2_reg=None, train=True, scope=None, reuse=False)
_build_shared_graph()

Build shared computational graphs across training and serving (may be overridden).

_build_training_graph()

Build training-specific computational graphs (may be overridden).

forward_update_embeddings(sess)

Retrieve update node.

pretrain_mlp_as_identity(sess)

Fusions

Fusion

class openrec.legacy.modules.fusions.Fusion(train=True, l2_reg=None, scope=None, reuse=False)

Inherits directly from Module.

Concat

class openrec.legacy.modules.fusions.Concat(module_list, axis=1, scope=None, reuse=False)

The Concat module outputs the concatenation of the outputs from multiple modules.

Parameters:
  • module_list (list) – The list of modules.
  • scope (str, optional) – Scope for module variables.
  • reuse (bool, optional) – Whether or not to reuse module variables.
_build_shared_graph()

Build shared computational graphs across training and serving (may be overridden).

Average

class openrec.legacy.modules.fusions.Average(module_list, weight=1.0, scope=None, reuse=False)

The Average module outputs the element-wise average of the outputs from multiple modules.

Parameters:
  • module_list (list) – The list of modules.
  • weight (float) – A value multiplied element-wise with the module outputs.
  • scope (str, optional) – Scope for module variables.
  • reuse (bool, optional) – Whether or not to reuse module variables.
_build_shared_graph()

Build shared computational graphs across training and serving (may be overridden).
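
A small sketch of the averaging behavior, using Python lists in place of module outputs. It assumes weight is a scalar applied to the averaged result, which for a scalar is equivalent to scaling each output first; average_outputs is a hypothetical helper.

```python
def average_outputs(outputs, weight=1.0):
    """Element-wise average of equally sized vectors, scaled by weight."""
    count = len(outputs)
    return [weight * sum(values) / count for values in zip(*outputs)]


# Two module outputs of dimensionality 2.
fused = average_outputs([[1.0, 2.0], [3.0, 6.0]], weight=0.5)
```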

Interactions

Interaction

class openrec.legacy.modules.interactions.Interaction(train=True, l2_reg=None, scope=None, reuse=False)

Inherits directly from Module.

PairwiseLog

class openrec.legacy.modules.interactions.PairwiseLog(user, item=None, item_bias=None, p_item=None, p_item_bias=None, n_item=None, n_item_bias=None, train=None, scope=None, reuse=False)

The PairwiseLog module minimizes the pairwise logarithm loss [bpr] as follows (regularization and bias terms are not included):

\[\min \sum_{(i, p, n)} -\ln \sigma (u_i^T v_p - u_i^T v_n)\]

where \(u_i\) denotes the representation for user \(i\); \(v_p\) and \(v_n\) denote representations for positive item \(p\) and negative item \(n\), respectively.
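
The objective above can be evaluated on plain Python lists as a sanity check. This is a sketch of the formula only (no bias or regularization terms, as in the equation); the function names are hypothetical and not part of OpenRec.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def neg_log_sigmoid(x):
    """Numerically stable -ln(sigmoid(x)) = ln(1 + exp(-x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def pairwise_log_loss(users, pos_items, neg_items):
    """Sum of -ln sigma(u^T v_p - u^T v_n) over (user, positive, negative) triples."""
    return sum(
        neg_log_sigmoid(dot(u, p) - dot(u, n))
        for u, p, n in zip(users, pos_items, neg_items)
    )


users = [[1.0, 0.0], [0.0, 1.0]]
pos_items = [[2.0, 0.0], [0.0, 0.0]]
neg_items = [[0.0, 0.0], [0.0, 0.0]]
loss = pairwise_log_loss(users, pos_items, neg_items)
```

The first triple scores its positive item higher than its negative item and so contributes less to the loss than the second triple, where both scores are equal.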

Parameters:
  • user (Tensorflow tensor) – Representations for users involved in the interactions. Shape: [number of interactions, dimensionality of user representations].
  • item (Tensorflow tensor, required for testing) – Representations for items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
  • item_bias (Tensorflow tensor, required for testing) – Biases for items involved in the interactions. Shape: [number of interactions, 1].
  • p_item (Tensorflow tensor, required for training) – Representations for positive items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
  • p_item_bias (Tensorflow tensor, required for training) – Biases for positive items involved in the interactions. Shape: [number of interactions, 1].
  • n_item (Tensorflow tensor, required for training) – Representations for negative items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
  • n_item_bias (Tensorflow tensor, required for training) – Biases for negative items involved in the interactions. Shape: [number of interactions, 1].
  • train (bool, optional) – An indicator for training or serving phase.
  • scope (str, optional) – Scope for module variables.
  • reuse (bool, optional) – Whether or not to reuse module variables.

References

[bpr] Rendle, S., Freudenthaler, C., Gantner, Z. and Schmidt-Thieme, L., 2009, June. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence (pp. 452-461). AUAI Press.
_build_serving_graph()

Build serving-specific computational graphs (may be overridden).

_build_training_graph()

Build training-specific computational graphs (may be overridden).

PairwiseEuDist

class openrec.legacy.modules.interactions.PairwiseEuDist(user, item=None, item_bias=None, p_item=None, p_item_bias=None, n_item=None, n_item_bias=None, weights=1.0, margin=1.0, train=None, scope=None, reuse=False)

The PairwiseEuDist module minimizes the weighted pairwise Euclidean distance-based hinge loss [cml] as follows (regularization and bias terms are not included):

\[\min \sum_{(i, p, n)} w_{ip} [m + \lVert c(u_i)-c(v_p) \rVert^2 - \lVert c(u_i)-c(v_n) \rVert^2]_+\]

where \(c(x) = \frac{x}{\max(\lVert x \rVert, 1.0)}\); \(u_i\) denotes the representation for user \(i\); \(v_p\) and \(v_n\) denote representations for positive item \(p\) and negative item \(n\), respectively.
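
The hinge loss above, including the norm clipping function c(x), can be checked numerically with plain Python lists. This sketch follows the formula term by term and omits bias and regularization; the function names are hypothetical and not part of OpenRec.

```python
import math


def clip_norm(x):
    """c(x) = x / max(||x||, 1.0): vectors longer than 1 are scaled to unit length."""
    norm = math.sqrt(sum(a * a for a in x))
    scale = max(norm, 1.0)
    return [a / scale for a in x]


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def pairwise_eu_dist_loss(users, pos_items, neg_items, weights=None, margin=1.0):
    """Sum of w * [m + ||c(u)-c(v_p)||^2 - ||c(u)-c(v_n)||^2]_+ over triples."""
    if weights is None:
        weights = [1.0] * len(users)
    total = 0.0
    for u, p, n, w in zip(users, pos_items, neg_items, weights):
        cu, cp, cn = clip_norm(u), clip_norm(p), clip_norm(n)
        total += w * max(0.0, margin + sq_dist(cu, cp) - sq_dist(cu, cn))
    return total


users = [[0.0, 0.0], [0.0, 0.0]]
pos_items = [[3.0, 4.0], [0.0, 0.0]]
neg_items = [[0.0, 0.0], [3.0, 4.0]]
loss = pairwise_eu_dist_loss(users, pos_items, neg_items)
```

In the first triple the positive item is farther from the user than the negative item, so the hinge is active; in the second triple the negative item is a full margin farther away, so its contribution is zero.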

Parameters:
  • user (Tensorflow tensor) – Representations for users involved in the interactions. Shape: [number of interactions, dimensionality of user representations].
  • item (Tensorflow tensor, required for testing) – Representations for items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
  • item_bias (Tensorflow tensor, required for testing) – Biases for items involved in the interactions. Shape: [number of interactions, 1].
  • p_item (Tensorflow tensor, required for training) – Representations for positive items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
  • p_item_bias (Tensorflow tensor, required for training) – Biases for positive items involved in the interactions. Shape: [number of interactions, 1].
  • n_item (Tensorflow tensor, required for training) – Representations for negative items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
  • n_item_bias (Tensorflow tensor, required for training) – Biases for negative items involved in the interactions. Shape: [number of interactions, 1].
  • weights (Tensorflow tensor, optional) – Weights \(w\). Shape: [number of interactions, 1].
  • margin (float, optional) – Margin \(m\). Defaults to 1.0.
  • train (bool, optional) – An indicator for training or serving phase.
  • scope (str, optional) – Scope for module variables.
  • reuse (bool, optional) – Whether or not to reuse module variables.

References

[cml] Hsieh, C.K., Yang, L., Cui, Y., Lin, T.Y., Belongie, S. and Estrin, D., 2017, April. Collaborative metric learning. In Proceedings of the 26th International Conference on World Wide Web (pp. 193-201). International World Wide Web Conferences Steering Committee.
_build_serving_graph()

Build serving-specific computational graphs (may be overridden).

_build_training_graph()

Build training-specific computational graphs (may be overridden).

PointwiseGeCE

class openrec.legacy.modules.interactions.PointwiseGeCE(user, item, item_bias, l2_reg=None, labels=None, train=None, scope=None, reuse=False)

The PointwiseGeCE module minimizes the cross entropy classification loss with generalized dot product as logits. The generalized dot-product [ncf] between user representation \(u_i\) and item representation \(v_j\) is defined as:

\[h^T(u_i \odot v_j)\]

where \(\odot\) denotes the element-wise product of two vectors, and \(h\) denotes learnable model parameters.
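
A short numeric sketch of the generalized dot product, with plain Python lists in place of tensors. The function name is hypothetical; in the module \(h\) is learned, whereas here it is fixed by hand.

```python
def generalized_dot(h, u, v):
    """h^T (u * v), where u * v is the element-wise product of u and v."""
    return sum(h_k * u_k * v_k for h_k, u_k, v_k in zip(h, u, v))


u = [3.0, 4.0]
v = [5.0, 6.0]
logit = generalized_dot([1.0, 2.0], u, v)
# With h set to all ones, the generalized form reduces to the ordinary dot product.
plain = generalized_dot([1.0, 1.0], u, v)
```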

Parameters:
  • user (Tensorflow tensor) – Representations for users involved in the interactions. Shape: [number of interactions, dimensionality of user representations].
  • item (Tensorflow tensor) – Representations for items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
  • item_bias (Tensorflow tensor) – Biases for items involved in the interactions. Shape: [number of interactions, 1].
  • labels (Tensorflow tensor, required for training) – Ground-truth labels for the interactions. Shape: [number of interactions, ].
  • l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
  • train (bool, optional) – An indicator for training or serving phase.
  • scope (str, optional) – Scope for module variables.
  • reuse (bool, optional) – Whether or not to reuse module variables.

References

[ncf] He, X., Liao, L., Zhang, H., Nie, L., Hu, X. and Chua, T.S., 2017, April. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web (pp. 173-182). International World Wide Web Conferences Steering Committee.
_build_serving_graph()

Build serving-specific computational graphs (may be overridden).

_build_training_graph()

Build training-specific computational graphs (may be overridden).

PointwiseGeMLPCE

class openrec.legacy.modules.interactions.PointwiseGeMLPCE(user_mlp, user_ge, item_mlp, item_ge, item_bias, dims, labels=None, dropout=None, alpha=0.5, l2_reg=None, train=None, scope=None, reuse=False)

The PointwiseGeMLPCE module minimizes the cross entropy classification loss. The logits are calculated as follows [ncf] (the bias term is not included):

\[\alpha h^T(u_i^{ge} \odot v_j^{ge}) + (1 - \alpha)MLP([u_i^{mlp}, v_j^{mlp}])\]
Parameters:
  • user_mlp (Tensorflow tensor) – \(u^{mlp}\) for users involved in the interactions. Shape: [number of interactions, dimensionality of \(u^{mlp}\)].
  • user_ge (Tensorflow tensor) – \(u^{ge}\) for users involved in the interactions. Shape: [number of interactions, dimensionality of \(u^{ge}\)].
  • item_mlp (Tensorflow tensor) – \(v^{mlp}\) for items involved in the interactions. Shape: [number of interactions, dimensionality of \(v^{mlp}\)].
  • item_ge (Tensorflow tensor) – \(v^{ge}\) for items involved in the interactions. Shape: [number of interactions, dimensionality of \(v^{ge}\)].
  • item_bias (Tensorflow tensor) – Biases for items involved in the interactions. Shape: [number of interactions, 1].
  • dims (Numpy array) – Specify the size of the MLP (openrec.legacy.modules.extractions.MultiLayerFC).
  • labels (Tensorflow tensor, required for training) – Ground-truth labels for the interactions. Shape: [number of interactions, ].
  • dropout (float, optional) – Dropout rate for MLP (intermediate layers only).
  • alpha (float, optional) – Value of \(\alpha\). Defaults to 0.5.
  • l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
  • train (bool, optional) – An indicator for training or serving phase.
  • scope (str, optional) – Scope for module variables.
  • reuse (bool, optional) – Whether or not to reuse module variables.
_build_serving_graph()

Build serving-specific computational graphs (may be overridden).

_build_shared_graph()

Build shared computational graphs across training and serving (may be overridden).

_build_training_graph()

Build training-specific computational graphs (may be overridden).

PointwiseMLPCE

class openrec.legacy.modules.interactions.PointwiseMLPCE(user, item, dims, item_bias=None, extra=None, l2_reg=None, labels=None, dropout=None, train=None, batch_serving=True, scope=None, reuse=False)

The PointwiseMLPCE module minimizes the cross entropy classification loss with the outputs of a Multi-Layer Perceptron (MLP) as logits. The input to the MLP is the concatenation of user and item representations [ncf].

Parameters:
  • user (Tensorflow tensor) – Representations for users involved in the interactions. Shape: [number of interactions, dimensionality of user representations].
  • item (Tensorflow tensor) – Representations for items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
  • dims (Numpy array) – Specify the size of the MLP (openrec.legacy.modules.extractions.MultiLayerFC).
  • item_bias (Tensorflow tensor, optional) – Biases for items involved in the interactions. Shape: [number of interactions, 1].
  • extra (Tensorflow tensor, optional) – Representations for context involved in the interactions. Shape: [number of interactions, dimensionality of context representations].
  • l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
  • labels (Tensorflow tensor, required for training) – Ground-truth labels for the interactions. Shape: [number of interactions, ].
  • dropout (float, optional) – Dropout rate for MLP (intermediate layers only).
  • train (bool, optional) – An indicator for training or serving phase.
  • batch_serving (bool, optional) – An indicator for batch serving / pointwise serving.
  • scope (str, optional) – Scope for module variables.
  • reuse (bool, optional) – Whether or not to reuse module variables.
_build_serving_graph()

Build serving-specific computational graphs (may be overridden).

_build_training_graph()

Build training-specific computational graphs (may be overridden).

PointwiseMSE

class openrec.legacy.modules.interactions.PointwiseMSE(user, item, item_bias, labels=None, a=1.0, b=1.0, sigmoid=False, train=True, batch_serving=True, scope=None, reuse=False)

The PointwiseMSE module minimizes the pointwise mean squared error [ctm] as follows (regularization terms are not included):

\[\min \sum_{ij}c_{ij}(r_{ij} - u_i^T v_j)^2\]

where \(u_i\) and \(v_j\) are representations for user \(i\) and item \(j\) respectively; \(c_{ij}=a\) if \(r_{ij}=1\), otherwise \(c_{ij}=b\).
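
The weighted squared error above can be evaluated on a small batch of plain Python lists. This sketch follows the formula only and leaves out the item bias and the optional sigmoid; pointwise_mse_loss is a hypothetical name, not an OpenRec function.

```python
def pointwise_mse_loss(users, items, labels, a=1.0, b=1.0):
    """Sum of c_ij * (r_ij - u_i^T v_j)^2, with c_ij = a if r_ij == 1 else b."""
    total = 0.0
    for u, v, r in zip(users, items, labels):
        score = sum(x * y for x, y in zip(u, v))
        c = a if r == 1 else b
        total += c * (r - score) ** 2
    return total


users = [[1.0, 0.0], [0.0, 1.0]]
items = [[0.5, 0.0], [0.0, 0.5]]
labels = [1, 0]  # one observed interaction, one unobserved
loss = pointwise_mse_loss(users, items, labels, a=2.0, b=0.5)
```

Choosing a larger than b makes errors on observed interactions count more than errors on unobserved ones.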

Parameters:
  • user (Tensorflow tensor) – Representations for users involved in the interactions. Shape: [number of interactions, dimensionality of user representations].
  • item (Tensorflow tensor) – Representations for items involved in the interactions. Shape: [number of interactions, dimensionality of item representations].
  • item_bias (Tensorflow tensor) – Biases for items involved in the interactions. Shape: [number of interactions, 1].
  • labels (Tensorflow tensor, required for training) – Ground-truth labels for the interactions. Shape: [number of interactions, ].
  • a (float, optional) – The value of \(c_{ij}\) if \(r_{ij}=1\).
  • b (float, optional) – The value of \(c_{ij}\) if \(r_{ij}=0\).
  • sigmoid (bool, optional) – Normalize the dot products, i.e., sigmoid(\(u_i^T v_j\)).
  • train (bool, optional) – An indicator for training or serving phase.
  • batch_serving (bool, optional) – If True, the model calculates scores for all users against all items, and returns scores with shape [len(user), len(item)]. Otherwise, it returns scores for the specified user-item pairs (requires len(user)==len(item)).
  • scope (str, optional) – Scope for module variables.
  • reuse (bool, optional) – Whether or not to reuse module variables.
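
The difference between the two serving modes selected by batch_serving can be sketched with plain Python lists. The helper below is hypothetical and omits the item bias and the optional sigmoid; it only illustrates the shapes of the returned scores.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def serve_scores(users, items, batch_serving=True):
    if batch_serving:
        # Every user against every item: shape [len(users), len(items)].
        return [[dot(u, v) for v in items] for u in users]
    # One score per (user, item) pair: the two lists must align.
    if len(users) != len(items):
        raise ValueError("pointwise serving requires len(users) == len(items)")
    return [dot(u, v) for u, v in zip(users, items)]


users = [[1.0, 0.0], [0.0, 1.0]]
items = [[1.0, 2.0], [3.0, 4.0]]
all_pairs = serve_scores(users, items, batch_serving=True)
matched = serve_scores(users, items, batch_serving=False)
```

The pointwise scores are the diagonal of the all-pairs score matrix when the same users and items are passed in both modes.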

References

[ctm] Wang, C. and Blei, D.M., 2011, August. Collaborative topic modeling for recommending scientific articles. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 448-456). ACM.
_build_serving_graph()

Build serving-specific computational graphs (may be overridden).

_build_training_graph()

Build training-specific computational graphs (may be overridden).

openrec.legacy.recommenders package

Recommender

class openrec.legacy.recommenders.Recommender(batch_size, max_user, max_item, extra_interactions_funcs=[], extra_fusions_funcs=[], test_batch_size=None, l2_reg=None, opt='SGD', lr=None, init_dict=None, sess_config=None)

The Recommender is the OpenRec abstraction [1] for recommendation algorithms.

Parameters:
  • batch_size (int) – Training batch size. The structure of a training instance varies across recommenders.
  • max_user (int) – Maximum number of users in the recommendation system.
  • max_item (int) – Maximum number of items in the recommendation system.
  • extra_interactions_funcs (list, optional) – List of functions to build extra interaction modules.
  • extra_fusions_funcs (list, optional) – List of functions to build extra fusion modules.
  • test_batch_size (int, optional) – Batch size for testing and serving. The structure of a testing/serving instance varies across recommenders.
  • l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
  • opt (str, optional) – Optimization algorithm: 'SGD' (Stochastic Gradient Descent, default) or 'Adam'.
  • init_dict (dict, optional) – Key-value pairs for initial parameter values.
  • sess_config (tensorflow.ConfigProto(), optional) – Tensorflow session configuration.

Notes

The recommender abstraction defines the procedures to build a recommendation computational graph and exposes interfaces for training and evaluation. During training, for each batch, the self.train function should be called with a batch_data input,

recommender_instance.train(batch_data)

and during testing/serving, the serve function should be called with a batch_data input:

recommender_instance.serve(batch_data)

A recommender contains four major components: inputs, extractions, fusions, and interactions. The figure below shows the order in which the related functions are called. The train parameter in each function is used to build different computational graphs for training and serving.

Figure: The structure of the recommender abstraction.

A new recommender class should inherit from the Recommender class. Follow the steps below to override the corresponding functions. To keep a recommender easily extensible, it is NOT recommended to override the functions self._build_inputs, self._build_fusions, and self._build_interactions.

  • Define inputs. Override functions self._build_user_inputs, self._build_item_inputs, and self._build_extra_inputs to define inputs for users’, items’, and contextual data sources respectively. An input should be defined using the input function as follows.
self._add_input(name='input_name', dtype='float32', shape=data_shape, train=True)
  • Define input mappings. Override the function self._input_mappings to feed a batch_data into the defined inputs. The mapping should be specified using a python dict where a key corresponds to an input object retrieved by self._get_input(input_name, train=train), and a value corresponds to a batch_data value.
  • Define extraction modules. Override functions self._build_user_extractions, self._build_item_extractions, and self._build_extra_extractions to define extraction modules for users, items, and extra contexts respectively. Use self._add_module to construct a module, and self._get_input/self._get_module to retrieve an existing input/module.
  • Define fusion modules. Override the function self._build_default_fusions to build fusion modules. Custom functions can also be used as long as they are included in the input extra_fusions_funcs list. Use self._add_module to construct a module, and self._get_input/self._get_module to retrieve an existing input/module.
  • Define interaction modules. Override the function self._build_default_interactions to build interaction modules. Custom functions can also be used as long as they are included in the input extra_interactions_funcs list. Use self._add_module to construct a module, and self._get_input/self._get_module to retrieve an existing input/module.

When train==False, a variable named self._scores should be defined for user-item scores. A higher score means the item should be ranked higher in the recommendation list.
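
The override pattern described in the steps above can be sketched with a stand-in base class. This is not OpenRec code: SketchRecommender and ToyRecommender are hypothetical, nothing is built in TensorFlow, and the call order (inputs, extractions, fusions, interactions, with user, item, and extra hooks in that order, for the training graph and then the serving graph) is an assumption based on the component list above. Extra fusion and interaction functions are left out.

```python
class SketchRecommender:
    """Hypothetical stand-in (not the OpenRec class): only dispatches build hooks."""

    def __init__(self):
        self.calls = []
        for train in (True, False):  # assumed: training graph, then serving graph
            self._build_inputs(train)
            self._build_extractions(train)
            self._build_fusions(train)
            self._build_interactions(train)

    # Dispatchers: the functions a subclass is advised not to override.
    def _build_inputs(self, train):
        self._build_user_inputs(train)
        self._build_item_inputs(train)
        self._build_extra_inputs(train)

    def _build_extractions(self, train):
        self._build_user_extractions(train)
        self._build_item_extractions(train)
        self._build_extra_extractions(train)

    def _build_fusions(self, train):
        self._build_default_fusions(train)

    def _build_interactions(self, train):
        self._build_default_interactions(train)

    # Hooks: no-ops here; a subclass overrides the ones it needs.
    def _build_user_inputs(self, train): pass
    def _build_item_inputs(self, train): pass
    def _build_extra_inputs(self, train): pass
    def _build_user_extractions(self, train): pass
    def _build_item_extractions(self, train): pass
    def _build_extra_extractions(self, train): pass
    def _build_default_fusions(self, train): pass
    def _build_default_interactions(self, train): pass


class ToyRecommender(SketchRecommender):
    """Overrides hooks only; each hook records that it ran and for which graph."""

    def _build_user_inputs(self, train):
        self.calls.append(("user_inputs", train))

    def _build_item_inputs(self, train):
        self.calls.append(("item_inputs", train))

    def _build_user_extractions(self, train):
        self.calls.append(("user_extractions", train))

    def _build_item_extractions(self, train):
        self.calls.append(("item_extractions", train))

    def _build_default_interactions(self, train):
        # In serving mode a real recommender would define self._scores here.
        self.calls.append(("interactions", train))


toy = ToyRecommender()
training_calls = [name for name, train in toy.calls if train]
serving_calls = [name for name, train in toy.calls if not train]
```

Because the subclass touches only the hooks, the dispatch logic stays in one place, which is the extensibility argument made above for not overriding the dispatcher functions.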

References

[1] Yang, L., Bagdasaryan, E., Gruenstein, J., Hsieh, C., and Estrin, D., 2018, June. OpenRec: A Modular Framework for Extensible and Adaptable Recommendation Algorithms. In Proceedings of WSDM '18, February 5-9, 2018, Marina Del Rey, CA, USA.
_add_input(name, dtype='float32', shape=None, train=True)

Add an input - overwrite if name exists.

Parameters:
  • name (str) – The input name.
  • dtype (str) – Data type: “float16”, “float32” (default), “float64”, “int8”, “int16”, “int32”, “int64”, “bool”, “string” or “none”.
  • shape (list or tuple) – Input shape.
  • train (bool) – Specify training or serving graph.
_add_module(name, module, train_loss=None, train=True)

Add a module - overwrite if name exists.

Parameters:
  • name (str) – Module name.
  • module (Module) – Module instance.
  • train_loss (bool, optional) – Whether or not to include the output loss in the training loss (Default: include losses from all modules).
  • train (bool, optional) – Specify the computational graph (train/serving) to add the module.
_build_default_fusions(train=True)

Build default fusion modules (may be overridden).

Parameters: train (bool) – An indicator for training or serving phase.
_build_default_interactions(train=True)

Build default interaction modules (may be overridden).

Parameters: train (bool) – An indicator for training or serving phase.
_build_extra_extractions(train=True)

Build extraction modules for contextual data sources (may be overridden).

Parameters: train (bool) – An indicator for training or serving phase.
_build_extra_inputs(train=True)

Build inputs for contextual data sources (should be overridden).

Parameters: train (bool) – An indicator for training or serving phase.
_build_extractions(train=True)

Call sub-functions to build extractions (do NOT override).

Parameters: train (bool) – An indicator for training or serving phase.
_build_fusions(train=True)

Call sub-functions to build fusions (do NOT override).

Parameters: train (bool) – An indicator for training or serving phase.
_build_inputs(train=True)

Call sub-functions to build inputs (do NOT override).

Parameters: train (bool) – An indicator for training or serving phase.
_build_interactions(train=True)

Call sub-functions to build interactions (do NOT override).

Parameters: train (bool) – An indicator for training or serving phase.
_build_item_extractions(train=True)

Build extraction modules for items’ data sources (should be overridden).

Parameters: train (bool) – An indicator for training or serving phase.
_build_item_inputs(train=True)

Build inputs for items’ data sources (should be overridden).

Parameters: train (bool) – An indicator for training or serving phase.
_build_optimizer()

Build an optimizer for model training.

_build_post_training_graph()

Build post-training graph (do NOT override).

_build_post_training_ops()

Build post-training operators (may be overridden).

Returns: A list of Tensorflow operators.
Return type: list
_build_serving_graph()

Call sub-functions to build serving graph (do NOT override).

_build_training_graph()

Call sub-functions to build training graph (do NOT override).

_build_user_extractions(train=True)

Build extraction modules for users’ data sources (should be overridden).

Parameters: train (bool) – An indicator for training or serving phase.
_build_user_inputs(train=True)

Build inputs for users’ data sources (should be overridden).

Parameters: train (bool) – An indicator for training or serving phase.
_get_input(name, train=True)

Retrieve an input.

Parameters:
  • name (str) – Input name.
  • train (bool) – Specify training or serving graph.
Returns:

The input specified by the name and the train flag.

Return type:

Tensorflow placeholder

_get_module(name, train=True)

Retrieve a module.

Parameters:
  • name (str) – The module name.
  • train (bool) – Specify training or serving graph.
Returns:

The module specified by the name and the train flag.

Return type:

Module

_grad_post_processing(grad_var_list)

Post-process gradients before updating variables.

Parameters:grad_var_list (list) – A list of tuples (gradients, variable).
Returns:A list of updated tuples (updated gradients, variables).
Return type:list
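
As a rough illustration of the contract described above (a list of (gradient, variable) tuples in, a list of updated tuples out), the sketch below clips each gradient element to a fixed range. It is not taken from OpenRec: plain Python lists stand in for TensorFlow tensors, and the function name clip_gradients, the clip_value parameter, and the variable names are all hypothetical.

```python
def clip_gradients(grad_var_list, clip_value=1.0):
    """Return new (gradient, variable) tuples with element-wise clipped gradients."""
    processed = []
    for grad, var in grad_var_list:
        # Clamp every gradient element into [-clip_value, clip_value].
        clipped = [max(-clip_value, min(clip_value, g)) for g in grad]
        processed.append((clipped, var))
    return processed


# Hypothetical gradients; strings stand in for the variables they belong to.
grad_var_list = [([0.5, -3.0, 2.5], "user_embedding"), ([0.1, 0.2], "item_bias")]
updated = clip_gradients(grad_var_list)
```

The variables are passed through untouched; only the gradient half of each tuple changes, which matches the shape of the documented return value.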
_initialize(init_dict)

Initialize model parameters (do NOT override).

Parameters:init_dict (dict) – Key-value pairs for initial parameter values.
_input(dtype='float32', shape=None, name=None)

Define an input for the recommender.

Parameters:
  • dtype (str) – Data type: “float16”, “float32”, “float64”, “int8”, “int16”, “int32”, “int64”, “bool”, or “string”.
  • shape (list or tuple) – Input shape.
  • name (str) – Name of the input.
Returns:

Defined tensorflow placeholder.

Return type:

Tensorflow placeholder

_input_mappings(batch_data, train)

Define mappings from input training batch to defined inputs.

Parameters:
  • batch_data (dict) – A training batch.
  • train (bool) – An indicator for the training or serving phase.
Returns:

The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.

Return type:

dict
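
The returned mapping can be pictured with the following minimal sketch. It does not use TensorFlow: the FakeInput class is a hypothetical stand-in for a placeholder, and the input names and batch_data keys are assumptions chosen for illustration, not names defined by OpenRec.

```python
class FakeInput:
    """Hypothetical stand-in for a TensorFlow placeholder."""

    def __init__(self, name):
        self.name = name


# Inputs that a recommender might define (names are illustrative only).
user_id_input = FakeInput("user_id")
pos_item_input = FakeInput("p_item_id")
neg_item_input = FakeInput("n_item_id")


def input_mappings(batch_data, train):
    """Map each defined input object to the matching batch_data value."""
    if train:
        return {
            user_id_input: batch_data["user_id_input"],
            pos_item_input: batch_data["p_item_id_input"],
            neg_item_input: batch_data["n_item_id_input"],
        }
    # In this sketch, serving only needs the user ids.
    return {user_id_input: batch_data["user_id_input"]}


batch = {
    "user_id_input": [0, 1],
    "p_item_id_input": [3, 4],
    "n_item_id_input": [7, 9],
}
train_mapping = input_mappings(batch, train=True)
serve_mapping = input_mappings(batch, train=False)
```

Keys are the input objects themselves rather than their names, which mirrors the description of the return value above.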

compute_module_loss(name, batch_data, train=True)

Compute the loss of a module, specified by the name and the train flag.

Parameters:
  • name (str) – The module name.
  • batch_data (dict) – A batch of training or serving data.
  • train (bool) – Specify the computational graph (train/serving) to compute loss.
Returns:

The loss of the specified module.

Return type:

Numpy array

compute_module_outputs(name, batch_data, train=True)

Compute the outputs of a module, specified by the name and the train flag.

Parameters:
  • name (str) – The module name.
  • batch_data (dict) – A batch of training or serving data.
  • train (bool) – Specify the computational graph (train/serving) to compute outputs.
Returns:

The outputs of the specified module.

Return type:

A list of Numpy arrays

load(load_dir)

Load a saved model from disk.

Parameters:load_dir (str) – Path to the saved model.
save(save_dir, step)

Save a trained model to disk.

Parameters:
  • save_dir (str) – Path to save the model.
  • step (int) – Training step.
serve(batch_data)

Evaluate the model with an input batch_data.

Parameters:batch_data (dict) – A batch of testing or serving data.
train(batch_data)

Train the model with an input batch_data.

Parameters:batch_data (dict) – A batch of training data.

BPR

class openrec.legacy.recommenders.BPR(batch_size, max_user, max_item, dim_embed, test_batch_size=None, l2_reg=None, opt='SGD', lr=None, init_dict=None, sess_config=None)

Pure Bayesian Personalized Ranking (BPR) [1] based recommender.

Parameters:
  • batch_size (int) – Training batch size. Each training instance consists of a user, a positive item, and a negative item.
  • max_user (int) – Maximum number of users in the recommendation system.
  • max_item (int) – Maximum number of items in the recommendation system.
  • dim_embed (int) – Dimensionality of the user/item embedding.
  • test_batch_size (int, optional) – Batch size for testing and serving. Each testing/serving batch consists of a user.
  • l2_reg (float, optional) – Weight for L2 regularization, i.e., weight decay.
  • opt ('SGD' (default) or 'Adam', optional) – Optimization algorithm. 'SGD' stands for Stochastic Gradient Descent.
  • lr (float, optional) – Initial learning rate.
  • init_dict (dict, optional) – Key-value pairs for initial parameter values.
  • sess_config (tensorflow.ConfigProto(), optional) – Tensorflow session configuration.

Notes

The BPR recommender is trained on users’ implicit feedback signals (e.g., clicks and views). Items that a user has clicked or viewed are treated as positive items; all other items are treated as negative items. The pure BPR recommender does not consider any other auxiliary signals.
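
To make the (user, positive item, negative item) training instance concrete, the following is a minimal sketch of the pairwise BPR objective from [1] for a single triple, written in plain Python rather than TensorFlow. The embedding values and the helper names dot and bpr_loss are hypothetical, and the sketch omits regularization; it is not the OpenRec implementation.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def bpr_loss(pos_score, neg_score):
    """-log(sigmoid(pos_score - neg_score)), computed in a numerically stable way."""
    diff = pos_score - neg_score
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical 3-dimensional embeddings (dim_embed = 3) for one training triple.
user_vec = [0.5, -0.2, 0.1]
pos_item_vec = [0.4, 0.0, 0.3]
neg_item_vec = [-0.1, 0.2, 0.0]

loss = bpr_loss(dot(user_vec, pos_item_vec), dot(user_vec, neg_item_vec))
```

The loss shrinks as the positive item's score rises above the negative item's score, which is what drives the ranking behaviour.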

References

[1]Rendle, S., Freudenthaler, C., Gantner, Z. and Schmidt-Thieme, L., 2009, June. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence (pp. 452-461). AUAI Press.
_build_default_interactions(train=True)

Build default interaction modules (may be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_item_extractions(train=True)

Build extraction modules for items’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_item_inputs(train=True)

Build inputs for items’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_serving_graph()

Call sub-functions to build serving graph (do NOT override).

_build_user_extractions(train=True)

Build extraction modules for users’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_user_inputs(train=True)

Build inputs for users’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_input_mappings(batch_data, train)

Define mappings from input training batch to defined inputs.

Parameters:
  • batch_data (dict) – A training batch.
  • train (bool) – An indicator for the training or serving phase.
Returns:

The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.

Return type:

dict

VisualBPR

class openrec.legacy.recommenders.VisualBPR(batch_size, max_user, max_item, dim_embed, dims, item_f_source, test_batch_size=None, item_serving_size=None, dropout_rate=None, l2_reg=None, l2_reg_mlp=None, opt='Adam', sess_config=None)
_build_default_fusions(train=True)

Build default fusion modules (may be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_item_extractions(train=True)

Build extraction modules for items’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_item_inputs(train=True)

Build inputs for items’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_grad_post_processing(grad_var_list)

Post-process gradients before updating variables.

Parameters:grad_var_list (list) – A list of tuples (gradients, variable).
Returns:A list of updated tuples (updated gradients, variables).
Return type:list
_input_mappings(batch_data, train)

Define mappings from input training batch to defined inputs.

Parameters:
  • batch_data (dict) – A training batch.
  • train (bool) – An indicator for the training or serving phase.
Returns:

The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.

Return type:

dict

CDL

class openrec.legacy.recommenders.CDL(batch_size, max_user, max_item, dim_embed, item_f, dims, dropout=None, test_batch_size=None, item_serving_size=None, l2_reg=None, l2_reg_mlp=None, l2_reconst=None, opt='SGD', sess_config=None)
_build_default_fusions(train=True)

Build default fusion modules (may be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_item_extractions(train=True)

Build extraction modules for items’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_item_inputs(train=True)

Build inputs for items’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_input_mappings(batch_data, train)

Define mappings from input training batch to defined inputs.

Parameters:
  • batch_data (dict) – A training batch.
  • train (bool) – An indicator for the training or serving phase.
Returns:

The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.

Return type:

dict

CML

class openrec.legacy.recommenders.CML(batch_size, max_user, max_item, dim_embed, test_batch_size=None, l2_reg=None, opt='SGD', lr=None, init_dict=None, sess_config=None)
_build_interactions(train=True)

Call sub-functions to build interactions (do NOT override).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_post_training_ops()

Build post-training operators (may be overridden).

Returns:A list of Tensorflow operators.
Return type:list

Visual CML

class openrec.legacy.recommenders.VisualCML(batch_size, max_user, max_item, dim_embed, dims, item_f_source, test_batch_size=None, item_serving_size=None, dropout_rate=None, l2_reg=None, l2_reg_mlp=None, opt='Adam', sess_config=None)
_build_default_interactions(train=True)

Build default interaction modules (may be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.

ConcatVisualBPR

class openrec.legacy.recommenders.ConcatVisualBPR(batch_size, max_user, max_item, dim_embed, dim_ve, item_f_source, item_serving_size=None, l2_reg=None, sess_config=None)
_build_default_fusions(train=True)

Build default fusion modules (may be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_item_extractions(train=True)

Build extraction modules for items’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_item_inputs(train=True)

Build inputs for items’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_grad_post_processing(grad_var_list)

Post-process gradients before updating variables.

Parameters:grad_var_list (list) – A list of tuples (gradients, variable).
Returns:A list of updated tuples (updated gradients, variables).
Return type:list
_input_mappings(batch_data, train)

Define mappings from input training batch to defined inputs.

Parameters:
  • batch_data (dict) – A training batch.
  • train (bool) – An indicator for the training or serving phase.
Returns:

The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.

Return type:

dict

PMF

class openrec.legacy.recommenders.PMF(batch_size, dim_embed, max_user, max_item, test_batch_size=None, l2_reg=None, opt='SGD', sess_config=None)
_build_default_interactions(train=True)

Build default interaction modules (may be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_extra_inputs(train=True)

Build inputs for contextual data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_item_extractions(train=True)

Build extraction modules for items’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_item_inputs(train=True)

Build inputs for items’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_serving_graph()

Call sub-functions to build serving graph (do NOT override).

_build_user_extractions(train=True)

Build extraction modules for users’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_user_inputs(train=True)

Build inputs for users’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_input_mappings(batch_data, train)

Define mappings from input training batch to defined inputs.

Parameters:
  • batch_data (dict) – A training batch.
  • train (bool) – An indicator for the training or serving phase.
Returns:

The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.

Return type:

dict

User PMF

class openrec.legacy.recommenders.UserPMF(batch_size, max_user, max_item, dim_embed, dims, user_f_source, test_batch_size=None, item_serving_size=None, dropout_rate=None, l2_reg=None, l2_reg_mlp=None, opt='SGD', sess_config=None)
_build_default_fusions(train=True)

Build default fusion modules (may be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_user_extractions(train=True)

Build extraction modules for users’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_user_inputs(train=True)

Build inputs for users’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_input_mappings(batch_data, train)

Define mappings from input training batch to defined inputs.

Parameters:
  • batch_data (dict) – A training batch.
  • train (bool) – An indicator for the training or serving phase.
Returns:

The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.

Return type:

dict

VisualPMF

class openrec.legacy.recommenders.VisualPMF(batch_size, max_user, max_item, dim_embed, dims, item_f_source, test_batch_size=None, item_serving_size=None, dropout_rate=None, l2_reg=None, l2_reg_mlp=None, opt='SGD', sess_config=None)
_build_default_fusions(train=True)

Build default fusion modules (may be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_item_extractions(train=True)

Build extraction modules for items’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_item_inputs(train=True)

Build inputs for items’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_input_mappings(batch_data, train)

Define mappings from input training batch to defined inputs.

Parameters:
  • batch_data (dict) – A training batch.
  • train (bool) – An indicator for the training or serving phase.
Returns:

The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.

Return type:

dict

User Visual PMF

class openrec.legacy.recommenders.UserVisualPMF(batch_size, max_user, max_item, dim_embed, dims_user, dims_item, user_f_source, item_f_source, test_batch_size=None, item_serving_size=None, dropout_rate=None, l2_reg=None, l2_reg_mlp=None, opt='SGD', sess_config=None)
_build_default_fusions(train=True)

Build default fusion modules (may be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_user_extractions(train=True)

Build extraction modules for users’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_build_user_inputs(train=True)

Build inputs for users’ data sources (should be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.
_input_mappings(batch_data, train)

Define mappings from input training batch to defined inputs.

Parameters:
  • batch_data (dict) – A training batch.
  • train (bool) – An indicator for the training or serving phase.
Returns:

The mapping where a key corresponds to an input object, and a value corresponds to a batch_data value.

Return type:

dict

VisualGMF

class openrec.legacy.recommenders.VisualGMF(batch_size, max_user, max_item, dim_embed, dims, item_f_source, test_batch_size=None, item_serving_size=None, dropout_rate=None, l2_reg=None, l2_reg_mlp=None, opt='SGD', sess_config=None)
_build_default_interactions(train=True)

Build default interaction modules (may be overridden).

Parameters:train (bool) – An indicator for the training or serving phase.

utils package

Evaluators package

Evaluator

class openrec.legacy.utils.evaluators.Evaluator(etype, name)
compute()

Implicit Eval Manager

class openrec.legacy.utils.evaluators.ImplicitEvalManager(evaluators=[])
_full_rank(pos_samples, excl_pos_samples, predictions)
_partial_rank(pos_scores, neg_scores)
full_eval(pos_samples, excl_pos_samples, predictions)
partial_eval(pos_scores, neg_scores)

AUC

class openrec.legacy.utils.evaluators.AUC(name='AUC')
compute(rank_above, negative_num)

Recall

class openrec.legacy.utils.evaluators.Recall(recall_at, name='Recall')
compute(rank_above, negative_num)

MSE

class openrec.legacy.utils.evaluators.MSE(name='MSE')
compute(predictions, labels)

NDCG

class openrec.legacy.utils.evaluators.NDCG(ndcg_at, name='NDCG')
compute(rank_above, negative_num)

Precision

class openrec.legacy.utils.evaluators.Precision(precision_at, name='Precision')
compute(rank_above, negative_num)

Samplers

Sampler

class openrec.legacy.utils.samplers.Sampler(dataset, batch_size, num_process=5)
_get_runner()
next_batch()

Explicit Sampler

class openrec.legacy.utils.samplers.ExplicitSampler(dataset, batch_size, num_process=5, chronological=False)
_get_runner()

Pairwise Sampler

class openrec.legacy.utils.samplers.PairwiseSampler(dataset, batch_size, chronological=False, num_process=5, seed=0)
_get_runner()

Pointwise Sampler

class openrec.legacy.utils.samplers.PointwiseSampler(dataset, batch_size, pos_ratio=0.5, num_process=5, chronological=False, seed=0)
_get_runner()

openrec.legacy.utils.dataset module

class openrec.legacy.utils.dataset.Dataset(raw_data, max_user, max_item, name='dataset')

Bases: object

The Dataset class stores a sequence of data points for training or evaluation.

Parameters:
  • raw_data (numpy structured array) – Input raw data.
  • max_user (int) – Maximum number of users in the recommendation system.
  • max_item (int) – Maximum number of items in the recommendation system.
  • name (str) – Name of the dataset.

Notes

The Dataset class expects raw_data as a numpy structured array, where each row represents a data point and contains at least two keys:

  • user_id: the user involved in the interaction.
  • item_id: the item involved in the interaction.

raw_data may contain other keys, such as timestamp and location, depending on the use case of the recommendation system. Users should be uniquely and numerically indexed from 0 to total_number_of_users - 1. Items should be indexed likewise.
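
A minimal sketch of a raw_data array in the expected shape is shown below. It only requires NumPy; the interaction values are made up, and deriving max_user and max_item from the largest index is an assumption that holds only when ids are indexed from 0 as described above. The commented-out constructor call mirrors the signature documented here and is not executed.

```python
import numpy as np

# Hypothetical (user_id, item_id) interactions, indexed from 0.
interactions = [(0, 2), (0, 1), (1, 2), (2, 0)]

# One row per data point, with the two required keys as named fields.
raw_data = np.array(
    interactions,
    dtype=[("user_id", np.int32), ("item_id", np.int32)],
)

# With 0-based contiguous ids, the counts are the largest index plus one.
max_user = int(raw_data["user_id"].max()) + 1
max_item = int(raw_data["item_id"].max()) + 1

# dataset = Dataset(raw_data, max_user, max_item, name="train")  # requires openrec
```

Extra keys such as a timestamp would be added as further fields in the dtype list.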

max_item()

Maximum number of items.

Returns:Maximum number of items.
Return type:int
max_user()

Maximum number of users.

Returns:Maximum number of users.
Return type:int
shuffle()

Shuffle the dataset entries.

openrec.legacy.utils.implicit_dataset module

class openrec.legacy.utils.implicit_dataset.ImplicitDataset(raw_data, max_user, max_item, name='dataset')

Bases: openrec.legacy.utils.dataset.Dataset

The ImplicitDataset class stores and parses a sequence of user implicit feedback for training or evaluation. It extends the functionality of the Dataset class.

Parameters:
  • raw_data (numpy structured array) – Input raw data. Other legacy formats (e.g., sparse matrix) are supported but not recommended.
  • max_user (int) – Maximum number of users in the recommendation system.
  • max_item (int) – Maximum number of items in the recommendation system.
  • name (str) – Name of the dataset.

Notes

The ImplicitDataset class parses the input raw_data into structured dictionaries (consumed by samplers or model trainer). This class expects raw_data as a numpy structured array, where each row represents a data point and contains at least two keys:

  • user_id: the user involved in the interaction.
  • item_id: the item involved in the interaction.

raw_data may contain other keys, such as timestamp and location, depending on the use case of the recommendation system. Users should be uniquely and numerically indexed from 0 to total_number_of_users - 1. Items should be indexed likewise.
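
The kind of structured dictionaries mentioned above can be sketched as follows. This is only an illustration of grouping interactions by user and by item, not the internal layout used by ImplicitDataset; the rows list stands in for the rows of a structured array, and the names items_by_user, users_by_item, and the helper contain_user are hypothetical.

```python
from collections import defaultdict

# Hypothetical (user_id, item_id) rows standing in for raw_data.
rows = [(0, 2), (0, 1), (1, 2), (2, 0)]

items_by_user = defaultdict(set)  # user_id -> items the user interacted with
users_by_item = defaultdict(set)  # item_id -> users who interacted with the item

for user_id, item_id in rows:
    items_by_user[user_id].add(item_id)
    users_by_item[item_id].add(user_id)


def contain_user(user_id):
    """Check whether a user appears in any interaction."""
    return user_id in items_by_user


unique_user_count = len(items_by_user)
unique_item_count = len(users_by_item)
```

Lookups such as the interactions of one user, membership checks, and unique counts all become dictionary operations once the data is grouped this way.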

contain_item(item_id)

Check whether or not an item is involved in any interaction.

Parameters:item_id (int) – target item id.
Returns:A boolean indicator
Return type:bool
contain_user(user_id)

Check whether or not a user is involved in any interaction.

Parameters:user_id (int) – target user id.
Returns:A boolean indicator
Return type:bool
get_interactions_by_item_gb_user(item_id)

Retrieve the interactions (grouped by user ids) that involve a specific item.

Parameters:item_id (int) – target item id.
Returns:Users that have interacted with the given item.
Return type:list
get_interactions_by_user_gb_item(user_id)

Retrieve the interactions (grouped by item ids) that involve a specific user.

Parameters:user_id (int) – target user id.
Returns:Items that the given user has interacted with.
Return type:list
get_unique_item_list()

Retrieve a list of unique item ids.

Returns:A list of unique item ids.
Return type:numpy array
get_unique_user_list()

Retrieve a list of unique user ids.

Returns:A list of unique user ids.
Return type:numpy array
unique_item_count()

Number of unique items.

Returns:Number of unique items.
Return type:int
unique_user_count()

Number of unique users.

Returns:Number of unique users.
Return type:int

implicit_model_trainer module

class openrec.legacy.implicit_model_trainer.ImplicitModelTrainer(batch_size, test_batch_size, train_dataset, model, sampler, item_serving_size=None, eval_save_prefix=None)

Bases: object

The ImplicitModelTrainer class implements the logic for basic recommender training and evaluation using users’ implicit feedback.

Parameters:
  • batch_size (int) – Training batch size.
  • test_batch_size (int) – Test/Evaluation batch size (number of users per testing batch).
  • train_dataset (Dataset) – Dataset for model training.
  • model (Recommender) – The target recommender.
  • sampler (Sampler) – The sampler for model training.
  • item_serving_size (int, optional) – Test/Evaluation batch size (number of items per testing batch).

Notes

The function train should be called for model training and evaluation.

train(num_itr, display_itr, eval_datasets=[], evaluators=[], num_negatives=None, seed=10)

Train and evaluate a recommender.

Parameters:
  • num_itr (int) – Total number of training iterations.
  • display_itr (int) – Evaluation/testing period.
  • eval_datasets (list of Dataset) – A list of datasets for evaluation/testing.
  • evaluators (list of Evaluator) – A list of evaluators for evaluation/testing.
  • num_negatives (int, optional) – If specified, the given number of items that each user has NOT interacted with will be sampled (as negative items) for evaluation.
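
The idea behind num_negatives can be sketched as below: for one user, draw a fixed number of items that the user has not interacted with. This is an illustrative sketch using only the standard library, not the trainer's actual sampling code; the function name sample_negatives and the example values are hypothetical, and reusing seed=10 simply echoes the default in the signature above.

```python
import random


def sample_negatives(interacted_items, max_item, num_negatives, seed=10):
    """Sample distinct item ids that are absent from interacted_items."""
    rng = random.Random(seed)
    candidates = [i for i in range(max_item) if i not in interacted_items]
    # Never request more negatives than there are candidate items.
    k = min(num_negatives, len(candidates))
    return rng.sample(candidates, k)


# Hypothetical user who interacted with items 1, 4, and 7 out of 10 items.
negatives = sample_negatives({1, 4, 7}, max_item=10, num_negatives=3)
```

Evaluating against a sampled set of negatives is cheaper than ranking every item, at the cost of a noisier estimate of the ranking metrics.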

Indices and tables