Implementation: Recommenders Benchmark Train Models
| Field | Value |
|---|---|
| Domains | Recommender Systems, Benchmarking, Model Training |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for training recommendation models with standardized timing instrumentation in the benchmarking workflow.
Description
The train_* family of functions in benchmark_utils.py provides a uniform training interface for all benchmarked algorithms. Each function accepts a parameter dictionary and an algorithm-specific data object, instantiates the model, trains it inside a Timer context manager, and returns a (model, Timer) tuple. The Timer captures wall-clock elapsed time for the training phase only.
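The shared shape of these functions can be sketched as follows. This is a minimal stand-in sketch, not the library's code: `Timer` here imitates `recommenders.utils.timer.Timer` (a context manager exposing the elapsed interval), and `DummyModel` and `train_dummy` are hypothetical names for illustration.

```python
import time


class Timer:
    """Minimal stand-in for recommenders.utils.timer.Timer:
    a context manager that records wall-clock elapsed time."""

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.interval = time.perf_counter() - self._start
        return False


class DummyModel:
    """Hypothetical model exposing the common fit(data) interface."""

    def __init__(self, **params):
        self.params = params

    def fit(self, data):
        self.n_rows = len(data)
        return self


def train_dummy(params, data):
    """Follows the train_* pattern: build the model from params,
    fit it inside a Timer, and return a (model, Timer) tuple."""
    model = DummyModel(**params)
    with Timer() as t:  # only the training phase is timed
        model.fit(data)
    return model, t


model, t = train_dummy({"n_factors": 8}, data=[1, 2, 3])
```

Because every `train_*` function keeps this `(params, data) -> (model, Timer)` contract, callers can swap algorithms without touching the timing logic.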
Each function handles the idiosyncrasies of its algorithm:
- train_sar: Instantiates SAR, calls `set_index(data)` before fitting, then fits inside a Timer.
- train_als: Instantiates PySpark ALS with params, calls `.fit(data)` to produce an ALSModel.
- train_svd: Instantiates Surprise SVD, calls `.fit(data)` on the Trainset.
- train_ncf: Instantiates NCF with dataset dimensions (`n_users`, `n_items`) plus params, then fits.
- train_bpr: Instantiates the custom BPR wrapper, calls `.fit(data)` on the Cornac Dataset.
- train_bivae: Instantiates Cornac BiVAECF directly, calls `.fit(data)`.
- train_embdotbias: Constructs EmbeddingDotBias from data classes, creates a Trainer, and calls `trainer.fit()` with train/valid splits and epoch count.
- train_lightgcn: Calls `prepare_hparams(**params)` to build hyperparameters, instantiates LightGCN with the data graph, then calls `.fit()`.
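To illustrate how these idiosyncrasies stay hidden behind the uniform signature, here is a hedged sketch with two hypothetical stand-ins: one model needing an index step before fitting (as `train_sar` does) and one trained through an external trainer object (as `train_embdotbias` does). All class names and the `ElapsedTimer` helper are illustrative, not the library's API.

```python
import time


class ElapsedTimer:
    """Illustrative stand-in for the library's Timer context manager."""

    def __enter__(self):
        self._t0 = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.interval = time.perf_counter() - self._t0
        return False


class SarLike:
    """Hypothetical model needing an index step before fit (cf. train_sar)."""

    def set_index(self, data):
        self.index = {row: i for i, row in enumerate(data)}

    def fit(self, data):
        self.fitted = True


class EmbLike:
    """Hypothetical model fitted via an external trainer (cf. train_embdotbias)."""

    def __init__(self, n_factors):
        self.n_factors = n_factors


class TrainerLike:
    """Hypothetical trainer object wrapping a model (cf. the embdotbias Trainer)."""

    def __init__(self, model):
        self.model = model

    def fit(self, train, valid, epochs):
        self.model.fitted = True
        self.epochs = epochs


def train_sar_like(params, data):
    model = SarLike()
    model.set_index(data)       # algorithm-specific pre-step, outside the Timer
    with ElapsedTimer() as t:
        model.fit(data)         # only the fit itself is timed
    return model, t


def train_emb_like(params, data):
    model = EmbLike(params["n_factors"])
    trainer = TrainerLike(model)
    with ElapsedTimer() as t:
        trainer.fit(data["train"], data["valid"], params["epochs"])
    return model, t


# Despite different internals, both satisfy (model, Timer) = f(params, data).
m1, t1 = train_sar_like({}, ["a", "b"])
m2, t2 = train_emb_like({"n_factors": 4, "epochs": 2},
                        {"train": [], "valid": []})
```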
Usage
Use these functions in a multi-algorithm benchmark loop. They are registered in a dispatch dictionary keyed by algorithm name, enabling generic training calls.
Code Reference
Source Location
- Repository: recommenders
- File: `examples/06_benchmarks/benchmark_utils.py` (lines 89-392)
Signature
def train_sar(params, data) -> tuple[SAR, Timer]
def train_als(params, data) -> tuple[ALSModel, Timer]
def train_svd(params, data) -> tuple[SVD, Timer]
def train_ncf(params, data) -> tuple[NCF, Timer]
def train_bpr(params, data) -> tuple[BPR, Timer]
def train_bivae(params, data) -> tuple[BiVAECF, Timer]
def train_embdotbias(params, data) -> tuple[EmbeddingDotBias, Timer]
def train_lightgcn(params, data) -> tuple[LightGCN, Timer]
Import
import sys
sys.path.append("examples/06_benchmarks")
from benchmark_utils import (
train_sar,
train_als,
train_svd,
train_ncf,
train_bpr,
train_bivae,
train_embdotbias,
train_lightgcn,
)
Dependencies
- recommenders.models.sar.SAR
- pyspark.ml.recommendation.ALS
- surprise.SVD
- recommenders.models.ncf.ncf_singlenode.NCF
- recommenders.models.cornac.bpr.BPR
- cornac.models.BiVAECF
- recommenders.models.embdotbias.model.EmbeddingDotBias
- recommenders.models.embdotbias.training_utils.Trainer
- recommenders.models.deeprec.models.graphrec.lightgcn.LightGCN
- recommenders.models.deeprec.deeprec_utils.prepare_hparams
- recommenders.utils.timer.Timer
I/O Contract
| Function | Input: params | Input: data | Output: model | Output: Timer |
|---|---|---|---|---|
| train_sar | dict (similarity_type, time_decay_coefficient, col_* keys) | pd.DataFrame | SAR (fitted) | Wall-clock training time |
| train_als | dict (rank, maxIter, regParam, etc.) | pyspark.sql.DataFrame | ALSModel (fitted) | Wall-clock training time |
| train_svd | dict (n_factors, n_epochs, lr_all, reg_all, etc.) | surprise.Trainset | surprise.SVD (fitted) | Wall-clock training time |
| train_ncf | dict (model_type, n_factors, layer_sizes, etc.) | NCFDataset | NCF (fitted) | Wall-clock training time |
| train_bpr | dict (k, max_iter, learning_rate, lambda_reg, etc.) | cornac.data.Dataset | BPR (fitted) | Wall-clock training time |
| train_bivae | dict (k, encoder_structure, act_fn, likelihood, etc.) | cornac.data.Dataset | BiVAECF (fitted) | Wall-clock training time |
| train_embdotbias | dict (n_factors, y_range, wd, lr_max, epochs) | RecoDataLoader | EmbeddingDotBias (fitted) | Wall-clock training time |
| train_lightgcn | dict (model_type, n_layers, embed_size, etc.) | ImplicitCF | LightGCN (fitted) | Wall-clock training time |
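The per-algorithm params are plain dictionaries keyed as in the contract above. The sketch below shows the shape of such a params table for a few algorithms; the key names come from the table, but the values are illustrative placeholders, not the benchmark's actual tuned settings.

```python
# Illustrative params dicts, keyed by algorithm name as in the I/O contract.
# Values are placeholders, not the benchmark's tuned settings.
params = {
    "sar": {
        "similarity_type": "jaccard",
        "time_decay_coefficient": 30,
        "col_user": "userID",
        "col_item": "itemID",
    },
    "als": {"rank": 10, "maxIter": 15, "regParam": 0.05},
    "svd": {"n_factors": 150, "n_epochs": 15, "lr_all": 0.005, "reg_all": 0.02},
    "bpr": {"k": 200, "max_iter": 100, "learning_rate": 0.01, "lambda_reg": 1e-3},
}

# A benchmark loop looks up each algorithm's params by the same key it
# uses to look up the trainer function:
algo = "als"
algo_params = params[algo]
```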
Usage Examples
from benchmark_utils import (
    train_sar,
    train_als,
    train_svd,
    train_ncf,
    train_bpr,
    train_bivae,
    train_embdotbias,
    train_lightgcn,
)

# Build a trainer dispatch dictionary. Because all train_* functions share
# one signature, they can be referenced directly; no lambdas are needed.
trainer = {
    "als": train_als,
    "svd": train_svd,
    "sar": train_sar,
    "ncf": train_ncf,
    "bpr": train_bpr,
    "bivae": train_bivae,
    "embdotbias": train_embdotbias,
    "lightgcn": train_lightgcn,
}

# In the benchmark loop (prepare_training_data is a matching dispatch
# dict of data-preparation functions, defined elsewhere):
for algo in algorithms:
    train_data = prepare_training_data[algo](df_train, df_test)
    model, time_train = trainer[algo](params[algo], train_data)
    print(f"{algo} training time: {time_train}s")