Implementation:Recommenders team Recommenders Benchmark Train Models

From Leeroopedia


Knowledge Sources
Domains: Recommender Systems, Benchmarking, Model Training
Last Updated: 2026-02-10 00:00 GMT

Overview

Concrete tool for training recommendation models with standardized timing instrumentation in the benchmarking workflow.

Description

The train_* family of functions in benchmark_utils.py provides a uniform training interface for all benchmarked algorithms. Each function accepts a parameter dictionary and an algorithm-specific data object, instantiates the model, trains it inside a Timer context manager, and returns a (model, Timer) tuple. The Timer captures wall-clock elapsed time for the training phase only.
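The shared pattern (instantiate from a params dict, time only the fit, return a (model, Timer) tuple) can be illustrated with toy stand-ins. This is a minimal sketch, not the benchmark_utils code. `SimpleTimer` is a hypothetical stand-in that mimics the `interval` attribute of recommenders.utils.timer.Timer. `MeanRatingModel` and `train_mean` are illustrative names only.

```python
import time
from typing import Any, Dict, List, Tuple


class SimpleTimer:
    """Hypothetical stand-in for recommenders.utils.timer.Timer."""

    def __init__(self) -> None:
        self.interval = 0.0
        self._start = 0.0

    def __enter__(self) -> "SimpleTimer":
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc: Any) -> None:
        # Store elapsed wall-clock seconds; returning None lets errors propagate.
        self.interval = time.perf_counter() - self._start


class MeanRatingModel:
    """Hypothetical toy model that predicts the global mean rating."""

    def __init__(self, rating_col: str = "rating") -> None:
        self.rating_col = rating_col
        self.mean_ = None

    def fit(self, rows: List[Dict[str, Any]]) -> "MeanRatingModel":
        ratings = [row[self.rating_col] for row in rows]
        self.mean_ = sum(ratings) / len(ratings)
        return self


def train_mean(
    params: Dict[str, Any], data: List[Dict[str, Any]]
) -> Tuple[MeanRatingModel, SimpleTimer]:
    # 1. Instantiate the model from the parameter dictionary (not timed).
    model = MeanRatingModel(**params)
    # 2. Time only the training call.
    with SimpleTimer() as t:
        model.fit(data)
    # 3. Return the fitted model together with its timer.
    return model, t
```

Calling `train_mean({"rating_col": "rating"}, rows)` returns the fitted model and a timer whose `interval` holds the fit time in seconds, which is the same shape the real train_* functions return.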

Each function handles the idiosyncrasies of its algorithm:

  • train_sar: Instantiates SAR, calls set_index(data) before fitting, then fits with Timer.
  • train_als: Instantiates PySpark ALS with params, calls .fit(data) to produce an ALSModel.
  • train_svd: Instantiates Surprise SVD, calls .fit(data) on the Trainset.
  • train_ncf: Instantiates NCF with dataset dimensions (n_users, n_items) plus params, then fits.
  • train_bpr: Instantiates the custom BPR wrapper, calls .fit(data) on the Cornac Dataset.
  • train_bivae: Instantiates Cornac BiVAECF directly, calls .fit(data).
  • train_embdotbias: Constructs EmbeddingDotBias from data classes, creates a Trainer, and calls trainer.fit() with train/valid splits and epoch count.
  • train_lightgcn: Calls prepare_hparams(**params) to build hyperparameters, instantiates LightGCN with the data graph, then calls .fit().
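The SAR variant differs from the plain pattern because a setup step (set_index) runs before the timer starts, so its cost is not counted as training time. A minimal sketch of that ordering, using hypothetical stand-ins: `Timer` here is a toy context manager, not recommenders.utils.timer.Timer, and `IndexedItemCounter`, `train_indexed` and the `setup_delay` parameter are illustrative names.

```python
import time
from typing import Dict, List, Tuple


class Timer:
    """Hypothetical minimal stand-in for recommenders.utils.timer.Timer."""

    def __enter__(self) -> "Timer":
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc) -> None:
        self.interval = time.perf_counter() - self._start


class IndexedItemCounter:
    """Hypothetical toy model with a separate indexing step, like SAR.set_index."""

    def __init__(self, setup_delay: float = 0.0) -> None:
        self.setup_delay = setup_delay
        self.item2index: Dict[str, int] = {}
        self.counts: List[int] = []

    def set_index(self, pairs: List[Tuple[str, str]]) -> None:
        # Map raw item IDs to contiguous integer indices.
        for _, item in pairs:
            self.item2index.setdefault(item, len(self.item2index))
        time.sleep(self.setup_delay)  # simulated cost of a heavy setup step

    def fit(self, pairs: List[Tuple[str, str]]) -> None:
        # Count interactions per item using the precomputed index.
        self.counts = [0] * len(self.item2index)
        for _, item in pairs:
            self.counts[self.item2index[item]] += 1


def train_indexed(params, data):
    model = IndexedItemCounter(**params)
    model.set_index(data)  # setup runs before the timer starts
    with Timer() as t:
        model.fit(data)  # only the fit call is measured
    return model, t
```

With `setup_delay=0.2`, the returned `t.interval` stays well below 0.2 s because the sleep happens in `set_index`, outside the timed block.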

Usage

Use these functions in a multi-algorithm benchmark loop. They are registered in a dispatch dictionary keyed by algorithm name, enabling generic training calls.

Code Reference

Source Location

  • Repository: recommenders
  • File: examples/06_benchmarks/benchmark_utils.py (Lines 89-392)

Signature

def train_sar(params, data) -> tuple[SAR, Timer]

def train_als(params, data) -> tuple[ALSModel, Timer]

def train_svd(params, data) -> tuple[SVD, Timer]

def train_ncf(params, data) -> tuple[NCF, Timer]

def train_bpr(params, data) -> tuple[BPR, Timer]

def train_bivae(params, data) -> tuple[BiVAECF, Timer]

def train_embdotbias(params, data) -> tuple[EmbeddingDotBias, Timer]

def train_lightgcn(params, data) -> tuple[LightGCN, Timer]

Import

import sys
sys.path.append("examples/06_benchmarks")  # relative path: run from the repository root
from benchmark_utils import (
    train_sar,
    train_als,
    train_svd,
    train_ncf,
    train_bpr,
    train_bivae,
    train_embdotbias,
    train_lightgcn,
)

Dependencies

  • recommenders.models.sar.SAR
  • pyspark.ml.recommendation.ALS
  • surprise.SVD
  • recommenders.models.ncf.ncf_singlenode.NCF
  • recommenders.models.cornac.bpr.BPR
  • cornac.models.BiVAECF
  • recommenders.models.embdotbias.model.EmbeddingDotBias
  • recommenders.models.embdotbias.training_utils.Trainer
  • recommenders.models.deeprec.models.graphrec.lightgcn.LightGCN
  • recommenders.models.deeprec.deeprec_utils.prepare_hparams
  • recommenders.utils.timer.Timer

I/O Contract

Function | Input: params (dict keys) | Input: data | Output: model | Output: Timer
--- | --- | --- | --- | ---
train_sar | similarity_type, time_decay_coefficient, col_* keys | pd.DataFrame | SAR (fitted) | Wall-clock training time
train_als | rank, maxIter, regParam, etc. | pyspark.sql.DataFrame | ALSModel (fitted) | Wall-clock training time
train_svd | n_factors, n_epochs, lr_all, reg_all, etc. | surprise.Trainset | surprise.SVD (fitted) | Wall-clock training time
train_ncf | model_type, n_factors, layer_sizes, etc. | NCFDataset | NCF (fitted) | Wall-clock training time
train_bpr | k, max_iter, learning_rate, lambda_reg, etc. | cornac.data.Dataset | BPR (fitted) | Wall-clock training time
train_bivae | k, encoder_structure, act_fn, likelihood, etc. | cornac.data.Dataset | BiVAECF (fitted) | Wall-clock training time
train_embdotbias | n_factors, y_range, wd, lr_max, epochs | RecoDataLoader | EmbeddingDotBias (fitted) | Wall-clock training time
train_lightgcn | model_type, n_layers, embed_size, etc. | ImplicitCF | LightGCN (fitted) | Wall-clock training time
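Because every trainer shares this contract, a new entry in the dispatch dictionary can be checked with a small helper before it is used in the benchmark loop. A minimal sketch, assuming only that the returned timer exposes a numeric `interval` attribute (as recommenders.utils.timer.Timer does); `TrainFn` and `check_trainer_contract` are hypothetical names.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical alias for the shared (params, data) -> (model, timer) shape.
TrainFn = Callable[[Dict[str, Any], Any], Tuple[Any, Any]]


def check_trainer_contract(train_fn: TrainFn, params: Dict[str, Any], data: Any) -> float:
    """Call a trainer, verify the (model, timer) contract, return seconds."""
    name = getattr(train_fn, "__name__", repr(train_fn))
    result = train_fn(params, data)
    if not (isinstance(result, tuple) and len(result) == 2):
        raise TypeError(f"{name} must return a (model, timer) tuple")
    model, timer = result
    if model is None:
        raise ValueError(f"{name} returned no model")
    interval = getattr(timer, "interval", None)
    if not isinstance(interval, (int, float)) or interval < 0:
        raise ValueError(f"{name} returned a timer without a valid interval")
    return float(interval)
```

For example, `check_trainer_contract(train_sar, params["sar"], train_data)` would return the SAR training time in seconds, or raise if the function's return value drifted from the table above.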

Usage Examples

# Assumes examples/06_benchmarks is on sys.path (see Import above)
from benchmark_utils import (
    train_sar,
    train_als,
    train_svd,
    train_ncf,
    train_bpr,
    train_bivae,
    train_embdotbias,
    train_lightgcn,
)

# Build a trainer dispatch dictionary. All functions share the
# (params, data) -> (model, Timer) signature, so no lambda wrappers are needed.
trainer = {
    "als": train_als,
    "svd": train_svd,
    "sar": train_sar,
    "ncf": train_ncf,
    "bpr": train_bpr,
    "bivae": train_bivae,
    "embdotbias": train_embdotbias,
    "lightgcn": train_lightgcn,
}

# In the benchmark loop. `algorithms`, `params`, `prepare_training_data`,
# `df_train` and `df_test` must already be defined by the benchmark script.
for algo in algorithms:
    train_data = prepare_training_data[algo](df_train, df_test)
    model, time_train = trainer[algo](params[algo], train_data)
    # Timer.interval holds the elapsed training time in seconds
    print(f"{algo} training time: {time_train.interval:.2f}s")
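The loop above prints each training time and discards the model, but in a benchmark the fitted model is typically needed afterwards for prediction and evaluation. A minimal self-contained sketch of keeping both, in which `_timed`, `train_count`, `train_total` and `run_benchmark` are hypothetical toy names and `SimpleNamespace(interval=...)` stands in for a Timer:

```python
import time
from types import SimpleNamespace
from typing import Any, Callable, Dict, Tuple

TrainFn = Callable[[Dict[str, Any], Any], Tuple[Any, Any]]


def _timed(fn: Callable[[], Any]) -> Tuple[Any, SimpleNamespace]:
    # Toy replacement for `with Timer() as t:`; exposes an `interval` attribute.
    start = time.perf_counter()
    result = fn()
    return result, SimpleNamespace(interval=time.perf_counter() - start)


def train_count(params: Dict[str, Any], data: list) -> Tuple[Any, Any]:
    return _timed(lambda: len(data) * params.get("scale", 1))


def train_total(params: Dict[str, Any], data: list) -> Tuple[Any, Any]:
    return _timed(lambda: sum(data) + params.get("offset", 0))


def run_benchmark(
    trainer: Dict[str, TrainFn],
    params: Dict[str, Dict[str, Any]],
    data: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, float]]:
    """Train every registered algorithm; keep models and training seconds."""
    models: Dict[str, Any] = {}
    times: Dict[str, float] = {}
    for algo, train_fn in trainer.items():
        models[algo], time_train = train_fn(params[algo], data[algo])
        times[algo] = time_train.interval
    return models, times
```

Keeping models and times in separate dictionaries keyed by algorithm name mirrors the dispatch dictionary, so later prediction and reporting steps can iterate over the same keys.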

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
