# Principle: Recommenders Benchmark Prediction Generation
| Knowledge Sources | |
|---|---|
| Domains | Recommender Systems, Benchmarking, Prediction |
| Last Updated | 2026-02-10 00:00 GMT |
## Overview
Standardized prediction and top-K recommendation generation across different algorithms for benchmarking, accounting for the fact that some algorithms produce rating predictions, others produce ranked lists, and some produce both.
## Description
Recommendation algorithms differ fundamentally in their output capabilities. Some algorithms (ALS, SVD, EmbeddingDotBias) can produce rating predictions -- estimating the score a user would give to a specific item. Other algorithms (SAR, NCF, BPR, BiVAE, LightGCN) produce ranked recommendation lists -- ordered sets of items predicted to be most relevant. Some algorithms support both.
The Benchmark Prediction Generation principle separates these two output modes into distinct function families:
- predict_* functions: Generate rating predictions for known user-item pairs (from the test set). Used for computing rating metrics (RMSE, MAE, R2, Explained Variance).
- recommend_k_* functions: Generate top-K ranked recommendation lists for all users. Used for computing ranking metrics (MAP, nDCG@k, Precision@k, Recall@k).
Both function families:
- Accept the trained model and relevant data (test set, training set for seen-item removal).
- Wrap the prediction/recommendation call in a Timer context manager.
- Return a (results, Timer) tuple for consistent metric collection.
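The shared contract above can be sketched as follows. This is an illustrative sketch, not the benchmark's actual code: the `Timer` class is a minimal stand-in for whatever timing utility the benchmark uses, and `predict_svd` is a hypothetical `predict_*` function for a model exposing a `predict(user, item)` method.

```python
import time

class Timer:
    """Minimal stand-in for the benchmark's Timer context manager."""
    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        # Record elapsed wall-clock time on exit.
        self.interval = time.perf_counter() - self._start
        return False

def predict_svd(model, test_pairs):
    """Hypothetical predict_* function: rating predictions for known
    (user, item) pairs, timed and returned as a (results, Timer) tuple."""
    with Timer() as t:
        predictions = [(u, i, model.predict(u, i)) for u, i in test_pairs]
    return predictions, t
```

Because every `predict_*` and `recommend_k_*` function returns the same `(results, Timer)` shape, the benchmark loop can collect timings uniformly without knowing which algorithm produced them.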
Not every algorithm has both a predict_* and recommend_k_* function. The benchmark tracks which algorithms support which metric types through a configuration dictionary:
- Rating-capable: ALS, SVD, EmbeddingDotBias
- Ranking-capable: All eight algorithms
## Usage
Use this principle when benchmarking algorithms that produce different types of outputs. The separation into predict_* and recommend_k_* allows the benchmark loop to conditionally call the appropriate function based on each algorithm's capabilities.
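A capability-driven dispatch might look like the sketch below. The registry sets and the `benchmark_one` function are assumptions for illustration, not the benchmark's actual identifiers; they mirror the rating-capable and ranking-capable groupings listed above.

```python
# Illustrative capability registry mirroring the groupings in the Description.
RATING_CAPABLE = {"als", "svd", "embedding_dot_bias"}
RANKING_CAPABLE = RATING_CAPABLE | {"sar", "ncf", "bpr", "bivae", "lightgcn"}

def benchmark_one(algo, model, test, train,
                  predict_fn=None, recommend_fn=None, top_k=10):
    """Call only the functions an algorithm supports, per the registry."""
    summary = {}
    if algo in RATING_CAPABLE and predict_fn is not None:
        _, timer = predict_fn(model, test)          # rating metrics path
        summary["rating_time"] = timer
    if algo in RANKING_CAPABLE and recommend_fn is not None:
        _, timer = recommend_fn(model, test, train, top_k)  # ranking path
        summary["ranking_time"] = timer
    return summary
```

A ranking-only algorithm such as SAR skips the rating branch entirely, so no rating metrics are reported for it.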
## Theoretical Basis
The two prediction modes correspond to different evaluation paradigms:
Rating Prediction:
```
predict_a(model, test) -> (predictions, timer)

for each (user, item) pair in test set:
    prediction = model.predict(user, item)

Output:     DataFrame with columns [userID, itemID, prediction]
Evaluation: RMSE, MAE, R2, Explained Variance
```
Top-K Recommendation:
```
recommend_k_a(model, test, train, top_k, remove_seen) -> (recs, timer)

for each user:
    candidates = all_items - seen_items   (if remove_seen=True)
    scores = model.score(user, candidates)
    recs = top_k(scores)

Output:     DataFrame with columns [userID, itemID, prediction]
Evaluation: MAP, nDCG@k, Precision@k, Recall@k
```
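The top-K pseudocode can be made concrete with a small pandas sketch. `score_fn` is a stand-in for model scoring, and the function name and column names are illustrative; real implementations would score candidates in batch rather than one pair at a time.

```python
import pandas as pd

def recommend_k_all(score_fn, users, all_items, train, top_k=10,
                    remove_seen=True):
    """Generic top-K generation following the pseudocode above."""
    # Map each user to the set of items they interacted with in training.
    seen = (train.groupby("userID")["itemID"].apply(set).to_dict()
            if remove_seen else {})
    rows = []
    for user in users:
        candidates = [i for i in all_items if i not in seen.get(user, set())]
        scored = sorted(((user, i, score_fn(user, i)) for i in candidates),
                        key=lambda row: row[2], reverse=True)
        rows.extend(scored[:top_k])  # keep the K highest-scored items
    return pd.DataFrame(rows, columns=["userID", "itemID", "prediction"])
```

The resulting DataFrame has the same `[userID, itemID, prediction]` schema as the rating path, so the ranking metrics can consume it directly.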
The remove_seen parameter (defaulting to True) ensures that items already in the training set are excluded from recommendations, preventing trivial recommendations of already-known items. The implementation of seen-item removal varies by algorithm (e.g., SQL outer join for ALS, pandas merge for NCF, built-in parameter for SAR).
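The pandas-merge style of seen-item removal is an anti-join: keep only the recommendation rows whose (user, item) pair does not appear in training. A minimal sketch, with an illustrative function name:

```python
import pandas as pd

def remove_seen(recs, train):
    """Drop (userID, itemID) pairs already present in train via an
    anti-join, one of the removal strategies mentioned above."""
    merged = recs.merge(train[["userID", "itemID"]],
                        on=["userID", "itemID"],
                        how="left", indicator=True)
    # "left_only" rows exist in recs but not in train.
    return (merged[merged["_merge"] == "left_only"]
            .drop(columns="_merge")
            .reset_index(drop=True))
```

The `indicator=True` flag adds a `_merge` column marking each row's provenance, which makes the anti-join a one-line filter.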