
Principle:Recommenders team Recommenders Benchmark Prediction Generation

From Leeroopedia


Knowledge Sources
Domains Recommender Systems, Benchmarking, Prediction
Last Updated 2026-02-10 00:00 GMT

Overview

This principle standardizes prediction and top-K recommendation generation across different algorithms for benchmarking, accounting for the fact that some algorithms produce rating predictions, others produce ranked lists, and some produce both.

Description

Recommendation algorithms differ fundamentally in their output capabilities. Some algorithms (ALS, SVD, EmbeddingDotBias) can produce rating predictions -- estimating the score a user would give to a specific item. Other algorithms (SAR, NCF, BPR, BiVAE, LightGCN) produce ranked recommendation lists -- ordered sets of items predicted to be most relevant. Some algorithms support both.

The Benchmark Prediction Generation principle separates these two output modes into distinct function families:

  • predict_* functions: Generate rating predictions for known user-item pairs (from the test set). Used for computing rating metrics (RMSE, MAE, R2, Explained Variance).
  • recommend_k_* functions: Generate top-K ranked recommendation lists for all users. Used for computing ranking metrics (MAP, nDCG@k, Precision@k, Recall@k).

Both function families:

  1. Accept the trained model and relevant data (test set, training set for seen-item removal).
  2. Wrap the prediction/recommendation call in a Timer context manager.
  3. Return a (results, Timer) tuple for consistent metric collection.
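A minimal sketch of this shared contract, assuming a simplified stand-in for the benchmark's Timer utility (the `Timer` class and `predict_example` function here are illustrative, not the library's actual implementations):

```python
import time

import pandas as pd


class Timer:
    """Minimal stand-in for the benchmark's Timer context manager."""

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        # Elapsed wall-clock time is recorded on exit.
        self.interval = time.perf_counter() - self._start
        return False


def predict_example(model, test):
    """Hypothetical predict_* function: score known (user, item) pairs."""
    with Timer() as t:  # step 2: wrap the prediction call in a Timer
        predictions = test.assign(
            prediction=[
                model.predict(u, i)
                for u, i in zip(test["userID"], test["itemID"])
            ]
        )
    return predictions, t  # step 3: return a (results, Timer) tuple
```

Because every function in both families returns the same `(results, Timer)` shape, the benchmark loop can collect timing and metric inputs uniformly regardless of algorithm.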

Not every algorithm has both a predict_* and recommend_k_* function. The benchmark tracks which algorithms support which metric types through a configuration dictionary:

  • Rating-capable: ALS, SVD, EmbeddingDotBias
  • Ranking-capable: All eight algorithms
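A minimal version of such a capability dictionary might look like the following; the algorithm names come from the lists above, while the dictionary keys (`"rating"`, `"ranking"`) and the `supports` helper are illustrative assumptions, not the benchmark's actual configuration:

```python
# Hypothetical capability table; names follow the article's algorithm lists,
# key names are illustrative.
CAPABILITIES = {
    "als":              {"rating": True,  "ranking": True},
    "svd":              {"rating": True,  "ranking": True},
    "embeddingdotbias": {"rating": True,  "ranking": True},
    "sar":              {"rating": False, "ranking": True},
    "ncf":              {"rating": False, "ranking": True},
    "bpr":              {"rating": False, "ranking": True},
    "bivae":            {"rating": False, "ranking": True},
    "lightgcn":         {"rating": False, "ranking": True},
}


def supports(algo, metric_type):
    """Return True if the algorithm supports the given metric family."""
    return CAPABILITIES[algo][metric_type]
```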

Usage

Use this principle when benchmarking algorithms that produce different types of outputs. The separation into predict_* and recommend_k_* allows the benchmark loop to conditionally call the appropriate function based on each algorithm's capabilities.
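A sketch of such a conditional benchmark loop, assuming hypothetical lookup tables that map algorithm names to their `predict_*` and `recommend_k_*` functions (the `run_benchmark` function and its parameters are illustrative):

```python
def run_benchmark(algos, rating_fns, ranking_fns, model, test, train, top_k=10):
    """Illustrative benchmark loop: call only the functions each algorithm has.

    rating_fns / ranking_fns are hypothetical dicts mapping algorithm names
    to predict_* / recommend_k_* functions defined elsewhere.
    """
    results = {}
    for algo in algos:
        entry = {}
        fn = rating_fns.get(algo)  # present only for rating-capable algorithms
        if fn is not None:
            preds, t = fn(model, test)
            entry["rating"] = {"predictions": preds, "time": t}
        fn = ranking_fns.get(algo)  # present for all algorithms here
        if fn is not None:
            recs, t = fn(model, test, train, top_k=top_k, remove_seen=True)
            entry["ranking"] = {"recommendations": recs, "time": t}
        results[algo] = entry
    return results
```

The dictionary lookups replace per-algorithm branching: an algorithm simply has no entry in `rating_fns` if it cannot produce rating predictions, so rating metrics are skipped for it without special-casing.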

Theoretical Basis

The two prediction modes correspond to different evaluation paradigms:

Rating Prediction:
  predict_a(model, test) -> (predictions, timer)

  For each (user, item) pair in test set:
    prediction = model.predict(user, item)
  Output: DataFrame with columns [userID, itemID, prediction]
  Evaluation: RMSE, MAE, R2, Explained Variance

Top-K Recommendation:
  recommend_k_a(model, test, train, top_k, remove_seen) -> (recs, timer)

  For each user:
    candidates = all_items - seen_items (if remove_seen=True)
    scores = model.score(user, candidates)
    recs = top_k(scores)
  Output: DataFrame with columns [userID, itemID, prediction]
  Evaluation: MAP, nDCG@k, Precision@k, Recall@k

The remove_seen parameter (defaulting to True) ensures that items already in the training set are excluded from recommendations, preventing trivial recommendations of already-known items. The implementation of seen-item removal varies by algorithm (e.g., SQL outer join for ALS, pandas merge for NCF, built-in parameter for SAR).
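The pandas-style variant of seen-item removal can be sketched as an anti-join using `merge` with `indicator=True` (column names follow the article's DataFrame layout; the `remove_seen_items` helper itself is hypothetical):

```python
import pandas as pd


def remove_seen_items(scores, train):
    """Drop (user, item) pairs that appear in the training set (anti-join)."""
    merged = scores.merge(
        train[["userID", "itemID"]].drop_duplicates(),
        on=["userID", "itemID"],
        how="left",
        indicator=True,  # adds a _merge column marking the match status
    )
    # Keep only rows found in scores but not in train.
    return (
        merged[merged["_merge"] == "left_only"]
        .drop(columns="_merge")
        .reset_index(drop=True)
    )
```

The top-K cut is then taken from the filtered scores, so items a user has already interacted with can never appear in their recommendation list.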

Related Pages

Implemented By
