Implementation:Recommenders team Recommenders Benchmark Metric Functions

From Leeroopedia


Knowledge Sources
Domains: Recommender Systems, Benchmarking, Evaluation
Last Updated: 2026-02-10 00:00 GMT

Overview

Concrete tool for computing standardized rating and ranking evaluation metrics across Python and PySpark backends in the benchmarking workflow.

Description

The four metric functions in benchmark_utils.py wrap the underlying evaluation modules from the recommenders library into a simple dictionary-returning interface. Each function accepts the test data and predictions, computes a fixed set of metrics, and returns a dictionary with standardized string keys.

  • rating_metrics_python: Calls rmse(), mae(), rsquared(), and exp_var() from recommenders.evaluation.python_evaluation. Operates on pandas DataFrames.
  • ranking_metrics_python: Calls map(), ndcg_at_k(), precision_at_k(), and recall_at_k() from recommenders.evaluation.python_evaluation. Operates on pandas DataFrames.
  • rating_metrics_pyspark: Instantiates SparkRatingEvaluation and calls its .rmse(), .mae(), .exp_var(), and .rsquared() methods. Operates on PySpark DataFrames.
  • ranking_metrics_pyspark: Instantiates SparkRankingEvaluation with relevancy_method="top_k" and calls its .map(), .ndcg_at_k(), .precision_at_k(), and .recall_at_k() methods. Operates on PySpark DataFrames.

All four functions pass **COL_DICT to their underlying calls, ensuring consistent column name mappings (userID, itemID, rating, prediction).
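
As a minimal sketch of this wrapping pattern (an illustration consistent with the description above, not the verbatim source; see benchmark_utils.py for the authoritative implementation), rating_metrics_python amounts to a dictionary builder around the python_evaluation calls:

from recommenders.evaluation.python_evaluation import rmse, mae, rsquared, exp_var

def rating_metrics_python(test, predictions):
    # COL_DICT supplies the col_user / col_item / col_rating / col_prediction
    # keyword arguments expected by the evaluation functions.
    return {
        "RMSE": rmse(test, predictions, **COL_DICT),
        "MAE": mae(test, predictions, **COL_DICT),
        "R2": rsquared(test, predictions, **COL_DICT),
        "Explained Variance": exp_var(test, predictions, **COL_DICT),
    }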

Usage

Use these functions after generating predictions or recommendations in the benchmark loop. The algorithm's execution environment determines which backend to use: PySpark algorithms (ALS) use the pyspark variants; all others use the python variants.

Code Reference

Source Location

  • Repository: recommenders
  • File: examples/06_benchmarks/benchmark_utils.py (Lines 403-440)

Signature

def rating_metrics_python(test, predictions) -> dict
# Returns: {"RMSE": float, "MAE": float, "R2": float, "Explained Variance": float}

def ranking_metrics_python(test, predictions, k=DEFAULT_K) -> dict
# Returns: {"MAP": float, "nDCG@k": float, "Precision@k": float, "Recall@k": float}

def rating_metrics_pyspark(test, predictions) -> dict
# Returns: {"RMSE": float, "MAE": float, "R2": float, "Explained Variance": float}

def ranking_metrics_pyspark(test, predictions, k=DEFAULT_K) -> dict
# Returns: {"MAP": float, "nDCG@k": float, "Precision@k": float, "Recall@k": float}

Import

import sys
sys.path.append("examples/06_benchmarks")
from benchmark_utils import (
    rating_metrics_python,
    ranking_metrics_python,
    rating_metrics_pyspark,
    ranking_metrics_pyspark,
)

Dependencies

  • recommenders.evaluation.python_evaluation: rmse, mae, rsquared, exp_var, map, ndcg_at_k, precision_at_k, recall_at_k
  • recommenders.evaluation.spark_evaluation: SparkRatingEvaluation, SparkRankingEvaluation
  • recommenders.utils.constants.COL_DICT: Standard column name mapping
  • recommenders.utils.constants.DEFAULT_K: Default top-K value (10)
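
The exact contents of COL_DICT are defined in the source; a sketch of the assumed shape, built from the standard recommenders column constants, looks like this:

from recommenders.utils.constants import (
    DEFAULT_USER_COL,        # "userID"
    DEFAULT_ITEM_COL,        # "itemID"
    DEFAULT_RATING_COL,      # "rating"
    DEFAULT_PREDICTION_COL,  # "prediction"
)

# Assumed shape of COL_DICT: keyword arguments accepted by the evaluators,
# mapped to the standard column names listed above.
COL_DICT = {
    "col_user": DEFAULT_USER_COL,
    "col_item": DEFAULT_ITEM_COL,
    "col_rating": DEFAULT_RATING_COL,
    "col_prediction": DEFAULT_PREDICTION_COL,
}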

I/O Contract

Rating Metric Functions

Function               | Input: test           | Input: predictions    | Output Keys                               | Output Type
rating_metrics_python  | pd.DataFrame          | pd.DataFrame          | "RMSE", "MAE", "R2", "Explained Variance" | dict[str, float]
rating_metrics_pyspark | pyspark.sql.DataFrame | pyspark.sql.DataFrame | "RMSE", "MAE", "R2", "Explained Variance" | dict[str, float]

Ranking Metric Functions

Function                | Input: test           | Input: predictions    | Input: k         | Output Keys                                | Output Type
ranking_metrics_python  | pd.DataFrame          | pd.DataFrame          | int (default 10) | "MAP", "nDCG@k", "Precision@k", "Recall@k" | dict[str, float]
ranking_metrics_pyspark | pyspark.sql.DataFrame | pyspark.sql.DataFrame | int (default 10) | "MAP", "nDCG@k", "Precision@k", "Recall@k" | dict[str, float]
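
For example, with hypothetical pandas DataFrames test_df, prediction_df, and top_k_df that use the standard column names (userID, itemID, rating, prediction), the results are read back by the literal keys listed above:

# test_df / prediction_df / top_k_df are hypothetical inputs following COL_DICT columns.
rating_results = rating_metrics_python(test_df, prediction_df)
print(rating_results["RMSE"], rating_results["Explained Variance"])

ranking_results = ranking_metrics_python(test_df, top_k_df, k=10)
print(ranking_results["MAP"], ranking_results["nDCG@k"])  # key is the literal string "nDCG@k" per the signature above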

Algorithm-to-Evaluator Mapping

Algorithm        | Rating Evaluator       | Ranking Evaluator
ALS              | rating_metrics_pyspark | ranking_metrics_pyspark
SVD              | rating_metrics_python  | ranking_metrics_python
EmbeddingDotBias | rating_metrics_python  | ranking_metrics_python
SAR              | (none)                 | ranking_metrics_python
NCF              | (none)                 | ranking_metrics_python
BPR              | (none)                 | ranking_metrics_python
BiVAE            | (none)                 | ranking_metrics_python
LightGCN         | (none)                 | ranking_metrics_python

Usage Examples

import sys
sys.path.append("examples/06_benchmarks")

from recommenders.utils.constants import DEFAULT_K
from benchmark_utils import (
    rating_metrics_python,
    ranking_metrics_python,
    rating_metrics_pyspark,
    ranking_metrics_pyspark,
)

# Build evaluator dispatch dictionaries
rating_evaluator = {
    "als": lambda test, predictions: rating_metrics_pyspark(test, predictions),
    "svd": lambda test, predictions: rating_metrics_python(test, predictions),
    "embdotbias": lambda test, predictions: rating_metrics_python(test, predictions),
}

ranking_evaluator = {
    "als": lambda test, predictions, k: ranking_metrics_pyspark(test, predictions, k),
    "sar": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "svd": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "ncf": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "bpr": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "bivae": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "embdotbias": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "lightgcn": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
}

# In the benchmark loop:
if "rating" in metrics[algo]:
    ratings = rating_evaluator[algo](test, preds)
    print(f"RMSE: {ratings['RMSE']:.4f}, MAE: {ratings['MAE']:.4f}")

if "ranking" in metrics[algo]:
    rankings = ranking_evaluator[algo](test, top_k_scores, DEFAULT_K)
    print(f"MAP: {rankings['MAP']:.4f}, nDCG@k: {rankings['nDCG@k']:.4f}")

Related Pages

Implements Principle

Requires Environment
