Recommenders Benchmark Metric Functions
| Knowledge Sources | |
|---|---|
| Domains | Recommender Systems, Benchmarking, Evaluation |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for computing standardized rating and ranking evaluation metrics across Python and PySpark backends in the benchmarking workflow.
Description
The four metric functions in benchmark_utils.py wrap the underlying evaluation modules from the recommenders library into a simple dictionary-returning interface. Each function accepts the test data and predictions, computes a fixed set of metrics, and returns a dictionary with standardized string keys.
- `rating_metrics_python`: Calls `rmse()`, `mae()`, `rsquared()`, and `exp_var()` from `recommenders.evaluation.python_evaluation`. Operates on pandas DataFrames.
- `ranking_metrics_python`: Calls `map()`, `ndcg_at_k()`, `precision_at_k()`, and `recall_at_k()` from `recommenders.evaluation.python_evaluation`. Operates on pandas DataFrames.
- `rating_metrics_pyspark`: Instantiates `SparkRatingEvaluation` and calls its `.rmse()`, `.mae()`, `.exp_var()`, and `.rsquared()` methods. Operates on PySpark DataFrames.
- `ranking_metrics_pyspark`: Instantiates `SparkRankingEvaluation` with `relevancy_method="top_k"` and calls its `.map()`, `.ndcg_at_k()`, `.precision_at_k()`, and `.recall_at_k()` methods. Operates on PySpark DataFrames.
All four functions pass `**COL_DICT` to their underlying calls, ensuring consistent column-name mappings (`userID`, `itemID`, `rating`, `prediction`).
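The wrapper pattern can be sketched as follows. This is a simplified illustration, not the library's code: the real `rmse`/`mae` come from `recommenders.evaluation.python_evaluation` and the real `COL_DICT` from `recommenders.utils.constants`; the NumPy implementations below are stand-ins so the sketch is self-contained.

```python
import numpy as np
import pandas as pd

# Stand-in for recommenders.utils.constants.COL_DICT (assumption for illustration)
COL_DICT = {"col_user": "userID", "col_item": "itemID",
            "col_rating": "rating", "col_prediction": "prediction"}

def rmse(rating_true, rating_pred, col_user="userID", col_item="itemID",
         col_rating="rating", col_prediction="prediction"):
    # Align true ratings with predictions on (user, item) pairs, then
    # compute root-mean-squared error over the matched rows.
    merged = rating_true.merge(rating_pred, on=[col_user, col_item])
    return float(np.sqrt(np.mean((merged[col_rating] - merged[col_prediction]) ** 2)))

def mae(rating_true, rating_pred, col_user="userID", col_item="itemID",
        col_rating="rating", col_prediction="prediction"):
    # Mean absolute error over the same (user, item) alignment.
    merged = rating_true.merge(rating_pred, on=[col_user, col_item])
    return float(np.mean(np.abs(merged[col_rating] - merged[col_prediction])))

def rating_metrics_python(test, predictions):
    # The wrapper pattern: fixed metric set, standardized string keys,
    # **COL_DICT forwarded so column names stay consistent everywhere.
    return {
        "RMSE": rmse(test, predictions, **COL_DICT),
        "MAE": mae(test, predictions, **COL_DICT),
    }
```

The actual function additionally returns `"R2"` and `"Explained Variance"` keys; the structure is the same.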
Usage
Use these functions after generating predictions or recommendations in the benchmark loop. The algorithm's execution environment determines which backend to use: PySpark algorithms (ALS) use the pyspark variants; all others use the python variants.
Code Reference
Source Location
- Repository: recommenders
- File: `examples/06_benchmarks/benchmark_utils.py` (lines 403-440)
Signature
```python
def rating_metrics_python(test, predictions) -> dict
# Returns: {"RMSE": float, "MAE": float, "R2": float, "Explained Variance": float}
def ranking_metrics_python(test, predictions, k=DEFAULT_K) -> dict
# Returns: {"MAP": float, "nDCG@k": float, "Precision@k": float, "Recall@k": float}
def rating_metrics_pyspark(test, predictions) -> dict
# Returns: {"RMSE": float, "MAE": float, "R2": float, "Explained Variance": float}
def ranking_metrics_pyspark(test, predictions, k=DEFAULT_K) -> dict
# Returns: {"MAP": float, "nDCG@k": float, "Precision@k": float, "Recall@k": float}
```
Import
```python
import sys
sys.path.append("examples/06_benchmarks")
from benchmark_utils import (
    rating_metrics_python,
    ranking_metrics_python,
    rating_metrics_pyspark,
    ranking_metrics_pyspark,
)
```
Dependencies
- `recommenders.evaluation.python_evaluation`: `rmse`, `mae`, `rsquared`, `exp_var`, `map`, `ndcg_at_k`, `precision_at_k`, `recall_at_k`
- `recommenders.evaluation.spark_evaluation`: `SparkRatingEvaluation`, `SparkRankingEvaluation`
- `recommenders.utils.constants.COL_DICT`: Standard column-name mapping
- `recommenders.utils.constants.DEFAULT_K`: Default top-K value (10)
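For orientation, the constants expand roughly as shown below; the values here follow the column names listed in this document, but the authoritative definitions live in `recommenders.utils.constants`.

```python
# Approximate contents of the shared constants (check
# recommenders.utils.constants for the authoritative definitions).
COL_DICT = {
    "col_user": "userID",
    "col_item": "itemID",
    "col_rating": "rating",
    "col_prediction": "prediction",
}
DEFAULT_K = 10

# Passing **COL_DICT is then equivalent to spelling out:
#   rmse(test, predictions, col_user="userID", col_item="itemID",
#        col_rating="rating", col_prediction="prediction")
```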
I/O Contract
Rating Metric Functions
| Function | Input: test | Input: predictions | Output Keys | Output Type |
|---|---|---|---|---|
| rating_metrics_python | pd.DataFrame | pd.DataFrame | "RMSE", "MAE", "R2", "Explained Variance" | dict[str, float] |
| rating_metrics_pyspark | pyspark.sql.DataFrame | pyspark.sql.DataFrame | "RMSE", "MAE", "R2", "Explained Variance" | dict[str, float] |
Ranking Metric Functions
| Function | Input: test | Input: predictions | Input: k | Output Keys | Output Type |
|---|---|---|---|---|---|
| ranking_metrics_python | pd.DataFrame | pd.DataFrame | int (default 10) | "MAP", "nDCG@k", "Precision@k", "Recall@k" | dict[str, float] |
| ranking_metrics_pyspark | pyspark.sql.DataFrame | pyspark.sql.DataFrame | int (default 10) | "MAP", "nDCG@k", "Precision@k", "Recall@k" | dict[str, float] |
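Note that the output keys contain a literal `@k`, regardless of the value passed for `k`. A simplified stand-in illustrates the shape of the ranking wrappers; the real `precision_at_k` in `recommenders` additionally handles relevancy methods, score ties, and users missing from either frame, so treat this only as a sketch of the interface.

```python
import pandas as pd

DEFAULT_K = 10  # mirrors recommenders.utils.constants.DEFAULT_K

def precision_at_k(test, predictions, k=DEFAULT_K,
                   col_user="userID", col_item="itemID",
                   col_prediction="prediction"):
    # Simplified stand-in: per user, the fraction of the top-k predicted
    # items that appear in the test set, averaged over users.
    users = predictions[col_user].unique()
    total = 0.0
    for u in users:
        top_k = (predictions[predictions[col_user] == u]
                 .nlargest(k, col_prediction)[col_item])
        relevant = set(test.loc[test[col_user] == u, col_item])
        total += len(set(top_k) & relevant) / k
    return total / len(users)

def ranking_metrics_python(test, predictions, k=DEFAULT_K):
    # The key is the literal string "Precision@k", whatever k is.
    return {"Precision@k": precision_at_k(test, predictions, k=k)}
```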
Algorithm-to-Evaluator Mapping
| Algorithm | Rating Evaluator | Ranking Evaluator |
|---|---|---|
| ALS | rating_metrics_pyspark | ranking_metrics_pyspark |
| SVD | rating_metrics_python | ranking_metrics_python |
| EmbeddingDotBias | rating_metrics_python | ranking_metrics_python |
| SAR | (none) | ranking_metrics_python |
| NCF | (none) | ranking_metrics_python |
| BPR | (none) | ranking_metrics_python |
| BiVAE | (none) | ranking_metrics_python |
| LightGCN | (none) | ranking_metrics_python |
Usage Examples
```python
# Assumes examples/06_benchmarks is on sys.path (see Import above)
from benchmark_utils import (
    rating_metrics_python,
    ranking_metrics_python,
    rating_metrics_pyspark,
    ranking_metrics_pyspark,
)
from recommenders.utils.constants import DEFAULT_K

# Evaluator dispatch dictionaries: ALS runs on PySpark, everything else on pandas
rating_evaluator = {
    "als": rating_metrics_pyspark,
    "svd": rating_metrics_python,
    "embdotbias": rating_metrics_python,
}
ranking_evaluator = {
    "als": ranking_metrics_pyspark,
    "sar": ranking_metrics_python,
    "svd": ranking_metrics_python,
    "ncf": ranking_metrics_python,
    "bpr": ranking_metrics_python,
    "bivae": ranking_metrics_python,
    "embdotbias": ranking_metrics_python,
    "lightgcn": ranking_metrics_python,
}

# In the benchmark loop (algo, metrics, test, preds, top_k_scores
# come from the surrounding benchmark code):
if "rating" in metrics[algo]:
    ratings = rating_evaluator[algo](test, preds)
    print(f"RMSE: {ratings['RMSE']:.4f}, MAE: {ratings['MAE']:.4f}")
if "ranking" in metrics[algo]:
    rankings = ranking_evaluator[algo](test, top_k_scores, DEFAULT_K)
    print(f"MAP: {rankings['MAP']:.4f}, nDCG@k: {rankings['nDCG@k']:.4f}")
```