Implementation:Recommenders team Recommenders Python Evaluation Metrics

From Leeroopedia


Knowledge Sources
Domains Recommender Systems, Evaluation Metrics, Information Retrieval
Last Updated 2026-02-10 00:00 GMT

Overview

Concrete tools, provided by the recommenders library, for computing rating and ranking evaluation metrics for recommender systems.

Description

The recommenders.evaluation.python_evaluation module provides a suite of evaluation functions for measuring recommender system performance. The module includes:

  • Rating metrics: rmse and mae for measuring prediction accuracy against ground truth ratings.
  • Ranking metrics: precision_at_k, recall_at_k, ndcg_at_k, and map for measuring the quality of top-K recommendation lists.

All functions accept two DataFrames (ground truth and predictions) and return a single float score. They share a common interface for column name configuration and handle the merging of true and predicted data internally.
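As a sketch of this shared contract, a rating metric conceptually reduces to an inner merge on the user and item columns followed by an error aggregate. The column names below (`userID`, `itemID`, `rating`, `prediction`) are the library's documented defaults, but verify them against your installed version; the merge-then-aggregate code mirrors what the library does, not its exact implementation.

```python
import numpy as np
import pandas as pd

# Toy ground-truth ratings and predictions, using the library's default
# column names (userID / itemID / rating / prediction).
rating_true = pd.DataFrame({
    "userID": [1, 1, 2],
    "itemID": [10, 11, 10],
    "rating": [4.0, 3.0, 5.0],
})
rating_pred = pd.DataFrame({
    "userID": [1, 1, 2],
    "itemID": [10, 11, 10],
    "prediction": [3.5, 3.0, 4.0],
})

# Conceptually, rmse() merges the two frames on (user, item) and then
# aggregates the squared error over the matched pairs.
merged = rating_true.merge(rating_pred, on=["userID", "itemID"])
manual_rmse = float(np.sqrt(((merged["rating"] - merged["prediction"]) ** 2).mean()))
print(round(manual_rmse, 4))
```

Because the merge is on (user, item) pairs, rows present in only one of the two frames contribute nothing to the score, which is why duplicate pairs are disallowed.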

Usage

Import these functions at the evaluation stage of a recommender system pipeline, after generating predictions or recommendation lists. Rating metrics are used with rating prediction outputs; ranking metrics are used with top-K recommendation outputs.

Code Reference

Source Location

  • Repository: recommenders
  • File: recommenders/evaluation/python_evaluation.py
  • Lines:
    • rmse: L165-L195
    • mae: L198-L228
    • precision_at_k: L448-L496
    • recall_at_k: L499-L541
    • ndcg_at_k: L601-L696
    • map: L734-L785

Signature

def rmse(
    rating_true, rating_pred,
    col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_COL,
    col_rating=DEFAULT_RATING_COL, col_prediction=DEFAULT_PREDICTION_COL,
) -> float

def mae(
    rating_true, rating_pred,
    col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_COL,
    col_rating=DEFAULT_RATING_COL, col_prediction=DEFAULT_PREDICTION_COL,
) -> float

def precision_at_k(
    rating_true, rating_pred,
    col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_COL,
    col_prediction=DEFAULT_PREDICTION_COL,
    relevancy_method="top_k", k=DEFAULT_K, threshold=DEFAULT_THRESHOLD,
) -> float

def recall_at_k(
    rating_true, rating_pred,
    col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_COL,
    col_prediction=DEFAULT_PREDICTION_COL,
    relevancy_method="top_k", k=DEFAULT_K, threshold=DEFAULT_THRESHOLD,
) -> float

def ndcg_at_k(
    rating_true, rating_pred,
    col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_COL,
    col_rating=DEFAULT_RATING_COL, col_prediction=DEFAULT_PREDICTION_COL,
    relevancy_method="top_k", k=DEFAULT_K, threshold=DEFAULT_THRESHOLD,
    score_type="binary", discfun_type="loge",
) -> float

def map(
    rating_true, rating_pred,
    col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_COL,
    col_prediction=DEFAULT_PREDICTION_COL,
    relevancy_method="top_k", k=DEFAULT_K, threshold=DEFAULT_THRESHOLD,
) -> float

Import

from recommenders.evaluation.python_evaluation import (
    rmse,
    mae,
    precision_at_k,
    recall_at_k,
    ndcg_at_k,
    map,
)
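Note that importing `map` this way shadows Python's builtin `map()` in the importing namespace. Aliasing the import avoids the collision (the alias name `map_at_k` below is an arbitrary choice, not part of the library); the shadowing effect itself can be reproduced with a plain assignment:

```python
# Hypothetical alias to avoid shadowing the builtin:
# from recommenders.evaluation.python_evaluation import map as map_at_k

# Stand-in definition reproducing the shadowing effect of the import:
def map(rating_true, rating_pred, **kwargs):
    return 0.0

import builtins
# The builtin remains reachable through the builtins module.
squares = list(builtins.map(lambda x: x * x, [1, 2, 3]))
print(squares)
```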

I/O Contract

Inputs (Rating Metrics: rmse, mae)

Name Type Required Description
rating_true pd.DataFrame Yes Ground truth DataFrame with user-item-rating columns. Must have no duplicate (user, item) pairs.
rating_pred pd.DataFrame Yes Predicted DataFrame with user-item-prediction columns. Must have no duplicate (user, item) pairs.
col_user str No (default: DEFAULT_USER_COL) Column name for user IDs.
col_item str No (default: DEFAULT_ITEM_COL) Column name for item IDs.
col_rating str No (default: DEFAULT_RATING_COL) Column name for true rating values.
col_prediction str No (default: DEFAULT_PREDICTION_COL) Column name for predicted rating values.

Inputs (Ranking Metrics: precision_at_k, recall_at_k, ndcg_at_k, map)

Name Type Required Description
rating_true pd.DataFrame Yes Ground truth DataFrame with user-item columns representing relevant items.
rating_pred pd.DataFrame Yes Predicted DataFrame with user-item-prediction columns representing the recommendation list.
col_user str No (default: DEFAULT_USER_COL) Column name for user IDs.
col_item str No (default: DEFAULT_ITEM_COL) Column name for item IDs.
col_prediction str No (default: DEFAULT_PREDICTION_COL) Column name for predicted scores.
relevancy_method str No (default: "top_k") Method for determining relevancy: "top_k", "by_threshold", or None.
k int No (default: DEFAULT_K) Number of top items per user for evaluation.
threshold float No (default: DEFAULT_THRESHOLD) Threshold for relevancy when using "by_threshold" method.
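With the default `relevancy_method="top_k"`, the prediction frame is effectively reduced to each user's k highest-scored items before matching against the ground truth. A pandas sketch of that reduction (illustrating the idea, not the library's actual code path):

```python
import pandas as pd

rating_pred = pd.DataFrame({
    "userID":     [1, 1, 1, 2, 2, 2],
    "itemID":     [10, 11, 12, 10, 11, 12],
    "prediction": [0.9, 0.4, 0.7, 0.2, 0.8, 0.5],
})

k = 2
# Keep each user's k items with the highest predicted score.
top_k = (
    rating_pred.sort_values("prediction", ascending=False)
    .groupby("userID")
    .head(k)
    .sort_values(["userID", "prediction"], ascending=[True, False])
    .reset_index(drop=True)
)
print(top_k)
```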

Additional Inputs (ndcg_at_k only)

Name Type Required Description
col_rating str No (default: DEFAULT_RATING_COL) Column name for true rating values, used for graded relevance.
score_type str No (default: "binary") Type of relevance scoring: "binary" (hit/miss), "raw" (use rating directly), or "exp" (2^rating - 1).
discfun_type str No (default: "loge") Discount function: "loge" (natural log) or "log2" (base-2 log).
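These options correspond to standard DCG variants. A hand computation of one user's DCG under the "exp" gain with both discount functions, illustrating the definitions rather than the library's internals (in NDCG the choice of discount also affects the ideal DCG it is normalized by):

```python
import numpy as np

# Graded relevance of the items at ranks 1..3 of one user's list.
ratings = np.array([3.0, 2.0, 0.0])
ranks = np.arange(1, len(ratings) + 1)

gains = 2.0 ** ratings - 1.0      # "exp" gain: 2^rating - 1
disc_loge = np.log(ranks + 1)     # "loge" discount: ln(rank + 1)
disc_log2 = np.log2(ranks + 1)    # "log2" discount: log2(rank + 1)

dcg_loge = float(np.sum(gains / disc_loge))
dcg_log2 = float(np.sum(gains / disc_log2))
print(round(dcg_loge, 4), round(dcg_log2, 4))
```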

Outputs

Name Type Description
return (rmse) float Root Mean Squared Error. Range: [0, +inf). Lower is better.
return (mae) float Mean Absolute Error. Range: [0, +inf). Lower is better.
return (precision_at_k) float Precision at K. Range: [0, 1]. Higher is better.
return (recall_at_k) float Recall at K. Range: [0, 1]. Higher is better.
return (ndcg_at_k) float Normalized Discounted Cumulative Gain at K. Range: [0, 1]. Higher is better.
return (map) float Mean Average Precision at K. Range: [0, 1]. Higher is better.
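The ranking scores above are per-user quantities averaged over users. A hand computation of precision@k and recall@k for two users, using the standard definitions (the library may differ in edge-case handling, e.g. for users with fewer than k recommendations):

```python
import numpy as np

k = 2
# Per-user ground-truth relevant items and top-k recommended items.
relevant = {1: {10, 11, 12}, 2: {20}}
recommended = {1: [10, 13], 2: [20, 21]}

precisions, recalls = [], []
for user, recs in recommended.items():
    hits = len(set(recs[:k]) & relevant[user])
    precisions.append(hits / k)                  # precision@k: hits / k
    recalls.append(hits / len(relevant[user]))   # recall@k: hits / |relevant|

precision_at_2 = float(np.mean(precisions))
recall_at_2 = float(np.mean(recalls))
print(precision_at_2, recall_at_2)
```

Here user 1 hits one of three relevant items and user 2 hits their single relevant item, so recall@2 averages 1/3 and 1.0 while precision@2 is 0.5 for both users.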

Usage Examples

Basic Usage

from recommenders.evaluation.python_evaluation import (
    rmse,
    mae,
    precision_at_k,
    recall_at_k,
    ndcg_at_k,
    map,
)

# Rating metrics (using rating predictions)
eval_rmse = rmse(test_df, pred_df, col_prediction="prediction")
eval_mae = mae(test_df, pred_df, col_prediction="prediction")
print(f"RMSE: {eval_rmse:.4f}")
print(f"MAE:  {eval_mae:.4f}")

# Ranking metrics (using top-K recommendation lists)
eval_precision = precision_at_k(test_df, top_k_df, col_prediction="prediction", k=10)
eval_recall = recall_at_k(test_df, top_k_df, col_prediction="prediction", k=10)
eval_ndcg = ndcg_at_k(test_df, top_k_df, col_prediction="prediction", k=10)
eval_map = map(test_df, top_k_df, col_prediction="prediction", k=10)

print(f"Precision@10: {eval_precision:.4f}")
print(f"Recall@10:    {eval_recall:.4f}")
print(f"NDCG@10:      {eval_ndcg:.4f}")
print(f"MAP@10:       {eval_map:.4f}")

Dependencies

  • numpy - Numerical computation (sqrt, mean operations)
  • pandas - DataFrame merging and groupby operations
  • sklearn.metrics - mean_squared_error, mean_absolute_error base implementations

Related Pages

Implements Principle

Requires Environment
