# Recommenders Python Evaluation Metrics
| Knowledge Sources | |
|---|---|
| Domains | Recommender Systems, Evaluation Metrics, Information Retrieval |
| Last Updated | 2026-02-10 00:00 GMT |
## Overview

Concrete tools for computing rating and ranking evaluation metrics for recommender systems, provided by the `recommenders` library.

## Description

The `recommenders.evaluation.python_evaluation` module provides a suite of evaluation functions for measuring recommender system performance. The module includes:
- Rating metrics: `rmse` and `mae` for measuring prediction accuracy against ground truth ratings.
- Ranking metrics: `precision_at_k`, `recall_at_k`, `ndcg_at_k`, and `map` for measuring the quality of top-K recommendation lists.
All functions accept two DataFrames (ground truth and predictions) and return a single float score. They share a common interface for column name configuration and handle the merging of true and predicted data internally.
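As a rough illustration of that merge-then-score behavior (a sketch of the idea, not the library's actual code), a hand-rolled RMSE aligns the two DataFrames on the user and item columns before computing the error. The `userID`/`itemID`/`rating`/`prediction` column names below are chosen for illustration:

```python
import numpy as np
import pandas as pd

# Toy ground truth and predictions; column names are illustrative
true_df = pd.DataFrame(
    {"userID": [1, 1, 2], "itemID": [10, 11, 10], "rating": [5.0, 3.0, 4.0]}
)
pred_df = pd.DataFrame(
    {"userID": [1, 1, 2], "itemID": [10, 11, 10], "prediction": [4.5, 3.5, 4.0]}
)

# Align true and predicted values on (user, item) pairs, then score them
merged = true_df.merge(pred_df, on=["userID", "itemID"])
rmse_value = float(np.sqrt(np.mean((merged["rating"] - merged["prediction"]) ** 2)))
```

The library's functions perform this alignment internally, which is why both DataFrames must be free of duplicate (user, item) pairs.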
## Usage

Import these functions at the evaluation stage of a recommender system pipeline, after generating predictions or recommendation lists. Rating metrics are used with rating prediction outputs; ranking metrics are used with top-K recommendation outputs.

## Code Reference

### Source Location
- Repository: recommenders
- File: `recommenders/evaluation/python_evaluation.py`
- Lines:
  - `rmse`: L165-L195
  - `mae`: L198-L228
  - `precision_at_k`: L448-L496
  - `recall_at_k`: L499-L541
  - `ndcg_at_k`: L601-L696
  - `map`: L734-L785
### Signature

```python
def rmse(
    rating_true, rating_pred,
    col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_COL,
    col_rating=DEFAULT_RATING_COL, col_prediction=DEFAULT_PREDICTION_COL,
) -> float

def mae(
    rating_true, rating_pred,
    col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_COL,
    col_rating=DEFAULT_RATING_COL, col_prediction=DEFAULT_PREDICTION_COL,
) -> float

def precision_at_k(
    rating_true, rating_pred,
    col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_COL,
    col_prediction=DEFAULT_PREDICTION_COL,
    relevancy_method="top_k", k=DEFAULT_K, threshold=DEFAULT_THRESHOLD,
) -> float

def recall_at_k(
    rating_true, rating_pred,
    col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_COL,
    col_prediction=DEFAULT_PREDICTION_COL,
    relevancy_method="top_k", k=DEFAULT_K, threshold=DEFAULT_THRESHOLD,
) -> float

def ndcg_at_k(
    rating_true, rating_pred,
    col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_COL,
    col_rating=DEFAULT_RATING_COL, col_prediction=DEFAULT_PREDICTION_COL,
    relevancy_method="top_k", k=DEFAULT_K, threshold=DEFAULT_THRESHOLD,
    score_type="binary", discfun_type="loge",
) -> float

def map(
    rating_true, rating_pred,
    col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_COL,
    col_prediction=DEFAULT_PREDICTION_COL,
    relevancy_method="top_k", k=DEFAULT_K, threshold=DEFAULT_THRESHOLD,
) -> float
```
### Import

```python
from recommenders.evaluation.python_evaluation import (
    rmse,
    mae,
    precision_at_k,
    recall_at_k,
    ndcg_at_k,
    map,
)
```
## I/O Contract

### Inputs (Rating Metrics: `rmse`, `mae`)
| Name | Type | Required | Description |
|---|---|---|---|
| rating_true | pd.DataFrame | Yes | Ground truth DataFrame with user-item-rating columns. Must have no duplicate (user, item) pairs. |
| rating_pred | pd.DataFrame | Yes | Predicted DataFrame with user-item-prediction columns. Must have no duplicate (user, item) pairs. |
| col_user | str | No (default: DEFAULT_USER_COL) | Column name for user IDs. |
| col_item | str | No (default: DEFAULT_ITEM_COL) | Column name for item IDs. |
| col_rating | str | No (default: DEFAULT_RATING_COL) | Column name for true rating values. |
| col_prediction | str | No (default: DEFAULT_PREDICTION_COL) | Column name for predicted rating values. |
### Inputs (Ranking Metrics: `precision_at_k`, `recall_at_k`, `ndcg_at_k`, `map`)
| Name | Type | Required | Description |
|---|---|---|---|
| rating_true | pd.DataFrame | Yes | Ground truth DataFrame with user-item columns representing relevant items. |
| rating_pred | pd.DataFrame | Yes | Predicted DataFrame with user-item-prediction columns representing the recommendation list. |
| col_user | str | No (default: DEFAULT_USER_COL) | Column name for user IDs. |
| col_item | str | No (default: DEFAULT_ITEM_COL) | Column name for item IDs. |
| col_prediction | str | No (default: DEFAULT_PREDICTION_COL) | Column name for predicted scores. |
| relevancy_method | str | No (default: "top_k") | Method for determining relevancy: "top_k", "by_threshold", or None. |
| k | int | No (default: DEFAULT_K) | Number of top items per user for evaluation. |
| threshold | float | No (default: DEFAULT_THRESHOLD) | Threshold for relevancy when using "by_threshold" method. |
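For intuition, `relevancy_method="top_k"` amounts to keeping each user's k highest-scored items before comparing against the ground truth. A pandas sketch of that idea (not the library's implementation; column names are illustrative):

```python
import pandas as pd

# Illustrative prediction scores for two users over three items
pred_df = pd.DataFrame({
    "userID": [1, 1, 1, 2, 2, 2],
    "itemID": [10, 11, 12, 10, 11, 12],
    "prediction": [0.9, 0.4, 0.7, 0.2, 0.8, 0.6],
})

k = 2
# Rank each user's items by predicted score and keep the k best
top_k = (
    pred_df.sort_values(["userID", "prediction"], ascending=[True, False])
    .groupby("userID")
    .head(k)
)
```

With `relevancy_method="by_threshold"`, items with a predicted score above `threshold` would be kept instead.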
### Additional Inputs (`ndcg_at_k` only)
| Name | Type | Required | Description |
|---|---|---|---|
| col_rating | str | No (default: DEFAULT_RATING_COL) | Column name for true rating values, used for graded relevance. |
| score_type | str | No (default: "binary") | Type of relevance scoring: "binary" (hit/miss), "raw" (use rating directly), or "exp" (2^rating - 1). |
| discfun_type | str | No (default: "loge") | Discount function: "loge" (natural log) or "log2" (base-2 log). |
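To make `score_type` and `discfun_type` concrete, here is how the gain and discount terms of DCG are typically formed under these options (a numpy sketch of the standard definitions, not the library's code):

```python
import numpy as np

rating = 3.0  # true rating of an item appearing at 1-based rank `rank`
rank = 2

# Gain term, by score_type
gain_binary = 1.0               # "binary": any hit counts as 1
gain_raw = rating               # "raw": use the rating directly
gain_exp = 2.0 ** rating - 1.0  # "exp": 2^rating - 1

# Discount term, by discfun_type
disc_loge = 1.0 / np.log(rank + 1)   # "loge": natural logarithm
disc_log2 = 1.0 / np.log2(rank + 1)  # "log2": base-2 logarithm
```

DCG sums gain x discount over the ranked list; NDCG divides by the ideal ordering's DCG, which is what bounds the result to [0, 1].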
### Outputs
| Name | Type | Description |
|---|---|---|
| return (rmse) | float | Root Mean Squared Error. Range: [0, +inf). Lower is better. |
| return (mae) | float | Mean Absolute Error. Range: [0, +inf). Lower is better. |
| return (precision_at_k) | float | Precision at K. Range: [0, 1]. Higher is better. |
| return (recall_at_k) | float | Recall at K. Range: [0, 1]. Higher is better. |
| return (ndcg_at_k) | float | Normalized Discounted Cumulative Gain at K. Range: [0, 1]. Higher is better. |
| return (map) | float | Mean Average Precision at K. Range: [0, 1]. Higher is better. |
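As a small worked example of the [0, 1] ranges above (plain Python arithmetic, independent of the library): suppose a user has 3 relevant items in the ground truth and the top-4 recommendation list contains 2 of them.

```python
k = 4
num_relevant = 3  # relevant items in this user's ground truth
hits = 2          # of those, how many appear in the top-k list

precision_value = hits / k          # fraction of the list that is relevant
recall_value = hits / num_relevant  # fraction of relevant items retrieved
```

The library computes these per user and averages across users, so the returned float is a mean over the user population.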
## Usage Examples

### Basic Usage

```python
from recommenders.evaluation.python_evaluation import (
    rmse,
    mae,
    precision_at_k,
    recall_at_k,
    ndcg_at_k,
    map,  # note: shadows Python's built-in map() in this scope
)

# Rating metrics (using rating predictions)
eval_rmse = rmse(test_df, pred_df, col_prediction="prediction")
eval_mae = mae(test_df, pred_df, col_prediction="prediction")
print(f"RMSE: {eval_rmse:.4f}")
print(f"MAE: {eval_mae:.4f}")

# Ranking metrics (using top-K recommendation lists)
eval_precision = precision_at_k(test_df, top_k_df, col_prediction="prediction", k=10)
eval_recall = recall_at_k(test_df, top_k_df, col_prediction="prediction", k=10)
eval_ndcg = ndcg_at_k(test_df, top_k_df, col_prediction="prediction", k=10)
eval_map = map(test_df, top_k_df, col_prediction="prediction", k=10)
print(f"Precision@10: {eval_precision:.4f}")
print(f"Recall@10: {eval_recall:.4f}")
print(f"NDCG@10: {eval_ndcg:.4f}")
print(f"MAP@10: {eval_map:.4f}")
```
## Dependencies

- `numpy` - numerical computation (sqrt, mean operations)
- `pandas` - DataFrame merging and groupby operations
- `sklearn.metrics` - `mean_squared_error` and `mean_absolute_error` base implementations
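For reference, the sklearn calls the rating metrics build on look roughly like this (a minimal sketch; wrapping `mean_squared_error` in `np.sqrt` sidesteps the version-dependent `squared=False` / `root_mean_squared_error` API differences):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [5.0, 3.0, 4.0]
y_pred = [4.5, 3.5, 4.0]

# RMSE: square root of the mean squared error over aligned pairs
rmse_value = float(np.sqrt(mean_squared_error(y_true, y_pred)))
# MAE: mean absolute error over the same pairs
mae_value = float(mean_absolute_error(y_true, y_pred))
```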