Principle: Recommender Ranking and Rating Evaluation
| Knowledge Sources | |
|---|---|
| Domains | Recommender Systems, Evaluation Metrics, Information Retrieval |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Recommender system evaluation employs two complementary families of metrics: rating metrics that measure prediction accuracy (RMSE, MAE) and ranking metrics that measure the quality of the ordered recommendation list (Precision@K, Recall@K, NDCG@K, MAP).
Description
Evaluating recommender systems requires understanding what the model is being asked to do. Two distinct evaluation paradigms address different use cases:
Rating metrics assess how accurately the model predicts the exact rating a user would give an item. These are appropriate when the system is used for rating prediction (e.g., "predict that User A would rate Movie B as 4.2 stars").
Ranking metrics assess how well the model orders items for each user, focusing on whether the most relevant items appear near the top of the recommendation list. These are appropriate when the system is used for top-K recommendation (e.g., "recommend 10 movies to User A").
In practice, ranking metrics are more commonly used for evaluating collaborative filtering models because the primary use case is generating ranked recommendation lists, not predicting exact ratings.
Usage
Use rating metrics (RMSE, MAE) when:
- You are predicting explicit ratings and want to measure prediction error.
- You need a single aggregate measure of how far predictions deviate from ground truth.
Use ranking metrics (Precision@K, Recall@K, NDCG@K, MAP) when:
- You are generating top-K recommendation lists.
- You want to measure whether relevant items are ranked highly.
- You need position-sensitive evaluation (NDCG) or set-overlap evaluation (Precision, Recall).
Theoretical Basis
Rating Metrics
Root Mean Squared Error (RMSE):
Measures the standard deviation of prediction errors. Penalizes large errors more heavily than small ones due to the squaring operation.
RMSE = sqrt( (1/N) * sum over all (u,i): (r_true(u,i) - r_pred(u,i))^2 )
Mean Absolute Error (MAE):
Measures the average magnitude of prediction errors without squaring, giving equal weight to all errors.
MAE = (1/N) * sum over all (u,i): |r_true(u,i) - r_pred(u,i)|
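Both rating metrics can be sketched in a few lines of plain Python (the function names `rmse` and `mae` are illustrative, not taken from any particular library):

```python
import math

def rmse(r_true, r_pred):
    """Root Mean Squared Error: sqrt of the mean squared prediction error."""
    n = len(r_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(r_true, r_pred)) / n)

def mae(r_true, r_pred):
    """Mean Absolute Error: mean of the absolute prediction errors."""
    n = len(r_true)
    return sum(abs(t - p) for t, p in zip(r_true, r_pred)) / n

# Toy example: four (user, item) pairs with true vs. predicted ratings.
r_true = [4.0, 3.5, 5.0, 2.0]
r_pred = [3.8, 3.0, 4.5, 2.5]
print(rmse(r_true, r_pred))  # ~0.444 -- squaring weights the 0.5 errors more
print(mae(r_true, r_pred))   # 0.425
```

Because of the squaring, RMSE is always at least as large as MAE on the same data; a wide gap between the two signals a few unusually large errors.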
Ranking Metrics
Precision@K:
The fraction of recommended items in the top-K that are relevant (i.e., appear in the ground truth). Averaged across all users.
Precision@K(u) = |{recommended items in top-K} intersect {relevant items}| / K
Precision@K = (1/|Users|) * sum over u: Precision@K(u)
Recall@K:
The fraction of relevant items that appear in the top-K recommendations. Averaged across all users.
Recall@K(u) = |{recommended items in top-K} intersect {relevant items}| / |{relevant items}|
Recall@K = (1/|Users|) * sum over u: Recall@K(u)
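A per-user sketch of both set-overlap metrics, assuming the recommendation list is already sorted by predicted score (function names are illustrative):

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-K recommended items that are relevant."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-K."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant)

# One user: 5 recommendations, 3 ground-truth relevant items ("f" was missed).
recommended = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "f"}
print(precision_at_k(recommended, relevant, 5))  # 2/5 = 0.4
print(recall_at_k(recommended, relevant, 5))     # 2/3 ~ 0.667
```

Averaging these per-user scores over all users yields the aggregate Precision@K and Recall@K defined above.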
Normalized Discounted Cumulative Gain (NDCG@K):
A position-sensitive metric that assigns higher value to relevant items ranked at the top. It normalizes the Discounted Cumulative Gain (DCG) by the ideal DCG (IDCG) to produce a value between 0 and 1.
DCG@K(u) = sum over rank r in 1..K: rel(r) / log2(1 + r)
IDCG@K(u) = DCG@K computed on the ideal (perfect) ranking
NDCG@K(u) = DCG@K(u) / IDCG@K(u)
NDCG@K = (1/|Users|) * sum over u: NDCG@K(u)
Where rel(r) is the relevance score of the item at rank r. In binary mode, rel(r) = 1 if the item is relevant and 0 otherwise.
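The binary-relevance case can be sketched as follows (log base 2 is the common convention; since the same base appears in DCG and IDCG, the choice cancels in the ratio):

```python
import math

def dcg_at_k(rels, k):
    """Discounted cumulative gain: relevance discounted by log2(1 + rank)."""
    return sum(rel / math.log2(1 + r) for r, rel in enumerate(rels[:k], start=1))

def ndcg_at_k(recommended, relevant, k):
    """Binary NDCG@K: DCG of the actual ranking over DCG of the ideal ranking."""
    rels = [1.0 if item in relevant else 0.0 for item in recommended[:k]]
    # The ideal ranking places every relevant item first (at most k of them).
    ideal = [1.0] * min(len(relevant), k)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(rels, k) / idcg if idcg > 0 else 0.0

# Relevant items at ranks 1 and 3, irrelevant item at rank 2:
# the rank-3 hit is discounted, so the score falls just short of 1.
print(ndcg_at_k(["a", "b", "c"], {"a", "c"}, 3))  # ~0.920
```

Note that the ideal list is built from the full relevant set, not from the retrieved items only; otherwise a list that misses relevant items entirely could still score 1.0.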
Mean Average Precision (MAP@K):
For each user, AP@K averages Precision@r over the ranks r at which relevant items appear; MAP@K then averages these per-user scores across all users. It rewards systems that rank relevant items early.
AP@K(u) = (1/|relevant items|) * sum over rank r in 1..K: Precision@r(u) * rel(r)
MAP@K = (1/|Users|) * sum over u: AP@K(u)
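A sketch matching the formulas above, normalizing the AP sum by the full relevant-item count as written (some implementations use min(|relevant items|, K) instead; function names are illustrative):

```python
def ap_at_k(recommended, relevant, k):
    """Average precision: mean of Precision@r over ranks r holding a relevant item."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for r, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / r  # Precision@r at this relevant item's rank
    return total / len(relevant)

def map_at_k(recommended_by_user, relevant_by_user, k):
    """MAP@K: mean of per-user AP@K."""
    aps = [ap_at_k(rec, rel, k)
           for rec, rel in zip(recommended_by_user, relevant_by_user)]
    return sum(aps) / len(aps)

# Relevant items at ranks 1 and 3; a third relevant item ("e") never recommended.
print(ap_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, 4))  # (1 + 2/3) / 3 ~ 0.556
```

Because each Precision@r term is divided by the relevant-item count, a user with an unrecommended relevant item (like "e" above) can never reach an AP of 1.0.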
Metric Comparison
| Metric | Type | Range | Position-Sensitive | Measures |
|---|---|---|---|---|
| RMSE | Rating | [0, +inf) | No | Prediction error magnitude (squared) |
| MAE | Rating | [0, +inf) | No | Prediction error magnitude (absolute) |
| Precision@K | Ranking | [0, 1] | No | Fraction of top-K that are relevant |
| Recall@K | Ranking | [0, 1] | No | Fraction of relevant items in top-K |
| NDCG@K | Ranking | [0, 1] | Yes | Quality of ranking order |
| MAP@K | Ranking | [0, 1] | Yes | Average precision across relevant items |