Principle: Recommender Ranking and Rating Evaluation
| Knowledge Sources | |
|---|---|
| Domains | Recommender Systems, Evaluation Metrics, Information Retrieval |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Recommender system evaluation employs two complementary families of metrics: rating metrics that measure prediction accuracy (RMSE, MAE) and ranking metrics that measure the quality of the ordered recommendation list (Precision@K, Recall@K, NDCG@K, MAP).
Description
Evaluating recommender systems requires understanding what the model is being asked to do. Two distinct evaluation paradigms address different use cases:
Rating metrics assess how accurately the model predicts the exact rating a user would give an item. These are appropriate when the system is used for rating prediction (e.g., "predict that User A would rate Movie B as 4.2 stars").
Ranking metrics assess how well the model orders items for each user, focusing on whether the most relevant items appear near the top of the recommendation list. These are appropriate when the system is used for top-K recommendation (e.g., "recommend 10 movies to User A").
In practice, ranking metrics are more commonly used for evaluating collaborative filtering models because the primary use case is generating ranked recommendation lists, not predicting exact ratings.
Usage
Use rating metrics (RMSE, MAE) when:
- You are predicting explicit ratings and want to measure prediction error.
- You need a single aggregate measure of how far predictions deviate from ground truth.
Use ranking metrics (Precision@K, Recall@K, NDCG@K, MAP) when:
- You are generating top-K recommendation lists.
- You want to measure whether relevant items are ranked highly.
- You need position-sensitive evaluation (NDCG) or set-overlap evaluation (Precision, Recall).
Theoretical Basis
Rating Metrics
Root Mean Squared Error (RMSE):
Measures the standard deviation of prediction errors. Penalizes large errors more heavily than small ones due to the squaring operation.
RMSE = sqrt( (1/N) * sum over all (u,i): (r_true(u,i) - r_pred(u,i))^2 )
Mean Absolute Error (MAE):
Measures the average magnitude of prediction errors without squaring, giving equal weight to all errors.
MAE = (1/N) * sum over all (u,i): |r_true(u,i) - r_pred(u,i)|
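Both rating metrics can be sketched in a few lines of plain Python (the function names `rmse` and `mae` are illustrative, not taken from any particular library):

```python
import math

def rmse(r_true, r_pred):
    """Root Mean Squared Error: sqrt of the mean squared prediction error."""
    n = len(r_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(r_true, r_pred)) / n)

def mae(r_true, r_pred):
    """Mean Absolute Error: mean of the absolute prediction errors."""
    n = len(r_true)
    return sum(abs(t - p) for t, p in zip(r_true, r_pred)) / n

# Toy example: four (user, item) pairs with true vs. predicted ratings.
r_true = [4.0, 3.5, 5.0, 2.0]
r_pred = [3.8, 3.0, 4.5, 2.5]
print(rmse(r_true, r_pred))  # ~0.444 -- squaring weights the 0.5 errors more
print(mae(r_true, r_pred))   # 0.425
```

Because of the squaring, RMSE is always at least as large as MAE on the same data; a wide gap between the two signals a few unusually large errors.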
Ranking Metrics
Precision@K:
The fraction of recommended items in the top-K that are relevant (i.e., appear in the ground truth). Averaged across all users.
Precision@K(u) = |{recommended items in top-K} intersect {relevant items}| / K
Precision@K = (1/|Users|) * sum over u: Precision@K(u)
Recall@K:
The fraction of relevant items that appear in the top-K recommendations. Averaged across all users.
Recall@K(u) = |{recommended items in top-K} intersect {relevant items}| / |{relevant items}|
Recall@K = (1/|Users|) * sum over u: Recall@K(u)
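A per-user sketch of both set-overlap metrics, assuming the recommendation list is already sorted by predicted score (function names are illustrative):

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-K recommended items that are relevant."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-K."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant)

# One user: 5 recommendations, 3 ground-truth relevant items ("f" was missed).
recommended = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "f"}
print(precision_at_k(recommended, relevant, 5))  # 2/5 = 0.4
print(recall_at_k(recommended, relevant, 5))     # 2/3 ~ 0.667
```

Averaging these per-user scores over all users yields the aggregate Precision@K and Recall@K defined above.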
Normalized Discounted Cumulative Gain (NDCG@K):
A position-sensitive metric that assigns higher value to relevant items ranked at the top. It normalizes the Discounted Cumulative Gain (DCG) by the ideal DCG (IDCG) to produce a value between 0 and 1.
DCG@K(u) = sum over rank r in 1..K: rel(r) / log2(1 + r)
IDCG@K(u) = DCG@K computed on the ideal (perfect) ranking
NDCG@K(u) = DCG@K(u) / IDCG@K(u)
NDCG@K = (1/|Users|) * sum over u: NDCG@K(u)
Where rel(r) is the relevance score of the item at rank r. In binary mode, rel(r) = 1 if the item is relevant and 0 otherwise.
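The binary-relevance case can be sketched as follows (log base 2 is the common convention; since the same base appears in DCG and IDCG, the choice cancels in the ratio):

```python
import math

def dcg_at_k(rels, k):
    """Discounted cumulative gain: relevance discounted by log2(1 + rank)."""
    return sum(rel / math.log2(1 + r) for r, rel in enumerate(rels[:k], start=1))

def ndcg_at_k(recommended, relevant, k):
    """Binary NDCG@K: DCG of the actual ranking over DCG of the ideal ranking."""
    rels = [1.0 if item in relevant else 0.0 for item in recommended[:k]]
    # The ideal ranking places every relevant item first (at most k of them).
    ideal = [1.0] * min(len(relevant), k)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(rels, k) / idcg if idcg > 0 else 0.0

# Relevant items at ranks 1 and 3, irrelevant item at rank 2:
# the rank-3 hit is discounted, so the score falls just short of 1.
print(ndcg_at_k(["a", "b", "c"], {"a", "c"}, 3))  # ~0.920
```

Note that the ideal list is built from the full relevant set, not from the retrieved items only; otherwise a list that misses relevant items entirely could still score 1.0.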
Mean Average Precision (MAP@K):
For each user, AP@K averages Precision@r over the ranks r at which relevant items appear; MAP@K then averages these per-user scores across all users. It rewards systems that rank relevant items early.
AP@K(u) = (1/|relevant items|) * sum over rank r in 1..K: Precision@r(u) * rel(r)
MAP@K = (1/|Users|) * sum over u: AP@K(u)
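A sketch matching the formulas above, normalizing the AP sum by the full relevant-item count as written (some implementations use min(|relevant items|, K) instead; function names are illustrative):

```python
def ap_at_k(recommended, relevant, k):
    """Average precision: mean of Precision@r over ranks r holding a relevant item."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for r, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / r  # Precision@r at this relevant item's rank
    return total / len(relevant)

def map_at_k(recommended_by_user, relevant_by_user, k):
    """MAP@K: mean of per-user AP@K."""
    aps = [ap_at_k(rec, rel, k)
           for rec, rel in zip(recommended_by_user, relevant_by_user)]
    return sum(aps) / len(aps)

# Relevant items at ranks 1 and 3; a third relevant item ("e") never recommended.
print(ap_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, 4))  # (1 + 2/3) / 3 ~ 0.556
```

Because each Precision@r term is divided by the relevant-item count, a user with an unrecommended relevant item (like "e" above) can never reach an AP of 1.0.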
Metric Comparison
| Metric | Type | Range | Position-Sensitive | Measures |
|---|---|---|---|---|
| RMSE | Rating | [0, +inf) | No | Prediction error magnitude (squared) |
| MAE | Rating | [0, +inf) | No | Prediction error magnitude (absolute) |
| Precision@K | Ranking | [0, 1] | No | Fraction of top-K that are relevant |
| Recall@K | Ranking | [0, 1] | No | Fraction of relevant items in top-K |
| NDCG@K | Ranking | [0, 1] | Yes | Quality of ranking order |
| MAP@K | Ranking | [0, 1] | Yes | Average precision across relevant items |