Principle:Deepset ai Haystack Retrieval MRR Evaluation
Overview
Mean Reciprocal Rank (MRR) measures retrieval quality by the position of the first relevant document in a ranked list of results. It is one of the most widely used metrics in information retrieval evaluation, and it rewards systems that place a relevant document as close to the top of the result set as possible.
Domains
- Evaluation
- Information_Retrieval
Theoretical Foundation
MRR is defined as the average of the reciprocal ranks across all queries:
MRR = (1/Q) * sum(1/rank_i) for i = 1..Q
Where:
- Q is the total number of queries evaluated.
- rank_i is the position (1-indexed) of the first relevant document in the retrieved results for query i.
- If no relevant document is found in the retrieved results for a given query, the reciprocal rank for that query is 0.0.
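The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the Haystack implementation: the function name `mean_reciprocal_rank` and the convention of passing `None` for a query with no relevant result are assumptions made for this example.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """Average the reciprocal ranks over all queries.

    Each entry is the 1-indexed rank of the first relevant document for one
    query, or None when no relevant document was retrieved (contributes 0.0).
    """
    if not first_relevant_ranks:
        raise ValueError("at least one query is required")
    total = 0.0
    for rank in first_relevant_ranks:
        if rank is None:
            continue  # no relevant document retrieved: reciprocal rank is 0.0
        if rank < 1:
            raise ValueError("ranks are 1-indexed")
        total += 1.0 / rank
    return total / len(first_relevant_ranks)


# Three queries: first hit at rank 1, first hit at rank 3, and no hit at all.
mrr = mean_reciprocal_rank([1, 3, None])
print(round(mrr, 3))  # (1 + 1/3 + 0) / 3 = 4/9, prints 0.444
```

Note that the query with no hit still counts toward Q, which is why a missed query lowers the average rather than being skipped.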
Key Properties
- Focus on first relevant result: MRR only considers the rank of the first relevant document. It does not account for the positions of additional relevant documents in the result set.
- Score range: MRR scores range from 0.0 to 1.0. A score of 1.0 means every query had its first relevant document at rank 1; a score of 0.0 means no query had a relevant document anywhere in its retrieved results.
- Sensitivity to top positions: The metric is heavily weighted toward top positions. A relevant document at rank 1 contributes 1.0, at rank 2 contributes 0.5, at rank 3 contributes approximately 0.333, and so on.
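The properties above can be checked with a small sketch that works on per-document relevance flags. The helper `reciprocal_rank` and the boolean-flag representation are hypothetical choices for this illustration only.

```python
from typing import Sequence


def reciprocal_rank(relevance_flags: Sequence[bool]) -> float:
    """Return 1/rank of the first True flag (1-indexed), or 0.0 if none."""
    for position, is_relevant in enumerate(relevance_flags, start=1):
        if is_relevant:
            return 1.0 / position
    return 0.0


# Contribution by position of the first relevant document (ranks 1 to 4).
for rank in range(1, 5):
    flags = [False] * (rank - 1) + [True]
    print(rank, round(reciprocal_rank(flags), 3))

# Extra relevant documents after the first do not change the score.
one_hit = [False, True, False, False]
many_hits = [False, True, True, True]
assert reciprocal_rank(one_hit) == reciprocal_rank(many_hits) == 0.5
```

The drop from rank 1 to rank 2 (1.0 to 0.5) is larger than every later drop combined, which is what "heavily weighted toward top positions" means in practice.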
When to Use MRR
MRR is ideal for scenarios where the user is primarily interested in finding one correct result as quickly as possible:
- Question answering: Where a single correct passage suffices.
- Navigational queries: Where the user seeks one specific document.
- Retrieval-Augmented Generation (RAG): Where finding at least one relevant context document early is critical for answer quality.
Limitations
- Does not reward finding multiple relevant documents. For that, use MAP or Recall.
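- Ignores the positions of all relevant documents after the first: two rankings whose first relevant document sits at the same rank receive the same score, however the remaining relevant documents are ordered.
- Sensitive to the definition of "relevance": documents are matched by content, so ground truth and retrieved documents must be normalized consistently (for example, whitespace and casing), or identical passages will be scored as misses.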
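To illustrate the last limitation, the sketch below shows how an exact content comparison can miss a match that a shared normalization step recovers. The `normalize_content` helper and its specific rules (Unicode NFKC, lowercasing, whitespace collapsing) are assumptions chosen for this example; the source does not state which normalization, if any, a given pipeline applies.

```python
import unicodedata


def normalize_content(text: str) -> str:
    """Hypothetical normalizer: Unicode NFKC, lowercase, collapsed whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.lower().split())


ground_truth = "Paris is the capital of France."
retrieved = "  Paris is the   capital of france.\n"

# Exact string comparison treats these as different documents...
print(ground_truth == retrieved)  # False
# ...while comparing normalized content recognizes them as the same passage.
print(normalize_content(ground_truth) == normalize_content(retrieved))  # True
```

Whatever rules are chosen, the important point is to apply the same function to both the ground truth and the retrieved documents before evaluation.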
Relationship to Implementation
In the Haystack framework, this principle is realized by the DocumentMRREvaluator component, which:
- Accepts lists of ground truth documents and retrieved documents per query.
- Computes the reciprocal rank for each query based on content matching.
- Returns both individual per-query scores and the aggregated MRR score.
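The three behaviors listed above can be mirrored in plain Python to show the shape of the inputs and outputs. This sketch does not call Haystack: the function `evaluate_mrr`, the use of plain strings in place of document objects, and the output keys `individual_scores` and `score` are illustrative assumptions rather than the component's confirmed interface.

```python
from typing import Dict, List, Union

Scores = Dict[str, Union[float, List[float]]]


def evaluate_mrr(
    ground_truth_documents: List[List[str]],
    retrieved_documents: List[List[str]],
) -> Scores:
    """Score each query by the rank of the first retrieved document whose
    content exactly matches any ground truth document for that query."""
    if len(ground_truth_documents) != len(retrieved_documents):
        raise ValueError("both inputs need one entry per query")
    if not ground_truth_documents:
        raise ValueError("at least one query is required")

    individual_scores: List[float] = []
    for truth, retrieved in zip(ground_truth_documents, retrieved_documents):
        truth_contents = set(truth)
        score = 0.0  # stays 0.0 when no retrieved document matches
        for rank, content in enumerate(retrieved, start=1):
            if content in truth_contents:
                score = 1.0 / rank
                break  # only the first relevant document counts
        individual_scores.append(score)

    return {
        "individual_scores": individual_scores,
        "score": sum(individual_scores) / len(individual_scores),
    }


result = evaluate_mrr(
    ground_truth_documents=[["France"], ["9th century", "9th"], ["Berlin"]],
    retrieved_documents=[
        ["France"],
        ["10th century", "9th century", "9th"],
        ["Munich", "Hamburg"],
    ],
)
print(result["individual_scores"])  # [1.0, 0.5, 0.0]
print(result["score"])  # 0.5
```

Returning the per-query scores alongside the aggregate makes it easy to spot which queries pull the average down, such as the third query here, where no ground truth document was retrieved.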
Related Principles
- Retrieval MAP Evaluation -- considers the full ranking of all relevant documents.
- Retrieval Recall Evaluation -- measures the proportion of relevant documents retrieved.
References
- Pinecone: Offline Evaluation Metrics
- Voorhees, E. M. (1999). "The TREC-8 Question Answering Track Report." TREC.