Principle:Deepset ai Haystack Retrieval MRR Evaluation

Overview

Mean Reciprocal Rank (MRR) measures retrieval quality by the position of the first relevant document in a ranked list of results. It is one of the most widely used metrics in information retrieval evaluation, rewarding systems that place relevant documents early in the result set.

Domains

Evaluation
Information_Retrieval

Theoretical Foundation

MRR is defined as the average of the reciprocal ranks across all queries:

MRR = (1/Q) * sum(1/rank_i) for i = 1..Q

Where:

Q is the total number of queries evaluated.
rank_i is the position (1-indexed) of the first relevant document in the retrieved results for query i.
If no relevant document is found in the retrieved results for a given query, the reciprocal rank for that query is 0.0.
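The definition translates directly into Python. This minimal sketch assumes the 1-indexed rank of the first relevant document has already been found for each query, with None meaning that no relevant document was retrieved. The function names are illustrative and are not part of Haystack.

```python
from typing import Optional, Sequence


def reciprocal_rank(first_relevant_rank: Optional[int]) -> float:
    """Return 1/rank for a 1-indexed rank, or 0.0 if no relevant document was retrieved."""
    if first_relevant_rank is None:
        return 0.0
    if first_relevant_rank < 1:
        raise ValueError("ranks are 1-indexed")
    return 1.0 / first_relevant_rank


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """MRR = (1/Q) * sum(1/rank_i) over all Q queries."""
    if not first_relevant_ranks:
        raise ValueError("at least one query is required")
    total = sum(reciprocal_rank(r) for r in first_relevant_ranks)
    return total / len(first_relevant_ranks)


# Three queries: first relevant document at rank 1, at rank 3, and not retrieved at all.
# MRR = (1 + 1/3 + 0) / 3 = 4/9
print(mean_reciprocal_rank([1, 3, None]))
```

A query with no relevant result still counts toward Q, so it lowers the average rather than being skipped.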

Key Properties

Focus on first relevant result: MRR only considers the rank of the first relevant document. It does not account for the positions of additional relevant documents in the result set.
Score range: MRR scores range from 0.0 to 1.0. A score of 1.0 means every query had a relevant document at rank 1. A score of 0.0 means no query retrieved any relevant document.
Sensitivity to top positions: The metric is heavily weighted toward top positions. A relevant document at rank 1 contributes 1.0, at rank 2 it contributes 0.5, at rank 3 about 0.333, and so on.
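Both properties can be checked with a small sketch that computes the reciprocal rank of one ranked list from per-position relevance flags. The boolean-flag representation and the function name are assumptions made for illustration.

```python
from typing import Sequence


def reciprocal_rank_from_flags(relevance: Sequence[bool]) -> float:
    """Reciprocal rank of a single ranked list, given per-position relevance flags."""
    for position, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / position  # stop at the first relevant document
    return 0.0  # no relevant document in the list


# Only the first relevant result matters: both lists score 0.5,
# even though the second one contains three relevant documents.
one_hit = [False, True, False, False]
three_hits = [False, True, True, True]
print(reciprocal_rank_from_flags(one_hit), reciprocal_rank_from_flags(three_hits))

# Top-heavy weighting: moving from rank 1 to rank 2 costs 0.5,
# while moving from rank 9 to rank 10 costs only about 0.011.
for rank in (1, 2, 3, 9, 10):
    flags = [False] * (rank - 1) + [True]
    print(rank, round(reciprocal_rank_from_flags(flags), 3))
```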

When to Use MRR

MRR is ideal for scenarios where the user is primarily interested in finding one correct result as quickly as possible:

Question answering: Where a single correct passage suffices.
Navigational queries: Where the user seeks one specific document.
Retrieval-Augmented Generation (RAG): Where finding at least one relevant context document early is critical for answer quality.

Limitations

Does not reward finding multiple relevant documents. For that, use MAP or Recall.
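Ignores the ranks of the second, third, or later relevant documents. Only the first relevant document affects the score.
Sensitive to how "relevance" is defined. The Haystack implementation matches documents by content, so ground-truth and retrieved documents must be normalized consistently. A whitespace or casing difference is enough to turn a match into a miss.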
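The effect of content-based matching can be shown with a short sketch. The normalize helper, which collapses whitespace and folds case, is a hypothetical preprocessing step applied by the caller. It is not presented as something Haystack does automatically, and the right normalization depends on the corpus.

```python
from typing import Callable, Sequence


def normalize(text: str) -> str:
    """Hypothetical normalization: collapse runs of whitespace and ignore case."""
    return " ".join(text.split()).casefold()


def identity(text: str) -> str:
    return text


def reciprocal_rank_by_content(
    ground_truth: Sequence[str], retrieved: Sequence[str], normalize_text: bool = False
) -> float:
    """Reciprocal rank where 'relevant' means the content equals some ground-truth content."""
    prep: Callable[[str], str] = normalize if normalize_text else identity
    targets = {prep(t) for t in ground_truth}
    for rank, content in enumerate(retrieved, start=1):
        if prep(content) in targets:
            return 1.0 / rank
    return 0.0


ground_truth = ["Paris is the capital of France."]
retrieved = ["Berlin is the capital of Germany.", "Paris is the capital of  France.\n"]

# Exact matching misses the relevant passage because of the extra whitespace.
print(reciprocal_rank_by_content(ground_truth, retrieved))  # 0.0
# With the same normalization on both sides, the passage at rank 2 is recognized.
print(reciprocal_rank_by_content(ground_truth, retrieved, normalize_text=True))  # 0.5
```

Whatever normalization is chosen must be applied to both the ground truth and the retrieved content. Otherwise, scores will differ for reasons that have nothing to do with retrieval quality.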

Relationship to Implementation

In the Haystack framework, this principle is realized by the DocumentMRREvaluator component, which:

Accepts lists of ground truth documents and retrieved documents per query.
Computes the reciprocal rank for each query based on content matching.
Returns both individual per-query scores and the aggregated MRR score.
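The per-query flow described above can be approximated in plain Python. This sketch reproduces the inputs and outputs listed here: one ground-truth list and one retrieved list per query, and a result containing per-query scores plus their mean. The Doc class and evaluate_mrr function are hypothetical stand-ins, and the result keys "score" and "individual_scores" are assumed to match the component's output. The actual component is provided by Haystack under haystack.components.evaluators and is called through its run method. Check the Haystack API reference for the exact signature in your version.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document; only `content` is used."""
    content: Optional[str]


def evaluate_mrr(
    ground_truth_documents: List[List[Doc]],
    retrieved_documents: List[List[Doc]],
) -> Dict[str, Union[float, List[float]]]:
    """Sketch: per-query reciprocal ranks and their mean, using exact content matching."""
    if len(ground_truth_documents) != len(retrieved_documents):
        raise ValueError("Expected one ground-truth list and one retrieved list per query.")
    if not retrieved_documents:
        raise ValueError("At least one query is required.")

    individual_scores: List[float] = []
    for ground_truth, retrieved in zip(ground_truth_documents, retrieved_documents):
        truth_contents = {d.content for d in ground_truth if d.content is not None}
        score = 0.0
        for rank, doc in enumerate(retrieved, start=1):
            if doc.content is not None and doc.content in truth_contents:
                score = 1.0 / rank  # first relevant document found
                break
        individual_scores.append(score)

    return {
        "score": sum(individual_scores) / len(individual_scores),
        "individual_scores": individual_scores,
    }


result = evaluate_mrr(
    ground_truth_documents=[[Doc("France")], [Doc("9th century"), Doc("9th")]],
    retrieved_documents=[[Doc("France")], [Doc("10th century"), Doc("9th century"), Doc("9th")]],
)
print(result["individual_scores"])  # [1.0, 0.5]
print(result["score"])  # 0.75
```

In the second query, "9th" is also relevant but does not change the score, because "9th century" has already been matched at rank 2.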

Related Principles

Retrieval MAP Evaluation -- considers the full ranking of all relevant documents.
Retrieval Recall Evaluation -- measures the proportion of relevant documents retrieved.

Related Pages

Implemented By

Implementation:Deepset_ai_Haystack_DocumentMRREvaluator