Principle:Deepset ai Haystack Retrieval MAP Evaluation

Overview

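Mean Average Precision (MAP) measures retrieval quality in two steps. For each query, it averages the precision values at the ranks where relevant documents appear, which gives the Average Precision (AP). It then takes the mean of the AP scores across all queries. Unlike Mean Reciprocal Rank (MRR), which considers only the first relevant document, MAP rewards systems that rank all relevant documents highly.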

Domains

Evaluation
Information_Retrieval

Theoretical Foundation

MAP is defined as the mean of the Average Precision (AP) scores across all queries:

MAP = (1/Q) * sum(AP_i) for i = 1..Q

AP = (1/R) * sum(P(k) * rel(k)) for k = 1..n

Where:

Q is the total number of queries.
AP_i is the Average Precision for query i.
R is the number of relevant documents for the query in the ground truth.
P(k) is the precision at position k in the ranked list, that is, the fraction of the top k retrieved documents that are relevant.
rel(k) is a binary indicator (1 if the document at rank k is relevant, 0 otherwise).
n is the total number of retrieved documents.
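
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not code from any library: the function names, the representation of each query as a (relevance list, R) pair, and the convention of returning 0.0 when R is 0 are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def average_precision(relevance: Sequence[int], num_relevant: int) -> float:
    """AP = (1/R) * sum(P(k) * rel(k)) for k = 1..n."""
    if num_relevant == 0:
        return 0.0  # assumed convention: no relevant documents gives AP = 0
    hits = 0
    total = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k  # P(k): share of the top k that is relevant
    return total / num_relevant


def mean_average_precision(queries: Sequence[Tuple[Sequence[int], int]]) -> float:
    """MAP = (1/Q) * sum(AP_i) for i = 1..Q."""
    if not queries:
        return 0.0
    return sum(average_precision(rel, r) for rel, r in queries) / len(queries)


# Each entry: (rel(k) for ranks 1..n, R)
queries = [
    ([1, 0, 1], 2),  # AP = (1/1 + 2/3) / 2
    ([0, 1], 1),     # AP = (1/2) / 1
]
map_score = mean_average_precision(queries)
print(round(map_score, 3))  # 0.667
```

Only ranks where rel(k) = 1 contribute to the sum, so the loop skips irrelevant positions instead of multiplying by zero.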

Worked Example

Consider a query with 2 relevant documents. The retrieved list has 3 documents:

Rank 1: relevant (precision = 1/1 = 1.0)
Rank 2: not relevant
Rank 3: relevant (precision = 2/3 = 0.667)

AP = (1/2) * (1.0 + 0.667) = 0.833
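
The arithmetic above can be traced step by step. The snippet below is a small sketch with illustrative variable names. It records the precision at each rank that holds a relevant document and then divides by R.

```python
relevance = [1, 0, 1]  # ranks 1..3 from the worked example
num_relevant = 2       # R

hits = 0
precisions_at_hits = []
for k, rel in enumerate(relevance, start=1):
    if rel:
        hits += 1
        precisions_at_hits.append(hits / k)  # precision at this relevant rank

ap = sum(precisions_at_hits) / num_relevant
print([round(p, 3) for p in precisions_at_hits], round(ap, 3))  # [1.0, 0.667] 0.833
```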

Key Properties

Considers full ranking: Unlike MRR, MAP accounts for the positions of all relevant documents, not just the first.
Score range: MAP scores range from 0.0 to 1.0. A score of 1.0 means that, for every query, all relevant documents are retrieved and ranked ahead of every irrelevant one.
Precision-oriented: MAP incorporates precision at each relevant position, penalizing systems that intersperse irrelevant results among relevant ones.
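
To make the contrast with MRR concrete, the sketch below scores two hypothetical rankings for a query with two relevant documents. Both place a relevant document at rank 1, so their reciprocal rank is identical, but AP drops when the second relevant document is pushed down. The helper names are illustrative, not part of any library.

```python
def reciprocal_rank(relevance):
    """1 / rank of the first relevant document, or 0.0 if there is none."""
    for k, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / k
    return 0.0


def average_precision(relevance, num_relevant):
    """Sum of P(k) at relevant ranks, divided by R."""
    hits, total = 0, 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / num_relevant if num_relevant else 0.0


ranking_a = [1, 1, 0, 0]  # both relevant documents at the top
ranking_b = [1, 0, 0, 1]  # second relevant document at the bottom

rr_a, rr_b = reciprocal_rank(ranking_a), reciprocal_rank(ranking_b)
ap_a, ap_b = average_precision(ranking_a, 2), average_precision(ranking_b, 2)
print(rr_a, rr_b)  # 1.0 1.0
print(ap_a, ap_b)  # 1.0 0.75
```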

When to Use MAP

MAP is ideal for scenarios where retrieving all relevant documents matters:

Multi-document question answering: Where multiple passages contribute to the answer.
Comprehensive retrieval: Where missing a relevant document impacts downstream quality.
RAG evaluation: Where the quality of the full context window depends on ranking all relevant documents highly.

Limitations

Assumes binary relevance (relevant or not). Does not handle graded relevance scores.
Requires knowledge of all relevant documents (the ground truth set must be complete).
Can be dominated by queries with many relevant documents.
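
The dependence on a complete ground truth set can be illustrated with a small sketch. The document IDs and label sets below are invented for the example. The same ranking is scored once against complete labels and once against labels that omit a document that is in fact relevant.

```python
def average_precision(retrieved_ids, relevant_ids):
    """Sum of P(k) at relevant ranks, divided by the number of labeled relevant documents."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / k
    return total / len(relevant_ids) if relevant_ids else 0.0


retrieved = ["d1", "d2", "d9"]   # hypothetical ranked result list
complete_labels = {"d1", "d2"}   # every relevant document is annotated
incomplete_labels = {"d2"}       # "d1" is relevant but was never annotated

ap_complete = average_precision(retrieved, complete_labels)
ap_incomplete = average_precision(retrieved, incomplete_labels)
print(ap_complete, ap_incomplete)  # 1.0 0.5
```

With the incomplete labels, the unannotated document at rank 1 is counted as irrelevant, so a ranking that is actually ideal scores only 0.5.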

Relationship to Implementation

In the Haystack framework, this principle is realized by the DocumentMAPEvaluator component, which:

Accepts lists of ground truth documents and retrieved documents per query.
Computes Average Precision for each query based on content matching.
Returns both individual per-query AP scores and the aggregated MAP score.
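
The three behaviors listed above can be sketched in plain Python. This is not the Haystack source code: the `Doc` class, the `evaluate_map` function, and the output key names are hypothetical stand-ins chosen for this illustration, and the actual component may differ in details such as normalization or how documents are matched. The sketch normalizes by R as defined earlier on this page and assumes that retrieved lists contain no duplicate contents.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Doc:
    """Hypothetical stand-in for a document object; only `content` is used."""
    content: Optional[str]


def evaluate_map(
    ground_truth_documents: List[List[Doc]],
    retrieved_documents: List[List[Doc]],
) -> Dict[str, Union[float, List[float]]]:
    """Compute per-query AP and the aggregated MAP by matching on content."""
    if len(ground_truth_documents) != len(retrieved_documents):
        raise ValueError("Both inputs need one list of documents per query.")

    individual_scores: List[float] = []
    for truth, retrieved in zip(ground_truth_documents, retrieved_documents):
        relevant = {d.content for d in truth if d.content is not None}
        hits, total = 0, 0.0
        for k, doc in enumerate(retrieved, start=1):
            if doc.content in relevant:  # content matching
                hits += 1
                total += hits / k
        individual_scores.append(total / len(relevant) if relevant else 0.0)

    score = sum(individual_scores) / len(individual_scores) if individual_scores else 0.0
    return {"score": score, "individual_scores": individual_scores}


truth = [[Doc("France"), Doc("Paris")], [Doc("Berlin")]]
retrieved = [
    [Doc("France"), Doc("Germany"), Doc("Paris")],
    [Doc("Rome"), Doc("Berlin")],
]
result = evaluate_map(truth, retrieved)
print([round(s, 3) for s in result["individual_scores"]], round(result["score"], 3))
# [0.833, 0.5] 0.667
```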

Related Principles

Retrieval MRR Evaluation -- focuses only on the first relevant document's rank.
Retrieval Recall Evaluation -- measures the proportion of relevant documents found, regardless of rank.

References

Pinecone: Offline Evaluation Metrics
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.

Related Pages

Implemented By

Implementation:Deepset_ai_Haystack_DocumentMAPEvaluator