Principle:Deepset ai Haystack Retrieval Recall Evaluation
Overview
Recall measures the proportion of relevant documents that were successfully retrieved from the total set of relevant documents. It answers the question: "Of all the documents that should have been found, how many were actually found?"
Domains
- Evaluation
- Information_Retrieval
Theoretical Foundation
Recall is defined as:
Recall = |relevant ∩ retrieved| / |relevant|
Where:
- |relevant ∩ retrieved| is the number of relevant documents that appear in the retrieved set.
- |relevant| is the total number of relevant documents (ground truth).
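As a worked illustration of the definition, suppose four documents are relevant and three are retrieved. The document identifiers d_1 through d_7 are made up for this example and do not come from any dataset.

```latex
\begin{align*}
\text{relevant} &= \{d_1, d_2, d_3, d_4\}, \qquad \text{retrieved} = \{d_2, d_4, d_7\} \\
\text{relevant} \cap \text{retrieved} &= \{d_2, d_4\} \\
\text{Recall} &= \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|} = \frac{2}{4} = 0.5
\end{align*}
```

The irrelevant document d_7 does not affect the result, because the denominator counts only relevant documents.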
Recall Modes
In practice, recall can be measured in two distinct modes depending on the evaluation objective:
Single Hit Mode
Score = 1.0 if |relevant ∩ retrieved| > 0, else 0.0
This binary mode returns 1.0 if any relevant document was found in the retrieved results, and 0.0 otherwise. It is useful when finding at least one relevant document is sufficient (e.g., navigational search, simple QA).
Multi Hit Mode
Score = |relevant ∩ retrieved| / |relevant|
This proportional mode returns the fraction of relevant documents that were retrieved. It is useful when finding all relevant documents matters (e.g., comprehensive search, multi-document reasoning).
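The two modes can be sketched in a few lines of plain Python. This is a minimal illustration and not the Haystack implementation: the function names `recall_single_hit` and `recall_multi_hit` are hypothetical, documents are represented by string identifiers, and returning 0.0 for an empty ground truth set is an assumption made here.

```python
from typing import Iterable


def recall_single_hit(relevant: Iterable[str], retrieved: Iterable[str]) -> float:
    """Return 1.0 if at least one relevant document was retrieved, else 0.0."""
    return 1.0 if set(relevant) & set(retrieved) else 0.0


def recall_multi_hit(relevant: Iterable[str], retrieved: Iterable[str]) -> float:
    """Return the fraction of relevant documents that were retrieved."""
    relevant_set = set(relevant)
    if not relevant_set:
        # Assumption: with no ground truth the ratio is undefined; report 0.0.
        return 0.0
    return len(relevant_set & set(retrieved)) / len(relevant_set)


# Illustrative data: four relevant documents, two of which were retrieved.
relevant = ["doc_a", "doc_b", "doc_c", "doc_d"]
retrieved = ["doc_b", "doc_x", "doc_d"]

print(recall_single_hit(relevant, retrieved))  # 1.0
print(recall_multi_hit(relevant, retrieved))   # 0.5
```

On the same input the single hit score is already saturated at 1.0, while the multi hit score shows that half of the relevant documents are still missing.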
Key Properties
- Score range: 0.0 to 1.0 in both modes. Per-query scores in single hit mode take only the values 0.0 and 1.0.
- Rank-agnostic: Unlike MRR and MAP, recall does not consider the order of retrieved documents. A relevant document at rank 100 contributes the same as one at rank 1.
- Set-based: Recall operates on unique document content, avoiding double-counting duplicates.
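The rank-agnostic and set-based properties can be checked with a small sketch. The helper `multi_hit_recall` is a hypothetical name, and documents are again assumed to be identified by plain strings.

```python
def multi_hit_recall(relevant, retrieved):
    """Fraction of unique relevant documents found among the retrieved ones."""
    relevant_set = set(relevant)
    return len(relevant_set & set(retrieved)) / len(relevant_set)


relevant = ["doc_a", "doc_b"]

# Same documents, opposite ranking: relevant hits first vs. last.
top_ranked = ["doc_a", "doc_b", "doc_x", "doc_y"]
bottom_ranked = ["doc_x", "doc_y", "doc_b", "doc_a"]

# The same relevant document retrieved three times counts only once.
with_duplicates = ["doc_a", "doc_a", "doc_a", "doc_x"]

score_top = multi_hit_recall(relevant, top_ranked)
score_bottom = multi_hit_recall(relevant, bottom_ranked)
score_duplicates = multi_hit_recall(relevant, with_duplicates)

print(score_top, score_bottom)  # 1.0 1.0
print(score_duplicates)         # 0.5
```

Reordering the retrieved list leaves the score unchanged, and repeated copies of one relevant document do not inflate it.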
When to Use Recall
- RAG pipeline evaluation: To ensure the retriever captures all necessary context.
- Multi-hop question answering: Where multiple documents are needed to compose an answer.
- Pipeline filtering assessment: To determine if filtering steps discard too many relevant documents.
Limitations
- Does not account for the ranking of results.
- Does not penalize retrieving many irrelevant documents (use precision or MAP for that).
- In single hit mode, cannot distinguish between retrieving one relevant document and retrieving all of them.
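The second limitation can be made concrete by comparing recall with precision on a focused and a noisy result list. This is a sketch under simple assumptions: `recall` and `precision` are hypothetical helper names, documents are string identifiers, and the data is invented for illustration.

```python
def recall(relevant, retrieved):
    """Share of relevant documents that were retrieved."""
    relevant_set = set(relevant)
    return len(relevant_set & set(retrieved)) / len(relevant_set)


def precision(relevant, retrieved):
    """Share of retrieved documents that are relevant."""
    retrieved_set = set(retrieved)
    return len(set(relevant) & retrieved_set) / len(retrieved_set)


relevant = ["doc_a", "doc_b"]
focused = ["doc_a", "doc_b"]
# Same two hits plus eight irrelevant documents.
noisy = ["doc_a", "doc_b"] + [f"noise_{i}" for i in range(8)]

print(recall(relevant, focused), recall(relevant, noisy))        # 1.0 1.0
print(precision(relevant, focused), precision(relevant, noisy))  # 1.0 0.2
```

Recall is identical for both result lists, so the extra noise only becomes visible through a complementary metric such as precision.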
Relationship to Implementation
In the Haystack framework, this principle is realized by the DocumentRecallEvaluator component, which:
- Supports both SINGLE_HIT and MULTI_HIT modes via the RecallMode enum.
- Accepts lists of ground truth and retrieved documents per query.
- Returns both individual per-query scores and the aggregated recall score.
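The input and output shape described above can be mimicked in plain Python. This sketch is not the DocumentRecallEvaluator source: the function `evaluate_recall`, its `multi_hit` flag, and the result keys are names chosen for illustration, documents are reduced to their content strings, and the aggregate is assumed to be the mean of the per-query scores.

```python
def evaluate_recall(ground_truth, retrieved, multi_hit=False):
    """Score each query, then aggregate (assumed here: mean over queries)."""
    if len(ground_truth) != len(retrieved):
        raise ValueError("Expected one retrieved list per ground truth list.")

    individual_scores = []
    for relevant_docs, retrieved_docs in zip(ground_truth, retrieved):
        relevant_set = set(relevant_docs)
        hits = relevant_set & set(retrieved_docs)
        if multi_hit:
            # Proportional score; 0.0 for empty ground truth is an assumption.
            score = len(hits) / len(relevant_set) if relevant_set else 0.0
        else:
            score = 1.0 if hits else 0.0
        individual_scores.append(score)

    aggregate = sum(individual_scores) / len(individual_scores) if individual_scores else 0.0
    return {"individual_scores": individual_scores, "score": aggregate}


# Two queries; the second one misses one of its two relevant documents.
ground_truth = [["France"], ["9th century", "9th"]]
retrieved = [["France"], ["9th century", "10th century"]]

single = evaluate_recall(ground_truth, retrieved)
multi = evaluate_recall(ground_truth, retrieved, multi_hit=True)

print(single)  # {'individual_scores': [1.0, 1.0], 'score': 1.0}
print(multi)   # {'individual_scores': [1.0, 0.5], 'score': 0.75}
```

Keeping the per-query scores next to the aggregate makes it possible to see which queries pull the overall recall down.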
Related Principles
- Retrieval MRR Evaluation -- focuses on the rank of the first relevant result.
- Retrieval MAP Evaluation -- combines precision and ranking for all relevant results.
References
- Manning, C. D., Raghavan, P., & Schütze, H. (2008). "Introduction to Information Retrieval." Cambridge University Press.