Principle:Deepset ai Haystack Retrieval Recall Evaluation

Overview

Recall measures the proportion of relevant documents that were successfully retrieved from the total set of relevant documents. It answers the question: "Of all the documents that should have been found, how many were actually found?"

Domains

Evaluation
Information_Retrieval

Theoretical Foundation

Recall is defined as:

Recall = |relevant ∩ retrieved| / |relevant|

Where:

|relevant ∩ retrieved| is the number of relevant documents that appear in the retrieved set.
|relevant| is the total number of relevant documents (ground truth). The ratio is undefined when this set is empty.
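
As a worked instance of the definition, here is a minimal Python sketch. It assumes documents can be identified by hashable IDs; the function name `recall`, the document IDs, and the choice to raise an error on an empty ground truth are illustrative assumptions, not part of any library.

```python
def recall(relevant: set[str], retrieved: set[str]) -> float:
    """Recall = |relevant ∩ retrieved| / |relevant|."""
    if not relevant:
        # The ratio is undefined for an empty ground truth; raising here is
        # a choice made for this sketch.
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(relevant & retrieved) / len(relevant)


relevant = {"d1", "d2", "d3", "d4"}   # ground truth (hypothetical IDs)
retrieved = {"d2", "d4", "d7"}        # retriever output (hypothetical IDs)

score = recall(relevant, retrieved)   # 2 of the 4 relevant documents were found
print(score)  # 0.5
```

The irrelevant document `d7` has no effect on the result: only the overlap with the ground truth and the size of the ground truth enter the formula.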

Recall Modes

In practice, recall can be measured in two distinct modes depending on the evaluation objective:

Single Hit Mode

Score = 1.0 if |relevant ∩ retrieved| > 0, else 0.0

This binary mode returns 1.0 if any relevant document was found in the retrieved results, and 0.0 otherwise. It is useful when finding at least one relevant document is sufficient (e.g., navigational search, simple QA).

Multi Hit Mode

Score = |relevant ∩ retrieved| / |relevant|

This proportional mode returns the fraction of relevant documents that were retrieved. It is useful when finding all relevant documents matters (e.g., comprehensive search, multi-document reasoning).
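
The two modes can be compared side by side with a short sketch. The `Mode` enum and `recall_score` function below are local stand-ins written for illustration; they only mirror the two formulas above and are not the framework's own code.

```python
from enum import Enum


class Mode(Enum):
    """Local stand-in for the two recall modes (illustration only)."""
    SINGLE_HIT = "single_hit"
    MULTI_HIT = "multi_hit"


def recall_score(relevant: set[str], retrieved: set[str], mode: Mode) -> float:
    hits = len(relevant & retrieved)
    if mode is Mode.SINGLE_HIT:
        # Binary: 1.0 as soon as any relevant document was retrieved.
        return 1.0 if hits > 0 else 0.0
    # Proportional: fraction of the ground truth that was retrieved.
    # Assumes a non-empty ground truth.
    return hits / len(relevant)


relevant = {"d1", "d2", "d3"}
retrieved = {"d3", "d9"}

single = recall_score(relevant, retrieved, Mode.SINGLE_HIT)  # one hit is enough
multi = recall_score(relevant, retrieved, Mode.MULTI_HIT)    # 1 of 3 found
print(single, multi)
```

For the same retrieval result, single hit mode reports a perfect score while multi hit mode reports that two thirds of the relevant documents are still missing.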

Key Properties

Score range: 0.0 to 1.0 in both modes. For a single query, single hit mode yields only the two endpoint values.
Rank-agnostic: Unlike MRR and MAP, recall does not consider the order of retrieved documents. A relevant document at rank 100 contributes the same as one at rank 1.
Set-based: Recall operates on unique document content, avoiding double-counting duplicates.
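
The rank-agnostic and set-based properties can be checked with a small sketch. It assumes documents are compared by their content strings; the function name and sample texts are hypothetical.

```python
def multi_hit_recall(relevant: list[str], retrieved: list[str]) -> float:
    relevant_set = set(relevant)     # unique ground-truth contents
    retrieved_set = set(retrieved)   # ranking and duplicates are discarded here
    return len(relevant_set & retrieved_set) / len(relevant_set)


relevant = ["Paris is in France", "Berlin is in Germany"]
# The relevant document appears twice, and only at ranks 2 and 3.
ranked = ["Rome is in Italy", "Paris is in France", "Paris is in France"]

original = multi_hit_recall(relevant, ranked)
reordered = multi_hit_recall(relevant, list(reversed(ranked)))
print(original, reordered)  # 0.5 0.5
```

Reversing the ranking leaves the score unchanged, and the duplicated document is counted once.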

When to Use Recall

RAG pipeline evaluation: To ensure the retriever captures all necessary context.
Multi-hop question answering: Where multiple documents are needed to compose an answer.
Pipeline filtering assessment: To determine if filtering steps discard too many relevant documents.

Limitations

Does not account for the ranking of results.
Does not penalize retrieving many irrelevant documents (use precision or MAP for that).
In single hit mode, recall cannot distinguish between retrieving one relevant document and retrieving all of them.
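
The second limitation is easiest to see next to precision. The sketch below uses hypothetical document IDs and a plain set-based precision, |relevant ∩ retrieved| / |retrieved|, purely for contrast.

```python
def recall(relevant: set[str], retrieved: set[str]) -> float:
    return len(relevant & retrieved) / len(relevant)


def precision(relevant: set[str], retrieved: set[str]) -> float:
    return len(relevant & retrieved) / len(retrieved)


relevant = {"d1", "d2"}
focused = {"d1", "d2"}                                   # only relevant documents
broad = {"d1", "d2"} | {f"noise{i}" for i in range(98)}  # plus 98 irrelevant ones

recall_focused = recall(relevant, focused)        # 1.0
recall_broad = recall(relevant, broad)            # still 1.0
precision_focused = precision(relevant, focused)  # 1.0
precision_broad = precision(relevant, broad)      # 2 / 100
print(recall_focused, recall_broad, precision_focused, precision_broad)
```

Both retrieval results achieve perfect recall, so recall alone would not reveal that the broad result is 98% noise.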

Relationship to Implementation

In the Haystack framework, this principle is realized by the DocumentRecallEvaluator component, which:

Supports both SINGLE_HIT and MULTI_HIT modes via the RecallMode enum.
Accepts lists of ground truth and retrieved documents per query.
Returns both individual per-query scores and the aggregated recall score.
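
The per-query interface described above can be sketched in plain Python. This is not the DocumentRecallEvaluator implementation. The function names, the `multi_hit` flag, the result keys, and the use of the arithmetic mean for aggregation are assumptions made for this illustration.

```python
from statistics import mean


def query_recall(relevant: list[str], retrieved: list[str], multi_hit: bool) -> float:
    relevant_set, retrieved_set = set(relevant), set(retrieved)
    hits = len(relevant_set & retrieved_set)
    if multi_hit:
        return hits / len(relevant_set)  # assumes a non-empty ground truth
    return 1.0 if hits > 0 else 0.0


def evaluate_recall(
    ground_truth: list[list[str]],
    retrieved: list[list[str]],
    multi_hit: bool = False,
) -> dict:
    """Score each query, then aggregate (here: arithmetic mean, an assumption)."""
    if len(ground_truth) != len(retrieved):
        raise ValueError("ground_truth and retrieved must have one entry per query")
    scores = [query_recall(gt, ret, multi_hit) for gt, ret in zip(ground_truth, retrieved)]
    return {"individual_scores": scores, "score": mean(scores)}


ground_truth = [["a", "b"], ["d"]]   # one list of relevant documents per query
retrieved = [["a", "c"], ["e"]]      # one list of retrieved documents per query

result_single = evaluate_recall(ground_truth, retrieved, multi_hit=False)
result_multi = evaluate_recall(ground_truth, retrieved, multi_hit=True)
print(result_single)
print(result_multi)
```

Keeping the per-query scores alongside the aggregate makes it possible to find the individual queries that pull the overall score down.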

Related Principles

Retrieval MRR Evaluation -- focuses on the rank of the first relevant result.
Retrieval MAP Evaluation -- combines precision and ranking for all relevant results.

References

Manning, C. D., Raghavan, P., & Schütze, H. (2008). "Introduction to Information Retrieval." Cambridge University Press.

Related Pages

Implemented By

Implementation:Deepset_ai_Haystack_DocumentRecallEvaluator
