Implementation:Deepset ai Haystack DocumentRecallEvaluator
Overview
DocumentRecallEvaluator is a Haystack evaluator component that calculates the Recall score for retrieved documents. It supports two modes: single hit (any relevant document found = 1.0) and multi hit (proportion of relevant documents found).
Implements Principle
Principle:Deepset_ai_Haystack_Retrieval_Recall_Evaluation
Source Location
haystack/components/evaluators/document_recall.py (Lines 41-145)
Import
from haystack.components.evaluators import DocumentRecallEvaluator
The RecallMode enum is also available:
from haystack.components.evaluators.document_recall import RecallMode
Component Registration
DocumentRecallEvaluator is decorated with @component, making it a standard Haystack pipeline component.
API
Constructor
def __init__(self, mode: str | RecallMode = RecallMode.SINGLE_HIT):
Parameters:
- mode (str | RecallMode, default: RecallMode.SINGLE_HIT) -- The mode for calculating recall. Accepts either a RecallMode enum value or a string ("single_hit" or "multi_hit").
RecallMode Enum
class RecallMode(Enum):
    SINGLE_HIT = "single_hit"  # Score is 1.0 if any relevant document is retrieved
    MULTI_HIT = "multi_hit"    # Score is the proportion of relevant documents retrieved
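Because the constructor accepts either an enum member or its string value, the string presumably has to be mapped onto the enum at some point. The following is a minimal sketch of one way to do that, not the library's own code: the enum is re-declared locally so the snippet runs without Haystack, and resolve_mode is a hypothetical helper introduced only for illustration.

```python
from enum import Enum


class RecallMode(Enum):
    # Local copy of the documented enum, for illustration only.
    SINGLE_HIT = "single_hit"
    MULTI_HIT = "multi_hit"


def resolve_mode(mode):
    """Hypothetical helper: accept an enum member or its string value."""
    if isinstance(mode, RecallMode):
        return mode
    try:
        return RecallMode(mode)  # standard Enum lookup by value
    except ValueError:
        supported = [m.value for m in RecallMode]
        raise ValueError(
            f"Unknown recall mode {mode!r}; supported modes: {supported}"
        ) from None


print(resolve_mode("multi_hit"))
print(resolve_mode(RecallMode.SINGLE_HIT))
```

Looking the string up by value keeps the accepted strings in one place (the enum definition), so an unsupported string fails early with a message listing the valid options.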
run()
def run(
    self,
    ground_truth_documents: list[list[Document]],
    retrieved_documents: list[list[Document]],
) -> dict[str, Any]:
Parameters:
- ground_truth_documents (list[list[Document]]) -- A list of expected documents for each question.
- retrieved_documents (list[list[Document]]) -- A list of retrieved documents for each question.
Returns: A dictionary with the following keys:
- score (float) -- The average recall score across all queries.
- individual_scores (list[float]) -- A list of recall scores for each query. In single hit mode, values are 0.0 or 1.0. In multi hit mode, values range from 0.0 to 1.0.
Raises:
- ValueError -- If ground_truth_documents and retrieved_documents have different lengths.
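The run() contract described above (length check, one score per query, then an average) can be sketched in plain Python. This is an illustration under stated assumptions rather than the component's source: aggregate_recall and any_overlap are hypothetical names, plain strings stand in for Document objects, and the input is assumed to contain at least one query.

```python
def aggregate_recall(ground_truth_documents, retrieved_documents, score_query):
    """Hypothetical helper mirroring the documented run() contract."""
    if len(ground_truth_documents) != len(retrieved_documents):
        raise ValueError(
            "ground_truth_documents and retrieved_documents must have the same length."
        )
    # One score per query, pairing each ground-truth list with its retrieved list.
    individual_scores = [
        score_query(truth, retrieved)
        for truth, retrieved in zip(ground_truth_documents, retrieved_documents)
    ]
    # Assumes at least one query; an empty input would divide by zero here.
    score = sum(individual_scores) / len(individual_scores)
    return {"score": score, "individual_scores": individual_scores}


def any_overlap(truth, retrieved):
    # Stand-in per-query scorer: 1.0 if the two lists share any item.
    return float(bool(set(truth) & set(retrieved)))


result = aggregate_recall([["a"], ["b", "c"]], [["a", "x"], ["y"]], any_overlap)
print(result)
# {'score': 0.5, 'individual_scores': [1.0, 0.0]}
```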
to_dict()
def to_dict(self) -> dict[str, Any]:
Serializes the component to a dictionary, including the recall mode.
Algorithm
Single Hit Mode
For each query:
- Extract unique content from ground truth documents and retrieved documents.
- Compute the intersection of the two sets.
- Return 1.0 if the intersection is non-empty, else 0.0.
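The single hit steps above can be sketched as follows. This is a simplified stand-in, not the component's actual method: recall_single_hit is a hypothetical name, and plain strings are used in place of each Document's content.

```python
def recall_single_hit(truth_contents, retrieved_contents):
    """Return 1.0 if any ground-truth content was retrieved, else 0.0."""
    unique_truths = set(truth_contents)
    unique_retrievals = set(retrieved_contents)
    # A non-empty intersection means at least one relevant item was found.
    return 1.0 if unique_truths & unique_retrievals else 0.0


print(recall_single_hit(["9th century", "9th"], ["9th century", "10th century", "9th"]))
# 1.0
print(recall_single_hit(["France"], ["Germany"]))
# 0.0
```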
Multi Hit Mode
For each query:
- Extract unique content from ground truth documents and retrieved documents.
- Compute the intersection of the two sets.
- Return |intersection| / |unique_truths|.
- Return 0.0 if the ground truth or retrieved sets are empty or contain only empty strings.
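A minimal sketch of the multi hit steps, including the edge cases, is given below. It is an approximation for illustration: recall_multi_hit is a hypothetical name, strings stand in for Document content, and the warning log mentioned elsewhere on this page is omitted.

```python
def recall_multi_hit(truth_contents, retrieved_contents):
    """Return the fraction of unique ground-truth contents that were retrieved."""
    unique_truths = set(truth_contents)
    unique_retrievals = set(retrieved_contents)
    # Edge cases: an empty set, or a set holding only the empty string, scores 0.0.
    if not unique_truths or unique_truths == {""}:
        return 0.0
    if not unique_retrievals or unique_retrievals == {""}:
        return 0.0
    return len(unique_truths & unique_retrievals) / len(unique_truths)


print(recall_multi_hit(["Paris", "France"], ["Paris", "Berlin"]))
# 0.5
print(recall_multi_hit(["Paris", "Paris"], ["Paris"]))
# 1.0
print(recall_multi_hit([], ["Paris"]))
# 0.0
```

The second call shows the effect of set-based comparison: the duplicated ground-truth entry counts once, so retrieving it a single time yields full recall.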
Usage Example
from haystack import Document
from haystack.components.evaluators import DocumentRecallEvaluator
# Single hit mode (default)
evaluator = DocumentRecallEvaluator()
result = evaluator.run(
    ground_truth_documents=[
        [Document(content="France")],
        [Document(content="9th century"), Document(content="9th")],
    ],
    retrieved_documents=[
        [Document(content="France")],
        [Document(content="9th century"), Document(content="10th century"), Document(content="9th")],
    ],
)
print(result["individual_scores"])
# [1.0, 1.0]
print(result["score"])
# 1.0
Multi Hit Mode Example
from haystack import Document
from haystack.components.evaluators import DocumentRecallEvaluator
evaluator = DocumentRecallEvaluator(mode="multi_hit")
result = evaluator.run(
    ground_truth_documents=[
        [Document(content="Paris"), Document(content="France")],
    ],
    retrieved_documents=[
        [Document(content="Paris"), Document(content="Berlin")],
    ],
)
print(result["individual_scores"])
# [0.5] -- only 1 of 2 relevant documents was found
print(result["score"])
# 0.5
Integration in Evaluation Pipelines
DocumentRecallEvaluator can be combined with other evaluators:
from haystack import Pipeline
from haystack.components.evaluators import DocumentRecallEvaluator, DocumentMRREvaluator
eval_pipeline = Pipeline()
eval_pipeline.add_component("recall_single", DocumentRecallEvaluator(mode="single_hit"))
eval_pipeline.add_component("recall_multi", DocumentRecallEvaluator(mode="multi_hit"))
eval_pipeline.add_component("mrr", DocumentMRREvaluator())
# ground_truths and retrieved_docs are list[list[Document]] values prepared beforehand,
# with one inner list per query.
results = eval_pipeline.run({
    "recall_single": {
        "ground_truth_documents": ground_truths,
        "retrieved_documents": retrieved_docs,
    },
    "recall_multi": {
        "ground_truth_documents": ground_truths,
        "retrieved_documents": retrieved_docs,
    },
    "mrr": {
        "ground_truth_documents": ground_truths,
        "retrieved_documents": retrieved_docs,
    },
})
Important Notes
- Set-based comparison: Recall operates on unique content values. Duplicate documents do not affect the score.
- Edge case handling: In multi hit mode, empty ground truth or retrieved sets (or sets containing only empty strings) result in a warning log and a score of 0.0.
- Serializable: The component can be serialized via to_dict(), which preserves the recall mode, and deserialized from the resulting dictionary.
- Deterministic: The evaluator is fully deterministic and requires no external services or models.
Dependencies
- haystack core library (Document, component decorator, default_to_dict)
- No external dependencies required.