Implementation:Deepset ai Haystack DocumentRecallEvaluator

From Leeroopedia

Overview

DocumentRecallEvaluator is a Haystack evaluator component that calculates the recall score for retrieved documents. It supports two modes: in single hit mode a query scores 1.0 if any relevant document is retrieved, and in multi hit mode a query scores the proportion of relevant documents that were retrieved.

Implements Principle

Principle:Deepset_ai_Haystack_Retrieval_Recall_Evaluation

Source Location

haystack/components/evaluators/document_recall.py (Lines 41-145)

Import

from haystack.components.evaluators import DocumentRecallEvaluator

The RecallMode enum is also available:

from haystack.components.evaluators.document_recall import RecallMode

Component Registration

DocumentRecallEvaluator is decorated with @component, making it a standard Haystack pipeline component.

API

Constructor

def __init__(self, mode: str | RecallMode = RecallMode.SINGLE_HIT):

Parameters:

  • mode (str | RecallMode, default: RecallMode.SINGLE_HIT) -- The mode for calculating recall. Accepts either a RecallMode enum value or a string ("single_hit" or "multi_hit").

RecallMode Enum

class RecallMode(Enum):
    SINGLE_HIT = "single_hit"   # Score is 1.0 if any relevant document is retrieved
    MULTI_HIT = "multi_hit"     # Score is the proportion of relevant documents retrieved
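
Because the constructor accepts either form, a string has to be mapped onto an enum member at some point. The following is a minimal, library-free sketch of that normalization. The normalize_mode helper is hypothetical, the enum is redefined locally so the snippet runs without Haystack installed, and raising ValueError for an unrecognized string is an assumption of the sketch rather than documented library behavior.

```python
from enum import Enum


class RecallMode(Enum):
    # Local stand-in mirroring the two values documented above.
    SINGLE_HIT = "single_hit"
    MULTI_HIT = "multi_hit"


def normalize_mode(mode):
    """Hypothetical helper: accept a RecallMode member or its string value."""
    if isinstance(mode, RecallMode):
        return mode
    try:
        # Enum lookup by value, e.g. RecallMode("multi_hit").
        return RecallMode(mode)
    except ValueError:
        supported = [m.value for m in RecallMode]
        # Assumption of this sketch: unknown strings are rejected explicitly.
        raise ValueError(f"Unknown recall mode {mode!r}; supported: {supported}") from None
```

With this helper, normalize_mode("multi_hit") and normalize_mode(RecallMode.MULTI_HIT) resolve to the same enum member.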

run()

def run(
    self,
    ground_truth_documents: list[list[Document]],
    retrieved_documents: list[list[Document]]
) -> dict[str, Any]:

Parameters:

  • ground_truth_documents (list[list[Document]]) -- The expected (relevant) documents for each query, one inner list per query.
  • retrieved_documents (list[list[Document]]) -- The retrieved documents for each query, in the same query order as ground_truth_documents.

Returns: A dictionary with the following keys:

  • score (float) -- The average recall score across all queries.
  • individual_scores (list[float]) -- The recall score for each query, in input order. In single hit mode, each value is 0.0 or 1.0. In multi hit mode, values range from 0.0 to 1.0.

Raises:

  • ValueError -- If ground_truth_documents and retrieved_documents have different lengths.
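
The shape of run() described above (a length check, one score per query, then the mean) can be sketched without Haystack. This is an illustration only: average_recall and any_overlap are hypothetical names, plain strings stand in for Document content, and the sketch assumes at least one query is supplied.

```python
def average_recall(ground_truth_contents, retrieved_contents, score_fn):
    """Hypothetical stand-in for run(): per-query scores plus their mean."""
    if len(ground_truth_contents) != len(retrieved_contents):
        raise ValueError("ground truth and retrieved lists must have the same length")
    # One score per query, pairing the two inputs by position.
    individual = [
        score_fn(truths, retrieved)
        for truths, retrieved in zip(ground_truth_contents, retrieved_contents)
    ]
    # Assumes at least one query; an empty input would divide by zero.
    return {"score": sum(individual) / len(individual), "individual_scores": individual}


def any_overlap(truths, retrieved):
    # Single-hit style scorer: 1.0 when the two collections share any string.
    return float(bool(set(truths) & set(retrieved)))


result = average_recall(
    [["France"], ["9th century"]],
    [["France"], ["10th century"]],
    any_overlap,
)
print(result)
```

Here the first query is a hit and the second is a miss, so the average sits halfway between the two per-query values.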

to_dict()

def to_dict(self) -> dict[str, Any]:

Serializes the component to a dictionary, including the recall mode.
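
To illustrate what "including the recall mode" can mean in practice, here is a small round-trip sketch. RecallEvaluatorSketch is a hypothetical stand-in, not the Haystack class, and the "type" / "init_parameters" layout and the from_dict method are assumptions of the sketch; the dictionary Haystack actually produces is not reproduced here.

```python
from enum import Enum


class RecallMode(Enum):
    # Local stand-in mirroring the documented enum values.
    SINGLE_HIT = "single_hit"
    MULTI_HIT = "multi_hit"


class RecallEvaluatorSketch:
    """Hypothetical stand-in used only to show a serialization round trip."""

    def __init__(self, mode=RecallMode.SINGLE_HIT):
        self.mode = RecallMode(mode) if isinstance(mode, str) else mode

    def to_dict(self):
        # Store the mode as its plain string value so the dict stays JSON-friendly.
        return {
            "type": type(self).__name__,
            "init_parameters": {"mode": self.mode.value},
        }

    @classmethod
    def from_dict(cls, data):
        # Rebuild the object from the stored constructor arguments.
        return cls(**data["init_parameters"])


original = RecallEvaluatorSketch(mode="multi_hit")
restored = RecallEvaluatorSketch.from_dict(original.to_dict())
```

Storing the mode as a string rather than as the enum member keeps the dictionary easy to write to JSON or YAML, which is the design choice the sketch is meant to highlight.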

Algorithm

Single Hit Mode

For each query:

  1. Extract unique content from ground truth documents and retrieved documents.
  2. Compute the intersection of the two sets.
  3. Return 1.0 if the intersection is non-empty, else 0.0.
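
The three steps above can be sketched in a few lines of plain Python. Doc is a hypothetical stand-in for haystack.Document that carries only a content field, and recall_single_hit is an illustrative name, not the library's method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    # Hypothetical stand-in for haystack.Document; only `content` matters here.
    content: Optional[str] = None


def recall_single_hit(ground_truth, retrieved):
    # Step 1: unique content values on each side.
    unique_truths = {d.content for d in ground_truth}
    unique_retrievals = {d.content for d in retrieved}
    # Steps 2 and 3: any overlap at all counts as a hit.
    return 1.0 if unique_truths & unique_retrievals else 0.0


hit = recall_single_hit(
    [Doc("9th century"), Doc("9th")],
    [Doc("10th century"), Doc("9th")],
)
miss = recall_single_hit([Doc("France")], [Doc("Germany")])
```

Only one of the two relevant documents is retrieved in the first call, yet it still counts as a full hit, which is the defining property of this mode.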

Multi Hit Mode

For each query:

  1. Extract unique content from ground truth documents and retrieved documents.
  2. If either set is empty or contains only empty strings, log a warning and return 0.0.
  3. Otherwise, compute the intersection of the two sets.
  4. Return |intersection| / |unique_truths|.
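
A minimal sketch of the multi hit calculation, using plain strings in place of Document content. recall_multi_hit is an illustrative name, and the sketch returns 0.0 for the edge cases without emitting the warning log.

```python
def recall_multi_hit(truth_contents, retrieved_contents):
    # Deduplicate both sides so repeated documents count once.
    unique_truths = set(truth_contents)
    unique_retrievals = set(retrieved_contents)
    # Edge cases: nothing meaningful to compare against.
    if not unique_truths or unique_truths == {""}:
        return 0.0
    if not unique_retrievals or unique_retrievals == {""}:
        return 0.0
    # Proportion of relevant content values that were retrieved.
    return len(unique_truths & unique_retrievals) / len(unique_truths)


partial = recall_multi_hit(["Paris", "France"], ["Paris", "Berlin"])
with_duplicates = recall_multi_hit(["Paris", "Paris", "France"], ["Paris", "Paris"])
```

Both calls give the same score, since duplicates collapse to a single entry before the ratio is taken.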

Usage Example

from haystack import Document
from haystack.components.evaluators import DocumentRecallEvaluator

# Single hit mode (default)
evaluator = DocumentRecallEvaluator()
result = evaluator.run(
    ground_truth_documents=[
        [Document(content="France")],
        [Document(content="9th century"), Document(content="9th")],
    ],
    retrieved_documents=[
        [Document(content="France")],
        [Document(content="9th century"), Document(content="10th century"), Document(content="9th")],
    ],
)
print(result["individual_scores"])
# [1.0, 1.0]
print(result["score"])
# 1.0

Multi Hit Mode Example

from haystack import Document
from haystack.components.evaluators import DocumentRecallEvaluator

evaluator = DocumentRecallEvaluator(mode="multi_hit")
result = evaluator.run(
    ground_truth_documents=[
        [Document(content="Paris"), Document(content="France")],
    ],
    retrieved_documents=[
        [Document(content="Paris"), Document(content="Berlin")],
    ],
)
print(result["individual_scores"])
# [0.5]  -- only 1 of 2 relevant documents was found
print(result["score"])
# 0.5

Integration in Evaluation Pipelines

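DocumentRecallEvaluator can run alongside other evaluators in a single pipeline, with each evaluator receiving the same ground truth and retrieved documents. The sample data below is illustrative:

from haystack import Document, Pipeline
from haystack.components.evaluators import DocumentRecallEvaluator, DocumentMRREvaluator

# Illustrative inputs: one query with two relevant documents, one of which is retrieved.
ground_truths = [[Document(content="Paris"), Document(content="France")]]
retrieved_docs = [[Document(content="Paris"), Document(content="Berlin")]]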

eval_pipeline = Pipeline()
eval_pipeline.add_component("recall_single", DocumentRecallEvaluator(mode="single_hit"))
eval_pipeline.add_component("recall_multi", DocumentRecallEvaluator(mode="multi_hit"))
eval_pipeline.add_component("mrr", DocumentMRREvaluator())

results = eval_pipeline.run({
    "recall_single": {
        "ground_truth_documents": ground_truths,
        "retrieved_documents": retrieved_docs,
    },
    "recall_multi": {
        "ground_truth_documents": ground_truths,
        "retrieved_documents": retrieved_docs,
    },
    "mrr": {
        "ground_truth_documents": ground_truths,
        "retrieved_documents": retrieved_docs,
    },
})

Important Notes

  • Set-based comparison: Recall operates on unique content values. Duplicate documents do not affect the score.
  • Edge case handling: In multi hit mode, empty ground truth or retrieved sets (or sets containing only empty strings) result in a warning log and a score of 0.0.
  • Serializable: to_dict() includes the recall mode, so a component restored from the serialized form keeps the same mode.
  • Deterministic: The evaluator is fully deterministic and requires no external services or models.

Dependencies

  • haystack core library (Document, component decorator, default_to_dict)
  • No external dependencies required.
