Implementation:Vibrantlabsai Ragas ContextRecall

From Leeroopedia
Domains: Evaluation, Metrics
Last Updated: 2026-02-12 00:00 GMT

Overview

ContextRecall measures whether the retrieved contexts contain sufficient information to support each statement in the ground truth reference, offering both LLM-based and non-LLM variants.

Description

This module provides four context recall classes: three distinct implementations, each suited to a different evaluation scenario, plus one subclass that adds no behavior of its own:

LLMContextRecall (the primary implementation) uses an LLM to classify each sentence in the ground truth reference as either attributed (1) or not attributed (0) to the retrieved contexts. The score is computed as the fraction of attributed sentences out of all sentences. The LLM receives the user question, concatenated retrieved contexts, and the reference answer, then performs a sentence-level Natural Language Inference (NLI) classification. Results from multiple LLM generations are ensembled using discrete ensembling on the "attributed" field.
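
The scoring step described above can be sketched in a few lines. This is a minimal illustration, not the library's code: the function name attributed_fraction is hypothetical, and the input is assumed to be the list of 0/1 "attributed" flags already produced by the LLM classification.

```python
import math
from typing import List


def attributed_fraction(attributed: List[int]) -> float:
    """Fraction of reference sentences flagged as attributed (1) rather than not (0)."""
    if not attributed:
        # No sentence was classified, so the score is undefined.
        return math.nan
    return sum(attributed) / len(attributed)


# Four reference sentences, three of them supported by the retrieved contexts.
verdicts = [1, 1, 0, 1]
score = attributed_fraction(verdicts)
print(score)
```

With three of four sentences attributed, the sketch yields 0.75. The NaN branch mirrors the documented behavior for the case where no valid classification is available.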

ContextRecall is a simple subclass of LLMContextRecall that inherits all behavior unchanged.

NonLLMContextRecall uses string similarity measures (via NonLLMStringSimilarity) instead of an LLM. For each reference context, it computes the maximum similarity score against all retrieved contexts. If the maximum similarity exceeds a configurable threshold (default 0.5), the reference context is considered recalled. The final score is the fraction of reference contexts that exceed the threshold.
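
The thresholding logic can be sketched as follows. Two assumptions apply: difflib.SequenceMatcher is used only as a stand-in similarity measure (it is not the measure NonLLMStringSimilarity uses), and the names similarity and non_llm_recall are hypothetical. Treating an empty retrieved list as a best similarity of 0.0 is also a choice made for this sketch.

```python
import math
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; an assumption, not the measure ragas uses."""
    return SequenceMatcher(None, a, b).ratio()


def non_llm_recall(
    retrieved: List[str], references: List[str], threshold: float = 0.5
) -> float:
    """Fraction of reference contexts whose best match strictly exceeds the threshold."""
    if not references:
        return math.nan
    recalled = 0
    for ref in references:
        # Best similarity of this reference against every retrieved context.
        best = max((similarity(rc, ref) for rc in retrieved), default=0.0)
        if best > threshold:
            recalled += 1
    return recalled / len(references)


retrieved = ["alpha beta gamma", "completely different"]
references = ["alpha beta gamma", "0123456789"]
score = non_llm_recall(retrieved, references)
print(score)
```

Here the first reference has an exact match and the second shares no characters with any retrieved context, so one of two references is recalled. Because the comparison is strict, a similarity exactly equal to the threshold does not count as recalled.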

IDBasedContextRecall performs a direct set-based comparison of retrieved context IDs against reference context IDs. It converts all IDs to strings for consistent comparison and computes recall as the fraction of reference IDs found in the retrieved set.
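
The set-based comparison can be illustrated with a short sketch. The function name id_recall is hypothetical; the string conversion and the recall formula follow the description above.

```python
import math
from typing import Sequence, Union

ContextId = Union[str, int]


def id_recall(
    retrieved_ids: Sequence[ContextId], reference_ids: Sequence[ContextId]
) -> float:
    """Fraction of reference IDs that also appear among the retrieved IDs."""
    # Convert to strings so that 3 and "3" are treated as the same ID.
    retrieved = {str(i) for i in retrieved_ids}
    reference = {str(i) for i in reference_ids}
    if not reference:
        # No reference IDs were provided, so the score is undefined.
        return math.nan
    return len(reference & retrieved) / len(reference)


# Mixed int and str IDs: "1" and "3" are found, "2" is missed.
score = id_recall([1, "3", 5], ["1", 2, 3])
print(round(score, 3))
```

Using sets means duplicate IDs on either side do not change the result.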

Usage

Use LLMContextRecall or ContextRecall when you have a reference answer and want to evaluate how well the retrieved contexts support each statement in that answer. Use NonLLMContextRecall when you have reference contexts and want a fast, LLM-free evaluation based on string similarity. Use IDBasedContextRecall when contexts have unique identifiers and you only need to verify retrieval coverage.

Code Reference

Source Location

Signature

@dataclass
class LLMContextRecall(MetricWithLLM, SingleTurnMetric):
    name: str = "context_recall"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "user_input",
                "retrieved_contexts",
                "reference",
            }
        }
    )
    output_type: t.Optional[MetricOutputType] = MetricOutputType.CONTINUOUS
    context_recall_prompt: PydanticPrompt = field(
        default_factory=ContextRecallClassificationPrompt
    )
    max_retries: int = 1

@dataclass
class ContextRecall(LLMContextRecall):
    name: str = "context_recall"

@dataclass
class NonLLMContextRecall(SingleTurnMetric):
    name: str = "non_llm_context_recall"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "retrieved_contexts",
                "reference_contexts",
            }
        }
    )
    output_type: MetricOutputType = MetricOutputType.CONTINUOUS
    _distance_measure: SingleTurnMetric = field(
        default_factory=lambda: NonLLMStringSimilarity()
    )
    threshold: float = 0.5

@dataclass
class IDBasedContextRecall(SingleTurnMetric):
    name: str = "id_based_context_recall"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "retrieved_context_ids",
                "reference_context_ids",
            }
        }
    )
    output_type: MetricOutputType = MetricOutputType.CONTINUOUS

Import

from ragas.metrics import ContextRecall
from ragas.metrics import LLMContextRecall
from ragas.metrics import NonLLMContextRecall
from ragas.metrics import IDBasedContextRecall

I/O Contract

Inputs (LLMContextRecall / ContextRecall)

Name | Type | Required | Description
--- | --- | --- | ---
user_input | str | Yes | The user's question or query
retrieved_contexts | List[str] | Yes | The list of retrieved context strings
reference | str | Yes | The ground truth reference answer whose sentences are classified

Inputs (NonLLMContextRecall)

Name | Type | Required | Description
--- | --- | --- | ---
retrieved_contexts | List[str] | Yes | The list of retrieved context strings
reference_contexts | List[str] | Yes | The list of ground truth reference contexts to compare against

Inputs (IDBasedContextRecall)

Name | Type | Required | Description
--- | --- | --- | ---
retrieved_context_ids | List[Union[str, int]] | Yes | IDs of retrieved contexts
reference_context_ids | List[Union[str, int]] | Yes | IDs of ground truth reference contexts

Outputs

Name | Type | Description
--- | --- | ---
score | float | A continuous score between 0 and 1 representing the fraction of reference items recalled by the retrieval. Returns NaN if no valid classification is produced or no reference items are provided.

Key Components

Prompt and Models

Class | Description
--- | ---
QCA | Pydantic model holding question, context, and answer fields for the classification prompt input
ContextRecallClassification | Pydantic model for a single statement classification containing statement text, reason, and attributed flag (0 or 1)
ContextRecallClassifications | Pydantic model wrapping a list of ContextRecallClassification items
ContextRecallClassificationPrompt | PydanticPrompt that takes QCA input and produces ContextRecallClassifications; includes a detailed example based on Albert Einstein

Ensembling

The LLMContextRecall metric uses generate_multiple to produce multiple classification outputs from the LLM, then applies ensembler.from_discrete on the "attributed" field to combine them into a single verdict per statement. Combining several generations makes the final classification less sensitive to variation in any single LLM output.
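
One common way to ensemble discrete outputs is a per-statement majority vote. The sketch below illustrates that idea only. It assumes majority voting, which this page does not confirm as the behavior of ensembler.from_discrete, and it assumes every generation classifies the same statements in the same order. The name majority_vote is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(
    generations: List[List[Dict[str, int]]], field: str = "attributed"
) -> List[int]:
    """Pick the most frequent value of `field` for each statement position."""
    if not generations:
        return []
    # Only positions present in every generation can be voted on.
    n = min(len(g) for g in generations)
    votes = []
    for i in range(n):
        counts = Counter(g[i][field] for g in generations)
        # On a tie, Counter.most_common keeps the value seen first.
        votes.append(counts.most_common(1)[0][0])
    return votes


generations = [
    [{"attributed": 1}, {"attributed": 0}],
    [{"attributed": 1}, {"attributed": 1}],
    [{"attributed": 0}, {"attributed": 0}],
]
ensembled = majority_vote(generations)
print(ensembled)
```

With three generations, the first statement is voted attributed (two votes to one) and the second is voted not attributed. An odd number of generations avoids ties for binary flags.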

Usage Examples

Basic Usage with LLM

from ragas.metrics import ContextRecall
from ragas.dataset_schema import SingleTurnSample

metric = ContextRecall()
# The LLM-based variant needs an evaluator LLM before it can score; supply your own instance.
# metric.llm = your_llm_instance

sample = SingleTurnSample(
    user_input="What can you tell me about Albert Einstein?",
    retrieved_contexts=[
        "Albert Einstein was a German-born theoretical physicist.",
        "He developed the theory of relativity."
    ],
    reference="Albert Einstein was a German-born theoretical physicist who developed the theory of relativity."
)

# single_turn_ascore is a coroutine; await it from inside an async function.
# score = await metric.single_turn_ascore(sample)

Non-LLM Context Recall

from ragas.metrics import NonLLMContextRecall
from ragas.dataset_schema import SingleTurnSample

metric = NonLLMContextRecall(threshold=0.5)

sample = SingleTurnSample(
    retrieved_contexts=["Einstein developed relativity theory."],
    reference_contexts=["Albert Einstein developed the theory of relativity."]
)

# score = await metric.single_turn_ascore(sample)

ID-Based Context Recall

from ragas.metrics import IDBasedContextRecall
from ragas.dataset_schema import SingleTurnSample

metric = IDBasedContextRecall()

sample = SingleTurnSample(
    retrieved_context_ids=["doc_1", "doc_3", "doc_5"],
    reference_context_ids=["doc_1", "doc_2", "doc_3"]
)

# score = await metric.single_turn_ascore(sample)
# Expected: 2/3 ≈ 0.667 (doc_1 and doc_3 are found; doc_2 is missed)
