Implementation:Vibrantlabsai Ragas ContextRecall

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Evaluation, Metrics
Last Updated	2026-02-12 00:00 GMT

Overview

ContextRecall measures whether the retrieved contexts contain sufficient information to support each statement in the ground truth reference. The module offers both LLM-based and non-LLM variants.

Description

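This module provides four context recall classes, each suited to a different evaluation scenario. Three of them implement distinct scoring strategies; the fourth, ContextRecall, is an alias-style subclass of the LLM-based one: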

LLMContextRecall (the primary implementation) uses an LLM to classify each sentence in the ground truth reference as either attributed (1) or not attributed (0) to the retrieved contexts. The score is computed as the fraction of attributed sentences out of all sentences. The LLM receives the user question, concatenated retrieved contexts, and the reference answer, then performs a sentence-level Natural Language Inference (NLI) classification. Results from multiple LLM generations are ensembled using discrete ensembling on the "attributed" field.
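The scoring step described above can be sketched in a few lines. This is a minimal illustration, not the library's code: the function name recall_from_attributions is hypothetical, and it assumes the LLM's per-sentence labels have already been reduced to a list of 0/1 integers.

```python
import math
from typing import Sequence


def recall_from_attributions(attributed: Sequence[int]) -> float:
    """Return the fraction of reference sentences labelled as attributed (1).

    Hypothetical helper for illustration only. An empty label list yields
    NaN, mirroring the "no valid classification" case in the Outputs table.
    """
    if not attributed:
        return math.nan
    return sum(attributed) / len(attributed)


# Three of four reference sentences are supported by the retrieved contexts.
print(recall_from_attributions([1, 1, 0, 1]))  # 0.75
```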

ContextRecall is a simple subclass of LLMContextRecall that inherits all behavior unchanged.

NonLLMContextRecall uses string similarity measures (via NonLLMStringSimilarity) instead of an LLM. For each reference context, it computes the maximum similarity score against all retrieved contexts. If the maximum similarity exceeds a configurable threshold (default 0.5), the reference context is considered recalled. The final score is the fraction of reference contexts that exceed the threshold.
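The thresholding logic can be sketched as follows. This is an illustration under stated assumptions: difflib.SequenceMatcher from the standard library stands in for NonLLMStringSimilarity (it is not the measure the metric uses), and the names similarity and non_llm_recall are hypothetical.

```python
import math
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; an assumption, not the Ragas measure."""
    return SequenceMatcher(None, a, b).ratio()


def non_llm_recall(
    retrieved: List[str], reference: List[str], threshold: float = 0.5
) -> float:
    """Fraction of reference contexts whose best match exceeds the threshold."""
    if not reference:
        return math.nan  # no reference items to recall
    hits = 0
    for ref in reference:
        # Best similarity of this reference context against any retrieved one.
        best = max((similarity(rc, ref) for rc in retrieved), default=0.0)
        if best > threshold:
            hits += 1
    return hits / len(reference)


score = non_llm_recall(
    retrieved=["Einstein developed relativity theory."],
    reference=["Albert Einstein developed the theory of relativity."],
)
```

The comparison here is strict (greater than the threshold), following the word "exceeds" in the description above. The resulting score depends entirely on the similarity measure chosen, so values from this sketch should not be expected to match the metric's output.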

IDBasedContextRecall performs a direct set-based comparison of retrieved context IDs against reference context IDs. It converts all IDs to strings for consistent comparison and computes recall as the fraction of reference IDs found in the retrieved set.
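A minimal sketch of the set-based comparison, assuming duplicates are collapsed by the set conversion. The function name id_recall is hypothetical and the code is not taken from the library.

```python
import math
from typing import List, Union

ContextId = Union[str, int]


def id_recall(
    retrieved_ids: List[ContextId], reference_ids: List[ContextId]
) -> float:
    """Fraction of reference IDs that also appear among the retrieved IDs."""
    # Normalise to strings so that 1 and "1" compare as equal.
    retrieved = {str(i) for i in retrieved_ids}
    reference = {str(i) for i in reference_ids}
    if not reference:
        return math.nan  # no reference items to recall
    return len(retrieved & reference) / len(reference)


# doc_1 and doc_3 are found, doc_2 is missing: 2 of 3 reference IDs recalled.
score = id_recall(["doc_1", "doc_3", "doc_5"], ["doc_1", "doc_2", "doc_3"])
```

Converting to strings first means mixed integer and string IDs from different pipeline stages still match.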

Usage

Use LLMContextRecall or ContextRecall when you have a reference answer and want to evaluate how well the retrieved contexts support each statement in that answer. Use NonLLMContextRecall when you have reference contexts and want a fast, LLM-free evaluation based on string similarity. Use IDBasedContextRecall when contexts have unique identifiers and you only need to verify retrieval coverage.

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/metrics/_context_recall.py

Signature

@dataclass
class LLMContextRecall(MetricWithLLM, SingleTurnMetric):
    name: str = "context_recall"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "user_input",
                "retrieved_contexts",
                "reference",
            }
        }
    )
    output_type: t.Optional[MetricOutputType] = MetricOutputType.CONTINUOUS
    context_recall_prompt: PydanticPrompt = field(
        default_factory=ContextRecallClassificationPrompt
    )
    max_retries: int = 1

@dataclass
class ContextRecall(LLMContextRecall):
    name: str = "context_recall"

@dataclass
class NonLLMContextRecall(SingleTurnMetric):
    name: str = "non_llm_context_recall"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "retrieved_contexts",
                "reference_contexts",
            }
        }
    )
    output_type: MetricOutputType = MetricOutputType.CONTINUOUS
    _distance_measure: SingleTurnMetric = field(
        default_factory=lambda: NonLLMStringSimilarity()
    )
    threshold: float = 0.5

@dataclass
class IDBasedContextRecall(SingleTurnMetric):
    name: str = "id_based_context_recall"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "retrieved_context_ids",
                "reference_context_ids",
            }
        }
    )
    output_type: MetricOutputType = MetricOutputType.CONTINUOUS

Import

from ragas.metrics import ContextRecall
from ragas.metrics import LLMContextRecall
from ragas.metrics import NonLLMContextRecall
from ragas.metrics import IDBasedContextRecall

I/O Contract

Inputs (LLMContextRecall / ContextRecall)

Name	Type	Required	Description
user_input	str	Yes	The user's question or query
retrieved_contexts	List[str]	Yes	The list of retrieved context strings
reference	str	Yes	The ground truth reference answer whose sentences are classified

Inputs (NonLLMContextRecall)

Name	Type	Required	Description
retrieved_contexts	List[str]	Yes	The list of retrieved context strings
reference_contexts	List[str]	Yes	The list of ground truth reference contexts to compare against

Inputs (IDBasedContextRecall)

Name	Type	Required	Description
retrieved_context_ids	List[Union[str, int]]	Yes	IDs of retrieved contexts
reference_context_ids	List[Union[str, int]]	Yes	IDs of ground truth reference contexts

Outputs

Name	Type	Description
score	float	A continuous score between 0 and 1 representing the fraction of reference items recalled by the retrieval. Returns NaN if no valid classification is produced or no reference items are provided.

Key Components

Prompt and Models

Class	Description
QCA	Pydantic model holding question, context, and answer fields for the classification prompt input
ContextRecallClassification	Pydantic model for a single statement classification containing statement text, reason, and attributed flag (0 or 1)
ContextRecallClassifications	Pydantic model wrapping a list of ContextRecallClassification items
ContextRecallClassificationPrompt	PydanticPrompt that takes QCA input and produces ContextRecallClassifications; includes a detailed example based on Albert Einstein

Ensembling

The LLMContextRecall metric uses generate_multiple to produce multiple classification outputs from the LLM, then applies ensembler.from_discrete on the "attributed" field to combine them into a single set of labels. Combining several generations is intended to make the classification less sensitive to variation in any single LLM output.
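One common way to combine discrete labels from several generations is a per-item majority vote. The sketch below shows that idea only; it is an assumption about the general technique, not a description of how ensembler.from_discrete is implemented. The name majority_vote is hypothetical, and the sketch assumes every generation labels the same sentences in the same order.

```python
from collections import Counter
from typing import List


def majority_vote(runs: List[List[int]]) -> List[int]:
    """Combine per-sentence 0/1 labels from several runs by majority vote.

    Illustrative only. Each inner list is one generation's "attributed"
    labels; all runs must cover the same sentences in the same order.
    """
    if not runs:
        return []
    expected = len(runs[0])
    if any(len(run) != expected for run in runs):
        raise ValueError("all runs must label the same number of sentences")
    # zip(*runs) groups the labels that different runs gave to one sentence.
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*runs)]


# Three generations disagree on sentences 2 and 3; the majority label wins.
combined = majority_vote([[1, 0, 1], [1, 1, 0], [1, 0, 0]])
print(combined)  # [1, 0, 0]
```

Using an odd number of generations avoids ties in a binary vote; how ties are broken in this sketch depends on label order and should not be relied on.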

Usage Examples

Basic Usage with LLM

from ragas.metrics import ContextRecall
from ragas.dataset_schema import SingleTurnSample

metric = ContextRecall()
# metric.llm = your_llm_instance

sample = SingleTurnSample(
    user_input="What can you tell me about Albert Einstein?",
    retrieved_contexts=[
        "Albert Einstein was a German-born theoretical physicist.",
        "He developed the theory of relativity."
    ],
    reference="Albert Einstein was a German-born theoretical physicist who developed the theory of relativity."
)

# score = await metric.single_turn_ascore(sample)

Non-LLM Context Recall

from ragas.metrics import NonLLMContextRecall
from ragas.dataset_schema import SingleTurnSample

metric = NonLLMContextRecall(threshold=0.5)

sample = SingleTurnSample(
    retrieved_contexts=["Einstein developed relativity theory."],
    reference_contexts=["Albert Einstein developed the theory of relativity."]
)

# score = await metric.single_turn_ascore(sample)

ID-Based Context Recall

from ragas.metrics import IDBasedContextRecall
from ragas.dataset_schema import SingleTurnSample

metric = IDBasedContextRecall()

sample = SingleTurnSample(
    retrieved_context_ids=["doc_1", "doc_3", "doc_5"],
    reference_context_ids=["doc_1", "doc_2", "doc_3"]
)

# score = await metric.single_turn_ascore(sample)
# Expected: 2/3 ≈ 0.667 (doc_1 and doc_3 are found; doc_2 is missing)

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
