Implementation:Vibrantlabsai Ragas ContextRecall
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
ContextRecall measures whether the retrieved contexts contain sufficient information to support each statement in the ground truth reference, offering both LLM-based and non-LLM variants.
Description
This module provides four context recall classes, each suited to a different evaluation scenario:
LLMContextRecall (the primary implementation) uses an LLM to classify each sentence in the ground truth reference as either attributed (1) or not attributed (0) to the retrieved contexts. The score is computed as the fraction of attributed sentences out of all sentences. The LLM receives the user question, concatenated retrieved contexts, and the reference answer, then performs a sentence-level Natural Language Inference (NLI) classification. Results from multiple LLM generations are ensembled using discrete ensembling on the "attributed" field.
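The scoring step described above can be sketched in plain Python. This is a minimal illustration rather than the library's code: `attributed_fraction` is a hypothetical helper, and it assumes the LLM's per-sentence verdicts have already been parsed into a list of 0/1 flags.

```python
import math
from typing import List


def attributed_fraction(verdicts: List[int]) -> float:
    """Fraction of reference sentences marked as attributed (1).

    Hypothetical helper for illustration; not part of Ragas.
    """
    if not verdicts:
        # Nothing was classified, so there is no meaningful score.
        return math.nan
    attributed = sum(1 for v in verdicts if v == 1)
    return attributed / len(verdicts)


# Three reference sentences, two of them supported by the retrieved contexts.
score = attributed_fraction([1, 0, 1])
```

With two of three sentences attributed, the score is 2/3.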
ContextRecall is a simple subclass of LLMContextRecall that inherits all behavior unchanged.
NonLLMContextRecall uses string similarity measures (via NonLLMStringSimilarity) instead of an LLM. For each reference context, it computes the maximum similarity score against all retrieved contexts. If the maximum similarity exceeds a configurable threshold (default 0.5), the reference context is considered recalled. The final score is the fraction of reference contexts that exceed the threshold.
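The thresholding logic can be sketched as follows. This is an assumption-laden illustration: `similarity` uses the standard library's `difflib.SequenceMatcher` as a stand-in for NonLLMStringSimilarity (which may use a different measure), and `non_llm_recall` is a hypothetical function name.

```python
import math
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not the measure Ragas itself uses."""
    return SequenceMatcher(None, a, b).ratio()


def non_llm_recall(
    retrieved: List[str], reference: List[str], threshold: float = 0.5
) -> float:
    """Fraction of reference contexts whose best match exceeds the threshold."""
    if not reference:
        return math.nan
    hits = 0
    for ref in reference:
        # Best similarity for this reference context over all retrieved contexts.
        # Treating an empty retrieval as similarity 0.0 is this sketch's choice.
        best = max((similarity(rc, ref) for rc in retrieved), default=0.0)
        if best > threshold:
            hits += 1
    return hits / len(reference)
```

Raising `threshold` makes the metric stricter, since a reference context then needs a closer textual match to count as recalled.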
IDBasedContextRecall performs a direct set-based comparison of retrieved context IDs against reference context IDs. It converts all IDs to strings for consistent comparison and computes recall as the fraction of reference IDs found in the retrieved set.
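The set-based comparison amounts to a few lines of Python. The sketch below is illustrative only; `id_recall` is a hypothetical name and the function is written from the description above, not copied from the library.

```python
import math
from typing import Iterable, Union

ContextId = Union[str, int]


def id_recall(
    retrieved_ids: Iterable[ContextId], reference_ids: Iterable[ContextId]
) -> float:
    """Fraction of reference IDs that appear among the retrieved IDs."""
    # Normalize to strings so that 1 and "1" compare as equal.
    retrieved = {str(i) for i in retrieved_ids}
    reference = {str(i) for i in reference_ids}
    if not reference:
        return math.nan
    return len(reference & retrieved) / len(reference)


score = id_recall(["doc_1", "doc_3", "doc_5"], ["doc_1", "doc_2", "doc_3"])
```

Because both sides are converted to sets, duplicate IDs do not affect the score.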
Usage
Use LLMContextRecall or ContextRecall when you have a reference answer and want to evaluate how well the retrieved contexts support each statement in that answer. Use NonLLMContextRecall when you have reference contexts and want a fast, LLM-free evaluation based on string distance. Use IDBasedContextRecall when contexts have unique identifiers and you simply need to verify retrieval coverage.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/metrics/_context_recall.py
Signature
@dataclass
class LLMContextRecall(MetricWithLLM, SingleTurnMetric):
    name: str = "context_recall"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "user_input",
                "retrieved_contexts",
                "reference",
            }
        }
    )
    output_type: t.Optional[MetricOutputType] = MetricOutputType.CONTINUOUS
    context_recall_prompt: PydanticPrompt = field(
        default_factory=ContextRecallClassificationPrompt
    )
    max_retries: int = 1

@dataclass
class ContextRecall(LLMContextRecall):
    name: str = "context_recall"

@dataclass
class NonLLMContextRecall(SingleTurnMetric):
    name: str = "non_llm_context_recall"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "retrieved_contexts",
                "reference_contexts",
            }
        }
    )
    output_type: MetricOutputType = MetricOutputType.CONTINUOUS
    _distance_measure: SingleTurnMetric = field(
        default_factory=lambda: NonLLMStringSimilarity()
    )
    threshold: float = 0.5

@dataclass
class IDBasedContextRecall(SingleTurnMetric):
    name: str = "id_based_context_recall"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "retrieved_context_ids",
                "reference_context_ids",
            }
        }
    )
    output_type: MetricOutputType = MetricOutputType.CONTINUOUS
Import
from ragas.metrics import ContextRecall
from ragas.metrics import LLMContextRecall
from ragas.metrics import NonLLMContextRecall
from ragas.metrics import IDBasedContextRecall
I/O Contract
Inputs (LLMContextRecall / ContextRecall)
| Name | Type | Required | Description |
|---|---|---|---|
| user_input | str | Yes | The user's question or query |
| retrieved_contexts | List[str] | Yes | The list of retrieved context strings |
| reference | str | Yes | The ground truth reference answer whose sentences are classified |
Inputs (NonLLMContextRecall)
| Name | Type | Required | Description |
|---|---|---|---|
| retrieved_contexts | List[str] | Yes | The list of retrieved context strings |
| reference_contexts | List[str] | Yes | The list of ground truth reference contexts to compare against |
Inputs (IDBasedContextRecall)
| Name | Type | Required | Description |
|---|---|---|---|
| retrieved_context_ids | List[Union[str, int]] | Yes | IDs of retrieved contexts |
| reference_context_ids | List[Union[str, int]] | Yes | IDs of ground truth reference contexts |
Outputs
| Name | Type | Description |
|---|---|---|
| score | float | A continuous score between 0 and 1 representing the fraction of reference items recalled by the retrieval. Returns NaN if no valid classification is produced or no reference items are provided. |
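Because a sample can yield NaN, averaging scores across a dataset needs some care. The following is a minimal sketch of one way to handle this; `mean_ignoring_nan` is a hypothetical helper, and skipping NaN samples (rather than counting them as 0) is an assumption about what the caller wants.

```python
import math
from typing import Iterable


def mean_ignoring_nan(scores: Iterable[float]) -> float:
    """Average the valid scores, skipping samples that produced NaN."""
    valid = [s for s in scores if not math.isnan(s)]
    # If every sample was NaN, the aggregate is undefined as well.
    return sum(valid) / len(valid) if valid else math.nan


dataset_score = mean_ignoring_nan([1.0, math.nan, 0.5])
```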
Key Components
Prompt and Models
| Class | Description |
|---|---|
| QCA | Pydantic model holding question, context, and answer fields for the classification prompt input |
| ContextRecallClassification | Pydantic model for a single statement classification containing statement text, reason, and attributed flag (0 or 1) |
| ContextRecallClassifications | Pydantic model wrapping a list of ContextRecallClassification items |
| ContextRecallClassificationPrompt | PydanticPrompt that takes QCA input and produces ContextRecallClassifications; includes a detailed example based on Albert Einstein |
Ensembling
The LLMContextRecall metric uses generate_multiple to produce multiple classification outputs from the LLM, then applies ensembler.from_discrete on the "attributed" field to combine them into a single list of classifications. Combining several generations makes the final verdicts less sensitive to any single noisy LLM output.
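One common way to ensemble discrete verdicts is a per-statement majority vote. The sketch below illustrates that idea under stated assumptions: `majority_vote` is a hypothetical function, each run is assumed to be a list of dicts aligned by statement position, and the library's ensembler may differ in its details.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(runs: List[List[Dict]], key: str) -> List[Dict]:
    """Combine several runs by taking the most common value of `key` per position."""
    if not runs:
        return []
    if any(len(run) != len(runs[0]) for run in runs):
        # Runs with different statement counts cannot be aligned;
        # this sketch simply falls back to the first run.
        return [dict(item) for item in runs[0]]
    combined = []
    for position, first in enumerate(runs[0]):
        votes = Counter(run[position][key] for run in runs)
        merged = dict(first)  # copy so the input runs are left untouched
        merged[key] = votes.most_common(1)[0][0]
        combined.append(merged)
    return combined


runs = [
    [{"statement": "s1", "attributed": 1}, {"statement": "s2", "attributed": 0}],
    [{"statement": "s1", "attributed": 1}, {"statement": "s2", "attributed": 1}],
    [{"statement": "s1", "attributed": 0}, {"statement": "s2", "attributed": 0}],
]
verdicts = [item["attributed"] for item in majority_vote(runs, "attributed")]
```

Using an odd number of runs avoids ties on a binary field.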
Usage Examples
Basic Usage with LLM
from ragas.metrics import ContextRecall
from ragas.dataset_schema import SingleTurnSample
metric = ContextRecall()
# metric.llm = your_llm_instance
sample = SingleTurnSample(
    user_input="What can you tell me about Albert Einstein?",
    retrieved_contexts=[
        "Albert Einstein was a German-born theoretical physicist.",
        "He developed the theory of relativity.",
    ],
    reference="Albert Einstein was a German-born theoretical physicist who developed the theory of relativity.",
)
# score = await metric.single_turn_ascore(sample)
Non-LLM Context Recall
from ragas.metrics import NonLLMContextRecall
from ragas.dataset_schema import SingleTurnSample
metric = NonLLMContextRecall(threshold=0.5)
sample = SingleTurnSample(
    retrieved_contexts=["Einstein developed relativity theory."],
    reference_contexts=["Albert Einstein developed the theory of relativity."],
)
# score = await metric.single_turn_ascore(sample)
ID-Based Context Recall
from ragas.metrics import IDBasedContextRecall
from ragas.dataset_schema import SingleTurnSample
metric = IDBasedContextRecall()
sample = SingleTurnSample(
    retrieved_context_ids=["doc_1", "doc_3", "doc_5"],
    reference_context_ids=["doc_1", "doc_2", "doc_3"],
)
# score = await metric.single_turn_ascore(sample)
# Expected: 2/3 ≈ 0.667 (doc_1 and doc_3 are found; doc_2 is missing)