Implementation:Explodinggradients Ragas Collections ContextRelevance Metric
| Field | Value |
|---|---|
| source | Repo |
| domains | Metrics, Evaluation |
| last_updated | 2026-02-10 00:00 GMT |
Overview
ContextRelevance is a v2 class-based metric that evaluates whether retrieved contexts are pertinent to the user input using a dual-judge LLM evaluation system.
Description
ContextRelevance extends BaseMetric and requires a modern InstructorBaseRagasLLM. The metric implements NVIDIA's dual-judge approach for robust evaluation:
- Judge 1 evaluates context relevance using `ContextRelevanceJudge1Prompt`.
- Judge 2 provides an alternative perspective using `ContextRelevanceJudge2Prompt`.
- The final score is the average of both judges, converted from a 0/1/2 integer rating scale to a 0.0--1.0 float scale.
Rating interpretation: 0 = not relevant, 1 = partially relevant, 2 = fully relevant. Each judge has built-in retry logic (configurable via max_retries, default 5) to handle invalid ratings or LLM failures. If a judge fails after all retries, it returns NaN, and the averaging logic gracefully falls back to the other judge's score.
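The retry and averaging behaviour described above can be sketched as follows. This is a minimal illustration, not the library's code: `rate_with_retries` and `combine_judges` are hypothetical helper names, the judge is modelled as a plain callable rather than an LLM call, and returning NaN when both judges fail is an assumption, since the description only covers the case where one judge fails.

```python
import math
from typing import Callable

VALID_RATINGS = {0, 1, 2}  # 0 = not relevant, 1 = partially, 2 = fully relevant


def rate_with_retries(judge: Callable[[], int], max_retries: int = 5) -> float:
    """Call a judge until it yields a valid 0/1/2 rating; NaN if it never does."""
    for _ in range(max_retries):
        try:
            rating = judge()
        except Exception:
            continue  # treat a failed call like an invalid rating and retry
        if rating in VALID_RATINGS:
            return rating / 2.0  # map 0/1/2 onto 0.0/0.5/1.0
    return math.nan


def combine_judges(score1: float, score2: float) -> float:
    """Average the judges' scores, ignoring any judge that returned NaN."""
    valid = [s for s in (score1, score2) if not math.isnan(s)]
    if not valid:
        return math.nan  # assumption: the source does not specify this case
    return sum(valid) / len(valid)


# Judge 1 says "fully relevant" (2), judge 2 says "partially relevant" (1).
print(combine_judges(rate_with_retries(lambda: 2), rate_with_retries(lambda: 1)))  # 0.75
```

Because each judge contributes 0.0, 0.5, or 1.0, a score averaged over two valid judges is always a multiple of 0.25.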
Edge cases handled: empty inputs, user input matching context exactly, and context being a substring of user input all return 0.0.
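A guard for these edge cases might look like the sketch below. The function name `is_trivial_case` is hypothetical, and stripping surrounding whitespace before comparing is an assumption rather than documented behaviour.

```python
def is_trivial_case(user_input: str, context: str) -> bool:
    """Return True when the metric would short-circuit to a score of 0.0."""
    question = user_input.strip()  # assumption: whitespace is ignored
    ctx = context.strip()
    if not question or not ctx:
        return True  # empty input on either side
    if question == ctx:
        return True  # context merely repeats the question
    return ctx in question  # context is a substring of the question


# The context only echoes part of the question, so it adds no information.
print(is_trivial_case("Tell me about Paris, France", "Paris"))  # True
```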
Usage
Instantiate with a required llm parameter and optional max_retries. Call ascore(user_input, retrieved_contexts). The retrieved contexts are concatenated with newline separators before evaluation.
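The concatenation step amounts to a newline join. In the sketch below, `join_contexts` is a hypothetical name used only for illustration.

```python
from typing import List


def join_contexts(retrieved_contexts: List[str]) -> str:
    """Merge the retrieved contexts into one newline-separated string."""
    return "\n".join(retrieved_contexts)


merged = join_contexts([
    "Albert Einstein was born March 14, 1879 in Ulm, Germany.",
    "Einstein developed the theory of relativity.",
])
print(merged)
```

One consequence of this design is that the judges rate the contexts as a single block, so the score reflects the retrieved set as a whole rather than each passage individually.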
Code Reference
| Property | Value |
|---|---|
| Source Location | `src/ragas/metrics/collections/context_relevance/metric.py` L1--182 |
| Signature | `class ContextRelevance(BaseMetric)` |
| Import | `from ragas.metrics.collections import ContextRelevance` |
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| `user_input` | `str` | Yes | The original question |
| `retrieved_contexts` | `List[str]` | Yes | Retrieved contexts to evaluate for relevance |
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `llm` | `InstructorBaseRagasLLM` | (required) | Modern instructor-based LLM for dual-judge evaluation |
| `name` | `str` | `"context_relevance"` | Metric name |
| `max_retries` | `int` | `5` | Maximum retry attempts per judge for invalid ratings |
Outputs
| Field | Type | Description |
|---|---|---|
| `MetricResult.value` | `float` | Context relevance score in range 0.0--1.0 (higher is better) |
Usage Examples
```python
import asyncio

from openai import AsyncOpenAI
from ragas.llms.base import llm_factory
from ragas.metrics.collections import ContextRelevance


async def main() -> None:
    # Setup: the OpenAI client reads OPENAI_API_KEY from the environment
    client = AsyncOpenAI()
    llm = llm_factory("openai", client=client, model="gpt-4o")

    # Create metric
    metric = ContextRelevance(llm=llm)

    # Single evaluation
    result = await metric.ascore(
        user_input="When was Einstein born?",
        retrieved_contexts=[
            "Albert Einstein was born March 14, 1879 in Ulm, Germany.",
            "Einstein developed the theory of relativity.",
        ],
    )
    print(f"Context Relevance: {result.value}")

    # With custom retry count
    robust_metric = ContextRelevance(llm=llm, max_retries=10)
    result = await robust_metric.ascore(
        user_input="What causes rain?",
        retrieved_contexts=["Water evaporates and condenses in clouds."],
    )
    print(f"Relevance: {result.value}")

    # Batch evaluation: one dict of ascore keyword arguments per sample
    results = await metric.abatch_score([
        {
            "user_input": "What is Python?",
            "retrieved_contexts": ["Python is a programming language."],
        },
        {
            "user_input": "What is Java?",
            "retrieved_contexts": ["Java is an island in Indonesia."],
        },
    ])


# ascore and abatch_score are coroutines, so run them inside an event loop
asyncio.run(main())
```
Related Pages
- Explodinggradients_Ragas_Collections_BaseMetric_Class -- Base class for all v2 metrics
- Explodinggradients_Ragas_Collections_ContextPrecision_Metric -- Context precision with average precision scoring
- Explodinggradients_Ragas_Collections_ResponseGroundedness_Metric -- Response groundedness with dual-judge (same architecture)
- Explodinggradients_Ragas_Collections_AnswerRelevancy_Metric -- Answer relevancy evaluation