Implementation:Explodinggradients Ragas Collections ContextRelevance Metric

Field	Value
source	Repo
domains	Metrics, Evaluation
last_updated	2026-02-10 00:00 GMT

Overview

ContextRelevance is a v2 class-based metric that evaluates whether retrieved contexts are pertinent to the user input using a dual-judge LLM evaluation system.

Description

ContextRelevance extends BaseMetric and requires an instructor-based LLM (InstructorBaseRagasLLM). The metric implements NVIDIA's dual-judge approach:

Judge 1 evaluates context relevance using ContextRelevanceJudge1Prompt.
Judge 2 provides an alternative perspective using ContextRelevanceJudge2Prompt.
The final score is the average of both judges, converted from a 0/1/2 integer rating scale to a 0.0--1.0 float scale.
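
The conversion and averaging described above can be sketched as follows. This is a minimal illustration rather than the library's code: the helper names `rating_to_score` and `combine_ratings` are hypothetical, and the sketch assumes both judges returned a valid rating.

```python
def rating_to_score(rating: int) -> float:
    """Map a 0/1/2 judge rating onto the 0.0-1.0 scale."""
    if rating not in (0, 1, 2):
        raise ValueError(f"rating must be 0, 1, or 2, got {rating!r}")
    return rating / 2.0


def combine_ratings(judge1_rating: int, judge2_rating: int) -> float:
    """Average the two judges' normalized scores (hypothetical helper)."""
    return (rating_to_score(judge1_rating) + rating_to_score(judge2_rating)) / 2.0


# Judge 1 says "fully relevant" (2), Judge 2 says "partially relevant" (1).
print(combine_ratings(2, 1))  # 0.75
```

With two judges on a three-point scale, the final score can only take the values 0.0, 0.25, 0.5, 0.75, or 1.0.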

Rating interpretation: 0 = not relevant, 1 = partially relevant, 2 = fully relevant. Each judge retries on invalid ratings or LLM failures, up to max_retries attempts (default 5). A judge that still fails after all retries returns NaN, and the averaging step then uses the other judge's score alone.
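
The retry and fallback behavior can be sketched roughly as below. Everything here is illustrative: `rate_with_retries`, `average_with_fallback`, and the `ask_judge` callable are hypothetical stand-ins for the metric's internals, and returning NaN when both judges fail is an assumption that the description above does not confirm.

```python
import asyncio
import math
from typing import Awaitable, Callable

# Hypothetical judge callable: returns a raw rating, or raises on LLM failure.
JudgeFn = Callable[[], Awaitable[int]]


async def rate_with_retries(ask_judge: JudgeFn, max_retries: int = 5) -> float:
    """Return a 0.0-1.0 score, or NaN if every attempt fails."""
    for _ in range(max_retries):
        try:
            rating = await ask_judge()
        except Exception:
            continue  # LLM failure: try again
        if rating in (0, 1, 2):
            return rating / 2.0
        # Invalid rating: fall through and retry
    return math.nan


def average_with_fallback(score1: float, score2: float) -> float:
    """Average two scores, ignoring any that are NaN."""
    valid = [s for s in (score1, score2) if not math.isnan(s)]
    if not valid:
        return math.nan  # assumption: both judges failing yields NaN
    return sum(valid) / len(valid)


async def demo() -> float:
    async def good_judge() -> int:
        return 2  # "fully relevant"

    async def broken_judge() -> int:
        raise RuntimeError("simulated LLM failure")

    score1 = await rate_with_retries(good_judge)
    score2 = await rate_with_retries(broken_judge, max_retries=3)
    return average_with_fallback(score1, score2)


print(asyncio.run(demo()))  # 1.0
```

In the demo, the second judge exhausts its retries and yields NaN, so the result is the first judge's score on its own.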

Edge cases handled: empty inputs, user input matching context exactly, and context being a substring of user input all return 0.0.
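
A minimal sketch of these edge-case checks, assuming they operate on the user input and the concatenated context string. The function name `short_circuit_score` is hypothetical, and stripping surrounding whitespace before comparing is an assumption of this sketch, not something the description states.

```python
from typing import Optional


def short_circuit_score(user_input: str, context: str) -> Optional[float]:
    """Return 0.0 for the degenerate cases, or None when the judges should run."""
    question = user_input.strip()  # assumption: whitespace is ignored
    passage = context.strip()
    if not question or not passage:
        return 0.0  # empty input
    if question == passage:
        return 0.0  # context merely repeats the user input
    if passage in question:
        return 0.0  # context is a substring of the user input
    return None


print(short_circuit_score("What causes rain?", ""))                   # 0.0
print(short_circuit_score("What causes rain?", "What causes rain?"))  # 0.0
print(short_circuit_score("What causes rain?", "rain"))               # 0.0
print(short_circuit_score("What causes rain?", "Water condenses."))   # None
```

All three cases describe a context that adds no information beyond the question itself, which is why a score of 0.0 is a reasonable result for them.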

Usage

Instantiate the metric with the required llm parameter and, optionally, max_retries, then await ascore(user_input, retrieved_contexts). The retrieved contexts are joined with newline separators before evaluation.
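
The concatenation step amounts to a newline join. The helper name `join_contexts` below is hypothetical and only illustrates what the judges receive as a single context string.

```python
from typing import List


def join_contexts(retrieved_contexts: List[str]) -> str:
    """Join the retrieved contexts into one newline-separated string."""
    return "\n".join(retrieved_contexts)


print(join_contexts([
    "Albert Einstein was born March 14, 1879 in Ulm, Germany.",
    "Einstein developed the theory of relativity.",
]))
```

Because the contexts are merged first, the metric produces one score for the whole retrieved set rather than one score per context.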

Code Reference

Property	Value
Source Location	`src/ragas/metrics/collections/context_relevance/metric.py` L1--182
Signature	`class ContextRelevance(BaseMetric)`
Import	`from ragas.metrics.collections import ContextRelevance`

I/O Contract

Inputs

Parameter	Type	Required	Description
`user_input`	`str`	Yes	The original question
`retrieved_contexts`	`List[str]`	Yes	Retrieved contexts to evaluate for relevance

Constructor Parameters

Parameter	Type	Default	Description
`llm`	`InstructorBaseRagasLLM`	(required)	Modern instructor-based LLM for dual-judge evaluation
`name`	`str`	`"context_relevance"`	Metric name
`max_retries`	`int`	`5`	Maximum retry attempts per judge for invalid ratings

Outputs

Field	Type	Description
`MetricResult.value`	`float`	Context relevance score in range 0.0--1.0 (higher is better)

Usage Examples

import asyncio

from openai import AsyncOpenAI
from ragas.llms.base import llm_factory
from ragas.metrics.collections import ContextRelevance


async def main():
    # Setup: wrap an async OpenAI client as an instructor-based Ragas LLM.
    # Verify llm_factory's argument order against your installed Ragas version.
    client = AsyncOpenAI()
    llm = llm_factory("openai", client=client, model="gpt-4o")

    # Create metric
    metric = ContextRelevance(llm=llm)

    # Single evaluation
    result = await metric.ascore(
        user_input="When was Einstein born?",
        retrieved_contexts=[
            "Albert Einstein was born March 14, 1879 in Ulm, Germany.",
            "Einstein developed the theory of relativity.",
        ],
    )
    print(f"Context Relevance: {result.value}")

    # With custom retry count
    robust_metric = ContextRelevance(llm=llm, max_retries=10)
    result = await robust_metric.ascore(
        user_input="What causes rain?",
        retrieved_contexts=["Water evaporates and condenses in clouds."],
    )
    print(f"Relevance: {result.value}")

    # Batch evaluation: one dict per sample, keyed by the ascore parameter names
    results = await metric.abatch_score([
        {
            "user_input": "What is Python?",
            "retrieved_contexts": ["Python is a programming language."],
        },
        {
            "user_input": "What is Java?",
            "retrieved_contexts": ["Java is an island in Indonesia."],
        },
    ])


# ascore and abatch_score are coroutines, so they must run inside an event loop
asyncio.run(main())

Related Pages

Explodinggradients_Ragas_Collections_BaseMetric_Class -- Base class for all v2 metrics
Explodinggradients_Ragas_Collections_ContextPrecision_Metric -- Context precision with average precision scoring
Explodinggradients_Ragas_Collections_ResponseGroundedness_Metric -- Response groundedness with dual-judge (same architecture)
Explodinggradients_Ragas_Collections_AnswerRelevancy_Metric -- Answer relevancy evaluation
