Implementation:Explodinggradients Ragas Collections ContextRelevance Metric

From Leeroopedia


| Field | Value |
| --- | --- |
| source | Repo |
| domains | Metrics, Evaluation |
| last_updated | 2026-02-10 00:00 GMT |

Overview

ContextRelevance is a v2 class-based metric that evaluates whether retrieved contexts are pertinent to the user input using a dual-judge LLM evaluation system.

Description

ContextRelevance extends BaseMetric and requires a modern InstructorBaseRagasLLM. The metric implements NVIDIA's dual-judge approach for robust evaluation:

  1. Judge 1 evaluates context relevance using ContextRelevanceJudge1Prompt.
  2. Judge 2 provides an alternative perspective using ContextRelevanceJudge2Prompt.
  3. The final score is the average of both judges' ratings, converted from the 0/1/2 integer scale to a 0.0-1.0 float scale.
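
The arithmetic in step 3 can be sketched as follows. This is an illustration only, not the library's code: the helper names rating_to_score and dual_judge_score are hypothetical, and converting each rating before averaging (rather than after) is an assumption that gives the same result either way.

```python
def rating_to_score(rating: int) -> float:
    """Map a 0/1/2 judge rating onto the 0.0-1.0 scale (hypothetical helper)."""
    if rating not in (0, 1, 2):
        raise ValueError(f"rating must be 0, 1, or 2, got {rating!r}")
    return rating / 2.0


def dual_judge_score(rating_1: int, rating_2: int) -> float:
    """Average the two converted judge ratings (hypothetical helper)."""
    return (rating_to_score(rating_1) + rating_to_score(rating_2)) / 2.0


# Judge 1 says "fully relevant" (2), Judge 2 says "partially relevant" (1)
print(dual_judge_score(2, 1))  # 0.75
```

With two judges and three rating levels, the final score can only take the values 0.0, 0.25, 0.5, 0.75, and 1.0.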

Rating interpretation: 0 = not relevant, 1 = partially relevant, 2 = fully relevant. Each judge has built-in retry logic (configurable via max_retries, default 5) to handle invalid ratings or LLM failures. If a judge fails after all retries, it returns NaN, and the averaging logic gracefully falls back to the other judge's score.
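
The retry and fallback behaviour described above might look roughly like the sketch below. It is not taken from the Ragas source: rate_with_retries, combine, and the ask_judge callable are hypothetical names, and returning NaN when both judges fail is an assumption, since the description only covers the case where one judge fails.

```python
import asyncio
import math
from typing import Awaitable, Callable

VALID_RATINGS = (0, 1, 2)


async def rate_with_retries(
    ask_judge: Callable[[], Awaitable[int]], max_retries: int = 5
) -> float:
    """Return one judge's 0.0-1.0 score, or NaN if every attempt fails."""
    for _ in range(max_retries):
        try:
            rating = await ask_judge()
        except Exception:
            continue  # LLM failure: use up one attempt and try again
        if rating in VALID_RATINGS:
            return rating / 2.0
        # Invalid rating: fall through to the next attempt
    return math.nan


def combine(score_1: float, score_2: float) -> float:
    """Average the scores that are not NaN; NaN only if both are NaN (assumed)."""
    valid = [s for s in (score_1, score_2) if not math.isnan(s)]
    return sum(valid) / len(valid) if valid else math.nan


async def demo() -> float:
    async def judge_ok() -> int:
        return 2  # stand-in for a real LLM call

    async def judge_broken() -> int:
        raise RuntimeError("simulated LLM failure")

    s1 = await rate_with_retries(judge_ok)
    s2 = await rate_with_retries(judge_broken, max_retries=3)
    return combine(s1, s2)


print(asyncio.run(demo()))  # 1.0, because only the working judge counts
```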

Three edge cases return 0.0: empty inputs, a user input that exactly matches the context, and a context that is a substring of the user input.
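
A minimal sketch of these checks, assuming they run on the whitespace-stripped strings (the page does not say whether the inputs are stripped or how case is handled). The function name short_circuit_score is hypothetical.

```python
from typing import Optional


def short_circuit_score(user_input: str, context: str) -> Optional[float]:
    """Return 0.0 for the documented edge cases, or None if judging is needed."""
    question = user_input.strip()  # stripping is an assumption
    ctx = context.strip()

    if not question or not ctx:
        return 0.0  # empty input
    if question == ctx:
        return 0.0  # context merely repeats the question
    if ctx in question:
        return 0.0  # context is a substring of the question
    return None  # no edge case applies


print(short_circuit_score("What is Python?", "What is Python?"))  # 0.0
print(short_circuit_score("What is Python?", "Python is a language."))  # None
```

All three cases describe a context that adds no information beyond the question itself, which is presumably why they score 0.0.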

Usage

Instantiate with a required llm parameter and optional max_retries. Call ascore(user_input, retrieved_contexts). The retrieved contexts are concatenated with newline separators before evaluation.
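
The concatenation step amounts to a newline join. The helper name join_contexts below is hypothetical and only illustrates what the judges receive as a single context string.

```python
from typing import List


def join_contexts(retrieved_contexts: List[str]) -> str:
    """Merge the retrieved contexts into one newline-separated string."""
    return "\n".join(retrieved_contexts)


merged = join_contexts([
    "Albert Einstein was born March 14, 1879 in Ulm, Germany.",
    "Einstein developed the theory of relativity.",
])
print(merged)
```

Because the contexts are merged before judging, the score applies to the retrieved set as a whole rather than to each context separately.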

Code Reference

| Property | Value |
| --- | --- |
| Source Location | src/ragas/metrics/collections/context_relevance/metric.py, L1-182 |
| Signature | class ContextRelevance(BaseMetric) |
| Import | from ragas.metrics.collections import ContextRelevance |

I/O Contract

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| user_input | str | Yes | The original question |
| retrieved_contexts | List[str] | Yes | Retrieved contexts to evaluate for relevance |

Constructor Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| llm | InstructorBaseRagasLLM | (required) | Modern instructor-based LLM for dual-judge evaluation |
| name | str | "context_relevance" | Metric name |
| max_retries | int | 5 | Maximum retry attempts per judge for invalid ratings |

Outputs

| Field | Type | Description |
| --- | --- | --- |
| MetricResult.value | float | Context relevance score in the range 0.0-1.0 (higher is better) |

Usage Examples

import asyncio

from openai import AsyncOpenAI
from ragas.llms.base import llm_factory
from ragas.metrics.collections import ContextRelevance


async def main() -> None:
    # Setup: AsyncOpenAI reads OPENAI_API_KEY from the environment
    client = AsyncOpenAI()
    llm = llm_factory("openai", client=client, model="gpt-4o")

    # Create the metric with the default max_retries=5
    metric = ContextRelevance(llm=llm)

    # Single evaluation
    result = await metric.ascore(
        user_input="When was Einstein born?",
        retrieved_contexts=[
            "Albert Einstein was born March 14, 1879 in Ulm, Germany.",
            "Einstein developed the theory of relativity.",
        ],
    )
    print(f"Context Relevance: {result.value}")

    # With a custom retry count (up to 10 attempts per judge)
    robust_metric = ContextRelevance(llm=llm, max_retries=10)
    result = await robust_metric.ascore(
        user_input="What causes rain?",
        retrieved_contexts=["Water evaporates and condenses in clouds."],
    )
    print(f"Relevance: {result.value}")

    # Batch evaluation: one dict per sample, keyed by the ascore parameter names
    results = await metric.abatch_score([
        {
            "user_input": "What is Python?",
            "retrieved_contexts": ["Python is a programming language."],
        },
        {
            "user_input": "What is Java?",
            "retrieved_contexts": ["Java is an island in Indonesia."],
        },
    ])
    for batch_result in results:
        print(f"Batch relevance: {batch_result.value}")


# ascore and abatch_score are coroutines, so run them inside an event loop
asyncio.run(main())
