Implementation:Vibrantlabsai Ragas ContextRelevanceV2

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Evaluation, Metrics
Last Updated	2026-02-12 00:00 GMT

Overview

Evaluates whether retrieved contexts are pertinent to the user's question using a dual-judge LLM evaluation system that averages two independent assessments for robust scoring.

Description

The ContextRelevance metric (V2 collections implementation) measures how relevant retrieved context documents are to the user's input question. It uses a dual-judge evaluation system inspired by NVIDIA's approach:

1. Judge 1 evaluates context relevance using a direct relevance prompt (ContextRelevanceJudge1Prompt).
2. Judge 2 evaluates from an alternative perspective for fairness (ContextRelevanceJudge2Prompt).
3. The final score is the average of both judges' ratings.

Each judge assigns a rating on a 0-1-2 scale:

0 - Not relevant
1 - Partially relevant
2 - Fully relevant

The raw ratings are converted to the 0.0-1.0 scale by dividing by 2, then averaged. If one judge fails to produce a valid rating after retries, the other judge's score is used. If both fail, the result is NaN.
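The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the library's code: the helper names rating_to_score and combine_judges are hypothetical, and a failed judge is represented here by None.

```python
import math
from typing import Optional


def rating_to_score(rating: Optional[int]) -> float:
    """Map a 0/1/2 rating onto 0.0-1.0; anything else counts as a failed judge (NaN)."""
    if isinstance(rating, int) and not isinstance(rating, bool) and 0 <= rating <= 2:
        return rating / 2.0
    return math.nan


def combine_judges(rating1: Optional[int], rating2: Optional[int]) -> float:
    """Average the valid judge scores; fall back to the single valid one, or NaN if none."""
    scores = [rating_to_score(rating1), rating_to_score(rating2)]
    valid = [s for s in scores if not math.isnan(s)]
    if not valid:
        return math.nan  # both judges failed
    return sum(valid) / len(valid)
```

For example, ratings of 2 and 1 give (1.0 + 0.5) / 2 = 0.75, while a rating of 2 paired with a failed judge gives 1.0.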

The metric includes retry logic (configurable via max_retries, default 5) to handle cases where the LLM returns invalid ratings. It also handles several edge cases: empty inputs return 0.0, and cases where the context exactly matches or is contained within the user input also return 0.0.
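A rough sketch of the retry loop and the edge-case short circuits follows. The names rate_with_retries, ask_judge, and short_circuit_score are hypothetical, and two details are assumptions rather than facts from the source: max_retries is treated as the total number of attempts, and inputs are compared after stripping whitespace.

```python
import asyncio
import math
from typing import Awaitable, Callable, Optional


async def rate_with_retries(
    ask_judge: Callable[[], Awaitable[Optional[int]]],
    max_retries: int = 5,
) -> float:
    """Call a judge until it returns 0, 1, or 2; give up with NaN after max_retries attempts."""
    for _ in range(max_retries):
        rating = await ask_judge()
        if rating in (0, 1, 2):
            return rating / 2.0
    return math.nan


def short_circuit_score(user_input: str, context: str) -> Optional[float]:
    """Return 0.0 for the edge cases, or None when the judges should be consulted."""
    question, ctx = user_input.strip(), context.strip()  # stripping is an assumption
    if not question or not ctx:
        return 0.0  # empty input
    if question == ctx or ctx in question:
        return 0.0  # context matches or is contained in the question
    return None
```

Returning None from short_circuit_score keeps "no shortcut applies" distinct from a legitimate score of 0.0.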

The retrieved contexts list is joined with newline characters into a single string before evaluation. Structured prompts use ContextRelevanceInput and ContextRelevanceOutput data classes for communication with the LLM.
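The joining step might look like the sketch below. JudgeInput is a hypothetical stand-in for ContextRelevanceInput; its field names are assumptions, since the source does not list the real fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JudgeInput:
    """Hypothetical stand-in for ContextRelevanceInput (field names are assumed)."""
    user_input: str
    context: str


def build_judge_input(user_input: str, retrieved_contexts: List[str]) -> JudgeInput:
    """Join the retrieved contexts with newlines into the single string the judges see."""
    return JudgeInput(user_input=user_input, context="\n".join(retrieved_contexts))
```

Because the contexts are merged before evaluation, the judges rate the retrieved set as a whole rather than each document separately.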

Usage

Use this metric to evaluate the quality of a retrieval system in a RAG pipeline. A high score indicates that the retrieved documents are relevant to the user's question, while a low score suggests the retrieval is returning irrelevant content.

This is the V2 collections version, which uses modern instructor-based LLMs with structured output and a dual-judge system for more robust evaluation than the V1 single-judge approach.

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/metrics/collections/context_relevance/metric.py

Signature

class ContextRelevance(BaseMetric):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        name: str = "context_relevance",
        max_retries: int = 5,
        **kwargs,
    ): ...

    async def ascore(
        self, user_input: str, retrieved_contexts: List[str]
    ) -> MetricResult: ...

Import

from ragas.metrics.collections import ContextRelevance

I/O Contract

Constructor Parameters

Name	Type	Required	Description
llm	InstructorBaseRagasLLM	Yes	Modern instructor-based LLM used for dual-judge evaluation
name	str	No	Metric name (default: "context_relevance")
max_retries	int	No	Maximum retry attempts when the LLM returns an invalid rating (default: 5)

Inputs

Name	Type	Required	Description
user_input	str	Yes	The original question posed by the user. Must be non-empty
retrieved_contexts	List[str]	Yes	List of retrieved context strings to evaluate for relevance. Must be non-empty

Outputs

Name	Type	Description
score	MetricResult (float value)	Context relevance score between 0.0 and 1.0. Higher is better. May be NaN if both judges fail to produce valid ratings
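Because the score may be NaN, averaging many results naively would turn the whole mean into NaN. A minimal sketch of one way to aggregate follows; the helper name mean_ignoring_nan is hypothetical, and skipping failed evaluations is a design choice rather than something the source prescribes.

```python
import math
from typing import Iterable


def mean_ignoring_nan(scores: Iterable[float]) -> float:
    """Average the valid scores, skipping NaN entries; NaN if nothing is valid."""
    valid = [s for s in scores if not math.isnan(s)]
    return sum(valid) / len(valid) if valid else math.nan
```

It can be applied to the collected result.value floats from several ascore calls.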

Usage Examples

Basic Usage

import asyncio

from openai import AsyncOpenAI
from ragas.llms.base import llm_factory
from ragas.metrics.collections import ContextRelevance


async def main() -> None:
    # Set up dependencies (AsyncOpenAI reads OPENAI_API_KEY from the environment)
    client = AsyncOpenAI()
    llm = llm_factory("openai", client=client, model="gpt-4o")

    # Create the metric instance
    metric = ContextRelevance(llm=llm)

    # Score a single question/context pair
    result = await metric.ascore(
        user_input="When was Einstein born?",
        retrieved_contexts=["Albert Einstein was born on March 14, 1879 in Ulm, Germany."],
    )
    print(f"Context Relevance: {result.value}")


# ascore is a coroutine, so it must run inside an event loop
asyncio.run(main())

Multiple Contexts

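from ragas.metrics.collections import ContextRelevance

# Assumes `llm` is configured as in Basic Usage and that this runs in an async context
# (for example, inside an async function or a notebook cell)
metric = ContextRelevance(llm=llm)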

result = await metric.ascore(
    user_input="What are the health benefits of green tea?",
    retrieved_contexts=[
        "Green tea contains antioxidants called catechins that may reduce inflammation.",
        "The history of tea drinking dates back to ancient China.",
        "Studies suggest green tea may lower the risk of heart disease.",
    ]
)
print(f"Context Relevance: {result.value}")

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
