Implementation:Vibrantlabsai Ragas ContextRelevanceV2

From Leeroopedia
Revision as of 11:55, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Vibrantlabsai_Ragas_ContextRelevanceV2.md)
Knowledge Sources
Domains Evaluation, Metrics
Last Updated 2026-02-12 00:00 GMT

Overview

Evaluates whether retrieved contexts are pertinent to the user's question using a dual-judge LLM evaluation system that averages two independent assessments for robust scoring.

Description

The ContextRelevance metric (V2 collections implementation) measures how relevant retrieved context documents are to the user's input question. It uses a dual-judge evaluation system inspired by NVIDIA's approach:

  1. Judge 1 evaluates context relevance using a direct relevance prompt (ContextRelevanceJudge1Prompt).
  2. Judge 2 evaluates from an alternative perspective for fairness (ContextRelevanceJudge2Prompt).
  3. The final score is the average of both judges' ratings.

Each judge assigns an integer rating of 0, 1, or 2:

  • 0 - Not relevant
  • 1 - Partially relevant
  • 2 - Fully relevant

The raw ratings are converted to the 0.0-1.0 scale by dividing by 2, then averaged. If one judge fails to produce a valid rating after retries, the other judge's score is used. If both fail, the result is NaN.
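
The scoring rule above can be sketched in a few lines. This is a minimal illustration, not the library's code: the helper names rating_to_score and combine_judges are hypothetical, and a failed judge is represented here as None.

```python
import math
from typing import Optional


def rating_to_score(rating: Optional[int]) -> float:
    """Map a 0/1/2 rating to [0.0, 1.0]; a missing or invalid rating becomes NaN."""
    if rating is None or rating not in (0, 1, 2):
        return math.nan
    return rating / 2.0


def combine_judges(rating1: Optional[int], rating2: Optional[int]) -> float:
    """Average the valid judge scores; fall back to one judge; NaN if both failed."""
    scores = [rating_to_score(rating1), rating_to_score(rating2)]
    valid = [s for s in scores if not math.isnan(s)]
    if not valid:
        return math.nan
    return sum(valid) / len(valid)


print(combine_judges(2, 1))     # both judges valid: (1.0 + 0.5) / 2
print(combine_judges(2, None))  # only Judge 1 valid: its score is used
```

With ratings of 2 and 1, the normalized scores are 1.0 and 0.5, so the final score is 0.75.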

The metric includes retry logic (configurable via max_retries, default 5) to handle cases where the LLM returns invalid ratings. It also handles several edge cases: empty inputs return 0.0, and cases where the context exactly matches or is contained within the user input also return 0.0.
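
The edge cases and the retry loop described above might look roughly like the sketch below. The function names, the ask_judge callable, and the choice to strip whitespace before comparing are all assumptions made for illustration; the sketch also assumes max_retries counts total attempts, which the source does not state.

```python
import asyncio
from typing import Awaitable, Callable, Optional

VALID_RATINGS = (0, 1, 2)


def short_circuit_score(user_input: str, context: str) -> Optional[float]:
    """Return 0.0 for the edge cases that skip the LLM call, otherwise None."""
    question = user_input.strip()
    passage = context.strip()
    if not question or not passage:
        return 0.0  # empty input
    if passage == question or passage in question:
        return 0.0  # context merely repeats (part of) the question
    return None


async def rate_with_retries(
    ask_judge: Callable[[], Awaitable[object]],  # hypothetical judge call
    max_retries: int = 5,
) -> Optional[int]:
    """Ask the judge until it returns 0, 1, or 2; give up after max_retries attempts."""
    for _ in range(max_retries):
        rating = await ask_judge()
        if isinstance(rating, int) and rating in VALID_RATINGS:
            return rating
    return None  # caller treats this as a failed judge
```

Returning None for a failed judge lets the caller decide whether to fall back to the other judge's score or report NaN.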

The retrieved contexts list is joined with newline characters into a single string before evaluation. Structured prompts use ContextRelevanceInput and ContextRelevanceOutput data classes for communication with the LLM.
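
As a small illustration of the joining step, the sketch below shows what the judges would receive for a two-item list. The helper name join_contexts is hypothetical; only the newline join itself comes from the description above.

```python
from typing import List


def join_contexts(retrieved_contexts: List[str]) -> str:
    """Concatenate the retrieved contexts into one newline-separated string."""
    return "\n".join(retrieved_contexts)


combined = join_contexts([
    "Green tea contains antioxidants called catechins.",
    "The history of tea drinking dates back to ancient China.",
])
print(combined)
```

Because the list becomes a single string, each judge rates the retrieved set as a whole rather than scoring each passage separately.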

Usage

Use this metric to evaluate the quality of a retrieval system in a RAG pipeline. A high score indicates that the retrieved documents are relevant to the user's question, while a low score suggests the retrieval is returning irrelevant content.

This is the V2 collections version, which uses modern instructor-based LLMs with structured output and a dual-judge system. It is intended to give more robust evaluation than the V1 single-judge approach.

Code Reference

Source Location

  • Repository: Vibrantlabsai_Ragas
  • File: src/ragas/metrics/collections/context_relevance/metric.py

Signature

class ContextRelevance(BaseMetric):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        name: str = "context_relevance",
        max_retries: int = 5,
        **kwargs,
    ): ...

    async def ascore(
        self, user_input: str, retrieved_contexts: List[str]
    ) -> MetricResult: ...

Import

from ragas.metrics.collections import ContextRelevance

I/O Contract

Constructor Parameters

Name | Type | Required | Description
llm | InstructorBaseRagasLLM | Yes | Modern instructor-based LLM used for dual-judge evaluation
name | str | No | Metric name (default: "context_relevance")
max_retries | int | No | Maximum retry attempts when the LLM returns an invalid rating (default: 5)

Inputs

Name | Type | Required | Description
user_input | str | Yes | The original question posed by the user. Must be non-empty
retrieved_contexts | List[str] | Yes | List of retrieved context strings to evaluate for relevance. Must be non-empty

Outputs

Name | Type | Description
score | MetricResult (float value) | Context relevance score between 0.0 and 1.0. Higher is better. May be NaN if both judges fail to produce valid ratings
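
Because a score can be NaN, averaging results across a dataset needs some care: a single NaN turns a plain mean into NaN. The helper below is a hypothetical sketch of one way to handle this and is not part of the library.

```python
import math
from typing import Iterable


def mean_ignoring_nan(scores: Iterable[float]) -> float:
    """Average the valid scores; return NaN only when no valid score exists."""
    valid = [s for s in scores if not math.isnan(s)]
    return sum(valid) / len(valid) if valid else math.nan


print(mean_ignoring_nan([1.0, math.nan, 0.5]))  # averages 1.0 and 0.5 only
```

Whether to drop NaN results or count them as failures is a reporting decision; it can also be worth tracking how many samples were dropped.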

Usage Examples

Basic Usage

import asyncio

from openai import AsyncOpenAI
from ragas.llms.base import llm_factory
from ragas.metrics.collections import ContextRelevance


async def main():
    # Set up dependencies (AsyncOpenAI reads OPENAI_API_KEY from the environment)
    client = AsyncOpenAI()
    llm = llm_factory("openai", client=client, model="gpt-4o")

    # Create the metric instance
    metric = ContextRelevance(llm=llm)

    # Score a single question/context pair
    result = await metric.ascore(
        user_input="When was Einstein born?",
        retrieved_contexts=["Albert Einstein was born on March 14, 1879 in Ulm, Germany."],
    )
    print(f"Context Relevance: {result.value}")


# In a notebook, call `await main()` instead
asyncio.run(main())

Multiple Contexts

from ragas.metrics.collections import ContextRelevance

# Reuses the `llm` object from Basic Usage. Run this inside an async function
# or a notebook, because `await` is not valid at module top level.
metric = ContextRelevance(llm=llm)

# The three contexts are joined into one string and rated together
result = await metric.ascore(
    user_input="What are the health benefits of green tea?",
    retrieved_contexts=[
        "Green tea contains antioxidants called catechins that may reduce inflammation.",
        "The history of tea drinking dates back to ancient China.",
        "Studies suggest green tea may lower the risk of heart disease.",
    ],
)
print(f"Context Relevance: {result.value}")
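
Scoring Many Samples

When evaluating a whole dataset, the single-sample calls above can be run concurrently. The sketch below relies only on the ascore method and the result.value attribute shown in this page; the helper name score_samples, the sample dictionary layout, and the concurrency limit are assumptions for illustration. The library may provide its own batch helpers, which are not covered here.

```python
import asyncio
from typing import Any, Dict, List


async def score_samples(
    metric: Any,                      # any object exposing an async ascore(...)
    samples: List[Dict[str, Any]],    # each has "user_input" and "retrieved_contexts"
    concurrency: int = 4,             # cap on simultaneous LLM calls
) -> List[float]:
    """Score every sample concurrently and return the values in input order."""
    semaphore = asyncio.Semaphore(concurrency)

    async def score_one(sample: Dict[str, Any]) -> float:
        async with semaphore:
            result = await metric.ascore(
                user_input=sample["user_input"],
                retrieved_contexts=sample["retrieved_contexts"],
            )
            return result.value

    return await asyncio.gather(*(score_one(s) for s in samples))
```

The semaphore keeps the number of in-flight requests bounded, which can help avoid provider rate limits; asyncio.gather preserves the order of the input samples.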
