

Implementation:Explodinggradients Ragas NVMetrics Module

From Leeroopedia


Field | Value
source | Repo
domains | Metrics, NVIDIA
last_updated | 2026-02-10

Overview

The NVMetrics module provides NVIDIA-optimized dual-judge evaluation metrics: AnswerAccuracy, ContextRelevance, and ResponseGroundedness, each using two distinct prompt templates and averaging their scores.

Description

This module contains three metric classes optimized for use with NVIDIA LLMs:

  • AnswerAccuracy -- Measures the accuracy of the response against the ground-truth reference by running two complementary prompt templates (one rates the user answer against the reference, the other presents them in reverse order) and averaging the results. Each judge rates on a 0/2/4 scale, which is normalized to 0-1.
  • ContextRelevance -- Scores the relevance of retrieved contexts to the user input using two prompt templates on a 0/1/2 scale, normalized to 0-1. Includes edge-case handling for empty or trivially matching inputs.
  • ResponseGroundedness -- Scores how well the response is grounded in the retrieved contexts using two prompt templates on a 0/1/2 scale, normalized to 0-1. Includes edge-case handling for exact matches and empty inputs.
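The scale normalization and dual-judge averaging described in the list above can be sketched in plain Python. This is a minimal illustration, not the module's actual code: `normalize_rating` and `average_judges` are hypothetical helpers, and returning NaN whenever either judge fails is an assumption about the error policy.

```python
import math


def normalize_rating(rating: int, max_rating: int) -> float:
    """Map a raw judge rating (e.g. 0/2/4 or 0/1/2) onto the 0-1 range."""
    if not 0 <= rating <= max_rating:
        return math.nan  # out-of-scale ratings are treated as errors
    return rating / max_rating


def average_judges(score_a: float, score_b: float) -> float:
    """Average two normalized judge scores; NaN if either judge failed (assumption)."""
    if math.isnan(score_a) or math.isnan(score_b):
        return math.nan
    return (score_a + score_b) / 2


# AnswerAccuracy-style ratings on the 0/2/4 scale: judges answer 4 and 2
accuracy_score = average_judges(normalize_rating(4, 4), normalize_rating(2, 4))
# ContextRelevance-style ratings on the 0/1/2 scale: judges answer 1 and 2
relevance_score = average_judges(normalize_rating(1, 2), normalize_rating(2, 2))
print(accuracy_score, relevance_score)  # 0.75 0.75
```

Dividing by the top of each scale keeps the three metrics comparable even though AnswerAccuracy uses a different rating range than the other two.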

All three classes apply a retry mechanism (5 retries by default) and call BaseRagasLLM.agenerate_text for raw text generation instead of going through the standard PydanticPrompt pipeline. Each inherits from MetricWithLLM and SingleTurnMetric.
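A retry loop around a raw-text judge call might look roughly like the sketch below. Everything here is illustrative: `score_with_retries`, `parse_rating`, and `fake_judge` are hypothetical names, the stand-in judge replaces a real LLM call, and the conditions that trigger a retry (a failed call or an unparseable reply) are assumptions rather than the module's documented behavior.

```python
import asyncio
import math
from typing import Awaitable, Callable, Optional


async def score_with_retries(
    generate: Callable[[], Awaitable[str]],
    parse: Callable[[str], Optional[float]],
    retries: int = 5,
) -> float:
    """Call the judge up to `retries` times until its reply parses to a score."""
    for _ in range(retries):
        try:
            reply = await generate()
        except Exception:
            continue  # treat a failed call like an unparseable reply
        score = parse(reply)
        if score is not None:
            return score
    return math.nan  # every attempt failed


def parse_rating(reply: str) -> Optional[float]:
    """Hypothetical parser: accept a bare 0/1/2 rating, normalized to 0-1."""
    text = reply.strip()
    return int(text) / 2 if text in {"0", "1", "2"} else None


# Stand-in judge that answers badly once, then returns a valid rating.
_replies = iter(["I am not sure", "2"])


async def fake_judge() -> str:
    return next(_replies)


retry_score = asyncio.run(score_with_retries(fake_judge, parse_rating))
print(retry_score)  # 1.0
```

Returning NaN after the final attempt, rather than raising, matches the output contract of a float score that may be NaN on error.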

Usage

Each metric requires a different set of sample columns, listed under I/O Contract. An LLM (ideally from NVIDIA's model catalog) must be configured on the metric before scoring.

Code Reference

Property | Value
Source Location | src/ragas/metrics/_nv_metrics.py L18-432
Class Signatures | class AnswerAccuracy(MetricWithLLM, SingleTurnMetric); class ContextRelevance(MetricWithLLM, SingleTurnMetric); class ResponseGroundedness(MetricWithLLM, SingleTurnMetric)
Import | from ragas.metrics._nv_metrics import AnswerAccuracy, ContextRelevance, ResponseGroundedness

I/O Contract

Inputs (AnswerAccuracy)

Parameter | Type | Required | Description
user_input | str | Yes | The user query
response | str | Yes | The generated response
reference | str | Yes | The ground truth reference

Inputs (ContextRelevance)

Parameter | Type | Required | Description
user_input | str | Yes | The user query
retrieved_contexts | List[str] | Yes | The retrieved context passages

Inputs (ResponseGroundedness)

Parameter | Type | Required | Description
response | str | Yes | The generated response
retrieved_contexts | List[str] | Yes | The retrieved context passages
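Because each metric expects different columns, it can help to check a row before scoring it. The sketch below copies the required columns from the input tables above; `REQUIRED_COLUMNS` and `missing_columns` are hypothetical helpers written for illustration and are not part of Ragas.

```python
from typing import Any, Dict, List

# Required columns per metric, copied from the input tables above.
REQUIRED_COLUMNS: Dict[str, List[str]] = {
    "AnswerAccuracy": ["user_input", "response", "reference"],
    "ContextRelevance": ["user_input", "retrieved_contexts"],
    "ResponseGroundedness": ["response", "retrieved_contexts"],
}


def missing_columns(metric_name: str, row: Dict[str, Any]) -> List[str]:
    """Return the required columns that are absent or None in `row`."""
    return [c for c in REQUIRED_COLUMNS[metric_name] if row.get(c) is None]


row = {
    "user_input": "What is the capital of France?",
    "response": "Paris is the capital of France.",
}
print(missing_columns("AnswerAccuracy", row))        # ['reference']
print(missing_columns("ResponseGroundedness", row))  # ['retrieved_contexts']
```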

Outputs

Output | Type | Description
score | float | Average of two judge scores, normalized to 0.0-1.0, or NaN on error
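Since a score can be NaN on error, aggregating over many samples needs to handle NaN explicitly: a plain mean would turn the whole result into NaN. One possible approach, shown here with a hypothetical `mean_ignoring_nan` helper, is to drop the failed samples before averaging.

```python
import math
from typing import Iterable


def mean_ignoring_nan(scores: Iterable[float]) -> float:
    """Average the valid scores; return NaN if no sample produced a score."""
    valid = [s for s in scores if not math.isnan(s)]
    return sum(valid) / len(valid) if valid else math.nan


batch_scores = [1.0, 0.5, math.nan, 0.75]
batch_mean = mean_ignoring_nan(batch_scores)
print(batch_mean)  # 0.75, the mean of 1.0, 0.5 and 0.75
```

Whether failed samples should be skipped or counted as zero is a reporting decision; skipping them avoids penalizing the system under test for judge failures.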

Usage Examples

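import asyncio

from ragas.dataset_schema import SingleTurnSample
from ragas.metrics._nv_metrics import AnswerAccuracy, ContextRelevance, ResponseGroundedness

accuracy = AnswerAccuracy()
# accuracy.llm = ...  # Set your NVIDIA LLM before scoring

# AnswerAccuracy requires user_input, response, and reference.
sample = SingleTurnSample(
    user_input="What is the capital of France?",
    response="Paris is the capital of France.",
    reference="The capital of France is Paris.",
)


async def main() -> None:
    # single_turn_ascore is a coroutine, so it must be awaited inside an event loop.
    score = await accuracy.single_turn_ascore(sample)
    print(score)  # float in 0.0-1.0, or NaN on error


# asyncio.run(main())  # Uncomment once accuracy.llm is configured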
