Implementation:Explodinggradients Ragas NVMetrics Module

Field	Value
source	Repo
domains	Metrics, NVIDIA
last_updated	2026-02-10

Overview

The NVMetrics module provides NVIDIA-optimized dual-judge evaluation metrics: AnswerAccuracy, ContextRelevance, and ResponseGroundedness, each using two distinct prompt templates and averaging their scores.

Description

This module contains three metric classes optimized for use with NVIDIA LLM models:

AnswerAccuracy -- Measures answer accuracy compared to ground truth by running two complementary prompt templates (one rating the user answer against reference, the other in reverse order) and averaging the results. Scores are rated on a 0/2/4 scale, normalized to 0-1.
ContextRelevance -- Scores the relevance of retrieved contexts to the user input using two prompt templates on a 0/1/2 scale, normalized to 0-1. Includes edge-case handling for empty or trivially matching inputs.
ResponseGroundedness -- Scores how well the response is grounded in the retrieved contexts using two prompt templates on a 0/1/2 scale, normalized to 0-1. Includes edge-case handling for exact matches and empty inputs.

All three classes use a retry mechanism (default 5 retries) and use raw text generation via BaseRagasLLM.agenerate_text rather than the standard PydanticPrompt pipeline. Each inherits from MetricWithLLM and SingleTurnMetric.

Usage

Each metric has different required columns. An LLM (ideally from NVIDIA's model catalog) must be configured.

Code Reference

Property	Value
Source Location	`src/ragas/metrics/_nv_metrics.py` L18-432
Class Signatures	`class AnswerAccuracy(MetricWithLLM, SingleTurnMetric)`, `class ContextRelevance(MetricWithLLM, SingleTurnMetric)`, `class ResponseGroundedness(MetricWithLLM, SingleTurnMetric)`
Import	`from ragas.metrics._nv_metrics import AnswerAccuracy, ContextRelevance, ResponseGroundedness`

I/O Contract

Inputs (AnswerAccuracy)

Parameter	Type	Required	Description
user_input	str	Yes	The user query
response	str	Yes	The generated response
reference	str	Yes	The ground truth reference

Inputs (ContextRelevance)

Parameter	Type	Required	Description
user_input	str	Yes	The user query
retrieved_contexts	List[str]	Yes	The retrieved context passages

Inputs (ResponseGroundedness)

Parameter	Type	Required	Description
response	str	Yes	The generated response
retrieved_contexts	List[str]	Yes	The retrieved context passages

Outputs

Output	Type	Description
score	float	Average of two judge scores, normalized to 0.0-1.0, or NaN on error

Usage Examples

from ragas.metrics._nv_metrics import AnswerAccuracy, ContextRelevance, ResponseGroundedness
from ragas.dataset_schema import SingleTurnSample

accuracy = AnswerAccuracy()
# accuracy.llm = ...  # Set your NVIDIA LLM

sample = SingleTurnSample(
    user_input="What is the capital of France?",
    response="Paris is the capital of France.",
    reference="The capital of France is Paris."
)
# score = await accuracy.single_turn_ascore(sample)

Related Pages

Explodinggradients_Ragas_AnswerCorrectness_Metric -- Alternative answer correctness metric
Explodinggradients_Ragas_ContextPrecision_Metric -- Alternative context precision metric
Explodinggradients_Ragas_Faithfulness_Metric -- Alternative groundedness/faithfulness metric

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment