| Field | Value |
|---|---|
| source | Repo |
| domains | Metrics, NVIDIA |
| last_updated | 2026-02-10 |
Overview
The NVMetrics module provides NVIDIA-optimized dual-judge evaluation metrics: AnswerAccuracy, ContextRelevance, and ResponseGroundedness, each using two distinct prompt templates and averaging their scores.
Description
This module contains three metric classes optimized for use with NVIDIA LLMs:
- AnswerAccuracy -- Measures how accurately the response matches the ground-truth reference. It runs two complementary prompt templates (one rates the response against the reference, the other swaps their order) and averages the results. Each judge rates on a 0/2/4 scale, which is normalized to 0-1.
- ContextRelevance -- Scores how relevant the retrieved contexts are to the user input. It uses two prompt templates, each rating on a 0/1/2 scale that is normalized to 0-1. Includes edge-case handling for empty or trivially matching inputs.
- ResponseGroundedness -- Scores how well the response is grounded in the retrieved contexts. It uses two prompt templates, each rating on a 0/1/2 scale that is normalized to 0-1. Includes edge-case handling for exact matches and empty inputs.
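The normalize-then-average arithmetic described above can be sketched as follows. This is a minimal illustration, not the module's code: the helper names `normalize` and `dual_judge_score` are hypothetical, and the sketch assumes both judges returned a valid rating.

```python
def normalize(raw: int, top: int) -> float:
    """Map a raw judge rating onto 0-1 (top is 4 for the 0/2/4 scale, 2 for 0/1/2)."""
    if not 0 <= raw <= top:
        raise ValueError(f"rating {raw} is outside 0..{top}")
    return raw / top


def dual_judge_score(raw_a: int, raw_b: int, top: int) -> float:
    """Average the normalized ratings of the two judges (hypothetical helper)."""
    return (normalize(raw_a, top) + normalize(raw_b, top)) / 2


# AnswerAccuracy-style scale: one judge says 4, the other says 2.
print(dual_judge_score(4, 2, top=4))  # 0.75
# ContextRelevance-style scale: one judge says 1, the other says 2.
print(dual_judge_score(1, 2, top=2))  # 0.75
```

Averaging two differently worded prompts is meant to reduce the sensitivity of the final score to the phrasing of any single prompt.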
All three classes inherit from MetricWithLLM and SingleTurnMetric. Instead of the standard PydanticPrompt pipeline, they call BaseRagasLLM.agenerate_text for raw text generation, wrapped in a retry mechanism (5 retries by default).
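One way to picture a retry loop around raw text generation is the sketch below. It is an assumption-laden illustration rather than the module's implementation: `judge_with_retry` and its `generate` callable are hypothetical stand-ins for a single judge call, and the parsing rule (the reply must be exactly one allowed digit) is simplified.

```python
import asyncio
import math
from typing import Awaitable, Callable, Sequence


async def judge_with_retry(
    generate: Callable[[], Awaitable[str]],  # hypothetical: one raw-text judge call
    allowed: Sequence[int] = (0, 2, 4),
    retries: int = 5,
) -> float:
    """Call the judge until it returns an allowed rating; NaN if it never does."""
    top = max(allowed)
    for _ in range(retries):
        try:
            text = (await generate()).strip()
        except Exception:
            continue  # a failed call uses up one attempt
        if text.isdigit() and int(text) in allowed:
            return int(text) / top  # normalize to 0-1
    return math.nan
```

Returning NaN after the attempts are exhausted keeps a single unparseable reply from aborting a whole evaluation run.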
Usage
Each metric requires a different set of sample columns, listed under I/O Contract. An LLM (ideally from NVIDIA's model catalog) must be configured on the metric before scoring.
Code Reference
| Property | Value |
|---|---|
| Source Location | src/ragas/metrics/_nv_metrics.py L18-432 |
| Class Signatures | class AnswerAccuracy(MetricWithLLM, SingleTurnMetric), class ContextRelevance(MetricWithLLM, SingleTurnMetric), class ResponseGroundedness(MetricWithLLM, SingleTurnMetric) |
| Import | from ragas.metrics._nv_metrics import AnswerAccuracy, ContextRelevance, ResponseGroundedness |
I/O Contract
Inputs (AnswerAccuracy)
| Parameter | Type | Required | Description |
|---|---|---|---|
| user_input | str | Yes | The user query |
| response | str | Yes | The generated response |
| reference | str | Yes | The ground truth reference |
Inputs (ContextRelevance)
| Parameter | Type | Required | Description |
|---|---|---|---|
| user_input | str | Yes | The user query |
| retrieved_contexts | List[str] | Yes | The retrieved context passages |
Inputs (ResponseGroundedness)
| Parameter | Type | Required | Description |
|---|---|---|---|
| response | str | Yes | The generated response |
| retrieved_contexts | List[str] | Yes | The retrieved context passages |
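The three input contracts can be summarized as a lookup table, which makes it easy to check a sample before scoring. The sketch below is illustrative only: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the sample is modeled as a plain dictionary rather than a SingleTurnSample.

```python
from typing import Any, Dict, List, Mapping

# Required columns per metric, copied from the input tables above.
REQUIRED_COLUMNS: Dict[str, List[str]] = {
    "AnswerAccuracy": ["user_input", "response", "reference"],
    "ContextRelevance": ["user_input", "retrieved_contexts"],
    "ResponseGroundedness": ["response", "retrieved_contexts"],
}


def missing_columns(metric: str, sample: Mapping[str, Any]) -> List[str]:
    """Return the required columns that are absent or None in the sample."""
    return [col for col in REQUIRED_COLUMNS[metric] if sample.get(col) is None]


sample = {"user_input": "What is the capital of France?", "response": "Paris."}
print(missing_columns("AnswerAccuracy", sample))        # ['reference']
print(missing_columns("ResponseGroundedness", sample))  # ['retrieved_contexts']
```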
Outputs
| Output | Type | Description |
|---|---|---|
| score | float | Average of two judge scores, normalized to 0.0-1.0, or NaN on error |
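Because a score can be NaN, code that aggregates scores across samples should handle NaN explicitly; a plain mean would itself become NaN. A minimal sketch, assuming the caller wants the mean of the valid scores plus a count of failures (`summarize` is a hypothetical helper, not part of the module):

```python
import math
from statistics import fmean
from typing import Iterable, Tuple


def summarize(scores: Iterable[float]) -> Tuple[float, int]:
    """Return (mean of the non-NaN scores, number of NaN scores)."""
    all_scores = list(scores)
    valid = [s for s in all_scores if not math.isnan(s)]
    failed = len(all_scores) - len(valid)
    mean = fmean(valid) if valid else math.nan  # NaN when nothing was scored
    return mean, failed


print(summarize([1.0, 0.5, math.nan]))  # (0.75, 1)
```

Reporting the failure count alongside the mean keeps silently dropped samples visible.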
Usage Examples
from ragas.metrics._nv_metrics import AnswerAccuracy, ContextRelevance, ResponseGroundedness
from ragas.dataset_schema import SingleTurnSample

accuracy = AnswerAccuracy()
# accuracy.llm = ...  # Set your LLM (ideally an NVIDIA model) before scoring

# AnswerAccuracy requires user_input, response, and reference.
sample = SingleTurnSample(
    user_input="What is the capital of France?",
    response="Paris is the capital of France.",
    reference="The capital of France is Paris.",
)

# Run inside an async function or event loop:
# score = await accuracy.single_turn_ascore(sample)