Implementation:Confident ai Deepeval AnswerRelevancyMetric
Overview
AnswerRelevancyMetric is an API class in the deepeval library that evaluates whether an LLM's output is relevant to the user's input query. It uses an LLM judge to assess the semantic alignment between the question asked and the answer provided, producing a score between 0.0 and 1.0 with an optional natural language reason.
This is an API Doc implementation.
Source
- Repository: Confident AI Deepeval
- File: deepeval/metrics/answer_relevancy/answer_relevancy.py, lines 28-54
- Class: AnswerRelevancyMetric
Import
from deepeval.metrics import AnswerRelevancyMetric
Constructor Signature
AnswerRelevancyMetric(
    threshold: float = 0.5,
    model: Optional[Union[str, DeepEvalBaseLLM]] = None,
    include_reason: bool = True,
    strict_mode: bool = False,
    async_mode: bool = True,
    verbose_mode: bool = None
)
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| threshold | float | No | 0.5 | The minimum score (0.0 to 1.0) for a test case to be considered passing. |
| model | Optional[Union[str, DeepEvalBaseLLM]] | No | None | The judge LLM to use for evaluation. Accepts a model name string (e.g., "gpt-4o") or a custom DeepEvalBaseLLM instance. Defaults to the framework default model. |
| include_reason | bool | No | True | Whether to include a natural language explanation of the score in the evaluation result. |
| strict_mode | bool | No | False | When enabled, the metric score is set to 0 if the raw score falls below the threshold (binary pass/fail). |
| async_mode | bool | No | True | Whether to run the evaluation asynchronously. |
| verbose_mode | bool | No | None | Whether to print detailed evaluation logs. Inherits from global config if not specified. |
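The interaction between threshold and strict_mode described in the table can be sketched in plain Python. This is only an illustration of the documented behaviour, not deepeval's source: the helper name finalize_score and its return shape are hypothetical, and it assumes a test case passes when its final score is greater than or equal to the threshold.

```python
from typing import Tuple


def finalize_score(raw_score: float, threshold: float = 0.5,
                   strict_mode: bool = False) -> Tuple[float, bool]:
    """Hypothetical helper mirroring the parameter table; not deepeval code."""
    if not 0.0 <= raw_score <= 1.0:
        raise ValueError("raw_score must lie in [0.0, 1.0]")
    # In strict mode, a raw score below the threshold collapses to 0.
    score = 0.0 if strict_mode and raw_score < threshold else raw_score
    # Assumption: the threshold is an inclusive minimum for passing.
    passed = score >= threshold
    return score, passed


print(finalize_score(0.8, threshold=0.7))                    # (0.8, True)
print(finalize_score(0.6, threshold=0.7))                    # (0.6, False)
print(finalize_score(0.6, threshold=0.7, strict_mode=True))  # (0.0, False)
```

Without strict mode a failing case still reports its raw score, which helps when tuning the threshold; with strict mode the reported score is zeroed for any case that falls short.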
Input / Output
- Inputs: Configuration parameters as described above.
- Outputs: A configured AnswerRelevancyMetric object that can be passed to evaluate() or assert_test(). When executed against a test case, it produces a score (0.0-1.0), a pass/fail status, and optionally a reason string.
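As a rough sketch of how the three outputs might be read back, the snippet below runs the metric directly on a single test case through measure() and then inspects score, is_successful(), and reason. The wrapper function score_answer is hypothetical and exists only for illustration; the sketch assumes deepeval is installed and a judge model is already configured, since calling it triggers a real LLM evaluation.

```python
def score_answer(question: str, answer: str, threshold: float = 0.7):
    """Hypothetical wrapper: evaluate one question/answer pair for relevancy."""
    # Imported inside the function so that defining the helper does not
    # itself require deepeval; calling it does.
    from deepeval.metrics import AnswerRelevancyMetric
    from deepeval.test_case import LLMTestCase

    metric = AnswerRelevancyMetric(threshold=threshold, include_reason=True)
    test_case = LLMTestCase(input=question, actual_output=answer)

    # Runs the judge LLM against the test case and stores the results
    # on the metric object.
    metric.measure(test_case)

    # score: float in [0.0, 1.0]; is_successful(): pass/fail against the
    # threshold; reason: explanation string when include_reason=True.
    return metric.score, metric.is_successful(), metric.reason
```

A call such as score_answer("What is the capital of France?", "Paris is the capital of France.") would return the three values as a tuple; the exact score and reason depend on the judge model.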
Required Test Case Fields
When this metric is applied to an LLMTestCase, the following fields are required:
- input: The user's query or prompt.
- actual_output: The LLM's response to evaluate.
Example
Basic Usage
from deepeval.metrics import AnswerRelevancyMetric

relevancy_metric = AnswerRelevancyMetric(
    threshold=0.7,
    model="gpt-4o",
    include_reason=True
)
Usage with Evaluation
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
from deepeval import evaluate

metric = AnswerRelevancyMetric(threshold=0.7, model="gpt-4o")

test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital city of France, located in the north-central part of the country."
)

result = evaluate(test_cases=[test_case], metrics=[metric])