Implementation:Confident ai Deepeval AnswerRelevancyMetric

Overview

AnswerRelevancyMetric is an API class in the deepeval library that evaluates whether an LLM's output is relevant to the user's input query. It uses an LLM judge to assess the semantic alignment between the question asked and the answer provided, producing a score between 0.0 and 1.0 with an optional natural language reason.

This is an API Doc implementation.

Source

Repository: Confident AI Deepeval
File: deepeval/metrics/answer_relevancy/answer_relevancy.py, lines 28-54
Class: AnswerRelevancyMetric

Import

from deepeval.metrics import AnswerRelevancyMetric

Constructor Signature

AnswerRelevancyMetric(
    threshold: float = 0.5,
    model: Optional[Union[str, DeepEvalBaseLLM]] = None,
    include_reason: bool = True,
    strict_mode: bool = False,
    async_mode: bool = True,
    verbose_mode: bool = None
)

Parameters

Parameter	Type	Required	Default	Description
`threshold`	`float`	No	`0.5`	The minimum score (0.0 to 1.0) for a test case to be considered passing.
`model`	`Optional[Union[str, DeepEvalBaseLLM]]`	No	`None`	The judge LLM to use for evaluation. Accepts a model name string (e.g., `"gpt-4o"`) or a custom `DeepEvalBaseLLM` instance. When `None`, deepeval uses its configured default judge model.
`include_reason`	`bool`	No	`True`	Whether to include a natural language explanation of the score in the evaluation result.
`strict_mode`	`bool`	No	`False`	When enabled, the score is binary: 1 for a perfect result and 0 otherwise. Strict mode also overrides `threshold`, setting it to 1.
`async_mode`	`bool`	No	`True`	Whether to run the evaluation asynchronously.
`verbose_mode`	`bool`	No	`None`	Whether to print detailed evaluation logs. Inherits from global config if not specified.
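
The interaction between `threshold` and `strict_mode` can be sketched in plain Python. This is a minimal illustration, not deepeval's code: `apply_threshold` is a hypothetical helper, and it assumes that strict mode raises the effective threshold to 1 and collapses any lower score to 0.

```python
from typing import Tuple


def apply_threshold(
    raw_score: float,
    threshold: float = 0.5,
    strict_mode: bool = False,
) -> Tuple[float, bool]:
    """Hypothetical helper: map a raw judge score to (final_score, passed)."""
    if not 0.0 <= raw_score <= 1.0:
        raise ValueError("raw_score must be within [0.0, 1.0]")

    # Assumption: strict mode ignores the configured threshold and uses 1.0.
    effective_threshold = 1.0 if strict_mode else threshold

    # Assumption: in strict mode, any score short of the threshold becomes 0.
    if strict_mode and raw_score < effective_threshold:
        final_score = 0.0
    else:
        final_score = raw_score

    return final_score, final_score >= effective_threshold
```

Under these assumptions, a raw score of 0.9 passes a threshold of 0.7 in normal mode but is reported as 0.0 (failing) in strict mode.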

Input / Output

Inputs: Configuration parameters as described above.
Outputs: A configured AnswerRelevancyMetric object that can be passed to evaluate() or assert_test(). When executed against a test case, it produces a score (0.0-1.0), a pass/fail status, and optionally a reason string.
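
To make the three outputs concrete, the sketch below builds a score, a pass/fail flag, and an optional reason from a list of per-statement relevance verdicts. It assumes the definition given in deepeval's documentation (the share of statements in the output judged relevant to the input) and replaces the LLM judge with hard-coded booleans. `RelevancyResult` and `score_from_verdicts` are hypothetical names, not part of the library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RelevancyResult:
    """Hypothetical container mirroring the three outputs described above."""
    score: float
    success: bool
    reason: Optional[str]


def score_from_verdicts(
    verdicts: List[bool],
    threshold: float = 0.5,
    include_reason: bool = True,
) -> RelevancyResult:
    """Score = relevant statements / total statements (assumed definition)."""
    if not verdicts:
        # The empty case is left undefined in this sketch.
        raise ValueError("at least one verdict is required")

    relevant = sum(verdicts)  # True counts as 1, False as 0
    total = len(verdicts)
    score = relevant / total

    reason = None
    if include_reason:
        reason = f"{relevant} of {total} statements were judged relevant."

    return RelevancyResult(score=score, success=score >= threshold, reason=reason)
```

In the real metric the verdicts and the reason come from the judge model; only the final arithmetic is illustrated here.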

Required Test Case Fields

When this metric is applied to an LLMTestCase, the following fields are required:

input -- The user's query or prompt.
actual_output -- The LLM's response to evaluate.
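
A pre-flight check for these two fields might look like the following sketch. `CaseStub` is a hypothetical stand-in for `LLMTestCase`, and `missing_required_fields` is an illustrative helper, not a deepeval function; the library performs its own validation.

```python
from dataclasses import dataclass
from typing import List, Optional

# The two fields this metric needs, as listed above.
REQUIRED_FIELDS = ("input", "actual_output")


@dataclass
class CaseStub:
    """Hypothetical stand-in for LLMTestCase, reduced to the required fields."""
    input: Optional[str] = None
    actual_output: Optional[str] = None


def missing_required_fields(test_case: object) -> List[str]:
    """Return the names of required fields that are absent or None."""
    return [
        name for name in REQUIRED_FIELDS
        if getattr(test_case, name, None) is None
    ]
```

Running such a check before evaluation can surface incomplete test cases without spending a judge-model call on them.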

Example

Basic Usage

from deepeval.metrics import AnswerRelevancyMetric

relevancy_metric = AnswerRelevancyMetric(
    threshold=0.7,
    model="gpt-4o",
    include_reason=True
)

Usage with Evaluation

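from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# The judge model is called during evaluation, so "gpt-4o" needs
# OpenAI credentials (for example, OPENAI_API_KEY) to be configured.
metric = AnswerRelevancyMetric(threshold=0.7, model="gpt-4o")

# Only the two fields this metric requires are set.
test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital city of France, located in the north-central part of the country."
)

# Run the metric against the test case and collect the results.
result = evaluate(test_cases=[test_case], metrics=[metric])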
