Implementation:Confident ai Deepeval AnswerRelevancyMetric

From Leeroopedia

Overview

AnswerRelevancyMetric is an API class in the deepeval library that evaluates whether an LLM's output is relevant to the user's input query. It uses an LLM judge to assess the semantic alignment between the question asked and the answer provided, producing a score between 0.0 and 1.0 with an optional natural language reason.
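
As a rough illustration of where such a score can come from, the deepeval documentation describes answer relevancy as the share of statements in the output that the judge considers relevant to the input. The sketch below assumes the judge's verdicts are already available as a list of booleans. The helper name `relevancy_score` and its handling of an empty list are assumptions made for illustration, not the library's code.

```python
from typing import List


def relevancy_score(verdicts: List[bool]) -> float:
    """Fraction of statements judged relevant (hypothetical helper)."""
    if not verdicts:
        # Assumption: with no statements to judge, nothing counts as irrelevant.
        return 1.0
    return sum(verdicts) / len(verdicts)


# Three of the four statements in the answer address the question.
score = relevancy_score([True, True, False, True])
print(score)  # 0.75
```

In the real metric the verdicts come from the judge LLM, so the same answer can score slightly differently from run to run.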

This is an API Doc implementation.

Source

  • Repository: Confident AI Deepeval
  • File: deepeval/metrics/answer_relevancy/answer_relevancy.py, lines 28-54
  • Class: AnswerRelevancyMetric

Import

from deepeval.metrics import AnswerRelevancyMetric

Constructor Signature

AnswerRelevancyMetric(
    threshold: float = 0.5,
    model: Optional[Union[str, DeepEvalBaseLLM]] = None,
    include_reason: bool = True,
    strict_mode: bool = False,
    async_mode: bool = True,
    verbose_mode: bool = None
)

Parameters

Parameter | Type | Required | Default | Description
--- | --- | --- | --- | ---
threshold | float | No | 0.5 | The minimum score (0.0 to 1.0) for a test case to be considered passing.
model | Optional[Union[str, DeepEvalBaseLLM]] | No | None | The judge LLM to use for evaluation. Accepts a model name string (e.g., "gpt-4o") or a custom DeepEvalBaseLLM instance. When None, the framework's default model is used.
include_reason | bool | No | True | Whether to include a natural language explanation of the score in the evaluation result.
strict_mode | bool | No | False | When enabled, the score becomes binary: the threshold is overridden to 1, and any raw score below it is reported as 0.
async_mode | bool | No | True | Whether to run the evaluation asynchronously (concurrent execution within measure()).
verbose_mode | bool | No | None | Whether to print detailed evaluation logs. Inherits from global config if not specified.
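
The interaction between threshold and strict_mode can be sketched in plain Python. This is a minimal model, not the library's implementation: `resolve_score` is a hypothetical helper, and it assumes that strict mode raises the effective threshold to 1.0 and reports any lower raw score as 0.

```python
from typing import Tuple


def resolve_score(
    raw_score: float,
    threshold: float = 0.5,
    strict_mode: bool = False,
) -> Tuple[float, bool]:
    """Return (reported_score, passed) for a raw judge score (hypothetical helper)."""
    # Assumption: strict mode ignores the configured threshold and uses 1.0.
    effective_threshold = 1.0 if strict_mode else threshold
    if strict_mode and raw_score < effective_threshold:
        reported = 0.0  # anything short of a perfect score collapses to 0
    else:
        reported = raw_score
    return reported, reported >= effective_threshold


print(resolve_score(0.75, threshold=0.7))                    # (0.75, True)
print(resolve_score(0.75, threshold=0.7, strict_mode=True))  # (0.0, False)
```

Under these assumptions, a raw score of 0.75 passes a 0.7 threshold in the default mode but fails in strict mode, where only a score of 1.0 passes.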

Input / Output

  • Inputs: Configuration parameters as described above.
  • Outputs: A configured AnswerRelevancyMetric object that can be passed to evaluate() or assert_test(). When executed against a test case, it produces a score (0.0-1.0), a pass/fail status, and optionally a reason string.

Required Test Case Fields

When this metric is applied to an LLMTestCase, the following fields are required:

  • input -- The user's query or prompt.
  • actual_output -- The LLM's response to evaluate.
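
A pre-flight check for these two fields might look like the sketch below. `StubTestCase` is a stand-in dataclass used so the example runs without deepeval installed, and `missing_fields` is a hypothetical helper; neither is part of the library, which performs its own validation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StubTestCase:
    """Stand-in for LLMTestCase holding only the two fields discussed here."""
    input: Optional[str] = None
    actual_output: Optional[str] = None


# Field names this metric needs, per the list above.
REQUIRED_FIELDS = ("input", "actual_output")


def missing_fields(test_case: StubTestCase) -> List[str]:
    """Return the names of required fields that are unset (hypothetical helper)."""
    return [name for name in REQUIRED_FIELDS if getattr(test_case, name) is None]


print(missing_fields(StubTestCase(input="What is the capital of France?")))
# ['actual_output']
```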

Example

Basic Usage

from deepeval.metrics import AnswerRelevancyMetric

relevancy_metric = AnswerRelevancyMetric(
    threshold=0.7,
    model="gpt-4o",
    include_reason=True
)

Usage with Evaluation

from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# The judge model needs credentials, e.g. OPENAI_API_KEY for "gpt-4o".
metric = AnswerRelevancyMetric(threshold=0.7, model="gpt-4o")

# Only the two required fields are set: input and actual_output.
test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital city of France, located in the north-central part of the country."
)

# Run the metric against the test case and collect the results.
result = evaluate(test_cases=[test_case], metrics=[metric])
