Implementation:Confident ai Deepeval FaithfulnessMetric

From Leeroopedia

Overview

FaithfulnessMetric is an API class in the deepeval library that evaluates whether an LLM's output is faithful to the provided retrieval context. It uses claim-level decomposition and verification to determine what proportion of the response's assertions are supported by the context, producing a score between 0.0 and 1.0.
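
To make the "proportion of supported assertions" idea concrete, the following is a minimal sketch of the final scoring step only. It assumes the judge LLM has already produced one supported/unsupported verdict per claim. The function name `faithfulness_score` and the handling of an empty claim list are assumptions made for illustration, not deepeval's implementation.

```python
from typing import Sequence


def faithfulness_score(verdicts: Sequence[bool]) -> float:
    """Return the fraction of claims judged as supported by the context.

    `verdicts` holds one boolean per extracted claim (True = supported).
    Returning 1.0 when there are no claims is an assumption of this sketch.
    """
    if not verdicts:
        return 1.0
    return sum(verdicts) / len(verdicts)


# Three claims, two of them supported by the retrieval context.
score = faithfulness_score([True, True, False])
print(round(score, 2))  # prints 0.67
```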

This is an API Doc implementation.

Source

  • Repository: Confident AI Deepeval
  • File: deepeval/metrics/faithfulness/faithfulness.py, lines 30-62
  • Class: FaithfulnessMetric

Import

from deepeval.metrics import FaithfulnessMetric

Constructor Signature

FaithfulnessMetric(
    threshold: float = 0.5,
    model: Optional[Union[str, DeepEvalBaseLLM]] = None,
    include_reason: bool = True,
    truths_extraction_limit: Optional[int] = None,
    strict_mode: bool = False,
    async_mode: bool = True,
    verbose_mode: bool = None
)

Parameters

Parameter | Type | Required | Default | Description
threshold | float | No | 0.5 | The minimum score (0.0 to 1.0) for a test case to be considered passing.
model | Optional[Union[str, DeepEvalBaseLLM]] | No | None | The judge LLM used for claim extraction and verification. Accepts a model name string (e.g., "gpt-4o") or a custom DeepEvalBaseLLM instance. If None, the framework's default model is used.
include_reason | bool | No | True | Whether to include a natural-language explanation of the score in the evaluation result.
truths_extraction_limit | Optional[int] | No | None | Limits the number of truths extracted from the retrieval context. Useful for controlling evaluation cost and latency when the context is large.
strict_mode | bool | No | False | When enabled, the score becomes binary: the threshold is overridden to 1, and any raw score below it is set to 0.
async_mode | bool | No | True | Whether to run the evaluation asynchronously.
verbose_mode | bool | No | None | Whether to print detailed evaluation logs. Inherits from global config if not specified.
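
The interaction between threshold and strict_mode can be sketched in a few lines. This is an illustrative helper, not library code: the name `finalize` is hypothetical, and it takes the effective threshold as an input rather than modelling any adjustment the library makes to the threshold in strict mode.

```python
from typing import Tuple


def finalize(raw_score: float, threshold: float = 0.5,
             strict_mode: bool = False) -> Tuple[float, bool]:
    """Apply the strict-mode rule, then decide pass/fail.

    In strict mode a raw score below the threshold is collapsed to 0.0;
    otherwise the raw score is kept. The test case passes when the final
    score is at least the threshold.
    """
    score = 0.0 if strict_mode and raw_score < threshold else raw_score
    return score, score >= threshold


print(finalize(0.6, threshold=0.7))                    # prints (0.6, False)
print(finalize(0.6, threshold=0.7, strict_mode=True))  # prints (0.0, False)
print(finalize(0.8, threshold=0.7, strict_mode=True))  # prints (0.8, True)
```

Both of the first two calls fail the test case; strict mode changes only the reported score, which matters when scores are averaged or plotted across a dataset.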

Input / Output

  • Inputs: Configuration parameters as described above.
  • Outputs: A configured FaithfulnessMetric object that can be passed to evaluate() or assert_test(). When executed against a test case, it produces a score (0.0-1.0) representing the proportion of claims supported by the context, a pass/fail status, and optionally a reason string.

Required Test Case Fields

When this metric is applied to an LLMTestCase, the following fields are required:

  • input -- The user's query or prompt.
  • actual_output -- The LLM's response to evaluate.
  • retrieval_context -- The list of retrieved context passages against which faithfulness is verified.
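
A pre-flight check for these fields can catch incomplete test cases before any judge-model calls are made. The sketch below is a hypothetical helper (`missing_fields` is not part of deepeval), and treating empty values as missing is an assumption. It reads attributes by name, so a `SimpleNamespace` stands in for an LLMTestCase here.

```python
from types import SimpleNamespace
from typing import Any, List

# Field names taken from the list above.
REQUIRED_FIELDS = ("input", "actual_output", "retrieval_context")


def missing_fields(test_case: Any) -> List[str]:
    """Return required field names that are absent, None, or empty."""
    return [name for name in REQUIRED_FIELDS
            if not getattr(test_case, name, None)]


# Stand-in object with no retrieval context attached.
case = SimpleNamespace(
    input="What are the benefits of exercise?",
    actual_output="Regular exercise improves cardiovascular health.",
    retrieval_context=None,
)
print(missing_fields(case))  # prints ['retrieval_context']
```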

Example

Basic Usage

from deepeval.metrics import FaithfulnessMetric

faithfulness_metric = FaithfulnessMetric(
    threshold=0.7,
    model="gpt-4o"
)

Usage with Test Case

from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase
from deepeval import evaluate

metric = FaithfulnessMetric(threshold=0.7, model="gpt-4o", include_reason=True)

test_case = LLMTestCase(
    input="What are the benefits of exercise?",
    actual_output="Regular exercise improves cardiovascular health, boosts mental well-being, and helps maintain a healthy weight.",
    retrieval_context=[
        "Exercise has been shown to improve cardiovascular health and reduce the risk of heart disease.",
        "Physical activity is associated with improved mental health and reduced symptoms of depression.",
        "Maintaining a regular exercise routine helps with weight management."
    ]
)

result = evaluate(test_cases=[test_case], metrics=[metric])
