Implementation: Confident AI Deepeval FaithfulnessMetric

Overview

FaithfulnessMetric is an API class in the deepeval library that evaluates whether an LLM's output is faithful to the provided retrieval context. It uses claim-level decomposition and verification to determine what proportion of the response's assertions are supported by the context, producing a score between 0.0 and 1.0.
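
The scoring idea can be sketched in plain Python. This is a simplified illustration, not deepeval's actual implementation: the function name `faithfulness_score` and the list of boolean verdicts are hypothetical stand-ins for the judge LLM's claim extraction and verification steps, and treating an empty claim list as fully faithful is an assumption.

```python
from typing import List


def faithfulness_score(verdicts: List[bool]) -> float:
    """Return the fraction of claims judged as supported by the context.

    `verdicts` holds one boolean per extracted claim (True = supported).
    An empty list is scored as 1.0 here, which is an assumption.
    """
    if not verdicts:
        return 1.0
    return sum(verdicts) / len(verdicts)


# Three claims extracted from a response, two of them supported by the context.
score = faithfulness_score([True, True, False])
print(round(score, 2))  # 0.67
```

In the real metric the verdicts come from LLM calls, so the same response may score slightly differently across runs or judge models.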

This is an API Doc implementation.

Source

Repository: Confident AI Deepeval
File: deepeval/metrics/faithfulness/faithfulness.py, lines 30-62
Class: FaithfulnessMetric

Import

from deepeval.metrics import FaithfulnessMetric

Constructor Signature

FaithfulnessMetric(
    threshold: float = 0.5,
    model: Optional[Union[str, DeepEvalBaseLLM]] = None,
    include_reason: bool = True,
    truths_extraction_limit: Optional[int] = None,
    strict_mode: bool = False,
    async_mode: bool = True,
    verbose_mode: bool = None
)

Parameters

Parameter	Type	Required	Default	Description
`threshold`	`float`	No	`0.5`	The minimum score (0.0 to 1.0) for a test case to be considered passing.
`model`	`Optional[Union[str, DeepEvalBaseLLM]]`	No	`None`	The judge LLM to use for claim extraction and verification. Accepts a model name string (e.g., `"gpt-4o"`) or a custom `DeepEvalBaseLLM` instance. Defaults to the framework default model.
`include_reason`	`bool`	No	`True`	Whether to include a natural language explanation of the score in the evaluation result.
`truths_extraction_limit`	`Optional[int]`	No	`None`	Maximum number of truths to extract from the retrieval context. `None` means no limit. Useful for controlling evaluation cost and latency when the context is large.
`strict_mode`	`bool`	No	`False`	When enabled, scoring becomes binary: the threshold is overridden to 1, and any raw score below it is reported as 0, so only a perfect score passes.
`async_mode`	`bool`	No	`True`	Whether to run the evaluation asynchronously.
`verbose_mode`	`bool`	No	`None`	Whether to print detailed evaluation logs. Inherits from global config if not specified.
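
How `threshold` and `strict_mode` interact can be sketched as follows. The helper `apply_threshold` is hypothetical and exists only for illustration. It assumes that strict mode raises the effective threshold to 1 and zeroes any score below it, and that a test case passes when its final score is at least the effective threshold.

```python
from typing import Tuple


def apply_threshold(
    raw_score: float, threshold: float = 0.5, strict_mode: bool = False
) -> Tuple[float, bool]:
    """Return (final_score, passed) for a raw faithfulness score."""
    # Assumption: strict mode ignores the configured threshold and uses 1.0.
    effective_threshold = 1.0 if strict_mode else threshold
    # Assumption: in strict mode, anything short of the threshold becomes 0.
    if strict_mode and raw_score < effective_threshold:
        final_score = 0.0
    else:
        final_score = raw_score
    return final_score, final_score >= effective_threshold


print(apply_threshold(0.8, threshold=0.7))                    # (0.8, True)
print(apply_threshold(0.8, threshold=0.7, strict_mode=True))  # (0.0, False)
```

Under these assumptions, a response with a single unsupported claim fails in strict mode however lenient the configured threshold is.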

Input / Output

Inputs: Configuration parameters as described above.
Outputs: A configured FaithfulnessMetric object that can be passed to evaluate() or assert_test(). When executed against a test case, it produces a score (0.0-1.0) representing the proportion of claims supported by the context, a pass/fail status, and optionally a reason string.

Required Test Case Fields

When this metric is applied to an LLMTestCase, the following fields are required:

input -- The user's query or prompt.
actual_output -- The LLM's response to evaluate.
retrieval_context -- The list of retrieved context passages against which faithfulness is verified.
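
A pre-flight check for these fields might look like the sketch below. It works on plain dictionaries rather than `LLMTestCase` objects, and both `REQUIRED_FIELDS` and `missing_fields` are hypothetical names introduced for illustration; deepeval performs its own validation when the metric runs.

```python
from typing import Any, Dict, List

# Fields the metric needs, as listed above.
REQUIRED_FIELDS = ("input", "actual_output", "retrieval_context")


def missing_fields(case: Dict[str, Any]) -> List[str]:
    """List required fields that are absent or None in a test-case dict."""
    return [name for name in REQUIRED_FIELDS if case.get(name) is None]


incomplete = {"input": "What are the benefits of exercise?", "actual_output": "..."}
print(missing_fields(incomplete))  # ['retrieval_context']
```

Running such a check before building test cases can surface missing retrieval context early, before any judge-LLM calls are made.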

Example

Basic Usage

from deepeval.metrics import FaithfulnessMetric

faithfulness_metric = FaithfulnessMetric(
    threshold=0.7,
    model="gpt-4o"
)

Usage with Test Case

from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase
from deepeval import evaluate

metric = FaithfulnessMetric(threshold=0.7, model="gpt-4o", include_reason=True)

test_case = LLMTestCase(
    input="What are the benefits of exercise?",
    actual_output="Regular exercise improves cardiovascular health, boosts mental well-being, and helps maintain a healthy weight.",
    retrieval_context=[
        "Exercise has been shown to improve cardiovascular health and reduce the risk of heart disease.",
        "Physical activity is associated with improved mental health and reduced symptoms of depression.",
        "Maintaining a regular exercise routine helps with weight management."
    ]
)

result = evaluate(test_cases=[test_case], metrics=[metric])
