Implementation:Confident ai Deepeval FaithfulnessMetric
Overview
FaithfulnessMetric is an API class in the deepeval library that evaluates whether an LLM's output is faithful to the provided retrieval context. It breaks the actual output into individual claims, has a judge LLM check each claim against truths extracted from the retrieval context, and reports the proportion of claims that the context does not contradict as a score between 0.0 and 1.0.
This is an API Doc implementation.
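The final step of this claim-level verification, which turns per-claim verdicts into a score, can be sketched as below. faithfulness_score is a hypothetical helper and is not part of deepeval's API. The sketch assumes three things: the judge labels each claim "yes" (supported), "no" (contradicted), or "idk" (not addressed by the context); only "no" counts against the output; and an output with no extractable claims scores 1.0.

```python
from typing import List


def faithfulness_score(verdicts: List[str]) -> float:
    """Fraction of claims that the retrieval context does not contradict.

    Hypothetical helper: each entry is the judge's label for one claim,
    assumed to be "yes", "no", or "idk".
    """
    if not verdicts:
        # No claims were extracted, so nothing in the output is contradicted.
        return 1.0
    # Only an explicit "no" (contradiction) counts against faithfulness.
    not_contradicted = sum(1 for v in verdicts if v.strip().lower() != "no")
    return not_contradicted / len(verdicts)


print(faithfulness_score(["yes", "yes", "no", "idk"]))  # 0.75
```

Under this sketch, a claim that the context neither supports nor contradicts ("idk") does not lower the score. A stricter variant would count only "yes" verdicts.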
Source
- Repository: Confident AI Deepeval
- File: deepeval/metrics/faithfulness/faithfulness.py, lines 30-62
- Class: FaithfulnessMetric
Import
```python
from deepeval.metrics import FaithfulnessMetric
```
Constructor Signature
```python
FaithfulnessMetric(
    threshold: float = 0.5,
    model: Optional[Union[str, DeepEvalBaseLLM]] = None,
    include_reason: bool = True,
    truths_extraction_limit: Optional[int] = None,
    strict_mode: bool = False,
    async_mode: bool = True,
    verbose_mode: Optional[bool] = None
)
```
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| threshold | float | No | 0.5 | The minimum score (0.0 to 1.0) for a test case to be considered passing. |
| model | Optional[Union[str, DeepEvalBaseLLM]] | No | None | The judge LLM used for claim extraction and verification. Accepts a model name string (e.g., "gpt-4o") or a custom DeepEvalBaseLLM instance. If omitted, the framework's default model is used. |
| include_reason | bool | No | True | Whether to include a natural-language explanation of the score in the evaluation result. |
| truths_extraction_limit | Optional[int] | No | None | The maximum number of truths extracted from the retrieval context. Useful for controlling evaluation cost and latency when the context is large. |
| strict_mode | bool | No | False | When enabled, enforces a binary score: the threshold is overridden to 1.0, so the score is 1.0 only when no claim is contradicted and 0 otherwise. |
| async_mode | bool | No | True | Whether to run the evaluation asynchronously. |
| verbose_mode | Optional[bool] | No | None | Whether to print detailed evaluation logs. Inherits from the global configuration if not specified. |
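The interaction between threshold and strict_mode can be sketched as follows. apply_threshold is a hypothetical helper and is not a deepeval function. It assumes that strict mode replaces the user threshold with 1.0, so any raw score below 1.0 is zeroed and fails. It also assumes that a test case passes when its final score is at least the effective threshold.

```python
from typing import Tuple


def apply_threshold(raw_score: float, threshold: float = 0.5,
                    strict_mode: bool = False) -> Tuple[float, bool]:
    """Return (final_score, passed) for a raw faithfulness score in [0.0, 1.0]."""
    # Assumption: strict mode overrides the user threshold with 1.0.
    effective_threshold = 1.0 if strict_mode else threshold
    # In strict mode, anything short of a perfect score collapses to 0.
    score = 0.0 if strict_mode and raw_score < effective_threshold else raw_score
    return score, score >= effective_threshold


print(apply_threshold(0.8, threshold=0.7))                    # (0.8, True)
print(apply_threshold(0.8, threshold=0.7, strict_mode=True))  # (0.0, False)
```

Under this assumption, strict mode ignores the threshold argument entirely.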
Input / Output
- Inputs: Configuration parameters as described above.
- Outputs: A configured FaithfulnessMetric object that can be passed to evaluate() or assert_test(). When executed against a test case, it produces a score (0.0 to 1.0) representing the proportion of claims supported by the context, a pass/fail status, and, optionally, a reason string.
Required Test Case Fields
When this metric is applied to an LLMTestCase, the following fields are required:
- input: The user's query or prompt.
- actual_output: The LLM's response to evaluate.
- retrieval_context: The list of retrieved context passages against which faithfulness is verified.
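A pre-flight check for these fields might look like the sketch below. ToyTestCase and missing_fields are hypothetical stand-ins, not deepeval classes or functions. The sketch assumes a field counts as missing when it is None. It uses getattr, so the same check works on any object that exposes these attribute names.

```python
from dataclasses import dataclass
from typing import List, Optional

# Fields this metric needs on every test case.
REQUIRED_FIELDS = ("input", "actual_output", "retrieval_context")


@dataclass
class ToyTestCase:
    """Hypothetical stand-in for a test case, reduced to the fields above."""
    input: Optional[str] = None
    actual_output: Optional[str] = None
    retrieval_context: Optional[List[str]] = None


def missing_fields(test_case) -> List[str]:
    """Return the names of required fields that are unset (None) on test_case."""
    return [name for name in REQUIRED_FIELDS if getattr(test_case, name, None) is None]


print(missing_fields(ToyTestCase(input="q", actual_output="a")))  # ['retrieval_context']
```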
Example
Basic Usage
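```python
from deepeval.metrics import FaithfulnessMetric

# Test cases pass when the faithfulness score is at least 0.7; gpt-4o acts as the judge.
faithfulness_metric = FaithfulnessMetric(
    threshold=0.7,
    model="gpt-4o",
)
```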
Usage with Test Case
```python
from deepeval import evaluate
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# The judge model checks each claim in actual_output against retrieval_context.
metric = FaithfulnessMetric(threshold=0.7, model="gpt-4o", include_reason=True)

test_case = LLMTestCase(
    input="What are the benefits of exercise?",
    actual_output="Regular exercise improves cardiovascular health, boosts mental well-being, and helps maintain a healthy weight.",
    # Passages the answer must stay faithful to
    retrieval_context=[
        "Exercise has been shown to improve cardiovascular health and reduce the risk of heart disease.",
        "Physical activity is associated with improved mental health and reduced symptoms of depression.",
        "Maintaining a regular exercise routine helps with weight management."
    ]
)

# Runs the metric and reports the score, pass/fail status, and reason.
result = evaluate(test_cases=[test_case], metrics=[metric])
```
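The exact score for this test case depends on how the judge model splits and labels claims, so it is not fixed. As a hand-worked illustration only, suppose the judge extracts three claims from actual_output (cardiovascular health, mental well-being, weight management) and finds none of them contradicted. Then suppose, hypothetically, that a fourth claim the context contradicts were appended to the answer:

```latex
\text{Faithfulness} = \frac{\text{number of claims not contradicted by the retrieval context}}{\text{total number of claims}}

\text{three uncontradicted claims: } \frac{3}{3} = 1.0 \geq 0.7

\text{one added contradicted claim: } \frac{3}{4} = 0.75 \geq 0.7
```

Both values clear the 0.7 threshold used above, so the test case would pass in either scenario.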