Implementation:Confident ai Deepeval Assert Test Function
Overview
assert_test is a function in the deepeval library that evaluates a single test case against one or more metrics and raises an AssertionError if any metric score falls below its configured threshold. This enables pytest-native LLM evaluation testing and CI/CD pipeline integration.
This is an API Doc implementation.
Source
- Repository: Confident AI Deepeval
- File: `deepeval/evaluate/evaluate.py`, lines 71-182
- Function: `assert_test`
Import
from deepeval import assert_test
Function Signature
def assert_test(
    test_case: Optional[Union[LLMTestCase, ConversationalTestCase]] = None,
    metrics: Optional[List[BaseMetric]] = None,
    golden: Optional[Golden] = None,
    observed_callback: Optional[Callable] = None,
    run_async: bool = True
) -> None
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `test_case` | `Optional[Union[LLMTestCase, ConversationalTestCase]]` | No | `None` | The test case to evaluate. Contains the input, actual output, and any context fields required by the metrics. |
| `metrics` | `Optional[List[BaseMetric]]` | No | `None` | The list of evaluation metrics to apply. Each metric has its own threshold; the assertion fails if any metric's score falls below its threshold. |
| `golden` | `Optional[Golden]` | No | `None` | An optional Golden dataset entry that can provide expected outputs and context. Used as an alternative to specifying these fields directly in the test case. |
| `observed_callback` | `Optional[Callable]` | No | `None` | An optional callback function invoked with the evaluation results after metrics are computed. Useful for custom logging or side effects. |
| `run_async` | `bool` | No | `True` | Whether to run metric evaluations asynchronously for performance. |
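The effect of a flag like `run_async` can be illustrated with a small standard-library sketch. This is not deepeval's implementation: `fake_metric` and `run_metrics` are hypothetical names, and each "metric" is assumed to be an async callable that returns a score. The sketch only shows the general idea of awaiting metrics concurrently versus one at a time.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical metric type: an async callable returning a score.
# This is an illustration only, not deepeval's BaseMetric interface.
ScoreFn = Callable[[], Awaitable[float]]


async def fake_metric(score: float, delay: float = 0.01) -> float:
    """Pretend to evaluate something slow (e.g. an LLM call), then return a score."""
    await asyncio.sleep(delay)
    return score


async def _run_concurrently(metrics: List[ScoreFn]) -> List[float]:
    # All metrics are started together; gather preserves input order.
    return list(await asyncio.gather(*(m() for m in metrics)))


async def _run_sequentially(metrics: List[ScoreFn]) -> List[float]:
    # Each metric finishes before the next one starts.
    return [await m() for m in metrics]


def run_metrics(metrics: List[ScoreFn], run_async: bool = True) -> List[float]:
    """Return scores in input order, concurrently or sequentially."""
    runner = _run_concurrently if run_async else _run_sequentially
    return asyncio.run(runner(metrics))


metrics = [lambda: fake_metric(0.9), lambda: fake_metric(0.6)]
concurrent_scores = run_metrics(metrics, run_async=True)
sequential_scores = run_metrics(metrics, run_async=False)
```

Both modes yield the same scores in the same order; the concurrent path simply overlaps the waiting time, which matters when each metric involves a network call.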
Input / Output
- Inputs: A test case and a list of metrics to evaluate against.
- Outputs: Returns `None` on success (all metrics pass their thresholds). Raises `AssertionError` on failure, with diagnostic information including the metric name, computed score, and threshold.
Behavior
- Each metric in the `metrics` list is applied to the test case.
- The metric computes a score (0.0 to 1.0) based on its evaluation logic.
- If the score is greater than or equal to the metric's threshold, the metric passes.
- If any metric's score falls below its threshold, an `AssertionError` is raised.
- The error message includes details about which metric(s) failed and their scores.
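The pass/fail rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's code: `MetricResult` and `assert_all_pass` are hypothetical names, scores are assumed to be precomputed, and the error message format is invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricResult:
    """Hypothetical stand-in for a computed metric; not a deepeval class."""
    name: str
    score: float      # assumed to lie in [0.0, 1.0]
    threshold: float


def assert_all_pass(results: List[MetricResult]) -> None:
    """Raise AssertionError if any score falls below its threshold."""
    failed = [r for r in results if r.score < r.threshold]
    if failed:
        details = ", ".join(
            f"{r.name} (score: {r.score}, threshold: {r.threshold})"
            for r in failed
        )
        raise AssertionError(f"Metrics failed: {details}")


# A score equal to the threshold passes; no exception is raised here.
assert_all_pass([MetricResult("relevancy", 0.7, 0.7)])

# One metric below its threshold is enough to fail the whole assertion.
try:
    assert_all_pass([
        MetricResult("relevancy", 0.9, 0.7),
        MetricResult("faithfulness", 0.65, 0.8),
    ])
except AssertionError as exc:
    failure_message = str(exc)
```

Collecting every failing metric before raising, rather than stopping at the first, keeps the error message useful when several metrics fail at once.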
Example
Basic Assertion Test
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital of France."
)
assert_test(
    test_case=test_case,
    metrics=[AnswerRelevancyMetric(threshold=0.7)]
)
Pytest Integration
import pytest
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase
def test_llm_response_quality():
    test_case = LLMTestCase(
        input="What are the benefits of exercise?",
        actual_output="Exercise improves cardiovascular health and mental well-being.",
        retrieval_context=[
            "Regular exercise improves cardiovascular health.",
            "Physical activity is linked to better mental health."
        ]
    )
    assert_test(
        test_case=test_case,
        metrics=[
            AnswerRelevancyMetric(threshold=0.7),
            FaithfulnessMetric(threshold=0.8)
        ]
    )
CI/CD Pipeline Usage
# Run evaluation tests with pytest
pytest test_llm_evaluation.py -v
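When a metric fails, pytest reports the failing test together with the AssertionError details. The exact message format may differ between deepeval versions; an illustrative example: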
FAILED test_llm_evaluation.py::test_llm_response_quality
AssertionError: Metric 'FaithfulnessMetric' failed.
Score: 0.65
Threshold: 0.80
Reason: The claim about mental well-being is not fully supported by the retrieval context.