Implementation:Confident ai Deepeval Assert Test Function

Overview

assert_test is a function in the deepeval library that evaluates a single test case against one or more metrics and raises an AssertionError if any metric score falls below its configured threshold. This enables pytest-native LLM evaluation testing and CI/CD pipeline integration.

This is an API Doc implementation.

Source

Repository: Confident AI Deepeval
File: deepeval/evaluate/evaluate.py, lines 71-182
Function: assert_test

Import

from deepeval import assert_test

Function Signature

def assert_test(
    test_case: Optional[Union[LLMTestCase, ConversationalTestCase]] = None,
    metrics: Optional[List[BaseMetric]] = None,
    golden: Optional[Golden] = None,
    observed_callback: Optional[Callable] = None,
    run_async: bool = True
) -> None

Parameters

Parameter	Type	Required	Default	Description
`test_case`	`Optional[Union[LLMTestCase, ConversationalTestCase]]`	No	`None`	The test case to evaluate. Contains the input, actual output, and any context fields required by the metrics.
`metrics`	`Optional[List[BaseMetric]]`	No	`None`	The list of evaluation metrics to apply. Each metric has its own threshold; the assertion fails if any metric's score falls below its threshold.
`golden`	`Optional[Golden]`	No	`None`	An optional Golden dataset entry that can provide expected outputs and context. Used as an alternative to specifying these fields directly in the test case.
`observed_callback`	`Optional[Callable]`	No	`None`	An optional callback function invoked with the evaluation results after metrics are computed. Useful for custom logging or side effects.
`run_async`	`bool`	No	`True`	Whether to run metric evaluations asynchronously for performance.
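
The performance benefit behind run_async can be illustrated with a small standalone sketch. It assumes metric evaluation is dominated by waiting on a slow model call. The names score_metric, score_concurrently, and score_sequentially are hypothetical, and the sketch does not reproduce how deepeval schedules metrics internally.

```python
import asyncio
from typing import List


async def score_metric(name: str, delay: float) -> str:
    # Hypothetical stand-in for a metric whose evaluation waits on a slow model call.
    await asyncio.sleep(delay)
    return name


async def score_concurrently(names: List[str], delay: float) -> List[str]:
    # All evaluations wait at the same time, so the total is roughly one delay.
    return await asyncio.gather(*(score_metric(n, delay) for n in names))


def score_sequentially(names: List[str], delay: float) -> List[str]:
    # Each evaluation finishes before the next starts: roughly len(names) * delay.
    return [asyncio.run(score_metric(n, delay)) for n in names]
```

Both helpers return the metric names in input order. Only the elapsed time differs, which is why an asynchronous mode pays off most when several metrics are evaluated together.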

Input / Output

Inputs: A test case and a list of metrics to evaluate against.
Outputs: Returns None on success (all metrics pass their thresholds). Raises AssertionError on failure, with diagnostic information including the metric name, computed score, and threshold.

Behavior

Each metric in the metrics list is applied to the test case.
The metric computes a score (0.0 to 1.0) based on its evaluation logic.
If the score is greater than or equal to the metric's threshold, the metric passes.
If any metric's score falls below its threshold, an AssertionError is raised.
The error message includes details about which metric(s) failed and their scores.
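
The pass/fail rule described above can be sketched in plain Python. This is a simplified model under stated assumptions, not deepeval's actual implementation: ScoredMetric and assert_all_pass are hypothetical names, and scores are supplied directly rather than computed by an evaluator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredMetric:
    """Hypothetical stand-in for a metric that has already been scored."""
    name: str
    score: float      # computed score, expected in [0.0, 1.0]
    threshold: float  # minimum score required to pass

    def is_successful(self) -> bool:
        # A metric passes when its score meets or exceeds its threshold.
        return self.score >= self.threshold


def assert_all_pass(metrics: List[ScoredMetric]) -> None:
    """Raise AssertionError naming every metric that is below its threshold."""
    failed = [m for m in metrics if not m.is_successful()]
    if failed:
        details = ", ".join(
            f"{m.name} (score: {m.score}, threshold: {m.threshold})"
            for m in failed
        )
        raise AssertionError(f"Metrics failed: {details}")
```

In this sketch a score exactly equal to the threshold passes, matching the "greater than or equal to" rule, and the error message lists only the metrics that failed.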

Example

Basic Assertion Test

from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital of France."
)

# Returns None if the relevancy score is at least 0.7; otherwise raises AssertionError.
assert_test(
    test_case=test_case,
    metrics=[AnswerRelevancyMetric(threshold=0.7)]
)

Pytest Integration

from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase


def test_llm_response_quality():
    test_case = LLMTestCase(
        input="What are the benefits of exercise?",
        actual_output="Exercise improves cardiovascular health and mental well-being.",
        retrieval_context=[
            "Regular exercise improves cardiovascular health.",
            "Physical activity is linked to better mental health."
        ]
    )

    assert_test(
        test_case=test_case,
        metrics=[
            AnswerRelevancyMetric(threshold=0.7),
            FaithfulnessMetric(threshold=0.8)
        ]
    )

CI/CD Pipeline Usage

# Run evaluation tests with pytest
pytest test_llm_evaluation.py -v

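When a metric fails, pytest reports the failing test together with diagnostic information. The listing below is illustrative; the exact message format may differ between deepeval versions: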

FAILED test_llm_evaluation.py::test_llm_response_quality
AssertionError: Metric 'FaithfulnessMetric' failed.
  Score: 0.65
  Threshold: 0.80
  Reason: The claim about mental well-being is not fully supported by the retrieval context.
