Implementation:Confident ai Deepeval Assert Test Function

From Leeroopedia

Overview

assert_test is a function in the deepeval library that evaluates a single test case against one or more metrics and raises an AssertionError if any metric score falls below its configured threshold. This enables pytest-native LLM evaluation testing and CI/CD pipeline integration.

This is an API Doc implementation.

Source

  • Repository: Confident AI Deepeval
  • File: deepeval/evaluate/evaluate.py, lines 71-182
  • Function: assert_test

Import

from deepeval import assert_test

Function Signature

def assert_test(
    test_case: Optional[Union[LLMTestCase, ConversationalTestCase]] = None,
    metrics: Optional[List[BaseMetric]] = None,
    golden: Optional[Golden] = None,
    observed_callback: Optional[Callable] = None,
    run_async: bool = True
) -> None

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| test_case | Optional[Union[LLMTestCase, ConversationalTestCase]] | No | None | The test case to evaluate. Contains the input, actual output, and any context fields required by the metrics. |
| metrics | Optional[List[BaseMetric]] | No | None | The list of evaluation metrics to apply. Each metric has its own threshold; the assertion fails if any metric's score falls below its threshold. |
| golden | Optional[Golden] | No | None | An optional Golden dataset entry that can provide expected outputs and context. Used as an alternative to specifying these fields directly in the test case. |
| observed_callback | Optional[Callable] | No | None | An optional callback function invoked with the evaluation results after metrics are computed. Useful for custom logging or side effects. |
| run_async | bool | No | True | Whether to run metric evaluations asynchronously for performance. |
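
To illustrate why run_async can help, the sketch below compares awaiting several metric evaluations together against awaiting them one at a time. It is not deepeval's internal code: score_metric, score_concurrently, and score_sequentially are hypothetical names, and the sleep merely stands in for the latency of an LLM-judged metric.

```python
import asyncio
from typing import List


async def score_metric(name: str, delay: float) -> str:
    # Hypothetical metric call; the sleep stands in for LLM-judge latency.
    await asyncio.sleep(delay)
    return name


async def score_concurrently(names: List[str], delay: float = 0.01) -> List[str]:
    # All metric calls are awaited together, so wall time is close to one delay.
    # asyncio.gather returns results in the order the awaitables were passed.
    return list(await asyncio.gather(*(score_metric(n, delay) for n in names)))


async def score_sequentially(names: List[str], delay: float = 0.01) -> List[str]:
    # Each metric call finishes before the next begins, so the delays add up.
    return [await score_metric(n, delay) for n in names]
```

Both helpers return the same results in the same order; only the elapsed time differs, which is the trade-off the run_async flag is described as controlling.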

Input / Output

  • Inputs: A test case and a list of metrics to evaluate against.
  • Outputs: Returns None on success (all metrics pass their thresholds). Raises AssertionError on failure, with diagnostic information including the metric name, computed score, and threshold.

Behavior

  1. Each metric in the metrics list is applied to the test case.
  2. The metric computes a score (0.0 to 1.0) based on its evaluation logic.
  3. If the score is greater than or equal to the metric's threshold, the metric passes.
  4. If any metric's score falls below its threshold, an AssertionError is raised.
  5. The error message includes details about which metric(s) failed and their scores.
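
The steps above can be sketched in plain Python. This is a minimal illustration of the pass/fail rule only, not the deepeval implementation: ScoredMetric and assert_all_pass are hypothetical names, the scores are assumed to be precomputed, and the error message format is invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredMetric:
    """Hypothetical stand-in for a metric that has already been scored."""
    name: str
    score: float      # assumed to lie in [0.0, 1.0]
    threshold: float  # minimum passing score

    def is_successful(self) -> bool:
        # A metric passes when its score meets or exceeds its threshold.
        return self.score >= self.threshold


def assert_all_pass(metrics: List[ScoredMetric]) -> None:
    """Raise AssertionError naming every metric whose score is below its threshold."""
    failed = [m for m in metrics if not m.is_successful()]
    if failed:
        details = ", ".join(
            f"{m.name} (score: {m.score}, threshold: {m.threshold})" for m in failed
        )
        raise AssertionError(f"Metrics failed: {details}")
```

Note that a score exactly equal to the threshold passes, and that the function returns None when nothing fails, mirroring the behavior described for assert_test.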

Example

Basic Assertion Test

from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital of France."
)

assert_test(
    test_case=test_case,
    metrics=[AnswerRelevancyMetric(threshold=0.7)]
)

Pytest Integration

from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase


def test_llm_response_quality():
    test_case = LLMTestCase(
        input="What are the benefits of exercise?",
        actual_output="Exercise improves cardiovascular health and mental well-being.",
        retrieval_context=[
            "Regular exercise improves cardiovascular health.",
            "Physical activity is linked to better mental health."
        ]
    )

    assert_test(
        test_case=test_case,
        metrics=[
            AnswerRelevancyMetric(threshold=0.7),
            FaithfulnessMetric(threshold=0.8)
        ]
    )

CI/CD Pipeline Usage

# Run evaluation tests with pytest
pytest test_llm_evaluation.py -v

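When a metric fails, pytest reports the test as failed and the AssertionError carries the diagnostic information. The output below is illustrative; the exact message format may differ between deepeval versions: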

FAILED test_llm_evaluation.py::test_llm_response_quality
AssertionError: Metric 'FaithfulnessMetric' failed.
  Score: 0.65
  Threshold: 0.80
  Reason: The claim about mental well-being is not fully supported by the retrieval context.
