Implementation: Confident AI Deepeval EvaluationResult

Overview

EvaluationResult is a data class in the deepeval library that encapsulates the outcomes of a batch evaluation run. It provides structured access to per-case test results and an optional link to the Confident AI cloud dashboard for interactive visualization and persistent storage.

This is an API Doc implementation.

Source

Repository: Confident AI Deepeval
File: deepeval/evaluate/types.py, lines 1-50
Class: EvaluationResult

Import

```python
from deepeval.evaluate.types import EvaluationResult
```

Class Attributes

| Attribute | Type | Description |
|---|---|---|
| `test_results` | `List[TestResult]` | A list of `TestResult` objects, one per evaluated test case. Each `TestResult` contains the test case data, per-metric scores, pass/fail statuses, and optional reasoning. |
| `confident_link` | `Optional[str]` | A URL to the evaluation results on the Confident AI cloud dashboard. Present only when the user is authenticated with a valid `CONFIDENT_API_KEY`; otherwise `None`. |

Input / Output

Inputs: None supplied by the user. EvaluationResult is returned by the evaluate() function and is not typically constructed directly.
Outputs: Provides access to:
- test_results -- Iterable list of per-case results for programmatic analysis.
- confident_link -- URL for viewing results in the cloud dashboard.
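`test_results` is a plain list, so it can be flattened into one row per (test case, metric) pair for export to a spreadsheet or dataframe. The sketch below uses only the standard library. The function name `to_csv` and the column layout are illustrative choices, not part of deepeval. The helper relies on duck typing over the attributes documented on this page, so the demo uses `SimpleNamespace` stand-ins with made-up values instead of a live `evaluate()` call.

```python
import csv
import io
from types import SimpleNamespace


def to_csv(result) -> str:
    """Flatten result.test_results into CSV text, one row per metric.

    Hypothetical helper: it only reads attributes documented for
    EvaluationResult / TestResult / metric data, so any object with
    the same shape works.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["input", "metric", "score", "threshold", "success", "reason"])
    for test_result in result.test_results:
        # Defensively treat a missing metrics_data list as empty
        for metric_data in test_result.metrics_data or []:
            writer.writerow([
                test_result.input,
                metric_data.name,
                metric_data.score,
                metric_data.threshold,
                metric_data.success,
                metric_data.reason,
            ])
    return buffer.getvalue()


# Demo with stand-in objects shaped like an EvaluationResult (made-up values).
demo = SimpleNamespace(
    test_results=[
        SimpleNamespace(
            input="What is deep learning?",
            metrics_data=[
                SimpleNamespace(name="Faithfulness", score=1.0, threshold=0.7,
                                success=True, reason="Claims match the context."),
            ],
        )
    ],
    confident_link=None,
)
print(to_csv(demo))
```

The returned string can be written to a file or passed to a dataframe reader. One row per metric keeps the export flat even when different test cases were scored by different metrics.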

TestResult Structure

Each TestResult object within test_results contains:

- metrics_data -- A list of metric evaluation results, each with the metric name, score, threshold, success status, and optional reason.
- input -- The original test case input.
- actual_output -- The LLM's response.
- expected_output -- The expected reference output (if provided).
- context -- The ground-truth context (if provided).
- retrieval_context -- The retrieved context (if provided).
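To unit-test analysis code without calling an LLM judge or the Confident AI API, you can build lightweight stand-ins that mirror the fields listed above. This is a sketch under that assumption, not deepeval's actual class definitions. `StubMetricData`, `StubTestResult`, and `StubEvaluationResult` are hypothetical names, they model only the attributes documented on this page, and the scores are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins that mirror only the fields documented on this page.
# They are NOT deepeval classes; use them to exercise analysis code offline.
@dataclass
class StubMetricData:
    name: str
    score: Optional[float]
    threshold: float
    success: bool
    reason: Optional[str] = None


@dataclass
class StubTestResult:
    input: str
    actual_output: str
    metrics_data: List[StubMetricData] = field(default_factory=list)
    expected_output: Optional[str] = None
    context: Optional[List[str]] = None
    retrieval_context: Optional[List[str]] = None


@dataclass
class StubEvaluationResult:
    test_results: List[StubTestResult]
    confident_link: Optional[str] = None  # None when not authenticated with Confident AI


# One test case scored by a single metric (made-up values).
stub = StubEvaluationResult(
    test_results=[
        StubTestResult(
            input="What is deep learning?",
            actual_output="Deep learning is a subset of machine learning using neural networks.",
            metrics_data=[
                StubMetricData(name="Answer Relevancy", score=0.9, threshold=0.7, success=True)
            ],
        )
    ]
)
```

Because analysis code usually reads only these attributes, it can accept either a real `EvaluationResult` or a stub like this one.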

Example

Accessing Evaluation Results

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# Define metrics and test cases
relevancy = AnswerRelevancyMetric(threshold=0.7)
faithfulness = FaithfulnessMetric(threshold=0.7)

test_cases = [
    LLMTestCase(
        input="What is deep learning?",
        actual_output="Deep learning is a subset of machine learning using neural networks.",
        retrieval_context=["Deep learning uses multi-layered neural networks to learn from data."]
    )
]

# Run evaluation
result = evaluate(test_cases=test_cases, metrics=[relevancy, faithfulness])

# Access per-case results
for test_result in result.test_results:
    print(f"Input: {test_result.input}")
    for metric_data in test_result.metrics_data:
        print(f"  Metric: {metric_data.name}")
        print(f"  Score: {metric_data.score}")
        print(f"  Passed: {metric_data.success}")
        print(f"  Reason: {metric_data.reason}")

# Access cloud dashboard link
if result.confident_link:
    print(f"View results at: {result.confident_link}")
```
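When a run has failures, a common next step is to list only the metrics that did not pass, together with their reasons. The following is a minimal sketch that assumes the attributes used in the example above. `failed_metrics` is a hypothetical helper name, and the demo uses `SimpleNamespace` stand-ins with made-up values instead of a live `evaluate()` call.

```python
from types import SimpleNamespace
from typing import List, Optional, Tuple


def failed_metrics(result) -> List[Tuple[str, str, Optional[float], Optional[str]]]:
    """Return (input, metric name, score, reason) for every metric that failed."""
    failures = []
    for test_result in result.test_results:
        # Defensively treat a missing metrics_data list as empty
        for metric_data in test_result.metrics_data or []:
            if not metric_data.success:
                failures.append(
                    (test_result.input, metric_data.name, metric_data.score, metric_data.reason)
                )
    return failures


# Stand-in result with one passing and one failing metric (made-up values).
demo = SimpleNamespace(
    test_results=[
        SimpleNamespace(
            input="What is deep learning?",
            metrics_data=[
                SimpleNamespace(name="Answer Relevancy", score=0.92, success=True, reason="On topic."),
                SimpleNamespace(name="Faithfulness", score=0.5, success=False, reason="Unsupported claim."),
            ],
        )
    ]
)

for case_input, name, score, reason in failed_metrics(demo):
    print(f"[FAIL] {name} ({score}) on {case_input!r}: {reason}")
```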

Programmatic Analysis

```python
# Compute an aggregate pass rate across all metrics and test cases
# (reuses `result` from the example above)
total_metrics = 0
passed_metrics = 0

for test_result in result.test_results:
    # Defensively treat a missing metrics_data list as empty
    for metric_data in test_result.metrics_data or []:
        total_metrics += 1
        if metric_data.success:
            passed_metrics += 1

pass_rate = passed_metrics / total_metrics if total_metrics > 0 else 0.0
print(f"Overall pass rate: {pass_rate:.2%}")
```
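The overall rate above pools every metric together, which can hide a single weak metric. A per-metric breakdown is a small extension. The sketch below shows one way to do it with `collections.defaultdict`. `pass_rate_by_metric` and `case` are hypothetical helpers, and the demo data are `SimpleNamespace` stand-ins with made-up outcomes.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict


def pass_rate_by_metric(result) -> Dict[str, float]:
    """Fraction of scored test cases that passed, computed separately per metric name."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for test_result in result.test_results:
        # Defensively treat a missing metrics_data list as empty
        for metric_data in test_result.metrics_data or []:
            totals[metric_data.name] += 1
            if metric_data.success:
                passes[metric_data.name] += 1
    return {name: passes[name] / totals[name] for name in totals}


def case(*outcomes):
    # Build a stand-in test result from (metric name, success) pairs.
    return SimpleNamespace(
        metrics_data=[SimpleNamespace(name=n, success=s) for n, s in outcomes]
    )


demo = SimpleNamespace(test_results=[
    case(("Answer Relevancy", True), ("Faithfulness", False)),
    case(("Answer Relevancy", True), ("Faithfulness", True)),
])

for name, rate in pass_rate_by_metric(demo).items():
    print(f"{name}: {rate:.0%}")
```

Only metrics that actually produced data are counted, so each rate's denominator is the number of test cases that metric scored, not the total number of test cases.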
