Implementation:Confident ai Deepeval EvaluationResult

From Leeroopedia

Overview

EvaluationResult is a data class in the deepeval library that encapsulates the outcomes of a batch evaluation run. It provides structured access to per-case test results and an optional link to the Confident AI cloud dashboard for interactive visualization and persistent storage.

This is an API Doc implementation.

Source

Import

from deepeval.evaluate.types import EvaluationResult

Class Attributes

Attribute | Type | Description
test_results | List[TestResult] | A list of TestResult objects, one per test case evaluated. Each TestResult contains the test case data, per-metric scores, pass/fail statuses, and optional reasoning.
confident_link | Optional[str] | A URL to the evaluation results on the Confident AI cloud dashboard. Present only when the user is authenticated with a valid CONFIDENT_API_KEY.

Input / Output

  • Inputs: None supplied directly. EvaluationResult is returned by the evaluate() function and is not typically constructed by the user.
  • Outputs: Provides access to:
    • test_results -- List of per-case results for programmatic analysis.
    • confident_link -- URL for viewing results in the cloud dashboard.
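
Because confident_link is optional, code that consumes the result generally has to handle the case where no dashboard URL is available. The following is a minimal sketch under stated assumptions: summarize_run is a hypothetical helper name (not part of deepeval), and the object passed in is a stand-in built with SimpleNamespace rather than a real evaluate() return value.

```python
from types import SimpleNamespace
from typing import Optional


def summarize_run(result) -> str:
    """One-line summary for any object exposing test_results and confident_link."""
    count = len(result.test_results)
    link: Optional[str] = result.confident_link
    # Fall back to a local-only note when no dashboard URL is present.
    location = link if link else "local only (no dashboard link)"
    return f"{count} test case(s) evaluated; results: {location}"


# Hypothetical stand-in for an EvaluationResult (not produced by deepeval).
offline_result = SimpleNamespace(test_results=["case-1", "case-2"], confident_link=None)
print(summarize_run(offline_result))
```

The helper relies only on the two attributes documented above, so it should also accept a real EvaluationResult, although that has not been verified here.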

TestResult Structure

Each TestResult object within test_results contains:

  • metrics_data -- A list of metric evaluation results, each with the metric name, score, threshold, success status, and optional reason.
  • input -- The original test case input.
  • actual_output -- The LLM's response.
  • expected_output -- The expected reference output (if provided).
  • context -- The ground-truth context (if provided).
  • retrieval_context -- The retrieved context (if provided).
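
To make the shape described above concrete, here is a minimal sketch using plain dataclasses. MetricDataSketch, CaseResultSketch, and failed_metric_names are hypothetical names introduced for illustration; they mirror only the fields listed in this section and are not the library's actual class definitions. The scores and reason are made-up sample values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetricDataSketch:
    # Hypothetical stand-in mirroring the per-metric fields listed above.
    name: str
    score: Optional[float]
    threshold: float
    success: bool
    reason: Optional[str] = None


@dataclass
class CaseResultSketch:
    # Hypothetical stand-in mirroring the TestResult fields listed above.
    input: str
    actual_output: str
    metrics_data: List[MetricDataSketch] = field(default_factory=list)
    expected_output: Optional[str] = None
    context: Optional[List[str]] = None
    retrieval_context: Optional[List[str]] = None


def failed_metric_names(case_result) -> List[str]:
    """Names of the metrics whose success flag is False for one test case."""
    return [m.name for m in case_result.metrics_data if not m.success]


# Illustrative sample values only; not real evaluation output.
sample = CaseResultSketch(
    input="What is deep learning?",
    actual_output="Deep learning is a subset of machine learning using neural networks.",
    metrics_data=[
        MetricDataSketch("Answer Relevancy", 0.9, 0.7, True),
        MetricDataSketch("Faithfulness", 0.5, 0.7, False, "Illustrative reason only."),
    ],
)
print(failed_metric_names(sample))
```

Optional fields default to None here to reflect the "(if provided)" notes in the list above.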

Example

Accessing Evaluation Results

from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# Define metrics and test cases
relevancy = AnswerRelevancyMetric(threshold=0.7)
faithfulness = FaithfulnessMetric(threshold=0.7)

test_cases = [
    LLMTestCase(
        input="What is deep learning?",
        actual_output="Deep learning is a subset of machine learning using neural networks.",
        retrieval_context=["Deep learning uses multi-layered neural networks to learn from data."]
    )
]

# Run evaluation
result = evaluate(test_cases=test_cases, metrics=[relevancy, faithfulness])

# Access per-case results
for test_result in result.test_results:
    print(f"Input: {test_result.input}")
    # Defensive guard: treat a missing metrics_data as an empty list
    for metric_data in test_result.metrics_data or []:
        print(f"  Metric: {metric_data.name}")
        print(f"  Score: {metric_data.score}")
        print(f"  Passed: {metric_data.success}")
        print(f"  Reason: {metric_data.reason}")

# Access cloud dashboard link
if result.confident_link:
    print(f"View results at: {result.confident_link}")

Programmatic Analysis

# Compute aggregate pass rate
total_metrics = 0
passed_metrics = 0

for test_result in result.test_results:
    # Defensive guard: treat a missing metrics_data as an empty list
    for metric_data in test_result.metrics_data or []:
        total_metrics += 1
        if metric_data.success:
            passed_metrics += 1

pass_rate = passed_metrics / total_metrics if total_metrics > 0 else 0
print(f"Overall pass rate: {pass_rate:.2%}")
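
The overall pass rate can hide a single metric that fails consistently, so a per-metric breakdown is often a useful companion. The following is a sketch under stated assumptions: pass_rate_by_metric is a hypothetical helper (not part of deepeval), and the input is stand-in data built with SimpleNamespace that exposes only the name and success attributes used in the example above.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict


def pass_rate_by_metric(test_results) -> Dict[str, float]:
    """Fraction of passing evaluations per metric name."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for test_result in test_results:
        # Treat a missing metrics_data as an empty list.
        for metric_data in test_result.metrics_data or []:
            total[metric_data.name] += 1
            if metric_data.success:
                passed[metric_data.name] += 1
    return {name: passed[name] / total[name] for name in total}


# Hypothetical stand-in data; not real evaluation output.
fake_results = [
    SimpleNamespace(metrics_data=[
        SimpleNamespace(name="Answer Relevancy", success=True),
        SimpleNamespace(name="Faithfulness", success=False),
    ]),
    SimpleNamespace(metrics_data=[
        SimpleNamespace(name="Answer Relevancy", success=True),
        SimpleNamespace(name="Faithfulness", success=True),
    ]),
]
print(pass_rate_by_metric(fake_results))
```

With real data, the same function could be called as pass_rate_by_metric(result.test_results), assuming the attributes behave as documented on this page.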
