Implementation:Confident ai Deepeval EvaluationResult
Overview
EvaluationResult is a data class in the deepeval library that encapsulates the outcomes of a batch evaluation run. It provides structured access to per-case test results and an optional link to the Confident AI cloud dashboard for interactive visualization and persistent storage.
This is an API Doc implementation.
Source
- Repository: Confident AI Deepeval
- File: deepeval/evaluate/types.py, lines 1-50
- Class: EvaluationResult
Import
from deepeval.evaluate.types import EvaluationResult
Class Attributes
| Attribute | Type | Description |
|---|---|---|
| test_results | List[TestResult] | A list of TestResult objects, one per evaluated test case. Each TestResult contains the test case data, per-metric scores, pass/fail statuses, and optional reasoning. |
| confident_link | Optional[str] | A URL to the evaluation results on the Confident AI cloud dashboard. Present only when the user is authenticated with a valid CONFIDENT_API_KEY. |
Input / Output
- Inputs: EvaluationResult is returned by the evaluate() function. It is not typically constructed directly by the user.
- Outputs: Provides access to:
  - test_results -- Iterable list of per-case results for programmatic analysis.
  - confident_link -- URL for viewing results in the cloud dashboard.
TestResult Structure
Each TestResult object within test_results contains:
- metrics_data -- A list of metric evaluation results, each with the metric name, score, threshold, success status, and optional reason.
- input -- The original test case input.
- actual_output -- The LLM's response.
- expected_output -- The expected reference output (if provided).
- context -- The ground-truth context (if provided).
- retrieval_context -- The retrieved context (if provided).
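The fields listed above lend themselves to a flat, one-row-per-metric view that is easy to log or export. The following is a minimal sketch under stated assumptions: `StubMetricData`, `StubTestResult`, and `flatten_results` are hypothetical names introduced here for illustration, and the stub classes mirror only the fields named in this section rather than deepeval's actual class definitions. The helper relies solely on attribute access, so it should work on any objects exposing the same fields.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Hypothetical stand-ins that mirror only the fields described above.
# They are NOT deepeval's real classes; they just make the sketch runnable offline.
@dataclass
class StubMetricData:
    name: str
    score: Optional[float]
    threshold: float
    success: bool
    reason: Optional[str] = None


@dataclass
class StubTestResult:
    input: str
    actual_output: str
    metrics_data: List[StubMetricData] = field(default_factory=list)
    expected_output: Optional[str] = None
    context: Optional[List[str]] = None
    retrieval_context: Optional[List[str]] = None


def flatten_results(test_results) -> List[Dict[str, Any]]:
    """Return one flat dict per (test case, metric) pair."""
    rows: List[Dict[str, Any]] = []
    for test_result in test_results:
        # Treat a missing metrics list as empty (a defensive assumption).
        for metric_data in test_result.metrics_data or []:
            rows.append(
                {
                    "input": test_result.input,
                    "actual_output": test_result.actual_output,
                    "metric": metric_data.name,
                    "score": metric_data.score,
                    "threshold": metric_data.threshold,
                    "success": metric_data.success,
                    "reason": metric_data.reason,
                }
            )
    return rows


# Illustrative data only; the scores and reasons are made up for the sketch.
sample = [
    StubTestResult(
        input="What is deep learning?",
        actual_output="Deep learning is a subset of machine learning.",
        metrics_data=[
            StubMetricData("Answer Relevancy", 0.9, 0.7, True, "On topic."),
            StubMetricData("Faithfulness", 0.5, 0.7, False),
        ],
    )
]
rows = flatten_results(sample)
for row in rows:
    print(row["metric"], row["score"], row["success"])
```

With a real run, passing `result.test_results` in place of `sample` would be the natural usage, assuming the attributes behave as described in this section.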
Example
Accessing Evaluation Results
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase
# Define metrics and test cases
relevancy = AnswerRelevancyMetric(threshold=0.7)
faithfulness = FaithfulnessMetric(threshold=0.7)
test_cases = [
    LLMTestCase(
        input="What is deep learning?",
        actual_output="Deep learning is a subset of machine learning using neural networks.",
        retrieval_context=["Deep learning uses multi-layered neural networks to learn from data."],
    )
]
# Run evaluation
result = evaluate(test_cases=test_cases, metrics=[relevancy, faithfulness])
# Access per-case results
for test_result in result.test_results:
    print(f"Input: {test_result.input}")
    for metric_data in test_result.metrics_data:
        print(f"  Metric: {metric_data.name}")
        print(f"  Score: {metric_data.score}")
        print(f"  Passed: {metric_data.success}")
        print(f"  Reason: {metric_data.reason}")
# Access cloud dashboard link
if result.confident_link:
    print(f"View results at: {result.confident_link}")
Programmatic Analysis
# Compute aggregate pass rate
total_metrics = 0
passed_metrics = 0
for test_result in result.test_results:
    for metric_data in test_result.metrics_data:
        total_metrics += 1
        if metric_data.success:
            passed_metrics += 1
pass_rate = passed_metrics / total_metrics if total_metrics > 0 else 0
print(f"Overall pass rate: {pass_rate:.2%}")
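The overall pass rate above can hide a single metric that fails consistently, so a per-metric breakdown is often a useful complement. The sketch below groups results by metric name; `pass_rate_by_metric` is a hypothetical helper introduced here, and the `SimpleNamespace` objects are stand-ins that carry only the `metrics_data`, `name`, and `success` attributes used in the examples above, so the block runs without an evaluation model or API key.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict


def pass_rate_by_metric(test_results) -> Dict[str, float]:
    """Return the fraction of passing evaluations for each metric name."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for test_result in test_results:
        # Treat a missing metrics list as empty (a defensive assumption).
        for metric_data in test_result.metrics_data or []:
            total[metric_data.name] += 1
            if metric_data.success:
                passed[metric_data.name] += 1
    # Every name in `total` has at least one entry, so no division by zero.
    return {name: passed[name] / total[name] for name in total}


# Stand-in objects with made-up outcomes; not real deepeval results.
stub_results = [
    SimpleNamespace(
        metrics_data=[
            SimpleNamespace(name="Answer Relevancy", success=True),
            SimpleNamespace(name="Faithfulness", success=False),
        ]
    ),
    SimpleNamespace(
        metrics_data=[
            SimpleNamespace(name="Answer Relevancy", success=True),
            SimpleNamespace(name="Faithfulness", success=True),
        ]
    ),
]
rates = pass_rate_by_metric(stub_results)
for name, rate in rates.items():
    print(f"{name}: {rate:.2%}")
```

Calling `pass_rate_by_metric(result.test_results)` on an actual run would be the intended usage, assuming each metric result exposes `name` and `success` as in the examples above.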