Implementation:Confident ai Deepeval Golden
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-14 09:00 GMT |
Overview
Concrete data model class that represents a golden test case for agent evaluation. The Golden class is a Pydantic BaseModel that encapsulates the ground truth specification for a single evaluation scenario, including the input query, expected output, contextual information, and expected tool calls.
Description
The Golden class serves as the fundamental unit of evaluation data in DeepEval. Each Golden instance represents one test case with a defined input and optional expected behavior. Golden objects are collected into EvaluationDataset instances for batch evaluation.
Key capabilities:
- Structured ground truth -- provides typed fields for all aspects of expected agent behavior.
- Flexible specification -- only `input` is required; all other fields are optional, allowing test cases to specify only the aspects that matter for a given evaluation scenario.
- Tool call expectations -- supports `expected_tools` for evaluating tool selection correctness.
- Source tracking -- the `source_file` field enables tracing test cases back to their origin.
- Pydantic validation -- leverages Pydantic's validation to ensure data integrity.
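The "only `input` is required" contract and the Pydantic validation behaviour can be illustrated with a minimal sketch. `GoldenSketch` below is a hypothetical stand-in that mirrors a subset of the documented fields so the example runs with Pydantic alone; it is not the library's class, and the real `Golden` may define further fields and validators.

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, ValidationError


class GoldenSketch(BaseModel):
    """Hypothetical stand-in mirroring a subset of the documented Golden fields."""

    input: str  # the only required field
    expected_output: Optional[str] = None
    context: Optional[List[str]] = None
    additional_metadata: Optional[Dict[str, Any]] = None
    source_file: Optional[str] = None


# Only `input` is supplied; every other field falls back to None.
minimal = GoldenSketch(input="What's the weather in San Francisco?")

# Omitting the required field is rejected by Pydantic at construction time.
try:
    GoldenSketch(expected_output="The weather is sunny.")
    missing_input_rejected = False
except ValidationError:
    missing_input_rejected = True
```

Under these assumptions, a test case that only cares about the input can be declared in one line, while a malformed one fails when it is constructed rather than later during evaluation.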
Usage
Import and create golden test cases:
from deepeval.dataset import Golden
Code Reference
Source Location
- Repository: `confident-ai/deepeval`
- File: `deepeval/dataset/golden.py` (lines 8--105)
Signature
class Golden(BaseModel):
    input: str
    expected_output: Optional[str] = None
    context: Optional[List[str]] = None
    expected_tools: Optional[List[ToolCall]] = None
    additional_metadata: Optional[Dict[str, Any]] = None
    source_file: Optional[str] = None
    ...
Import
from deepeval.dataset import Golden
Parent Class
`BaseModel` (from Pydantic)
I/O Contract
Fields
| Name | Type | Default | Description |
|---|---|---|---|
| `input` | `str` | REQUIRED | The user query or task description that the agent receives. |
| `expected_output` | `Optional[str]` | `None` | The ideal or reference response the agent should produce. |
| `context` | `Optional[List[str]]` | `None` | List of contextual documents or information relevant to the expected behavior. |
| `expected_tools` | `Optional[List[ToolCall]]` | `None` | List of tool calls the agent is expected to make, defined as `ToolCall` objects. |
| `additional_metadata` | `Optional[Dict[str, Any]]` | `None` | Arbitrary key-value metadata for organizing and filtering test cases. |
| `source_file` | `Optional[str]` | `None` | Path to the source file from which this golden test case was derived. |
Usage Examples
Example 1: Simple Golden Test Case
Create a basic golden object with input and expected output.
from deepeval.dataset import Golden

golden = Golden(
    input="What's the weather in San Francisco?",
    expected_output="The weather is sunny.",
)
Example 2: Golden with Expected Tool Calls
Create a golden object that specifies expected tool usage.
from deepeval.dataset import Golden
from deepeval.test_case import ToolCall

golden = Golden(
    input="What's the weather?",
    expected_output="The weather is sunny.",
    expected_tools=[ToolCall(name="get_weather")],
)
- The `expected_tools` field enables the `ToolUseMetric` to evaluate whether the agent selected the correct tools.
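To make "selected the correct tools" concrete, the sketch below compares expected and actually called tools by name. It is a simplified illustration, not the scoring logic of any DeepEval metric: `ToolCallSketch` and `tool_selection_report` are hypothetical names, and only the tool name is modelled (arguments and call order are ignored).

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class ToolCallSketch:
    """Hypothetical stand-in for ToolCall; only the tool name is modelled."""

    name: str


def tool_selection_report(
    expected: List[ToolCallSketch], called: List[ToolCallSketch]
) -> Dict[str, Union[List[str], bool]]:
    """Compare expected and called tools by name, ignoring order and duplicates."""
    expected_names = {tool.name for tool in expected}
    called_names = {tool.name for tool in called}
    return {
        "missing": sorted(expected_names - called_names),
        "unexpected": sorted(called_names - expected_names),
        "match": expected_names == called_names,
    }


# The golden expects one tool; the (hypothetical) agent run called two.
expected = [ToolCallSketch(name="get_weather")]
called = [ToolCallSketch(name="get_weather"), ToolCallSketch(name="search_web")]
report = tool_selection_report(expected, called)
```

A set-based comparison like this treats an extra tool call as a mismatch; whether that is desirable depends on the evaluation scenario.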
Example 3: Golden with Context and Metadata
Create a fully specified golden test case with context and metadata.
from deepeval.dataset import Golden

golden = Golden(
    input="Summarize the quarterly report",
    expected_output="Revenue increased 15% year-over-year...",
    context=["Q3 2025 revenue was $1.2B, up from $1.04B in Q3 2024."],
    additional_metadata={"category": "finance", "difficulty": "medium"},
    source_file="test_cases/finance.json",
)
- The `context` field provides reference information for context-aware evaluation.
- The `additional_metadata` field enables filtering and categorization of test cases.
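Filtering on `additional_metadata` can be sketched with a small helper. `filter_by_metadata` is a hypothetical function, not part of DeepEval, and the goldens are represented by `SimpleNamespace` stand-ins so the example runs without the library; any object exposing an `additional_metadata` attribute would work the same way.

```python
from types import SimpleNamespace


def filter_by_metadata(goldens, **criteria):
    """Return the goldens whose metadata matches every key/value in criteria."""
    selected = []
    for golden in goldens:
        # additional_metadata is optional, so treat None as an empty dict.
        metadata = golden.additional_metadata or {}
        if all(metadata.get(key) == value for key, value in criteria.items()):
            selected.append(golden)
    return selected


# Stand-ins for Golden instances (hypothetical data).
goldens = [
    SimpleNamespace(
        input="Summarize the quarterly report",
        additional_metadata={"category": "finance", "difficulty": "medium"},
    ),
    SimpleNamespace(input="What's the weather?", additional_metadata=None),
]

finance_goldens = filter_by_metadata(goldens, category="finance")
```

Goldens without metadata are simply excluded by any non-empty filter, which keeps the helper safe to use on mixed datasets.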