Implementation:Confident ai Deepeval Golden

**Metadata**
Knowledge Sources	DeepEval
Domains	LLM_Evaluation AI_Agents
Last Updated	2026-02-14 09:00 GMT

Overview

Concrete data model class that represents a golden test case for agent evaluation. The Golden class is a Pydantic BaseModel that encapsulates the ground truth specification for a single evaluation scenario, including the input query, expected output, contextual information, and expected tool calls.

Description

The Golden class serves as the fundamental unit of evaluation data in DeepEval. Each Golden instance represents one test case with a defined input and optional expected behavior. Golden objects are collected into EvaluationDataset instances for batch evaluation.
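Conceptually, batch evaluation means running every golden's input through the system under test and then scoring the result against that golden's expectations. The sketch below shows that loop using only the standard library. GoldenRecord, fake_agent, run_batch, and the exact-match score are hypothetical stand-ins for illustration. They are not DeepEval's EvaluationDataset API or any of its metrics.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class GoldenRecord:
    # Hypothetical stand-in for Golden, keeping only the fields this sketch needs.
    input: str
    expected_output: Optional[str] = None


def run_batch(goldens: List[GoldenRecord], agent: Callable[[str], str]) -> List[dict]:
    """Run each golden's input through the agent and record a naive score."""
    results = []
    for golden in goldens:
        actual = agent(golden.input)
        # Exact string match is a placeholder for a real metric.
        # Goldens without an expected_output are run but left unscored.
        if golden.expected_output is None:
            score = None
        else:
            score = float(actual == golden.expected_output)
        results.append({"input": golden.input, "actual_output": actual, "score": score})
    return results


def fake_agent(query: str) -> str:
    # Hypothetical agent that always returns the same answer.
    return "The weather is sunny."


goldens = [
    GoldenRecord("What's the weather in San Francisco?", "The weather is sunny."),
    GoldenRecord("Summarize the quarterly report", "Revenue increased 15% year-over-year."),
    GoldenRecord("Say hello"),
]
results = run_batch(goldens, fake_agent)
for row in results:
    print(row["input"], "->", row["score"])
```

Leaving unscored goldens as None, rather than 0.0, reflects the flexible field contract. A golden that specifies only an input can still be run, but there is nothing to compare its output against.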

Key capabilities:

Structured ground truth -- provides typed fields for all aspects of expected agent behavior.
Flexible specification -- only input is required; all other fields are optional, allowing test cases to specify only the aspects that matter for a given evaluation scenario.
Tool call expectations -- supports expected_tools for evaluating tool selection correctness.
Source tracking -- the source_file field enables tracing test cases back to their origin.
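Pydantic validation -- fields are validated when an instance is constructed, so a malformed golden (for example, one missing the required input) fails immediately with a validation error instead of surfacing later during evaluation.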
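Source tracking becomes useful once goldens from several files are merged into one dataset. The following is a minimal sketch, assuming goldens are plain in-memory objects. It groups test-case inputs by source_file to show where each golden came from. GoldenRecord and group_by_source are hypothetical names, and test_cases/weather.json is an illustrative path that does not come from the DeepEval source.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class GoldenRecord:
    # Hypothetical stand-in for Golden with only the fields used here.
    input: str
    source_file: Optional[str] = None


def group_by_source(goldens: List[GoldenRecord]) -> Dict[str, List[str]]:
    """Map each source_file to the inputs of the goldens derived from it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for golden in goldens:
        # Goldens with no recorded origin are collected under a placeholder key.
        groups[golden.source_file or "<unknown>"].append(golden.input)
    return dict(groups)


goldens = [
    GoldenRecord("Summarize the quarterly report", "test_cases/finance.json"),
    GoldenRecord("What's the weather?", "test_cases/weather.json"),
    GoldenRecord("Compare Q2 and Q3 revenue", "test_cases/finance.json"),
    GoldenRecord("Say hello"),
]
print(group_by_source(goldens))
```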

Usage

Import the class from deepeval.dataset (construction is shown in the Usage Examples section):

from deepeval.dataset import Golden

Code Reference

Source Location

Repository: confident-ai/deepeval
File: deepeval/dataset/golden.py (lines 8-105)

Signature

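# Imports needed to read this excerpt as standalone Python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel
from deepeval.test_case import ToolCall

class Golden(BaseModel):
    input: str  # the only required field
    expected_output: Optional[str] = None
    context: Optional[List[str]] = None
    expected_tools: Optional[List[ToolCall]] = None
    additional_metadata: Optional[Dict[str, Any]] = None
    source_file: Optional[str] = None
    ...  # remaining fields elided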

Import

from deepeval.dataset import Golden

Parent Class

BaseModel (from Pydantic)

I/O Contract

Fields

**Field Contract**
Name	Type	Default	Description
`input`	str	REQUIRED	The user query or task description that the agent receives.
`expected_output`	Optional[str]	`None`	The ideal or reference response the agent should produce.
`context`	Optional[List[str]]	`None`	List of contextual documents or information relevant to the expected behavior.
`expected_tools`	Optional[List[ToolCall]]	`None`	List of tool calls the agent is expected to make, defined as `ToolCall` objects.
`additional_metadata`	Optional[Dict[str, Any]]	`None`	Arbitrary key-value metadata for organizing and filtering test cases.
`source_file`	Optional[str]	`None`	Path to the source file from which this golden test case was derived.
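The field contract above says that only input is required and every other field defaults to None. The sketch below mirrors that contract with the standard library's dataclasses instead of Pydantic, so it runs without dependencies. GoldenSketch and ToolCallSketch are hypothetical stand-ins. The __post_init__ check is a crude imitation of type validation, and Pydantic's real validation is considerably richer.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ToolCallSketch:
    # Hypothetical stand-in for deepeval.test_case.ToolCall (name only).
    name: str


@dataclass
class GoldenSketch:
    # Mirrors the documented contract: input is required, the rest default to None.
    input: str
    expected_output: Optional[str] = None
    context: Optional[List[str]] = None
    expected_tools: Optional[List[ToolCallSketch]] = None
    additional_metadata: Optional[Dict[str, Any]] = None
    source_file: Optional[str] = None

    def __post_init__(self) -> None:
        # Crude stand-in for Pydantic's type check on the required field.
        if not isinstance(self.input, str):
            raise TypeError(f"input must be str, got {type(self.input).__name__}")


minimal = GoldenSketch(input="What's the weather?")
print(minimal.expected_output, minimal.expected_tools)  # None None

try:
    GoldenSketch()  # omits the required input field
except TypeError as exc:
    print("rejected:", exc)
```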

Usage Examples

Example 1: Simple Golden Test Case

Create a basic golden object with input and expected output.

from deepeval.dataset import Golden

golden = Golden(
    input="What's the weather in San Francisco?",
    expected_output="The weather is sunny.",
)

Example 2: Golden with Expected Tool Calls

Create a golden object that specifies expected tool usage.

from deepeval.dataset import Golden
from deepeval.test_case import ToolCall

golden = Golden(
    input="What's the weather?",
    expected_output="The weather is sunny.",
    expected_tools=[ToolCall(name="get_weather")],
)

The expected_tools field enables the ToolUseMetric to evaluate whether the agent selected the correct tools.
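To show the kind of comparison that expected_tools makes possible, the sketch below computes the fraction of expected tool names that the agent actually called. This is a hypothetical, simplified scoring rule. It ignores arguments, call order, and duplicate calls, and it is not the algorithm used by ToolUseMetric or any other DeepEval metric. ToolCallSketch and tool_selection_score are illustrative names, and web_search and get_forecast are made-up tool names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToolCallSketch:
    # Hypothetical stand-in for deepeval.test_case.ToolCall, keyed by name only.
    name: str


def tool_selection_score(expected: List[ToolCallSketch], called: List[ToolCallSketch]) -> float:
    """Fraction of expected tool names that appear among the agent's calls."""
    if not expected:
        return 1.0  # nothing was expected, so nothing can be missing
    called_names = {call.name for call in called}
    hits = sum(1 for call in expected if call.name in called_names)
    return hits / len(expected)


expected = [ToolCallSketch("get_weather")]
print(tool_selection_score(expected, [ToolCallSketch("get_weather")]))  # 1.0
print(tool_selection_score(expected, [ToolCallSketch("web_search")]))   # 0.0
```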

Example 3: Golden with Context and Metadata

Create a fully specified golden test case with context and metadata.

from deepeval.dataset import Golden

golden = Golden(
    input="Summarize the quarterly report",
    expected_output="Revenue increased 15% year-over-year...",
    context=["Q3 2025 revenue was $1.2B, up from $1.04B in Q3 2024."],
    additional_metadata={"category": "finance", "difficulty": "medium"},
    source_file="test_cases/finance.json",
)

The context field provides reference information for context-aware evaluation.
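Because context serves as reference material, it can also be used to sanity-check the golden itself. In Example 3, revenue growing from $1.04B to $1.2B is (1.2 - 1.04) / 1.04, which is about 15.4%, so "15%" in the expected output is a rounded but consistent figure. The following is a minimal sketch of that hand check. It is a hypothetical consistency test, not a DeepEval metric, and the regular expression assumes figures written as "$<number>B".

```python
import re

context = ["Q3 2025 revenue was $1.2B, up from $1.04B in Q3 2024."]
expected_output = "Revenue increased 15% year-over-year..."

# Extract the dollar figures (in billions) from the context sentence, in order.
current, prior = (float(x) for x in re.findall(r"\$(\d+(?:\.\d+)?)B", context[0]))

growth_pct = (current - prior) / prior * 100
print(f"{growth_pct:.1f}%")  # 15.4%

# The expected output states the growth rounded to a whole percent.
consistent = f"{round(growth_pct)}%" in expected_output
print(consistent)  # True
```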
The additional_metadata field enables filtering and categorization of test cases.
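As an illustration of that filtering and categorization, the sketch below represents golden-like records as plain dicts keyed by the documented field names. It selects records by metadata key/value pairs and counts test cases per category. filter_by_metadata is a hypothetical helper rather than a DeepEval API, and every record except the first is made up for the example.

```python
from collections import Counter
from typing import Any, Dict, List

# Golden-like records as plain dicts keyed by the documented field names.
goldens: List[Dict[str, Any]] = [
    {"input": "Summarize the quarterly report",
     "additional_metadata": {"category": "finance", "difficulty": "medium"}},
    {"input": "What's the weather?",
     "additional_metadata": {"category": "weather", "difficulty": "easy"}},
    {"input": "Forecast next quarter's revenue",
     "additional_metadata": {"category": "finance", "difficulty": "hard"}},
    {"input": "Say hello"},  # no metadata at all
]


def filter_by_metadata(records: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Keep records whose additional_metadata matches every key/value in criteria."""
    selected = []
    for record in records:
        metadata = record.get("additional_metadata") or {}
        if all(metadata.get(key) == value for key, value in criteria.items()):
            selected.append(record)
    return selected


finance = filter_by_metadata(goldens, category="finance")
print([g["input"] for g in finance])

# Count test cases per category, treating missing metadata as "uncategorized".
per_category = Counter(
    (g.get("additional_metadata") or {}).get("category", "uncategorized") for g in goldens
)
print(per_category)
```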

Related Pages

Principle:Confident_ai_Deepeval_Evaluation_Dataset_Preparation
