Implementation:Confident ai Deepeval Golden

From Leeroopedia
Metadata
Last Updated 2026-02-14 09:00 GMT

Overview

Golden is a concrete data model class that represents a golden test case for agent evaluation. It is a Pydantic BaseModel that encapsulates the ground-truth specification for a single evaluation scenario: the input query, the expected output, contextual information, and the expected tool calls.

Description

The Golden class serves as the fundamental unit of evaluation data in DeepEval. Each Golden instance represents one test case with a defined input and optional expected behavior. Golden objects are collected into EvaluationDataset instances for batch evaluation.

Key capabilities:

  • Structured ground truth -- provides typed fields for all aspects of expected agent behavior.
  • Flexible specification -- only input is required; all other fields are optional, allowing test cases to specify only the aspects that matter for a given evaluation scenario.
  • Tool call expectations -- supports expected_tools for evaluating tool selection correctness.
  • Source tracking -- the source_file field enables tracing test cases back to their origin.
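  • Pydantic validation -- fields are validated when a Golden is constructed, so omitting the required input raises a Pydantic ValidationError instead of producing an incomplete test case.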

Usage

Import and create golden test cases:

from deepeval.dataset import Golden

Code Reference

Source Location

  • Repository: confident-ai/deepeval
  • File: deepeval/dataset/golden.py (lines 8-105)

Signature

class Golden(BaseModel):
    input: str
    expected_output: Optional[str] = None
    context: Optional[List[str]] = None
    expected_tools: Optional[List[ToolCall]] = None
    additional_metadata: Optional[Dict[str, Any]] = None
    source_file: Optional[str] = None
    ...

Import

from deepeval.dataset import Golden

Parent Class

  • BaseModel (from Pydantic)

I/O Contract

Fields

Field Contract
Name | Type | Default | Description
input | str | REQUIRED | The user query or task description that the agent receives.
expected_output | Optional[str] | None | The ideal or reference response the agent should produce.
context | Optional[List[str]] | None | List of contextual documents or information relevant to the expected behavior.
expected_tools | Optional[List[ToolCall]] | None | List of tool calls the agent is expected to make, defined as ToolCall objects.
additional_metadata | Optional[Dict[str, Any]] | None | Arbitrary key-value metadata for organizing and filtering test cases.
source_file | Optional[str] | None | Path to the source file from which this golden test case was derived.
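
To make the field contract concrete, here is a minimal sketch built on a stand-in Pydantic model that mirrors the fields listed above. GoldenSketch and ToolCallSketch are hypothetical names introduced only so the example runs with Pydantic alone; they are not DeepEval classes, and the real Golden may define further fields.

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, ValidationError


class ToolCallSketch(BaseModel):
    """Hypothetical stand-in for deepeval.test_case.ToolCall."""

    name: str


class GoldenSketch(BaseModel):
    """Hypothetical stand-in mirroring the field contract above."""

    input: str
    expected_output: Optional[str] = None
    context: Optional[List[str]] = None
    expected_tools: Optional[List[ToolCallSketch]] = None
    additional_metadata: Optional[Dict[str, Any]] = None
    source_file: Optional[str] = None


# Only `input` is supplied; every other field falls back to None.
minimal = GoldenSketch(input="What's the weather?")
print(minimal.expected_output, minimal.context)

# Omitting the required `input` field is rejected at construction time.
try:
    GoldenSketch(expected_output="The weather is sunny.")
    missing_input_rejected = False
except ValidationError:
    missing_input_rejected = True

print(missing_input_rejected)
```

The same pattern should apply to the real class: construct it with input alone, then add optional fields only where a given evaluation needs them.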

Usage Examples

Example 1: Simple Golden Test Case

Create a basic golden object with input and expected output.

from deepeval.dataset import Golden

golden = Golden(
    input="What's the weather in San Francisco?",
    expected_output="The weather is sunny.",
)

Example 2: Golden with Expected Tool Calls

Create a golden object that specifies expected tool usage.

from deepeval.dataset import Golden
from deepeval.test_case import ToolCall

golden = Golden(
    input="What's the weather?",
    expected_output="The weather is sunny.",
    expected_tools=[ToolCall(name="get_weather")],
)
  • The expected_tools field enables the ToolUseMetric to evaluate whether the agent selected the correct tools.
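
One simple way to picture tool-selection checking is to compare the names of the expected tools with the names of the tools the agent actually called. The sketch below is a simplified illustration under that assumption, not DeepEval's scoring logic; tool_name_recall and the plain name lists are hypothetical.

```python
from typing import List


def tool_name_recall(expected: List[str], called: List[str]) -> float:
    """Fraction of expected tool names that appear among the called tools.

    Hypothetical helper for illustration; returns 1.0 when nothing is expected.
    """
    if not expected:
        return 1.0
    called_names = set(called)
    hits = sum(1 for name in expected if name in called_names)
    return hits / len(expected)


# In practice the names could come from a golden, e.g.
# [tool.name for tool in golden.expected_tools].
expected_names = ["get_weather"]

print(tool_name_recall(expected_names, ["get_weather"]))
print(tool_name_recall(expected_names, ["search_web"]))
```

A real metric may also weigh tool arguments or call order; this sketch looks at names only.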

Example 3: Golden with Context and Metadata

Create a fully specified golden test case with context and metadata.

from deepeval.dataset import Golden

golden = Golden(
    input="Summarize the quarterly report",
    expected_output="Revenue increased 15% year-over-year...",
    context=["Q3 2025 revenue was $1.2B, up from $1.04B in Q3 2024."],
    additional_metadata={"category": "finance", "difficulty": "medium"},
    source_file="test_cases/finance.json",
)
  • The context field provides reference information for context-aware evaluation.
  • The additional_metadata field enables filtering and categorization of test cases.
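
As a sketch of the filtering that additional_metadata makes possible, the helper below selects goldens whose metadata matches the given key-value pairs. filter_by_metadata is a hypothetical helper, not a DeepEval function, and SimpleNamespace objects stand in for Golden instances so the snippet runs without DeepEval; any object with an additional_metadata attribute should behave the same way.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def filter_by_metadata(goldens: Iterable[Any], **criteria: Any) -> List[Any]:
    """Return goldens whose additional_metadata matches every criterion."""
    selected = []
    for golden in goldens:
        # Treat a missing or None metadata field as an empty mapping.
        metadata = getattr(golden, "additional_metadata", None) or {}
        if all(metadata.get(key) == value for key, value in criteria.items()):
            selected.append(golden)
    return selected


# Stand-ins for Golden objects (assumption: only these attributes are needed).
goldens = [
    SimpleNamespace(
        input="Summarize the quarterly report",
        additional_metadata={"category": "finance", "difficulty": "medium"},
    ),
    SimpleNamespace(input="What's the weather?", additional_metadata=None),
]

finance = filter_by_metadata(goldens, category="finance")
print([g.input for g in finance])
```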
