

Principle: anthropics/anthropic-sdk-python Output Schema Definition

From Leeroopedia
Knowledge Sources
Domains Structured_Output, LLM, Data_Extraction
Last Updated 2026-02-15 00:00 GMT

Overview

The Output Schema Definition principle describes how JSON Schema serves as a formal contract between the developer and the LLM for producing structured output. In the Anthropic Python SDK, developers define the desired output structure using Pydantic models, which the SDK then transforms into a normalized JSON Schema that the Anthropic API can enforce during generation. This transformation pipeline ensures that the model's output conforms to a predictable, machine-parseable format.

JSON Schema as a Contract for Structured LLM Output

When requesting structured output from a large language model, the fundamental challenge is bridging the gap between free-form text generation and typed data structures. JSON Schema provides the formal specification language that defines exactly what shape the output must take: which fields are required, what types they must have, how nested objects and arrays are structured, and what constraints apply.
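To make the contract concrete, here is a small hand-written JSON Schema (expressed as a Python dict, as the SDK ultimately sends it). The field names are illustrative, not from the SDK; the schema shows the kinds of constraints described above: required fields, primitive types, a nested array, and a bounded number.

```python
# An illustrative JSON Schema contract for a review object.
# Every keyword here is standard JSON Schema.
review_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "rating": {"type": "number", "minimum": 0, "maximum": 10},
        "pros": {"type": "array", "items": {"type": "string"}},
    },
    # Only outputs containing exactly these fields are valid.
    "required": ["title", "rating", "pros"],
    "additionalProperties": False,
}
```

Any JSON object the model emits either satisfies every keyword in this schema or is rejected, which is what makes the schema a contract rather than a suggestion.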

In the context of the Anthropic API, a JSON Schema is passed as part of the request's output format configuration. The API uses this schema to constrain the model's generation so that its output is guaranteed to be valid JSON conforming to the given schema. This turns the LLM from a text generator into a structured data producer with compile-time-like guarantees.

The key insight is that the schema acts as a bilateral contract:

  • For the developer: It specifies the exact data shape they expect, enabling type-safe deserialization on the client side.
  • For the model: It constrains the generation space, ensuring the output is valid JSON that conforms to the declared structure.

Pydantic Model to JSON Schema Transformation Pipeline

Rather than requiring developers to write raw JSON Schema dictionaries, the SDK leverages Pydantic BaseModel subclasses as the schema definition language. This approach provides several advantages:

  1. Familiar Python types: Developers define schemas using standard Python type annotations (str, int, float, list[str], Optional[str], etc.) on Pydantic model classes.
  2. Automatic schema generation: Pydantic's TypeAdapter.json_schema() method converts the model class into a JSON Schema dictionary automatically.
  3. Runtime validation: The same Pydantic model used to define the schema is later used to validate and deserialize the model's JSON output, ensuring end-to-end type safety.

The transformation pipeline proceeds in three stages:

  1. Define: The developer creates a pydantic.BaseModel subclass with typed fields.
  2. Generate Schema: The SDK calls TypeAdapter(model_class).json_schema() to produce a raw JSON Schema dictionary.
  3. Normalize: The SDK passes the raw schema through transform_schema() which strips unsupported properties, enforces API-required constraints, and produces a clean schema suitable for the API.
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str
    pros: list[str]
    cons: list[str]

This simple class definition encodes the complete output contract: the model must produce a JSON object with exactly these five fields, each of the specified type.
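Stage 2 of the pipeline can be run directly, assuming Pydantic v2: `TypeAdapter(...).json_schema()` turns the class above into a raw JSON Schema dictionary before any SDK normalization is applied.

```python
from pydantic import BaseModel, TypeAdapter


class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str
    pros: list[str]
    cons: list[str]


# Stage 2: generate the raw (un-normalized) JSON Schema dict.
raw_schema = TypeAdapter(MovieReview).json_schema()

# All five fields have no defaults, so Pydantic marks them all required,
# and Python's `float` maps to the JSON Schema type "number".
print(raw_schema["type"])                          # "object"
print(raw_schema["properties"]["rating"]["type"])  # "number"
```

Note that this raw schema is not yet what the API receives; it still has to pass through the normalization step described next.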

Schema Normalization for API Compatibility

The raw JSON Schema produced by Pydantic contains many properties and annotations that the Anthropic API does not support or expects in a specific form. The transform_schema() function performs critical normalization steps:

additionalProperties: false Enforcement

For every object type in the schema, transform_schema() forcibly sets additionalProperties: false. This is required by the Anthropic API and ensures the model cannot produce extra fields beyond what the schema declares. The original value of additionalProperties (if any) is dropped.
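The enforcement step can be sketched as a small recursive function. This is an illustrative sketch of the behavior described above, not the SDK's actual `transform_schema()` implementation:

```python
def enforce_no_extra_fields(schema: dict) -> dict:
    """Recursively force additionalProperties: false on every object schema.

    Sketch of the enforcement step only; the real transform_schema()
    performs this alongside its other normalizations.
    """
    out = dict(schema)
    if out.get("type") == "object":
        # Overwrite whatever value was there, including a previous `true`.
        out["additionalProperties"] = False
        out["properties"] = {
            name: enforce_no_extra_fields(sub)
            for name, sub in out.get("properties", {}).items()
        }
    elif out.get("type") == "array" and isinstance(out.get("items"), dict):
        out["items"] = enforce_no_extra_fields(out["items"])
    return out
```

Because the function recurses through properties and array items, nested objects receive the same treatment as the top-level object.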

Unsupported Property Handling

JSON Schema supports many constraint keywords (e.g., minimum, maximum, pattern, maxLength) that the API does not natively enforce. Rather than silently dropping these constraints, transform_schema() appends them to the description field as a hint to the model. For example, {"type": "integer", "minimum": 1, "maximum": 10} becomes {"type": "integer", "description": "{minimum: 1, maximum: 10}"}. This preserves the developer's intent while conforming to the API's supported schema subset.
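The folding behavior can be sketched as follows. The keyword list here is an assumption for illustration; the real SDK maintains its own set of unsupported keywords:

```python
# Assumed subset of constraint keywords the API does not enforce natively.
UNSUPPORTED = ("minimum", "maximum", "pattern", "maxLength", "minLength")


def fold_constraints_into_description(schema: dict) -> dict:
    """Move unenforceable constraints into the description as a model hint (sketch)."""
    out = dict(schema)
    moved = {k: out.pop(k) for k in list(out) if k in UNSUPPORTED}
    if moved:
        hint = "{" + ", ".join(f"{k}: {v}" for k, v in moved.items()) + "}"
        # Append to any existing description rather than replacing it.
        out["description"] = (out.get("description", "") + " " + hint).strip()
    return out


folded = fold_constraints_into_description(
    {"type": "integer", "minimum": 1, "maximum": 10}
)
# → {"type": "integer", "description": "{minimum: 1, maximum: 10}"}
```

The constraint is no longer machine-enforced, but the model still sees it in the description and can honor it during generation.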

Type-Specific Transformations

  • Objects: Properties are recursively transformed; additionalProperties is set to false.
  • Arrays: The items sub-schema is recursively transformed; only minItems of 0 or 1 are kept natively.
  • Strings: Supported formats (date-time, email, uri, uuid, etc.) are preserved; unsupported formats are moved to the description.
  • Union types: anyOf, oneOf, and allOf variants are each recursively transformed. Notably, oneOf is converted to anyOf for API compatibility.
  • References: $ref pointers are preserved as-is, and $defs blocks are recursively transformed.
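The union-type handling above can be sketched as a small recursive pass. This is an illustration of the documented oneOf-to-anyOf conversion, not the SDK's actual code:

```python
def normalize_unions(schema: dict) -> dict:
    """Convert oneOf to anyOf and recurse into union variants (sketch)."""
    out = dict(schema)
    if "oneOf" in out:
        # The API accepts anyOf but not oneOf, so the variants are relabeled.
        out["anyOf"] = out.pop("oneOf")
    for key in ("anyOf", "allOf"):
        if key in out:
            out[key] = [normalize_unions(variant) for variant in out[key]]
    return out
```

Relabeling oneOf as anyOf is a slight semantic loosening (anyOf permits matching more than one variant), which is why it is a normalization for API compatibility rather than a lossless rewrite.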

Design Rationale

This normalization approach follows the principle of graceful degradation: constraints that cannot be enforced at the API level are still communicated to the model through natural language in the description, maximizing the chance that the model will respect them. Meanwhile, constraints that can be enforced (like the overall object structure and required fields) are strictly maintained.

Related Pages

Implemented By
