Principle: Anthropic Python SDK Output Schema Definition
| Knowledge Sources | |
|---|---|
| Domains | Structured_Output, LLM, Data_Extraction |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
The Output Schema Definition principle describes how JSON Schema serves as a formal contract between the developer and the LLM for producing structured output. In the Anthropic Python SDK, developers define the desired output structure using Pydantic models, which the SDK then transforms into a normalized JSON Schema that the Anthropic API can enforce during generation. This transformation pipeline ensures that the model's output conforms to a predictable, machine-parseable format.
JSON Schema as a Contract for Structured LLM Output
When requesting structured output from a large language model, the fundamental challenge is bridging the gap between free-form text generation and typed data structures. JSON Schema provides the formal specification language that defines exactly what shape the output must take: which fields are required, what types they must have, how nested objects and arrays are structured, and what constraints apply.
In the context of the Anthropic API, a JSON Schema is passed as part of the request's output format configuration. The API uses this schema to constrain the model's generation so that its output is guaranteed to be valid JSON conforming to the given schema. This turns the LLM from a text generator into a structured data producer with compile-time-like guarantees.
The key insight is that the schema acts as a bilateral contract:
- For the developer: It specifies the exact data shape they expect, enabling type-safe deserialization on the client side.
- For the model: It constrains the generation space, ensuring the output is valid JSON that conforms to the declared structure.
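As a minimal sketch of this contract, consider a hand-written JSON Schema for a small review-extraction task (the field names here are illustrative, not from the SDK). With schema-constrained generation, the API guarantees the response text parses as JSON matching this shape, so the client-side deserialization can rely on the declared fields:

```python
import json

# Illustrative JSON Schema contract: a required title and an integer rating.
review_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "rating": {"type": "integer"},
    },
    "required": ["title", "rating"],
    "additionalProperties": False,
}

# Simulated model output; a schema-constrained API response would be
# guaranteed to parse and to carry every required field.
model_output = '{"title": "Arrival", "rating": 9}'
parsed = json.loads(model_output)

# Client side of the contract: type-safe access to the required fields.
assert all(field in parsed for field in review_schema["required"])
```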
Pydantic Model to JSON Schema Transformation Pipeline
Rather than requiring developers to write raw JSON Schema dictionaries, the SDK leverages Pydantic BaseModel subclasses as the schema definition language. This approach provides several advantages:
- Familiar Python types: Developers define schemas using standard Python type annotations (str, int, float, list[str], Optional[str], etc.) on Pydantic model classes.
- Automatic schema generation: Pydantic's TypeAdapter.json_schema() method converts the model class into a JSON Schema dictionary automatically.
- Runtime validation: The same Pydantic model used to define the schema is later used to validate and deserialize the model's JSON output, ensuring end-to-end type safety.
The transformation pipeline proceeds in three stages:
- Define: The developer creates a pydantic.BaseModel subclass with typed fields.
- Generate Schema: The SDK calls TypeAdapter(model_class).json_schema() to produce a raw JSON Schema dictionary.
- Normalize: The SDK passes the raw schema through transform_schema(), which strips unsupported properties, enforces API-required constraints, and produces a clean schema suitable for the API.
```python
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str
    pros: list[str]
    cons: list[str]
```
This simple class definition encodes the complete output contract: the model must produce a JSON object with exactly these five fields, with the specified types for each.
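Assuming Pydantic v2 is available, the raw schema for this class can be generated directly with TypeAdapter, which is the same mechanism the pipeline's Generate Schema stage relies on:

```python
from pydantic import BaseModel, TypeAdapter

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str
    pros: list[str]
    cons: list[str]

# Stage 2 of the pipeline: produce the raw JSON Schema dictionary.
raw_schema = TypeAdapter(MovieReview).json_schema()

# The generated schema declares all five fields and marks them required.
assert raw_schema["type"] == "object"
assert set(raw_schema["required"]) == {"title", "rating", "summary", "pros", "cons"}
```

This raw dictionary is what the Normalize stage subsequently cleans up for the API.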
Schema Normalization for API Compatibility
The raw JSON Schema produced by Pydantic contains many properties and annotations that the Anthropic API does not support or expects in a specific form. The transform_schema() function performs critical normalization steps:
additionalProperties: false Enforcement
For every object type in the schema, transform_schema() forcibly sets additionalProperties: false. This is required by the Anthropic API and ensures the model cannot produce extra fields beyond what the schema declares. The original value of additionalProperties (if any) is dropped.
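The effect of this step can be pictured with a small recursive helper. This is a simplified sketch of the behavior, not the SDK's actual transform_schema() code:

```python
def enforce_no_extra_fields(schema: dict) -> dict:
    """Recursively force additionalProperties: false on every object schema."""
    out = dict(schema)
    if out.get("type") == "object":
        out["additionalProperties"] = False  # any original value is dropped
        out["properties"] = {
            name: enforce_no_extra_fields(sub)
            for name, sub in out.get("properties", {}).items()
        }
    elif out.get("type") == "array" and isinstance(out.get("items"), dict):
        out["items"] = enforce_no_extra_fields(out["items"])
    return out

nested = {
    "type": "object",
    "additionalProperties": True,  # will be overwritten
    "properties": {
        "tags": {"type": "array", "items": {"type": "object", "properties": {}}},
    },
}
result = enforce_no_extra_fields(nested)
assert result["additionalProperties"] is False
assert result["properties"]["tags"]["items"]["additionalProperties"] is False
```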
Unsupported Property Handling
JSON Schema supports many constraint keywords (e.g., minimum, maximum, pattern, maxLength) that the API does not natively enforce. Rather than silently dropping these constraints, transform_schema() appends them to the description field as a hint to the model. For example, {"type": "integer", "minimum": 1, "maximum": 10} becomes {"type": "integer", "description": "{minimum: 1, maximum: 10}"}. This preserves the developer's intent while conforming to the API's supported schema subset.
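One way to picture this step is a helper that pops the unenforceable keywords and folds them into the description. Again, this is an illustrative sketch under the formatting shown above, not the SDK's implementation:

```python
# Constraint keywords the API does not enforce natively (illustrative subset).
UNSUPPORTED = ("minimum", "maximum", "pattern", "maxLength", "minLength")

def fold_constraints_into_description(schema: dict) -> dict:
    """Move unenforceable constraint keywords into the description as a hint."""
    out = dict(schema)
    hints = {key: out.pop(key) for key in UNSUPPORTED if key in out}
    if hints:
        hint_text = "{" + ", ".join(f"{k}: {v}" for k, v in hints.items()) + "}"
        # Append to any existing description rather than replacing it.
        out["description"] = (out.get("description", "") + " " + hint_text).strip()
    return out

folded = fold_constraints_into_description({"type": "integer", "minimum": 1, "maximum": 10})
assert folded == {"type": "integer", "description": "{minimum: 1, maximum: 10}"}
```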
Type-Specific Transformations
- Objects: Properties are recursively transformed; additionalProperties is set to false.
- Arrays: The items sub-schema is recursively transformed; only minItems values of 0 or 1 are kept natively.
- Strings: Supported formats (date-time, email, uri, uuid, etc.) are preserved; unsupported formats are moved to the description.
- Union types: anyOf, oneOf, and allOf variants are each recursively transformed. Notably, oneOf is converted to anyOf for API compatibility.
- References: $ref pointers are preserved as-is, and $defs blocks are recursively transformed.
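The union handling can be sketched with a small recursive rewrite. This simplified example (not the SDK's code) shows oneOf being converted to anyOf while each variant is transformed in turn:

```python
def normalize_unions(schema: dict) -> dict:
    """Recursively transform union variants, rewriting oneOf as anyOf."""
    out = dict(schema)
    if "oneOf" in out:
        out["anyOf"] = out.pop("oneOf")  # the API accepts anyOf, not oneOf
    for keyword in ("anyOf", "allOf"):
        if keyword in out:
            out[keyword] = [normalize_unions(variant) for variant in out[keyword]]
    return out

union = {
    "oneOf": [
        {"type": "string"},
        {"oneOf": [{"type": "integer"}, {"type": "null"}]},  # nested union
    ]
}
result = normalize_unions(union)
assert "oneOf" not in result
assert result["anyOf"][1]["anyOf"] == [{"type": "integer"}, {"type": "null"}]
```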
Design Rationale
This normalization approach follows the principle of graceful degradation: constraints that cannot be enforced at the API level are still communicated to the model through natural language in the description, maximizing the chance that the model will respect them. Meanwhile, constraints that can be enforced (like the overall object structure and required fields) are strictly maintained.