Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Workflow:Anthropics Anthropic sdk python Structured Output Extraction

From Leeroopedia
Knowledge Sources
Domains LLMs, Data_Extraction, Structured_Output
Last Updated 2026-02-15 12:00 GMT

Overview

End-to-end process for extracting structured, typed data from Claude's responses by parsing them into Pydantic models using the Anthropic Python SDK's parse and streaming parse capabilities.

Description

This workflow demonstrates how to obtain structured, validated output from Claude by specifying a Pydantic BaseModel as the desired output format. The SDK's parse() method and streaming equivalent automatically constrain the model's output to match the provided schema and deserialize the response into a typed Python object. This eliminates manual JSON parsing and validation, providing type-safe access to extracted data. The workflow supports both synchronous one-shot parsing and incremental streaming with partial snapshots.

Usage

Execute this workflow when you need to extract structured data from natural language (e.g., extracting order details, parsing entities, classifying text), when building pipelines that require typed inputs from LLM outputs, or when you want guaranteed schema compliance from Claude's responses.

Execution Steps

Step 1: Output Schema Definition

Define the desired output structure as a Pydantic BaseModel. The model's fields, types, and optional validators define the schema that Claude's response must conform to. Nested models, lists, enums, and optional fields are all supported.

Key considerations:

  • Use Pydantic BaseModel with typed fields to define the output structure
  • Field names and types directly translate to the JSON Schema sent to the API
  • Nested models (BaseModel within BaseModel) create nested object schemas
  • Optional fields, default values, and Field validators are respected

Step 2: Parse Request Execution

Call client.messages.parse() instead of client.messages.create(), passing the Pydantic model class as the output_format parameter. The SDK automatically converts the model to a JSON Schema, instructs Claude to produce conforming output, and deserializes the response.

Key considerations:

  • The parse() method accepts the same parameters as create() plus output_format
  • output_format takes a Pydantic BaseModel class (not an instance)
  • The SDK handles JSON Schema generation from the model class automatically
  • The returned object is a ParsedMessage with an additional parsed_output field

Step 3: Parsed Output Access

Access the structured result through the parsed_output property of the returned ParsedMessage. This is a fully instantiated Pydantic model with validated, typed fields ready for use in application logic.

Key considerations:

  • parsed_output is an instance of the specified Pydantic model
  • All Pydantic validation rules apply to the parsed output
  • If parsing fails, the raw text response is still accessible through the message content blocks
  • The message also contains standard fields: stop_reason, usage, model, etc.

Step 4: Streaming Structured Output

For real-time parsing, use client.messages.stream() with the output_format parameter. During streaming, call stream.parsed_snapshot() on text events to get partially parsed objects that update incrementally as more data arrives.

Key considerations:

  • Streaming parse uses the same output_format parameter as non-streaming
  • parsed_snapshot() returns a partial Pydantic model instance (fields may be None until populated)
  • After the stream completes, get_final_message().parsed_output provides the complete parsed object
  • This enables progressive UI updates as structured data fills in

Execution Diagram

GitHub URL

Workflow Repository