
Principle: Anthropic Python SDK Streaming Structured Output

From Leeroopedia
Knowledge Sources
Domains: Structured_Output, LLM, Data_Extraction
Last Updated: 2026-02-15 00:00 GMT

Overview

The Streaming Structured Output principle describes how the SDK enables progressive access to structured data while the model is still generating its response. By combining incremental JSON parsing with a two-phase validation strategy, the SDK allows developers to display partial results during streaming while guaranteeing full type safety on completion.

Incremental JSON Parsing During Streaming

When a model generates structured output via streaming, the response arrives as a sequence of text deltas. At any point during streaming, the accumulated text represents an incomplete JSON document: it may have unclosed braces, unterminated strings, or missing array elements. Standard JSON parsers reject such partial input, which would prevent the developer from accessing any structured data until the stream completes.

The SDK solves this problem by using a partial JSON parser (provided by the jiter library) that can extract meaningful data from incomplete JSON. This parser operates in "trailing-strings" mode, which handles:

  • Unclosed objects and arrays: The parser returns the data accumulated so far, treating unclosed containers as complete.
  • Unterminated strings: The parser returns the string content received so far, even if the closing quote has not arrived.
  • Missing values: Fields whose values have not started arriving are omitted from the result.

This means that as each text delta arrives and is appended to the snapshot, the SDK can immediately parse the accumulated text into a partial dictionary that represents the "best available" interpretation of the data so far.

Why Partial Parsing Matters

Partial parsing enables real-time user experiences that would otherwise be impossible with structured output:

  • Progressive display: A UI can show fields as they become available (e.g., displaying the movie title before the full review is complete).
  • Early termination: A developer can examine partial results and cancel the stream if the data is going in an unexpected direction.
  • Progress indicators: Applications can show which fields have been populated versus which are still pending.

Without partial parsing, the only option would be to wait for the complete response, negating the latency benefits of streaming.

Partial JSON Decoding with Trailing-Strings Mode

The jiter library provides the from_json() function with a partial_mode parameter. When set to "trailing-strings", the parser:

  1. Parses as much valid JSON structure as possible from the input bytes.
  2. For any string value that is still being received (the closing quote has not arrived), returns the text received so far as a complete string.
  3. For unclosed containers (objects or arrays), returns all the complete key-value pairs or elements accumulated so far.

The "trailing-strings" mode is specifically designed for the LLM streaming use case, where the model generates JSON tokens left-to-right and string values may span multiple text deltas. The alternative "off" mode would reject any incomplete input, while "trailing-strings" provides the most useful partial data.

The return type during streaming is Dict[str, Any] rather than a validated Pydantic model, because the partial data may not satisfy the model's required fields or type constraints. This is a deliberate design choice: partial snapshots are best-effort dictionaries, not validated instances.

Progressive Schema Validation: Partial During Stream, Full on Completion

The SDK employs a two-phase validation strategy that balances responsiveness with type safety:

Phase 1: During Streaming (Partial)

As text deltas arrive, the TextEvent.parsed_snapshot() method uses jiter.from_json() with partial_mode="trailing-strings" to produce a partial dictionary. This dictionary:

  • Contains whatever fields have been fully or partially received.
  • Does not undergo Pydantic validation.
  • Returns Dict[str, Any] (not the target Pydantic model type).
  • May contain incomplete data that would fail validation.

This is intentionally lenient: the goal is to provide the best possible view of the data at any moment, not to enforce correctness.
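To see why snapshots must skip validation, consider what strict Pydantic validation does to a typical mid-stream dictionary. This is an illustrative sketch assuming Pydantic v2; the model and snapshot are invented for the example.

```python
from pydantic import BaseModel, TypeAdapter, ValidationError

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str

# A typical mid-stream snapshot: only some fields have arrived yet.
snapshot = {"title": "Inception", "rating": 9.0}

try:
    TypeAdapter(MovieReview).validate_python(snapshot)
    valid = True
except ValidationError:
    # The required field "summary" is missing, so strict
    # validation fails -- on almost every delta.
    valid = False
print(valid)
```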

Phase 2: On Completion (Full)

When the stream receives a content_block_stop event (indicating the content block is complete), the SDK performs full Pydantic validation:

  1. The complete text of the content block is passed to parse_text().
  2. TypeAdapter(output_format).validate_json(text) performs strict JSON parsing and type validation.
  3. The validated instance is stored as parsed_output on the content block.
  4. The final ParsedMessage (accessible via stream.get_final_message()) carries the fully validated ResponseFormatT instance.

This means:

  • During streaming: Developers access partial Dict[str, Any] snapshots for progressive display.
  • After completion: Developers access a fully validated ResponseFormatT instance with full type safety.
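The completion-time step can be reproduced in isolation with Pydantic's TypeAdapter, which is the mechanism the SDK uses per step 2 above. A minimal sketch, assuming Pydantic v2; the model and JSON text are invented for the example.

```python
from pydantic import BaseModel, TypeAdapter

class MovieReview(BaseModel):
    title: str
    rating: float

# Once the content block is complete, the full text validates
# strictly into the target model: a typed instance, not a dict.
complete_text = '{"title": "Inception", "rating": 9.0}'
review = TypeAdapter(MovieReview).validate_json(complete_text)
print(type(review).__name__, review.title, review.rating)
```

Because validate_json parses and validates in one pass, any malformed JSON or type mismatch at this stage raises a ValidationError rather than silently producing a partial result.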

Design Rationale

This two-phase approach is a pragmatic compromise between two extremes:

  • No validation during streaming (just raw text) would require developers to implement their own partial JSON parsing.
  • Full validation during streaming (strict Pydantic) would fail on almost every delta, since partial data rarely satisfies all required fields.

By providing lenient parsing during streaming and strict validation on completion, the SDK gives developers the best of both worlds: real-time access to partial data and guaranteed type safety for the final result.

Usage Example

import anthropic
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str
    pros: list[str]
    cons: list[str]

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Review the movie Inception"}],
    output_format=MovieReview,
) as stream:
    for event in stream:
        if event.type == "text":
            # Phase 1: partial Dict[str, Any] during streaming
            snapshot = event.parsed_snapshot()
            if snapshot:
                print(f"Partial: {snapshot}")

    # Phase 2: fully validated MovieReview on completion
    final = stream.get_final_message()
    review = final.parsed_output  # MovieReview instance
    print(f"{review.title}: {review.rating}/10")
