Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Principle:Anthropics Anthropic sdk python Streaming Thinking Content

From Leeroopedia
Knowledge Sources
Domains Extended_Thinking, LLM, Reasoning
Last Updated 2026-02-15 00:00 GMT

Overview

Streaming Thinking Content is the principle of delivering chain-of-thought reasoning incrementally to the client in real time, rather than waiting for the full thinking phase to complete. The Anthropic Python SDK implements this through a system of thinking deltas, signature deltas, and accumulated snapshots that allow developers to observe the model's reasoning process as it unfolds.

Theory: Real-Time Streaming of Chain-of-Thought Reasoning

Extended thinking can produce substantial reasoning traces -- potentially thousands of tokens of internal analysis. Without streaming, the client must wait for the entire thinking phase and response generation to complete before receiving any data. Streaming solves this by delivering thinking content incrementally:

  • Low perceived latency: Users see the model's reasoning appearing in real time, providing feedback that the model is working on their problem.
  • Progressive rendering: Applications can display thinking content as it arrives, allowing users to follow the model's reasoning process.
  • Early termination: Clients can close the stream early if the reasoning is going in an unwanted direction, saving tokens and time.

The streaming model for thinking follows the same server-sent events (SSE) pattern used for text streaming, but introduces thinking-specific event types.

Incremental Thinking Deltas and Signature Deltas

The streaming protocol for thinking blocks uses two types of deltas:

Thinking Deltas

As the model reasons, it produces incremental thinking deltas -- small chunks of reasoning text. Each delta contains:

  • The delta text itself (the new characters since the last delta)
  • The delta is of type thinking_delta

These deltas arrive as content_block_delta server-sent events with a thinking_delta payload. The SDK transforms these raw events into higher-level ThinkingEvent objects that include both the delta and a running snapshot.

Signature Deltas

Once the thinking phase completes, the API sends a signature delta containing the cryptographic signature for the thinking block. This arrives as a content_block_delta event with a signature_delta payload. The SDK transforms this into a SignatureEvent object.

The signature is essential for multi-turn conversations -- it must be preserved and echoed back when including thinking blocks in subsequent requests.

Accumulating Partial Thinking into a Complete Snapshot

A key design principle in the streaming architecture is the snapshot accumulation pattern:

  1. The SDK maintains an internal message snapshot that evolves with each incoming event
  2. When a thinking delta arrives, it is appended to the corresponding content block's thinking field in the snapshot
  3. When a signature delta arrives, it sets the content block's signature field in the snapshot
  4. Each ThinkingEvent emitted to the developer contains both the delta (what just changed) and the snapshot (the full accumulated thinking so far)

This dual-value design serves two use cases:

  • Incremental processing: Use event.thinking to append only the new content (e.g., for streaming display)
  • Complete state access: Use event.snapshot to access the full thinking text at any point (e.g., for logging or analysis)

Event Flow

The typical event flow for a thinking-enabled streaming request is:

  1. message_start: The message begins
  2. content_block_start: A thinking content block starts (type="thinking")
  3. content_block_delta (repeated): Thinking deltas arrive incrementally, each producing a ThinkingEvent
  4. content_block_delta: A signature delta arrives, producing a SignatureEvent
  5. content_block_stop: The thinking block is complete
  6. content_block_start: A text content block starts (type="text")
  7. content_block_delta (repeated): Text deltas arrive, each producing a TextEvent
  8. content_block_stop: The text block is complete
  9. message_stop: The message is complete

Design Considerations

Snapshot Immutability

The snapshot string grows monotonically -- deltas are only ever appended. This means a snapshot at time T is always a prefix of the snapshot at time T+1, simplifying client-side rendering.

Event Hierarchy

The SDK provides two levels of events:

  • Raw events (RawMessageStreamEvent): Low-level SSE events from the API with content_block_delta types
  • High-level events (MessageStreamEvent): Processed events like ThinkingEvent, SignatureEvent, and TextEvent that are easier to consume

The build_events function bridges these two levels, transforming each raw event into zero or more high-level events.

Unified Stream Interface

Both synchronous (MessageStream) and asynchronous (AsyncMessageStream) stream classes use the same accumulation and event-building logic, ensuring consistent behavior regardless of the concurrency model.

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment