Principle:Anthropics Anthropic sdk python Streaming Thinking Content
| Knowledge Sources | |
|---|---|
| Domains | Extended_Thinking, LLM, Reasoning |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Streaming Thinking Content is the principle of delivering chain-of-thought reasoning incrementally to the client in real time, rather than waiting for the full thinking phase to complete. The Anthropic Python SDK implements this through a system of thinking deltas, signature deltas, and accumulated snapshots that allow developers to observe the model's reasoning process as it unfolds.
Theory: Real-Time Streaming of Chain-of-Thought Reasoning
Extended thinking can produce substantial reasoning traces -- potentially thousands of tokens of internal analysis. Without streaming, the client must wait for the entire thinking phase and response generation to complete before receiving any data. Streaming solves this by delivering thinking content incrementally:
- Low perceived latency: Users see the model's reasoning appearing in real time, providing feedback that the model is working on their problem.
- Progressive rendering: Applications can display thinking content as it arrives, allowing users to follow the model's reasoning process.
- Early termination: Clients can close the stream early if the reasoning is going in an unwanted direction, saving tokens and time.
The streaming model for thinking follows the same server-sent events (SSE) pattern used for text streaming, but introduces thinking-specific event types.
Incremental Thinking Deltas and Signature Deltas
The streaming protocol for thinking blocks uses two types of deltas:
Thinking Deltas
As the model reasons, it produces incremental thinking deltas -- small chunks of reasoning text. Each delta contains:
- The delta text itself (the new characters since the last delta)
- The delta is of type
thinking_delta
These deltas arrive as content_block_delta server-sent events with a thinking_delta payload. The SDK transforms these raw events into higher-level ThinkingEvent objects that include both the delta and a running snapshot.
Signature Deltas
Once the thinking phase completes, the API sends a signature delta containing the cryptographic signature for the thinking block. This arrives as a content_block_delta event with a signature_delta payload. The SDK transforms this into a SignatureEvent object.
The signature is essential for multi-turn conversations -- it must be preserved and echoed back when including thinking blocks in subsequent requests.
Accumulating Partial Thinking into a Complete Snapshot
A key design principle in the streaming architecture is the snapshot accumulation pattern:
- The SDK maintains an internal message snapshot that evolves with each incoming event
- When a thinking delta arrives, it is appended to the corresponding content block's
thinkingfield in the snapshot - When a signature delta arrives, it sets the content block's
signaturefield in the snapshot - Each
ThinkingEventemitted to the developer contains both the delta (what just changed) and the snapshot (the full accumulated thinking so far)
This dual-value design serves two use cases:
- Incremental processing: Use
event.thinkingto append only the new content (e.g., for streaming display) - Complete state access: Use
event.snapshotto access the full thinking text at any point (e.g., for logging or analysis)
Event Flow
The typical event flow for a thinking-enabled streaming request is:
message_start: The message beginscontent_block_start: A thinking content block starts (type="thinking")content_block_delta(repeated): Thinking deltas arrive incrementally, each producing aThinkingEventcontent_block_delta: A signature delta arrives, producing aSignatureEventcontent_block_stop: The thinking block is completecontent_block_start: A text content block starts (type="text")content_block_delta(repeated): Text deltas arrive, each producing aTextEventcontent_block_stop: The text block is completemessage_stop: The message is complete
Design Considerations
Snapshot Immutability
The snapshot string grows monotonically -- deltas are only ever appended. This means a snapshot at time T is always a prefix of the snapshot at time T+1, simplifying client-side rendering.
Event Hierarchy
The SDK provides two levels of events:
- Raw events (
RawMessageStreamEvent): Low-level SSE events from the API withcontent_block_deltatypes - High-level events (
MessageStreamEvent): Processed events likeThinkingEvent,SignatureEvent, andTextEventthat are easier to consume
The build_events function bridges these two levels, transforming each raw event into zero or more high-level events.
Unified Stream Interface
Both synchronous (MessageStream) and asynchronous (AsyncMessageStream) stream classes use the same accumulation and event-building logic, ensuring consistent behavior regardless of the concurrency model.