Workflow: Anthropic Python SDK Extended Thinking (Reasoning)
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Reasoning, Extended_Thinking |
| Last Updated | 2026-02-15 12:00 GMT |
Overview
End-to-end process for enabling and consuming Claude's extended thinking capability, which allows the model to reason step-by-step before producing its final response.
Description
This workflow demonstrates how to activate Claude's extended thinking feature, which allocates a dedicated token budget for internal reasoning before generating the visible response. When enabled, Claude produces ThinkingBlock content alongside TextBlock content, allowing applications to inspect the model's chain-of-thought reasoning. The workflow covers both synchronous and streaming approaches, including how to separate thinking content from response content and how to manage the token budget.
Usage
Execute this workflow when tackling complex problems that benefit from step-by-step reasoning (math, logic, analysis, coding), when you need transparency into the model's decision-making process, or when response quality is more important than latency and you want the model to "think before speaking."
Execution Steps
Step 1: Thinking Configuration
Configure the thinking parameter in the message request. Set the type to "enabled" and specify a budget_tokens value that determines how many tokens Claude can use for internal reasoning. The max_tokens parameter must be set high enough to accommodate both thinking and response tokens.
Key considerations:
- The thinking parameter accepts a ThinkingConfigParam with type "enabled", "disabled", or "adaptive"
- budget_tokens controls the maximum tokens allocated for thinking (not the response)
- max_tokens must be large enough for both thinking and response output
- Adaptive thinking lets the model decide whether to think based on the query complexity
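The configuration above can be sketched as a small helper that assembles the keyword arguments for `client.messages.create()`. The model id and token values here are illustrative assumptions, not recommendations; the shape of the `thinking` dict follows the ThinkingConfigParam described above.

```python
# Assumed, illustrative values -- tune per task.
THINKING_BUDGET = 8_000   # tokens reserved for internal reasoning
MAX_TOKENS = 16_000       # must cover thinking output AND the final reply


def build_thinking_request(prompt: str) -> dict:
    """Return keyword arguments for client.messages.create() with thinking enabled."""
    if MAX_TOKENS <= THINKING_BUDGET:
        raise ValueError("max_tokens must exceed budget_tokens")
    return {
        "model": "claude-sonnet-4-5",  # assumed model id for illustration
        "max_tokens": MAX_TOKENS,
        "thinking": {"type": "enabled", "budget_tokens": THINKING_BUDGET},
        "messages": [{"role": "user", "content": prompt}],
    }
```

The guard mirrors the key consideration above: the response budget is `max_tokens` minus whatever thinking actually consumes, so the two limits must leave headroom for both.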
Step 2: Request Execution
Send the message request with the thinking configuration. The API processes the request, generates thinking content first, then produces the final response based on that reasoning. This may result in higher latency but improved response quality for complex tasks.
Key considerations:
- Extended thinking increases latency proportional to the thinking budget
- The API returns both thinking and text content blocks in the response
- Token usage includes both thinking tokens and response tokens
- Not all models support extended thinking; check model compatibility
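A minimal sketch of the request step, written so the client is passed in (any `anthropic.Anthropic()` instance works via duck typing). The `usage.output_tokens` field follows the SDK's Usage object and covers both thinking and response tokens; the model id and the `budget * 2` heuristic are assumptions for illustration.

```python
def run_with_thinking(client, prompt: str, budget: int = 8_000):
    """Send a thinking-enabled request; return (response, output_tokens)."""
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model id -- check compatibility
        max_tokens=budget * 2,      # leave room for thinking plus the reply
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=[{"role": "user", "content": prompt}],
    )
    # output_tokens includes both thinking tokens and response tokens
    return response, response.usage.output_tokens
```

Because latency grows with the thinking budget, callers on a tight deadline may prefer a smaller `budget` or a disabled-thinking path for simple queries.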
Step 3: Content Block Separation
Process the response by iterating through content blocks and separating thinking content from text content. ThinkingBlock objects (type "thinking") contain the model's reasoning, while TextBlock objects (type "text") contain the final response. RedactedThinkingBlock objects may appear when thinking content is filtered.
Key considerations:
- Content blocks appear in order: thinking blocks first, then text blocks
- ThinkingBlock has a thinking field containing the reasoning text
- RedactedThinkingBlock indicates filtered thinking content (no text available)
- TextBlock contains the final response informed by the thinking process
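The separation logic can be sketched by dispatching on each block's `type` discriminator, matching the `"thinking"`, `"redacted_thinking"`, and `"text"` block types described above. Blocks are duck-typed here so the sketch runs without the SDK; the field names (`thinking`, `text`) follow the block shapes in the step.

```python
def separate_blocks(content):
    """Split response content into (reasoning_text, redacted_count, response_text)."""
    thinking_parts, redacted_count, text_parts = [], 0, []
    for block in content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)   # the model's reasoning
        elif block.type == "redacted_thinking":
            redacted_count += 1                     # filtered: no text available
        elif block.type == "text":
            text_parts.append(block.text)           # the final response
    return "\n".join(thinking_parts), redacted_count, "".join(text_parts)
```

Counting redacted blocks rather than skipping them lets an application tell the user that some reasoning was filtered instead of silently dropping it.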
Step 4: Streaming Thinking Content
For real-time display of thinking, use the streaming interface. Thinking events arrive before text events, allowing applications to show the reasoning process as it unfolds. Track the event type to switch between displaying thinking and response content.
Key considerations:
- Stream events include "thinking" type for thinking deltas and "text" type for response deltas
- Track state transitions between thinking and text phases for proper display
- Thinking content streams incrementally just like text content
- The stream provides delta (incremental) and snapshot (accumulated) values for both thinking and text
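The phase-tracking described above can be sketched with events modeled as `(type, delta)` pairs; with the real SDK streaming helper you would read the event's type and delta fields instead. The bracketed phase marker is an illustrative display choice, not SDK output.

```python
def render_stream(events):
    """Accumulate deltas, emitting a marker whenever the phase changes."""
    phase, out = None, []
    for etype, delta in events:
        if etype != phase:              # transition, e.g. thinking -> text
            out.append(f"\n[{etype}]\n")
            phase = etype
        out.append(delta)               # incremental delta for this phase
    return "".join(out)
```

Since thinking events arrive before text events, the rendered output naturally shows the reasoning first and the response after the phase switch.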
Step 5: Multi-turn Thinking Conversations
When using extended thinking in multi-turn conversations, include the thinking blocks from previous turns in the conversation history. This preserves the reasoning context across turns and allows Claude to build on previous analysis.
Key considerations:
- Include ThinkingBlockParam objects in assistant message content for multi-turn conversations
- RedactedThinkingBlockParam objects must also be preserved in conversation history
- The thinking budget can be adjusted per turn based on expected complexity
- Token usage from thinking contributes to the overall context window consumption
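A sketch of preserving thinking across turns: the helper converts a response's blocks into the assistant-message content for the next request, keeping thinking and redacted-thinking blocks as the step requires. The helper itself is an assumption, not an SDK function; the `signature` and `data` fields follow the thinking/redacted-thinking block shapes.

```python
def history_from_response(content) -> list:
    """Serialize response blocks into assistant-message history for the next turn."""
    preserved = []
    for block in content:
        if block.type == "thinking":
            preserved.append({"type": "thinking",
                              "thinking": block.thinking,
                              "signature": block.signature})
        elif block.type == "redacted_thinking":
            # must be passed back verbatim even though it is unreadable
            preserved.append({"type": "redacted_thinking", "data": block.data})
        elif block.type == "text":
            preserved.append({"type": "text", "text": block.text})
    return [{"role": "assistant", "content": preserved}]
```

Appending this list to the running `messages` history keeps the reasoning context available on the next turn, at the cost of the extra context-window tokens noted above.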