
Workflow:anthropics/anthropic-sdk-python Extended Thinking Reasoning

From Leeroopedia
Knowledge Sources
Domains LLMs, Reasoning, Extended_Thinking
Last Updated 2026-02-15 12:00 GMT

Overview

End-to-end process for enabling and consuming Claude's extended thinking capability, which allows the model to reason step-by-step before producing its final response.

Description

This workflow demonstrates how to activate Claude's extended thinking feature, which allocates a dedicated token budget for internal reasoning before generating the visible response. When enabled, Claude produces ThinkingBlock content alongside TextBlock content, allowing applications to inspect the model's chain-of-thought reasoning. The workflow covers both synchronous and streaming approaches, including how to separate thinking content from response content and how to manage the token budget.

Usage

Execute this workflow when tackling complex problems that benefit from step-by-step reasoning (math, logic, analysis, coding), when you need transparency into the model's decision-making process, or when response quality is more important than latency and you want the model to "think before speaking."

Execution Steps

Step 1: Thinking Configuration

Configure the thinking parameter in the message request. Set the type to "enabled" and specify a budget_tokens value that determines how many tokens Claude can use for internal reasoning. The max_tokens parameter must be set high enough to accommodate both thinking and response tokens.

Key considerations:

  • The thinking parameter accepts a ThinkingConfigParam with type "enabled", "disabled", or "adaptive"
  • budget_tokens controls the maximum tokens allocated for thinking (not the response)
  • max_tokens must be large enough for both thinking and response output
  • Adaptive thinking lets the model decide whether to think based on the query complexity
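The configuration above can be sketched as a small helper that builds the request keyword arguments and enforces the budget constraint. This is a minimal sketch, not the SDK's own API surface: `build_request` is a hypothetical helper, and the model name is an assumption — substitute any model that supports extended thinking.

```python
def build_request(prompt: str, budget_tokens: int = 8000, max_tokens: int = 16000) -> dict:
    """Build kwargs for client.messages.create() with thinking enabled.

    Raises ValueError when max_tokens cannot cover both the thinking
    budget and the final response.
    """
    if max_tokens <= budget_tokens:
        raise ValueError(
            "max_tokens must exceed budget_tokens so the final response "
            "has room after internal reasoning"
        )
    return {
        "model": "claude-sonnet-4-20250514",  # assumed model name
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }

kwargs = build_request("Is 9.11 greater than 9.9? Explain.")
```

Passing the `thinking` parameter as a plain dict is equivalent to constructing a `ThinkingConfigParam`; the SDK accepts either form.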

Step 2: Request Execution

Send the message request with the thinking configuration. The API processes the request, generates thinking content first, then produces the final response based on that reasoning. This may result in higher latency but improved response quality for complex tasks.

Key considerations:

  • Extended thinking increases latency proportional to the thinking budget
  • The API returns both thinking and text content blocks in the response
  • Token usage includes both thinking tokens and response tokens
  • Not all models support extended thinking; check model compatibility
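A hedged sketch of the request execution step, assuming a model that supports extended thinking. The live call is guarded behind an API-key check so the snippet is safe to run without credentials; `hit_token_ceiling` is a hypothetical helper for spotting a truncated answer.

```python
import os

def hit_token_ceiling(stop_reason: str) -> bool:
    """True when generation stopped because max_tokens ran out, which can
    truncate the final answer after a large thinking budget."""
    return stop_reason == "max_tokens"

# Guarded so the sketch can be imported without credentials or the SDK.
if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model name
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 8000},
        messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    )
    # usage.output_tokens covers thinking tokens plus response tokens.
    print(response.usage.output_tokens, response.stop_reason)
    if hit_token_ceiling(response.stop_reason):
        print("Raise max_tokens: the answer was cut off.")
```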

Step 3: Content Block Separation

Process the response by iterating through content blocks and separating thinking content from text content. ThinkingBlock objects (type "thinking") contain the model's reasoning, while TextBlock objects (type "text") contain the final response. RedactedThinkingBlock objects may appear when thinking content is filtered.

Key considerations:

  • Content blocks appear in order: thinking blocks first, then text blocks
  • ThinkingBlock has a thinking field containing the reasoning text
  • RedactedThinkingBlock indicates filtered thinking content (no text available)
  • TextBlock contains the final response informed by the thinking process
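The separation logic above reduces to a dispatch on each block's `type` tag. A minimal sketch, using stand-in objects rather than real SDK blocks (the attribute names `type`, `thinking`, and `text` match the SDK's block models):

```python
from types import SimpleNamespace

def split_content(blocks):
    """Separate thinking, redacted-thinking, and text blocks by type."""
    thinking, redacted, text = [], [], []
    for block in blocks:
        if block.type == "thinking":
            thinking.append(block.thinking)
        elif block.type == "redacted_thinking":
            redacted.append(block)  # opaque: no readable text available
        elif block.type == "text":
            text.append(block.text)
    return thinking, redacted, text

# Stand-ins for response.content, for illustration only.
blocks = [
    SimpleNamespace(type="thinking", thinking="Compare 9.11 and 9.9 digit by digit..."),
    SimpleNamespace(type="text", text="9.9 is greater than 9.11."),
]
thinking, redacted, text = split_content(blocks)
```

In real use, pass `response.content` to `split_content` directly.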

Step 4: Streaming Thinking Content

For real-time display of thinking, use the streaming interface. Thinking events arrive before text events, allowing applications to show the reasoning process as it unfolds. Track the event type to switch between displaying thinking and response content.

Key considerations:

  • Stream events include "thinking" type for thinking deltas and "text" type for response deltas
  • Track state transitions between thinking and text phases for proper display
  • Thinking content streams incrementally just like text content
  • The stream provides delta (incremental) and snapshot (accumulated) values for both thinking and text

Step 5: Multi-turn Thinking Conversations

When using extended thinking in multi-turn conversations, include the thinking blocks from previous turns in the conversation history. This preserves the reasoning context across turns and allows Claude to build on previous analysis.

Key considerations:

  • Include ThinkingBlockParam objects in assistant message content for multi-turn conversations
  • RedactedThinkingBlockParam objects must also be preserved in conversation history
  • The thinking budget can be adjusted per turn based on expected complexity
  • Token usage from thinking contributes to the overall context window consumption
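The history-building step above can be sketched with plain dict block params. This is illustrative only: `extend_history` is a hypothetical helper, and the `signature` value on the thinking block is a placeholder for the signature the API returned with that block.

```python
def extend_history(messages, assistant_blocks, next_user_prompt):
    """Append an assistant turn that preserves thinking and
    redacted_thinking blocks, then append the next user turn."""
    return messages + [
        {"role": "assistant", "content": assistant_blocks},
        {"role": "user", "content": next_user_prompt},
    ]

history = [{"role": "user", "content": "Factor 2027."}]
# Block params from the previous response, thinking block included.
assistant_blocks = [
    {"type": "thinking",
     "thinking": "Check divisibility by small primes...",
     "signature": "sig-placeholder"},  # placeholder, not a real signature
    {"type": "text", "text": "2027 is prime."},
]
history = extend_history(history, assistant_blocks, "Is 2029 also prime?")
```

In practice you can also pass `response.content` from the prior turn as the assistant content, which carries the thinking and redacted-thinking blocks through unchanged.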

Execution Diagram

GitHub URL

Workflow Repository