Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Principle:Anthropics Anthropic sdk python Thinking Request Execution

From Leeroopedia
Knowledge Sources
Domains Extended_Thinking, LLM, Reasoning
Last Updated 2026-02-15 00:00 GMT

Overview

Thinking Request Execution is the principle of augmenting standard API message creation requests with an optional reasoning configuration. In the Anthropic Python SDK, the Messages.create() method accepts a thinking parameter that controls whether and how the model performs extended chain-of-thought reasoning before producing its visible response.

This principle covers the integration point where the thinking configuration meets the request lifecycle, including parameter validation, deprecation warnings, and the structural guarantees of the response.

Theory: Augmenting Requests with Reasoning Configuration

The standard message creation flow in the Anthropic API follows a request-response pattern: the client sends a model identifier, a list of messages, and configuration parameters, and receives a response containing content blocks. Extended thinking augments this flow by introducing reasoning as a first-class concern:

  1. Request augmentation: The thinking parameter is added as an optional extension to the standard parameter set. When omitted (or set to the sentinel Omit value), the request behaves identically to a non-thinking request.
  2. Model-specific behavior: Not all models support all thinking modes equally. The SDK includes runtime checks that issue warnings when certain model-mode combinations are suboptimal.
  3. Response structure change: When thinking is enabled, the response's content list contains ThinkingBlock items before TextBlock items, representing the model's chain-of-thought followed by its final answer.

The Thinking Parameter as Optional Extension

The thinking parameter follows the optional extension pattern common in SDK design:

  • Default behavior preserved: When thinking is not provided, the API call works exactly as it would without extended thinking support. This ensures backward compatibility.
  • Additive enrichment: When provided, the parameter enriches the request without changing the fundamental semantics of message creation. The response is still a Message object; it just contains additional content block types.
  • Type-safe configuration: The parameter accepts a ThinkingConfigParam union type, ensuring that only valid configurations are passed. The discriminated union pattern (keyed on the type field) makes it impossible to construct an ambiguous configuration.

Runtime Warnings for Deprecated Thinking Modes

The SDK implements a progressive deprecation strategy for thinking modes:

Model-Specific Warnings

When a developer uses thinking.type="enabled" with certain models (such as claude-opus-4-6), the SDK emits a UserWarning at runtime recommending the use of thinking.type="adaptive" instead. This reflects Anthropic's finding that adaptive thinking produces better model performance than fixed-budget thinking for these models.

The warning mechanism:

  • Checks the model identifier against a known list of models that perform better with adaptive thinking
  • Checks that the thinking type is specifically "enabled" (not adaptive or disabled)
  • Emits a UserWarning (not a DeprecationWarning) since the mode still functions but is suboptimal
  • Uses stacklevel=3 so the warning points to the caller's code, not internal SDK code

Deprecated Model Warnings

Separately, the SDK also checks if the model itself is deprecated (reaching end-of-life) and emits a DeprecationWarning with migration guidance. This is independent of the thinking configuration but fires during the same request execution path.

Response Structure with Thinking

When thinking is enabled (or adaptive mode produces thinking), the response follows a specific ordering:

  1. ThinkingBlock(s): Zero or more blocks containing the model's internal reasoning text and a cryptographic signature for multi-turn verification
  2. TextBlock(s): One or more blocks containing the model's visible response to the user

This ordering guarantee is important because it ensures that:

  • The model has fully completed its reasoning before producing its answer
  • Clients can process thinking and text blocks in a single pass through the content list
  • The thinking blocks provide an auditable trace of the model's reasoning process

Design Considerations

  • Omit sentinel pattern: The SDK uses an Omit sentinel value rather than None to distinguish between "not provided" and "explicitly set to null." This allows the thinking parameter to be cleanly excluded from the serialized request body when not needed.
  • Timeout adjustment: For non-streaming requests, the SDK may adjust the timeout based on max_tokens and the model, accounting for the additional time that thinking may require.
  • Unified create method: The same create() method handles both streaming and non-streaming requests via the stream parameter. The thinking configuration applies identically in both modes.

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment