Principle:Anthropics Anthropic sdk python Thinking Request Execution
| Knowledge Sources | |
|---|---|
| Domains | Extended_Thinking, LLM, Reasoning |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Thinking Request Execution is the principle of augmenting standard API message creation requests with an optional reasoning configuration. In the Anthropic Python SDK, the Messages.create() method accepts a thinking parameter that controls whether and how the model performs extended chain-of-thought reasoning before producing its visible response.
This principle covers the integration point where the thinking configuration meets the request lifecycle, including parameter validation, deprecation warnings, and the structural guarantees of the response.
Theory: Augmenting Requests with Reasoning Configuration
The standard message creation flow in the Anthropic API follows a request-response pattern: the client sends a model identifier, a list of messages, and configuration parameters, and receives a response containing content blocks. Extended thinking augments this flow by introducing reasoning as a first-class concern:
- Request augmentation: The
thinkingparameter is added as an optional extension to the standard parameter set. When omitted (or set to the sentinelOmitvalue), the request behaves identically to a non-thinking request. - Model-specific behavior: Not all models support all thinking modes equally. The SDK includes runtime checks that issue warnings when certain model-mode combinations are suboptimal.
- Response structure change: When thinking is enabled, the response's
contentlist containsThinkingBlockitems beforeTextBlockitems, representing the model's chain-of-thought followed by its final answer.
The Thinking Parameter as Optional Extension
The thinking parameter follows the optional extension pattern common in SDK design:
- Default behavior preserved: When
thinkingis not provided, the API call works exactly as it would without extended thinking support. This ensures backward compatibility. - Additive enrichment: When provided, the parameter enriches the request without changing the fundamental semantics of message creation. The response is still a
Messageobject; it just contains additional content block types. - Type-safe configuration: The parameter accepts a
ThinkingConfigParamunion type, ensuring that only valid configurations are passed. The discriminated union pattern (keyed on thetypefield) makes it impossible to construct an ambiguous configuration.
Runtime Warnings for Deprecated Thinking Modes
The SDK implements a progressive deprecation strategy for thinking modes:
Model-Specific Warnings
When a developer uses thinking.type="enabled" with certain models (such as claude-opus-4-6), the SDK emits a UserWarning at runtime recommending the use of thinking.type="adaptive" instead. This reflects Anthropic's finding that adaptive thinking produces better model performance than fixed-budget thinking for these models.
The warning mechanism:
- Checks the model identifier against a known list of models that perform better with adaptive thinking
- Checks that the thinking type is specifically
"enabled"(not adaptive or disabled) - Emits a
UserWarning(not aDeprecationWarning) since the mode still functions but is suboptimal - Uses
stacklevel=3so the warning points to the caller's code, not internal SDK code
Deprecated Model Warnings
Separately, the SDK also checks if the model itself is deprecated (reaching end-of-life) and emits a DeprecationWarning with migration guidance. This is independent of the thinking configuration but fires during the same request execution path.
Response Structure with Thinking
When thinking is enabled (or adaptive mode produces thinking), the response follows a specific ordering:
- ThinkingBlock(s): Zero or more blocks containing the model's internal reasoning text and a cryptographic signature for multi-turn verification
- TextBlock(s): One or more blocks containing the model's visible response to the user
This ordering guarantee is important because it ensures that:
- The model has fully completed its reasoning before producing its answer
- Clients can process thinking and text blocks in a single pass through the content list
- The thinking blocks provide an auditable trace of the model's reasoning process
Design Considerations
- Omit sentinel pattern: The SDK uses an
Omitsentinel value rather thanNoneto distinguish between "not provided" and "explicitly set to null." This allows the thinking parameter to be cleanly excluded from the serialized request body when not needed. - Timeout adjustment: For non-streaming requests, the SDK may adjust the timeout based on
max_tokensand the model, accounting for the additional time that thinking may require. - Unified create method: The same
create()method handles both streaming and non-streaming requests via thestreamparameter. The thinking configuration applies identically in both modes.