Principle: Anthropic Python SDK API Request Execution
| Knowledge Sources | |
|---|---|
| Domains | API_Client, LLM |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
The API Request Execution principle describes how the Anthropic Python SDK dispatches HTTP POST requests to the Messages API endpoint, handling parameter transformation, timeout calculation, response parsing, streaming dispatch, and automatic retry with exponential back-off. The Messages.create() method is the central point where typed parameters become a network request and a parsed response.
Theoretical Basis
HTTP POST with Typed Parameter Transformation
The Messages.create() method accepts keyword arguments that match the MessageCreateParams TypedDict shape. Before the request leaves the process, the SDK applies maybe_transform() to convert the Python-typed parameter dict into its JSON-wire-format equivalent. This transformation:
- Strips fields set to the sentinel omit value (the SDK's representation of "not provided")
- Materializes Iterable types into concrete lists
- Recursively transforms nested TypedDicts
The resulting dict is passed as the JSON body of an HTTP POST to /v1/messages.
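The transformation rules above can be sketched in a few lines. This is a hypothetical simplification, not the SDK's actual maybe_transform() implementation; the Omit sentinel class here stands in for the SDK's omit marker:

```python
from collections.abc import Iterable
from typing import Any


class Omit:
    """Hypothetical sentinel standing in for the SDK's `omit` marker."""


OMIT = Omit()


def transform(params: dict[str, Any]) -> dict[str, Any]:
    """Sketch of a wire-format transform: drop omitted fields,
    materialize iterables into lists, and recurse into nested dicts."""
    out: dict[str, Any] = {}
    for key, value in params.items():
        if isinstance(value, Omit):
            continue  # field was "not provided"; leave it off the wire
        if isinstance(value, dict):
            out[key] = transform(value)  # nested TypedDict-shaped dict
        elif isinstance(value, Iterable) and not isinstance(value, (str, bytes, list)):
            # e.g. a generator or tuple of message dicts becomes a JSON array
            out[key] = [transform(v) if isinstance(v, dict) else v for v in value]
        else:
            out[key] = value
    return out
```

The result is a plain dict ready to serialize as the POST body.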
Non-Streaming vs Streaming Dispatch
The stream parameter controls the return type through Python's @overload mechanism:
- stream=False (default) -- The SDK sends a standard POST, waits for the complete response, and parses it into a Message Pydantic model.
- stream=True -- The SDK sends the same POST but with "stream": true in the JSON body and wraps the response in a Stream[RawMessageStreamEvent] object that yields server-sent events incrementally.
The type system enforces this at the caller's site: the three @overload signatures ensure that stream=Literal[False] returns Message, stream=Literal[True] returns Stream[RawMessageStreamEvent], and stream=bool returns the union.
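The overload pattern can be illustrated with a stripped-down stand-in. The Message and Stream classes here are placeholders, not the SDK's real types; only the @overload/Literal structure mirrors what the section describes:

```python
from typing import Literal, Union, overload


class Message:
    """Placeholder for the parsed non-streaming response model."""


class Stream:
    """Placeholder for the SSE stream wrapper."""


@overload
def create(*, stream: Literal[False] = False) -> Message: ...
@overload
def create(*, stream: Literal[True]) -> Stream: ...
@overload
def create(*, stream: bool) -> Union[Message, Stream]: ...


def create(*, stream: bool = False) -> Union[Message, Stream]:
    # Runtime dispatch mirrors the wire behavior: streaming requests
    # get a stream wrapper, non-streaming requests get a parsed model.
    return Stream() if stream else Message()
```

A type checker narrows the return type at each call site, so `create(stream=True)` is known statically to be a Stream without any cast.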
Automatic Timeout Adjustment
For non-streaming requests, the SDK implements intelligent timeout scaling. When the user has not provided an explicit timeout and the client is using the default timeout (10 minutes), the SDK calls _calculate_nonstreaming_timeout() with the requested max_tokens and a model-specific token rate from MODEL_NONSTREAMING_TOKENS. This accounts for the fact that:
- Non-streaming requests must complete fully before any data is returned
- Larger max_tokens values require longer wall-clock time
- Certain models (e.g., claude-opus-4-20250514) have known token generation rates
This ensures that long-generation requests do not spuriously time out while still maintaining tight timeouts for small requests.
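The scaling logic can be sketched as follows. The rate table and the 1.5x safety margin are illustrative assumptions, not the SDK's actual _calculate_nonstreaming_timeout() internals or MODEL_NONSTREAMING_TOKENS values:

```python
# Hypothetical per-model generation rates (tokens per minute); the SDK
# keeps its real table in MODEL_NONSTREAMING_TOKENS.
TOKENS_PER_MINUTE = {"example-model": 128_000}

DEFAULT_TIMEOUT = 600.0  # the client default: 10 minutes


def calculate_nonstreaming_timeout(max_tokens: int, model: str) -> float:
    """Sketch: scale the timeout with the expected generation time,
    never dropping below the 10-minute default."""
    rate = TOKENS_PER_MINUTE.get(model)
    if rate is None:
        return DEFAULT_TIMEOUT  # unknown model: keep the default
    expected_minutes = max_tokens / rate
    # 1.5x safety margin (assumed) over the expected wall-clock time
    return max(DEFAULT_TIMEOUT, expected_minutes * 60 * 1.5)
```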
Automatic Retry with Exponential Back-off
The SDK inherits retry behavior from the base client. Transient errors (HTTP 429 rate limits, 5xx server errors, connection failures) trigger automatic retries up to max_retries times (default 2). The retry delay follows exponential back-off:
- Initial delay: 0.5 seconds
- Maximum delay: 8.0 seconds
- Jitter is applied to prevent thundering herd
The retry count is configured at the client level and can be overridden per-request by using client.with_options(max_retries=N).
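The back-off schedule above can be expressed directly. The 25% jitter factor here is an illustrative choice, not the SDK's exact jitter formula:

```python
import random

INITIAL_DELAY = 0.5  # seconds, first retry
MAX_DELAY = 8.0      # seconds, cap on the doubling


def retry_delay(attempt: int) -> float:
    """Exponential back-off: 0.5 s doubling per retry, capped at 8 s,
    with jitter (assumed: up to 25% reduction) so that many clients
    retrying at once do not hit the server in lockstep."""
    base = min(MAX_DELAY, INITIAL_DELAY * 2 ** attempt)
    return base * (1 - 0.25 * random.random())
```

With these constants, retries sleep roughly 0.5 s, 1 s, 2 s, 4 s, then 8 s for every attempt after that, minus the random jitter.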
Deprecated Model Warnings
The create() method checks the requested model against a DEPRECATED_MODELS dictionary. If the model is scheduled for end-of-life, a DeprecationWarning is emitted with the deprecation date and a link to migration documentation. This provides a proactive migration signal without breaking existing code.
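A minimal sketch of this check, assuming a hypothetical model name and message; the SDK keeps the real table in DEPRECATED_MODELS:

```python
import warnings

# Hypothetical entry; the real dictionary lives in the SDK as
# DEPRECATED_MODELS and maps model IDs to deprecation details.
DEPRECATED_MODELS = {"example-legacy-model": "reaches end-of-life on 2026-01-01"}


def warn_if_deprecated(model: str) -> None:
    """Emit a DeprecationWarning if the model is scheduled for removal."""
    detail = DEPRECATED_MODELS.get(model)
    if detail is not None:
        warnings.warn(
            f"Model {model!r} is deprecated and {detail}; "
            "see the migration documentation for a replacement.",
            DeprecationWarning,
            stacklevel=3,  # attribute the warning to the caller of create()
        )
```

Because the check only warns, existing code keeps working while callers get an actionable signal in their logs and test suites.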
Design Constraints
- The @required_args decorator enforces at runtime that max_tokens, messages, and model are always provided, complementing the static type checking.
- The endpoint is always /v1/messages -- there is no per-model endpoint routing.
- The cast_to=Message parameter instructs the base client to parse the JSON response into the Message Pydantic model.
- When stream=True, the stream_cls=Stream[RawMessageStreamEvent] parameter tells the base client which stream wrapper to use.
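The runtime enforcement in the first constraint can be sketched with a simplified decorator. This is a hypothetical reduction, not the SDK's actual @required_args (which supports multiple valid argument combinations):

```python
import functools


def required_args(*names: str):
    """Simplified sketch: reject calls missing any required keyword
    argument, raising TypeError before a request is ever built."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):
            missing = [n for n in names if n not in kwargs]
            if missing:
                raise TypeError(f"Missing required arguments: {missing}")
            return fn(**kwargs)
        return wrapper
    return decorate


@required_args("max_tokens", "messages", "model")
def create(**kwargs):
    # Stand-in for Messages.create(); just echoes the validated params.
    return kwargs
```

This catches callers who bypass static type checking (e.g. by building kwargs dynamically) with a clear error instead of a malformed API request.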