Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Principle:Anthropics Anthropic sdk python API Request Execution

From Leeroopedia
Knowledge Sources
Domains API_Client, LLM
Last Updated 2026-02-15 00:00 GMT

Overview

The API Request Execution principle describes how the Anthropic Python SDK dispatches HTTP POST requests to the Messages API endpoint, handling parameter transformation, timeout calculation, response parsing, streaming dispatch, and automatic retry with exponential back-off. The Messages.create() method is the central point where typed parameters become a network request and a parsed response.

Theoretical Basis

HTTP POST with Typed Parameter Transformation

The Messages.create() method accepts keyword arguments that match the MessageCreateParams TypedDict shape. Before the request leaves the process, the SDK applies maybe_transform() to convert the Python-typed parameter dict into its JSON-wire-format equivalent. This transformation:

  • Strips fields set to the sentinel omit value (the SDK's representation of "not provided")
  • Materializes Iterable types into concrete lists
  • Recursively transforms nested TypedDicts

The resulting dict is passed as the JSON body of an HTTP POST to /v1/messages.

Non-Streaming vs Streaming Dispatch

The stream parameter controls the return type through Python's @overload mechanism:

  • stream=False (default) -- The SDK sends a standard POST, waits for the complete response, and parses it into a Message Pydantic model.
  • stream=True -- The SDK sends the same POST but with "stream": true in the JSON body and wraps the response in a Stream[RawMessageStreamEvent] object that yields server-sent events incrementally.

The type system enforces this at the caller's site: the three @overload signatures ensure that stream=Literal[False] returns Message, stream=Literal[True] returns Stream[RawMessageStreamEvent], and stream=bool returns the union.

Automatic Timeout Adjustment

For non-streaming requests, the SDK implements intelligent timeout scaling. When the user has not provided an explicit timeout and the client is using the default timeout (10 minutes), the SDK calls _calculate_nonstreaming_timeout() with the requested max_tokens and a model-specific token rate from MODEL_NONSTREAMING_TOKENS. This accounts for the fact that:

  • Non-streaming requests must complete fully before any data is returned
  • Larger max_tokens values require longer wall-clock time
  • Certain models (e.g., claude-opus-4-20250514) have known token generation rates

This ensures that long-generation requests do not spuriously time out while still maintaining tight timeouts for small requests.

Automatic Retry with Exponential Back-off

The SDK inherits retry behavior from the base client. Transient errors (HTTP 429 rate limits, 5xx server errors, connection failures) trigger automatic retries up to max_retries times (default 2). The retry delay follows exponential back-off:

  • Initial delay: 0.5 seconds
  • Maximum delay: 8.0 seconds
  • Jitter is applied to prevent thundering herd

The retry count and delay parameters are configured at the client level and can be overridden per-request through extra_headers or by using client.with_options(max_retries=N).

Deprecated Model Warnings

The create() method checks the requested model against a DEPRECATED_MODELS dictionary. If the model is scheduled for end-of-life, a DeprecationWarning is emitted with the deprecation date and a link to migration documentation. This provides a proactive migration signal without breaking existing code.

Design Constraints

  • The @required_args decorator enforces at runtime that max_tokens, messages, and model are always provided, complementing the static type checking.
  • The endpoint is always /v1/messages -- there is no per-model endpoint routing.
  • The cast_to=Message parameter instructs the base client to parse the JSON response into the Message Pydantic model.
  • When stream=True, the stream_cls=Stream[RawMessageStreamEvent] parameter tells the base client which stream wrapper to use.

Related Pages

Implemented By

Uses Heuristic

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment