
Heuristic: anthropics/anthropic-sdk-python Retry Backoff Strategy

From Leeroopedia
Knowledge Sources
Domains: API_Client, Optimization, Debugging
Last Updated: 2026-02-15 12:00 GMT

Overview

The SDK implements automatic exponential backoff with jitter for retryable errors (408, 409, 429, 500+), defaulting to 2 retries with 0.5s initial delay capped at 8s.

Description

The Anthropic Python SDK automatically retries failed requests using an exponential backoff strategy with jitter. The retry logic respects server-provided retry-after and retry-after-ms headers when available (within a 0-60 second window), and falls back to exponential backoff otherwise. A custom x-should-retry response header can override the retry decision entirely. Jitter is applied as timeout * (1 - 0.25 * random()) to prevent thundering herd problems.

Usage

Apply this heuristic when configuring retry behavior for production deployments. The defaults (2 retries, 0.5s initial delay) are conservative. Increase max_retries for more resilient applications, or decrease it for latency-sensitive use cases. Monitor the x-stainless-retry-count request header to track retry frequency.
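When choosing a max_retries value, it helps to total the worst-case (un-jittered) sleep time the schedule can add. A small helper, assuming no server retry-after headers (which can raise each wait to up to 60 seconds):

```python
def worst_case_backoff(max_retries: int,
                       initial: float = 0.5,
                       cap: float = 8.0) -> float:
    """Sum of maximum (un-jittered) backoff sleeps across all retries."""
    return sum(min(initial * 2.0 ** n, cap) for n in range(max_retries))

print(worst_case_backoff(2))  # default: 0.5 + 1.0 = 1.5 seconds
print(worst_case_backoff(5))  # 0.5 + 1 + 2 + 4 + 8 = 15.5 seconds
```

This makes the latency cost of raising max_retries explicit before deploying the change.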

The Insight (Rule of Thumb)

  • Action: Pass max_retries to the client constructor to control retry count. Default is 2.
  • Value: Backoff formula is min(0.5 * 2^n, 8.0) * (1 - 0.25 * random()) seconds.
  • Trade-off: More retries increase resilience but add latency. With the default 2 retries, the worst-case backoff sleep is ~1.5 seconds (0.5s + 1.0s); a server-supplied retry-after header can stretch each wait to up to 60 seconds.
  • Server-Controlled: If the API sends retry-after-ms or retry-after headers (0-60s range), the SDK respects them over its own calculation.

Reasoning

Rate limiting (429) and transient server errors (500+) are common in production API usage. Exponential backoff prevents overwhelming the server during recovery, while jitter spreads retry attempts across clients. Respecting the retry-after headers ensures the SDK follows server-side rate-limiting guidance precisely. The retry count used in the exponent is capped at 1000 internally to prevent float overflow in the pow() calculation.
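The 1000-retry cap matters because a Python float overflows near 2^1024. A minimal standalone illustration (not SDK code) of why the exponent must be bounded before the delay cap can take over:

```python
# 2.0 ** 1100 exceeds the largest IEEE-754 double (~1.8e308) and raises.
try:
    2.0 ** 1100
    overflowed = False
except OverflowError:
    overflowed = True

# Capping the exponent keeps pow() finite; min() with the delay cap then
# reduces the huge-but-finite result to the 8-second ceiling anyway.
nb_retries = min(1_100, 1000)
sleep_seconds = min(0.5 * 2.0 ** nb_retries, 8.0)
```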

Retry # Min Delay (s) Max Delay (s)
1 0.375 0.500
2 0.750 1.000
3 1.500 2.000
4 3.000 4.000
5 6.000 8.000
6+ 6.000 8.000
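The rows above follow directly from the backoff formula: the base delay is min(0.5 * 2^(n-1), 8.0) for retry n, and jitter scales it by a factor between 0.75 and 1.0. A short script to regenerate the bounds:

```python
INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0

rows = []
for retry_number in range(1, 7):
    base = min(INITIAL_RETRY_DELAY * 2.0 ** (retry_number - 1), MAX_RETRY_DELAY)
    # Jitter factor ranges over [0.75, 1.0], so base is the maximum delay.
    rows.append((retry_number, round(0.75 * base, 3), round(base, 3)))
    print(f"{retry_number}  min={0.75 * base:.3f}  max={base:.3f}")
```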

Code Evidence

Retry conditions from _base_client.py:799-832:

def _should_retry(self, response: httpx.Response) -> bool:
    should_retry_header = response.headers.get("x-should-retry")
    if should_retry_header == "true":
        return True
    if should_retry_header == "false":
        return False

    if response.status_code == 408:  # Request Timeout
        return True
    if response.status_code == 409:  # Lock Timeout
        return True
    if response.status_code == 429:  # Rate Limit
        return True
    if response.status_code >= 500:  # Internal Server Error
        return True

    return False
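These conditions can be sanity-checked without a live connection using a minimal stub in place of httpx.Response. The stub and the standalone `should_retry` below are illustrative names invented for this sketch, not part of httpx or the SDK:

```python
from dataclasses import dataclass, field

@dataclass
class FakeResponse:
    """Minimal stand-in exposing the two attributes _should_retry reads."""
    status_code: int
    headers: dict = field(default_factory=dict)

def should_retry(response) -> bool:
    # Mirrors the conditions shown above: header override first, then status.
    header = response.headers.get("x-should-retry")
    if header == "true":
        return True
    if header == "false":
        return False
    return response.status_code in (408, 409, 429) or response.status_code >= 500
```

Note that the x-should-retry header overrides even a 5xx status, so a server can explicitly suppress retries.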

Backoff calculation from _base_client.py:775-797:

def _calculate_retry_timeout(self, remaining_retries, options, response_headers=None):
    max_retries = options.get_max_retries(self.max_retries)
    retry_after = self._parse_retry_after_header(response_headers)
    if retry_after is not None and 0 < retry_after <= 60:
        return retry_after

    nb_retries = min(max_retries - remaining_retries, 1000)
    sleep_seconds = min(INITIAL_RETRY_DELAY * pow(2.0, nb_retries), MAX_RETRY_DELAY)
    jitter = 1 - 0.25 * random()
    timeout = sleep_seconds * jitter
    return timeout if timeout >= 0 else 0

Default constants from _constants.py:10-14:

DEFAULT_MAX_RETRIES = 2
INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0
