Heuristic: Anthropics Anthropic SDK Python Retry Backoff Strategy
| Knowledge Sources | |
|---|---|
| Domains | API_Client, Optimization, Debugging |
| Last Updated | 2026-02-15 12:00 GMT |
Overview
The SDK implements automatic exponential backoff with jitter for retryable errors (408, 409, 429, 500+), defaulting to 2 retries with a 0.5 s initial delay capped at 8 s.
Description
The Anthropic Python SDK automatically retries failed requests using exponential backoff with jitter. When the response carries a `retry-after-ms` or `retry-after` header whose value is greater than 0 and at most 60 seconds, the SDK waits exactly that long; otherwise it falls back to its own exponential backoff. A custom `x-should-retry` response header (`true` or `false`) overrides the status-code-based retry decision entirely. Jitter scales the computed delay by `1 - 0.25 * random()`, a factor in (0.75, 1.0], which spreads retries from many clients over time and reduces thundering-herd effects.
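The header handling described above can be sketched without any dependencies. This is a minimal illustration, not the SDK's code: it assumes that `retry-after-ms` (milliseconds) takes precedence over `retry-after`, and that `retry-after` may be either a number of seconds or an HTTP date. The names `parse_retry_after` and `server_retry_delay`, and the plain lowercase-keyed dict standing in for response headers, are hypothetical; the SDK's own helper is `_parse_retry_after_header`, whose exact behavior may differ.

```python
import email.utils
import time
from typing import Mapping, Optional


def parse_retry_after(headers: Mapping[str, str], now: Optional[float] = None) -> Optional[float]:
    """Return the server-requested delay in seconds, or None if absent or unparseable.

    Assumes lowercase header names (real HTTP header lookup is case-insensitive).
    """
    # retry-after-ms: a (possibly fractional) number of milliseconds.
    ms = headers.get("retry-after-ms")
    if ms is not None:
        try:
            return float(ms) / 1000
        except ValueError:
            pass  # fall through to retry-after

    value = headers.get("retry-after")
    if value is None:
        return None

    # retry-after as delta-seconds ...
    try:
        return float(value)
    except ValueError:
        pass

    # ... or as an HTTP date, converted to "seconds from now" (may be negative).
    parsed = email.utils.parsedate_tz(value)
    if parsed is None:
        return None
    current = time.time() if now is None else now
    return email.utils.mktime_tz(parsed) - current


def server_retry_delay(headers: Mapping[str, str], now: Optional[float] = None) -> Optional[float]:
    """Apply the 0 < delay <= 60 window; None means 'use exponential backoff instead'."""
    delay = parse_retry_after(headers, now)
    if delay is not None and 0 < delay <= 60:
        return delay
    return None
```

Values outside the window (zero, negative dates in the past, or anything above 60 seconds) return `None`, so the caller falls back to its computed backoff rather than trusting an unreasonable server hint.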
Usage
Apply this heuristic when configuring retry behavior for production deployments. The defaults (2 retries, 0.5s initial delay) are conservative. Increase max_retries for more resilient applications, or decrease it for latency-sensitive use cases. Monitor the x-stainless-retry-count request header to track retry frequency.
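To make the role of `max_retries` concrete, here is a simplified, synchronous retry loop. It is a sketch under stated assumptions, not the SDK's implementation: `send_with_retries`, the `send` callable, and the injectable `sleep` are hypothetical; `retry-after` and `x-should-retry` are ignored for brevity; and the `x-stainless-retry-count` value is assumed to be the number of retries already taken, starting at `"0"` on the first attempt. It shows that `max_retries=N` allows up to N + 1 attempts in total.

```python
import random
import time
from typing import Callable, Dict, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def is_retryable(status: int) -> bool:
    # 408, 409, 429, and every 5xx status are treated as transient.
    return status in (408, 409, 429) or status >= 500


def backoff(retries_taken: int) -> float:
    # min(0.5 * 2^n, 8.0) scaled by a jitter factor in (0.75, 1.0].
    base = min(0.5 * 2.0 ** min(retries_taken, 1000), 8.0)
    return base * (1 - 0.25 * random.random())


def send_with_retries(
    send: Callable[[Dict[str, str]], Response],
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    retries_taken = 0
    while True:
        # Each attempt reports how many retries preceded it.
        headers = {"x-stainless-retry-count": str(retries_taken)}
        status, resp_headers = send(headers)
        # Stop on success/non-retryable errors, or once the retry budget is spent.
        if not is_retryable(status) or retries_taken >= max_retries:
            return status, resp_headers
        sleep(backoff(retries_taken))
        retries_taken += 1
```

Setting `max_retries=0` in this sketch makes the first failure final, which corresponds to the latency-sensitive configuration; raising it trades extra attempts (and sleep time) for resilience.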
The Insight (Rule of Thumb)
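- Action: Pass `max_retries` to the client constructor to control the retry count. The default is 2.
- Value: The backoff formula is `min(0.5 * 2^n, 8.0) * (1 - 0.25 * random())` seconds, where n is the number of retries already taken (0 before the first retry).
- Trade-off: More retries increase resilience but add latency. With the default 2 retries, the computed backoff adds at most 0.5 + 1.0 = 1.5 seconds of sleep (excluding the time spent on the failed attempts themselves); server-supplied `retry-after` values of up to 60 seconds per retry can raise this to 120 seconds.
- Server-Controlled: If the API sends a `retry-after-ms` or `retry-after` header with a value in the (0, 60] second range, the SDK uses it instead of its own calculation.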
Reasoning
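Rate limiting (429) and transient server errors (500+) are common in production API usage. Exponential backoff avoids overwhelming the server while it recovers, and jitter spreads retry attempts from many clients over time. Honoring the `retry-after` headers lets the SDK follow the server's own rate-limiting guidance. The retry count used as the exponent is capped at 1000 internally because `pow(2.0, n)` raises a floating-point `OverflowError` once n reaches 1024; the capped result is clipped to the 8-second maximum anyway. Without a usable `retry-after` header, the computed delay for each retry falls in the following ranges: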
| Retry # | Min Delay (s) | Max Delay (s) |
|---|---|---|
| 1 | 0.375 | 0.500 |
| 2 | 0.750 | 1.000 |
| 3 | 1.500 | 2.000 |
| 4 | 3.000 | 4.000 |
| 5 | 6.000 | 8.000 |
| 6+ | 6.000 | 8.000 |
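The rows above can be reproduced from the backoff formula. Because `random()` returns values in [0, 1), the jitter factor lies in (0.75, 1.0], so each row's minimum is a bound that is approached but never quite reached. The sketch also illustrates why the exponent is capped: an uncapped `pow(2.0, 1024)` overflows, while the capped exponent of 1000 stays finite and is then clipped to 8 seconds. `delay_bounds` is a hypothetical helper.

```python
from typing import Tuple

INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0


def delay_bounds(retry_number: int) -> Tuple[float, float]:
    """(min, max) computed delay for a 1-based retry number, without retry-after."""
    n = min(retry_number - 1, 1000)  # cap the exponent, mirroring min(..., 1000)
    base = min(INITIAL_RETRY_DELAY * pow(2.0, n), MAX_RETRY_DELAY)
    return base * 0.75, base


# Reproduce the table rows.
for retry in range(1, 7):
    low, high = delay_bounds(retry)
    print(f"{retry} | {low:.3f} | {high:.3f}")

# Without the cap, a huge exponent overflows the float power.
try:
    pow(2.0, 1024)
except OverflowError:
    print("pow(2.0, 1024) overflows")

print(delay_bounds(1_000_000))  # (6.0, 8.0)
```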
Code Evidence
Retry conditions from `_base_client.py:799-832`:

```python
def _should_retry(self, response: httpx.Response) -> bool:
    should_retry_header = response.headers.get("x-should-retry")
    if should_retry_header == "true":
        return True
    if should_retry_header == "false":
        return False
    if response.status_code == 408:  # Request Timeout
        return True
    if response.status_code == 409:  # Lock Timeout
        return True
    if response.status_code == 429:  # Rate Limit
        return True
    if response.status_code >= 500:  # Internal Server Error
        return True
    return False
```
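Because the excerpt operates on an `httpx.Response`, a dependency-free restatement is convenient for reasoning about edge cases. In this sketch, `should_retry` is a hypothetical name and headers are a plain dict with lowercase keys (the SDK's `httpx` header lookup is case-insensitive). The ordering matters: the `x-should-retry` override is checked before any status code, so a server can veto a retry on a 503 or force one on a 400.

```python
from typing import Mapping


def should_retry(status_code: int, headers: Mapping[str, str]) -> bool:
    # An explicit server directive wins over every status-code rule.
    override = headers.get("x-should-retry")
    if override == "true":
        return True
    if override == "false":
        return False
    # 408 Request Timeout, 409 (lock timeout), 429 Too Many Requests.
    if status_code in (408, 409, 429):
        return True
    # Any 5xx server error.
    return status_code >= 500
```

Any other header value (for example `"maybe"`) is ignored and the status-code rules apply.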
Backoff calculation from `_base_client.py:775-797`:

```python
def _calculate_retry_timeout(self, remaining_retries, options, response_headers=None):
    max_retries = options.get_max_retries(self.max_retries)
    retry_after = self._parse_retry_after_header(response_headers)
    if retry_after is not None and 0 < retry_after <= 60:
        return retry_after
    nb_retries = min(max_retries - remaining_retries, 1000)
    sleep_seconds = min(INITIAL_RETRY_DELAY * pow(2.0, nb_retries), MAX_RETRY_DELAY)
    jitter = 1 - 0.25 * random()
    timeout = sleep_seconds * jitter
    return timeout if timeout >= 0 else 0
```
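The excerpt depends on client state and a global `random()`, which makes its output hard to check. The sketch below restates the same steps as a pure function with the random source injected, so the jitter can be pinned in tests. `retry_timeout`, `rand`, and `retry_after` (standing in for the already-parsed header value) are hypothetical; it assumes, as the delay table on this page does, that `max_retries - remaining_retries` is 0 before the first retry.

```python
import random
from typing import Callable, Optional

INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0


def retry_timeout(
    max_retries: int,
    remaining_retries: int,
    retry_after: Optional[float] = None,
    rand: Callable[[], float] = random.random,
) -> float:
    # A server-provided delay in (0, 60] seconds is used verbatim.
    if retry_after is not None and 0 < retry_after <= 60:
        return retry_after
    # The exponent is the number of retries already taken, capped to keep pow() finite.
    nb_retries = min(max_retries - remaining_retries, 1000)
    sleep_seconds = min(INITIAL_RETRY_DELAY * pow(2.0, nb_retries), MAX_RETRY_DELAY)
    # Jitter shrinks the delay by up to 25%.
    timeout = sleep_seconds * (1 - 0.25 * rand())
    return max(timeout, 0.0)
```

For example, `retry_timeout(2, 1, rand=lambda: 0.5)` pins the jitter at 0.875, giving 1.0 * 0.875 = 0.875 seconds for the second retry.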
Default constants from `_constants.py:10-14`:

```python
DEFAULT_MAX_RETRIES = 2
INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0
```