Heuristic: OpenAI Python SDK Retry Backoff Strategy
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Reliability |
| Last Updated | 2026-02-15 10:00 GMT |
Overview
Built-in exponential backoff retry strategy with jitter for handling rate limits (429), server errors (5xx), and timeouts (408/409), defaulting to 2 retries with 0.5s initial delay capped at 8s.
Description
The OpenAI Python SDK includes automatic retry logic for transient failures. When a request fails with certain status codes, the SDK waits using exponential backoff with jitter before retrying. The server can also explicitly control retry behavior via the `x-should-retry` header. The SDK parses `retry-after` and `retry-after-ms` headers to respect server-requested delays, but caps accepted delays at 60 seconds — beyond that, it falls back to its own exponential backoff calculation.
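The delay-selection behavior described above can be sketched as a standalone function. This is a simplified reimplementation for illustration, not the SDK's actual `_parse_retry_after_header`: header parsing is reduced to a plain `retry_after_seconds` number, whereas the real SDK also handles `retry-after-ms` and HTTP-date values.

```python
import random

# Constants matching the SDK defaults described in this document.
INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0

def choose_delay(retry_after_seconds, nb_retries):
    """Pick a retry delay: honor a reasonable server request, else back off.

    retry_after_seconds: parsed value of the retry-after header, or None.
    nb_retries: 0 for the first retry, 1 for the second, and so on.
    """
    # Honor the server-requested delay only if it is reasonable (<= 60s).
    if retry_after_seconds is not None and 0 < retry_after_seconds <= 60:
        return retry_after_seconds
    # Otherwise fall back to exponential backoff with jitter (0.75x-1.0x).
    sleep_seconds = min(INITIAL_RETRY_DELAY * 2.0 ** nb_retries, MAX_RETRY_DELAY)
    return sleep_seconds * (1 - 0.25 * random.random())
```

Note how a server-requested delay of, say, 120 seconds is simply ignored: the function falls through to its own backoff, which is bounded by `MAX_RETRY_DELAY`.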
Usage
This heuristic applies automatically to all API calls made through the SDK. Understanding it matters when you need to:
- Tune retry behavior — adjust `max_retries` on the client or per-request
- Debug rate limiting — understand why requests are retried or not
- Handle long waits — know that retry-after > 60s triggers fallback backoff
- Ensure idempotency — understand that non-GET requests get automatic idempotency keys
The Insight (Rule of Thumb)
- Action: Configure `max_retries` on the client constructor or per-request via `options`.
- Value: Default is `2` retries. Initial delay is `0.5s`, max delay is `8.0s`. Jitter factor is 0.75-1.0x.
- Trade-off: More retries increase resilience but add latency. Setting `max_retries=0` disables retries entirely.
- Retried status codes: 408 (Request Timeout), 409 (Lock Timeout), 429 (Rate Limit), 500+ (Server Errors).
- Server override: The `x-should-retry: true/false` header from the server takes precedence over status code logic.
- Retry-after cap: If server requests > 60 seconds delay, SDK ignores it and uses its own backoff formula.
- Idempotency: Non-GET retries automatically include an idempotency key (`stainless-python-retry-{uuid}`).
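The idempotency key format noted above can be reproduced with the standard library. The prefix is taken from the SDK source; the generation shown here is a sketch of the idea, not the SDK's internal code.

```python
import uuid

def make_idempotency_key():
    # Mirrors the SDK's default key format: stainless-python-retry-{uuid}.
    # The server uses this key to deduplicate a retried non-GET request.
    return f"stainless-python-retry-{uuid.uuid4()}"

key = make_idempotency_key()
```

Because the same key is sent on every retry of a given request, a POST that actually succeeded server-side before the response was lost will not be executed twice.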
Reasoning
Transient failures (rate limits, server overloads) are common with cloud APIs. Exponential backoff with jitter prevents thundering herd problems where many clients retry simultaneously. The 60-second cap on server-requested delays prevents unreasonably long waits that might indicate a persistent issue rather than a transient one. The jitter factor (multiplying by 0.75 to 1.0) ensures that multiple concurrent clients don't all retry at exactly the same instant.
The backoff formula is: `min(0.5 * 2^retry_count, 8.0) * (1 - 0.25 * random())`, where `retry_count` is 0 for the first retry.
For 2 retries, this produces approximate delays of:
- Retry 1: ~0.5s (range: 0.375s-0.5s)
- Retry 2: ~1.0s (range: 0.75s-1.0s)
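Plugging the jitter bounds (0.75x and 1.0x) into the formula reproduces the ranges above:

```python
INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0

def delay_bounds(nb_retries):
    """Return the (min, max) possible delay for a given retry attempt.

    nb_retries is 0 for the first retry, 1 for the second, and so on.
    """
    base = min(INITIAL_RETRY_DELAY * 2.0 ** nb_retries, MAX_RETRY_DELAY)
    # The jitter multiplier (1 - 0.25 * random()) ranges over 0.75x-1.0x.
    return (0.75 * base, 1.0 * base)
```

Once the exponential term reaches the 8.0s cap (at the fifth retry), every subsequent delay stays in the 6.0s-8.0s range regardless of how high `max_retries` is set.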
Code evidence from `_base_client.py:749-771`:
```python
def _calculate_retry_timeout(self, remaining_retries, options, response_headers=None):
    max_retries = options.get_max_retries(self.max_retries)

    # If the API asks us to wait a reasonable amount, do what it says.
    retry_after = self._parse_retry_after_header(response_headers)
    if retry_after is not None and 0 < retry_after <= 60:
        return retry_after

    nb_retries = min(max_retries - remaining_retries, 1000)
    sleep_seconds = min(INITIAL_RETRY_DELAY * pow(2.0, nb_retries), MAX_RETRY_DELAY)
    jitter = 1 - 0.25 * random()
    timeout = sleep_seconds * jitter
    return timeout if timeout >= 0 else 0
```
Retry decision logic from `_base_client.py:773-806`:
```python
def _should_retry(self, response):
    should_retry_header = response.headers.get("x-should-retry")
    if should_retry_header == "true":
        return True
    if should_retry_header == "false":
        return False

    if response.status_code == 408:  # Request Timeout
        return True
    if response.status_code == 409:  # Lock Timeout
        return True
    if response.status_code == 429:  # Rate Limit
        return True
    if response.status_code >= 500:  # Internal Errors
        return True

    return False
```
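This decision table can be exercised without the SDK. The following is a reimplementation for illustration; `FakeResponse` is a stand-in type, not part of the SDK.

```python
from dataclasses import dataclass, field

@dataclass
class FakeResponse:
    """Minimal stand-in for an HTTP response: status code plus headers."""
    status_code: int
    headers: dict = field(default_factory=dict)

def should_retry(response):
    # The x-should-retry header takes precedence over status-code logic.
    header = response.headers.get("x-should-retry")
    if header == "true":
        return True
    if header == "false":
        return False
    # 408, 409, 429, and all 5xx responses are treated as transient.
    return response.status_code in (408, 409, 429) or response.status_code >= 500
```

The header check coming first is the important detail: a server can force a retry on an otherwise non-retryable status, or suppress a retry on a 429 it knows will never succeed.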
Default constants from `_constants.py:10-14`:
```python
DEFAULT_MAX_RETRIES = 2
INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0
```