Heuristic: OpenAI Python SDK Retry Backoff Strategy
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Reliability |
| Last Updated | 2026-02-15 10:00 GMT |
Overview
Built-in exponential backoff retry strategy with jitter for handling rate limits (429), server errors (5xx), and timeouts (408/409), defaulting to 2 retries with 0.5s initial delay capped at 8s.
Description
The OpenAI Python SDK includes automatic retry logic for transient failures. When a request fails with certain status codes, the SDK waits using exponential backoff with jitter before retrying. The server can also explicitly control retry behavior via the `x-should-retry` header. The SDK parses `retry-after` and `retry-after-ms` headers to respect server-requested delays, but caps accepted delays at 60 seconds — beyond that, it falls back to its own exponential backoff calculation.
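The delay-selection behavior described above can be sketched as a standalone function. This is a simplified reimplementation for illustration, not the SDK's actual `_parse_retry_after_header`: header parsing is reduced to a plain `retry_after_seconds` number, whereas the real SDK also handles `retry-after-ms` and HTTP-date values.

```python
import random

# Constants matching the SDK defaults described in this document.
INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0

def choose_delay(retry_after_seconds, nb_retries):
    """Pick a retry delay: honor a reasonable server request, else back off.

    retry_after_seconds: parsed value of the retry-after header, or None.
    nb_retries: 0 for the first retry, 1 for the second, and so on.
    """
    # Honor the server-requested delay only if it is reasonable (<= 60s).
    if retry_after_seconds is not None and 0 < retry_after_seconds <= 60:
        return retry_after_seconds
    # Otherwise fall back to exponential backoff with jitter (0.75x-1.0x).
    sleep_seconds = min(INITIAL_RETRY_DELAY * 2.0 ** nb_retries, MAX_RETRY_DELAY)
    return sleep_seconds * (1 - 0.25 * random.random())
```

Note how a server-requested delay of, say, 120 seconds is simply ignored: the function falls through to its own backoff, which is bounded by `MAX_RETRY_DELAY`.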
Usage
This heuristic applies automatically to all API calls made through the SDK. Understanding it matters when you need to:
- Tune retry behavior — adjust `max_retries` on the client or per-request
- Debug rate limiting — understand why requests are retried or not
- Handle long waits — know that retry-after > 60s triggers fallback backoff
- Ensure idempotency — understand that non-GET requests get automatic idempotency keys
The Insight (Rule of Thumb)
- Action: Configure `max_retries` on the client constructor or per-request via `options`.
- Value: Default is `2` retries. Initial delay is `0.5s`, max delay is `8.0s`. Jitter factor is 0.75-1.0x.
- Trade-off: More retries increase resilience but add latency. Setting `max_retries=0` disables retries entirely.
- Retried status codes: 408 (Request Timeout), 409 (Lock Timeout), 429 (Rate Limit), 500+ (Server Errors).
- Server override: The `x-should-retry: true/false` header from the server takes precedence over status code logic.
- Retry-after cap: If server requests > 60 seconds delay, SDK ignores it and uses its own backoff formula.
- Idempotency: Non-GET retries automatically include an idempotency key (`stainless-python-retry-{uuid}`).
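The idempotency key format noted above can be reproduced with the standard library. The prefix is taken from the SDK source; the generation shown here is a sketch of the idea, not the SDK's internal code.

```python
import uuid

def make_idempotency_key():
    # Mirrors the SDK's default key format: stainless-python-retry-{uuid}.
    # The server uses this key to deduplicate a retried non-GET request.
    return f"stainless-python-retry-{uuid.uuid4()}"

key = make_idempotency_key()
```

Because the same key is sent on every retry of a given request, a POST that actually succeeded server-side before the response was lost will not be executed twice.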
Reasoning
Transient failures (rate limits, server overloads) are common with cloud APIs. Exponential backoff with jitter prevents thundering herd problems where many clients retry simultaneously. The 60-second cap on server-requested delays prevents unreasonably long waits that might indicate a persistent issue rather than a transient one. The jitter factor (multiplying by 0.75 to 1.0) ensures that multiple concurrent clients don't all retry at exactly the same instant.
The backoff formula is: `min(0.5 * 2^retry_count, 8.0) * (1 - 0.25 * random())`, where `retry_count` is 0 for the first retry.
For 2 retries, this produces approximate delays of:
- Retry 1: ~0.5s (range: 0.375s-0.5s)
- Retry 2: ~1.0s (range: 0.75s-1.0s)
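Plugging the jitter bounds (0.75x and 1.0x) into the formula reproduces the ranges above:

```python
INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0

def delay_bounds(nb_retries):
    """Return the (min, max) possible delay for a given retry attempt.

    nb_retries is 0 for the first retry, 1 for the second, and so on.
    """
    base = min(INITIAL_RETRY_DELAY * 2.0 ** nb_retries, MAX_RETRY_DELAY)
    # The jitter multiplier (1 - 0.25 * random()) ranges over 0.75x-1.0x.
    return (0.75 * base, 1.0 * base)
```

Once the exponential term reaches the 8.0s cap (at the fifth retry), every subsequent delay stays in the 6.0s-8.0s range regardless of how high `max_retries` is set.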
Code evidence from `_base_client.py:749-771`:
```python
def _calculate_retry_timeout(self, remaining_retries, options, response_headers=None):
    max_retries = options.get_max_retries(self.max_retries)

    # If the API asks us to wait a reasonable amount, do what it says.
    retry_after = self._parse_retry_after_header(response_headers)
    if retry_after is not None and 0 < retry_after <= 60:
        return retry_after

    nb_retries = min(max_retries - remaining_retries, 1000)
    sleep_seconds = min(INITIAL_RETRY_DELAY * pow(2.0, nb_retries), MAX_RETRY_DELAY)
    jitter = 1 - 0.25 * random()
    timeout = sleep_seconds * jitter
    return timeout if timeout >= 0 else 0
```
Retry decision logic from `_base_client.py:773-806`:
```python
def _should_retry(self, response):
    should_retry_header = response.headers.get("x-should-retry")
    if should_retry_header == "true":
        return True
    if should_retry_header == "false":
        return False

    if response.status_code == 408:  # Request Timeout
        return True
    if response.status_code == 409:  # Lock Timeout
        return True
    if response.status_code == 429:  # Rate Limit
        return True
    if response.status_code >= 500:  # Internal Errors
        return True

    return False
```
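This decision table can be exercised without the SDK. The following is a reimplementation for illustration; `FakeResponse` is a stand-in type, not part of the SDK.

```python
from dataclasses import dataclass, field

@dataclass
class FakeResponse:
    """Minimal stand-in for an HTTP response: status code plus headers."""
    status_code: int
    headers: dict = field(default_factory=dict)

def should_retry(response):
    # The x-should-retry header takes precedence over status-code logic.
    header = response.headers.get("x-should-retry")
    if header == "true":
        return True
    if header == "false":
        return False
    # 408, 409, 429, and all 5xx responses are treated as transient.
    return response.status_code in (408, 409, 429) or response.status_code >= 500
```

The header check coming first is the important detail: a server can force a retry on an otherwise non-retryable status, or suppress a retry on a 429 it knows will never succeed.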
Default constants from `_constants.py:10-14`:
```python
DEFAULT_MAX_RETRIES = 2
INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0
```