Heuristic:Cohere ai Cohere python HTTP Retry Backoff Strategy

Knowledge Sources	Cohere Python SDK HTTP Retry-After RFC
Domains	Optimization, Reliability, HTTP
Last Updated	2026-02-15 14:00 GMT

Overview

Multi-level retry strategy with three tiers, tried in priority order: the `Retry-After` header, the `X-RateLimit-Reset` header with positive jitter, and exponential backoff (1s initial delay, 60s cap) with symmetric jitter.

Description

The SDK retries transient failures automatically. When a retryable HTTP response is received (any 5xx, or 429, 408, 409), the retry delay is determined through a three-tier priority system. Two distinct jitter algorithms guard against thundering-herd effects: positive jitter (a 0-20% increase) for rate-limit resets, and symmetric jitter (+/-10%) for exponential backoff. A `Retry-After` value is used as given, capped at 60s, with no jitter applied.

Usage

This heuristic is relevant when:

Encountering rate limit errors (HTTP 429) from the Cohere API
Debugging retry behavior or unexpectedly long waits
Tuning the `max_retries` parameter on `RequestOptions`
Understanding why requests to overloaded endpoints eventually succeed

The Insight (Rule of Thumb)

Action: Let the SDK handle retries automatically; configure `max_retries` in `RequestOptions` if needed.
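Value: Initial delay = 1.0s, max delay = 60.0s (applied before jitter, so the actual wait can reach about 72s in the rate-limit tier and about 66s in the backoff tier), jitter factor = 20%, retryable status codes = [429, 408, 409, 5xx].
Trade-off: Automatic retries improve reliability but can increase latency for failed requests. In the exponential backoff tier, each retry doubles the wait time up to the 60s cap; in the two header-driven tiers, the server dictates the wait.
Priority: `Retry-After` header > `X-RateLimit-Reset` header > Exponential backoff (initial delay x 2^n, where n is the retry count).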
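
To make the doubling rule concrete, the following sketch prints the un-jittered delays of the exponential backoff tier. It reuses the two constants quoted under Code Evidence; the helper name `base_backoff` is hypothetical, and the sketch assumes the retry counter starts at 0, which matches the stated 1.0s initial delay but is not confirmed by the excerpts shown here.

```python
# Constants as quoted from core/http_client.py under Code Evidence.
INITIAL_RETRY_DELAY_SECONDS = 1.0
MAX_RETRY_DELAY_SECONDS = 60.0


def base_backoff(retries: int) -> float:
    """Un-jittered backoff delay for a zero-based retry count (hypothetical helper)."""
    return min(INITIAL_RETRY_DELAY_SECONDS * 2.0 ** retries, MAX_RETRY_DELAY_SECONDS)


# Delays for the first eight retries, before symmetric jitter is applied.
schedule = [base_backoff(n) for n in range(8)]
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]

# Cumulative wait if every one of the first three retries falls back to this tier.
print(sum(schedule[:3]))  # 7.0
```

The cap takes effect at the seventh delay (2^6 = 64s is clamped to 60s), so raising `max_retries` beyond that point adds roughly one minute of potential waiting per extra retry.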

Reasoning

The three-tier approach respects server-provided signals first (`Retry-After` is an HTTP standard), then the non-standard but widely used `X-RateLimit-Reset` header, and only falls back to client-side estimation as a last resort. The two jitter strategies serve different purposes: positive jitter on rate-limit resets ensures that clients never retry before the reset time and do not all retry at exactly that moment, while symmetric jitter on backoff spreads retries on both sides of the calculated delay, so the average wait stays equal to the calculated value.

The retryable status code set is intentionally narrow: 429 (rate limit), 408 (request timeout), 409 (conflict, treated as potentially transient), and all 5xx (server errors). Client errors such as 400, 401, and 404 are not retried because repeating the same request would typically fail in the same way.

Code Evidence

Retry constants from `core/http_client.py:20-22`:

INITIAL_RETRY_DELAY_SECONDS = 1.0
MAX_RETRY_DELAY_SECONDS = 60.0
JITTER_FACTOR = 0.2  # 20% random jitter

Retryable status code logic from `core/http_client.py:120-122`:

def _should_retry(response: httpx.Response) -> bool:
    retryable_400s = [429, 408, 409]
    return response.status_code >= 500 or response.status_code in retryable_400s

Three-tier retry timeout from `core/http_client.py:98-117`:

def _retry_timeout(response: httpx.Response, retries: int) -> float:
    # 1. Check Retry-After header first
    retry_after = _parse_retry_after(response.headers)
    if retry_after is not None and retry_after > 0:
        return min(retry_after, MAX_RETRY_DELAY_SECONDS)

    # 2. Check X-RateLimit-Reset header (with positive jitter)
    ratelimit_reset = _parse_x_ratelimit_reset(response.headers)
    if ratelimit_reset is not None:
        return _add_positive_jitter(min(ratelimit_reset, MAX_RETRY_DELAY_SECONDS))

    # 3. Fall back to exponential backoff (with symmetric jitter)
    backoff = min(INITIAL_RETRY_DELAY_SECONDS * pow(2.0, retries), MAX_RETRY_DELAY_SECONDS)
    return _add_symmetric_jitter(backoff)
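
The excerpt above calls `_parse_retry_after` without showing its body. As an illustration only, and not the SDK's actual implementation, here is a minimal sketch of how a `Retry-After` value can be turned into seconds. The HTTP specification allows two forms for this header: a non-negative integer number of seconds, or an HTTP-date. The function name `parse_retry_after_seconds` and its `now` parameter are hypothetical and exist only for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def parse_retry_after_seconds(
    headers: Mapping[str, str], now: Optional[datetime] = None
) -> Optional[float]:
    """Return the Retry-After delay in seconds, or None if absent or unparseable.

    Hypothetical helper for illustration; not taken from the SDK.
    """
    # Header names are case-insensitive, so normalize the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get("retry-after")
    if value is None:
        return None
    value = value.strip()

    # Form 1: delay-seconds, e.g. "120".
    if value.isdigit():
        return float(value)

    # Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if target.tzinfo is None:
        # Treat a date without zone information as UTC.
        target = target.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", so never return a negative delay.
    return max(0.0, (target - current).total_seconds())


print(parse_retry_after_seconds({"Retry-After": "120"}))  # 120.0
```

Under this sketch, a date in the past yields 0.0, which the `retry_after > 0` check in `_retry_timeout` would reject, so the delay would then come from one of the lower tiers.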

Dual jitter algorithms from `core/http_client.py:66-75`:

def _add_positive_jitter(delay: float) -> float:
    """Add positive jitter (0-20%) to prevent thundering herd."""
    jitter_multiplier = 1 + random() * JITTER_FACTOR
    return delay * jitter_multiplier

def _add_symmetric_jitter(delay: float) -> float:
    """Add symmetric jitter (+-10%) for exponential backoff."""
    jitter_multiplier = 1 + (random() - 0.5) * JITTER_FACTOR
    return delay * jitter_multiplier
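
Because both jitter functions are applied after the 60s cap, the final wait can exceed `MAX_RETRY_DELAY_SECONDS`. The sketch below re-implements the two formulas shown above under hypothetical names (`positive_jitter`, `symmetric_jitter`) and samples them at the capped delay to show the resulting ranges. It assumes that `random` in the excerpt refers to `random.random`, which returns a float in [0, 1).

```python
from random import random

JITTER_FACTOR = 0.2  # same value as the constant quoted above


def positive_jitter(delay: float) -> float:
    """Scale the delay by a multiplier in [1.0, 1.2)."""
    return delay * (1 + random() * JITTER_FACTOR)


def symmetric_jitter(delay: float) -> float:
    """Scale the delay by a multiplier in [0.9, 1.1)."""
    return delay * (1 + (random() - 0.5) * JITTER_FACTOR)


# Sample both functions at the capped delay of 60 seconds.
samples_pos = [positive_jitter(60.0) for _ in range(10_000)]
samples_sym = [symmetric_jitter(60.0) for _ in range(10_000)]

# Positive jitter stays between 60s and about 72s.
print(min(samples_pos), max(samples_pos))
# Symmetric jitter stays between about 54s and about 66s.
print(min(samples_sym), max(samples_sym))
```

In practice this means a wait of slightly more than one minute between attempts is expected behavior once the cap is reached, not a sign of a stalled request.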
