Heuristic:Cohere ai Cohere python HTTP Retry Backoff Strategy
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Reliability, HTTP |
| Last Updated | 2026-02-15 14:00 GMT |
Overview
A three-tier retry strategy in which the delay comes from the first source available: the `Retry-After` header, then the `X-RateLimit-Reset` header with positive jitter, then exponential backoff (1s to 60s before jitter) with symmetric jitter.
Description
The SDK retries transient failures automatically. When a retryable HTTP response is received (any 5xx, or 429, 408, 409), the retry delay is taken from the first applicable tier of a three-tier priority system. Two distinct jitter algorithms prevent thundering herd problems: positive jitter (a 0-20% increase) for rate-limit resets, and symmetric jitter (+/-10%) for exponential backoff.
Usage
This heuristic is relevant when:
- Encountering rate limit errors (HTTP 429) from the Cohere API
- Debugging retry behavior or unexpectedly long waits
- Tuning the `max_retries` parameter on `RequestOptions`
- Understanding why requests to overloaded endpoints eventually succeed
The Insight (Rule of Thumb)
- Action: Let the SDK handle retries automatically; configure `max_retries` in `RequestOptions` if needed.
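- Value: Initial delay = 1.0s, max delay = 60.0s (the cap is applied before jitter, so an actual wait can reach about 72s in the `X-RateLimit-Reset` tier and about 66s in the backoff tier), jitter factor = 20%, retryable status codes = [429, 408, 409, 5xx].
- Trade-off: Automatic retries improve reliability but can increase latency for failed requests. In the exponential backoff tier, each retry doubles the nominal wait until it reaches the 60s cap.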
- Priority: `Retry-After` header > `X-RateLimit-Reset` header > Exponential backoff (2^n).
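To make the latency trade-off concrete, the sketch below tabulates the nominal exponential-backoff delay and its symmetric-jitter range for each retry. It assumes the retry counter starts at 0 and that neither `Retry-After` nor `X-RateLimit-Reset` is present in the response. The helpers `nominal_backoff`, `jitter_bounds`, and `worst_case_total` are hypothetical names used for illustration, not SDK functions.

```python
# Constants copied from the values listed above.
INITIAL_RETRY_DELAY_SECONDS = 1.0
MAX_RETRY_DELAY_SECONDS = 60.0
JITTER_FACTOR = 0.2


def nominal_backoff(retries: int) -> float:
    """Backoff before jitter: 1s, 2s, 4s, ... capped at 60s (assumes retries starts at 0)."""
    return min(INITIAL_RETRY_DELAY_SECONDS * 2.0 ** retries, MAX_RETRY_DELAY_SECONDS)


def jitter_bounds(delay: float) -> tuple[float, float]:
    """Range covered by symmetric jitter: delay * (1 +/- JITTER_FACTOR / 2)."""
    half = JITTER_FACTOR / 2
    return delay * (1 - half), delay * (1 + half)


def worst_case_total(max_retries: int) -> float:
    """Upper bound on total sleep time across max_retries backoff waits."""
    return sum(jitter_bounds(nominal_backoff(n))[1] for n in range(max_retries))


if __name__ == "__main__":
    for n in range(8):
        low, high = jitter_bounds(nominal_backoff(n))
        print(f"retry {n}: nominal {nominal_backoff(n):5.1f}s, range {low:.1f}s to {high:.1f}s")
```

Under these assumptions, three retries add at most about 7.7s of sleep in total (1.1s + 2.2s + 4.4s), while the cap is reached from the seventh wait onward. This can help when choosing a `max_retries` value for latency-sensitive calls.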
Reasoning
The three-tier approach respects server-provided signals first (`Retry-After` is an HTTP standard), then the rate-limit reset header (`X-RateLimit-Reset`, which is widely used but not standardized), and only falls back to client-side estimation as a last resort. The two jitter strategies serve different purposes: positive jitter on rate-limit resets ensures that clients never retry before the reset time and do not all retry at exactly that time, while symmetric jitter on backoff spreads retries evenly around the calculated delay.
The retryable status code set is intentionally narrow: 429 (rate limit), 408 (request timeout), 409 (conflict/temporary), and all 5xx (server errors). Client errors like 400, 401, 404 are not retried because they indicate permanent failures.
Code Evidence
Retry constants from `core/http_client.py:20-22`:
INITIAL_RETRY_DELAY_SECONDS = 1.0
MAX_RETRY_DELAY_SECONDS = 60.0
JITTER_FACTOR = 0.2 # 20% random jitter
Retryable status code logic from `core/http_client.py:120-122`:
def _should_retry(response: httpx.Response) -> bool:
    retryable_400s = [429, 408, 409]
    return response.status_code >= 500 or response.status_code in retryable_400s
Three-tier retry timeout from `core/http_client.py:98-117`:
def _retry_timeout(response: httpx.Response, retries: int) -> float:
    # 1. Check Retry-After header first
    retry_after = _parse_retry_after(response.headers)
    if retry_after is not None and retry_after > 0:
        return min(retry_after, MAX_RETRY_DELAY_SECONDS)
    # 2. Check X-RateLimit-Reset header (with positive jitter)
    ratelimit_reset = _parse_x_ratelimit_reset(response.headers)
    if ratelimit_reset is not None:
        return _add_positive_jitter(min(ratelimit_reset, MAX_RETRY_DELAY_SECONDS))
    # 3. Fall back to exponential backoff (with symmetric jitter)
    backoff = min(INITIAL_RETRY_DELAY_SECONDS * pow(2.0, retries), MAX_RETRY_DELAY_SECONDS)
    return _add_symmetric_jitter(backoff)
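The body of `_parse_retry_after` is not shown in the evidence above. As a rough illustration only, the sketch below shows how a `Retry-After` value could be turned into seconds, given that the HTTP specification allows either a number of seconds or an HTTP-date. The function `parse_retry_after_sketch` and its `now` parameter are hypothetical and are not the SDK's implementation; it also takes a plain mapping, so the header lookup here is case-sensitive, unlike `httpx.Headers`.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def parse_retry_after_sketch(
    headers: Mapping[str, str], now: Optional[datetime] = None
) -> Optional[float]:
    """Return the Retry-After delay in seconds, or None if absent or unparseable."""
    value = headers.get("Retry-After")
    if value is None:
        return None
    value = value.strip()
    # Form 1: delay-seconds, a non-negative integer such as "120".
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: HTTP-date, such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if retry_at.tzinfo is None:
        # Treat a date without an offset as UTC (an assumption of this sketch).
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past yields 0 rather than a negative delay.
    return max((retry_at - now).total_seconds(), 0.0)
```

With a parser shaped like this, a past date or a value of `0` would fail the `retry_after > 0` check in `_retry_timeout`, so the delay would come from one of the two lower tiers instead.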
Dual jitter algorithms from `core/http_client.py:66-75`:
def _add_positive_jitter(delay: float) -> float:
    """Add positive jitter (0-20%) to prevent thundering herd."""
    jitter_multiplier = 1 + random() * JITTER_FACTOR
    return delay * jitter_multiplier

def _add_symmetric_jitter(delay: float) -> float:
    """Add symmetric jitter (+-10%) for exponential backoff."""
    jitter_multiplier = 1 + (random() - 0.5) * JITTER_FACTOR
    return delay * jitter_multiplier
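The difference between the two jitter algorithms is easiest to see by sampling them. The following self-contained sketch re-implements both formulas without the leading underscore, using a seeded `random.Random` instance (an assumption made only for repeatability; the excerpt above calls `random()` directly), and applies them to a delay already at the 60s cap.

```python
from random import Random
from statistics import mean

MAX_RETRY_DELAY_SECONDS = 60.0
JITTER_FACTOR = 0.2

rng = Random(0)  # seeded only so the demo is repeatable


def add_positive_jitter(delay: float) -> float:
    """Scale delay by a random factor in [1.0, 1.2)."""
    return delay * (1 + rng.random() * JITTER_FACTOR)


def add_symmetric_jitter(delay: float) -> float:
    """Scale delay by a random factor in [0.9, 1.1)."""
    return delay * (1 + (rng.random() - 0.5) * JITTER_FACTOR)


# Sample both algorithms at the capped delay.
positive = [add_positive_jitter(MAX_RETRY_DELAY_SECONDS) for _ in range(10_000)]
symmetric = [add_symmetric_jitter(MAX_RETRY_DELAY_SECONDS) for _ in range(10_000)]

print(f"positive:  min {min(positive):.1f}s, mean {mean(positive):.1f}s, max {max(positive):.1f}s")
print(f"symmetric: min {min(symmetric):.1f}s, mean {mean(symmetric):.1f}s, max {max(symmetric):.1f}s")
```

Positive jitter never produces a wait shorter than the input delay and shifts the average wait about 10% later, which suits a reset time that must not be undercut. Symmetric jitter keeps the average at the nominal delay and spreads individual waits on both sides of it.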