Heuristic:Groq Groq python Retry Backoff Strategy
| Knowledge Sources | |
|---|---|
| Domains | API_Client, Reliability |
| Last Updated | 2026-02-15 17:00 GMT |
Overview
Exponential backoff retry strategy with jitter for handling transient API errors, defaulting to 2 retries with 0.5s-8s delay range.
Description
The Groq SDK includes a built-in retry mechanism that automatically retries failed requests using exponential backoff with jitter. The retry logic handles specific HTTP status codes (408, 409, 429, 5xx) and also respects a custom x-should-retry response header from the server. The backoff formula is min(0.5 * 2^n, 8.0) * jitter where jitter is a random multiplier between 0.75 and 1.0. If the server provides a retry-after header with a value between 0-60 seconds, the SDK respects that instead.
Usage
Apply this heuristic when configuring the Groq client for production use. The default of 2 retries is suitable for most use cases. Increase max_retries for critical workflows that must succeed. Set to 0 for latency-sensitive applications where retries are unacceptable. Understand which errors are retried to properly handle non-retryable errors (400, 401, 403, 404, 422) in your application code.
The Insight (Rule of Thumb)
- Action: Configure
max_retrieson the Groq client based on your reliability needs. - Value: Default is 2 retries. Use 0 for no retries, higher values for critical jobs. Can use
math.inffor unlimited retries. - Trade-off: More retries increase reliability but add latency on failures. The exponential backoff (0.5s to 8s max) prevents thundering herd but delays recovery.
- Retried status codes: 408 (Request Timeout), 409 (Conflict/Lock Timeout), 429 (Rate Limit), >= 500 (Server Errors).
- NOT retried: 400, 401, 403, 404, 422 (client errors that won't resolve by retrying).
- Server override: The
x-should-retry: true/falseheader takes precedence over status code logic. - Per-request override: Use
.with_options(max_retries=N)to override for individual requests.
Reasoning
Transient failures (rate limits, server overload, lock contention) are common with cloud APIs. Retrying with exponential backoff prevents overwhelming the server during outages while giving the system time to recover. The jitter (plus-or-minus 25%) prevents synchronized retry storms from multiple clients. The 60-second cap on retry-after values prevents indefinite blocking from malformed server responses. The retry count is capped at 1000 internally to prevent overflow in the exponential calculation.
Code Evidence
Default constants from src/groq/_constants.py:8-14:
# default timeout is 1 minute
DEFAULT_TIMEOUT = httpx.Timeout(timeout=60, connect=5.0)
DEFAULT_MAX_RETRIES = 2
DEFAULT_CONNECTION_LIMITS = httpx.Limits(max_connections=100, max_keepalive_connections=20)
INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0
Backoff calculation from src/groq/_base_client.py:724-746:
def _calculate_retry_timeout(self, remaining_retries, options, response_headers):
max_retries = options.get_max_retries(self.max_retries)
# If the API asks us to wait a certain amount of time (and it's a reasonable amount), just do what it says.
retry_after = self._parse_retry_after_header(response_headers)
if retry_after is not None and 0 < retry_after <= 60:
return retry_after
# Also cap retry count to 1000 to avoid any potential overflows with `pow`
nb_retries = min(max_retries - remaining_retries, 1000)
# Apply exponential backoff, but not more than the max.
sleep_seconds = min(INITIAL_RETRY_DELAY * pow(2.0, nb_retries), MAX_RETRY_DELAY)
# Apply some jitter, plus-or-minus half a second.
jitter = 1 - 0.25 * random()
timeout = sleep_seconds * jitter
return timeout if timeout >= 0 else 0
Retry decision logic from src/groq/_base_client.py:748-781:
def _should_retry(self, response: httpx.Response) -> bool:
# Note: this is not a standard header
should_retry_header = response.headers.get("x-should-retry")
# If the server explicitly says whether or not to retry, obey.
if should_retry_header == "true":
return True
if should_retry_header == "false":
return False
# Retry on request timeouts.
if response.status_code == 408:
return True
# Retry on lock timeouts.
if response.status_code == 409:
return True
# Retry on rate limits.
if response.status_code == 429:
return True
# Retry internal errors.
if response.status_code >= 500:
return True
return False