Heuristic: Groq Python SDK Retry Backoff Strategy

Knowledge Sources: Groq Python SDK, Groq SDK README
Domains: API_Client, Reliability
Last Updated: 2026-02-15 17:00 GMT

Overview

Exponential backoff with jitter for transient API errors. By default the client retries a failed request up to 2 times, with a base delay that starts at 0.5s and is capped at 8s.

Description

The Groq SDK includes a built-in retry mechanism that automatically retries failed requests using exponential backoff with jitter. It retries specific HTTP status codes (408, 409, 429, and 5xx) and also honors a non-standard x-should-retry response header sent by the server. The delay before a retry is min(0.5 * 2^n, 8.0) * jitter, where n = max_retries - remaining_retries (capped at 1000) and jitter is a random multiplier in (0.75, 1.0]. If the server sends a retry-after header whose value is greater than 0 and at most 60 seconds, the SDK waits that long instead.
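A minimal sketch of the same formula as a standalone function, handy for checking the numbers. It mirrors `_calculate_retry_timeout` from the Code Evidence section; `compute_retry_delay`, its `retries_taken` argument (standing in for `max_retries - remaining_retries`) and the injectable `rand` parameter are hypothetical names, not SDK API.

```python
import random
from typing import Callable, Optional

INITIAL_RETRY_DELAY = 0.5  # seconds, same value as the SDK constant
MAX_RETRY_DELAY = 8.0      # seconds, same value as the SDK constant


def compute_retry_delay(
    retries_taken: int,
    retry_after: Optional[float] = None,
    rand: Callable[[], float] = random.random,
) -> float:
    """Hypothetical helper: seconds to sleep before the next retry."""
    # A server-supplied retry-after wins only if it lies in (0, 60] seconds.
    if retry_after is not None and 0 < retry_after <= 60:
        return retry_after
    # Cap the exponent so pow() cannot overflow.
    n = min(retries_taken, 1000)
    base = min(INITIAL_RETRY_DELAY * pow(2.0, n), MAX_RETRY_DELAY)
    # Jitter scales the delay by a factor in (0.75, 1.0].
    jitter = 1 - 0.25 * rand()
    return max(base * jitter, 0.0)


# Upper bounds (rand() == 0.0, so jitter == 1.0) for n = 0..5:
print([compute_retry_delay(n, rand=lambda: 0.0) for n in range(6)])
# → [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
```

Because jitter only ever shrinks the delay, these values are ceilings; actual delays fall between 75% and 100% of each one.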

Usage

Apply this heuristic when configuring the Groq client for production use. The default of 2 retries suits most use cases. Increase max_retries for critical workflows that must succeed, and set it to 0 for latency-sensitive applications where retries are unacceptable. Know which errors are retried so that your application code handles the non-retryable ones (400, 401, 403, 404, 422) itself.

The Insight (Rule of Thumb)

Action: Configure max_retries on the Groq client based on your reliability needs.
Value: Default is 2 retries. Use 0 to disable retries and a higher value for critical jobs; math.inf is accepted for unlimited retries.
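Trade-off: More retries increase reliability but add latency when requests fail. Exponential backoff (0.5s initial delay, 8s cap) eases load on a struggling server and jitter keeps clients from retrying in sync, which together prevent a thundering herd, but both slow recovery once the server is healthy again.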
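Retried status codes: 408 (Request Timeout), 409 (Conflict/Lock Timeout), 429 (Rate Limit), >= 500 (Server Errors). Per the SDK README, connection errors and request timeouts are retried as well.
NOT retried: any other status code, including 400, 401, 403, 404, and 422 (client errors that retrying will not fix).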
Server override: The x-should-retry: true/false header takes precedence over status code logic.
Per-request override: Use client.with_options(max_retries=N) to override the setting for individual requests.
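To put the latency trade-off in numbers, the sketch below adds up the base backoff delays for a given max_retries. It assumes every attempt fails, no retry-after header is honored, jitter is at its maximum factor of 1.0, and the first retry uses exponent n = 0 (the caller that supplies remaining_retries is not quoted on this page). `worst_case_backoff` is a hypothetical helper.

```python
def worst_case_backoff(max_retries: int,
                       initial: float = 0.5,
                       cap: float = 8.0) -> float:
    """Upper bound on total seconds spent sleeping between attempts."""
    # One sleep per retry; retry k (counting from 0) waits min(initial * 2**k, cap).
    return sum((min(initial * 2.0 ** n, cap) for n in range(max_retries)), 0.0)


for retries in (0, 2, 5, 10):
    print(retries, worst_case_backoff(retries))
```

Under these assumptions the default of 2 retries sleeps at most 1.5s in total; 5 retries allow up to 15.5s and 10 retries up to 55.5s, before counting the time the failed attempts themselves take.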

Reasoning

Transient failures (rate limits, server overload, lock contention) are common with cloud APIs. Exponential backoff spaces retries further apart, so clients do not overwhelm a server during an outage and the system has time to recover. Jitter multiplies each delay by a random factor in (0.75, 1.0], shortening it by up to 25%, so clients that failed at the same moment do not retry in lockstep. A retry-after value is honored only if it is greater than 0 and at most 60 seconds; any other value is ignored and the normal backoff applies, so a malformed or extreme header cannot stall the client indefinitely. The backoff exponent is capped at 1000 so that pow(2.0, n) cannot overflow when max_retries is very large.
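The quoted code calls `_parse_retry_after_header` without showing it. The stand-in below is an assumption, not the SDK's parser: it accepts the two Retry-After forms HTTP allows (a number of seconds or an HTTP-date) and then applies the same (0, 60] window. `parse_retry_after` and `usable_retry_after` are hypothetical names.

```python
import email.utils
import time
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[float] = None) -> Optional[float]:
    """Hypothetical parser: Retry-After header value -> seconds, or None."""
    if value is None:
        return None
    try:
        return float(value)  # delay-seconds form, e.g. "3"
    except ValueError:
        pass
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        target = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):  # unparseable date (exception type varies by Python version)
        return None
    current = time.time() if now is None else now
    return target.timestamp() - current


def usable_retry_after(value: Optional[str], now: Optional[float] = None) -> Optional[float]:
    """Return the delay only if it lies in (0, 60] seconds; otherwise None."""
    delay = parse_retry_after(value, now)
    if delay is not None and 0 < delay <= 60:
        return delay
    return None  # caller falls back to exponential backoff
```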

Code Evidence

Default constants from src/groq/_constants.py:8-14:

# default timeout is 1 minute
DEFAULT_TIMEOUT = httpx.Timeout(timeout=60, connect=5.0)
DEFAULT_MAX_RETRIES = 2
DEFAULT_CONNECTION_LIMITS = httpx.Limits(max_connections=100, max_keepalive_connections=20)

INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0

Backoff calculation from src/groq/_base_client.py:724-746:

def _calculate_retry_timeout(self, remaining_retries, options, response_headers):
    max_retries = options.get_max_retries(self.max_retries)

    # If the API asks us to wait a certain amount of time (and it's a reasonable amount), just do what it says.
    retry_after = self._parse_retry_after_header(response_headers)
    if retry_after is not None and 0 < retry_after <= 60:
        return retry_after

    # Also cap retry count to 1000 to avoid any potential overflows with `pow`
    nb_retries = min(max_retries - remaining_retries, 1000)

    # Apply exponential backoff, but not more than the max.
    sleep_seconds = min(INITIAL_RETRY_DELAY * pow(2.0, nb_retries), MAX_RETRY_DELAY)

    # Apply some jitter: scale the delay by a random factor in (0.75, 1.0].
    jitter = 1 - 0.25 * random()
    timeout = sleep_seconds * jitter
    return timeout if timeout >= 0 else 0

Retry decision logic from src/groq/_base_client.py:748-781:

def _should_retry(self, response: httpx.Response) -> bool:
    # Note: this is not a standard header
    should_retry_header = response.headers.get("x-should-retry")

    # If the server explicitly says whether or not to retry, obey.
    if should_retry_header == "true":
        return True
    if should_retry_header == "false":
        return False

    # Retry on request timeouts.
    if response.status_code == 408:
        return True
    # Retry on lock timeouts.
    if response.status_code == 409:
        return True
    # Retry on rate limits.
    if response.status_code == 429:
        return True
    # Retry internal errors.
    if response.status_code >= 500:
        return True
    return False
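Putting the two quoted methods together, a retry loop looks roughly like the following. This is a simplified, hypothetical sketch, not the SDK's request implementation: `send` stands in for one HTTP attempt returning a status code and headers, `sleep` is injectable so the example runs instantly, retry-after handling and connection errors are left out, the first retry is assumed to use exponent 0, and the loop returns the last response instead of raising an SDK exception.

```python
import random
import time
from typing import Callable, Dict, List, Tuple

Response = Tuple[int, Dict[str, str]]  # (status_code, headers)


def should_retry(status: int, headers: Dict[str, str]) -> bool:
    """Same decision order as _should_retry above."""
    flag = headers.get("x-should-retry")
    if flag == "true":
        return True
    if flag == "false":
        return False
    return status in (408, 409, 429) or status >= 500


def send_with_retries(
    send: Callable[[], Response],
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Hypothetical retry loop: returns the last response it received."""
    for retries_taken in range(max_retries + 1):
        status, headers = send()
        remaining = max_retries - retries_taken
        # Success, retries exhausted, or a non-retryable error: stop here.
        if status < 400 or remaining == 0 or not should_retry(status, headers):
            return status, headers
        # Exponential backoff (exponent capped at 1000) with a (0.75, 1.0] jitter factor.
        base = min(0.5 * 2.0 ** min(retries_taken, 1000), 8.0)
        sleep(base * (1 - 0.25 * random.random()))
    raise ValueError("max_retries must be >= 0")


# Usage: two transient failures, then success.
responses = iter([(503, {}), (429, {}), (200, {})])
delays: List[float] = []
result = send_with_retries(lambda: next(responses), sleep=delays.append)
print(result[0], len(delays))  # → 200 2
```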
