Heuristic:Googleapis Python genai API Retry Backoff Strategy
| Knowledge Sources | |
|---|---|
| Domains | Reliability, Infrastructure |
| Last Updated | 2026-02-15 14:00 GMT |
Overview
By default, the SDK retries failed API calls with exponential backoff and jitter: up to 5 attempts in total, delays between 1 and 60 seconds, triggered by HTTP status codes 408, 429, 500, 502, 503, and 504.
Description
The Google GenAI SDK implements automatic retry logic for transient API errors using the tenacity library. The retry configuration is modeled after Google Cloud Storage best practices. By default, the client retries up to 4 times (5 total attempts including the initial call) with exponential backoff starting at 1 second and capping at 60 seconds. A random jitter of up to 1 second is added to each delay to prevent thundering herd effects.
The retry strategy is configurable per-client or per-request via `HttpRetryOptions`. A separate retry configuration exists for the Interactions API client with more conservative defaults (2 retries, 0.5-8 second delays).
Usage
This heuristic is relevant whenever you make any API call through the SDK. It is built into the transport layer and activates automatically on retryable errors. Customize retry behavior when:
- Latency-sensitive applications: Reduce `attempts` to fail fast.
- Batch/offline workloads: Increase `max_delay` and `attempts` for more resilience.
- Rate-limited scenarios: The default handles 429 (Too Many Requests) automatically.
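When choosing `attempts` and `max_delay`, it helps to estimate how much waiting a given configuration can add before the final error surfaces. The sketch below is a rough budget calculator, not SDK code. It assumes each delay is `min(initial_delay * exp_base**(n-1) + jitter, max_delay)` for the n-th retry, with jitter taken at its maximum, and it ignores the time spent in the requests themselves. The function name `worst_case_wait` is hypothetical.

```python
def worst_case_wait(attempts=5, initial_delay=1.0, max_delay=60.0,
                    exp_base=2.0, jitter=1.0):
    """Upper bound on total seconds spent sleeping between attempts.

    Assumption: the delay before the n-th retry is the exponential term plus
    jitter, capped at max_delay. Jitter is taken at its maximum here.
    """
    total = 0.0
    # One wait precedes each retry, so there are (attempts - 1) waits.
    for retry_number in range(1, attempts):
        backoff = initial_delay * exp_base ** (retry_number - 1)
        total += min(backoff + jitter, max_delay)
    return total


print(worst_case_wait())                             # defaults -> 19.0
print(worst_case_wait(attempts=2))                   # fail fast -> 2.0
print(worst_case_wait(attempts=8, max_delay=120.0))  # batch -> 134.0
```

Under these assumptions the defaults add at most about 19 seconds of waiting, which is a useful number to compare against a request's latency budget.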
The Insight (Rule of Thumb)
- Action: The SDK automatically retries with exponential backoff. Override via `HttpRetryOptions` in `HttpOptions`.
- Value: Defaults are 5 attempts (1 initial call plus 4 retries), 1.0s initial delay, 60.0s max delay, exponential base 2, and up to 1s of jitter.
- Retryable Codes: 408 (Request Timeout), 429 (Too Many Requests), 500 (Internal Server Error), 502 (Bad Gateway), 503 (Service Unavailable), 504 (Gateway Timeout).
- Trade-off: More retries improve reliability but increase end-to-end latency for persistent errors. Jitter prevents synchronized retry storms from multiple clients.
- Two configurations: The main API client (5 attempts, i.e. 4 retries, with 1-60s delays) retries more persistently than the Interactions client (2 retries, with 0.5-8s delays).
Reasoning
The retry configuration is based on Google Cloud Storage best practices, which have been empirically validated across Google Cloud services:
- Exponential backoff (base 2): Delays of approximately 1, 2, 4, 8 seconds give the server time to recover while avoiding excessive wait times.
- Jitter (up to 1 second): Randomizes retry timing to prevent multiple clients from retrying simultaneously after an outage (thundering herd).
- 60-second cap: Prevents unreasonably long waits while still accommodating temporary overload.
- Status code selection: Only errors that are likely transient are retried (server errors, request timeouts, and rate limits). Client errors (4xx) other than 408 and 429 are not retried because they indicate a permanent problem with the request.
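The points above can be combined into a small standard-library sketch of the strategy. This is an illustration only: the SDK delegates the loop to tenacity, and `FakeAPIError`, `backoff_delay`, and `call_with_retries` are hypothetical names introduced here. The `sleep` parameter is injectable so the demo records delays instead of actually waiting.

```python
import random
import time

RETRYABLE_STATUS_CODES = frozenset({408, 429, 500, 502, 503, 504})


class FakeAPIError(Exception):
    """Hypothetical stand-in for an SDK API error carrying an HTTP status."""

    def __init__(self, code):
        super().__init__(f"HTTP {code}")
        self.code = code


def backoff_delay(retry_number, initial=1.0, max_delay=60.0, exp_base=2.0,
                  jitter=1.0, rng=random):
    """Delay before the given retry (1-based): exponential plus jitter, capped."""
    exponential = initial * exp_base ** (retry_number - 1)
    return min(exponential + rng.uniform(0, jitter), max_delay)


def call_with_retries(fn, attempts=5, sleep=time.sleep, rng=random):
    """Call fn, retrying retryable errors for up to `attempts` total calls."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except FakeAPIError as err:
            # Re-raise on non-retryable codes or once attempts are exhausted.
            if err.code not in RETRYABLE_STATUS_CODES or attempt == attempts:
                raise
            sleep(backoff_delay(attempt, rng=rng))


# Demo: two transient failures, then success. Delays are recorded, not slept.
outcomes = iter([503, 429, None])


def flaky():
    code = next(outcomes)
    if code is not None:
        raise FakeAPIError(code)
    return "ok"


waits = []
print(call_with_retries(flaky, sleep=waits.append), len(waits))  # ok 2
```

The recorded delays fall in the ranges 1-2 seconds and 2-3 seconds, which matches the "approximately 1, 2, 4, 8" schedule with up to 1 second of jitter added.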
The Interactions API client uses more conservative defaults (2 retries, 0.5s-8s delays). It also respects the server's `Retry-After` header (up to 60 seconds) and a custom `x-should-retry` header for explicit server-side retry guidance.
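A minimal sketch of how header-driven retry guidance of this kind might be combined with the 0.5-8 second backoff. Several details are assumptions rather than facts about the Interactions client: that `x-should-retry` carries the strings `true` or `false`, that a `Retry-After` value above 60 seconds is ignored rather than clamped, that the fallback delay doubles on each retry without jitter, and that header names arrive lowercased. The function names are hypothetical.

```python
def should_retry(headers, status_based_default):
    """Let an explicit x-should-retry header override the status-based decision."""
    hint = headers.get("x-should-retry")
    if hint == "true":
        return True
    if hint == "false":
        return False
    return status_based_default


def retry_delay(retries_so_far, headers, initial=0.5, max_delay=8.0,
                retry_after_cap=60.0):
    """Seconds to wait before the next retry (retries_so_far is 0-based)."""
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        try:
            seconds = float(retry_after)
        except ValueError:
            seconds = None  # e.g. an HTTP-date; not handled in this sketch
        # Honor the server's value only when it is positive and within the cap.
        if seconds is not None and 0 < seconds <= retry_after_cap:
            return seconds
    # Otherwise fall back to capped exponential backoff.
    return min(initial * 2 ** retries_so_far, max_delay)


print(retry_delay(0, {}))                      # 0.5
print(retry_delay(1, {}))                      # 1.0
print(retry_delay(0, {"retry-after": "12"}))   # 12.0
print(retry_delay(0, {"retry-after": "300"}))  # 0.5 (over the cap, ignored)
print(should_retry({"x-should-retry": "false"}, True))  # False
```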
Code Evidence
Default retry constants from `_api_client.py:452-468`:
# Default retry options.
# The config is based on https://cloud.google.com/storage/docs/retry-strategy.
# By default, the client will retry 4 times with approximately 1.0, 2.0, 4.0,
# 8.0 seconds between each attempt.
_RETRY_ATTEMPTS = 5  # including the initial call.
_RETRY_INITIAL_DELAY = 1.0  # seconds
_RETRY_MAX_DELAY = 60.0  # seconds
_RETRY_EXP_BASE = 2
_RETRY_JITTER = 1
_RETRY_HTTP_STATUS_CODES = (
    408,  # Request timeout.
    429,  # Too many requests.
    500,  # Internal server error.
    502,  # Bad gateway.
    503,  # Service unavailable.
    504,  # Gateway timeout
)
Tenacity retry setup from `_api_client.py:471-501`:
def retry_args(options: Optional[HttpRetryOptions]) -> _common.StringDict:
  stop = tenacity.stop_after_attempt(options.attempts or _RETRY_ATTEMPTS)
  retriable_codes = options.http_status_codes or _RETRY_HTTP_STATUS_CODES
  retry = tenacity.retry_if_exception(
      lambda e: isinstance(e, errors.APIError) and e.code in retriable_codes,
  )
  wait = tenacity.wait_exponential_jitter(
      initial=options.initial_delay or _RETRY_INITIAL_DELAY,
      max=options.max_delay or _RETRY_MAX_DELAY,
      exp_base=options.exp_base or _RETRY_EXP_BASE,
      jitter=options.jitter or _RETRY_JITTER,
  )
  return {'stop': stop, 'retry': retry, 'reraise': True, 'wait': wait}
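One consequence of the `value or default` pattern in the excerpt above is worth noting: any falsy value, including `0`, falls back to the default. The sketch below reproduces only that pattern with a hypothetical `RetryOptionsSketch` dataclass. It is not the real `HttpRetryOptions`, which may apply its own validation before these lines run.

```python
from dataclasses import dataclass
from typing import Optional

_RETRY_ATTEMPTS = 5
_RETRY_JITTER = 1


@dataclass
class RetryOptionsSketch:
    """Hypothetical stand-in with two of the fields read by retry_args."""

    attempts: Optional[int] = None
    jitter: Optional[float] = None


def resolve(options):
    """Apply the same `value or default` fallback as the excerpt above."""
    return {
        "attempts": options.attempts or _RETRY_ATTEMPTS,
        "jitter": options.jitter or _RETRY_JITTER,
    }


print(resolve(RetryOptionsSketch()))            # {'attempts': 5, 'jitter': 1}
print(resolve(RetryOptionsSketch(attempts=2)))  # {'attempts': 2, 'jitter': 1}
print(resolve(RetryOptionsSketch(jitter=0)))    # {'attempts': 5, 'jitter': 1}
```

If the excerpt reflects the full logic, passing `jitter=0` would not disable jitter; a very small positive value would be needed instead.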
Interactions client retry constants from `_interactions/_constants.py:25-29`:
DEFAULT_MAX_RETRIES = 2
INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0