# Principle: BerriAI LiteLLM Retry and Fallback
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| litellm/router_utils/get_retry_from_policy.py, litellm/router_utils/fallback_event_handlers.py | LLM Resilience, Fault Tolerance | 2026-02-15 |
## Overview
Retry and fallback configuration defines how a system automatically retries failed LLM requests with exception-specific policies and falls back to alternative model groups when retries are exhausted.
## Description
LLM API calls can fail for many reasons: rate limiting, authentication errors, timeouts, content policy violations, and internal server errors. A robust gateway must handle each failure type differently:
- Retry policies map each exception type to a specific retry count. For example, rate limit errors may warrant 5 retries (with backoff), while authentication errors should not be retried at all. Retry policies can be set globally or per model group.
- Fallback chains define ordered lists of alternative model groups to try when all retries for the primary model group are exhausted. There are four categories of fallback:
  - Model-specific fallbacks -- e.g., if `gpt-4` fails, try `gpt-3.5-turbo`.
  - Context-window fallbacks -- triggered when the prompt exceeds the model's context window.
  - Content-policy fallbacks -- triggered when the response is blocked by content filters.
  - Default (wildcard) fallbacks -- apply to any model group that does not have a specific fallback defined.
- Fallback depth limiting prevents infinite fallback loops by enforcing a maximum number of fallback hops.
Together, these mechanisms provide layered resilience: first retry within the same model group, then fall back across groups, with configurable limits at each stage.
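As a concrete sketch, the layered setup described above might be configured through LiteLLM's `Router`. The parameter names and import path shown here (`retry_policy`, `fallbacks`, `context_window_fallbacks`, `default_fallbacks`, `max_fallbacks`, `model_group_retry_policy`) are assumptions inferred from the source modules and may differ across versions:

```python
# Sketch only: parameter names and import paths are assumptions, not verified API.
from litellm import Router
from litellm.types.router import RetryPolicy

router = Router(
    model_list=[...],  # deployment definitions elided
    # Global exception-specific retry counts.
    retry_policy=RetryPolicy(
        RateLimitErrorRetries=5,       # retry rate limits aggressively (with backoff)
        AuthenticationErrorRetries=0,  # never retry bad credentials
        TimeoutErrorRetries=3,
    ),
    # Per-model-group override of the global policy.
    model_group_retry_policy={"prod-gpt-4": RetryPolicy(TimeoutErrorRetries=5)},
    # Ordered fallback chains across model groups.
    fallbacks=[{"gpt-4": ["gpt-3.5-turbo"]}],
    context_window_fallbacks=[{"gpt-4": ["claude-3-opus"]}],
    default_fallbacks=["gpt-3.5-turbo"],  # wildcard fallback
    max_fallbacks=3,  # cap on fallback hops
)
```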
## Usage
Use retry and fallback configuration when:
- Different exception types require different retry behavior (e.g., retry rate limits aggressively, do not retry auth errors).
- You want graceful degradation from a preferred model to a cheaper or more available alternative.
- You need to enforce a maximum fallback depth to prevent cascading failures.
- Different model groups require different retry policies (e.g., more retries for production-critical models).
## Theoretical Basis
The retry and fallback pattern combines exception-discriminated retry with chain of responsibility for fallbacks.
Pseudocode for retry policy resolution:
```
FUNCTION get_num_retries(exception, retry_policy, model_group, model_group_retry_policy):
    // Step 1: Check for a model-group-specific retry policy
    IF model_group_retry_policy HAS model_group:
        retry_policy = model_group_retry_policy[model_group]
    IF retry_policy IS None:
        RETURN None  // use the default retry count

    // Step 2: Match the exception type to a retry count
    IF exception IS AuthenticationError AND retry_policy.AuthenticationErrorRetries IS SET:
        RETURN retry_policy.AuthenticationErrorRetries
    IF exception IS Timeout AND retry_policy.TimeoutErrorRetries IS SET:
        RETURN retry_policy.TimeoutErrorRetries
    IF exception IS RateLimitError AND retry_policy.RateLimitErrorRetries IS SET:
        RETURN retry_policy.RateLimitErrorRetries
    IF exception IS ContentPolicyViolationError AND retry_policy.ContentPolicyViolationErrorRetries IS SET:
        RETURN retry_policy.ContentPolicyViolationErrorRetries
    IF exception IS BadRequestError AND retry_policy.BadRequestErrorRetries IS SET:
        RETURN retry_policy.BadRequestErrorRetries

    RETURN None  // no specific policy for this exception type
```
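The resolution logic above can be sketched as self-contained Python. The exception classes and `RetryPolicy` fields here are illustrative stand-ins mirroring the pseudocode, not LiteLLM's actual types:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the gateway's exception types.
class AuthenticationError(Exception): ...
class Timeout(Exception): ...
class RateLimitError(Exception): ...
class ContentPolicyViolationError(Exception): ...
class BadRequestError(Exception): ...

@dataclass
class RetryPolicy:
    # None means "no specific policy for this exception type".
    AuthenticationErrorRetries: Optional[int] = None
    TimeoutErrorRetries: Optional[int] = None
    RateLimitErrorRetries: Optional[int] = None
    ContentPolicyViolationErrorRetries: Optional[int] = None
    BadRequestErrorRetries: Optional[int] = None

# Ordered mapping from exception type to the policy field that governs it.
EXCEPTION_TO_FIELD = [
    (AuthenticationError, "AuthenticationErrorRetries"),
    (Timeout, "TimeoutErrorRetries"),
    (RateLimitError, "RateLimitErrorRetries"),
    (ContentPolicyViolationError, "ContentPolicyViolationErrorRetries"),
    (BadRequestError, "BadRequestErrorRetries"),
]

def get_num_retries(exception, retry_policy, model_group=None,
                    model_group_retry_policy=None):
    # Step 1: a model-group-specific policy overrides the global one.
    if model_group_retry_policy and model_group in model_group_retry_policy:
        retry_policy = model_group_retry_policy[model_group]
    if retry_policy is None:
        return None  # caller falls back to the default retry count
    # Step 2: the first matching exception type wins.
    for exc_type, field in EXCEPTION_TO_FIELD:
        if isinstance(exception, exc_type):
            return getattr(retry_policy, field)
    return None
```

Returning `0` for authentication errors (retry never) and `None` for unconfigured types (use the default) are distinct outcomes, which is why the function does not collapse them.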
Pseudocode for fallback execution:
```
FUNCTION run_fallback(fallback_model_groups, original_model, original_exception,
                      max_fallbacks, current_depth):
    // Base case: stop if fallback depth exceeded
    IF current_depth >= max_fallbacks:
        RAISE original_exception

    most_recent_error = original_exception
    FOR EACH fallback_model IN fallback_model_groups:
        IF fallback_model == original_model:
            CONTINUE  // skip the model that already failed
        TRY:
            log_retry(original_exception)
            response = call_with_fallbacks(model=fallback_model, depth=current_depth + 1)
            log_success_fallback_event(original_model, fallback_model)
            RETURN response
        CATCH exception:
            most_recent_error = exception
            log_failure_fallback_event(original_model, fallback_model)

    RAISE most_recent_error  // all fallbacks exhausted
```
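A minimal runnable sketch of this loop, with a hypothetical `call_model` callable standing in for `call_with_fallbacks` and the logging hooks omitted:

```python
from typing import Callable, Sequence

def run_fallback(
    call_model: Callable[[str], str],  # hypothetical: invokes one model group
    fallback_model_groups: Sequence[str],
    original_model: str,
    original_exception: Exception,
    max_fallbacks: int = 5,
    current_depth: int = 0,
) -> str:
    # Base case: refuse to hop further once the depth budget is spent.
    if current_depth >= max_fallbacks:
        raise original_exception
    most_recent_error: Exception = original_exception
    for fallback_model in fallback_model_groups:
        if fallback_model == original_model:
            continue  # skip the group that already failed
        try:
            return call_model(fallback_model)
        except Exception as exc:
            most_recent_error = exc  # remember it, then try the next group
    raise most_recent_error  # every fallback group failed
```

Raising `most_recent_error` rather than `original_exception` surfaces the failure from the last group tried, which is usually the most informative diagnostic.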
The key design insight is the two-tier approach: retries happen within a model group (same deployment pool), while fallbacks happen across model groups (entirely different deployment pool). This prevents a single failing provider from consuming all retry budget.