# Principle: BerriAI LiteLLM Retry and Fallback
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| litellm/router_utils/get_retry_from_policy.py, litellm/router_utils/fallback_event_handlers.py | LLM Resilience, Fault Tolerance | 2026-02-15 |
## Overview
Retry and fallback configuration defines how a system automatically retries failed LLM requests with exception-specific policies and falls back to alternative model groups when retries are exhausted.
## Description
LLM API calls can fail for many reasons: rate limiting, authentication errors, timeouts, content policy violations, and internal server errors. A robust gateway must handle each failure type differently:
- Retry policies map each exception type to a specific retry count. For example, rate limit errors may warrant 5 retries (with backoff), while authentication errors should not be retried at all. Retry policies can be set globally or per model group.
- Fallback chains define ordered lists of alternative model groups to try when all retries for the primary model group are exhausted. There are four categories of fallback:
  - Model-specific fallbacks -- e.g., if `gpt-4` fails, try `gpt-3.5-turbo`.
  - Context-window fallbacks -- triggered when the prompt exceeds the model's context window.
  - Content-policy fallbacks -- triggered when the response is blocked by content filters.
  - Default (wildcard) fallbacks -- apply to any model group that does not have a specific fallback defined.
- Fallback depth limiting prevents infinite fallback loops by enforcing a maximum number of fallback hops.
Together, these mechanisms provide layered resilience: first retry within the same model group, then fall back across groups, with configurable limits at each stage.
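As a concrete sketch, the layered setup described above might be configured through LiteLLM's `Router`. The parameter names and import path shown here (`retry_policy`, `fallbacks`, `context_window_fallbacks`, `default_fallbacks`, `max_fallbacks`, `model_group_retry_policy`) are assumptions inferred from the source modules and may differ across versions:

```python
# Sketch only: parameter names and import paths are assumptions, not verified API.
from litellm import Router
from litellm.types.router import RetryPolicy

router = Router(
    model_list=[...],  # deployment definitions elided
    # Global exception-specific retry counts.
    retry_policy=RetryPolicy(
        RateLimitErrorRetries=5,       # retry rate limits aggressively (with backoff)
        AuthenticationErrorRetries=0,  # never retry bad credentials
        TimeoutErrorRetries=3,
    ),
    # Per-model-group override of the global policy.
    model_group_retry_policy={"prod-gpt-4": RetryPolicy(TimeoutErrorRetries=5)},
    # Ordered fallback chains across model groups.
    fallbacks=[{"gpt-4": ["gpt-3.5-turbo"]}],
    context_window_fallbacks=[{"gpt-4": ["claude-3-opus"]}],
    default_fallbacks=["gpt-3.5-turbo"],  # wildcard fallback
    max_fallbacks=3,  # cap on fallback hops
)
```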
## Usage
Use retry and fallback configuration when:
- Different exception types require different retry behavior (e.g., retry rate limits aggressively, do not retry auth errors).
- You want graceful degradation from a preferred model to a cheaper or more available alternative.
- You need to enforce a maximum fallback depth to prevent cascading failures.
- Different model groups require different retry policies (e.g., more retries for production-critical models).
## Theoretical Basis
The retry and fallback pattern combines exception-discriminated retry with chain of responsibility for fallbacks.
Pseudocode for retry policy resolution:
```
FUNCTION get_num_retries(exception, retry_policy, model_group, model_group_retry_policy):
    // Step 1: Check for a model-group-specific retry policy
    IF model_group_retry_policy HAS model_group:
        retry_policy = model_group_retry_policy[model_group]
    IF retry_policy IS None:
        RETURN None  // use the default retry count

    // Step 2: Match the exception type to a retry count
    IF exception IS AuthenticationError AND retry_policy.AuthenticationErrorRetries IS SET:
        RETURN retry_policy.AuthenticationErrorRetries
    IF exception IS Timeout AND retry_policy.TimeoutErrorRetries IS SET:
        RETURN retry_policy.TimeoutErrorRetries
    IF exception IS RateLimitError AND retry_policy.RateLimitErrorRetries IS SET:
        RETURN retry_policy.RateLimitErrorRetries
    IF exception IS ContentPolicyViolationError AND retry_policy.ContentPolicyViolationErrorRetries IS SET:
        RETURN retry_policy.ContentPolicyViolationErrorRetries
    IF exception IS BadRequestError AND retry_policy.BadRequestErrorRetries IS SET:
        RETURN retry_policy.BadRequestErrorRetries

    RETURN None  // no specific policy for this exception type
```
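The resolution logic above can be sketched as self-contained Python. The exception classes and `RetryPolicy` fields here are illustrative stand-ins mirroring the pseudocode, not LiteLLM's actual types:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the gateway's exception types.
class AuthenticationError(Exception): ...
class Timeout(Exception): ...
class RateLimitError(Exception): ...
class ContentPolicyViolationError(Exception): ...
class BadRequestError(Exception): ...

@dataclass
class RetryPolicy:
    # None means "no specific policy for this exception type".
    AuthenticationErrorRetries: Optional[int] = None
    TimeoutErrorRetries: Optional[int] = None
    RateLimitErrorRetries: Optional[int] = None
    ContentPolicyViolationErrorRetries: Optional[int] = None
    BadRequestErrorRetries: Optional[int] = None

# Ordered mapping from exception type to the policy field that governs it.
EXCEPTION_TO_FIELD = [
    (AuthenticationError, "AuthenticationErrorRetries"),
    (Timeout, "TimeoutErrorRetries"),
    (RateLimitError, "RateLimitErrorRetries"),
    (ContentPolicyViolationError, "ContentPolicyViolationErrorRetries"),
    (BadRequestError, "BadRequestErrorRetries"),
]

def get_num_retries(exception, retry_policy, model_group=None,
                    model_group_retry_policy=None):
    # Step 1: a model-group-specific policy overrides the global one.
    if model_group_retry_policy and model_group in model_group_retry_policy:
        retry_policy = model_group_retry_policy[model_group]
    if retry_policy is None:
        return None  # caller falls back to the default retry count
    # Step 2: the first matching exception type wins.
    for exc_type, field in EXCEPTION_TO_FIELD:
        if isinstance(exception, exc_type):
            return getattr(retry_policy, field)
    return None
```

Returning `0` for authentication errors (retry never) and `None` for unconfigured types (use the default) are distinct outcomes, which is why the function does not collapse them.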
Pseudocode for fallback execution:
```
FUNCTION run_fallback(fallback_model_groups, original_model, original_exception,
                      max_fallbacks, current_depth):
    // Base case: stop if fallback depth exceeded
    IF current_depth >= max_fallbacks:
        RAISE original_exception

    most_recent_error = original_exception
    FOR EACH fallback_model IN fallback_model_groups:
        IF fallback_model == original_model:
            CONTINUE  // skip the model that already failed
        TRY:
            log_retry(original_exception)
            response = call_with_fallbacks(model=fallback_model, depth=current_depth + 1)
            log_success_fallback_event(original_model, fallback_model)
            RETURN response
        CATCH exception:
            most_recent_error = exception
            log_failure_fallback_event(original_model, fallback_model)

    RAISE most_recent_error  // all fallbacks exhausted
```
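A minimal runnable sketch of this loop, with a hypothetical `call_model` callable standing in for `call_with_fallbacks` and the logging hooks omitted:

```python
from typing import Callable, Sequence

def run_fallback(
    call_model: Callable[[str], str],  # hypothetical: invokes one model group
    fallback_model_groups: Sequence[str],
    original_model: str,
    original_exception: Exception,
    max_fallbacks: int = 5,
    current_depth: int = 0,
) -> str:
    # Base case: refuse to hop further once the depth budget is spent.
    if current_depth >= max_fallbacks:
        raise original_exception
    most_recent_error: Exception = original_exception
    for fallback_model in fallback_model_groups:
        if fallback_model == original_model:
            continue  # skip the group that already failed
        try:
            return call_model(fallback_model)
        except Exception as exc:
            most_recent_error = exc  # remember it, then try the next group
    raise most_recent_error  # every fallback group failed
```

Raising `most_recent_error` rather than `original_exception` surfaces the failure from the last group tried, which is usually the most informative diagnostic.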
The key design insight is the two-tier approach: retries happen within a model group (same deployment pool), while fallbacks happen across model groups (entirely different deployment pool). This prevents a single failing provider from consuming all retry budget.