Principle:BerriAI Litellm Retry And Fallback

From Leeroopedia
Knowledge Sources: litellm/router_utils/get_retry_from_policy.py, litellm/router_utils/fallback_event_handlers.py
Domains: LLM Resilience, Fault Tolerance
Last Updated: 2026-02-15

Overview

Retry and fallback configuration defines how a system automatically retries failed LLM requests with exception-specific policies and falls back to alternative model groups when retries are exhausted.

Description

LLM API calls can fail for many reasons: rate limiting, authentication errors, timeouts, content policy violations, and internal server errors. A robust gateway must handle each failure type differently:

  • Retry policies map each exception type to a specific retry count. For example, rate limit errors may warrant 5 retries (with backoff), while authentication errors should not be retried at all. Retry policies can be set globally or per model group.
  • Fallback chains define ordered lists of alternative model groups to try when all retries for the primary model group are exhausted. There are four categories of fallback:
    • Model-specific fallbacks -- e.g., if gpt-4 fails, try gpt-3.5-turbo.
    • Context-window fallbacks -- triggered when the prompt exceeds the model's context window.
    • Content-policy fallbacks -- triggered when the response is blocked by content filters.
    • Default (wildcard) fallbacks -- apply to any model group that does not have a specific fallback defined.
  • Fallback depth limiting prevents infinite fallback loops by enforcing a maximum number of fallback hops.

Together, these mechanisms provide layered resilience: first retry within the same model group, then fall back across groups, with configurable limits at each stage.
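The fallback categories above can be sketched as a lookup that prefers a category-specific chain and otherwise uses the wildcard list. This is a minimal illustration, not LiteLLM's actual configuration type: `FallbackConfig`, `resolve_fallbacks`, the `failure_kind` strings, and the group names `long-context-group` and `backup-group` are all hypothetical. Applying the wildcard list to every failure kind is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FallbackConfig:
    """Hypothetical container for the four fallback categories."""
    model_fallbacks: Dict[str, List[str]] = field(default_factory=dict)
    context_window_fallbacks: Dict[str, List[str]] = field(default_factory=dict)
    content_policy_fallbacks: Dict[str, List[str]] = field(default_factory=dict)
    default_fallbacks: List[str] = field(default_factory=list)  # wildcard chain


def resolve_fallbacks(config: FallbackConfig, model_group: str,
                      failure_kind: str = "general") -> List[str]:
    """Return the ordered fallback chain for a failed model group."""
    # Pick the table that matches the kind of failure.
    if failure_kind == "context_window":
        table = config.context_window_fallbacks
    elif failure_kind == "content_policy":
        table = config.content_policy_fallbacks
    else:
        table = config.model_fallbacks

    # A chain defined for this specific model group wins.
    if model_group in table:
        return list(table[model_group])

    # Assumption: the wildcard chain applies whenever no specific chain exists.
    return list(config.default_fallbacks)


config = FallbackConfig(
    model_fallbacks={"gpt-4": ["gpt-3.5-turbo"]},
    context_window_fallbacks={"gpt-4": ["long-context-group"]},
    default_fallbacks=["backup-group"],
)
print(resolve_fallbacks(config, "gpt-4"))
print(resolve_fallbacks(config, "gpt-4", "context_window"))
print(resolve_fallbacks(config, "some-other-group"))
```

Returning a copy of each list keeps callers from mutating the shared configuration while they iterate over a chain.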

Usage

Use retry and fallback configuration when:

  • Different exception types require different retry behavior (e.g., retry rate limits aggressively, do not retry auth errors).
  • You want graceful degradation from a preferred model to a cheaper or more available alternative.
  • You need to enforce a maximum fallback depth to prevent cascading failures.
  • Different model groups require different retry policies (e.g., more retries for production-critical models).

Theoretical Basis

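The retry and fallback pattern combines exception-discriminated retry (the retry count depends on the exception type) with the Chain of Responsibility pattern for fallbacks (each model group in the chain gets a chance to handle the request).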

Pseudocode for retry policy resolution:

FUNCTION get_num_retries(exception, retry_policy, model_group, model_group_retry_policy):
    // Step 1: Check for model-group-specific retry policy
    IF model_group_retry_policy HAS model_group:
        retry_policy = model_group_retry_policy[model_group]

    IF retry_policy IS None:
        RETURN None  // use default retry count

    // Step 2: Match exception type to retry count
    IF exception IS AuthenticationError AND retry_policy.AuthenticationErrorRetries IS SET:
        RETURN retry_policy.AuthenticationErrorRetries
    IF exception IS Timeout AND retry_policy.TimeoutErrorRetries IS SET:
        RETURN retry_policy.TimeoutErrorRetries
    IF exception IS RateLimitError AND retry_policy.RateLimitErrorRetries IS SET:
        RETURN retry_policy.RateLimitErrorRetries
    IF exception IS ContentPolicyViolationError AND retry_policy.ContentPolicyViolationErrorRetries IS SET:
        RETURN retry_policy.ContentPolicyViolationErrorRetries
    IF exception IS BadRequestError AND retry_policy.BadRequestErrorRetries IS SET:
        RETURN retry_policy.BadRequestErrorRetries

    RETURN None  // no specific policy for this exception type
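The resolution steps above translate fairly directly into Python. The sketch below is self-contained, so the exception classes and the `RetryPolicy` dataclass are local stand-ins that only mirror the names used in the pseudocode; they are not imported from LiteLLM. It treats "IS SET" as "is not None", so that an explicit 0 (never retry) is distinguishable from an unset field.

```python
from dataclasses import dataclass
from typing import Dict, Optional


# Stand-in exception types (assumption: defined locally for illustration).
class AuthenticationError(Exception): pass
class Timeout(Exception): pass
class RateLimitError(Exception): pass
class ContentPolicyViolationError(Exception): pass
class BadRequestError(Exception): pass


@dataclass
class RetryPolicy:
    """Stand-in policy: one optional retry count per exception type."""
    AuthenticationErrorRetries: Optional[int] = None
    TimeoutErrorRetries: Optional[int] = None
    RateLimitErrorRetries: Optional[int] = None
    ContentPolicyViolationErrorRetries: Optional[int] = None
    BadRequestErrorRetries: Optional[int] = None


def get_num_retries(
    exception: Exception,
    retry_policy: Optional[RetryPolicy] = None,
    model_group: Optional[str] = None,
    model_group_retry_policy: Optional[Dict[str, RetryPolicy]] = None,
) -> Optional[int]:
    # Step 1: a policy defined for this model group overrides the global one.
    if model_group_retry_policy and model_group in model_group_retry_policy:
        retry_policy = model_group_retry_policy[model_group]
    if retry_policy is None:
        return None  # caller uses its default retry count

    # Step 2: match the exception type, in the same order as the pseudocode.
    checks = [
        (AuthenticationError, retry_policy.AuthenticationErrorRetries),
        (Timeout, retry_policy.TimeoutErrorRetries),
        (RateLimitError, retry_policy.RateLimitErrorRetries),
        (ContentPolicyViolationError, retry_policy.ContentPolicyViolationErrorRetries),
        (BadRequestError, retry_policy.BadRequestErrorRetries),
    ]
    for exc_type, retries in checks:
        # "is not None" keeps an explicit 0 meaningful.
        if isinstance(exception, exc_type) and retries is not None:
            return retries
    return None  # no specific policy for this exception type


global_policy = RetryPolicy(AuthenticationErrorRetries=0, RateLimitErrorRetries=5)
per_group = {"prod-group": RetryPolicy(RateLimitErrorRetries=8)}  # hypothetical group name
print(get_num_retries(RateLimitError("429"), global_policy))
print(get_num_retries(RateLimitError("429"), global_policy, "prod-group", per_group))
```

A return value of `None` means "no opinion", which leaves the caller free to apply whatever default retry count it already has.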

Pseudocode for fallback execution:

FUNCTION run_fallback(fallback_model_groups, original_model, original_exception,
                      max_fallbacks, current_depth):
    // Base case: stop if fallback depth exceeded
    IF current_depth >= max_fallbacks:
        RAISE original_exception

    most_recent_error = original_exception

    FOR EACH fallback_model IN fallback_model_groups:
        IF fallback_model == original_model:
            CONTINUE  // skip the model that already failed

        TRY:
            log_retry(original_exception)
            response = call_with_fallbacks(model=fallback_model, depth=current_depth + 1)
            log_success_fallback_event(original_model, fallback_model)
            RETURN response
        CATCH exception:
            most_recent_error = exception
            log_failure_fallback_event(original_model, fallback_model)

    RAISE most_recent_error  // all fallbacks exhausted
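The fallback loop above can be sketched in Python as follows. To keep the example self-contained, the recursive `call_with_fallbacks` step is replaced by a caller-supplied `call_model(model_group, depth)` callable, which is a hypothetical parameter introduced here, and the event-logging hooks are reduced to standard `logging` calls.

```python
import logging
from typing import Any, Callable, List

logger = logging.getLogger("fallback_sketch")


def run_fallback(
    fallback_model_groups: List[str],
    original_model: str,
    original_exception: Exception,
    max_fallbacks: int,
    current_depth: int,
    call_model: Callable[[str, int], Any],  # hypothetical: performs the actual request
) -> Any:
    # Base case: stop once the fallback depth limit is reached.
    if current_depth >= max_fallbacks:
        raise original_exception

    most_recent_error = original_exception
    for fallback_model in fallback_model_groups:
        if fallback_model == original_model:
            continue  # skip the model group that already failed

        try:
            response = call_model(fallback_model, current_depth + 1)
            logger.info("fallback %s -> %s succeeded", original_model, fallback_model)
            return response
        except Exception as exc:
            # Remember the latest failure and move on to the next group.
            most_recent_error = exc
            logger.warning("fallback %s -> %s failed: %s",
                           original_model, fallback_model, exc)

    raise most_recent_error  # all fallbacks exhausted


def fake_call(model_group: str, depth: int) -> str:
    """Toy backend: one group is down, every other group answers."""
    if model_group == "gpt-3.5-turbo":
        raise RuntimeError("gpt-3.5-turbo unavailable")
    return f"ok from {model_group} at depth {depth}"


result = run_fallback(
    ["gpt-4", "gpt-3.5-turbo", "backup-group"],  # "backup-group" is a made-up name
    original_model="gpt-4",
    original_exception=ValueError("primary failed"),
    max_fallbacks=5,
    current_depth=0,
    call_model=fake_call,
)
print(result)
```

When every fallback fails, the caller sees the most recent error rather than the original one, which matches the final `RAISE` in the pseudocode.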

The key design insight is the two-tier approach: retries happen within a model group (the same deployment pool), while fallbacks happen across model groups (entirely different deployment pools). This prevents a single failing provider from consuming the entire retry budget.
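One way to picture the two tiers is a nested loop: the inner loop retries within a single model group, and the outer loop moves to the next group once that group's retries are spent. The sketch below is a simplification under stated assumptions: `call_with_resilience` and its `call` parameter are hypothetical names, every group gets the same retry count, and backoff between attempts is omitted.

```python
from typing import Any, Callable, List, Optional


def call_with_resilience(
    model_group: str,
    call: Callable[[str], Any],   # hypothetical: sends the request to a model group
    num_retries: int,
    fallbacks: List[str],
) -> Any:
    last_error: Optional[Exception] = None
    # Outer tier: the primary group first, then each fallback group in order.
    for group in [model_group, *fallbacks]:
        # Inner tier: one initial attempt plus num_retries retries in this group.
        for _attempt in range(max(0, num_retries) + 1):
            try:
                return call(group)
            except Exception as exc:
                last_error = exc
    assert last_error is not None  # at least one attempt always runs
    raise last_error


attempts: List[str] = []


def toy_call(group: str) -> str:
    """Toy backend that records attempts; the primary group always fails."""
    attempts.append(group)
    if group == "gpt-4":
        raise RuntimeError("gpt-4 unavailable")
    return "ok:" + group


outcome = call_with_resilience("gpt-4", toy_call, num_retries=2,
                               fallbacks=["gpt-3.5-turbo"])
print(outcome, attempts)
```

Because each group starts with a fresh attempt count, a primary group that fails on every attempt still leaves the fallback group its full allowance.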
