Implementation: BerriAI LiteLLM Retry Policy Handler
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| litellm repository | LLM Resilience, Fault Tolerance | 2026-02-15 |
Overview
Concrete tooling, provided by LiteLLM, for configuring exception-specific retry logic and executing fallback chains, implemented across the router utilities modules.
Description
LiteLLM provides two complementary mechanisms for resilient LLM API consumption:
get_num_retries_from_retry_policy -- A pure function that inspects the exception type and returns the configured retry count from a RetryPolicy. It supports both global retry policies and model-group-specific override policies. The function checks exception types in order: AuthenticationError, Timeout, RateLimitError, ContentPolicyViolationError, and BadRequestError.
run_async_fallback -- An async function that iterates through a list of fallback model groups, attempting each one via the router's async_function_with_fallbacks. It tracks fallback depth to enforce maximum fallback hops, logs success and failure events for observability, skips the original failing model group, and adds fallback headers to successful responses.
get_fallback_model_group-- Resolves which fallback model groups apply for a given model group by checking: exact match, stripped model group match (for versioned model names), and wildcard (*) generic fallbacks.
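The three-step resolution order can be sketched as a small self-contained function. This is a simplified illustration, not the library's code: the real logic lives in get_fallback_model_group, and the prefix test below is only a stand-in for LiteLLM's version-stripping match.

```python
from typing import Dict, List, Optional

def resolve_fallback_group(
    fallbacks: List[Dict[str, List[str]]], model_group: str
) -> Optional[List[str]]:
    """Illustrative sketch of the lookup order: exact match,
    version-stripped match, then wildcard ("*") generic fallbacks."""
    mapping: Dict[str, List[str]] = {}
    for item in fallbacks:
        mapping.update(item)

    # 1. Exact model-group match
    if model_group in mapping:
        return mapping[model_group]

    # 2. Stripped match for versioned names,
    #    e.g. "gpt-4-2024-05-13" resolves via the "gpt-4" entry
    #    (prefix matching used here as a simplified proxy)
    for key in mapping:
        if key != "*" and model_group.startswith(key):
            return mapping[key]

    # 3. Wildcard fallback applies to any model group
    if "*" in mapping:
        return mapping["*"]
    return None

fallbacks = [{"gpt-4": ["gpt-3.5-turbo"]}, {"*": ["claude-3-haiku"]}]
print(resolve_fallback_group(fallbacks, "gpt-4"))       # ['gpt-3.5-turbo']
print(resolve_fallback_group(fallbacks, "mistral-7b"))  # ['claude-3-haiku']
```

Returning None (rather than raising) mirrors the documented contract, where the caller decides whether an absent fallback mapping is an error.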
Usage
These utilities are called internally by the Router during retry and fallback execution. They can also be imported directly for testing or custom routing logic.
Code Reference
Source Locations:
litellm/router_utils/get_retry_from_policy.py (lines 19-71)
litellm/router_utils/fallback_event_handlers.py (lines 45-161)
get_num_retries_from_retry_policy Signature:
```python
def get_num_retries_from_retry_policy(
    exception: Exception,
    retry_policy: Optional[Union[RetryPolicy, dict]] = None,
    model_group: Optional[str] = None,
    model_group_retry_policy: Optional[Dict[str, RetryPolicy]] = None,
) -> Optional[int]:
```
run_async_fallback Signature:
```python
async def run_async_fallback(
    *args: Tuple[Any],
    litellm_router: LitellmRouter,
    fallback_model_group: List[str],
    original_model_group: str,
    original_exception: Exception,
    max_fallbacks: int,
    fallback_depth: int,
    **kwargs,
) -> Any:
```
get_fallback_model_group Signature:
```python
def get_fallback_model_group(
    fallbacks: List[Any], model_group: str
) -> Tuple[Optional[List[str]], Optional[int]]:
```
Import:
```python
from litellm.router_utils.get_retry_from_policy import get_num_retries_from_retry_policy
from litellm.router_utils.fallback_event_handlers import run_async_fallback, get_fallback_model_group
```
I/O Contract
get_num_retries_from_retry_policy
| Input Parameter | Type | Required | Description |
|---|---|---|---|
| exception | Exception | Yes | The exception instance to match against the retry policy |
| retry_policy | Optional[Union[RetryPolicy, dict]] | No | Global retry policy mapping exception types to retry counts |
| model_group | Optional[str] | No | The model group name; used to look up group-specific policy |
| model_group_retry_policy | Optional[Dict[str, RetryPolicy]] | No | Per-model-group retry policy overrides |

| Output | Type | Description |
|---|---|---|
| retry count | Optional[int] | Number of retries for this exception type, or None if no matching policy |
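The ordered isinstance checks behind this contract can be illustrated with a minimal stand-alone resolver. The classes and the num_retries_for name below are stand-ins for illustration, not LiteLLM's types; the order matters because in LiteLLM's exception hierarchy ContentPolicyViolationError derives from BadRequestError, so the more specific check must run first.

```python
from typing import Optional

# Stand-in exception classes mirroring the documented check order
class AuthenticationError(Exception): ...
class Timeout(Exception): ...
class RateLimitError(Exception): ...
class BadRequestError(Exception): ...
class ContentPolicyViolationError(BadRequestError): ...  # subclass, as in LiteLLM

class RetryPolicySketch:
    """Simplified RetryPolicy: one optional retry count per exception type."""
    def __init__(self, **counts: int):
        self.AuthenticationErrorRetries = counts.get("AuthenticationErrorRetries")
        self.TimeoutErrorRetries = counts.get("TimeoutErrorRetries")
        self.RateLimitErrorRetries = counts.get("RateLimitErrorRetries")
        self.ContentPolicyViolationErrorRetries = counts.get("ContentPolicyViolationErrorRetries")
        self.BadRequestErrorRetries = counts.get("BadRequestErrorRetries")

def num_retries_for(exc: Exception, policy: RetryPolicySketch) -> Optional[int]:
    # Checks run in the documented order; the first matching type wins.
    if isinstance(exc, AuthenticationError):
        return policy.AuthenticationErrorRetries
    if isinstance(exc, Timeout):
        return policy.TimeoutErrorRetries
    if isinstance(exc, RateLimitError):
        return policy.RateLimitErrorRetries
    if isinstance(exc, ContentPolicyViolationError):
        return policy.ContentPolicyViolationErrorRetries
    if isinstance(exc, BadRequestError):
        return policy.BadRequestErrorRetries
    return None  # no matching policy entry

policy = RetryPolicySketch(RateLimitErrorRetries=5, TimeoutErrorRetries=3)
print(num_retries_for(RateLimitError(), policy))   # 5
print(num_retries_for(BadRequestError(), policy))  # None
```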
run_async_fallback
| Input Parameter | Type | Required | Description |
|---|---|---|---|
| litellm_router | Router | Yes | The router instance managing deployments |
| fallback_model_group | List[str] | Yes | Ordered list of fallback model group names to try |
| original_model_group | str | Yes | The model group that originally failed |
| original_exception | Exception | Yes | The exception from the original failed call |
| max_fallbacks | int | Yes | Maximum number of fallback hops allowed |
| fallback_depth | int | Yes | Current depth in the fallback chain |

| Output | Type | Description |
|---|---|---|
| response | Any | The successful response from a fallback model group |
| (raises) | Exception | The most recent exception if all fallbacks fail, or the original exception if max depth reached |
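The control flow implied by this contract can be sketched as a self-contained async loop. This is a simplified illustration only: the real run_async_fallback delegates to the router's async_function_with_fallbacks and also emits logging events, which are omitted here, and the `call` parameter is a hypothetical stand-in for the router invocation.

```python
import asyncio
from typing import Any, Awaitable, Callable, List

async def run_fallbacks_sketch(
    call: Callable[[str], Awaitable[Any]],  # hypothetical: model group -> response
    fallback_model_group: List[str],
    original_model_group: str,
    original_exception: Exception,
    max_fallbacks: int,
    fallback_depth: int,
) -> Any:
    """Simplified sketch: enforce the depth cap, skip the failing group,
    try each fallback in order, and re-raise if everything fails."""
    if fallback_depth >= max_fallbacks:
        raise original_exception  # depth cap reached: surface the original error
    error: Exception = original_exception
    for group in fallback_model_group:
        if group == original_model_group:
            continue  # skip the group that already failed
        try:
            return await call(group)  # first success ends the chain
        except Exception as exc:
            error = exc               # remember the most recent failure
    raise error  # all fallbacks failed

async def main() -> str:
    async def call(group: str) -> str:
        if group == "gpt-3.5-turbo":
            raise RuntimeError("still overloaded")
        return f"response from {group}"

    return await run_fallbacks_sketch(
        call,
        fallback_model_group=["gpt-3.5-turbo", "claude-3-haiku"],
        original_model_group="gpt-4",
        original_exception=RuntimeError("gpt-4 rate limited"),
        max_fallbacks=5,
        fallback_depth=0,
    )

print(asyncio.run(main()))  # response from claude-3-haiku
```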
Usage Examples
Configuring a retry policy on the Router:
```python
from litellm import Router
from litellm.types.router import RetryPolicy

router = Router(
    model_list=model_list,
    retry_policy=RetryPolicy(
        RateLimitErrorRetries=5,
        TimeoutErrorRetries=3,
        AuthenticationErrorRetries=0,
        ContentPolicyViolationErrorRetries=0,
        InternalServerErrorRetries=2,
    ),
    num_retries=2,  # default for exception types not in the policy
)
```
Per-model-group retry policies:
```python
router = Router(
    model_list=model_list,
    model_group_retry_policy={
        "gpt-4": RetryPolicy(RateLimitErrorRetries=10, TimeoutErrorRetries=5),
        "gpt-3.5-turbo": RetryPolicy(RateLimitErrorRetries=3),
    },
)
```
Setting up fallback chains:
```python
router = Router(
    model_list=model_list,
    fallbacks=[
        {"gpt-4": ["gpt-3.5-turbo", "claude-3-haiku"]},
    ],
    default_fallbacks=["gpt-3.5-turbo"],  # wildcard fallback for any model group
    context_window_fallbacks=[
        {"gpt-3.5-turbo": ["gpt-3.5-turbo-16k"]},
    ],
    max_fallbacks=5,  # maximum fallback depth
)
```
Directly using the retry policy resolver:
```python
from litellm.router_utils.get_retry_from_policy import get_num_retries_from_retry_policy
from litellm.types.router import RetryPolicy
from litellm.exceptions import RateLimitError

policy = RetryPolicy(RateLimitErrorRetries=5, TimeoutErrorRetries=3)
exc = RateLimitError(message="Rate limit exceeded", model="gpt-4", llm_provider="openai")
retries = get_num_retries_from_retry_policy(exception=exc, retry_policy=policy)
print(retries)  # 5
```