Implementation: BerriAI LiteLLM Retry Policy Handler
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| litellm repository | LLM Resilience, Fault Tolerance | 2026-02-15 |
Overview
Concrete tooling, provided by LiteLLM, for configuring exception-specific retry logic and executing fallback chains, implemented across the router utilities modules.
Description
LiteLLM provides two complementary mechanisms for resilient LLM API consumption:
get_num_retries_from_retry_policy -- A pure function that inspects the exception type and returns the configured retry count from a RetryPolicy. It supports both global retry policies and model-group-specific override policies. The function checks exception types in order: AuthenticationError, Timeout, RateLimitError, ContentPolicyViolationError, and BadRequestError.
run_async_fallback -- An async function that iterates through a list of fallback model groups, attempting each one via the router's async_function_with_fallbacks. It tracks fallback depth to enforce maximum fallback hops, logs success and failure events for observability, skips the original failing model group, and adds fallback headers to successful responses.
get_fallback_model_group-- Resolves which fallback model groups apply for a given model group by checking: exact match, stripped model group match (for versioned model names), and wildcard (*) generic fallbacks.
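The three-step resolution order can be sketched as a small self-contained function. This is a simplified illustration, not the library's code: the real logic lives in get_fallback_model_group, and the prefix test below is only a stand-in for LiteLLM's version-stripping match.

```python
from typing import Dict, List, Optional

def resolve_fallback_group(
    fallbacks: List[Dict[str, List[str]]], model_group: str
) -> Optional[List[str]]:
    """Illustrative sketch of the lookup order: exact match,
    version-stripped match, then wildcard ("*") generic fallbacks."""
    mapping: Dict[str, List[str]] = {}
    for item in fallbacks:
        mapping.update(item)

    # 1. Exact model-group match
    if model_group in mapping:
        return mapping[model_group]

    # 2. Stripped match for versioned names,
    #    e.g. "gpt-4-2024-05-13" resolves via the "gpt-4" entry
    #    (prefix matching used here as a simplified proxy)
    for key in mapping:
        if key != "*" and model_group.startswith(key):
            return mapping[key]

    # 3. Wildcard fallback applies to any model group
    if "*" in mapping:
        return mapping["*"]
    return None

fallbacks = [{"gpt-4": ["gpt-3.5-turbo"]}, {"*": ["claude-3-haiku"]}]
print(resolve_fallback_group(fallbacks, "gpt-4"))       # ['gpt-3.5-turbo']
print(resolve_fallback_group(fallbacks, "mistral-7b"))  # ['claude-3-haiku']
```

Returning None (rather than raising) mirrors the documented contract, where the caller decides whether an absent fallback mapping is an error.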
Usage
These utilities are called internally by the Router during retry and fallback execution. They can also be imported directly for testing or custom routing logic.
Code Reference
Source Locations:
litellm/router_utils/get_retry_from_policy.py (lines 19-71)
litellm/router_utils/fallback_event_handlers.py (lines 45-161)
get_num_retries_from_retry_policy Signature:
```python
def get_num_retries_from_retry_policy(
    exception: Exception,
    retry_policy: Optional[Union[RetryPolicy, dict]] = None,
    model_group: Optional[str] = None,
    model_group_retry_policy: Optional[Dict[str, RetryPolicy]] = None,
) -> Optional[int]:
```
run_async_fallback Signature:
```python
async def run_async_fallback(
    *args: Tuple[Any],
    litellm_router: LitellmRouter,
    fallback_model_group: List[str],
    original_model_group: str,
    original_exception: Exception,
    max_fallbacks: int,
    fallback_depth: int,
    **kwargs,
) -> Any:
```
get_fallback_model_group Signature:
```python
def get_fallback_model_group(
    fallbacks: List[Any], model_group: str
) -> Tuple[Optional[List[str]], Optional[int]]:
```
Import:
```python
from litellm.router_utils.get_retry_from_policy import get_num_retries_from_retry_policy
from litellm.router_utils.fallback_event_handlers import run_async_fallback, get_fallback_model_group
```
I/O Contract
get_num_retries_from_retry_policy
| Input Parameter | Type | Required | Description |
|---|---|---|---|
| exception | Exception | Yes | The exception instance to match against the retry policy |
| retry_policy | Optional[Union[RetryPolicy, dict]] | No | Global retry policy mapping exception types to retry counts |
| model_group | Optional[str] | No | The model group name; used to look up group-specific policy |
| model_group_retry_policy | Optional[Dict[str, RetryPolicy]] | No | Per-model-group retry policy overrides |

| Output | Type | Description |
|---|---|---|
| retry count | Optional[int] | Number of retries for this exception type, or None if no matching policy |
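The ordered isinstance checks behind this contract can be illustrated with a minimal stand-alone resolver. The classes and the num_retries_for name below are stand-ins for illustration, not LiteLLM's types; the order matters because in LiteLLM's exception hierarchy ContentPolicyViolationError derives from BadRequestError, so the more specific check must run first.

```python
from typing import Optional

# Stand-in exception classes mirroring the documented check order
class AuthenticationError(Exception): ...
class Timeout(Exception): ...
class RateLimitError(Exception): ...
class BadRequestError(Exception): ...
class ContentPolicyViolationError(BadRequestError): ...  # subclass, as in LiteLLM

class RetryPolicySketch:
    """Simplified RetryPolicy: one optional retry count per exception type."""
    def __init__(self, **counts: int):
        self.AuthenticationErrorRetries = counts.get("AuthenticationErrorRetries")
        self.TimeoutErrorRetries = counts.get("TimeoutErrorRetries")
        self.RateLimitErrorRetries = counts.get("RateLimitErrorRetries")
        self.ContentPolicyViolationErrorRetries = counts.get("ContentPolicyViolationErrorRetries")
        self.BadRequestErrorRetries = counts.get("BadRequestErrorRetries")

def num_retries_for(exc: Exception, policy: RetryPolicySketch) -> Optional[int]:
    # Checks run in the documented order; the first matching type wins.
    if isinstance(exc, AuthenticationError):
        return policy.AuthenticationErrorRetries
    if isinstance(exc, Timeout):
        return policy.TimeoutErrorRetries
    if isinstance(exc, RateLimitError):
        return policy.RateLimitErrorRetries
    if isinstance(exc, ContentPolicyViolationError):
        return policy.ContentPolicyViolationErrorRetries
    if isinstance(exc, BadRequestError):
        return policy.BadRequestErrorRetries
    return None  # no matching policy entry

policy = RetryPolicySketch(RateLimitErrorRetries=5, TimeoutErrorRetries=3)
print(num_retries_for(RateLimitError(), policy))   # 5
print(num_retries_for(BadRequestError(), policy))  # None
```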
run_async_fallback
| Input Parameter | Type | Required | Description |
|---|---|---|---|
| litellm_router | Router | Yes | The router instance managing deployments |
| fallback_model_group | List[str] | Yes | Ordered list of fallback model group names to try |
| original_model_group | str | Yes | The model group that originally failed |
| original_exception | Exception | Yes | The exception from the original failed call |
| max_fallbacks | int | Yes | Maximum number of fallback hops allowed |
| fallback_depth | int | Yes | Current depth in the fallback chain |

| Output | Type | Description |
|---|---|---|
| response | Any | The successful response from a fallback model group |
| (raises) | Exception | The most recent exception if all fallbacks fail, or the original exception if max depth reached |
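The control flow implied by this contract can be sketched as a self-contained async loop. This is a simplified illustration only: the real run_async_fallback delegates to the router's async_function_with_fallbacks and also emits logging events, which are omitted here, and the `call` parameter is a hypothetical stand-in for the router invocation.

```python
import asyncio
from typing import Any, Awaitable, Callable, List

async def run_fallbacks_sketch(
    call: Callable[[str], Awaitable[Any]],  # hypothetical: model group -> response
    fallback_model_group: List[str],
    original_model_group: str,
    original_exception: Exception,
    max_fallbacks: int,
    fallback_depth: int,
) -> Any:
    """Simplified sketch: enforce the depth cap, skip the failing group,
    try each fallback in order, and re-raise if everything fails."""
    if fallback_depth >= max_fallbacks:
        raise original_exception  # depth cap reached: surface the original error
    error: Exception = original_exception
    for group in fallback_model_group:
        if group == original_model_group:
            continue  # skip the group that already failed
        try:
            return await call(group)  # first success ends the chain
        except Exception as exc:
            error = exc               # remember the most recent failure
    raise error  # all fallbacks failed

async def main() -> str:
    async def call(group: str) -> str:
        if group == "gpt-3.5-turbo":
            raise RuntimeError("still overloaded")
        return f"response from {group}"

    return await run_fallbacks_sketch(
        call,
        fallback_model_group=["gpt-3.5-turbo", "claude-3-haiku"],
        original_model_group="gpt-4",
        original_exception=RuntimeError("gpt-4 rate limited"),
        max_fallbacks=5,
        fallback_depth=0,
    )

print(asyncio.run(main()))  # response from claude-3-haiku
```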
Usage Examples
Configuring a retry policy on the Router:
```python
from litellm import Router
from litellm.types.router import RetryPolicy

router = Router(
    model_list=model_list,
    retry_policy=RetryPolicy(
        RateLimitErrorRetries=5,
        TimeoutErrorRetries=3,
        AuthenticationErrorRetries=0,
        ContentPolicyViolationErrorRetries=0,
        InternalServerErrorRetries=2,
    ),
    num_retries=2,  # default for exception types not in the policy
)
```
Per-model-group retry policies:
```python
router = Router(
    model_list=model_list,
    model_group_retry_policy={
        "gpt-4": RetryPolicy(RateLimitErrorRetries=10, TimeoutErrorRetries=5),
        "gpt-3.5-turbo": RetryPolicy(RateLimitErrorRetries=3),
    },
)
```
Setting up fallback chains:
```python
router = Router(
    model_list=model_list,
    fallbacks=[
        {"gpt-4": ["gpt-3.5-turbo", "claude-3-haiku"]},
    ],
    default_fallbacks=["gpt-3.5-turbo"],  # wildcard fallback for any model group
    context_window_fallbacks=[
        {"gpt-3.5-turbo": ["gpt-3.5-turbo-16k"]},
    ],
    max_fallbacks=5,  # maximum fallback depth
)
```
Directly using the retry policy resolver:
```python
from litellm.router_utils.get_retry_from_policy import get_num_retries_from_retry_policy
from litellm.types.router import RetryPolicy
from litellm.exceptions import RateLimitError

policy = RetryPolicy(RateLimitErrorRetries=5, TimeoutErrorRetries=3)
exc = RateLimitError(message="Rate limit exceeded", model="gpt-4", llm_provider="openai")
retries = get_num_retries_from_retry_policy(exception=exc, retry_policy=policy)
print(retries)  # 5
```