Implementation:BerriAI Litellm Fallback Utils

Attribute	Value
Sources	litellm/litellm_core_utils/fallback_utils.py
Domains	Reliability, Fallback Logic, Completion
Last Updated	2026-02-15 16:00 GMT

Overview

Provides completion-with-fallbacks logic that automatically retries with alternative models when the primary model fails.

Description

This module implements a fallback chain for LLM completions. It exposes two functions:

async_completion_with_fallbacks -- An async function that attempts litellm.acompletion with the primary model, then sequentially tries each fallback model until one succeeds. Fallbacks can be plain model name strings or dictionaries containing model-specific overrides (e.g., custom API keys or parameters). Internal parameters are filtered out before calling the provider API. All attempts share a single litellm_call_id for tracing.
completion_with_fallbacks -- A synchronous wrapper that delegates to the async version via run_async_function.

If all models fail, the function raises an Exception with the most recent error message and a suggestion to enable verbose logging.
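
The chain described above can be sketched without any network calls. This is a minimal illustration under stated assumptions, not the module's actual code: `try_models_in_order` and `fake_acompletion` are hypothetical names, and the stub stands in for litellm.acompletion so the control flow (shared call ID, sequential attempts, final exception) can be run locally.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable, Dict, List


async def try_models_in_order(
    call_model: Callable[..., Awaitable[Any]],
    models: List[str],
    **params: Any,
) -> Any:
    """Try each model in order and return the first non-None response."""
    call_id = str(uuid.uuid4())  # one ID shared by every attempt, for tracing
    last_error = ""
    for model in models:
        try:
            response = await call_model(
                model=model, litellm_call_id=call_id, **params
            )
            if response is not None:
                return response
        except Exception as exc:  # any failure moves on to the next model
            last_error = str(exc)
    raise Exception(f"{last_error}. All fallback attempts failed.")


# Hypothetical stand-in for litellm.acompletion: the primary model always fails.
seen_call_ids: List[str] = []


async def fake_acompletion(
    model: str, litellm_call_id: str, **params: Any
) -> Dict[str, Any]:
    seen_call_ids.append(litellm_call_id)
    if model == "primary-model":
        raise RuntimeError("primary-model is unavailable")
    return {"model": model, "echo": params["messages"][-1]["content"]}


result = asyncio.run(
    try_models_in_order(
        fake_acompletion,
        ["primary-model", "backup-model"],
        messages=[{"role": "user", "content": "Hello!"}],
    )
)
```

Here the first attempt raises, the second succeeds, and both attempts record the same call ID. Only the most recent error is kept, so in a long chain earlier failures would have to be read from the logs.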

Usage

Import these functions when you need automatic model fallback behavior without using the full Router system. Useful for simple scripts and CLI tools that want resilience across multiple LLM providers.

Code Reference

Source Location

litellm/litellm_core_utils/fallback_utils.py (77 lines)

Signature

async def async_completion_with_fallbacks(**kwargs) -> ModelResponse

def completion_with_fallbacks(**kwargs) -> ModelResponse

Import

from litellm.litellm_core_utils.fallback_utils import (
    async_completion_with_fallbacks,
    completion_with_fallbacks,
)

I/O Contract

`async_completion_with_fallbacks`

Direction	Name	Type	Description
Input	model	`str` (via kwargs)	Primary model name
Input	fallbacks	`List[Union[str, dict]]` (inside the nested `kwargs` dict)	Ordered list of fallback model names or per-model config dicts, passed as `kwargs={"fallbacks": [...]}`
Input	**kwargs	keyword arguments	All other completion parameters (messages, temperature, etc.)
Output	return	`ModelResponse`	The first successful completion response
Output	raises	`Exception`	If all models (primary + fallbacks) fail
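
To make the input contract concrete, the following sketch shows one way the inputs in the table could be resolved into an ordered list of attempts. It is an illustration only: `plan_attempts` is a hypothetical helper, and the choice to reuse the primary model when a dict fallback omits "model" is an assumption of this sketch, not something this page documents.

```python
from copy import deepcopy
from typing import Any, Dict, List, Tuple, Union

Fallback = Union[str, Dict[str, Any]]


def plan_attempts(**kwargs: Any) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (model, params) pairs in the order they would be tried."""
    nested = dict(kwargs.pop("kwargs", {}))  # the nested ``kwargs`` dict
    primary = kwargs.pop("model")
    fallbacks: List[Fallback] = [primary] + list(nested.pop("fallbacks", []))
    base = {**kwargs, **nested}  # parameters shared by every attempt

    attempts: List[Tuple[str, Dict[str, Any]]] = []
    for fallback in fallbacks:
        params = deepcopy(base)  # never mutate the shared parameters
        if isinstance(fallback, dict):
            overrides = dict(fallback)  # copy so the caller's dict is untouched
            model = overrides.pop("model", primary)  # assumption: default to primary
            params.update(overrides)  # per-fallback overrides win
        else:
            model = fallback
        attempts.append((model, params))
    return attempts


attempts = plan_attempts(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.0,
    kwargs={
        "fallbacks": [
            {"model": "gpt-3.5-turbo", "temperature": 0.5},
            "claude-3-haiku-20240307",
        ]
    },
)
```

The primary model is tried first with the caller's parameters. The dict fallback overrides `temperature` for its own attempt only, and the string fallback inherits the original value.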

`completion_with_fallbacks`

Direction	Name	Type	Description
Input	**kwargs	keyword arguments	Same as `async_completion_with_fallbacks`
Output	return	`ModelResponse`	The first successful completion response
Output	raises	`Exception`	Propagated from the async version if all models fail
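
The sync-over-async delegation can be sketched with the standard library alone. This is a simplified stand-in, not the behavior of run_async_function: `async_echo` and `sync_echo` are hypothetical names, and `asyncio.run` raises `RuntimeError` when called from inside an already running event loop, a case this sketch does not handle.

```python
import asyncio
from typing import Any, Dict


async def async_echo(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical coroutine standing in for the async entry point."""
    await asyncio.sleep(0)  # yield once to the event loop, like real I/O would
    return {"model": kwargs["model"]}


def sync_echo(**kwargs: Any) -> Dict[str, Any]:
    """Blocking wrapper: run the coroutine to completion and return its result."""
    return asyncio.run(async_echo(**kwargs))


response = sync_echo(model="gpt-4")
```

Because the wrapper only forwards its keyword arguments, the sync and async entry points accept the same inputs and surface the same exceptions.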

Usage Examples

import litellm
from litellm.litellm_core_utils.fallback_utils import completion_with_fallbacks

# Synchronous usage with string fallbacks.
# Note: the fallback list goes inside the nested `kwargs` dict, not at the top level.
response = completion_with_fallbacks(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    kwargs={"fallbacks": ["gpt-3.5-turbo", "claude-3-haiku-20240307"]},
)

# Async usage with dict fallbacks: each dict names a model and may override
# completion parameters (here, temperature) for that attempt only.
import asyncio
from litellm.litellm_core_utils.fallback_utils import async_completion_with_fallbacks

async def main():
    response = await async_completion_with_fallbacks(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}],
        kwargs={
            "fallbacks": [
                {"model": "gpt-3.5-turbo", "temperature": 0.5},
                "claude-3-haiku-20240307",
            ]
        },
    )
    return response

result = asyncio.run(main())

Related Pages

BerriAI_Litellm_Asyncify -- provides run_async_function used by the sync wrapper
BerriAI_Litellm_Health_Check_Helpers -- health checks that also use fallback model lists
BerriAI_Litellm_LLM_Request_Utils -- provides pick_cheapest_chat_models_from_llm_provider used for fallback model selection
