
Implementation:BerriAI Litellm Router Budget Limiter

From Leeroopedia
Knowledge Sources: litellm repository
Domains: Cost Management, Rate Limiting
Last Updated: 2026-02-15

Overview

A concrete tool provided by LiteLLM for enforcing budget and rate limits per provider, per deployment, and per tag, implemented as the RouterBudgetLimiting class.

Description

RouterBudgetLimiting is a CustomLogger subclass that integrates into the LiteLLM callback system to track spend and filter deployments. It provides:

  • Spend tracking -- Registers as a LiteLLM callback to capture response costs on every successful completion. Spend increments are queued in-memory and periodically flushed to Redis via a background periodic_sync_in_memory_spend_with_redis task, avoiding per-request Redis latency.
  • Pre-call deployment filtering -- The async_filter_deployments method runs as an optional pre-call check. It batch-reads all relevant spend values from the dual cache, then filters out deployments whose provider, deployment, or tag spend has exceeded their configured budget limit.
  • Three-tier budget enforcement:
    • Provider budgets -- Configured via provider_budget_config (e.g., {"openai": {"budget_limit": 100, "time_period": "1d"}}). Cache keys follow the pattern provider_spend:{provider}:{duration}.
    • Deployment budgets -- Derived from max_budget and budget_duration fields in each deployment's litellm_params. Cache keys follow the pattern deployment_spend:{model_id}:{duration}.
    • Tag budgets -- Configured separately, scoped by request tags. Cache keys follow the pattern tag_spend:{tag}:{duration}.
  • Prometheus integration -- Tracks remaining budget per provider via Prometheus metrics when a Prometheus logger is available.
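The three cache-key patterns listed above can be illustrated with a small sketch. The helper function names here are hypothetical (the real class builds these keys internally), but the key formats follow the patterns documented above:

```python
# Hypothetical helpers illustrating the documented budget cache-key patterns.

def provider_spend_key(provider: str, duration: str) -> str:
    # Provider-tier key, e.g. "provider_spend:openai:1d"
    return f"provider_spend:{provider}:{duration}"

def deployment_spend_key(model_id: str, duration: str) -> str:
    # Deployment-tier key, e.g. "deployment_spend:my-model-id:1d"
    return f"deployment_spend:{model_id}:{duration}"

def tag_spend_key(tag: str, duration: str) -> str:
    # Tag-tier key, e.g. "tag_spend:team-a:7d"
    return f"tag_spend:{tag}:{duration}"

print(provider_spend_key("openai", "1d"))  # provider_spend:openai:1d
```

Because all three tiers share the `{scope}:{identifier}:{duration}` shape, the filter step can batch-read every relevant counter from the dual cache in one pass.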

The class uses a static should_init_router_budget_limiter method to determine at Router initialization time whether any budget configuration exists, and only instantiates the limiter when needed.
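A minimal sketch of the decision that static check makes, reconstructed from the description above and the Usage section (the function name and signature here are illustrative, not the actual method):

```python
from typing import Any, Dict, List, Optional

def should_init_budget_limiter(
    provider_budget_config: Optional[dict],
    model_list: Optional[List[Dict[str, Any]]],
) -> bool:
    """Illustrative version of the init-time check: a limiter is only
    needed if some budget configuration actually exists."""
    # Provider-level budgets are configured...
    if provider_budget_config:
        return True
    # ...or some deployment sets max_budget in its litellm_params.
    for deployment in model_list or []:
        if deployment.get("litellm_params", {}).get("max_budget") is not None:
            return True
    return False
```

Skipping instantiation when no budgets are configured means routers without budget settings pay no callback or pre-call-check overhead.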

Usage

RouterBudgetLimiting is instantiated automatically by the Router when provider_budget_config is provided or when any deployment has max_budget set. It registers itself as a pre-call check via the optional_pre_call_checks mechanism.

Code Reference

Source Location: litellm/router_strategy/budget_limiter.py, lines 91-899

RouterBudgetLimiting.__init__ Signature:

class RouterBudgetLimiting(CustomLogger):
    def __init__(
        self,
        dual_cache: DualCache,
        provider_budget_config: Optional[dict],
        model_list: Optional[
            Union[List[DeploymentTypedDict], List[Dict[str, Any]]]
        ] = None,
    ):

async_filter_deployments Signature:

async def async_filter_deployments(
    self,
    model: str,
    healthy_deployments: List,
    messages: Optional[List[AllMessageValues]],
    request_kwargs: Optional[dict] = None,
    parent_otel_span: Optional[Span] = None,
) -> List[dict]:

Import:

from litellm.router_strategy.budget_limiter import RouterBudgetLimiting

I/O Contract

RouterBudgetLimiting.__init__

Input parameters:
  • dual_cache (DualCache, required) -- Cache instance for storing spend counters (in-memory + Redis)
  • provider_budget_config (Optional[dict]) -- Provider-level budget configuration mapping provider names to budget limits and time periods
  • model_list (Optional[List[Dict]]) -- List of deployment configurations; used to extract per-deployment budget settings

async_filter_deployments

Input parameters:
  • model (str, required) -- The requested model group name
  • healthy_deployments (List[dict], required) -- List of currently healthy deployment dicts to be filtered
  • messages (Optional[List[AllMessageValues]], required) -- The request messages (for context)
  • request_kwargs (Optional[dict]) -- Additional request keyword arguments; used to extract tags
  • parent_otel_span (Optional[Span]) -- OpenTelemetry span for distributed tracing

Output:
  • filtered_deployments (List[dict]) -- Deployments that are within their budget limits
  • Raises ValueError with message "No deployments available - crossed budget" when all deployments exceed their budget
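The output contract (a filtered list, or a ValueError when every deployment is over budget) can be mirrored in a standalone sketch. The spend and budget dicts below are simplified stand-ins for the batched dual-cache reads the real method performs:

```python
from typing import Dict, List

def filter_by_budget(
    healthy_deployments: List[dict],
    spend: Dict[str, float],    # model_id -> current spend (stand-in for cache reads)
    budgets: Dict[str, float],  # model_id -> configured budget_limit
) -> List[dict]:
    """Illustrative filter mirroring the documented I/O contract."""
    filtered = [
        d for d in healthy_deployments
        if spend.get(d["model_id"], 0.0) < budgets.get(d["model_id"], float("inf"))
    ]
    if not filtered:
        # Matches the error message documented above.
        raise ValueError("No deployments available - crossed budget")
    return filtered
```

Callers of the real router should be prepared to handle this ValueError, since it surfaces when all candidate deployments have crossed their budgets.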

Usage Examples

Router with provider-level budget configuration:

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gpt-4",
            "litellm_params": {
                "model": "openai/gpt-4",
                "api_key": "sk-openai-xxx",
            },
        },
        {
            "model_name": "gpt-4",
            "litellm_params": {
                "model": "azure/gpt-4",
                "api_key": "sk-azure-xxx",
                "api_base": "https://myazure.openai.azure.com",
            },
        },
    ],
    provider_budget_config={
        "openai": {"budget_limit": 100.0, "time_period": "1d"},
        "azure": {"budget_limit": 200.0, "time_period": "1d"},
    },
)

Deployment-level budgets via litellm_params:

router = Router(
    model_list=[
        {
            "model_name": "gpt-4",
            "litellm_params": {
                "model": "openai/gpt-4",
                "api_key": "sk-xxx",
                "max_budget": 50.0,
                "budget_duration": "1d",
            },
        },
        {
            "model_name": "gpt-4",
            "litellm_params": {
                "model": "azure/gpt-4",
                "api_key": "sk-yyy",
                "max_budget": 100.0,
                "budget_duration": "7d",
            },
        },
    ],
)
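Budget durations like "1d" and "7d" above are compact period strings (a number plus a unit suffix). An illustrative parser for such strings, not LiteLLM's actual implementation, could look like:

```python
# Illustrative duration parser (hypothetical; LiteLLM resolves these
# period strings internally).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def duration_to_seconds(duration: str) -> int:
    """Convert a period string like "1d" or "7d" to seconds."""
    value, unit = int(duration[:-1]), duration[-1]
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported duration unit: {unit!r}")
    return value * _UNIT_SECONDS[unit]
```

The resolved window length determines how long a spend counter accumulates before its budget period resets.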

YAML configuration for the proxy server:

router_settings:
  provider_budget_config:
    openai:
      budget_limit: 0.01
      time_period: 1d
    anthropic:
      budget_limit: 100.0
      time_period: 7d

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
      max_budget: 50.0
      budget_duration: 1d

Combined with routing strategy:

# Budget filtering works as a pre-call filter alongside any routing strategy
router = Router(
    model_list=model_list,
    routing_strategy="cost-based-routing",
    provider_budget_config={
        "openai": {"budget_limit": 100.0, "time_period": "1d"},
    },
    redis_url="redis://localhost:6379",  # enables cross-instance spend tracking
)

# The router will:
# 1. Filter out over-budget deployments (budget limiter)
# 2. Select the cheapest remaining deployment (cost-based routing)
response = await router.acompletion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
