Implementation: BerriAI LiteLLM Router Budget Limiter
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| litellm repository | Cost Management, Rate Limiting | 2026-02-15 |
Overview
A concrete LiteLLM tool for enforcing budget and rate limits per provider, per deployment, and per tag, implemented as the RouterBudgetLimiting class.
Description
RouterBudgetLimiting is a CustomLogger subclass that integrates into the LiteLLM callback system to track spend and filter deployments. It provides:
- Spend tracking -- Registers as a LiteLLM callback to capture response costs on every successful completion. Spend increments are queued in-memory and periodically flushed to Redis via a background periodic_sync_in_memory_spend_with_redis task, avoiding per-request Redis latency.
- Pre-call deployment filtering -- The async_filter_deployments method runs as an optional pre-call check. It batch-reads all relevant spend values from the dual cache, then filters out deployments whose provider, deployment, or tag spend has exceeded its configured budget limit.
- Three-tier budget enforcement:
  - Provider budgets -- Configured via provider_budget_config (e.g., {"openai": {"budget_limit": 100, "time_period": "1d"}}). Cache keys follow the pattern provider_spend:{provider}:{duration}.
  - Deployment budgets -- Derived from the max_budget and budget_duration fields in each deployment's litellm_params. Cache keys follow the pattern deployment_spend:{model_id}:{duration}.
  - Tag budgets -- Configured separately, scoped by request tags. Cache keys follow the pattern tag_spend:{tag}:{duration}.
- Prometheus integration -- Tracks remaining budget per provider via Prometheus metrics when a Prometheus logger is available.
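The three cache-key patterns listed above can be sketched as simple string builders. These helper names are hypothetical (the actual class constructs its keys internally), but the key formats follow the documented patterns:

```python
def provider_spend_key(provider: str, time_period: str) -> str:
    # e.g. "provider_spend:openai:1d" -- provider-tier pattern
    return f"provider_spend:{provider}:{time_period}"

def deployment_spend_key(model_id: str, budget_duration: str) -> str:
    # e.g. "deployment_spend:abc123:7d" -- deployment-tier pattern
    return f"deployment_spend:{model_id}:{budget_duration}"

def tag_spend_key(tag: str, time_period: str) -> str:
    # e.g. "tag_spend:teamA:1d" -- tag-tier pattern
    return f"tag_spend:{tag}:{time_period}"
```

Because duration is part of the key, spend counters for different time windows never collide in the cache.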
The class uses a static should_init_router_budget_limiter method to determine at Router initialization time whether any budget configuration exists, and only instantiates the limiter when needed.
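A minimal sketch of that gating decision, written as a standalone function rather than the actual static method, and assuming the same inputs the Router holds at initialization time:

```python
from typing import Any, Dict, List, Optional

def should_init_budget_limiter(
    provider_budget_config: Optional[dict],
    model_list: Optional[List[Dict[str, Any]]],
) -> bool:
    # Instantiate the limiter only when some budget configuration exists:
    # either a provider-level config, or any deployment carrying max_budget.
    if provider_budget_config is not None:
        return True
    for deployment in model_list or []:
        params = deployment.get("litellm_params", {})
        if params.get("max_budget") is not None:
            return True
    return False
```

Skipping instantiation when no budgets are configured keeps the pre-call check list (and per-request overhead) minimal.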
Usage
RouterBudgetLimiting is instantiated automatically by the Router when provider_budget_config is provided or when any deployment has max_budget set. It registers itself as a pre-call check via the optional_pre_call_checks mechanism.
Code Reference
Source Location: litellm/router_strategy/budget_limiter.py, lines 91-899
RouterBudgetLimiting.__init__ Signature:
class RouterBudgetLimiting(CustomLogger):
def __init__(
self,
dual_cache: DualCache,
provider_budget_config: Optional[dict],
model_list: Optional[
Union[List[DeploymentTypedDict], List[Dict[str, Any]]]
] = None,
):
async_filter_deployments Signature:
async def async_filter_deployments(
self,
model: str,
healthy_deployments: List,
messages: Optional[List[AllMessageValues]],
request_kwargs: Optional[dict] = None,
parent_otel_span: Optional[Span] = None,
) -> List[dict]:
Import:
from litellm.router_strategy.budget_limiter import RouterBudgetLimiting
I/O Contract
RouterBudgetLimiting.__init__
| Input Parameter | Type | Required | Description |
|---|---|---|---|
| dual_cache | DualCache | Yes | Cache instance for storing spend counters (in-memory + Redis) |
| provider_budget_config | Optional[dict] | No | Provider-level budget configuration mapping provider names to budget limits and time periods |
| model_list | Optional[List[Dict]] | No | List of deployment configurations; used to extract per-deployment budget settings |
async_filter_deployments
| Input Parameter | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | The requested model group name |
| healthy_deployments | List[dict] | Yes | List of currently healthy deployment dicts to be filtered |
| messages | Optional[List[AllMessageValues]] | Yes | The request messages (for context) |
| request_kwargs | Optional[dict] | No | Additional request keyword arguments; used to extract tags |
| parent_otel_span | Optional[Span] | No | OpenTelemetry span for distributed tracing |

| Output | Type | Description |
|---|---|---|
| filtered_deployments | List[dict] | Deployments that are within their budget limits |
| (raises) | ValueError | Raised with the message "No deployments available - crossed budget" when all deployments exceed their budget |
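The filtering contract in the table above can be illustrated with a simplified standalone sketch. This is a hypothetical helper, not the real implementation: it covers only the deployment tier (eliding provider and tag budgets) and assumes each deployment dict carries a model_info.id field:

```python
from typing import Dict, List

def filter_within_budget(
    deployments: List[dict],
    spend: Dict[str, float],     # model_id -> current spend (from cache)
    budgets: Dict[str, float],   # model_id -> configured budget limit
) -> List[dict]:
    # Keep deployments whose tracked spend is still under their limit;
    # deployments with no configured budget are never filtered out.
    filtered = [
        d for d in deployments
        if spend.get(d["model_info"]["id"], 0.0)
        < budgets.get(d["model_info"]["id"], float("inf"))
    ]
    if not filtered:
        # Mirrors the documented error when every deployment is over budget.
        raise ValueError("No deployments available - crossed budget")
    return filtered
```

The real method additionally batch-reads spend values from the dual cache and applies the provider and tag tiers in the same pass.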
Usage Examples
Router with provider-level budget configuration:
from litellm import Router
router = Router(
model_list=[
{
"model_name": "gpt-4",
"litellm_params": {
"model": "openai/gpt-4",
"api_key": "sk-openai-xxx",
},
},
{
"model_name": "gpt-4",
"litellm_params": {
"model": "azure/gpt-4",
"api_key": "sk-azure-xxx",
"api_base": "https://myazure.openai.azure.com",
},
},
],
provider_budget_config={
"openai": {"budget_limit": 100.0, "time_period": "1d"},
"azure": {"budget_limit": 200.0, "time_period": "1d"},
},
)
Deployment-level budgets via litellm_params:
router = Router(
model_list=[
{
"model_name": "gpt-4",
"litellm_params": {
"model": "openai/gpt-4",
"api_key": "sk-xxx",
"max_budget": 50.0,
"budget_duration": "1d",
},
},
{
"model_name": "gpt-4",
"litellm_params": {
"model": "azure/gpt-4",
"api_key": "sk-yyy",
"max_budget": 100.0,
"budget_duration": "7d",
},
},
],
)
YAML configuration for the proxy server:
router_settings:
provider_budget_config:
openai:
budget_limit: 0.01
time_period: 1d
anthropic:
budget_limit: 100.0
time_period: 7d
model_list:
- model_name: gpt-4
litellm_params:
model: openai/gpt-4
api_key: os.environ/OPENAI_API_KEY
max_budget: 50.0
budget_duration: 1d
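The time_period and budget_duration strings above ("1d", "7d") follow a number-plus-unit format. A hypothetical parser sketch showing how such a string maps to a window length in seconds:

```python
# Seconds per supported unit (assumed units: s, m, h, d).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def duration_in_seconds(period: str) -> int:
    """Convert a duration string like '1d' or '7d' into seconds."""
    value, unit = int(period[:-1]), period[-1]
    if unit not in UNIT_SECONDS:
        raise ValueError(f"Unsupported duration unit: {unit}")
    return value * UNIT_SECONDS[unit]
```

The window length determines both the cache-key suffix and how long a spend counter accumulates before it resets.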
Combined with routing strategy:
# Budget filtering works as a pre-call filter alongside any routing strategy
router = Router(
model_list=model_list,
routing_strategy="cost-based-routing",
provider_budget_config={
"openai": {"budget_limit": 100.0, "time_period": "1d"},
},
redis_url="redis://localhost:6379", # enables cross-instance spend tracking
)
# The router will:
# 1. Filter out over-budget deployments (budget limiter)
# 2. Select the cheapest remaining deployment (cost-based routing)
response = await router.acompletion(
model="gpt-4",
messages=[{"role": "user", "content": "Hello"}],
)
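When every deployment is over budget, the pre-call filter surfaces as a ValueError from the completion call. A hedged sketch of handling it at the call site (safe_completion is a hypothetical wrapper, not part of LiteLLM):

```python
async def safe_completion(router, model: str, messages: list):
    """Call the router, treating budget exhaustion as a soft failure."""
    try:
        return await router.acompletion(model=model, messages=messages)
    except ValueError as e:
        # The budget limiter raises this message when all deployments
        # have crossed their configured limits.
        if "crossed budget" in str(e):
            return None  # signal "all deployments over budget" to the caller
        raise
```

Callers can then queue the request, fall back to a cheaper model group, or return a quota error to the end user.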