Principle:BerriAI Litellm Deployment Definition
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| litellm/types/router.py | LLM Load Balancing, API Gateway Configuration | 2026-02-15 |
Overview
A deployment definition is a configuration unit that maps a logical model name to a specific provider endpoint, encapsulating all connection parameters required to route requests to that endpoint.
Description
In any multi-provider LLM gateway, there is a fundamental need to decouple the logical model name that callers use from the physical endpoint that actually serves the request. A deployment definition solves this by bundling three pieces of information into a single configuration object:
- Model Name -- The logical alias (e.g.,
gpt-3.5-turbo) that callers use to reference a capability rather than a specific backend. - LLM Parameters -- The concrete connection details: which provider model to call, API keys, base URLs, API versions, timeouts, retry limits, throughput caps (TPM/RPM), and provider-specific credentials (e.g., AWS region, Vertex project).
- Model Info -- Metadata about the deployment such as a unique identifier, custom pricing overrides, and supported feature flags.
Multiple deployments can share the same logical model name, enabling the router to load-balance across them. Each deployment is self-contained: it knows how to reach exactly one provider endpoint with the correct credentials and configuration.
Usage
Use deployment definitions when:
- You need to expose a single model name that fans out to multiple provider endpoints (e.g., two Azure OpenAI deployments in different regions both serving
gpt-4). - You want to attach per-endpoint configuration such as rate limits, budgets, timeouts, or custom pricing.
- You are building a router or proxy that must translate logical model requests into concrete provider API calls.
Theoretical Basis
The deployment definition pattern follows the Service Abstraction principle from service-oriented architecture. The caller interacts with a stable interface (the model name), while the system resolves that name to one of several concrete backends.
Pseudocode:
STRUCTURE DeploymentParams:
model: string // provider-specific model identifier, e.g. "azure/gpt-4-east"
api_key: string (optional)
api_base: string (optional)
timeout: float (optional)
max_retries: int (optional)
tpm: int (optional) // tokens-per-minute capacity
rpm: int (optional) // requests-per-minute capacity
max_budget: float (optional)
budget_duration: string (optional)
...provider-specific fields...
STRUCTURE ModelInfo:
id: string // unique deployment identifier (auto-generated UUID)
input_cost_per_token: float (optional)
output_cost_per_token: float (optional)
STRUCTURE Deployment:
model_name: string // logical name callers use
llm_params: DeploymentParams
model_info: ModelInfo // defaults created if not provided
FUNCTION create_deployment(name, params, info=None):
IF info IS None:
info = ModelInfo() // generate default metadata
// Propagate any custom pricing from params into info
FOR EACH pricing_field IN [input_cost_per_token, output_cost_per_token, ...]:
IF params HAS pricing_field:
info[pricing_field] = params[pricing_field]
RETURN Deployment(model_name=name, llm_params=params, model_info=info)
The key insight is that deployment definitions serve as the unit of routing: the router selects among deployments, not among raw API endpoints. This makes it possible to attach routing metadata (capacity, cost, health status) to each deployment independently.