Principle:BerriAI Litellm Deployment Definition

Knowledge Sources	Domains	Last Updated
litellm/types/router.py	LLM Load Balancing, API Gateway Configuration	2026-02-15

Overview

A deployment definition is a configuration unit that maps a logical model name to a specific provider endpoint, encapsulating all connection parameters required to route requests to that endpoint.

Description

A multi-provider LLM gateway needs to decouple the logical model name that callers use from the physical endpoint that serves the request. A deployment definition does this by bundling three pieces of information into a single configuration object:

Model Name -- The logical alias (e.g., gpt-3.5-turbo) that callers use to reference a capability rather than a specific backend.
LLM Parameters -- The concrete connection details: which provider model to call, API keys, base URLs, API versions, timeouts, retry limits, throughput caps (TPM/RPM), and provider-specific credentials (e.g., AWS region, Vertex project).
Model Info -- Metadata about the deployment such as a unique identifier, custom pricing overrides, and supported feature flags.

Multiple deployments can share the same logical model name, enabling the router to load-balance across them. Each deployment is self-contained: it knows how to reach exactly one provider endpoint with the correct credentials and configuration.
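As a rough illustration of deployments sharing a logical name, the sketch below writes three deployment definitions as plain Python dictionaries and groups them by model name. The key names (`model_name`, `litellm_params`, `model_info`) follow the shape LiteLLM's router accepts for its `model_list`, as far as this page's source file suggests. The endpoint URLs, keys, ids, the `azure/gpt-4-west` identifier, and the `group_by_model_name` helper are placeholders invented for the example.

```python
from collections import defaultdict

# Illustrative values only: URLs, keys, and ids are placeholders, not real endpoints.
model_list = [
    {
        "model_name": "gpt-4",  # logical alias callers use
        "litellm_params": {     # concrete connection details for one endpoint
            "model": "azure/gpt-4-east",
            "api_base": "https://example-east.openai.azure.com",
            "api_key": "PLACEHOLDER_EAST_KEY",
            "rpm": 600,
        },
        "model_info": {"id": "azure-east"},
    },
    {
        "model_name": "gpt-4",  # same alias, different endpoint
        "litellm_params": {
            "model": "azure/gpt-4-west",
            "api_base": "https://example-west.openai.azure.com",
            "api_key": "PLACEHOLDER_WEST_KEY",
            "rpm": 300,
        },
        "model_info": {"id": "azure-west"},
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {"model": "gpt-3.5-turbo", "api_key": "PLACEHOLDER_OPENAI_KEY"},
        "model_info": {"id": "openai-default"},
    },
]


def group_by_model_name(deployments):
    """Hypothetical helper: index deployments by the logical name they serve."""
    groups = defaultdict(list)
    for deployment in deployments:
        groups[deployment["model_name"]].append(deployment)
    return dict(groups)


groups = group_by_model_name(model_list)
for name, members in groups.items():
    print(name, [m["model_info"]["id"] for m in members])
```

A caller that asks for `gpt-4` never sees the two Azure endpoints; the grouping is what gives a router a candidate set to balance across.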

Usage

Use deployment definitions when:

You need to expose a single model name that fans out to multiple provider endpoints (e.g., two Azure OpenAI deployments in different regions both serving gpt-4).
You want to attach per-endpoint configuration such as rate limits, budgets, timeouts, or custom pricing.
You are building a router or proxy that must translate logical model requests into concrete provider API calls.

Theoretical Basis

The deployment definition pattern follows the Service Abstraction principle from service-oriented architecture. The caller interacts with a stable interface (the model name), while the system resolves that name to one of several concrete backends.

Pseudocode:

STRUCTURE DeploymentParams:
    model: string               // provider-specific model identifier, e.g. "azure/gpt-4-east"
    api_key: string (optional)
    api_base: string (optional)
    timeout: float (optional)
    max_retries: int (optional)
    tpm: int (optional)         // tokens-per-minute capacity
    rpm: int (optional)         // requests-per-minute capacity
    max_budget: float (optional)
    budget_duration: string (optional)
    ...custom pricing fields (e.g. input_cost_per_token) and provider-specific fields...

STRUCTURE ModelInfo:
    id: string                  // unique deployment identifier (auto-generated UUID)
    input_cost_per_token: float (optional)
    output_cost_per_token: float (optional)

STRUCTURE Deployment:
    model_name: string          // logical name callers use
    llm_params: DeploymentParams
    model_info: ModelInfo       // defaults created if not provided

FUNCTION create_deployment(name, params, info=None):
    IF info IS None:
        info = ModelInfo()      // generate default metadata
    // Propagate any custom pricing from params into info
    FOR EACH pricing_field IN [input_cost_per_token, output_cost_per_token, ...]:
        IF params HAS pricing_field:
            info[pricing_field] = params[pricing_field]
    RETURN Deployment(model_name=name, llm_params=params, model_info=info)
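The pseudocode above could be rendered in Python roughly as follows. This is a minimal sketch using standard-library dataclasses, not the classes in litellm/types/router.py. The class and field names mirror the pseudocode, and the two pricing fields on `DeploymentParams` are an assumption made so that the propagation step has something to copy.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

# Pricing fields copied from params into model info (subset assumed for this sketch).
PRICING_FIELDS = ("input_cost_per_token", "output_cost_per_token")


@dataclass
class DeploymentParams:
    model: str                              # provider-specific model identifier
    api_key: Optional[str] = None
    api_base: Optional[str] = None
    timeout: Optional[float] = None
    max_retries: Optional[int] = None
    tpm: Optional[int] = None               # tokens-per-minute capacity
    rpm: Optional[int] = None               # requests-per-minute capacity
    max_budget: Optional[float] = None
    budget_duration: Optional[str] = None
    input_cost_per_token: Optional[float] = None   # assumed custom pricing override
    output_cost_per_token: Optional[float] = None  # assumed custom pricing override


@dataclass
class ModelInfo:
    # A fresh UUID is generated whenever no id is supplied.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    input_cost_per_token: Optional[float] = None
    output_cost_per_token: Optional[float] = None


@dataclass
class Deployment:
    model_name: str                         # logical name callers use
    llm_params: DeploymentParams
    model_info: ModelInfo


def create_deployment(name: str, params: DeploymentParams,
                      info: Optional[ModelInfo] = None) -> Deployment:
    if info is None:
        info = ModelInfo()                  # generate default metadata
    # Propagate any custom pricing set on params into info (mutates info in place).
    for pricing_field in PRICING_FIELDS:
        value = getattr(params, pricing_field, None)
        if value is not None:
            setattr(info, pricing_field, value)
    return Deployment(model_name=name, llm_params=params, model_info=info)


deployment = create_deployment(
    "gpt-4",
    DeploymentParams(model="azure/gpt-4-east", rpm=600, input_cost_per_token=0.00003),
)
print(deployment.model_name, deployment.model_info.id)
```

Pricing set on the params overwrites whatever the supplied `ModelInfo` carried, which matches the unconditional assignment in the pseudocode.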

The key insight is that deployment definitions serve as the unit of routing: the router selects among deployments, not among raw API endpoints. This makes it possible to attach routing metadata (capacity, cost, health status) to each deployment independently.
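To make the "unit of routing" idea concrete, the sketch below selects one deployment for a logical model name using per-deployment metadata. It is a toy policy chosen for illustration (skip unhealthy deployments, prefer the lowest input cost, break ties by higher RPM capacity) and should not be read as LiteLLM's routing strategy. `RoutableDeployment`, its `healthy` flag, and `pick_deployment` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RoutableDeployment:
    deployment_id: str
    model_name: str
    rpm: int                      # requests-per-minute capacity
    input_cost_per_token: float
    healthy: bool = True          # hypothetical health flag kept by the router


def pick_deployment(deployments: List[RoutableDeployment],
                    model_name: str) -> Optional[RoutableDeployment]:
    """Pick the cheapest healthy deployment serving model_name; ties go to higher rpm."""
    candidates = [d for d in deployments if d.model_name == model_name and d.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda d: (d.input_cost_per_token, -d.rpm))


pool = [
    RoutableDeployment("azure-east", "gpt-4", rpm=300, input_cost_per_token=0.00003),
    RoutableDeployment("azure-west", "gpt-4", rpm=600, input_cost_per_token=0.00002, healthy=False),
    RoutableDeployment("azure-eu", "gpt-4", rpm=900, input_cost_per_token=0.00003),
]

chosen = pick_deployment(pool, "gpt-4")
print(chosen.deployment_id if chosen else "no deployment available")
```

Because capacity, cost, and health live on each deployment rather than on a raw URL, the policy can change without touching how callers name the model.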

Related Pages

Implementation:BerriAI_Litellm_Deployment_Types
