Principle: BerriAI LiteLLM Proxy Configuration
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| BerriAI/litellm repository | LLM Gateway, Configuration Management, Proxy Infrastructure | 2026-02-15 |
Overview
Declarative configuration of an LLM proxy gateway through structured YAML files defining model deployments, runtime settings, and feature toggles.
Description
Proxy configuration is the practice of defining the complete operational behavior of an LLM gateway server through a single, declarative configuration file rather than through imperative code changes or runtime API calls. This approach addresses the problem of managing complex, multi-provider LLM deployments where dozens of models from different providers must be unified behind a single API surface.
A proxy configuration file serves as the single source of truth for:
- Model deployments -- which LLM providers and models are available, including their API endpoints, credentials, and deployment-specific parameters.
- General settings -- server-level configuration such as master key, database URL, allowed origins, and authentication modes.
- LiteLLM settings -- library-level behavior toggles such as caching, fallback strategies, parameter dropping, and callback integrations.
- Router settings -- load balancing policies, retry logic, cooldown thresholds, and routing strategies across model groups.
- Environment variables -- secrets and configuration values injected from environment variables at config load time.
The configuration file is typically written in YAML format, which provides a human-readable, version-controllable, and auditable specification. When the proxy server starts, it parses this file, resolves environment variable references, initializes the model router with the declared deployments, and applies all settings to the runtime.
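As a minimal sketch of such a file, the following YAML declares one model deployment and a server master key (the model name, alias, and variable names are illustrative, not prescriptive):

```yaml
model_list:
  - model_name: gpt-4o                        # public alias exposed by the proxy
    litellm_params:
      model: openai/gpt-4o                    # provider-prefixed model identifier
      api_key: os.environ/OPENAI_API_KEY      # resolved from the environment at load time

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY   # admin key for the proxy's management API
```

The proxy server is typically started by pointing it at this file, e.g. `litellm --config config.yaml`.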
Usage
Use declarative proxy configuration when:
- Deploying an LLM gateway that must route requests across multiple providers (OpenAI, Anthropic, Azure, AWS Bedrock, etc.).
- Maintaining reproducible and auditable infrastructure-as-code for LLM routing.
- Enabling non-developer operators to manage model availability and settings without modifying application code.
- Supporting environment-specific deployments (development, staging, production) through config file variations.
- Centralizing API key management and credential injection via environment variables.
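As a sketch of the multi-provider case, two deployments can share one public alias so the router load-balances across providers, with each credential injected from the environment (provider and model choices here are illustrative):

```yaml
model_list:
  - model_name: chat-default                  # one public alias...
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: chat-default                  # ...backed by a second deployment behind the same alias
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY
```

Clients request `chat-default` and the router selects a deployment according to the configured routing strategy.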
Theoretical Basis
Declarative proxy configuration follows the infrastructure-as-code paradigm, where the desired state of a system is described in a static document and a runtime engine converges the actual system state to match that specification.
The general structure of a proxy configuration can be described with the following pseudocode:
```
CONFIGURATION := {
    environment_variables: MAP[name -> value_or_env_ref],
    model_list: LIST[
        {
            model_name: STRING,          -- public alias for the model group
            litellm_params: {
                model: STRING,           -- provider-prefixed model identifier
                api_key: STRING_OR_REF,  -- credential (often an env variable)
                api_base: STRING,        -- optional provider endpoint override
                ...provider_specific_params
            },
            model_info: {
                mode: STRING,            -- "chat", "embedding", "image_generation", etc.
                ...metadata
            }
        }
    ],
    litellm_settings: MAP[key -> value],  -- library-level toggles
    general_settings: MAP[key -> value],  -- server-level settings
    router_settings: MAP[key -> value]    -- load balancer configuration
}
```
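A concrete YAML file instantiating each section of this structure might look as follows (a sketch; all values are illustrative):

```yaml
environment_variables:
  REDIS_HOST: redis-service

model_list:
  - model_name: embed-default
    litellm_params:
      model: openai/text-embedding-3-small
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      mode: embedding                    # non-chat endpoints are tagged via model_info

litellm_settings:
  drop_params: true                      # drop provider-unsupported params instead of erroring
  cache: true

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL

router_settings:
  routing_strategy: usage-based-routing
  num_retries: 3
```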
The loading algorithm follows these steps:
```
FUNCTION load_proxy_config(config_path):
    raw_config = PARSE_YAML(config_path)

    -- Phase 1: Resolve environment variables
    FOR EACH (name, value) IN raw_config.environment_variables:
        SET_ENV(name, RESOLVE(value))

    -- Phase 2: Apply litellm settings
    FOR EACH (key, value) IN raw_config.litellm_settings:
        IF key == "cache" THEN initialize_cache(value)
        ELSE IF key == "callbacks" THEN register_callbacks(value)
        ELSE set_litellm_attribute(key, value)

    -- Phase 3: Build the model router
    router = NEW Router(
        model_list = raw_config.model_list,
        routing_strategy = raw_config.router_settings.routing_strategy,
        num_retries = raw_config.router_settings.num_retries,
        ...
    )

    -- Phase 4: Apply general settings
    FOR EACH (key, value) IN raw_config.general_settings:
        apply_server_setting(key, value)

    RETURN (router, raw_config.model_list, raw_config.general_settings)
```
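The special-casing in Phase 2 corresponds to `litellm_settings` keys like the following (a sketch with illustrative values): cache and callback keys trigger dedicated initialization, while other keys map directly onto library attributes.

```yaml
litellm_settings:
  cache: true                      # triggers cache-backend initialization (Phase 2 special case)
  success_callback: ["langfuse"]   # triggers callback registration (Phase 2 special case)
  set_verbose: false               # plain key: set directly as a library attribute
```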
Key design principles:
- Separation of concerns -- Model definitions, runtime settings, and infrastructure settings are kept in distinct configuration sections.
- Environment variable indirection -- Credentials are never hard-coded; they are resolved from environment variables at load time using the `os.environ/VARIABLE_NAME` pattern.
- Idempotent loading -- The configuration can be reloaded at runtime without restarting the server, enabling live updates to model deployments.
- Layered configuration -- Database-stored model definitions can overlay or override file-based definitions, supporting hybrid static/dynamic configuration models.