Principle: BerriAI LiteLLM Server Startup
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| BerriAI/litellm repository | HTTP Server, LLM Gateway, Process Management | 2026-02-15 |
Overview
Starting an OpenAI-compatible HTTP gateway server that routes LLM requests to multiple providers through a unified API surface.
Description
Server startup is the process of initializing and launching an HTTP server that exposes an OpenAI-compatible API while internally routing requests across multiple LLM providers. This process transforms a declarative configuration (YAML file) and runtime parameters (CLI arguments) into a fully operational gateway server capable of handling concurrent LLM requests with authentication, load balancing, and observability.
The startup sequence orchestrates several interdependent subsystems:
- CLI argument parsing -- Collecting host, port, number of workers, configuration file path, SSL settings, and operational flags from the command line.
- Configuration loading -- Reading and applying the YAML config file to initialize model deployments, settings, and integrations.
- Database connectivity -- Establishing a connection to the database (if configured) for API key management and spend tracking.
- ASGI server initialization -- Launching the HTTP server via uvicorn, gunicorn, or hypercorn with the configured FastAPI application.
- Background task scheduling -- Starting health check loops, spend tracking batch writers, and budget reset schedulers.
- JWT and authentication setup -- Initializing authentication mechanisms including JWT handlers and master key validation.
The startup process is designed to be flexible in deployment topology, supporting single-worker uvicorn for development, multi-worker gunicorn for production, and hypercorn for HTTP/2 support.
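For example, a development startup from the command line might look like the following sketch. The `litellm` CLI flags shown (`--config`, `--host`, `--port`, `--num_workers`) exist in current releases, but exact flags should be verified against `litellm --help` for the installed version; the script only echoes the command rather than launching the server:

```shell
# Sketch of a typical startup invocation; verify flags with `litellm --help`.
HOST=0.0.0.0
PORT=4000
# Single worker for development; raise --num_workers for production.
CMD="litellm --config ./config.yaml --host $HOST --port $PORT --num_workers 1"
echo "$CMD"
```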
Usage
Use server startup when:
- Deploying the LLM proxy as a standalone service accessible over HTTP/HTTPS.
- Running the proxy with multiple workers for horizontal scaling on a single machine.
- Starting the proxy from the command line, a Docker entrypoint, or a process manager.
- Configuring SSL/TLS termination directly on the proxy server.
- Running health checks or test requests against the proxy configuration before full startup.
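A minimal config file for such a deployment might look like the sketch below. The `model_list`/`litellm_params` schema and the `os.environ/` secret-reference syntax follow LiteLLM's documented config shape, but field names should be checked against the docs for the version in use:

```yaml
# Hedged sketch of a minimal proxy config; verify field names against
# the LiteLLM proxy docs for your version.
model_list:
  - model_name: gpt-4o              # public alias exposed by the gateway
    litellm_params:
      model: openai/gpt-4o          # provider/model routed to internally
      api_key: os.environ/OPENAI_API_KEY
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
```

A config like this would typically be passed at startup, e.g. `litellm --config config.yaml --port 4000`.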
Theoretical Basis
Server startup follows the layered initialization pattern where each layer must complete successfully before the next layer can begin. Failures in earlier layers (e.g., missing configuration) prevent later layers from executing.
FUNCTION start_server(host, port, num_workers, config_path, ...options):
    -- Layer 1: CLI processing and validation
    IF version_flag THEN print_version(); RETURN
    IF health_flag THEN run_health_check(host, port); RETURN

    -- Layer 2: Configuration
    save_worker_config(
        model = options.model,
        config = config_path,
        ...remaining_options
    )

    -- Layer 3: Database setup (if configured)
    IF database_url IS SET THEN
        IF use_prisma_db_push THEN
            RUN_COMMAND("prisma db push")
        ELSE
            RUN_COMMAND("prisma migrate deploy")
        END IF
        RUN_COMMAND("prisma generate")
    END IF

    -- Layer 4: ASGI server launch
    IF run_gunicorn THEN
        launch_gunicorn(host, port, num_workers, app)
    ELSE IF run_hypercorn THEN
        launch_hypercorn(host, port, app)
    ELSE
        launch_uvicorn(host, port, app,
            ssl_keyfile = options.ssl_keyfile,
            ssl_certfile = options.ssl_certfile,
            keepalive_timeout = options.keepalive_timeout,
            limit_max_requests = options.max_requests_before_restart
        )
    END IF
END FUNCTION
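The Layer 4 dispatch can be sketched in Python. The helper names (`choose_launcher`, `build_uvicorn_kwargs`) are illustrative, not LiteLLM internals, though the uvicorn keyword arguments shown (`ssl_keyfile`, `ssl_certfile`, `timeout_keep_alive`, `limit_max_requests`) are real `uvicorn.run()` parameters:

```python
# Hedged sketch of the Layer 4 launch dispatch in the pseudocode above.
# choose_launcher() and build_uvicorn_kwargs() are illustrative names,
# not LiteLLM's actual internals.

def choose_launcher(run_gunicorn: bool = False, run_hypercorn: bool = False) -> str:
    """Mirror the dispatch order: gunicorn first, then hypercorn, else uvicorn."""
    if run_gunicorn:
        return "gunicorn"
    if run_hypercorn:
        return "hypercorn"
    return "uvicorn"  # default single-process ASGI server

def build_uvicorn_kwargs(host: str, port: int, options: dict) -> dict:
    """Collect only the options the caller actually set into uvicorn.run() kwargs."""
    kwargs = {"host": host, "port": port}
    if options.get("ssl_keyfile"):
        kwargs["ssl_keyfile"] = options["ssl_keyfile"]
        kwargs["ssl_certfile"] = options["ssl_certfile"]
    if options.get("keepalive_timeout") is not None:
        kwargs["timeout_keep_alive"] = options["keepalive_timeout"]
    if options.get("max_requests_before_restart") is not None:
        kwargs["limit_max_requests"] = options["max_requests_before_restart"]
    return kwargs
```

With this shape, `choose_launcher()` returns `"uvicorn"` by default, and unset SSL or keepalive options are simply omitted from the kwargs instead of being passed as `None`.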
Key design principles:
- Fail fast -- Invalid configuration, missing dependencies, or database connectivity failures are detected early and reported with clear error messages before the server begins accepting traffic.
- Worker isolation -- Each worker process independently loads configuration and initializes its own connection pools, ensuring fault isolation.
- Graceful lifecycle -- The server uses ASGI lifespan events to initialize resources (database connections, HTTP sessions, background tasks) on startup and clean them up on shutdown.
- Configuration precedence -- CLI arguments override environment variables, which override config file values, providing a predictable hierarchy of configuration sources.
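The precedence rule above can be sketched as a small resolver over three sources in priority order. `resolve_setting` is an illustrative name, not a LiteLLM function:

```python
# Hedged sketch of the precedence rule: CLI args > env vars > config file.
# resolve_setting() is illustrative, not LiteLLM code.

def resolve_setting(key, cli_args, env_vars, config_file, default=None):
    """Return the value for `key` from the highest-priority source that set it."""
    for source in (cli_args, env_vars, config_file):
        if source.get(key) is not None:
            return source[key]
    return default

cli = {"port": 8080}
env = {"port": 4000, "host": "127.0.0.1"}
cfg = {"port": 4000, "host": "0.0.0.0", "num_workers": 2}

resolve_setting("port", cli, env, cfg)         # 8080 (CLI wins)
resolve_setting("host", cli, env, cfg)         # "127.0.0.1" (env beats file)
resolve_setting("num_workers", cli, env, cfg)  # 2 (file is the only source)
```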
Deployment topology options:
Single-worker (development):
uvicorn -> FastAPI app -> Router -> LLM Providers
Multi-worker (production):
gunicorn master
-> worker 1 -> FastAPI app -> Router -> LLM Providers
-> worker 2 -> FastAPI app -> Router -> LLM Providers
-> worker N -> FastAPI app -> Router -> LLM Providers
HTTP/2 support:
hypercorn -> FastAPI app -> Router -> LLM Providers
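The three topologies map onto launch commands roughly like the sketch below. The app path `litellm.proxy.proxy_server:app` and the flags shown are assumptions to verify against each server's documentation; the script only echoes the commands rather than running them:

```shell
# Sketch only: echoes a plausible launch command per topology.
# APP path and flags are assumptions to verify against the docs.
APP="litellm.proxy.proxy_server:app"

# Single-worker (development):
echo "uvicorn $APP --host 0.0.0.0 --port 4000"

# Multi-worker (production): gunicorn master with uvicorn worker class
echo "gunicorn $APP -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:4000"

# HTTP/2 support:
echo "hypercorn $APP --bind 0.0.0.0:4000"
```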