Principle: BerriAI LiteLLM Server Startup
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| BerriAI/litellm repository | HTTP Server, LLM Gateway, Process Management | 2026-02-15 |
Overview
Starting an OpenAI-compatible HTTP gateway server that routes LLM requests to multiple providers through a unified API surface.
Description
Server startup is the process of initializing and launching an HTTP server that exposes an OpenAI-compatible API while internally routing requests across multiple LLM providers. This process transforms a declarative configuration (YAML file) and runtime parameters (CLI arguments) into a fully operational gateway server capable of handling concurrent LLM requests with authentication, load balancing, and observability.
The startup sequence orchestrates several interdependent subsystems:
- CLI argument parsing -- Collecting host, port, number of workers, configuration file path, SSL settings, and operational flags from the command line.
- Configuration loading -- Reading and applying the YAML config file to initialize model deployments, settings, and integrations.
- Database connectivity -- Establishing a connection to the database (if configured) for API key management and spend tracking.
- ASGI server initialization -- Launching the HTTP server via uvicorn, gunicorn, or hypercorn with the configured FastAPI application.
- Background task scheduling -- Starting health check loops, spend tracking batch writers, and budget reset schedulers.
- JWT and authentication setup -- Initializing authentication mechanisms including JWT handlers and master key validation.
The startup process is designed to be flexible in deployment topology, supporting single-worker uvicorn for development, multi-worker gunicorn for production, and hypercorn for HTTP/2 support.
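For example, a development startup from the command line might look like the following sketch. The `litellm` CLI flags shown (`--config`, `--host`, `--port`, `--num_workers`) exist in current releases, but exact flags should be verified against `litellm --help` for the installed version; the script only echoes the command rather than launching the server:

```shell
# Sketch of a typical startup invocation; verify flags with `litellm --help`.
HOST=0.0.0.0
PORT=4000
# Single worker for development; raise --num_workers for production.
CMD="litellm --config ./config.yaml --host $HOST --port $PORT --num_workers 1"
echo "$CMD"
```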
Usage
Use server startup when:
- Deploying the LLM proxy as a standalone service accessible over HTTP/HTTPS.
- Running the proxy with multiple workers for horizontal scaling on a single machine.
- Starting the proxy from the command line, a Docker entrypoint, or a process manager.
- Configuring SSL/TLS termination directly on the proxy server.
- Running health checks or test requests against the proxy configuration before full startup.
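A minimal config file for such a deployment might look like the sketch below. The `model_list`/`litellm_params` schema and the `os.environ/` secret-reference syntax follow LiteLLM's documented config shape, but field names should be checked against the docs for the version in use:

```yaml
# Hedged sketch of a minimal proxy config; verify field names against
# the LiteLLM proxy docs for your version.
model_list:
  - model_name: gpt-4o              # public alias exposed by the gateway
    litellm_params:
      model: openai/gpt-4o          # provider/model routed to internally
      api_key: os.environ/OPENAI_API_KEY
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
```

A config like this would typically be passed at startup, e.g. `litellm --config config.yaml --port 4000`.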
Theoretical Basis
Server startup follows the layered initialization pattern where each layer must complete successfully before the next layer can begin. Failures in earlier layers (e.g., missing configuration) prevent later layers from executing.
FUNCTION start_server(host, port, num_workers, config_path, ...options):
    -- Layer 1: CLI processing and validation
    IF version_flag THEN print_version(); RETURN
    IF health_flag THEN run_health_check(host, port); RETURN

    -- Layer 2: Configuration
    save_worker_config(
        model = options.model,
        config = config_path,
        ...remaining_options
    )

    -- Layer 3: Database setup (if configured)
    IF database_url IS SET THEN
        IF use_prisma_db_push THEN
            RUN_COMMAND("prisma db push")
        ELSE
            RUN_COMMAND("prisma migrate deploy")
        END IF
        RUN_COMMAND("prisma generate")
    END IF

    -- Layer 4: ASGI server launch
    IF run_gunicorn THEN
        launch_gunicorn(host, port, num_workers, app)
    ELSE IF run_hypercorn THEN
        launch_hypercorn(host, port, app)
    ELSE
        launch_uvicorn(host, port, app,
            ssl_keyfile = options.ssl_keyfile,
            ssl_certfile = options.ssl_certfile,
            keepalive_timeout = options.keepalive_timeout,
            limit_max_requests = options.max_requests_before_restart
        )
    END IF
END FUNCTION
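The Layer 4 dispatch can be sketched in Python. The helper names (`choose_launcher`, `build_uvicorn_kwargs`) are illustrative, not LiteLLM internals, though the uvicorn keyword arguments shown (`ssl_keyfile`, `ssl_certfile`, `timeout_keep_alive`, `limit_max_requests`) are real `uvicorn.run()` parameters:

```python
# Hedged sketch of the Layer 4 launch dispatch in the pseudocode above.
# choose_launcher() and build_uvicorn_kwargs() are illustrative names,
# not LiteLLM's actual internals.

def choose_launcher(run_gunicorn: bool = False, run_hypercorn: bool = False) -> str:
    """Mirror the dispatch order: gunicorn first, then hypercorn, else uvicorn."""
    if run_gunicorn:
        return "gunicorn"
    if run_hypercorn:
        return "hypercorn"
    return "uvicorn"  # default single-process ASGI server

def build_uvicorn_kwargs(host: str, port: int, options: dict) -> dict:
    """Collect only the options the caller actually set into uvicorn.run() kwargs."""
    kwargs = {"host": host, "port": port}
    if options.get("ssl_keyfile"):
        kwargs["ssl_keyfile"] = options["ssl_keyfile"]
        kwargs["ssl_certfile"] = options["ssl_certfile"]
    if options.get("keepalive_timeout") is not None:
        kwargs["timeout_keep_alive"] = options["keepalive_timeout"]
    if options.get("max_requests_before_restart") is not None:
        kwargs["limit_max_requests"] = options["max_requests_before_restart"]
    return kwargs
```

With this shape, `choose_launcher()` returns `"uvicorn"` by default, and unset SSL or keepalive options are simply omitted from the kwargs instead of being passed as `None`.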
Key design principles:
- Fail fast -- Invalid configuration, missing dependencies, or database connectivity failures are detected early and reported with clear error messages before the server begins accepting traffic.
- Worker isolation -- Each worker process independently loads configuration and initializes its own connection pools, ensuring fault isolation.
- Graceful lifecycle -- The server uses ASGI lifespan events to initialize resources (database connections, HTTP sessions, background tasks) on startup and clean them up on shutdown.
- Configuration precedence -- CLI arguments override environment variables, which override config file values, providing a predictable hierarchy of configuration sources.
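The precedence rule above can be sketched as a small resolver over three sources in priority order. `resolve_setting` is an illustrative name, not a LiteLLM function:

```python
# Hedged sketch of the precedence rule: CLI args > env vars > config file.
# resolve_setting() is illustrative, not LiteLLM code.

def resolve_setting(key, cli_args, env_vars, config_file, default=None):
    """Return the value for `key` from the highest-priority source that set it."""
    for source in (cli_args, env_vars, config_file):
        if source.get(key) is not None:
            return source[key]
    return default

cli = {"port": 8080}
env = {"port": 4000, "host": "127.0.0.1"}
cfg = {"port": 4000, "host": "0.0.0.0", "num_workers": 2}

resolve_setting("port", cli, env, cfg)         # 8080 (CLI wins)
resolve_setting("host", cli, env, cfg)         # "127.0.0.1" (env beats file)
resolve_setting("num_workers", cli, env, cfg)  # 2 (file is the only source)
```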
Deployment topology options:
Single-worker (development):
uvicorn -> FastAPI app -> Router -> LLM Providers
Multi-worker (production):
gunicorn master
-> worker 1 -> FastAPI app -> Router -> LLM Providers
-> worker 2 -> FastAPI app -> Router -> LLM Providers
-> worker N -> FastAPI app -> Router -> LLM Providers
HTTP/2 support:
hypercorn -> FastAPI app -> Router -> LLM Providers
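The three topologies map onto launch commands roughly like the sketch below. The app path `litellm.proxy.proxy_server:app` and the flags shown are assumptions to verify against each server's documentation; the script only echoes the commands rather than running them:

```shell
# Sketch only: echoes a plausible launch command per topology.
# APP path and flags are assumptions to verify against the docs.
APP="litellm.proxy.proxy_server:app"

# Single-worker (development):
echo "uvicorn $APP --host 0.0.0.0 --port 4000"

# Multi-worker (production): gunicorn master with uvicorn worker class
echo "gunicorn $APP -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:4000"

# HTTP/2 support:
echo "hypercorn $APP --bind 0.0.0.0:4000"
```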