Workflow: BerriAI LiteLLM Proxy Server Deployment
| Knowledge Sources | |
|---|---|
| Domains | LLM_Ops, Infrastructure, API_Gateway |
| Last Updated | 2026-02-15 16:00 GMT |
Overview
End-to-end process for deploying the LiteLLM Proxy as an OpenAI-compatible AI Gateway with authentication, rate limiting, and multi-provider model management.
Description
This workflow covers the deployment of LiteLLM's Proxy Server, a FastAPI-based AI Gateway that exposes OpenAI-compatible API endpoints for any supported LLM provider. The proxy handles virtual API key management, per-key and per-team budgets, rate limiting, request routing across model deployments, observability callbacks, and guardrails. It can be deployed via CLI, Docker, or as a Python application, backed by PostgreSQL or SQLite for persistent state.
Key outputs:
- A running OpenAI-compatible API server on a configurable port
- Virtual API key management with spend tracking
- Team and user access control with budget limits
- Centralized logging and observability for all LLM calls
Usage
Execute this workflow when you need a centralized API gateway for LLM access across your organization. This is the standard deployment pattern for teams that need API key management, usage tracking, budget controls, and a unified endpoint for multiple LLM providers.
Execution Steps
Step 1: Configuration File Creation
Create a YAML configuration file defining the model list, general settings, and optional features. The config specifies which models to expose, their underlying provider configurations, authentication requirements, and any callbacks or guardrails to enable. Each model entry maps a logical model name to one or more provider deployments.
Key considerations:
- The `model_list` section defines available models with their provider parameters
- `general_settings` configures the master key, database URL, and global behavior
- API keys can reference environment variables using the `os.environ/VARIABLE_NAME` syntax
- Callback integrations (Langfuse, Prometheus, etc.) are configured in the `litellm_settings` section
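A minimal `config.yaml` illustrating these sections. The model names, providers, callback choice, and environment variable names here are placeholders, not prescriptions:

```yaml
model_list:
  - model_name: gpt-4o                 # logical name clients request
    litellm_params:
      model: openai/gpt-4o             # underlying provider/model
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude                 # illustrative second provider
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL

litellm_settings:
  success_callback: ["langfuse"]       # example observability callback
```

Each `model_name` is what clients send in requests; the proxy resolves it to the matching `litellm_params` deployment.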
Step 2: Database Setup
Configure the database backend for persistent storage of API keys, teams, users, spend logs, and configuration. PostgreSQL is recommended for production; SQLite is available for development. The proxy uses Prisma ORM for database operations and automatically runs migrations on startup.
Key considerations:
- Set the `DATABASE_URL` environment variable for the PostgreSQL connection
- SQLite is used by default if no database URL is provided
- Prisma migrations are applied automatically on proxy startup
- Database stores virtual keys, teams, users, spend logs, and model configurations
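A sketch of pointing the proxy at PostgreSQL; the user, password, host, and database name below are placeholders:

```shell
# Standard PostgreSQL connection URL; Prisma reads this at startup
export DATABASE_URL="postgresql://litellm:secret@db.internal:5432/litellm"
```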
Step 3: Server Startup
Launch the proxy server via the CLI command `litellm --config config.yaml` or via Docker. The CLI parses the config, initializes the FastAPI application, sets up the Router with the configured models, connects to the database, and starts serving on the configured host and port (default: `0.0.0.0:4000`).
Key considerations:
- The `--port` flag overrides the default port
- Docker images are available for production deployment
- The `--detailed_debug` flag enables verbose logging for troubleshooting
- A health check endpoint is available at `/health/liveliness`
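The two launch paths can be sketched as follows; the Docker image tag and mount paths are illustrative and should be checked against the current LiteLLM documentation:

```shell
# CLI: reads the config, runs migrations, serves on :4000 by default
litellm --config config.yaml --port 4000 --detailed_debug

# Docker: mount the config and pass the database URL through
docker run -p 4000:4000 \
  -e DATABASE_URL="$DATABASE_URL" \
  -v "$(pwd)/config.yaml:/app/config.yaml" \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml
```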
Step 4: API Key Management
Create virtual API keys for users and teams via the management API (`/key/generate`). Each key can have associated budgets, rate limits, model access restrictions, and team membership. Keys are authenticated on every request, and their usage is tracked for billing and compliance.
Key considerations:
- The master key has full admin access to all management endpoints
- Virtual keys can be scoped to specific models, teams, and budgets
- Key spend is tracked in real-time and enforced against budget limits
- Keys can be created, updated, listed, and deleted via management endpoints
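A sketch of the request body a `/key/generate` call might carry. The field names follow LiteLLM's key-management parameters, but all values (budget, duration, team ID, limits) are hypothetical:

```python
import json

# Illustrative /key/generate payload; authorize with the master key.
payload = {
    "models": ["gpt-4o"],         # restrict key to specific logical models
    "max_budget": 25.0,           # USD budget enforced against tracked spend
    "duration": "30d",            # key expiry
    "team_id": "team-analytics",  # hypothetical team identifier
    "rpm_limit": 100,             # requests per minute
}
headers = {
    "Authorization": "Bearer sk-master-key-placeholder",  # master key
    "Content-Type": "application/json",
}
body = json.dumps(payload)
```

POST this body to the proxy's `/key/generate` endpoint; the response contains the newly minted virtual key.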
Step 5: Client Integration
Connect client applications to the proxy using any OpenAI-compatible SDK. Clients point their base URL to the proxy server and use a virtual API key for authentication. The proxy transparently routes requests to the appropriate LLM provider based on the model name.
Key considerations:
- Any OpenAI SDK client works by changing the base URL to the proxy address
- The proxy supports all standard OpenAI endpoints: `/v1/chat/completions`, `/v1/embeddings`, `/v1/files`, etc.
- Streaming responses are fully supported
- Pass-through endpoints allow direct access to provider-specific APIs
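Under the hood, any OpenAI-compatible client just swaps the base URL and key. A minimal sketch of the request construction, assuming the proxy runs at `http://localhost:4000` (the address, key, and model name are placeholders):

```python
import json

def build_chat_request(base_url: str, virtual_key: str, model: str, messages: list):
    """Assemble an OpenAI-compatible chat completion request for the proxy.

    Only the base URL and the key differ from calling the provider
    directly; OpenAI SDKs perform the same construction internally.
    """
    url = base_url.rstrip("/") + "/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {virtual_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

url, headers, body = build_chat_request(
    "http://localhost:4000",        # proxy address (illustrative)
    "sk-virtual-key-placeholder",   # virtual key from /key/generate
    "gpt-4o",                       # logical model name from model_list
    [{"role": "user", "content": "Hello"}],
)
```

With the official OpenAI SDKs, the equivalent is setting `base_url` to the proxy address and `api_key` to the virtual key when constructing the client.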
Step 6: Monitoring and Operations
Monitor proxy health, usage, and costs through built-in endpoints and configured callbacks. The proxy exposes Prometheus metrics, supports alerting integrations, and provides a web-based admin UI for dashboard access. Spend reports, model performance, and user activity can be queried via management endpoints.
Key considerations:
- The `/metrics` endpoint exposes Prometheus-compatible metrics
- The admin UI is served at the root path for visual management
- Spend endpoints provide per-key, per-team, and per-model cost breakdowns
- Health check endpoints monitor model availability and latency
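Output from a Prometheus-style `/metrics` endpoint uses the plain-text exposition format. A minimal parsing sketch; the metric name and labels below are hypothetical examples, not LiteLLM's actual metric names:

```python
import re

# One sample line in Prometheus exposition format (hypothetical metric).
line = 'proxy_total_requests{model="gpt-4o",team="analytics"} 42'

# Split into metric name, label block, and numeric value.
match = re.match(r'(\w+)\{(.*)\}\s+([\d.]+)', line)
name, raw_labels, value = match.group(1), match.group(2), float(match.group(3))

# Turn the label block into a dict, stripping the quoted values.
labels = dict(pair.split("=", 1) for pair in raw_labels.split(","))
labels = {k: v.strip('"') for k, v in labels.items()}
```

In practice a Prometheus server scrapes `/metrics` directly; manual parsing like this is only useful for ad hoc checks or tests.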