Workflow: BerriAI LiteLLM Proxy Server Deployment
| Knowledge Sources | |
|---|---|
| Domains | LLM_Ops, Infrastructure, API_Gateway |
| Last Updated | 2026-02-15 16:00 GMT |
Overview
End-to-end process for deploying the LiteLLM Proxy as an OpenAI-compatible AI Gateway with authentication, rate limiting, and multi-provider model management.
Description
This workflow covers the deployment of LiteLLM's Proxy Server, a FastAPI-based AI Gateway that exposes OpenAI-compatible API endpoints for any supported LLM provider. The proxy handles virtual API key management, per-key and per-team budgets, rate limiting, request routing across model deployments, observability callbacks, and guardrails. It can be deployed via CLI, Docker, or as a Python application, backed by PostgreSQL or SQLite for persistent state.
Key outputs:
- A running OpenAI-compatible API server on a configurable port
- Virtual API key management with spend tracking
- Team and user access control with budget limits
- Centralized logging and observability for all LLM calls
Usage
Execute this workflow when you need a centralized API gateway for LLM access across your organization. This is the standard deployment pattern for teams that need API key management, usage tracking, budget controls, and a unified endpoint for multiple LLM providers.
Execution Steps
Step 1: Configuration File Creation
Create a YAML configuration file defining the model list, general settings, and optional features. The config specifies which models to expose, their underlying provider configurations, authentication requirements, and any callbacks or guardrails to enable. Each model entry maps a logical model name to one or more provider deployments.
Key considerations:
- The `model_list` section defines available models with their provider parameters
- `general_settings` configures the master key, database URL, and global behavior
- API keys can reference environment variables using the `os.environ/VARIABLE_NAME` syntax
- Callback integrations (Langfuse, Prometheus, etc.) are configured in the `litellm_settings` section
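A minimal `config.yaml` illustrating these sections. The model names, providers, callback choice, and environment variable names here are placeholders, not prescriptions:

```yaml
model_list:
  - model_name: gpt-4o                 # logical name clients request
    litellm_params:
      model: openai/gpt-4o             # underlying provider/model
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude                 # illustrative second provider
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL

litellm_settings:
  success_callback: ["langfuse"]       # example observability callback
```

Each `model_name` is what clients send in requests; the proxy resolves it to the matching `litellm_params` deployment.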
Step 2: Database Setup
Configure the database backend for persistent storage of API keys, teams, users, spend logs, and configuration. PostgreSQL is recommended for production; SQLite is available for development. The proxy uses Prisma ORM for database operations and automatically runs migrations on startup.
Key considerations:
- Set the `DATABASE_URL` environment variable for the PostgreSQL connection
- SQLite is used by default if no database URL is provided
- Prisma migrations are applied automatically on proxy startup
- Database stores virtual keys, teams, users, spend logs, and model configurations
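A sketch of pointing the proxy at PostgreSQL; the user, password, host, and database name below are placeholders:

```shell
# Standard PostgreSQL connection URL; Prisma reads this at startup
export DATABASE_URL="postgresql://litellm:secret@db.internal:5432/litellm"
```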
Step 3: Server Startup
Launch the proxy server via the CLI command `litellm --config config.yaml` or via Docker. The CLI parses the config, initializes the FastAPI application, sets up the Router with the configured models, connects to the database, and starts serving on the configured host and port (default: `0.0.0.0:4000`).
Key considerations:
- The `--port` flag overrides the default port
- Docker images are available for production deployment
- The `--detailed_debug` flag enables verbose logging for troubleshooting
- A health check endpoint is available at `/health/liveliness`
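The two launch paths can be sketched as follows; the Docker image tag and mount paths are illustrative and should be checked against the current LiteLLM documentation:

```shell
# CLI: reads the config, runs migrations, serves on :4000 by default
litellm --config config.yaml --port 4000 --detailed_debug

# Docker: mount the config and pass the database URL through
docker run -p 4000:4000 \
  -e DATABASE_URL="$DATABASE_URL" \
  -v "$(pwd)/config.yaml:/app/config.yaml" \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml
```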
Step 4: API Key Management
Create virtual API keys for users and teams via the management API (`/key/generate`). Each key can have associated budgets, rate limits, model access restrictions, and team membership. Keys are authenticated on every request, and their usage is tracked for billing and compliance.
Key considerations:
- The master key has full admin access to all management endpoints
- Virtual keys can be scoped to specific models, teams, and budgets
- Key spend is tracked in real-time and enforced against budget limits
- Keys can be created, updated, listed, and deleted via management endpoints
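A sketch of the request body a `/key/generate` call might carry. The field names follow LiteLLM's key-management parameters, but all values (budget, duration, team ID, limits) are hypothetical:

```python
import json

# Illustrative /key/generate payload; authorize with the master key.
payload = {
    "models": ["gpt-4o"],         # restrict key to specific logical models
    "max_budget": 25.0,           # USD budget enforced against tracked spend
    "duration": "30d",            # key expiry
    "team_id": "team-analytics",  # hypothetical team identifier
    "rpm_limit": 100,             # requests per minute
}
headers = {
    "Authorization": "Bearer sk-master-key-placeholder",  # master key
    "Content-Type": "application/json",
}
body = json.dumps(payload)
```

POST this body to the proxy's `/key/generate` endpoint; the response contains the newly minted virtual key.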
Step 5: Client Integration
Connect client applications to the proxy using any OpenAI-compatible SDK. Clients point their base URL to the proxy server and use a virtual API key for authentication. The proxy transparently routes requests to the appropriate LLM provider based on the model name.
Key considerations:
- Any OpenAI SDK client works by changing the base URL to the proxy address
- The proxy supports all standard OpenAI endpoints: `/v1/chat/completions`, `/v1/embeddings`, `/v1/files`, etc.
- Streaming responses are fully supported
- Pass-through endpoints allow direct access to provider-specific APIs
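Under the hood, any OpenAI-compatible client just swaps the base URL and key. A minimal sketch of the request construction, assuming the proxy runs at `http://localhost:4000` (the address, key, and model name are placeholders):

```python
import json

def build_chat_request(base_url: str, virtual_key: str, model: str, messages: list):
    """Assemble an OpenAI-compatible chat completion request for the proxy.

    Only the base URL and the key differ from calling the provider
    directly; OpenAI SDKs perform the same construction internally.
    """
    url = base_url.rstrip("/") + "/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {virtual_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

url, headers, body = build_chat_request(
    "http://localhost:4000",        # proxy address (illustrative)
    "sk-virtual-key-placeholder",   # virtual key from /key/generate
    "gpt-4o",                       # logical model name from model_list
    [{"role": "user", "content": "Hello"}],
)
```

With the official OpenAI SDKs, the equivalent is setting `base_url` to the proxy address and `api_key` to the virtual key when constructing the client.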
Step 6: Monitoring and Operations
Monitor proxy health, usage, and costs through built-in endpoints and configured callbacks. The proxy exposes Prometheus metrics, supports alerting integrations, and provides a web-based admin UI for dashboard access. Spend reports, model performance, and user activity can be queried via management endpoints.
Key considerations:
- The `/metrics` endpoint exposes Prometheus-compatible metrics
- The admin UI is served at the root path for visual management
- Spend endpoints provide per-key, per-team, and per-model cost breakdowns
- Health check endpoints monitor model availability and latency
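Output from a Prometheus-style `/metrics` endpoint uses the plain-text exposition format. A minimal parsing sketch; the metric name and labels below are hypothetical examples, not LiteLLM's actual metric names:

```python
import re

# One sample line in Prometheus exposition format (hypothetical metric).
line = 'proxy_total_requests{model="gpt-4o",team="analytics"} 42'

# Split into metric name, label block, and numeric value.
match = re.match(r'(\w+)\{(.*)\}\s+([\d.]+)', line)
name, raw_labels, value = match.group(1), match.group(2), float(match.group(3))

# Turn the label block into a dict, stripping the quoted values.
labels = dict(pair.split("=", 1) for pair in raw_labels.split(","))
labels = {k: v.strip('"') for k, v in labels.items()}
```

In practice a Prometheus server scrapes `/metrics` directly; manual parsing like this is only useful for ad hoc checks or tests.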