Principle: BerriAI LiteLLM Client Integration
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| BerriAI/litellm repository | API Gateway, Request Processing, Protocol Compatibility | 2026-02-15 |
Overview
Processing OpenAI-compatible client requests through authentication, enrichment, routing, and response formatting in an LLM proxy gateway.
Description
Client integration is the process of accepting HTTP requests from clients that speak the OpenAI API protocol, processing those requests through a multi-stage pipeline, routing them to the appropriate LLM provider, and returning responses in the expected format. The proxy gateway acts as a transparent intermediary that allows any OpenAI SDK client to seamlessly interact with 100+ LLM providers without changing client code.
The request processing pipeline comprises several stages:
- Authentication -- Validating the caller's API key via the Authorization header and resolving the key's permissions, budget, rate limits, and team associations.
- Request enrichment -- Augmenting the request with proxy-specific metadata such as call IDs, logging objects, version information, and user-model overrides.
- Pre-call guardrails -- Running configured guardrails (content safety, PII detection, prompt injection detection) before forwarding the request to the LLM provider.
- Routing -- Selecting the appropriate model deployment from the router based on the requested model name, load balancing strategy, and fallback configuration.
- Provider call -- Forwarding the enriched request to the selected LLM provider via the litellm library, handling both streaming and non-streaming responses.
- Response processing -- Formatting the provider's response into the OpenAI-compatible format, injecting custom headers (cost, latency, model ID), and overriding internal model identifiers with the client-requested model name.
- Post-call hooks -- Executing logging, spend tracking, and callback integrations after the response is generated.
This pipeline enables clients to use standard OpenAI SDKs in any language (Python, JavaScript, Go, etc.) by simply pointing them at the proxy's base URL.
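As a minimal sketch of this protocol transparency, the request below is exactly what a standard OpenAI SDK would emit; only the base URL and key differ. The URL and virtual key values are placeholders, not real endpoints:

```python
# Sketch: the same OpenAI-style chat request works against the proxy by
# changing only the base URL. The URL and key below are hypothetical.
import json

PROXY_BASE_URL = "http://localhost:4000"  # assumed proxy address
VIRTUAL_KEY = "sk-proxy-example"          # assumed proxy-issued virtual key

def build_chat_request(model: str, messages: list) -> dict:
    """Assemble the HTTP request an OpenAI-compatible client would send."""
    return {
        "url": f"{PROXY_BASE_URL}/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {VIRTUAL_KEY}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"model": model, "messages": messages}),
    }

req = build_chat_request("gpt-4o", [{"role": "user", "content": "hi"}])
```

Because the wire format is unchanged, pointing an existing OpenAI client at `PROXY_BASE_URL` is the only integration step required.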
Usage
Use client integration when:
- Deploying an LLM proxy that must accept requests from existing OpenAI SDK clients without modification.
- Building applications that need to switch between LLM providers transparently.
- Implementing a gateway that adds authentication, rate limiting, and budget enforcement on top of raw LLM provider APIs.
- Supporting both streaming and non-streaming response modes across all providers.
- Adding observability (logging, tracing, metrics) to LLM API calls without modifying client code.
Theoretical Basis
Client integration in an LLM proxy follows the pipeline processing pattern, where each request passes through an ordered sequence of processing stages. Each stage can modify the request, short-circuit the pipeline (e.g., for rate limit violations), or enrich the context for subsequent stages.
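The pipeline pattern can be sketched as a list of stages, each of which either short-circuits with an error response or passes an enriched request to the next stage. Stage names and fields here are illustrative, not LiteLLM's actual internals:

```python
# Minimal pipeline sketch: each stage returns (error, request); the first
# non-None error short-circuits the whole pipeline.
import uuid

def authenticate(request):
    # Hypothetical key check standing in for real API-key validation.
    if request.get("api_key") != "sk-valid":
        return {"error": {"status": 401, "message": "Unauthorized"}}, None
    return None, request

def enrich(request):
    # Attach proxy metadata (here, just a call ID) without altering intent.
    return None, dict(request, call_id=str(uuid.uuid4()))

def run_pipeline(request, stages):
    for stage in stages:
        error, request = stage(request)
        if error is not None:
            return error          # short-circuit on the first failing stage
    return {"ok": True, "request": request}

result = run_pipeline({"api_key": "sk-valid", "model": "gpt-4o"},
                      [authenticate, enrich])
```

The same structure accommodates guardrails and rate-limit checks: each is just another stage that can short-circuit.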
FUNCTION process_request(http_request, api_key):
    -- Stage 1: Authentication
    key_auth = AUTHENTICATE(api_key)
    IF NOT key_auth.valid THEN RETURN 401 Unauthorized

    -- Stage 2: Parse and validate request body
    data = PARSE_JSON(http_request.body)
    VALIDATE data matches OpenAI schema for route_type

    -- Stage 3: Pre-call processing
    data = ENRICH_REQUEST(data,
        call_id = GENERATE_UUID(),
        user_api_key_dict = key_auth,
        proxy_config = current_config,
    )
    logging_obj = CREATE_LOGGING_OBJECT(data)

    -- Stage 4: Guardrails
    FOR EACH guardrail IN active_guardrails(key_auth):
        result = AWAIT guardrail.check(data)
        IF result.blocked THEN RETURN 400 Blocked

    -- Stage 5: Routing
    response = ROUTE_REQUEST(
        data = data,
        router = llm_router,
        route_type = determine_route_type(http_request.path)
    )

    -- Stage 6: Response formatting
    headers = BUILD_CUSTOM_HEADERS(
        call_id, model_id, response_cost, latency, key_auth
    )
    OVERRIDE response.model WITH data.requested_model

    -- Stage 7: Post-call hooks
    AWAIT log_success(data, response, key_auth)
    AWAIT update_spend(key_auth, response_cost)

    RETURN formatted_response WITH headers
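The header-building step (Stage 6) can be sketched in Python as below. The header names are illustrative approximations of the proxy's metadata headers, not a guaranteed list:

```python
# Sketch of BUILD_CUSTOM_HEADERS: expose internal metadata via response
# headers without touching the OpenAI-format body. Header names are assumed.
def build_custom_headers(call_id: str, model_id: str,
                         response_cost: float, latency_s: float) -> dict:
    return {
        "x-litellm-call-id": call_id,
        "x-litellm-model-id": model_id,
        "x-litellm-response-cost": f"{response_cost:.6f}",
        "x-litellm-response-duration-ms": str(int(latency_s * 1000)),
    }

headers = build_custom_headers("abc-123", "gpt-4o-deploy-1", 0.00042, 1.3)
```

Putting metadata in headers rather than the body is what keeps the response "fail-open": clients that ignore the headers still receive a fully valid OpenAI-format payload.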
The routing decision tree:
FUNCTION route_request(data, router, route_type):
    IF router IS NOT None AND data.model IN router.model_names THEN
        -- Use the router for load-balanced routing
        RETURN AWAIT router.call(route_type, **data)
    ELSE IF router IS NOT None AND data.model CONTAINS "/" THEN
        -- Direct provider call (e.g., "openai/gpt-4")
        RETURN AWAIT router.call(route_type, **data)
    ELSE IF user_model IS SET THEN
        -- Fallback to user-specified default model
        data.model = user_model
        RETURN AWAIT litellm.call(route_type, **data)
    ELSE
        RETURN AWAIT litellm.call(route_type, **data)
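The decision tree above reduces to a small pure function. The branch labels returned here are stand-ins for the actual dispatch targets (Router methods vs. direct litellm calls):

```python
# Python sketch of the routing decision tree. Return values name which
# path a request would take; the real proxy dispatches to router/litellm.
def choose_route(model: str, router_models: list, user_model: str = None) -> str:
    if router_models and model in router_models:
        return "router"              # load-balanced across deployments
    if router_models and "/" in model:
        return "router-direct"       # explicit provider prefix, e.g. "openai/gpt-4"
    if user_model:
        return "litellm-user-default"  # fall back to the configured default model
    return "litellm-direct"
```

Note the ordering: an exact match in the router's model names wins over the provider-prefix heuristic, so configured deployments always take precedence.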
Key design principles:
- Protocol transparency -- Clients interact with the proxy using the exact same API contract as they would with OpenAI directly. No custom SDKs or protocol extensions are required.
- Enrichment over mutation -- The pipeline adds metadata and context to requests rather than fundamentally altering the client's intent.
- Fail-open observability -- Custom response headers expose internal metadata (call ID, cost, latency, model ID) without breaking the response format.
- Streaming-first design -- The pipeline handles both streaming and non-streaming responses uniformly, detecting errors in the first chunk and converting to appropriate JSON error responses when needed.
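The streaming-first principle above can be sketched by peeking at the first chunk: if it carries an error, return a plain JSON error response instead of a broken stream. The chunk shape and field names here are illustrative:

```python
# Sketch: detect an error in the first chunk of a stream and convert it
# to a JSON error response; otherwise re-attach the peeked chunk.
def respond(chunks):
    it = iter(chunks)
    try:
        first = next(it)
    except StopIteration:
        return {"type": "json", "body": {"error": "empty stream"}}
    if isinstance(first, dict) and "error" in first:
        return {"type": "json", "body": first}   # surface as a JSON error
    def stream():
        yield first                              # re-emit the peeked chunk
        yield from it
    return {"type": "stream", "body": stream()}
```

This keeps error handling uniform: clients see a well-formed JSON error even when they asked for a stream, rather than a stream that dies mid-response.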