Implementation: BerriAI LiteLLM Proxy Request Processing
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| BerriAI/litellm repository | API Gateway, Request Pipeline, LLM Routing | 2026-02-15 |
Overview
The concrete pipeline for processing OpenAI-compatible proxy requests through authentication, enrichment, and routing, provided by the LiteLLM ProxyBaseLLMRequestProcessing class and the route_request function.
Description
The ProxyBaseLLMRequestProcessing class in common_request_processing.py is the central request pipeline handler for all LLM API endpoints in the LiteLLM proxy. It provides:
- common_processing_pre_call_logic -- The pre-call pipeline that enriches request data with proxy metadata, applies litellm data transformations (via add_litellm_data_to_request), creates the logging object, and returns the processed data and logging object ready for the LLM call.
- get_custom_headers -- A static method that constructs custom response headers containing operational metadata: call ID, model ID, API base, response cost (including discount and margin breakdowns), key budget/spend/limits, latency metrics, and remaining rate limit quotas.
- Response handling -- Methods for processing both streaming and non-streaming responses, including error detection in the first streaming chunk and conversion to JSON error responses.
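The pre-call enrichment step can be sketched as a small standalone function. This is an illustrative sketch, not the actual implementation: the field names (litellm_call_id, metadata) and the enrichment steps are assumptions based on the description above.

```python
import uuid
from typing import Any, Dict, Tuple


def pre_call_sketch(
    data: Dict[str, Any],
    user_api_key_dict: Dict[str, Any],
    route_type: str,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Illustrative stand-in for common_processing_pre_call_logic:
    enrich the request body with proxy metadata, then build a
    per-request logging context for the downstream LLM call."""
    # 1. Tag the request with a unique call ID (assumed field name).
    data["litellm_call_id"] = str(uuid.uuid4())

    # 2. Attach caller context so downstream hooks can see who called.
    data.setdefault("metadata", {})["user_api_key"] = user_api_key_dict.get("api_key")
    data["metadata"]["user_api_key_team_id"] = user_api_key_dict.get("team_id")

    # 3. Create a per-request logging object (here just a dict sketch;
    #    the real pipeline builds a LiteLLMLoggingObj).
    logging_obj = {
        "call_id": data["litellm_call_id"],
        "route_type": route_type,
        "model": data.get("model"),
    }
    return data, logging_obj
```

The real method also applies guardrails and config-driven transformations; the sketch only shows the data-flow shape: request body in, enriched body plus logging context out.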
The route_request function (in route_llm_request.py) handles the routing decision after pre-call processing. It determines whether to use the router (for load-balanced multi-deployment routing), call a specific provider directly, or fall back to a default model.
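The routing decision can be sketched as a priority chain. The branch order and conditions below are assumptions inferred from the description (router first, then direct provider, then default model), not a copy of route_request:

```python
from typing import Any, Dict, List, Optional


def route_decision_sketch(
    data: Dict[str, Any],
    router_model_names: Optional[List[str]],
    user_model: Optional[str],
) -> str:
    """Illustrative stand-in for route_request's branching:
    returns which path a request would take."""
    model = data.get("model")
    # 1. Router path: the model is registered with the load-balanced router.
    if router_model_names and model in router_model_names:
        return "router"
    # 2. Direct path: a provider-prefixed model (e.g. "openai/gpt-4")
    #    is called against that provider without load balancing.
    if model and "/" in model:
        return "direct_provider"
    # 3. Fallback: a default model configured on the server.
    if user_model:
        return "default_model"
    raise ValueError(f"No route available for model: {model!r}")
```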
Usage
These components are used internally by all LLM API endpoint handlers in the proxy (e.g., /chat/completions, /embeddings, /responses). They are not typically called directly by external code but are central to understanding how the proxy processes every LLM request.
Code Reference
| Attribute | Value |
|---|---|
| Source Location (class) | litellm/proxy/common_request_processing.py, line 353 |
| Source Location (routing) | litellm/proxy/route_llm_request.py, line 132 |
| Pre-call Signature | async def common_processing_pre_call_logic(self, request: Request, general_settings: dict, user_api_key_dict: UserAPIKeyAuth, proxy_logging_obj: ProxyLogging, proxy_config: ProxyConfig, route_type: Literal[...], version: Optional[str] = None, user_model: Optional[str] = None, user_temperature: Optional[float] = None, user_request_timeout: Optional[float] = None, user_max_tokens: Optional[int] = None, user_api_base: Optional[str] = None, model: Optional[str] = None, llm_router: Optional[Router] = None) -> Tuple[dict, LiteLLMLoggingObj] |
| Routing Signature | async def route_request(data: dict, llm_router: Optional[LitellmRouter], user_model: Optional[str], route_type: Literal[...]) -> Any |
| Import (class) | from litellm.proxy.common_request_processing import ProxyBaseLLMRequestProcessing |
| Import (routing) | from litellm.proxy.route_llm_request import route_request |
I/O Contract
Inputs (common_processing_pre_call_logic)
| Parameter | Type | Description |
|---|---|---|
| request | fastapi.Request | The incoming HTTP request object. |
| general_settings | dict | Server-level settings from the proxy configuration. |
| user_api_key_dict | UserAPIKeyAuth | Authenticated caller context containing key permissions, budget, team, and role information. |
| proxy_logging_obj | ProxyLogging | The proxy-level logging manager for registering callbacks and hooks. |
| proxy_config | ProxyConfig | The proxy configuration instance for accessing config state. |
| route_type | Literal[...] | The type of LLM operation being performed (e.g., "acompletion", "aembedding", "aresponses"). |
| model | Optional[str] | The model name extracted from the request body. |
| llm_router | Optional[Router] | The router instance for load-balanced model routing. |
Outputs (common_processing_pre_call_logic)
| Return Element | Type | Description |
|---|---|---|
| data | dict | The enriched request data with proxy metadata, logging object, and resolved parameters. |
| logging_obj | LiteLLMLoggingObj | The logging object configured for this specific request, used for tracking latency, cost, and callbacks. |
Inputs (route_request)
| Parameter | Type | Description |
|---|---|---|
| data | dict | The processed request data from pre-call logic. |
| llm_router | Optional[LitellmRouter] | The router instance, or None for direct provider calls. |
| user_model | Optional[str] | Default model override from server configuration. |
| route_type | Literal[...] | The type of LLM operation to route. |
Outputs (route_request)
| Return Element | Type | Description |
|---|---|---|
| response | Union[ModelResponse, AsyncGenerator] | The LLM response, either as a complete response object or an async generator for streaming. |
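For streaming routes, the first chunk of the async generator can carry an upstream error, which the response-handling methods convert to a JSON error response. A minimal sketch of that peek-then-replay pattern, using a hypothetical "error" key for the chunk shape:

```python
from typing import Any, AsyncGenerator, Dict, Tuple


async def check_first_chunk(
    stream: AsyncGenerator[Dict[str, Any], None],
) -> Tuple[Dict[str, Any], AsyncGenerator[Dict[str, Any], None]]:
    """Peek at the first streamed chunk; if it carries an error
    (hypothetical "error" key), surface it before streaming begins."""
    first = await stream.__anext__()
    if "error" in first:
        # The real handler would build a JSON error response here.
        raise RuntimeError(f"upstream error: {first['error']}")

    async def replay() -> AsyncGenerator[Dict[str, Any], None]:
        # Re-emit the inspected chunk, then the rest of the stream.
        yield first
        async for chunk in stream:
            yield chunk

    return first, replay()
```

The design point is that the error must be detected before any bytes are streamed to the client, since an HTTP status code cannot be changed once the streaming response has started.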
Usage Examples
Typical usage within a proxy endpoint handler (internal):
```python
from litellm.proxy.common_request_processing import ProxyBaseLLMRequestProcessing
from litellm.proxy.route_llm_request import route_request

# Inside a FastAPI endpoint handler like /chat/completions
processing = ProxyBaseLLMRequestProcessing(data=request_data)

# Pre-call processing: authentication enrichment, guardrails, logging setup
data, logging_obj = await processing.common_processing_pre_call_logic(
    request=request,
    general_settings=general_settings,
    user_api_key_dict=user_api_key_dict,
    proxy_logging_obj=proxy_logging_obj,
    proxy_config=proxy_config,
    route_type="acompletion",
    model=data.get("model"),
    llm_router=llm_router,
)

# Route the request to the appropriate provider
response = await route_request(
    data=data,
    llm_router=llm_router,
    user_model=user_model,
    route_type="acompletion",
)

# Build custom response headers
custom_headers = ProxyBaseLLMRequestProcessing.get_custom_headers(
    user_api_key_dict=user_api_key_dict,
    call_id=data.get("litellm_call_id"),
    model_id=data.get("model_id"),
    response_cost="0.0032",
    version="1.0.0",
    request_data=data,
)
```
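The shape of the resulting header map can be sketched as follows. The x-litellm-* header names are illustrative assumptions, and the real get_custom_headers emits many more fields (budget, spend, rate limits):

```python
from typing import Dict, Optional


def build_headers_sketch(
    call_id: Optional[str],
    model_id: Optional[str],
    response_cost: Optional[str],
    version: Optional[str],
) -> Dict[str, str]:
    """Illustrative stand-in for get_custom_headers: collect operational
    metadata into response headers, dropping any unset values."""
    candidates = {
        "x-litellm-call-id": call_id,  # header names are assumed
        "x-litellm-model-id": model_id,
        "x-litellm-response-cost": response_cost,
        "x-litellm-version": version,
    }
    return {k: str(v) for k, v in candidates.items() if v is not None}
```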
Client-side usage showing protocol transparency:
```python
from openai import OpenAI

# The client does not need to know about the proxy pipeline
client = OpenAI(
    api_key="sk-my-proxy-key",
    base_url="http://localhost:4000"
)

# Standard OpenAI API call, processed by ProxyBaseLLMRequestProcessing
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    stream=True
)
for chunk in response:
    # delta.content can be None on the final chunk, so guard the print
    print(chunk.choices[0].delta.content or "", end="")
```