Implementation: BerriAI LiteLLM Proxy Request Processing
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| BerriAI/litellm repository | API Gateway, Request Pipeline, LLM Routing | 2026-02-15 |
Overview
The concrete pipeline for processing OpenAI-compatible proxy requests through authentication, enrichment, and routing, provided by the LiteLLM ProxyBaseLLMRequestProcessing class and the route_request function.
Description
The ProxyBaseLLMRequestProcessing class in common_request_processing.py is the central request pipeline handler for all LLM API endpoints in the LiteLLM proxy. It provides:
- common_processing_pre_call_logic -- The pre-call pipeline that enriches request data with proxy metadata, applies litellm data transformations (via add_litellm_data_to_request), creates the logging object, and returns the processed data and logging object ready for the LLM call.
- get_custom_headers -- A static method that constructs custom response headers containing operational metadata: call ID, model ID, API base, response cost (including discount and margin breakdowns), key budget/spend/limits, latency metrics, and remaining rate limit quotas.
- Response handling -- Methods for processing both streaming and non-streaming responses, including error detection in the first streaming chunk and conversion to JSON error responses.
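The pre-call enrichment step can be sketched as a small standalone function. This is an illustrative sketch, not the actual implementation: the field names (litellm_call_id, metadata) and the enrichment steps are assumptions based on the description above.

```python
import uuid
from typing import Any, Dict, Tuple


def pre_call_sketch(
    data: Dict[str, Any],
    user_api_key_dict: Dict[str, Any],
    route_type: str,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Illustrative stand-in for common_processing_pre_call_logic:
    enrich the request body with proxy metadata, then build a
    per-request logging context for the downstream LLM call."""
    # 1. Tag the request with a unique call ID (assumed field name).
    data["litellm_call_id"] = str(uuid.uuid4())

    # 2. Attach caller context so downstream hooks can see who called.
    data.setdefault("metadata", {})["user_api_key"] = user_api_key_dict.get("api_key")
    data["metadata"]["user_api_key_team_id"] = user_api_key_dict.get("team_id")

    # 3. Create a per-request logging object (here just a dict sketch;
    #    the real pipeline builds a LiteLLMLoggingObj).
    logging_obj = {
        "call_id": data["litellm_call_id"],
        "route_type": route_type,
        "model": data.get("model"),
    }
    return data, logging_obj
```

The real method also applies guardrails and config-driven transformations; the sketch only shows the data-flow shape: request body in, enriched body plus logging context out.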
The route_request function (in route_llm_request.py) handles the routing decision after pre-call processing. It determines whether to use the router (for load-balanced multi-deployment routing), call a specific provider directly, or fall back to a default model.
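The routing decision can be sketched as a priority chain. The branch order and conditions below are assumptions inferred from the description (router first, then direct provider, then default model), not a copy of route_request:

```python
from typing import Any, Dict, List, Optional


def route_decision_sketch(
    data: Dict[str, Any],
    router_model_names: Optional[List[str]],
    user_model: Optional[str],
) -> str:
    """Illustrative stand-in for route_request's branching:
    returns which path a request would take."""
    model = data.get("model")
    # 1. Router path: the model is registered with the load-balanced router.
    if router_model_names and model in router_model_names:
        return "router"
    # 2. Direct path: a provider-prefixed model (e.g. "openai/gpt-4")
    #    is called against that provider without load balancing.
    if model and "/" in model:
        return "direct_provider"
    # 3. Fallback: a default model configured on the server.
    if user_model:
        return "default_model"
    raise ValueError(f"No route available for model: {model!r}")
```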
Usage
These components are used internally by all LLM API endpoint handlers in the proxy (e.g., /chat/completions, /embeddings, /responses). They are not typically called directly by external code but are central to understanding how the proxy processes every LLM request.
Code Reference
| Attribute | Value |
|---|---|
| Source Location (class) | litellm/proxy/common_request_processing.py, line 353 |
| Source Location (routing) | litellm/proxy/route_llm_request.py, line 132 |
| Pre-call Signature | async def common_processing_pre_call_logic(self, request: Request, general_settings: dict, user_api_key_dict: UserAPIKeyAuth, proxy_logging_obj: ProxyLogging, proxy_config: ProxyConfig, route_type: Literal[...], version: Optional[str] = None, user_model: Optional[str] = None, user_temperature: Optional[float] = None, user_request_timeout: Optional[float] = None, user_max_tokens: Optional[int] = None, user_api_base: Optional[str] = None, model: Optional[str] = None, llm_router: Optional[Router] = None) -> Tuple[dict, LiteLLMLoggingObj] |
| Routing Signature | async def route_request(data: dict, llm_router: Optional[LitellmRouter], user_model: Optional[str], route_type: Literal[...]) -> Any |
| Import (class) | from litellm.proxy.common_request_processing import ProxyBaseLLMRequestProcessing |
| Import (routing) | from litellm.proxy.route_llm_request import route_request |
I/O Contract
Inputs (common_processing_pre_call_logic)
| Parameter | Type | Description |
|---|---|---|
| request | fastapi.Request | The incoming HTTP request object. |
| general_settings | dict | Server-level settings from the proxy configuration. |
| user_api_key_dict | UserAPIKeyAuth | Authenticated caller context containing key permissions, budget, team, and role information. |
| proxy_logging_obj | ProxyLogging | The proxy-level logging manager for registering callbacks and hooks. |
| proxy_config | ProxyConfig | The proxy configuration instance for accessing config state. |
| route_type | Literal[...] | The type of LLM operation being performed (e.g., "acompletion", "aembedding", "aresponses"). |
| model | Optional[str] | The model name extracted from the request body. |
| llm_router | Optional[Router] | The router instance for load-balanced model routing. |
Outputs (common_processing_pre_call_logic)
| Return Element | Type | Description |
|---|---|---|
| data | dict | The enriched request data with proxy metadata, logging object, and resolved parameters. |
| logging_obj | LiteLLMLoggingObj | The logging object configured for this specific request, used for tracking latency, cost, and callbacks. |
Inputs (route_request)
| Parameter | Type | Description |
|---|---|---|
| data | dict | The processed request data from pre-call logic. |
| llm_router | Optional[LitellmRouter] | The router instance, or None for direct provider calls. |
| user_model | Optional[str] | Default model override from server configuration. |
| route_type | Literal[...] | The type of LLM operation to route. |
Outputs (route_request)
| Return Element | Type | Description |
|---|---|---|
| response | Union[ModelResponse, AsyncGenerator] | The LLM response, either as a complete response object or an async generator for streaming. |
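For streaming routes, the first chunk of the async generator can carry an upstream error, which the response-handling methods convert to a JSON error response. A minimal sketch of that peek-then-replay pattern, using a hypothetical "error" key for the chunk shape:

```python
from typing import Any, AsyncGenerator, Dict, Tuple


async def check_first_chunk(
    stream: AsyncGenerator[Dict[str, Any], None],
) -> Tuple[Dict[str, Any], AsyncGenerator[Dict[str, Any], None]]:
    """Peek at the first streamed chunk; if it carries an error
    (hypothetical "error" key), surface it before streaming begins."""
    first = await stream.__anext__()
    if "error" in first:
        # The real handler would build a JSON error response here.
        raise RuntimeError(f"upstream error: {first['error']}")

    async def replay() -> AsyncGenerator[Dict[str, Any], None]:
        # Re-emit the inspected chunk, then the rest of the stream.
        yield first
        async for chunk in stream:
            yield chunk

    return first, replay()
```

The design point is that the error must be detected before any bytes are streamed to the client, since an HTTP status code cannot be changed once the streaming response has started.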
Usage Examples
Typical usage within a proxy endpoint handler (internal):
```python
from litellm.proxy.common_request_processing import ProxyBaseLLMRequestProcessing
from litellm.proxy.route_llm_request import route_request

# Inside a FastAPI endpoint handler like /chat/completions
processing = ProxyBaseLLMRequestProcessing(data=request_data)

# Pre-call processing: authentication enrichment, guardrails, logging setup
data, logging_obj = await processing.common_processing_pre_call_logic(
    request=request,
    general_settings=general_settings,
    user_api_key_dict=user_api_key_dict,
    proxy_logging_obj=proxy_logging_obj,
    proxy_config=proxy_config,
    route_type="acompletion",
    model=data.get("model"),
    llm_router=llm_router,
)

# Route the request to the appropriate provider
response = await route_request(
    data=data,
    llm_router=llm_router,
    user_model=user_model,
    route_type="acompletion",
)

# Build custom response headers
custom_headers = ProxyBaseLLMRequestProcessing.get_custom_headers(
    user_api_key_dict=user_api_key_dict,
    call_id=data.get("litellm_call_id"),
    model_id=data.get("model_id"),
    response_cost="0.0032",
    version="1.0.0",
    request_data=data,
)
```
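The shape of the resulting header map can be sketched as follows. The x-litellm-* header names are illustrative assumptions, and the real get_custom_headers emits many more fields (budget, spend, rate limits):

```python
from typing import Dict, Optional


def build_headers_sketch(
    call_id: Optional[str],
    model_id: Optional[str],
    response_cost: Optional[str],
    version: Optional[str],
) -> Dict[str, str]:
    """Illustrative stand-in for get_custom_headers: collect operational
    metadata into response headers, dropping any unset values."""
    candidates = {
        "x-litellm-call-id": call_id,  # header names are assumed
        "x-litellm-model-id": model_id,
        "x-litellm-response-cost": response_cost,
        "x-litellm-version": version,
    }
    return {k: str(v) for k, v in candidates.items() if v is not None}
```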
Client-side usage showing protocol transparency:
```python
from openai import OpenAI

# The client does not need to know about the proxy pipeline
client = OpenAI(
    api_key="sk-my-proxy-key",
    base_url="http://localhost:4000"
)

# Standard OpenAI API call, processed by ProxyBaseLLMRequestProcessing
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    stream=True
)
for chunk in response:
    # delta.content can be None on the final chunk, so guard the print
    print(chunk.choices[0].delta.content or "", end="")
```