Implementation:BerriAI Litellm Proxy Request Processing

From Leeroopedia
Knowledge Sources: BerriAI/litellm repository
Domains: API Gateway, Request Pipeline, LLM Routing
Last Updated: 2026-02-15

Overview

A concrete tool for processing OpenAI-compatible proxy requests through authentication, enrichment, and routing, provided by the LiteLLM ProxyBaseLLMRequestProcessing class and the route_request function.

Description

The ProxyBaseLLMRequestProcessing class in common_request_processing.py is the central request pipeline handler for all LLM API endpoints in the LiteLLM proxy. It provides:

  • common_processing_pre_call_logic -- The pre-call pipeline that enriches request data with proxy metadata, applies litellm data transformations (via add_litellm_data_to_request), creates the logging object, and returns the processed data and logging object ready for the LLM call.
  • get_custom_headers -- A static method that constructs custom response headers containing operational metadata: call ID, model ID, API base, response cost (including discount and margin breakdowns), key budget/spend/limits, latency metrics, and remaining rate limit quotas.
  • Response handling -- Methods for processing both streaming and non-streaming responses, including error detection in the first streaming chunk and conversion to JSON error responses.
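
The first-chunk error-detection step can be pictured as follows. This is a minimal sketch of the pattern, assuming dict-shaped chunks and a hypothetical helper name; it is not LiteLLM's actual implementation:

import json

from fastapi.responses import JSONResponse, StreamingResponse

async def stream_or_error_sketch(generator):
    # Hypothetical helper: peek at the first chunk; if it carries an error
    # payload, abandon streaming and return a JSON error response instead.
    first_chunk = await generator.__anext__()
    if isinstance(first_chunk, dict) and "error" in first_chunk:
        return JSONResponse(status_code=500, content=first_chunk)

    async def replay():
        # Re-emit the peeked chunk, then pass the rest of the stream through.
        yield f"data: {json.dumps(first_chunk)}\n\n"
        async for chunk in generator:
            yield f"data: {json.dumps(chunk)}\n\n"

    return StreamingResponse(replay(), media_type="text/event-stream")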

The route_request function (in route_llm_request.py) handles the routing decision after pre-call processing. It determines whether to use the router (for load-balanced multi-deployment routing), call a specific provider directly, or fall back to a default model.
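
Its branching can be summarized roughly as follows. This is a simplified sketch of the decision order, assuming the Router exposes model_names and async methods named after route_type (as litellm's Router and SDK do for, e.g., acompletion); the real function handles more cases:

from typing import Optional

import litellm
from litellm import Router

async def route_request_sketch(
    data: dict,
    llm_router: Optional[Router],
    user_model: Optional[str],
    route_type: str,
):
    # 1. Router path: the requested model is one of the router's deployments
    if llm_router is not None and data.get("model") in llm_router.model_names:
        return await getattr(llm_router, route_type)(**data)
    # 2. Default-model fallback from server configuration
    if user_model is not None and data.get("model") is None:
        data["model"] = user_model
    # 3. Direct provider call through the litellm SDK (e.g. litellm.acompletion)
    return await getattr(litellm, route_type)(**data)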

Usage

These components are used internally by all LLM API endpoint handlers in the proxy (e.g., /chat/completions, /embeddings, /responses). They are not typically called directly by external code but are central to understanding how the proxy processes every LLM request.

Code Reference

Source Location (class): litellm/proxy/common_request_processing.py, line 353
Source Location (routing): litellm/proxy/route_llm_request.py, line 132
Pre-call Signature: async def common_processing_pre_call_logic(self, request: Request, general_settings: dict, user_api_key_dict: UserAPIKeyAuth, proxy_logging_obj: ProxyLogging, proxy_config: ProxyConfig, route_type: Literal[...], version: Optional[str] = None, user_model: Optional[str] = None, user_temperature: Optional[float] = None, user_request_timeout: Optional[float] = None, user_max_tokens: Optional[int] = None, user_api_base: Optional[str] = None, model: Optional[str] = None, llm_router: Optional[Router] = None) -> Tuple[dict, LiteLLMLoggingObj]
Routing Signature: async def route_request(data: dict, llm_router: Optional[LitellmRouter], user_model: Optional[str], route_type: Literal[...]) -> Any
Import (class): from litellm.proxy.common_request_processing import ProxyBaseLLMRequestProcessing
Import (routing): from litellm.proxy.route_llm_request import route_request

I/O Contract

Inputs (common_processing_pre_call_logic)

  • request (fastapi.Request) -- The incoming HTTP request object.
  • general_settings (dict) -- Server-level settings from the proxy configuration.
  • user_api_key_dict (UserAPIKeyAuth) -- Authenticated caller context containing key permissions, budget, team, and role information.
  • proxy_logging_obj (ProxyLogging) -- The proxy-level logging manager for registering callbacks and hooks.
  • proxy_config (ProxyConfig) -- The proxy configuration instance for accessing config state.
  • route_type (Literal[...]) -- The type of LLM operation being performed (e.g., "acompletion", "aembedding", "aresponses").
  • model (Optional[str]) -- The model name extracted from the request body.
  • llm_router (Optional[Router]) -- The router instance for load-balanced model routing.

Outputs (common_processing_pre_call_logic)

  • data (dict) -- The enriched request data with proxy metadata, logging object, and resolved parameters.
  • logging_obj (LiteLLMLoggingObj) -- The logging object configured for this specific request, used for tracking latency, cost, and callbacks.

Inputs (route_request)

  • data (dict) -- The processed request data from pre-call logic.
  • llm_router (Optional[LitellmRouter]) -- The router instance, or None for direct provider calls.
  • user_model (Optional[str]) -- Default model override from server configuration.
  • route_type (Literal[...]) -- The type of LLM operation to route.

Outputs (route_request)

  • response (Union[ModelResponse, AsyncGenerator]) -- The LLM response, either as a complete response object or an async generator for streaming.
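
Because the return type is a union, callers branch on the result's shape before building the HTTP response. A minimal sketch, assuming a check for __anext__ is enough to detect a stream (the proxy's actual dispatch is more involved):

# Sketch: distinguish a streaming result from a complete response.
result = await route_request(
    data=data,
    llm_router=llm_router,
    user_model=None,
    route_type="acompletion",
)
if hasattr(result, "__anext__"):
    # Streaming: consume chunks as they arrive (SSE serialization omitted)
    async for chunk in result:
        print(chunk)
else:
    # Non-streaming: a complete ModelResponse object
    print(result)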

Usage Examples

Typical usage within a proxy endpoint handler (internal):

from litellm.proxy.common_request_processing import ProxyBaseLLMRequestProcessing
from litellm.proxy.route_llm_request import route_request

# Inside a FastAPI endpoint handler like /chat/completions
processing = ProxyBaseLLMRequestProcessing(data=request_data)

# Pre-call processing: authentication enrichment, guardrails, logging setup
data, logging_obj = await processing.common_processing_pre_call_logic(
    request=request,
    general_settings=general_settings,
    user_api_key_dict=user_api_key_dict,
    proxy_logging_obj=proxy_logging_obj,
    proxy_config=proxy_config,
    route_type="acompletion",
    model=request_data.get("model"),
    llm_router=llm_router,
)

# Route the request to the appropriate provider
response = await route_request(
    data=data,
    llm_router=llm_router,
    user_model=user_model,
    route_type="acompletion",
)

# Build custom response headers
custom_headers = ProxyBaseLLMRequestProcessing.get_custom_headers(
    user_api_key_dict=user_api_key_dict,
    call_id=data.get("litellm_call_id"),
    model_id=data.get("model_id"),
    response_cost="0.0032",
    version="1.0.0",
    request_data=data,
)
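
The headers returned by get_custom_headers are then attached to the outgoing FastAPI response. A sketch for the non-streaming case, assuming the response object supports pydantic's model_dump():

from fastapi.responses import JSONResponse

# Still inside the endpoint handler: surface the operational metadata to the
# client by attaching the headers to the HTTP response (non-streaming case)
json_response = JSONResponse(
    content=response.model_dump(),
    headers=custom_headers,
)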

Client-side usage showing protocol transparency:

from openai import OpenAI

# The client does not need to know about the proxy pipeline
client = OpenAI(
    api_key="sk-my-proxy-key",
    base_url="http://localhost:4000"
)

# Standard OpenAI API call, processed by ProxyBaseLLMRequestProcessing
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    stream=True
)

for chunk in response:
    # Guard against empty choices and None deltas in streaming chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
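
Clients that want the operational metadata emitted by get_custom_headers can read it from the raw HTTP response. A sketch using the OpenAI SDK's with_raw_response wrapper; the x-litellm-* header names follow LiteLLM's convention but are shown as an illustration and should be verified against your proxy version:

# Non-streaming call via the raw-response wrapper to expose HTTP headers
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
print(raw.headers.get("x-litellm-call-id"))        # call ID
print(raw.headers.get("x-litellm-response-cost"))  # computed response cost
completion = raw.parse()  # the usual ChatCompletion object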
