Principle: BerriAI LiteLLM Response Normalization
| Knowledge Sources | BerriAI/litellm repository |
|---|---|
| Domains | LLM Integration, Response Processing, Data Transformation |
| Last Updated | 2026-02-15 |
Overview
Response normalization is the practice of transforming diverse provider-specific response formats into a single, canonical structure so that downstream consumers operate on a uniform data contract.
Description
Every LLM provider returns completion results in its own format: different field names, nesting structures, token-counting conventions, and metadata layouts. Response normalization solves this problem by defining a canonical response schema -- modeled after the OpenAI Chat Completions response -- and mapping every provider's output into that schema before returning it to the caller.
This principle ensures that application code processing LLM responses never needs to contain provider-specific branching logic. Whether the response originated from OpenAI, Anthropic, Cohere, Bedrock, or any other provider, the consumer sees the same fields: id, choices (with message, finish_reason, index), usage (with prompt_tokens, completion_tokens, total_tokens), model, and created.
For streaming responses, the same normalization applies chunk-by-chunk, yielding a uniform stream of delta objects regardless of the provider's native streaming protocol.
Usage
Apply response normalization whenever:
- Downstream code processes completion results from multiple providers.
- Usage tracking or billing logic depends on a consistent usage structure.
- Streaming responses from different providers must be consumed with the same iterator protocol.
- Response metadata (model name, timestamps, system fingerprint) must be available in a uniform format.
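The payoff of the list above is provider-agnostic consumer code. The sketch below assumes a response already normalized into the canonical OpenAI-style shape; the helper name and the sample payload are illustrative, not part of the litellm API.

```python
def extract_text_and_usage(response: dict) -> tuple:
    """Read the assistant text and token total from a canonical response.

    Because every provider's output is mapped into the same schema first,
    this function never branches on which provider produced the response.
    """
    text = response["choices"][0]["message"]["content"]
    total = response["usage"]["total_tokens"]
    return text, total


# A canonical-shaped response, regardless of originating provider.
normalized = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!", "tool_calls": None},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7},
}

print(extract_text_and_usage(normalized))  # ('Hello!', 7)
```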
Theoretical Basis
Response normalization implements the Adapter Pattern and the Canonical Data Model pattern from enterprise integration architecture. The core principles are:
1. Canonical Response Schema
A fixed output schema defines exactly which fields exist, their types, and their semantics. Every provider response is mapped into this schema.
# Pseudocode: canonical response structure
CanonicalResponse:
    id: string                  # unique completion identifier
    object: "chat.completion"   # fixed type discriminator
    created: integer            # Unix timestamp
    model: string               # model identifier
    choices: List[Choice]
    usage: Usage

Choice:
    index: integer
    message: Message
    finish_reason: string       # "stop", "length", "tool_calls", etc.

Message:
    role: "assistant"
    content: string or null
    tool_calls: List[ToolCall] or null

Usage:
    prompt_tokens: integer
    completion_tokens: integer
    total_tokens: integer
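The schema above can be expressed concretely as Python dataclasses. This is a sketch of the canonical shape only; the class names mirror the pseudocode and are not litellm's actual response types.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0


@dataclass
class Message:
    role: str = "assistant"
    content: Optional[str] = None
    tool_calls: Optional[list] = None


@dataclass
class Choice:
    index: int = 0
    message: Message = field(default_factory=Message)
    finish_reason: str = "stop"


@dataclass
class CanonicalResponse:
    id: str = ""
    object: str = "chat.completion"  # fixed type discriminator
    created: int = 0                 # Unix timestamp
    model: str = ""
    choices: List[Choice] = field(default_factory=list)
    usage: Usage = field(default_factory=Usage)
```

Defaulting every field means a partially populated provider response still yields a structurally complete object, which is the foundation for the defensive construction described later.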
2. Stream Chunk Normalization
For streaming, each chunk follows a parallel schema with delta objects instead of complete message objects. The stream wrapper handles provider-specific streaming protocols (SSE, chunked HTTP, WebSocket) and emits uniform chunks.
# Pseudocode: stream normalization
function normalize_stream(provider_stream):
    for raw_chunk in provider_stream:
        normalized = map_to_canonical_chunk(raw_chunk)
        yield normalized
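A runnable sketch of the pattern above: the raw chunk shape here (with "txt" and "done" keys) is a made-up provider format chosen for illustration, and map_to_canonical_chunk shows only the mapping step; real adapters differ per provider protocol.

```python
def map_to_canonical_chunk(raw: dict) -> dict:
    """Map a hypothetical provider chunk into an OpenAI-style delta chunk."""
    return {
        "object": "chat.completion.chunk",
        "choices": [{
            "index": 0,
            "delta": {"content": raw.get("txt")},
            "finish_reason": "stop" if raw.get("done") else None,
        }],
    }


def normalize_stream(provider_stream):
    """Yield canonical chunks one-by-one, regardless of the native protocol."""
    for raw_chunk in provider_stream:
        yield map_to_canonical_chunk(raw_chunk)


raw = [{"txt": "Hel"}, {"txt": "lo"}, {"txt": None, "done": True}]
chunks = list(normalize_stream(raw))
text = "".join(c["choices"][0]["delta"]["content"] or "" for c in chunks)
print(text)  # Hello
```

The consumer iterates canonical chunks with a single loop; whether the provider speaks SSE, chunked HTTP, or WebSocket is hidden inside the stream wrapper.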
3. Defensive Construction
The response object handles missing or unexpected fields gracefully by providing defaults (e.g., auto-generating an id if the provider does not supply one, defaulting created to the current timestamp, initializing usage to zeros).
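The defaults described above can be sketched as a small builder function. The function name and return shape are hypothetical, shown only to make the defensive behavior concrete.

```python
import time
import uuid


def build_response(raw: dict) -> dict:
    """Construct a response defensively, filling gaps the provider left."""
    return {
        # Auto-generate an id if the provider did not supply one.
        "id": raw.get("id") or f"chatcmpl-{uuid.uuid4().hex}",
        # Default created to the current Unix timestamp.
        "created": raw.get("created") or int(time.time()),
        # Initialize usage to zeros when token counts are missing.
        "usage": raw.get("usage") or {
            "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0,
        },
    }


resp = build_response({})  # provider supplied nothing
print(resp["id"].startswith("chatcmpl-"))  # True
print(resp["usage"]["total_tokens"])       # 0
```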
4. Bidirectional Serialization
The normalized response supports both object-attribute access and dictionary-style access, and can serialize to JSON via model_dump() or json(). This dual-access pattern makes the response usable in both typed and dynamic contexts.
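The dual-access pattern can be sketched with a small dict subclass. This is a minimal illustration of the idea, not litellm's actual response class, whose types are richer.

```python
import json


class DualAccess(dict):
    """A dict that also supports attribute access and JSON serialization."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so methods
        # defined on the class still take precedence over keys.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def model_dump(self) -> dict:
        return dict(self)

    def json(self) -> str:
        return json.dumps(self)


resp = DualAccess(model="gpt-4o", created=1700000000)
print(resp.model)     # attribute-style access: gpt-4o
print(resp["model"])  # dictionary-style access: gpt-4o
print(resp.json())    # JSON string for dynamic contexts
```

Typed code can use attribute access and model_dump(), while dynamic code (logging, callbacks, billing hooks) treats the same object as a plain dict.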