Principle: BerriAI LiteLLM Response Normalization
| Knowledge Sources | BerriAI/litellm repository |
|---|---|
| Domains | LLM Integration, Response Processing, Data Transformation |
| Last Updated | 2026-02-15 |
Overview
Response normalization is the practice of transforming diverse provider-specific response formats into a single, canonical structure so that downstream consumers operate on a uniform data contract.
Description
Every LLM provider returns completion results in its own format: different field names, nesting structures, token-counting conventions, and metadata layouts. Response normalization solves this problem by defining a canonical response schema -- modeled after the OpenAI Chat Completions response -- and mapping every provider's output into that schema before returning it to the caller.
This principle ensures that application code processing LLM responses never needs to contain provider-specific branching logic. Whether the response originated from OpenAI, Anthropic, Cohere, Bedrock, or any other provider, the consumer sees the same fields: id, choices (with message, finish_reason, index), usage (with prompt_tokens, completion_tokens, total_tokens), model, and created.
For streaming responses, the same normalization applies chunk-by-chunk, yielding a uniform stream of delta objects regardless of the provider's native streaming protocol.
Usage
Apply response normalization whenever:
- Downstream code processes completion results from multiple providers.
- Usage tracking or billing logic depends on a consistent usage structure.
- Streaming responses from different providers must be consumed with the same iterator protocol.
- Response metadata (model name, timestamps, system fingerprint) must be available in a uniform format.
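The payoff of the list above is provider-agnostic consumer code. The sketch below assumes a response already normalized into the canonical OpenAI-style shape; the helper name and the sample payload are illustrative, not part of the litellm API.

```python
def extract_text_and_usage(response: dict) -> tuple:
    """Read the assistant text and token total from a canonical response.

    Because every provider's output is mapped into the same schema first,
    this function never branches on which provider produced the response.
    """
    text = response["choices"][0]["message"]["content"]
    total = response["usage"]["total_tokens"]
    return text, total


# A canonical-shaped response, regardless of originating provider.
normalized = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!", "tool_calls": None},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7},
}

print(extract_text_and_usage(normalized))  # ('Hello!', 7)
```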
Theoretical Basis
Response normalization implements the Adapter Pattern and the Canonical Data Model pattern from enterprise integration architecture. The core principles are:
1. Canonical Response Schema
A fixed output schema defines exactly which fields exist, their types, and their semantics. Every provider response is mapped into this schema.
# Pseudocode: canonical response structure
CanonicalResponse:
    id: string                  # unique completion identifier
    object: "chat.completion"   # fixed type discriminator
    created: integer            # Unix timestamp
    model: string               # model identifier
    choices: List[Choice]
    usage: Usage

Choice:
    index: integer
    message: Message
    finish_reason: string       # "stop", "length", "tool_calls", etc.

Message:
    role: "assistant"
    content: string or null
    tool_calls: List[ToolCall] or null

Usage:
    prompt_tokens: integer
    completion_tokens: integer
    total_tokens: integer
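The schema above can be expressed concretely as Python dataclasses. This is a sketch of the canonical shape only; the class names mirror the pseudocode and are not litellm's actual response types.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0


@dataclass
class Message:
    role: str = "assistant"
    content: Optional[str] = None
    tool_calls: Optional[list] = None


@dataclass
class Choice:
    index: int = 0
    message: Message = field(default_factory=Message)
    finish_reason: str = "stop"


@dataclass
class CanonicalResponse:
    id: str = ""
    object: str = "chat.completion"  # fixed type discriminator
    created: int = 0                 # Unix timestamp
    model: str = ""
    choices: List[Choice] = field(default_factory=list)
    usage: Usage = field(default_factory=Usage)
```

Defaulting every field means a partially populated provider response still yields a structurally complete object, which is the foundation for the defensive construction described later.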
2. Stream Chunk Normalization
For streaming, each chunk follows a parallel schema with delta objects instead of complete message objects. The stream wrapper handles provider-specific streaming protocols (SSE, chunked HTTP, WebSocket) and emits uniform chunks.
# Pseudocode: stream normalization
function normalize_stream(provider_stream):
    for raw_chunk in provider_stream:
        normalized = map_to_canonical_chunk(raw_chunk)
        yield normalized
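A runnable sketch of the pattern above: the raw chunk shape here (with "txt" and "done" keys) is a made-up provider format chosen for illustration, and map_to_canonical_chunk shows only the mapping step; real adapters differ per provider protocol.

```python
def map_to_canonical_chunk(raw: dict) -> dict:
    """Map a hypothetical provider chunk into an OpenAI-style delta chunk."""
    return {
        "object": "chat.completion.chunk",
        "choices": [{
            "index": 0,
            "delta": {"content": raw.get("txt")},
            "finish_reason": "stop" if raw.get("done") else None,
        }],
    }


def normalize_stream(provider_stream):
    """Yield canonical chunks one-by-one, regardless of the native protocol."""
    for raw_chunk in provider_stream:
        yield map_to_canonical_chunk(raw_chunk)


raw = [{"txt": "Hel"}, {"txt": "lo"}, {"txt": None, "done": True}]
chunks = list(normalize_stream(raw))
text = "".join(c["choices"][0]["delta"]["content"] or "" for c in chunks)
print(text)  # Hello
```

The consumer iterates canonical chunks with a single loop; whether the provider speaks SSE, chunked HTTP, or WebSocket is hidden inside the stream wrapper.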
3. Defensive Construction
The response object handles missing or unexpected fields gracefully by providing defaults (e.g., auto-generating an id if the provider does not supply one, defaulting created to the current timestamp, initializing usage to zeros).
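The defaults described above can be sketched as a small builder function. The function name and return shape are hypothetical, shown only to make the defensive behavior concrete.

```python
import time
import uuid


def build_response(raw: dict) -> dict:
    """Construct a response defensively, filling gaps the provider left."""
    return {
        # Auto-generate an id if the provider did not supply one.
        "id": raw.get("id") or f"chatcmpl-{uuid.uuid4().hex}",
        # Default created to the current Unix timestamp.
        "created": raw.get("created") or int(time.time()),
        # Initialize usage to zeros when token counts are missing.
        "usage": raw.get("usage") or {
            "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0,
        },
    }


resp = build_response({})  # provider supplied nothing
print(resp["id"].startswith("chatcmpl-"))  # True
print(resp["usage"]["total_tokens"])       # 0
```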
4. Bidirectional Serialization
The normalized response supports both object-attribute access and dictionary-style access, and can serialize to JSON via model_dump() or json(). This dual-access pattern makes the response usable in both typed and dynamic contexts.
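The dual-access pattern can be sketched with a small dict subclass. This is a minimal illustration of the idea, not litellm's actual response class, whose types are richer.

```python
import json


class DualAccess(dict):
    """A dict that also supports attribute access and JSON serialization."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so methods
        # defined on the class still take precedence over keys.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def model_dump(self) -> dict:
        return dict(self)

    def json(self) -> str:
        return json.dumps(self)


resp = DualAccess(model="gpt-4o", created=1700000000)
print(resp.model)     # attribute-style access: gpt-4o
print(resp["model"])  # dictionary-style access: gpt-4o
print(resp.json())    # JSON string for dynamic contexts
```

Typed code can use attribute access and model_dump(), while dynamic code (logging, callbacks, billing hooks) treats the same object as a plain dict.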