Implementation: BerriAI LiteLLM Model Response
| Knowledge Sources | BerriAI/litellm - litellm/types/utils.py, BerriAI/litellm - litellm/litellm_core_utils/streaming_handler.py |
|---|---|
| Domains | LLM Integration, Response Processing, Data Transformation |
| Last Updated | 2026-02-15 |
Overview
Concrete tool for normalizing diverse LLM provider response formats into a unified OpenAI-compatible structure, provided by the litellm Python package via the ModelResponse class in litellm/types/utils.py and the CustomStreamWrapper class in litellm/litellm_core_utils/streaming_handler.py.
Description
The ModelResponse class is a Pydantic model that represents a normalized, non-streaming chat completion response. It inherits from ModelResponseBase (which extends OpenAIObject) and contains fields for id, choices, created, model, object, system_fingerprint, and usage. Its constructor handles flexible input types: choices can be Choices objects, dictionaries, or Pydantic models, and usage can be a Usage object or a dictionary. Missing fields are populated with sensible defaults (auto-generated IDs, current timestamps, empty usage).
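The flexible-input handling described above can be illustrated with a simplified, stand-alone sketch. This is not litellm's actual implementation; the `Choice` dataclass, `normalize_choice`, and `build_response` are hypothetical stand-ins showing the pattern of accepting objects or dictionaries and filling missing fields with defaults:

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Choice:
    # Simplified stand-in for litellm's Choices type
    index: int = 0
    message: dict = field(default_factory=dict)
    finish_reason: str = "stop"

def normalize_choice(raw):
    """Hypothetical helper: accept a Choice object or a plain dict."""
    if isinstance(raw, Choice):
        return raw
    return Choice(**raw)

def build_response(id=None, choices=None, created=None):
    # Missing fields get sensible defaults, mirroring what the
    # ModelResponse constructor does (auto ID, current timestamp)
    return {
        "id": id or f"chatcmpl-{uuid.uuid4().hex}",
        "created": created or int(time.time()),
        "choices": [normalize_choice(c) for c in (choices or [Choice()])],
    }

resp = build_response(choices=[{"index": 0, "message": {"content": "Hi"}}])
print(resp["choices"][0].message)  # {'content': 'Hi'}
```

The real constructor additionally normalizes Pydantic models and usage objects, but the same accept-many-shapes, emit-one-shape idea applies.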
The CustomStreamWrapper class wraps provider-specific streaming responses into a uniform iterable/async-iterable that yields ModelResponseStream chunks. It handles chunk normalization, safety checks (detecting infinite loops of repeated chunks), and usage tracking for streaming responses.
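The wrapping pattern can be sketched in a few lines. This is an illustrative simplification, not litellm's code; the class name, the `MAX_REPEATS` threshold, and the `_normalize` mapping are all hypothetical:

```python
class StreamWrapperSketch:
    """Illustrative-only sketch of wrapping a raw provider stream."""

    MAX_REPEATS = 100  # hypothetical safety threshold

    def __init__(self, completion_stream):
        self.completion_stream = completion_stream
        self._last_chunk = None
        self._repeat_count = 0

    def __iter__(self):
        for raw in self.completion_stream:
            chunk = self._normalize(raw)
            # Safety check: bail out if the provider keeps emitting
            # the exact same chunk (suspected infinite loop)
            if chunk == self._last_chunk:
                self._repeat_count += 1
                if self._repeat_count > self.MAX_REPEATS:
                    raise RuntimeError("repeated chunk loop detected")
            else:
                self._repeat_count = 0
            self._last_chunk = chunk
            yield chunk

    @staticmethod
    def _normalize(raw):
        # Map a provider-specific payload onto an OpenAI-style delta chunk
        return {"choices": [{"delta": {"content": raw}, "finish_reason": None}]}

pieces = list(StreamWrapperSketch(iter(["Hel", "lo"])))
print(pieces[0]["choices"][0]["delta"]["content"])  # "Hel"
```

The real class also supports `async for`, tracks usage, and handles provider-specific chunk shapes per `custom_llm_provider`.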
Usage
ModelResponse is returned by litellm.completion() for non-streaming calls. CustomStreamWrapper is returned for streaming calls. Both are consumed directly by application code.
Code Reference
Source Location:
- ModelResponse: litellm/types/utils.py, lines 1750-1869
- CustomStreamWrapper: litellm/litellm_core_utils/streaming_handler.py, lines 69-2247
Signature (ModelResponse):
```python
class ModelResponse(ModelResponseBase):
    choices: List[Union[Choices, StreamingChoices]]

    def __init__(
        self,
        id=None,
        choices=None,
        created=None,
        model=None,
        object=None,
        system_fingerprint=None,
        usage=None,
        stream=None,
        stream_options=None,
        response_ms=None,
        hidden_params=None,
        _response_headers=None,
        **params,
    ) -> None:
```
Signature (CustomStreamWrapper):
```python
class CustomStreamWrapper:
    def __init__(
        self,
        completion_stream,
        model,
        logging_obj: Any,
        custom_llm_provider: Optional[str] = None,
        stream_options=None,
        make_call: Optional[Callable] = None,
        _response_headers: Optional[dict] = None,
    ):
```
Signature (ModelResponseBase):
```python
class ModelResponseBase(OpenAIObject):
    id: str
    created: int
    model: Optional[str] = None
    object: str
    system_fingerprint: Optional[str] = None
```
Import:
```python
from litellm.types.utils import ModelResponse
from litellm.litellm_core_utils.streaming_handler import CustomStreamWrapper
```
I/O Contract
Inputs (ModelResponse constructor)
| Parameter | Type | Description |
|---|---|---|
| id | Optional[str] | Unique completion identifier. Auto-generated if None. |
| choices | Optional[List[Union[Choices, dict, BaseModel]]] | The list of completion choices. Accepts Choices objects, dictionaries, or Pydantic models. Defaults to [Choices()] if None. |
| created | Optional[int] | Unix timestamp of creation. Defaults to int(time.time()). |
| model | Optional[str] | The model identifier that produced the response. |
| usage | Optional[Union[Usage, dict, BaseModel]] | Token usage information. Accepts Usage objects, dictionaries, or Pydantic models. Defaults to Usage() for non-streaming responses. |
| stream | Optional[bool] | If True, sets object to "chat.completion.chunk" and uses StreamingChoices; otherwise object is "chat.completion" with Choices. |
| hidden_params | Optional[dict] | Internal metadata not exposed in the public response. |
| _response_headers | Optional[dict] | HTTP response headers from the provider. |
Inputs (CustomStreamWrapper constructor)
| Parameter | Type | Description |
|---|---|---|
| completion_stream | Iterable/AsyncIterable | The raw provider streaming response to wrap. |
| model | str | The model identifier. |
| logging_obj | Any | The LiteLLM logging object for request/response tracking. |
| custom_llm_provider | Optional[str] | The resolved provider name. |
| stream_options | Optional[dict] | Options such as {"include_usage": True} for stream-level usage reporting. |
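When {"include_usage": True} is requested, OpenAI-style streams append one final chunk whose choices list is empty and whose usage field is populated. A simplified consumer sketch, using simulated dict chunks in place of live ModelResponseStream objects:

```python
# Simulated chunk sequence; a real stream yields ModelResponseStream objects
chunks = [
    {"choices": [{"delta": {"content": "Hi"}, "finish_reason": None}], "usage": None},
    {"choices": [{"delta": {}, "finish_reason": "stop"}], "usage": None},
    # Final usage-only chunk: empty choices, populated usage
    {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 1, "total_tokens": 6}},
]

text, usage = "", None
for chunk in chunks:
    for choice in chunk["choices"]:
        text += choice["delta"].get("content", "") or ""
    if chunk["usage"] is not None:
        usage = chunk["usage"]

print(text)                   # "Hi"
print(usage["total_tokens"])  # 6
```

Guarding on an empty choices list matters here: indexing chunk.choices[0] unconditionally would fail on the usage-only chunk.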
Outputs
| Output | Type | Description |
|---|---|---|
| ModelResponse instance | ModelResponse | A normalized response exposing .choices[i].message.content, .usage.prompt_tokens, .usage.completion_tokens, .model, .id, and .created. |
| CustomStreamWrapper chunks | ModelResponseStream | Each yielded chunk exposes .choices[i].delta.content, .choices[i].delta.tool_calls, and .choices[i].finish_reason. |
Usage Examples
Accessing a non-streaming response:
```python
import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Uniform access regardless of provider
print(response.id)                            # "chatcmpl-abc123..."
print(response.choices[0].message.content)    # "Hi there!"
print(response.choices[0].finish_reason)      # "stop"
print(response.usage.prompt_tokens)           # 10
print(response.usage.completion_tokens)       # 5
print(response.usage.total_tokens)            # 15
print(response.model)                         # "gpt-4"
```
Consuming a streaming response:
```python
import litellm

stream = litellm.completion(
    model="anthropic/claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")
    if chunk.choices[0].finish_reason:
        print(f"\nFinish reason: {chunk.choices[0].finish_reason}")
```
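Because CustomStreamWrapper is also async-iterable, the same consumption pattern works with `async for`. The sketch below uses a hypothetical `fake_stream` generator of dict chunks in place of a live `litellm.acompletion(..., stream=True)` call, so it runs without network access:

```python
import asyncio

async def fake_stream():
    # Hypothetical stand-in for an async-iterable CustomStreamWrapper
    for piece in ["Dew ", "on grass"]:
        yield {"choices": [{"delta": {"content": piece}, "finish_reason": None}]}

async def consume():
    # Accumulate streamed deltas into the complete message text
    text = ""
    async for chunk in fake_stream():
        text += chunk["choices"][0]["delta"]["content"]
    return text

print(asyncio.run(consume()))  # "Dew on grass"
```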
Serializing the response to a dictionary:
```python
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)

response_dict = response.model_dump()
# {"id": "chatcmpl-...", "choices": [...], "created": 1700000000, ...}
```