
Implementation:BerriAI Litellm Model Response

From Leeroopedia
Knowledge Sources BerriAI/litellm - litellm/types/utils.py, BerriAI/litellm - litellm/litellm_core_utils/streaming_handler.py
Domains LLM Integration, Response Processing, Data Transformation
Last Updated 2026-02-15

Overview

A concrete tool for normalizing diverse LLM provider response formats into a unified, OpenAI-compatible structure. It is provided by the litellm Python package via the ModelResponse class in litellm/types/utils.py and the CustomStreamWrapper class in litellm/litellm_core_utils/streaming_handler.py.

Description

The ModelResponse class is a Pydantic model that represents a normalized, non-streaming chat completion response. It inherits from ModelResponseBase (which extends OpenAIObject) and contains fields for id, choices, created, model, object, system_fingerprint, and usage. Its constructor handles flexible input types: choices can be Choices objects, dictionaries, or Pydantic models, and usage can be a Usage object or a dictionary. Missing fields are populated with sensible defaults (auto-generated IDs, current timestamps, empty usage).
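
The default-filling behavior described above can be illustrated with a simplified, hypothetical sketch. This is not litellm's actual implementation; make_response and its toy dictionaries are stand-ins for what the real constructor does with missing fields:

```python
import time
import uuid


def make_response(id=None, choices=None, created=None, model=None, usage=None):
    """Hypothetical sketch of ModelResponse-style default filling."""
    return {
        # Auto-generate an OpenAI-style id when none is supplied.
        "id": id or f"chatcmpl-{uuid.uuid4().hex}",
        # Fall back to a single empty choice, mirroring [Choices()].
        "choices": choices or [{"message": {"role": "assistant", "content": ""}}],
        # Default to the current Unix timestamp.
        "created": created or int(time.time()),
        "model": model,
        "object": "chat.completion",
        # Empty usage stands in for the default Usage() object.
        "usage": usage or {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }


resp = make_response(model="gpt-4")
```

The real constructor goes further: it coerces dict and Pydantic inputs for choices and usage into their typed counterparts, so callers can pass whichever form they have on hand.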

The CustomStreamWrapper class wraps provider-specific streaming responses into a uniform iterable/async-iterable that yields ModelResponseStream chunks. It handles chunk normalization, safety checks (detecting infinite loops of repeated chunks), and usage tracking for streaming responses.
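
The repeated-chunk safety check can be sketched as follows. This is a hypothetical illustration of the idea only; the real threshold and comparison logic live in streaming_handler.py and may differ:

```python
class RepeatGuard:
    """Hypothetical guard that aborts a stream stuck on identical chunks."""

    def __init__(self, max_repeats=100):
        self.max_repeats = max_repeats
        self.last = None
        self.count = 0

    def check(self, chunk_content):
        # Count consecutive identical, non-empty chunks.
        if chunk_content == self.last and chunk_content:
            self.count += 1
        else:
            self.last = chunk_content
            self.count = 1
        if self.count > self.max_repeats:
            raise RuntimeError("infinite loop detected: same chunk repeated")


guard = RepeatGuard(max_repeats=3)
for piece in ["Hel", "lo", "!", None]:
    guard.check(piece)  # distinct chunks pass the check
```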

Usage

ModelResponse is returned by litellm.completion() for non-streaming calls. CustomStreamWrapper is returned for streaming calls. Both are consumed directly by application code.

Code Reference

Source Location:

  • ModelResponse: litellm/types/utils.py, lines 1750-1869
  • CustomStreamWrapper: litellm/litellm_core_utils/streaming_handler.py, lines 69-2247

Signature (ModelResponse):

class ModelResponse(ModelResponseBase):
    choices: List[Union[Choices, StreamingChoices]]

    def __init__(
        self,
        id=None,
        choices=None,
        created=None,
        model=None,
        object=None,
        system_fingerprint=None,
        usage=None,
        stream=None,
        stream_options=None,
        response_ms=None,
        hidden_params=None,
        _response_headers=None,
        **params,
    ) -> None:

Signature (CustomStreamWrapper):

class CustomStreamWrapper:
    def __init__(
        self,
        completion_stream,
        model,
        logging_obj: Any,
        custom_llm_provider: Optional[str] = None,
        stream_options=None,
        make_call: Optional[Callable] = None,
        _response_headers: Optional[dict] = None,
    ):

Signature (ModelResponseBase):

class ModelResponseBase(OpenAIObject):
    id: str
    created: int
    model: Optional[str] = None
    object: str
    system_fingerprint: Optional[str] = None

Import:

from litellm.types.utils import ModelResponse
from litellm.litellm_core_utils.streaming_handler import CustomStreamWrapper

I/O Contract

Inputs (ModelResponse constructor)

  • id (Optional[str]): Unique completion identifier. Auto-generated if None.
  • choices (Optional[List[Union[Choices, dict, BaseModel]]]): The list of completion choices. Accepts Choices objects, dictionaries, or Pydantic models. Defaults to [Choices()] if None.
  • created (Optional[int]): Unix timestamp of creation. Defaults to int(time.time()).
  • model (Optional[str]): The model identifier that produced the response.
  • usage (Optional[Union[Usage, dict, BaseModel]]): Token usage information. Accepts Usage objects, dictionaries, or Pydantic models. Defaults to Usage() for non-streaming responses.
  • stream (Optional[bool]): If True, sets object to "chat.completion.chunk" and uses StreamingChoices; otherwise object defaults to "chat.completion" with Choices.
  • hidden_params (Optional[dict]): Internal metadata not exposed in the public response.
  • _response_headers (Optional[dict]): HTTP response headers from the provider.
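
The stream flag's effect on the object field can be sketched as a small, hypothetical helper (not litellm code; the field values match the OpenAI response types named above):

```python
def response_object_type(stream=None):
    """Hypothetical sketch: stream=True selects the chunk object type."""
    if stream:
        return "chat.completion.chunk"  # paired with StreamingChoices
    return "chat.completion"            # paired with Choices
```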

Inputs (CustomStreamWrapper constructor)

  • completion_stream (Iterable or AsyncIterable): The raw provider streaming response to wrap.
  • model (str): The model identifier.
  • logging_obj (Any): The LiteLLM logging object for request/response tracking.
  • custom_llm_provider (Optional[str]): The resolved provider name.
  • stream_options (Optional[dict]): Options such as {"include_usage": True} for stream-level usage reporting.
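
The {"include_usage": True} option follows OpenAI's stream_options semantics: a final chunk carrying usage is appended after the content chunks. A hypothetical generator sketch of that shape (stream_with_usage and its one-token-per-chunk counting are illustrative, not litellm's logic):

```python
def stream_with_usage(chunks, stream_options=None):
    """Hypothetical sketch of stream_options={'include_usage': True} handling."""
    include_usage = bool(stream_options and stream_options.get("include_usage"))
    completion_tokens = 0
    for text in chunks:
        completion_tokens += 1  # toy token count: one per chunk
        yield {"choices": [{"delta": {"content": text}}], "usage": None}
    if include_usage:
        # Final chunk carries usage and an empty choices list, OpenAI-style.
        yield {"choices": [], "usage": {"completion_tokens": completion_tokens}}


out = list(stream_with_usage(["Hi", "!"], stream_options={"include_usage": True}))
```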

Outputs

  • ModelResponse instance (ModelResponse): A normalized response exposing .choices[i].message.content, .usage.prompt_tokens, .usage.completion_tokens, .model, .id, and .created.
  • CustomStreamWrapper chunks (ModelResponseStream): Each yielded chunk exposes .choices[i].delta.content, .choices[i].delta.tool_calls, and .choices[i].finish_reason.

Usage Examples

Accessing a non-streaming response:

import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Uniform access regardless of provider
print(response.id)                              # "chatcmpl-abc123..."
print(response.choices[0].message.content)      # "Hi there!"
print(response.choices[0].finish_reason)        # "stop"
print(response.usage.prompt_tokens)             # 10
print(response.usage.completion_tokens)         # 5
print(response.usage.total_tokens)              # 15
print(response.model)                           # "gpt-4"

Consuming a streaming response:

import litellm

stream = litellm.completion(
    model="anthropic/claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")
    if chunk.choices[0].finish_reason:
        print(f"\nFinish reason: {chunk.choices[0].finish_reason}")
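
CustomStreamWrapper is also async-iterable, so the same pattern works with async for. The sketch below uses fake_stream, a stand-in async generator yielding chunk-shaped dictionaries, to stay self-contained; a real call would iterate the wrapper returned by an async streaming completion instead:

```python
import asyncio


async def fake_stream():
    """Stand-in for an async stream yielding delta-style chunks."""
    for text in ["Moon", "light", "..."]:
        yield {"choices": [{"delta": {"content": text}, "finish_reason": None}]}
    # Final chunk: no content, only a finish reason.
    yield {"choices": [{"delta": {"content": None}, "finish_reason": "stop"}]}


async def collect():
    parts = []
    async for chunk in fake_stream():
        delta = chunk["choices"][0]["delta"]
        if delta["content"]:
            parts.append(delta["content"])
    return "".join(parts)


result = asyncio.run(collect())
```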

Serializing the response to a dictionary:

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)

response_dict = response.model_dump()
# {"id": "chatcmpl-...", "choices": [...], "created": 1700000000, ...}
