Implementation: BerriAI LiteLLM Model Response
| Knowledge Sources | BerriAI/litellm - litellm/types/utils.py, BerriAI/litellm - litellm/litellm_core_utils/streaming_handler.py |
|---|---|
| Domains | LLM Integration, Response Processing, Data Transformation |
| Last Updated | 2026-02-15 |
Overview
Concrete tool for normalizing diverse LLM provider response formats into a unified OpenAI-compatible structure, provided by the litellm Python package via the ModelResponse class in litellm/types/utils.py and the CustomStreamWrapper class in litellm/litellm_core_utils/streaming_handler.py.
Description
The ModelResponse class is a Pydantic model that represents a normalized, non-streaming chat completion response. It inherits from ModelResponseBase (which extends OpenAIObject) and contains fields for id, choices, created, model, object, system_fingerprint, and usage. Its constructor handles flexible input types: choices can be Choices objects, dictionaries, or Pydantic models, and usage can be a Usage object or a dictionary. Missing fields are populated with sensible defaults (auto-generated IDs, current timestamps, empty usage).
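The flexible-input handling described above can be illustrated with a simplified, stand-alone sketch. This is not litellm's actual implementation; the `Choice` dataclass, `normalize_choice`, and `build_response` are hypothetical stand-ins showing the pattern of accepting objects or dictionaries and filling missing fields with defaults:

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Choice:
    # Simplified stand-in for litellm's Choices type
    index: int = 0
    message: dict = field(default_factory=dict)
    finish_reason: str = "stop"

def normalize_choice(raw):
    """Hypothetical helper: accept a Choice object or a plain dict."""
    if isinstance(raw, Choice):
        return raw
    return Choice(**raw)

def build_response(id=None, choices=None, created=None):
    # Missing fields get sensible defaults, mirroring what the
    # ModelResponse constructor does (auto ID, current timestamp)
    return {
        "id": id or f"chatcmpl-{uuid.uuid4().hex}",
        "created": created or int(time.time()),
        "choices": [normalize_choice(c) for c in (choices or [Choice()])],
    }

resp = build_response(choices=[{"index": 0, "message": {"content": "Hi"}}])
print(resp["choices"][0].message)  # {'content': 'Hi'}
```

The real constructor additionally normalizes Pydantic models and usage objects, but the same accept-many-shapes, emit-one-shape idea applies.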
The CustomStreamWrapper class wraps provider-specific streaming responses into a uniform iterable/async-iterable that yields ModelResponseStream chunks. It handles chunk normalization, safety checks (detecting infinite loops of repeated chunks), and usage tracking for streaming responses.
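The wrapping pattern can be sketched in a few lines. This is an illustrative simplification, not litellm's code; the class name, the `MAX_REPEATS` threshold, and the `_normalize` mapping are all hypothetical:

```python
class StreamWrapperSketch:
    """Illustrative-only sketch of wrapping a raw provider stream."""

    MAX_REPEATS = 100  # hypothetical safety threshold

    def __init__(self, completion_stream):
        self.completion_stream = completion_stream
        self._last_chunk = None
        self._repeat_count = 0

    def __iter__(self):
        for raw in self.completion_stream:
            chunk = self._normalize(raw)
            # Safety check: bail out if the provider keeps emitting
            # the exact same chunk (suspected infinite loop)
            if chunk == self._last_chunk:
                self._repeat_count += 1
                if self._repeat_count > self.MAX_REPEATS:
                    raise RuntimeError("repeated chunk loop detected")
            else:
                self._repeat_count = 0
            self._last_chunk = chunk
            yield chunk

    @staticmethod
    def _normalize(raw):
        # Map a provider-specific payload onto an OpenAI-style delta chunk
        return {"choices": [{"delta": {"content": raw}, "finish_reason": None}]}

pieces = list(StreamWrapperSketch(iter(["Hel", "lo"])))
print(pieces[0]["choices"][0]["delta"]["content"])  # "Hel"
```

The real class also supports `async for`, tracks usage, and handles provider-specific chunk shapes per `custom_llm_provider`.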
Usage
ModelResponse is returned by litellm.completion() for non-streaming calls. CustomStreamWrapper is returned for streaming calls. Both are consumed directly by application code.
Code Reference
Source Location:
- ModelResponse: litellm/types/utils.py, lines 1750-1869
- CustomStreamWrapper: litellm/litellm_core_utils/streaming_handler.py, lines 69-2247
Signature (ModelResponse):
```python
class ModelResponse(ModelResponseBase):
    choices: List[Union[Choices, StreamingChoices]]

    def __init__(
        self,
        id=None,
        choices=None,
        created=None,
        model=None,
        object=None,
        system_fingerprint=None,
        usage=None,
        stream=None,
        stream_options=None,
        response_ms=None,
        hidden_params=None,
        _response_headers=None,
        **params,
    ) -> None:
```
Signature (CustomStreamWrapper):
```python
class CustomStreamWrapper:
    def __init__(
        self,
        completion_stream,
        model,
        logging_obj: Any,
        custom_llm_provider: Optional[str] = None,
        stream_options=None,
        make_call: Optional[Callable] = None,
        _response_headers: Optional[dict] = None,
    ):
```
Signature (ModelResponseBase):
```python
class ModelResponseBase(OpenAIObject):
    id: str
    created: int
    model: Optional[str] = None
    object: str
    system_fingerprint: Optional[str] = None
```
Import:
```python
from litellm.types.utils import ModelResponse
from litellm.litellm_core_utils.streaming_handler import CustomStreamWrapper
```
I/O Contract
Inputs (ModelResponse constructor)
| Parameter | Type | Description |
|---|---|---|
| id | Optional[str] | Unique completion identifier. Auto-generated if None. |
| choices | Optional[List[Union[Choices, dict, BaseModel]]] | The list of completion choices. Accepts Choices objects, dictionaries, or Pydantic models. Defaults to [Choices()] if None. |
| created | Optional[int] | Unix timestamp of creation. Defaults to int(time.time()). |
| model | Optional[str] | The model identifier that produced the response. |
| usage | Optional[Union[Usage, dict, BaseModel]] | Token usage information. Accepts Usage objects, dictionaries, or Pydantic models. Defaults to Usage() for non-streaming responses. |
| stream | Optional[bool] | If True, sets object to "chat.completion.chunk" and uses StreamingChoices; otherwise object is "chat.completion" with Choices. |
| hidden_params | Optional[dict] | Internal metadata not exposed in the public response. |
| _response_headers | Optional[dict] | HTTP response headers from the provider. |
Inputs (CustomStreamWrapper constructor)
| Parameter | Type | Description |
|---|---|---|
| completion_stream | Iterable/AsyncIterable | The raw provider streaming response to wrap. |
| model | str | The model identifier. |
| logging_obj | Any | The LiteLLM logging object for request/response tracking. |
| custom_llm_provider | Optional[str] | The resolved provider name. |
| stream_options | Optional[dict] | Options such as {"include_usage": True} for stream-level usage reporting. |
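When {"include_usage": True} is requested, OpenAI-style streams append one final chunk whose choices list is empty and whose usage field is populated. A simplified consumer sketch, using simulated dict chunks in place of live ModelResponseStream objects:

```python
# Simulated chunk sequence; a real stream yields ModelResponseStream objects
chunks = [
    {"choices": [{"delta": {"content": "Hi"}, "finish_reason": None}], "usage": None},
    {"choices": [{"delta": {}, "finish_reason": "stop"}], "usage": None},
    # Final usage-only chunk: empty choices, populated usage
    {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 1, "total_tokens": 6}},
]

text, usage = "", None
for chunk in chunks:
    for choice in chunk["choices"]:
        text += choice["delta"].get("content", "") or ""
    if chunk["usage"] is not None:
        usage = chunk["usage"]

print(text)                   # "Hi"
print(usage["total_tokens"])  # 6
```

Guarding on an empty choices list matters here: indexing chunk.choices[0] unconditionally would fail on the usage-only chunk.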
Outputs
| Output | Type | Description |
|---|---|---|
| ModelResponse instance | ModelResponse | A normalized response exposing .choices[i].message.content, .usage.prompt_tokens, .usage.completion_tokens, .model, .id, and .created. |
| CustomStreamWrapper chunks | ModelResponseStream | Each yielded chunk exposes .choices[i].delta.content, .choices[i].delta.tool_calls, and .choices[i].finish_reason. |
Usage Examples
Accessing a non-streaming response:
```python
import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Uniform access regardless of provider
print(response.id)                            # "chatcmpl-abc123..."
print(response.choices[0].message.content)    # "Hi there!"
print(response.choices[0].finish_reason)      # "stop"
print(response.usage.prompt_tokens)           # 10
print(response.usage.completion_tokens)       # 5
print(response.usage.total_tokens)            # 15
print(response.model)                         # "gpt-4"
```
Consuming a streaming response:
```python
import litellm

stream = litellm.completion(
    model="anthropic/claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")
    if chunk.choices[0].finish_reason:
        print(f"\nFinish reason: {chunk.choices[0].finish_reason}")
```
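Because CustomStreamWrapper is also async-iterable, the same consumption pattern works with `async for`. The sketch below uses a hypothetical `fake_stream` generator of dict chunks in place of a live `litellm.acompletion(..., stream=True)` call, so it runs without network access:

```python
import asyncio

async def fake_stream():
    # Hypothetical stand-in for an async-iterable CustomStreamWrapper
    for piece in ["Dew ", "on grass"]:
        yield {"choices": [{"delta": {"content": piece}, "finish_reason": None}]}

async def consume():
    # Accumulate streamed deltas into the complete message text
    text = ""
    async for chunk in fake_stream():
        text += chunk["choices"][0]["delta"]["content"]
    return text

print(asyncio.run(consume()))  # "Dew on grass"
```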
Serializing the response to a dictionary:
```python
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)

response_dict = response.model_dump()
# {"id": "chatcmpl-...", "choices": [...], "created": 1700000000, ...}
```