Implementation:BerriAI Litellm Responses API

Property	Value
sources	`litellm/responses/main.py`
domains	Responses, MCP, Streaming, LLM Providers
last_updated	2026-02-15 16:00 GMT

Overview

The Responses API module is the primary entry point for creating, retrieving, deleting, cancelling, and compacting AI responses, and for listing their input items, across multiple LLM providers. It also integrates MCP (Model Context Protocol) tool execution.

Description

This module implements the OpenAI Responses API surface in LiteLLM, exposing each operation as a sync/async function pair (for example, responses and aresponses). The functions are wrapped with the @client decorator, which adds logging, error handling, and callback support. Providers are resolved via litellm.get_llm_provider(), and HTTP operations are delegated to BaseLLMHTTPHandler. When a provider has no native Responses API support, the module falls back to a completion-based transformation via LiteLLMCompletionTransformationHandler. MCP integration is handled by aresponses_api_with_mcp, which performs automatic tool discovery, tool execution, and follow-up model calls when the request contains MCP tools with server_url="litellm_proxy".
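
The native-versus-fallback routing described above can be pictured with a small, self-contained sketch. It is not the module's code: NATIVE_HANDLERS, split_provider, completion_fallback, and dispatch are hypothetical names, and the rule that an unprefixed model defaults to "openai" is an assumption made only to keep the example short. The real resolver, litellm.get_llm_provider(), does considerably more.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical registry: providers assumed to expose a native Responses API.
NATIVE_HANDLERS: Dict[str, Callable[[str, str], str]] = {
    "openai": lambda model, text: f"native:{model}:{text}",
}


def split_provider(model: str, custom_llm_provider: Optional[str] = None) -> Tuple[str, str]:
    """Pick the provider from an explicit override or a 'provider/model' prefix."""
    if custom_llm_provider:
        return custom_llm_provider, model
    provider, sep, name = model.partition("/")
    # Assumption for this sketch only: no prefix means "openai".
    return (provider, name) if sep else ("openai", model)


def completion_fallback(model: str, text: str) -> str:
    """Stand-in for translating the request into a chat-completion call."""
    return f"completion-fallback:{model}:{text}"


def dispatch(model: str, text: str, custom_llm_provider: Optional[str] = None) -> str:
    """Use the native handler when one is registered, otherwise fall back."""
    provider, name = split_provider(model, custom_llm_provider)
    handler = NATIVE_HANDLERS.get(provider, completion_fallback)
    return handler(name, text)
```

The point of the sketch is the single lookup with a default: callers never choose between the two paths themselves.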

Usage

Import this module when you need to create or manage AI responses through the OpenAI Responses API protocol. It is the primary interface for response generation, supporting streaming, tool calling, MCP integration, and response lifecycle management (get, delete, cancel, compact).
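
As a rough illustration of the lifecycle functions, the sketch below retrieves a stored response and then deletes it. It relies only on the get_responses and delete_responses names and their response_id parameter as listed on this page. The api parameter and the fake_api stand-in are hypothetical additions so the example can run without network access, and a real call may need further arguments (such as custom_llm_provider) when the ID does not carry provider information.

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple


def fetch_then_delete(response_id: str, api: Optional[Any] = None) -> Tuple[Any, Any]:
    """Retrieve a stored response, then delete it."""
    if api is None:
        # Imported lazily so the sketch also runs where litellm is not installed.
        import litellm.responses.main as api
    fetched = api.get_responses(response_id=response_id)
    deleted = api.delete_responses(response_id=response_id)
    return fetched, deleted


# Hypothetical offline stand-in exposing the same two function names.
fake_api = SimpleNamespace(
    get_responses=lambda response_id: {"id": response_id},
    delete_responses=lambda response_id: {"id": response_id, "deleted": True},
)
```

Calling fetch_then_delete("resp_123", api=fake_api) exercises the flow offline; omitting api uses the real module.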

Code Reference

Source Location

Property	Value
Repository	github.com/BerriAI/litellm
File	`litellm/responses/main.py`
Lines	1619
Module	`litellm.responses.main`

Signature

@client
def responses(
    input: Union[str, ResponseInputParam],
    model: str,
    include: Optional[List[ResponseIncludable]] = None,
    instructions: Optional[str] = None,
    max_output_tokens: Optional[int] = None,
    stream: Optional[bool] = None,
    temperature: Optional[float] = None,
    tools: Optional[Iterable[ToolParam]] = None,
    custom_llm_provider: Optional[str] = None,
    **kwargs,
) -> Union[ResponsesAPIResponse, BaseResponsesAPIStreamingIterator]

@client
async def aresponses(...) -> Union[ResponsesAPIResponse, BaseResponsesAPIStreamingIterator]

@client
def delete_responses(response_id: str, ...) -> DeleteResponseResult

@client
def get_responses(response_id: str, ...) -> ResponsesAPIResponse

@client
def list_input_items(response_id: str, ...) -> Dict

@client
def cancel_responses(response_id: str, ...) -> ResponsesAPIResponse

@client
def compact_responses(input, model, ...) -> ResponsesAPIResponse

Import

from litellm.responses.main import (
    responses, aresponses,
    delete_responses, adelete_responses,
    get_responses, aget_responses,
    list_input_items, alist_input_items,
    cancel_responses, acancel_responses,
    compact_responses, acompact_responses,
    aresponses_api_with_mcp,
)

I/O Contract

Inputs

Parameter	Type	Required	Description
`input`	`Union[str, ResponseInputParam]`	Yes	The user input or structured input for the response
`model`	`str`	Yes	The model identifier (e.g., "openai/gpt-4")
`response_id`	`str`	For get/delete/cancel/list	The encoded response ID (may contain provider and model info)
`instructions`	`Optional[str]`	No	System instructions for the response
`max_output_tokens`	`Optional[int]`	No	Maximum number of output tokens
`stream`	`Optional[bool]`	No	Whether to stream the response
`tools`	`Optional[Iterable[ToolParam]]`	No	Tools available to the model including MCP tools
`custom_llm_provider`	`Optional[str]`	No	Provider override; auto-detected from model if not set
`text_format`	`Optional[Union[Type[BaseModel], dict]]`	No	Pydantic model class or dict describing the structured output format
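
To make the idea of an "encoded response ID" concrete, here is a minimal sketch of how provider and model information could be packed into an ID and recovered later. The format (a "resp_" prefix followed by base64-encoded JSON) and the function names encode_response_id and decode_response_id are assumptions for illustration only; the actual encoding lives in the utilities module and may differ.

```python
import base64
import json
from typing import Dict

PREFIX = "resp_"  # hypothetical prefix used only in this sketch


def encode_response_id(provider: str, model_id: str, raw_id: str) -> str:
    """Pack routing info and the provider's own ID into one opaque string."""
    payload = json.dumps(
        {"provider": provider, "model_id": model_id, "response_id": raw_id}
    )
    return PREFIX + base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_response_id(encoded: str) -> Dict[str, str]:
    """Recover the routing info so a later get/delete/cancel can find the provider."""
    if not encoded.startswith(PREFIX):
        raise ValueError("not an encoded response ID")
    return json.loads(base64.urlsafe_b64decode(encoded[len(PREFIX):]))
```

Carrying the provider inside the ID is what would let a call such as get_responses work without the caller repeating custom_llm_provider.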

Outputs

Function	Return Type	Description
`responses`	`ResponsesAPIResponse` or streaming iterator	The generated response or stream
`delete_responses`	`DeleteResponseResult`	Deletion confirmation
`get_responses`	`ResponsesAPIResponse`	The retrieved response object
`list_input_items`	`Dict`	Paginated list of input items
`cancel_responses`	`ResponsesAPIResponse`	The cancelled response object
`compact_responses`	`ResponsesAPIResponse`	The compacted response object

Usage Examples

import litellm

# Simple response creation
response = litellm.responses(
    input="Tell me about AI.",
    model="openai/gpt-4",
)
print(response.output[0].content[0].text)

import litellm

# Streaming response
stream = litellm.responses(
    input="Write a poem.",
    model="openai/gpt-4",
    stream=True,
)
for chunk in stream:
    print(chunk)
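
Printing raw chunks is useful for inspection, but most callers want the assembled text. The sketch below shows one way to do that, assuming each chunk exposes a type attribute that compares equal to the OpenAI event name "response.output_text.delta" and carries the new text in delta. collect_text and fake_stream are hypothetical names; the fake events stand in for real chunks so the example runs offline.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def collect_text(stream: Iterable[Any]) -> str:
    """Concatenate text deltas and ignore every other event type."""
    parts: List[str] = []
    for event in stream:
        if getattr(event, "type", None) == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


# Hypothetical stand-in for the iterator returned with stream=True.
fake_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Roses "),
    SimpleNamespace(type="response.output_text.delta", delta="are red"),
    SimpleNamespace(type="response.completed"),
]

print(collect_text(fake_stream))
```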

import asyncio
import litellm

# Async with MCP tools
async def main():
    response = await litellm.aresponses(
        input="Search for recent news about AI.",
        model="openai/gpt-4",
        tools=[{"type": "mcp", "server_url": "litellm_proxy", "require_approval": "never"}],
    )
    print(response)

asyncio.run(main())
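
The discovery, execution, and follow-up cycle that the MCP path performs can be sketched as a plain loop. This is a simplified illustration rather than the logic of aresponses_api_with_mcp: uses_proxy_mcp, run_with_tools, fake_model, and fake_tool are hypothetical names, replies are modelled as plain dicts, and the max_rounds limit is an assumption added to keep the loop bounded.

```python
from typing import Any, Callable, Dict, List

Tool = Dict[str, Any]


def uses_proxy_mcp(tools: List[Tool]) -> bool:
    """Detect MCP tools that point at the LiteLLM proxy."""
    return any(
        t.get("type") == "mcp" and t.get("server_url") == "litellm_proxy"
        for t in tools
    )


def run_with_tools(
    prompt: str,
    call_model: Callable[[List[str]], Dict[str, Any]],
    execute_tool: Callable[[str, Dict[str, Any]], str],
    max_rounds: int = 3,
) -> str:
    """Call the model, run any requested tools, and feed results back in."""
    messages: List[str] = [prompt]
    for _ in range(max_rounds):
        reply = call_model(messages)
        calls = reply.get("tool_calls", [])
        if not calls:
            return reply["text"]  # no more tool requests: final answer
        for call in calls:
            messages.append(execute_tool(call["name"], call["arguments"]))
    raise RuntimeError("tool loop did not converge")


def fake_model(messages: List[str]) -> Dict[str, Any]:
    # First round requests a tool; the follow-up round answers from its result.
    if len(messages) == 1:
        return {"tool_calls": [{"name": "search", "arguments": {"q": "AI news"}}]}
    return {"text": f"Summary of {messages[-1]}"}


def fake_tool(name: str, arguments: Dict[str, Any]) -> str:
    return f"{name} results for {arguments['q']}"
```

With the stand-ins above, run_with_tools("Search for recent news about AI.", fake_model, fake_tool) makes one tool round and one follow-up call.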

Related Pages

BerriAI_Litellm_Responses_Streaming_Iterator -- Streaming iterator classes used by this module for SSE chunk processing
BerriAI_Litellm_Responses_Utils -- Utility classes for request construction, response ID encoding, and usage transformation
BerriAI_Litellm_MCP_Client -- The MCP client used when MCP tools are detected in the request
BerriAI_Litellm_Assistants_API -- The older Assistants API module for thread-based assistant interactions
