| Property | Value |
| sources | litellm/responses/main.py |
| domains | Responses, MCP, Streaming, LLM Providers |
| last_updated | 2026-02-15 16:00 GMT |
Overview
The Responses API module is the primary entry point for creating, retrieving, deleting, cancelling, and compacting AI responses, and for listing their input items, across multiple LLM providers. It includes integrated MCP (Model Context Protocol) tool execution support.
Description
This module implements the OpenAI Responses API surface through LiteLLM, providing a sync/async function pair for each operation on responses (create, get, delete, cancel, compact, list input items). Each function is wrapped with the @client decorator for automatic logging, error handling, and callback support. The module resolves providers via litellm.get_llm_provider() and delegates HTTP operations to BaseLLMHTTPHandler. When a provider has no native Responses API support, it falls back to a completion-based transformation via LiteLLMCompletionTransformationHandler. A key feature is the MCP integration through aresponses_api_with_mcp, which enables automatic tool discovery, tool execution, and follow-up calls when MCP tools with server_url="litellm_proxy" are detected.
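The provider resolution and fallback described above can be pictured with a small, self-contained sketch. It does not call LiteLLM: split_model, choose_path, and NATIVE_RESPONSES_PROVIDERS are hypothetical names introduced here for illustration, and the provider set is an assumption rather than LiteLLM's real registry. The real litellm.get_llm_provider() can also infer providers from bare model names, which this simplified version does not attempt.

```python
from typing import Optional, Tuple

# Assumption: an illustrative set of providers with a native Responses API.
# This is NOT LiteLLM's actual registry.
NATIVE_RESPONSES_PROVIDERS = {"openai", "azure"}


def split_model(model: str, custom_llm_provider: Optional[str] = None) -> Tuple[str, str]:
    """Return (provider, model_name); an explicit provider overrides the prefix."""
    if custom_llm_provider is not None:
        return custom_llm_provider, model
    provider, sep, name = model.partition("/")
    if not sep:
        # Simplification: the sketch only understands "provider/model" strings.
        raise ValueError(f"cannot infer provider from {model!r}")
    return provider, name


def choose_path(model: str, custom_llm_provider: Optional[str] = None) -> str:
    """Pick the native HTTP path or the completion-based fallback."""
    provider, _ = split_model(model, custom_llm_provider)
    if provider in NATIVE_RESPONSES_PROVIDERS:
        return "native"
    return "completion_fallback"
```

In this sketch, "native" stands for the path that would go through BaseLLMHTTPHandler and "completion_fallback" for the path through LiteLLMCompletionTransformationHandler.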
Usage
Import this module when you need to create or manage AI responses through the OpenAI Responses API protocol. It is the primary interface for response generation, supporting streaming, tool calling, MCP integration, and response lifecycle management (get, delete, cancel, compact).
Code Reference
Source Location
Signature
@client
def responses(
    input: Union[str, ResponseInputParam],
    model: str,
    include: Optional[List[ResponseIncludable]] = None,
    instructions: Optional[str] = None,
    max_output_tokens: Optional[int] = None,
    stream: Optional[bool] = None,
    temperature: Optional[float] = None,
    tools: Optional[Iterable[ToolParam]] = None,
    custom_llm_provider: Optional[str] = None,
    **kwargs,
) -> Union[ResponsesAPIResponse, BaseResponsesAPIStreamingIterator]

@client
async def aresponses(...) -> Union[ResponsesAPIResponse, BaseResponsesAPIStreamingIterator]

@client
def delete_responses(response_id: str, ...) -> DeleteResponseResult

@client
def get_responses(response_id: str, ...) -> ResponsesAPIResponse

@client
def list_input_items(response_id: str, ...) -> Dict

@client
def cancel_responses(response_id: str, ...) -> ResponsesAPIResponse

@client
def compact_responses(input, model, ...) -> ResponsesAPIResponse
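Every signature above carries the @client decorator. As a rough illustration of how one decorator can serve both halves of a sync/async pair, here is a minimal sketch using only the standard library. client_sketch, get_thing, and aget_thing are hypothetical names, and the logging calls are a stand-in for whatever logging and callback machinery the real decorator provides; this is not LiteLLM's implementation.

```python
import asyncio
import functools
import inspect
import logging

logger = logging.getLogger("client_sketch")


def client_sketch(func):
    """Hypothetical stand-in for a logging/error-handling decorator."""
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            logger.info("calling %s", func.__name__)
            try:
                return await func(*args, **kwargs)
            except Exception:
                # Log with traceback, then re-raise so callers still see the error.
                logger.exception("%s failed", func.__name__)
                raise
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        logger.info("calling %s", func.__name__)
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.exception("%s failed", func.__name__)
            raise
    return sync_wrapper


@client_sketch
def get_thing(thing_id: str) -> dict:
    return {"id": thing_id}


@client_sketch
async def aget_thing(thing_id: str) -> dict:
    return {"id": thing_id}


if __name__ == "__main__":
    print(get_thing("resp_1"))
    print(asyncio.run(aget_thing("resp_2")))
```

Branching on inspect.iscoroutinefunction keeps the wrapped async function awaitable, so the decorated pair behaves like the undecorated one from the caller's point of view.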
Import
from litellm.responses.main import (
    responses, aresponses,
    delete_responses, adelete_responses,
    get_responses, aget_responses,
    list_input_items, alist_input_items,
    cancel_responses, acancel_responses,
    compact_responses, acompact_responses,
    aresponses_api_with_mcp,
)
I/O Contract
Inputs
| Parameter | Type | Required | Description |
| input | Union[str, ResponseInputParam] | Yes | The user input or structured input for the response |
| model | str | Yes | The model identifier (e.g., "openai/gpt-4") |
| response_id | str | For get/delete/cancel/list | The encoded response ID (may contain provider and model info) |
| instructions | Optional[str] | No | System instructions for the response |
| max_output_tokens | Optional[int] | No | Maximum number of output tokens |
| stream | Optional[bool] | No | Whether to stream the response |
| tools | Optional[Iterable[ToolParam]] | No | Tools available to the model including MCP tools |
| custom_llm_provider | Optional[str] | No | Provider override; auto-detected from model if not set |
| text_format | Optional[Union[Type[BaseModel], dict]] | No | Pydantic model for structured output format |
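The table notes that response_id may carry provider and model information, which is why the get, delete, cancel, and list calls need no separate model argument. The sketch below shows one way such an ID could work. The field layout, the "resp_" prefix handling, and the helper names encode_response_id and decode_response_id are all assumptions made for illustration; this is not a description of LiteLLM's actual encoding.

```python
import base64
from typing import Dict


def encode_response_id(provider: str, model_id: str, raw_id: str) -> str:
    """Pack routing info into an opaque ID (hypothetical format)."""
    payload = f"provider:{provider};model:{model_id};id:{raw_id}"
    return "resp_" + base64.urlsafe_b64encode(payload.encode()).decode()


def decode_response_id(encoded: str) -> Dict[str, str]:
    """Recover the routing fields from an ID produced by encode_response_id."""
    prefix = "resp_"
    if not encoded.startswith(prefix):
        raise ValueError("not an encoded response ID")
    payload = base64.urlsafe_b64decode(encoded[len(prefix):].encode()).decode()
    # Split "key:value" fields; values must not contain ';' in this sketch.
    return dict(field.split(":", 1) for field in payload.split(";"))
```

With an ID of this kind, a later call can recover the provider from the ID alone and route the request without the caller repeating it.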
Outputs
| Function | Return Type | Description |
| responses | ResponsesAPIResponse or streaming iterator | The generated response or stream |
| delete_responses | DeleteResponseResult | Deletion confirmation |
| get_responses | ResponsesAPIResponse | The retrieved response object |
| list_input_items | Dict | Paginated list of input items |
| cancel_responses | ResponsesAPIResponse | The cancelled response object |
| compact_responses | ResponsesAPIResponse | The compacted response object |
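Because responses can return either a complete response or a streaming iterator, calling code often needs to handle both shapes. The following sketch uses plain dicts as stand-ins for ResponsesAPIResponse and for stream events, so it runs without LiteLLM. The field names follow the OpenAI Responses API shape (output items containing output_text parts, and response.output_text.delta events); collect_text is a hypothetical helper, and real LiteLLM objects expose attributes rather than dict keys.

```python
from typing import Iterable, Union


def collect_text(result: Union[dict, Iterable[dict]]) -> str:
    """Return the text of a full response, or join the text deltas of a stream."""
    if isinstance(result, dict):
        # Non-streaming: walk output items and their content parts.
        parts = []
        for item in result.get("output", []):
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    parts.append(part["text"])
        return "".join(parts)
    # Streaming: keep only the incremental text events, skip lifecycle events.
    return "".join(
        event["delta"]
        for event in result
        if event.get("type") == "response.output_text.delta"
    )
```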
Usage Examples
import litellm

# Simple response creation
response = litellm.responses(
    input="Tell me about AI.",
    model="openai/gpt-4",
)
print(response.output[0].content[0].text)

import litellm

# Streaming response
stream = litellm.responses(
    input="Write a poem.",
    model="openai/gpt-4",
    stream=True,
)
for chunk in stream:
    print(chunk)

import asyncio
import litellm

# Async with MCP tools
async def main():
    response = await litellm.aresponses(
        input="Search for recent news about AI.",
        model="openai/gpt-4",
        tools=[{"type": "mcp", "server_url": "litellm_proxy", "require_approval": "never"}],
    )
    print(response)

asyncio.run(main())
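The MCP path is triggered when the tools list contains MCP tools with server_url="litellm_proxy". A minimal sketch of that detection step, under the assumption that it amounts to an exact match on the type and server_url fields, might look like the following. split_mcp_tools and PROXY_SERVER_URL are hypothetical names; the actual check inside aresponses_api_with_mcp may differ.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Value taken from the module description; exact-match comparison is an assumption.
PROXY_SERVER_URL = "litellm_proxy"


def split_mcp_tools(
    tools: Iterable[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate proxy-hosted MCP tools from every other tool definition."""
    proxy_mcp: List[Dict[str, Any]] = []
    other: List[Dict[str, Any]] = []
    for tool in tools:
        if tool.get("type") == "mcp" and tool.get("server_url") == PROXY_SERVER_URL:
            proxy_mcp.append(tool)
        else:
            other.append(tool)
    return proxy_mcp, other
```

If the first list is non-empty, the request would take the MCP route (tool discovery, execution, follow-up call); otherwise it can be sent as an ordinary Responses API request.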