Implementation:BerriAI Litellm Responses API

From Leeroopedia
Property | Value
sources | litellm/responses/main.py
domains | Responses, MCP, Streaming, LLM Providers
last_updated | 2026-02-15 16:00 GMT

Overview

The Responses API module is the primary entry point for creating, retrieving, deleting, cancelling, and compacting AI responses, and for listing their input items, across multiple LLM providers. It includes integrated MCP (Model Context Protocol) tool execution support.

Description

This module implements the OpenAI Responses API surface through LiteLLM, providing sync/async function pairs for all CRUD operations on responses. It uses the @client decorator pattern for automatic logging, error handling, and callback support. The module resolves providers via litellm.get_llm_provider() and delegates HTTP operations to BaseLLMHTTPHandler. When provider-native Responses API support is unavailable, it falls back to a completion-based transformation via LiteLLMCompletionTransformationHandler. A key feature is the MCP integration through aresponses_api_with_mcp, which enables automatic tool discovery, execution, and follow-up calls when MCP tools with server_url="litellm_proxy" are detected.
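The choice between the native path and the completion-based fallback described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the module's actual code: NATIVE_RESPONSES_PROVIDERS, split_provider, call_native, call_completion_fallback, and dispatch_response are all hypothetical names, and splitting on the "provider/model" prefix only stands in for what litellm.get_llm_provider() does.

```python
from typing import Dict, Tuple

# Hypothetical: providers assumed (for this sketch only) to expose a native Responses API.
NATIVE_RESPONSES_PROVIDERS = {"openai"}


def split_provider(model: str) -> Tuple[str, str]:
    """Stand-in for provider resolution: 'openai/gpt-4' -> ('openai', 'gpt-4')."""
    provider, _, name = model.partition("/")
    if not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


def call_native(provider: str, model: str, text: str) -> Dict[str, str]:
    # Placeholder for the path that would delegate to an HTTP handler.
    return {"path": "native", "provider": provider, "model": model, "input": text}


def call_completion_fallback(provider: str, model: str, text: str) -> Dict[str, str]:
    # Placeholder for the path that would transform the request into a completion call.
    return {"path": "completion_fallback", "provider": provider, "model": model, "input": text}


def dispatch_response(model: str, text: str) -> Dict[str, str]:
    """Pick the native handler when available, otherwise the fallback."""
    provider, name = split_provider(model)
    if provider in NATIVE_RESPONSES_PROVIDERS:
        return call_native(provider, name, text)
    return call_completion_fallback(provider, name, text)


if __name__ == "__main__":
    print(dispatch_response("openai/gpt-4", "Tell me about AI."))
    print(dispatch_response("example_provider/some-model", "Tell me about AI."))
```

The point of the sketch is that callers see one function regardless of which path serves the request; the provider check stays internal.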

Usage

Import this module when you need to create or manage AI responses through the OpenAI Responses API protocol. It is the primary interface for response generation, supporting streaming, tool calling, MCP integration, and response lifecycle management (get, delete, cancel, compact).

Code Reference

Source Location

Property | Value
Repository | github.com/BerriAI/litellm
File | litellm/responses/main.py
Lines | 1619
Module | litellm.responses.main

Signature

@client
def responses(
    input: Union[str, ResponseInputParam],
    model: str,
    include: Optional[List[ResponseIncludable]] = None,
    instructions: Optional[str] = None,
    max_output_tokens: Optional[int] = None,
    stream: Optional[bool] = None,
    temperature: Optional[float] = None,
    tools: Optional[Iterable[ToolParam]] = None,
    custom_llm_provider: Optional[str] = None,
    **kwargs,
) -> Union[ResponsesAPIResponse, BaseResponsesAPIStreamingIterator]

@client
async def aresponses(...) -> Union[ResponsesAPIResponse, BaseResponsesAPIStreamingIterator]

@client
def delete_responses(response_id: str, ...) -> DeleteResponseResult

@client
def get_responses(response_id: str, ...) -> ResponsesAPIResponse

@client
def list_input_items(response_id: str, ...) -> Dict

@client
def cancel_responses(response_id: str, ...) -> ResponsesAPIResponse

@client
def compact_responses(input, model, ...) -> ResponsesAPIResponse

Import

from litellm.responses.main import (
    responses, aresponses,
    delete_responses, adelete_responses,
    get_responses, aget_responses,
    list_input_items, alist_input_items,
    cancel_responses, acancel_responses,
    compact_responses, acompact_responses,
    aresponses_api_with_mcp,
)
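As a rough illustration of how the imported functions fit together across a response's lifecycle, the helper below creates a response, reads it back by ID, and deletes it. It is a sketch, not a documented recipe: create_fetch_delete and make_fake_api are hypothetical names, the helper assumes the created response exposes an id attribute, and the in-memory fake (including its deleted field) exists only so the flow can run without credentials or network access.

```python
from types import SimpleNamespace
from typing import Any, Dict, Tuple


def create_fetch_delete(api: Any, prompt: str, model: str) -> Tuple[Any, Any]:
    """Create a response, retrieve it by ID, then delete it.

    `api` is any object exposing responses / get_responses / delete_responses
    with the keyword arguments shown in the signatures above.
    """
    created = api.responses(input=prompt, model=model)
    fetched = api.get_responses(response_id=created.id)  # assumes an `id` attribute
    deleted = api.delete_responses(response_id=created.id)
    return fetched, deleted


def make_fake_api() -> SimpleNamespace:
    """In-memory stand-in (hypothetical) so the helper can run offline."""
    store: Dict[str, SimpleNamespace] = {}

    def responses(input: str, model: str) -> SimpleNamespace:
        resp = SimpleNamespace(id=f"resp_{len(store) + 1}", model=model, input=input)
        store[resp.id] = resp
        return resp

    def get_responses(response_id: str) -> SimpleNamespace:
        return store[response_id]

    def delete_responses(response_id: str) -> SimpleNamespace:
        del store[response_id]
        return SimpleNamespace(id=response_id, deleted=True)

    return SimpleNamespace(
        responses=responses,
        get_responses=get_responses,
        delete_responses=delete_responses,
        store=store,
    )


if __name__ == "__main__":
    fake = make_fake_api()
    fetched, deleted = create_fetch_delete(fake, "Tell me about AI.", "openai/gpt-4")
    print(fetched.id, deleted.deleted)
```

With the real library, the litellm.responses.main module itself could be passed as api, provided provider credentials are configured.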

I/O Contract

Inputs

Parameter | Type | Required | Description
input | Union[str, ResponseInputParam] | Yes | The user input or structured input for the response
model | str | Yes | The model identifier (e.g., "openai/gpt-4")
response_id | str | For get/delete/cancel/list | The encoded response ID (may contain provider and model info)
instructions | Optional[str] | No | System instructions for the response
max_output_tokens | Optional[int] | No | Maximum number of output tokens
stream | Optional[bool] | No | Whether to stream the response
tools | Optional[Iterable[ToolParam]] | No | Tools available to the model including MCP tools
custom_llm_provider | Optional[str] | No | Provider override; auto-detected from model if not set
text_format | Optional[Union[Type[BaseModel], dict]] | No | Pydantic model for structured output format
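Since the tools parameter can mix MCP tools with other tool types, the detection of proxy-routed MCP tools can be pictured with the sketch below. The helper name partition_mcp_tools is hypothetical, and the exact-match test on server_url is an assumption based on the description above; the library's real check may differ.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Marker value named in the module description for proxy-routed MCP tools.
LITELLM_PROXY_SERVER_URL = "litellm_proxy"


def partition_mcp_tools(
    tools: Iterable[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split tools into (proxy MCP tools, all other tools), preserving order."""
    proxy_mcp: List[Dict[str, Any]] = []
    other: List[Dict[str, Any]] = []
    for tool in tools:
        is_proxy_mcp = (
            tool.get("type") == "mcp"
            and tool.get("server_url") == LITELLM_PROXY_SERVER_URL
        )
        (proxy_mcp if is_proxy_mcp else other).append(tool)
    return proxy_mcp, other


if __name__ == "__main__":
    tools = [
        {"type": "mcp", "server_url": "litellm_proxy", "require_approval": "never"},
        {"type": "function", "name": "get_weather"},  # illustrative non-MCP tool
    ]
    mcp_tools, other_tools = partition_mcp_tools(tools)
    print(len(mcp_tools), len(other_tools))
```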

Outputs

Function | Return Type | Description
responses | ResponsesAPIResponse or streaming iterator | The generated response or stream
delete_responses | DeleteResponseResult | Deletion confirmation
get_responses | ResponsesAPIResponse | The retrieved response object
list_input_items | Dict | Paginated list of input items
cancel_responses | ResponsesAPIResponse | The cancelled response object
compact_responses | ResponsesAPIResponse | The compacted response object
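A returned response can hold several output items, and the first one is not always a text message (it may be a reasoning or tool-call item, for instance). The sketch below gathers text defensively instead of indexing output[0].content[0] directly. collect_output_text is a hypothetical helper; it assumes items support attribute access and follow the OpenAI Responses item shape ("message" items containing "output_text" parts), and the SimpleNamespace objects are stand-ins for a real response.

```python
from types import SimpleNamespace
from typing import Any, List


def collect_output_text(response: Any) -> str:
    """Concatenate every output_text part found in message items."""
    parts: List[str] = []
    for item in getattr(response, "output", None) or []:
        if getattr(item, "type", None) != "message":
            continue  # skip reasoning, tool calls, and other non-message items
        for part in getattr(item, "content", None) or []:
            if getattr(part, "type", None) == "output_text":
                parts.append(part.text)
    return "".join(parts)


if __name__ == "__main__":
    # Stand-in response: a non-message item followed by a message item.
    fake_response = SimpleNamespace(
        output=[
            SimpleNamespace(type="reasoning", content=None),
            SimpleNamespace(
                type="message",
                content=[SimpleNamespace(type="output_text", text="AI is a broad field.")],
            ),
        ]
    )
    print(collect_output_text(fake_response))
```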

Usage Examples

import litellm

# Simple response creation
response = litellm.responses(
    input="Tell me about AI.",
    model="openai/gpt-4",
)
print(response.output[0].content[0].text)

import litellm

# Streaming response
stream = litellm.responses(
    input="Write a poem.",
    model="openai/gpt-4",
    stream=True,
)
for chunk in stream:
    print(chunk)

import asyncio
import litellm

# Async with MCP tools
async def main():
    response = await litellm.aresponses(
        input="Search for recent news about AI.",
        model="openai/gpt-4",
        tools=[{"type": "mcp", "server_url": "litellm_proxy", "require_approval": "never"}],
    )
    print(response)

asyncio.run(main())
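
The streaming example above prints each raw event. To assemble only the generated text, a consumer could filter on the event type, as in the sketch below. It assumes events expose type and delta attributes and that text increments arrive as "response.output_text.delta" events, following the OpenAI Responses streaming format; accumulate_text is a hypothetical helper, and the fake events replace a live stream.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List

# Event type assumed from the OpenAI Responses streaming format.
TEXT_DELTA_EVENT = "response.output_text.delta"


def accumulate_text(events: Iterable[Any]) -> str:
    """Join the text deltas from a stream of events, ignoring other event types."""
    chunks: List[str] = []
    for event in events:
        if getattr(event, "type", None) == TEXT_DELTA_EVENT:
            chunks.append(event.delta)
    return "".join(chunks)


if __name__ == "__main__":
    # Stand-in events; a real stream would come from responses(..., stream=True).
    fake_stream = [
        SimpleNamespace(type="response.created"),
        SimpleNamespace(type=TEXT_DELTA_EVENT, delta="Roses "),
        SimpleNamespace(type=TEXT_DELTA_EVENT, delta="are red"),
        SimpleNamespace(type="response.completed"),
    ]
    print(accumulate_text(fake_stream))
```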
