
Implementation:BerriAI Litellm Completion

From Leeroopedia
Knowledge Sources: BerriAI/litellm - litellm/main.py
Domains: LLM Integration, API Routing, Provider Abstraction
Last Updated: 2026-02-15

Overview

Concrete tool for dispatching LLM chat completion calls through a unified interface, provided by the litellm Python package via the completion() and acompletion() functions in litellm/main.py.

Description

The completion() function is the primary entry point for all synchronous LLM chat completion calls in LiteLLM. It accepts an OpenAI-compatible parameter set, resolves the target provider using get_llm_provider(), transforms the request for that provider, and invokes the provider handler. The result is normalized into a ModelResponse (or a CustomStreamWrapper when streaming), with cross-cutting concerns such as logging, caching, and callback invocation applied along the way. The acompletion() function provides the equivalent asynchronous path using async/await.
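The first step of that pipeline, provider resolution, can be illustrated with a minimal sketch. This is not litellm's actual get_llm_provider() implementation (which also infers providers from known model-name patterns); it only shows the "provider/model" prefix convention described above, with the default provider as an assumption.

```python
def resolve_provider(model: str, default: str = "openai") -> tuple[str, str]:
    """Split an optional 'provider/' prefix off a model identifier."""
    # "anthropic/claude-3-opus-20240229" -> ("anthropic", "claude-3-opus-20240229")
    if "/" in model:
        provider, _, bare_model = model.partition("/")
        return provider, bare_model
    # No prefix: fall back to a default provider. litellm instead infers
    # the provider from known model names; a constant is used here only
    # to keep the sketch self-contained.
    return default, model

print(resolve_provider("anthropic/claude-3-opus-20240229"))
# -> ('anthropic', 'claude-3-opus-20240229')
print(resolve_provider("gpt-4"))
# -> ('openai', 'gpt-4')
```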

Together, these functions support 100+ LLM providers through a single function signature, handling parameter transformation, authentication, streaming, function calling, tool use, and error mapping transparently.
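The error-mapping idea mentioned above can be sketched as follows. This is not litellm's real exception code; the class names and status table are illustrative assumptions showing how heterogeneous provider HTTP errors can be funneled into one hierarchy so callers handle failures uniformly.

```python
class APIError(Exception):
    """Base class for all mapped provider errors (illustrative only)."""

class BadRequestError(APIError):
    pass

class AuthenticationError(APIError):
    pass

class RateLimitError(APIError):
    pass

# Hypothetical mapping from HTTP status codes to unified exception types.
_STATUS_MAP = {
    400: BadRequestError,
    401: AuthenticationError,
    429: RateLimitError,
}

def map_provider_error(status_code: int, message: str) -> APIError:
    # Unknown statuses fall back to the generic base class.
    exc_type = _STATUS_MAP.get(status_code, APIError)
    return exc_type(message)

err = map_provider_error(429, "quota exceeded")
print(type(err).__name__)  # -> RateLimitError
```

Because every mapped error derives from one base class, a single `except APIError` clause covers all providers.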

Usage

Call litellm.completion() for synchronous completions or litellm.acompletion() for asynchronous completions. These are typically the only functions end users interact with directly.

Code Reference

Source Location: litellm/main.py, lines 999-7447 (sync completion), lines 371-998 (async acompletion)

Signature (completion):

def completion(
    model: str,
    messages: List = [],
    timeout: Optional[Union[float, str, httpx.Timeout]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    n: Optional[int] = None,
    stream: Optional[bool] = None,
    stream_options: Optional[dict] = None,
    stop=None,
    max_completion_tokens: Optional[int] = None,
    max_tokens: Optional[int] = None,
    modalities: Optional[List[ChatCompletionModality]] = None,
    prediction: Optional[ChatCompletionPredictionContentParam] = None,
    audio: Optional[ChatCompletionAudioParam] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,
    user: Optional[str] = None,
    reasoning_effort: Optional[Literal["none", "minimal", "low", "medium", "high", "xhigh", "default"]] = None,
    response_format: Optional[Union[dict, Type[BaseModel]]] = None,
    seed: Optional[int] = None,
    tools: Optional[List] = None,
    tool_choice: Optional[Union[str, dict]] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    parallel_tool_calls: Optional[bool] = None,
    deployment_id=None,
    extra_headers: Optional[dict] = None,
    functions: Optional[List] = None,
    function_call: Optional[str] = None,
    base_url: Optional[str] = None,
    api_version: Optional[str] = None,
    api_key: Optional[str] = None,
    model_list: Optional[list] = None,
    thinking: Optional[AnthropicThinkingParam] = None,
    **kwargs,
) -> Union[ModelResponse, CustomStreamWrapper]:

Signature (acompletion):

async def acompletion(
    model: str,
    messages: List = [],
    functions: Optional[List] = None,
    function_call: Optional[str] = None,
    timeout: Optional[Union[float, int]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    n: Optional[int] = None,
    stream: Optional[bool] = None,
    stream_options: Optional[dict] = None,
    stop=None,
    max_tokens: Optional[int] = None,
    max_completion_tokens: Optional[int] = None,
    modalities: Optional[List[ChatCompletionModality]] = None,
    prediction: Optional[ChatCompletionPredictionContentParam] = None,
    audio: Optional[ChatCompletionAudioParam] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,
    user: Optional[str] = None,
    response_format: Optional[Union[dict, Type[BaseModel]]] = None,
    seed: Optional[int] = None,
    tools: Optional[List] = None,
    tool_choice: Optional[Union[str, dict]] = None,
    parallel_tool_calls: Optional[bool] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    deployment_id=None,
    reasoning_effort: Optional[Literal["none", "minimal", "low", "medium", "high", "xhigh", "default"]] = None,
    base_url: Optional[str] = None,
    api_version: Optional[str] = None,
    api_key: Optional[str] = None,
    model_list: Optional[list] = None,
    **kwargs,
) -> Union[ModelResponse, CustomStreamWrapper]:

Import:

import litellm
# or
from litellm import completion, acompletion

I/O Contract

Inputs

model (str, required)
    The model identifier, optionally with a provider prefix (e.g., "gpt-4", "anthropic/claude-3-opus-20240229", "azure/my-deployment").
messages (List, required)
    A list of message dictionaries following the OpenAI Chat Completions message format.
temperature (Optional[float])
    Controls randomness of the output. Values between 0.0 and 2.0.
max_tokens (Optional[int])
    Maximum number of tokens in the generated completion.
stream (Optional[bool])
    If True, returns a CustomStreamWrapper that yields chunks.
tools (Optional[List])
    List of tool definitions for function calling.
tool_choice (Optional[Union[str, dict]])
    Controls which tool the model calls ("auto", "none", or a specific tool).
response_format (Optional[Union[dict, Type[BaseModel]]])
    Constrains the output format (e.g., JSON mode or structured output schema).
api_key (Optional[str])
    Per-call API key override.
base_url (Optional[str])
    Per-call API base URL override.
timeout (Optional[Union[float, str, httpx.Timeout]])
    Request timeout in seconds.
**kwargs (Any)
    Additional provider-specific parameters passed through to the handler.

Outputs

Non-streaming response (ModelResponse)
    An OpenAI-compatible response object with choices, usage, model, id, and created fields.
Streaming response (CustomStreamWrapper)
    An iterable/async-iterable that yields ModelResponseStream chunk objects.
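The non-streaming response shape can be approximated with plain dataclasses. This is a simplified sketch of the OpenAI-compatible fields listed above, not litellm's actual ModelResponse class definition.

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str
    content: str

@dataclass
class Choice:
    index: int
    message: Message
    finish_reason: str

@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

@dataclass
class ModelResponse:
    id: str
    created: int          # Unix timestamp
    model: str
    choices: list[Choice]
    usage: Usage

# Constructing a response by hand shows the access path used in the
# examples below: response.choices[0].message.content
response = ModelResponse(
    id="chatcmpl-123",
    created=1700000000,
    model="gpt-4",
    choices=[Choice(0, Message("assistant", "Hello!"), "stop")],
    usage=Usage(prompt_tokens=5, completion_tokens=2, total_tokens=7),
)
print(response.choices[0].message.content)  # -> Hello!
```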

Usage Examples

Basic synchronous completion:

import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)

Async completion:

import asyncio
import litellm

async def main():
    response = await litellm.acompletion(
        model="anthropic/claude-3-opus-20240229",
        messages=[{"role": "user", "content": "Explain quantum computing."}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Streaming completion:

import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a short poem."}],
    stream=True,
)
for chunk in response:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")
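When the full text is needed after streaming, the deltas from each chunk are typically accumulated. The dictionaries below are hypothetical stand-ins for ModelResponseStream chunks, used only to keep the sketch self-contained.

```python
# Hypothetical chunk shapes mimicking streamed delta objects; real chunks
# are ModelResponseStream instances with attribute access instead of keys.
chunks = [
    {"choices": [{"delta": {"content": "Roses "}}]},
    {"choices": [{"delta": {"content": "are red"}}]},
    {"choices": [{"delta": {}}]},  # the final chunk may carry no content
]

def join_deltas(chunks) -> str:
    """Concatenate the content deltas of a chunk sequence."""
    parts = []
    for chunk in chunks:
        content = chunk["choices"][0]["delta"].get("content")
        if content:
            parts.append(content)
    return "".join(parts)

print(join_deltas(chunks))  # -> Roses are red
```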

Cross-provider usage:

import litellm

# Same function, different providers
for model in ["gpt-4", "anthropic/claude-3-opus-20240229", "azure/my-gpt4-deployment"]:
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(f"{model}: {response.choices[0].message.content}")
