Implementation:BerriAI Litellm Completion
| Knowledge Sources | BerriAI/litellm - litellm/main.py |
|---|---|
| Domains | LLM Integration, API Routing, Provider Abstraction |
| Last Updated | 2026-02-15 |
Overview
Concrete tool for dispatching LLM chat completion calls through a unified interface, provided by the litellm Python package via the completion() and acompletion() functions in litellm/main.py.
Description
The completion() function is the primary entry point for all synchronous LLM chat completion calls in LiteLLM. It accepts an OpenAI-compatible parameter set, resolves the target provider using get_llm_provider(), transforms the request for the specific provider, invokes the provider handler, normalizes the response into a ModelResponse (or CustomStreamWrapper for streaming), and applies cross-cutting concerns including logging, caching, and callback invocation. The acompletion() function provides the equivalent asynchronous path using async/await.
Together, these functions support 100+ LLM providers through a single function signature, handling parameter transformation, authentication, streaming, function calling, tool use, and error mapping transparently.
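Because provider errors are mapped into a common exception hierarchy, callers can handle failures the same way regardless of backend. A minimal sketch of catching these mapped errors, assuming the exception classes are exported at the package top level (litellm.RateLimitError, litellm.APIConnectionError) as in recent releases:
```python
import litellm

try:
    response = litellm.completion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],
    )
except litellm.RateLimitError:
    # The provider signalled rate limiting; back off and retry later.
    pass
except litellm.APIConnectionError as err:
    # Network-level failure reaching the provider.
    print(f"connection error: {err}")
```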
Usage
Call litellm.completion() for synchronous completions or litellm.acompletion() for asynchronous completions. These are typically the only functions end users interact with directly.
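A minimal sketch of the typical call pattern, assuming credentials are supplied through the provider's conventional environment variable (here OPENAI_API_KEY); per-call overrides via api_key are listed in the I/O Contract below:
```python
import os
import litellm

# Keys are read from the provider's standard environment variable.
os.environ["OPENAI_API_KEY"] = "sk-..."

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "ping"}],
)
```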
Code Reference
Source Location: litellm/main.py, lines 999-7447 (sync completion), lines 371-998 (async acompletion)
Signature (completion):
```python
def completion(
    model: str,
    messages: List = [],
    timeout: Optional[Union[float, str, httpx.Timeout]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    n: Optional[int] = None,
    stream: Optional[bool] = None,
    stream_options: Optional[dict] = None,
    stop=None,
    max_completion_tokens: Optional[int] = None,
    max_tokens: Optional[int] = None,
    modalities: Optional[List[ChatCompletionModality]] = None,
    prediction: Optional[ChatCompletionPredictionContentParam] = None,
    audio: Optional[ChatCompletionAudioParam] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,
    user: Optional[str] = None,
    reasoning_effort: Optional[Literal["none", "minimal", "low", "medium", "high", "xhigh", "default"]] = None,
    response_format: Optional[Union[dict, Type[BaseModel]]] = None,
    seed: Optional[int] = None,
    tools: Optional[List] = None,
    tool_choice: Optional[Union[str, dict]] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    parallel_tool_calls: Optional[bool] = None,
    deployment_id=None,
    extra_headers: Optional[dict] = None,
    functions: Optional[List] = None,
    function_call: Optional[str] = None,
    base_url: Optional[str] = None,
    api_version: Optional[str] = None,
    api_key: Optional[str] = None,
    model_list: Optional[list] = None,
    thinking: Optional[AnthropicThinkingParam] = None,
    **kwargs,
) -> Union[ModelResponse, CustomStreamWrapper]:
```
Signature (acompletion):
```python
async def acompletion(
    model: str,
    messages: List = [],
    functions: Optional[List] = None,
    function_call: Optional[str] = None,
    timeout: Optional[Union[float, int]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    n: Optional[int] = None,
    stream: Optional[bool] = None,
    stream_options: Optional[dict] = None,
    stop=None,
    max_tokens: Optional[int] = None,
    max_completion_tokens: Optional[int] = None,
    modalities: Optional[List[ChatCompletionModality]] = None,
    prediction: Optional[ChatCompletionPredictionContentParam] = None,
    audio: Optional[ChatCompletionAudioParam] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,
    user: Optional[str] = None,
    response_format: Optional[Union[dict, Type[BaseModel]]] = None,
    seed: Optional[int] = None,
    tools: Optional[List] = None,
    tool_choice: Optional[Union[str, dict]] = None,
    parallel_tool_calls: Optional[bool] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    deployment_id=None,
    reasoning_effort: Optional[Literal["none", "minimal", "low", "medium", "high", "xhigh", "default"]] = None,
    base_url: Optional[str] = None,
    api_version: Optional[str] = None,
    api_key: Optional[str] = None,
    model_list: Optional[list] = None,
    **kwargs,
) -> Union[ModelResponse, CustomStreamWrapper]:
```
Import:
```python
import litellm
# or
from litellm import completion, acompletion
```
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| model | str | Required. The model identifier, optionally with a provider prefix (e.g., "gpt-4", "anthropic/claude-3-opus-20240229", "azure/my-deployment"). |
| messages | List | Required. A list of message dictionaries following the OpenAI Chat Completions message format. |
| temperature | Optional[float] | Controls randomness of the output. Values between 0.0 and 2.0. |
| max_tokens | Optional[int] | Maximum number of tokens in the generated completion. |
| stream | Optional[bool] | If True, returns a CustomStreamWrapper that yields chunks. |
| tools | Optional[List] | List of tool definitions for function calling (see the tool-calling example after this table). |
| tool_choice | Optional[Union[str, dict]] | Controls which tool the model calls ("auto", "none", or a specific tool). |
| response_format | Optional[Union[dict, Type[BaseModel]]] | Constrains the output format (e.g., JSON mode or a structured output schema). |
| api_key | Optional[str] | Per-call API key override. |
| base_url | Optional[str] | Per-call API base URL override. |
| timeout | Optional[Union[float, str, httpx.Timeout]] | Request timeout in seconds. |
| **kwargs | Any | Additional provider-specific parameters passed through to the handler. |
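The tools and tool_choice parameters follow the OpenAI function-calling schema. A minimal sketch, assuming a hypothetical get_weather tool definition and a model that supports tool use:
```python
import litellm

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)

# If the model decides to call the tool, the arguments arrive as a JSON string.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```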
Outputs
| Output | Type | Description |
|---|---|---|
| Non-streaming response | ModelResponse | An OpenAI-compatible response object with choices, usage, model, id, and created fields (see the example after this table). |
| Streaming response | CustomStreamWrapper | An iterable/async-iterable that yields ModelResponseStream chunk objects. |
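Both response shapes mirror the corresponding OpenAI objects, so the usual attribute access applies. A minimal sketch of reading the non-streaming fields listed above:
```python
import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hi"}],
)

print(response.id)                          # request identifier
print(response.model)                       # model that served the request
print(response.created)                     # Unix timestamp
print(response.choices[0].message.content)  # generated text
print(response.usage.total_tokens)          # prompt + completion token count
```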
Usage Examples
Basic synchronous completion:
```python
import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
```
Async completion:
```python
import asyncio
import litellm

async def main():
    response = await litellm.acompletion(
        model="anthropic/claude-3-opus-20240229",
        messages=[{"role": "user", "content": "Explain quantum computing."}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```
Streaming completion:
```python
import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a short poem."}],
    stream=True,
)
for chunk in response:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")
```
Cross-provider usage:
```python
import litellm

# Same function, different providers
for model in ["gpt-4", "anthropic/claude-3-opus-20240229", "azure/my-gpt4-deployment"]:
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(f"{model}: {response.choices[0].message.content}")
```