Implementation:BerriAI Litellm Completion
| Knowledge Sources | BerriAI/litellm - litellm/main.py |
|---|---|
| Domains | LLM Integration, API Routing, Provider Abstraction |
| Last Updated | 2026-02-15 |
Overview
Concrete tool for dispatching LLM chat completion calls through a unified interface, provided by the litellm Python package via the completion() and acompletion() functions in litellm/main.py.
Description
The completion() function is the primary entry point for all synchronous LLM chat completion calls in LiteLLM. It accepts an OpenAI-compatible parameter set, resolves the target provider using get_llm_provider(), transforms the request for the specific provider, invokes the provider handler, normalizes the response into a ModelResponse (or CustomStreamWrapper for streaming), and applies cross-cutting concerns including logging, caching, and callback invocation. The acompletion() function provides the equivalent asynchronous path using async/await.
Together, these functions support 100+ LLM providers through a single function signature, handling parameter transformation, authentication, streaming, function calling, tool use, and error mapping transparently.
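Because provider errors are mapped into a common exception hierarchy, callers can handle failures the same way regardless of backend. A minimal sketch of catching these mapped errors, assuming the exception classes are exported at the package top level (litellm.RateLimitError, litellm.APIConnectionError) as in recent releases:
```python
import litellm

try:
    response = litellm.completion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],
    )
except litellm.RateLimitError:
    # The provider signalled rate limiting; back off and retry later.
    pass
except litellm.APIConnectionError as err:
    # Network-level failure reaching the provider.
    print(f"connection error: {err}")
```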
Usage
Call litellm.completion() for synchronous completions or litellm.acompletion() for asynchronous completions. These are typically the only functions end users interact with directly.
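A minimal sketch of the typical call pattern, assuming credentials are supplied through the provider's conventional environment variable (here OPENAI_API_KEY); per-call overrides via api_key are listed in the I/O Contract below:
```python
import os
import litellm

# Keys are read from the provider's standard environment variable.
os.environ["OPENAI_API_KEY"] = "sk-..."

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "ping"}],
)
```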
Code Reference
Source Location: litellm/main.py, lines 999-7447 (sync completion), lines 371-998 (async acompletion)
Signature (completion):
```python
def completion(
    model: str,
    messages: List = [],
    timeout: Optional[Union[float, str, httpx.Timeout]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    n: Optional[int] = None,
    stream: Optional[bool] = None,
    stream_options: Optional[dict] = None,
    stop=None,
    max_completion_tokens: Optional[int] = None,
    max_tokens: Optional[int] = None,
    modalities: Optional[List[ChatCompletionModality]] = None,
    prediction: Optional[ChatCompletionPredictionContentParam] = None,
    audio: Optional[ChatCompletionAudioParam] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,
    user: Optional[str] = None,
    reasoning_effort: Optional[Literal["none", "minimal", "low", "medium", "high", "xhigh", "default"]] = None,
    response_format: Optional[Union[dict, Type[BaseModel]]] = None,
    seed: Optional[int] = None,
    tools: Optional[List] = None,
    tool_choice: Optional[Union[str, dict]] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    parallel_tool_calls: Optional[bool] = None,
    deployment_id=None,
    extra_headers: Optional[dict] = None,
    functions: Optional[List] = None,
    function_call: Optional[str] = None,
    base_url: Optional[str] = None,
    api_version: Optional[str] = None,
    api_key: Optional[str] = None,
    model_list: Optional[list] = None,
    thinking: Optional[AnthropicThinkingParam] = None,
    **kwargs,
) -> Union[ModelResponse, CustomStreamWrapper]:
```
Signature (acompletion):
```python
async def acompletion(
    model: str,
    messages: List = [],
    functions: Optional[List] = None,
    function_call: Optional[str] = None,
    timeout: Optional[Union[float, int]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    n: Optional[int] = None,
    stream: Optional[bool] = None,
    stream_options: Optional[dict] = None,
    stop=None,
    max_tokens: Optional[int] = None,
    max_completion_tokens: Optional[int] = None,
    modalities: Optional[List[ChatCompletionModality]] = None,
    prediction: Optional[ChatCompletionPredictionContentParam] = None,
    audio: Optional[ChatCompletionAudioParam] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,
    user: Optional[str] = None,
    response_format: Optional[Union[dict, Type[BaseModel]]] = None,
    seed: Optional[int] = None,
    tools: Optional[List] = None,
    tool_choice: Optional[Union[str, dict]] = None,
    parallel_tool_calls: Optional[bool] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    deployment_id=None,
    reasoning_effort: Optional[Literal["none", "minimal", "low", "medium", "high", "xhigh", "default"]] = None,
    base_url: Optional[str] = None,
    api_version: Optional[str] = None,
    api_key: Optional[str] = None,
    model_list: Optional[list] = None,
    **kwargs,
) -> Union[ModelResponse, CustomStreamWrapper]:
```
Import:
```python
import litellm
# or
from litellm import completion, acompletion
```
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| model | str | Required. The model identifier, optionally with a provider prefix (e.g., "gpt-4", "anthropic/claude-3-opus-20240229", "azure/my-deployment"). |
| messages | List | Required. A list of message dictionaries following the OpenAI Chat Completions message format. |
| temperature | Optional[float] | Controls randomness of the output. Values between 0.0 and 2.0. |
| max_tokens | Optional[int] | Maximum number of tokens in the generated completion. |
| stream | Optional[bool] | If True, returns a CustomStreamWrapper that yields chunks. |
| tools | Optional[List] | List of tool definitions for function calling (see the tool-calling example after this table). |
| tool_choice | Optional[Union[str, dict]] | Controls which tool the model calls ("auto", "none", or a specific tool). |
| response_format | Optional[Union[dict, Type[BaseModel]]] | Constrains the output format (e.g., JSON mode or a structured output schema). |
| api_key | Optional[str] | Per-call API key override. |
| base_url | Optional[str] | Per-call API base URL override. |
| timeout | Optional[Union[float, str, httpx.Timeout]] | Request timeout in seconds. |
| **kwargs | Any | Additional provider-specific parameters passed through to the handler. |
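The tools and tool_choice parameters follow the OpenAI function-calling schema. A minimal sketch, assuming a hypothetical get_weather tool definition and a model that supports tool use:
```python
import litellm

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)

# If the model decides to call the tool, the arguments arrive as a JSON string.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```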
Outputs
| Output | Type | Description |
|---|---|---|
| Non-streaming response | ModelResponse | An OpenAI-compatible response object with choices, usage, model, id, and created fields (see the example after this table). |
| Streaming response | CustomStreamWrapper | An iterable/async-iterable that yields ModelResponseStream chunk objects. |
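Both response shapes mirror the corresponding OpenAI objects, so the usual attribute access applies. A minimal sketch of reading the non-streaming fields listed above:
```python
import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hi"}],
)

print(response.id)                          # request identifier
print(response.model)                       # model that served the request
print(response.created)                     # Unix timestamp
print(response.choices[0].message.content)  # generated text
print(response.usage.total_tokens)          # prompt + completion token count
```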
Usage Examples
Basic synchronous completion:
```python
import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
```
Async completion:
```python
import asyncio
import litellm

async def main():
    response = await litellm.acompletion(
        model="anthropic/claude-3-opus-20240229",
        messages=[{"role": "user", "content": "Explain quantum computing."}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```
Streaming completion:
```python
import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a short poem."}],
    stream=True,
)
for chunk in response:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")
```
Cross-provider usage:
```python
import litellm

# Same function, different providers
for model in ["gpt-4", "anthropic/claude-3-opus-20240229", "azure/my-gpt4-deployment"]:
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(f"{model}: {response.choices[0].message.content}")
```