# Workflow: BerriAI LiteLLM SDK Completion
| Knowledge Sources | Details |
|---|---|
| Domains | LLM_Ops, AI_Engineering |
| Last Updated | 2026-02-15 16:00 GMT |
## Overview
End-to-end process for making LLM API calls across 100+ providers using a single unified Python interface.
## Description
This workflow outlines the standard procedure for calling any Large Language Model provider through LiteLLM's unified completion API. The core function `litellm.completion()` accepts an OpenAI-compatible request format and transparently routes it to any supported provider (OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Azure, Cohere, and 100+ others). The library handles provider-specific request transformation, response normalization, error mapping, token counting, and cost calculation automatically.
Key outputs:
- A normalized `ModelResponse` object containing the completion text, usage statistics, and cost data
- Support for both synchronous and asynchronous execution
- Streaming response support via Server-Sent Events
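The round trip can be sketched as follows. This is a minimal illustration, assuming `litellm` is installed (`pip install litellm`); the live call is guarded so the snippet is harmless without credentials.

```python
# Minimal round trip using the OpenAI-compatible request shape.
# The live call is guarded so the sketch runs even without credentials.
import os

request = {
    "model": "openai/gpt-4o",  # provider-prefixed model name
    "messages": [{"role": "user", "content": "Say hello."}],
}

if os.environ.get("OPENAI_API_KEY"):
    from litellm import completion

    response = completion(**request)  # returns a normalized ModelResponse
    print(response.choices[0].message.content)
    print(response.usage.total_tokens)
```

The same `request` dict works unchanged against any provider; only the `model` prefix changes.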
## Usage
Execute this workflow when you need to call any LLM provider from Python code and want a consistent OpenAI-compatible interface regardless of the underlying provider. This is the foundational workflow for all LiteLLM usage, whether building applications, running experiments, or integrating LLMs into existing systems.
## Execution Steps
### Step 1: Environment Configuration
Set provider API keys as environment variables or prepare them as function parameters. Each provider requires its own authentication credentials (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `AWS_ACCESS_KEY_ID`). LiteLLM reads these automatically from the environment.
Key considerations:
- API keys can be set via environment variables, passed directly, or managed through secret managers
- Multiple providers can be configured simultaneously
- Some providers (Bedrock, Vertex AI) use IAM-based authentication rather than API keys
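Both credential routes can be sketched as below. The key values are placeholders for illustration only; real keys come from your environment or a secret manager.

```python
# Placeholder credentials for illustration only -- never hard-code real keys.
import os

# Environment-variable route: LiteLLM picks these up automatically.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")
os.environ.setdefault("ANTHROPIC_API_KEY", "sk-ant-placeholder")

# Direct-parameter route: completion(..., api_key=...) overrides the
# environment, which suits keys fetched from a secret manager at runtime.
openai_key = os.environ["OPENAI_API_KEY"]
```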
### Step 2: Model Selection
Choose the target model using LiteLLM's provider-prefixed naming convention. The model string format is `provider/model_name` (e.g., `openai/gpt-4o`, `anthropic/claude-sonnet-4-20250514`, `bedrock/anthropic.claude-3-sonnet`). LiteLLM uses this prefix to resolve the correct provider handler.
Key considerations:
- Provider prefix is required for unambiguous routing
- Model pricing and context window limits are looked up from the internal cost map
- Custom or self-hosted models can be specified with the `custom_llm_provider` parameter
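The prefix convention itself is simple enough to illustrate with a hypothetical helper. This is not LiteLLM's code: the real resolution happens inside `get_llm_provider()` and covers many more cases (aliases, custom providers, inference from known model names).

```python
# Hypothetical helper mirroring the provider-prefix convention.
def split_model(model: str) -> tuple[str, str]:
    provider, _, name = model.partition("/")
    if not name:
        # No prefix: LiteLLM tries to infer the provider from the model name.
        return "", model
    return provider, name

print(split_model("bedrock/anthropic.claude-3-sonnet"))
# ('bedrock', 'anthropic.claude-3-sonnet')
```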
### Step 3: Request Construction
Build the request using OpenAI-compatible parameters: `messages` (list of role/content dicts), `model`, and optional parameters like `temperature`, `max_tokens`, `tools`, `response_format`, and `stream`. LiteLLM accepts the full OpenAI parameter set and maps supported parameters to each provider.
Key considerations:
- Messages follow the OpenAI chat format with roles: system, user, assistant, tool
- Not all parameters are supported by every provider; setting `litellm.drop_params = True` drops unsupported params silently instead of raising an error
- Function calling and tool use follow the OpenAI tools specification
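A typical request body looks like the sketch below: plain OpenAI-compatible shapes, with no API call made. The commented-out entries show where the optional parameters slot in.

```python
# Request construction only -- OpenAI-compatible shapes, no API call made.
messages = [
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user", "content": "Summarize LiteLLM in one sentence."},
]

params = {
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": messages,
    "temperature": 0.2,
    "max_tokens": 128,
    # "tools": [...],                              # OpenAI tools spec
    # "response_format": {"type": "json_object"},  # structured output
    # "stream": True,                              # Server-Sent Events
}
```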
### Step 4: API Dispatch
Call `litellm.completion()` (sync) or `litellm.acompletion()` (async). Internally, LiteLLM resolves the provider via `get_llm_provider()`, transforms the request into the provider-specific format, and dispatches the HTTP call through the appropriate handler.
What happens:
- Provider is resolved from the model string
- Request is transformed to provider-specific format (e.g., Anthropic Messages API, Bedrock InvokeModel)
- HTTP call is made with appropriate headers, auth, and timeout
- Pre-call and post-call logging callbacks are triggered
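The two dispatch modes can be sketched side by side. This is illustrative: the imports are deferred and the calls guarded, so nothing is sent without `litellm` installed and a key configured.

```python
# Sync vs. async dispatch (sketch; no request is sent without credentials).
import asyncio
import os

PROMPT = [{"role": "user", "content": "ping"}]

def call_sync():
    from litellm import completion
    return completion(model="openai/gpt-4o", messages=PROMPT)

async def call_async():
    from litellm import acompletion
    return await acompletion(model="openai/gpt-4o", messages=PROMPT)

if os.environ.get("OPENAI_API_KEY"):
    print(call_sync().choices[0].message.content)
    print(asyncio.run(call_async()).choices[0].message.content)
```

Both variants return the same normalized response type, so downstream handling code is identical.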
### Step 5: Response Normalization
The provider-specific response is transformed back into an OpenAI-compatible `ModelResponse` object. This includes normalizing the completion text, usage statistics (prompt tokens, completion tokens, total tokens), finish reason, and any tool calls. Cost is calculated based on the model's pricing data.
Key considerations:
- All providers return the same `ModelResponse` structure regardless of native format
- Token usage is counted and standardized across providers
- Cost is calculated automatically using the internal pricing database
- Streaming responses are wrapped in a unified `CustomStreamWrapper`
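The normalized fields every provider maps into can be shown with stand-in dataclasses. These are illustrative stand-ins, not LiteLLM's actual classes; real objects are litellm's `ModelResponse` and its nested usage object.

```python
# Stand-in dataclasses showing the normalized fields (illustrative only).
from dataclasses import dataclass

@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

@dataclass
class NormalizedResponse:   # stands in for litellm's ModelResponse
    content: str            # completion text
    finish_reason: str      # "stop", "length", "tool_calls", ...
    usage: Usage

resp = NormalizedResponse(
    content="Hello!",
    finish_reason="stop",
    usage=Usage(prompt_tokens=12, completion_tokens=3, total_tokens=15),
)
assert resp.usage.total_tokens == resp.usage.prompt_tokens + resp.usage.completion_tokens
```

On a real response, the computed cost can be read via `litellm.completion_cost(completion_response=response)`.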
### Step 6: Error Handling
If the provider returns an error, LiteLLM maps it to the corresponding OpenAI-compatible exception class (e.g., `RateLimitError`, `AuthenticationError`, `ContextWindowExceededError`). This allows applications to use a single error handling strategy regardless of provider.
Key considerations:
- Provider-specific HTTP errors are mapped to OpenAI exception types
- Timeout handling is unified across providers
- Retry logic can be configured via the `num_retries` parameter
- Fallback models can be specified via the `fallbacks` parameter
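One provider-agnostic error-handling pattern is sketched below. The import is deferred and the call guarded, so the sketch loads without `litellm` installed; the recovery actions in each branch are illustrative choices, not prescribed behavior.

```python
# One error-handling strategy for all providers (sketch; the call only
# fires when a key is configured).
import os

def ask(prompt: str):
    import litellm  # deferred so the sketch loads without the package
    try:
        return litellm.completion(
            model="openai/gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            num_retries=2,  # retry transient failures before raising
        )
    except litellm.RateLimitError:
        return None   # back off and queue for later
    except litellm.ContextWindowExceededError:
        return None   # shrink the prompt and retry
    except litellm.AuthenticationError:
        raise         # misconfigured credentials: fail loudly

if os.environ.get("OPENAI_API_KEY"):
    print(ask("ping"))
```

Because the exception types are the same for every provider, swapping the `model` string requires no changes to this handler.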