# Workflow: BerriAI LiteLLM SDK Completion
| Knowledge Sources | Details |
|---|---|
| Domains | LLM_Ops, AI_Engineering |
| Last Updated | 2026-02-15 16:00 GMT |
## Overview
End-to-end process for making LLM API calls across 100+ providers using a single unified Python interface.
## Description
This workflow outlines the standard procedure for calling any Large Language Model provider through LiteLLM's unified completion API. The core function `litellm.completion()` accepts an OpenAI-compatible request format and transparently routes it to any supported provider (OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Azure, Cohere, and 100+ others). The library handles provider-specific request transformation, response normalization, error mapping, token counting, and cost calculation automatically.
Key outputs:
- A normalized `ModelResponse` object containing the completion text, usage statistics, and cost data
- Support for both synchronous and asynchronous execution
- Streaming response support via Server-Sent Events
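The round trip can be sketched as follows. This is a minimal illustration, assuming `litellm` is installed (`pip install litellm`); the live call is guarded so the snippet is harmless without credentials.

```python
# Minimal round trip using the OpenAI-compatible request shape.
# The live call is guarded so the sketch runs even without credentials.
import os

request = {
    "model": "openai/gpt-4o",  # provider-prefixed model name
    "messages": [{"role": "user", "content": "Say hello."}],
}

if os.environ.get("OPENAI_API_KEY"):
    from litellm import completion

    response = completion(**request)  # returns a normalized ModelResponse
    print(response.choices[0].message.content)
    print(response.usage.total_tokens)
```

The same `request` dict works unchanged against any provider; only the `model` prefix changes.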
## Usage
Execute this workflow when you need to call any LLM provider from Python code and want a consistent OpenAI-compatible interface regardless of the underlying provider. This is the foundational workflow for all LiteLLM usage, whether building applications, running experiments, or integrating LLMs into existing systems.
## Execution Steps
### Step 1: Environment Configuration
Set provider API keys as environment variables or prepare them as function parameters. Each provider requires its own authentication credentials (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `AWS_ACCESS_KEY_ID`). LiteLLM reads these automatically from the environment.
Key considerations:
- API keys can be set via environment variables, passed directly, or managed through secret managers
- Multiple providers can be configured simultaneously
- Some providers (Bedrock, Vertex AI) use IAM-based authentication rather than API keys
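Both credential routes can be sketched as below. The key values are placeholders for illustration only; real keys come from your environment or a secret manager.

```python
# Placeholder credentials for illustration only -- never hard-code real keys.
import os

# Environment-variable route: LiteLLM picks these up automatically.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")
os.environ.setdefault("ANTHROPIC_API_KEY", "sk-ant-placeholder")

# Direct-parameter route: completion(..., api_key=...) overrides the
# environment, which suits keys fetched from a secret manager at runtime.
openai_key = os.environ["OPENAI_API_KEY"]
```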
### Step 2: Model Selection
Choose the target model using LiteLLM's provider-prefixed naming convention. The model string format is `provider/model_name` (e.g., `openai/gpt-4o`, `anthropic/claude-sonnet-4-20250514`, `bedrock/anthropic.claude-3-sonnet`). LiteLLM uses this prefix to resolve the correct provider handler.
Key considerations:
- Provider prefix is required for unambiguous routing
- Model pricing and context window limits are looked up from the internal cost map
- Custom or self-hosted models can be specified with the `custom_llm_provider` parameter
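The prefix convention itself is simple enough to illustrate with a hypothetical helper. This is not LiteLLM's code: the real resolution happens inside `get_llm_provider()` and covers many more cases (aliases, custom providers, inference from known model names).

```python
# Hypothetical helper mirroring the provider-prefix convention.
def split_model(model: str) -> tuple[str, str]:
    provider, _, name = model.partition("/")
    if not name:
        # No prefix: LiteLLM tries to infer the provider from the model name.
        return "", model
    return provider, name

print(split_model("bedrock/anthropic.claude-3-sonnet"))
# ('bedrock', 'anthropic.claude-3-sonnet')
```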
### Step 3: Request Construction
Build the request using OpenAI-compatible parameters: `messages` (list of role/content dicts), `model`, and optional parameters like `temperature`, `max_tokens`, `tools`, `response_format`, and `stream`. LiteLLM accepts the full OpenAI parameter set and maps supported parameters to each provider.
Key considerations:
- Messages follow the OpenAI chat format with roles: system, user, assistant, tool
- Not all parameters are supported by every provider; setting `litellm.drop_params = True` drops unsupported params silently instead of raising an error
- Function calling and tool use follow the OpenAI tools specification
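A typical request body looks like the sketch below: plain OpenAI-compatible shapes, with no API call made. The commented-out entries show where the optional parameters slot in.

```python
# Request construction only -- OpenAI-compatible shapes, no API call made.
messages = [
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user", "content": "Summarize LiteLLM in one sentence."},
]

params = {
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": messages,
    "temperature": 0.2,
    "max_tokens": 128,
    # "tools": [...],                              # OpenAI tools spec
    # "response_format": {"type": "json_object"},  # structured output
    # "stream": True,                              # Server-Sent Events
}
```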
### Step 4: API Dispatch
Call `litellm.completion()` (sync) or `litellm.acompletion()` (async). Internally, LiteLLM resolves the provider via `get_llm_provider()`, transforms the request into the provider-specific format, and dispatches the HTTP call through the appropriate handler.
What happens:
- Provider is resolved from the model string
- Request is transformed to provider-specific format (e.g., Anthropic Messages API, Bedrock InvokeModel)
- HTTP call is made with appropriate headers, auth, and timeout
- Pre-call and post-call logging callbacks are triggered
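The two dispatch modes can be sketched side by side. This is illustrative: the imports are deferred and the calls guarded, so nothing is sent without `litellm` installed and a key configured.

```python
# Sync vs. async dispatch (sketch; no request is sent without credentials).
import asyncio
import os

PROMPT = [{"role": "user", "content": "ping"}]

def call_sync():
    from litellm import completion
    return completion(model="openai/gpt-4o", messages=PROMPT)

async def call_async():
    from litellm import acompletion
    return await acompletion(model="openai/gpt-4o", messages=PROMPT)

if os.environ.get("OPENAI_API_KEY"):
    print(call_sync().choices[0].message.content)
    print(asyncio.run(call_async()).choices[0].message.content)
```

Both variants return the same normalized response type, so downstream handling code is identical.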
### Step 5: Response Normalization
The provider-specific response is transformed back into an OpenAI-compatible `ModelResponse` object. This includes normalizing the completion text, usage statistics (prompt tokens, completion tokens, total tokens), finish reason, and any tool calls. Cost is calculated based on the model's pricing data.
Key considerations:
- All providers return the same `ModelResponse` structure regardless of native format
- Token usage is counted and standardized across providers
- Cost is calculated automatically using the internal pricing database
- Streaming responses are wrapped in a unified `CustomStreamWrapper`
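The normalized fields every provider maps into can be shown with stand-in dataclasses. These are illustrative stand-ins, not LiteLLM's actual classes; real objects are litellm's `ModelResponse` and its nested usage object.

```python
# Stand-in dataclasses showing the normalized fields (illustrative only).
from dataclasses import dataclass

@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

@dataclass
class NormalizedResponse:   # stands in for litellm's ModelResponse
    content: str            # completion text
    finish_reason: str      # "stop", "length", "tool_calls", ...
    usage: Usage

resp = NormalizedResponse(
    content="Hello!",
    finish_reason="stop",
    usage=Usage(prompt_tokens=12, completion_tokens=3, total_tokens=15),
)
assert resp.usage.total_tokens == resp.usage.prompt_tokens + resp.usage.completion_tokens
```

On a real response, the computed cost can be read via `litellm.completion_cost(completion_response=response)`.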
### Step 6: Error Handling
If the provider returns an error, LiteLLM maps it to the corresponding OpenAI-compatible exception class (e.g., `RateLimitError`, `AuthenticationError`, `ContextWindowExceededError`). This allows applications to use a single error handling strategy regardless of provider.
Key considerations:
- Provider-specific HTTP errors are mapped to OpenAI exception types
- Timeout handling is unified across providers
- Retry logic can be configured via the `num_retries` parameter
- Fallback models can be specified via the `fallbacks` parameter
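One provider-agnostic error-handling pattern is sketched below. The import is deferred and the call guarded, so the sketch loads without `litellm` installed; the recovery actions in each branch are illustrative choices, not prescribed behavior.

```python
# One error-handling strategy for all providers (sketch; the call only
# fires when a key is configured).
import os

def ask(prompt: str):
    import litellm  # deferred so the sketch loads without the package
    try:
        return litellm.completion(
            model="openai/gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            num_retries=2,  # retry transient failures before raising
        )
    except litellm.RateLimitError:
        return None   # back off and queue for later
    except litellm.ContextWindowExceededError:
        return None   # shrink the prompt and retry
    except litellm.AuthenticationError:
        raise         # misconfigured credentials: fail loudly

if os.environ.get("OPENAI_API_KEY"):
    print(ask("ping"))
```

Because the exception types are the same for every provider, swapping the `model` string requires no changes to this handler.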