Principle:BerriAI Litellm Fine Tuned Model Usage
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| Transfer Learning Deployment, Model Serving Patterns, Unified LLM Interface Design | Machine Learning Operations, API Design, Model Deployment | 2026-02-15 |
Overview
Fine-tuned model usage is the practice of invoking a customized model through the same unified completion interface used for base models, using provider-specific model identifiers obtained from successful fine-tuning jobs.
Description
After a fine-tuning job completes successfully, the provider assigns a unique model identifier to the resulting fine-tuned model (e.g., ft:gpt-3.5-turbo:my-org:custom-suffix:abc123 for OpenAI). This identifier can then be used in place of a standard model name when making completion requests. The fundamental design principle is that fine-tuned models are transparent replacements for base models: the caller uses the exact same completion interface, with the only difference being the model identifier string.
This transparency provides several advantages:
- Minimal code changes: Existing application code that calls the completion API works with a fine-tuned model once the model parameter is changed; no other call-site changes are needed.
- Unified interface: The same function handles routing, authentication, streaming, function calling, and all other features regardless of whether the model is a base model or a fine-tuned variant.
- Provider abstraction: The caller specifies the provider prefix (e.g., openai/, azure/) followed by the fine-tuned model identifier, and the framework handles all provider-specific details.
The key challenge is correctly identifying the provider for a fine-tuned model identifier, since fine-tuned model names may follow provider-specific naming conventions that differ from standard model names.
Usage
Fine-tuned model usage applies when:
- A fine-tuning job has completed successfully and produced a model identifier.
- The fine-tuned model needs to be integrated into existing application workflows.
- Multiple fine-tuned models need to be compared against each other or against base models.
- A production system needs to switch between base and fine-tuned models based on configuration.
- Load balancing or fallback logic needs to include fine-tuned models alongside base models.
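The configuration-driven case in the list above can be sketched as follows. This is a minimal illustration, not part of LiteLLM: the environment variable name `APP_FINE_TUNED_MODEL`, the `BASE_MODEL` constant, and both helper functions are hypothetical names chosen for this example.

```python
import os

# Hypothetical default; any base model name the application already uses would do.
BASE_MODEL = "openai/gpt-3.5-turbo"


def select_model(env=None):
    """Return the configured fine-tuned model identifier, or the base model if unset."""
    env = os.environ if env is None else env
    # APP_FINE_TUNED_MODEL is an assumed variable name, not a LiteLLM setting.
    return env.get("APP_FINE_TUNED_MODEL") or BASE_MODEL


def build_completion_kwargs(messages, env=None):
    """Build the keyword arguments for a completion call; only `model` varies."""
    return {"model": select_model(env), "messages": messages}
```

The resulting dictionary could then be passed to the completion function as keyword arguments, so switching between the base and the fine-tuned model is a configuration change rather than a code change.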
Theoretical Basis
Model Identifier Resolution
When a completion request is made with a fine-tuned model identifier, the framework must resolve which provider to route the request to. This resolution follows a pattern:
    FUNCTION resolve_model_provider(model_identifier):
        IF model_identifier contains explicit provider prefix (e.g., "openai/ft:..."):
            EXTRACT provider from prefix
            EXTRACT model_name from remainder
        ELSE IF model_identifier matches known fine-tuned naming pattern:
            OpenAI: starts with "ft:" or "ft-"
            Azure: deployment name format
            Vertex AI: tuned model resource path
            INFER provider from pattern
        ELSE:
            FALL BACK to default provider resolution logic
        RETURN (provider, model_name)
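The resolution pattern above could look roughly like this in Python. It is a simplified sketch and not LiteLLM's actual resolution code: the `KNOWN_PROVIDERS` set, the `VERTEX_PATH` regular expression, and the `default_provider` parameter (a stand-in for the default resolution logic) are assumptions made for illustration.

```python
import re

# Assumed subset of provider prefixes, for illustration only.
KNOWN_PROVIDERS = {"openai", "azure", "vertex_ai"}

# Assumed shape of a Vertex AI tuned model resource path.
VERTEX_PATH = re.compile(r"^projects/[^/]+/locations/[^/]+/models/[^/]+$")


def resolve_model_provider(model_identifier, default_provider="openai"):
    """Return (provider, model_name) for a base or fine-tuned model identifier."""
    prefix, sep, remainder = model_identifier.partition("/")
    if sep and prefix in KNOWN_PROVIDERS:
        # Explicit prefix such as "openai/ft:..." or "azure/my-deployment".
        return prefix, remainder
    if model_identifier.startswith(("ft:", "ft-")):
        # OpenAI fine-tuned naming pattern.
        return "openai", model_identifier
    if VERTEX_PATH.match(model_identifier):
        # Vertex AI tuned model resource path.
        return "vertex_ai", model_identifier
    # Azure deployment names are user-defined, so they cannot be inferred
    # from the name alone; anything unrecognized falls back to the default.
    return default_provider, model_identifier
```

Because Azure deployment names carry no recognizable pattern, this sketch only routes them correctly when the `azure/` prefix is given explicitly.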
Completion Flow with Fine-Tuned Models
The completion flow is identical to base model usage, with the fine-tuned model identifier simply substituted:
    FUNCTION completion_with_fine_tuned_model(model_id, messages, params):
        1. RESOLVE provider from model_id
        2. RESOLVE credentials for provider
        3. TRANSFORM messages to provider-specific format
        4. SEND request to provider completion endpoint with model_id
        5. RECEIVE response
        6. NORMALIZE response to unified format (ModelResponse)
        7. RETURN response

    NOTE: Steps 1-7 are identical to base model completion.
          The only difference is the value of model_id.
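To make the "same path, different identifier" point concrete, the seven steps can be sketched with a fake transport in place of a real provider call. Everything here is hypothetical scaffolding: the `ModelResponse` dataclass only mimics the idea of a unified response, and `fake_transport` and the simplified provider resolution are assumptions, not LiteLLM internals.

```python
from dataclasses import dataclass


@dataclass
class ModelResponse:
    """Hypothetical stand-in for the unified response type."""
    model: str
    content: str


def fake_transport(provider, payload, api_key):
    """Stand-in for the provider HTTP call: echoes the last message."""
    last = payload["messages"][-1]["content"]
    return {"model": payload["model"], "text": "echo: " + last}


def completion(model_id, messages, credentials=None, transport=fake_transport):
    # 1. Resolve provider (simplified: explicit "provider/" prefix, else assume openai).
    provider, sep, model_name = model_id.partition("/")
    if not sep:
        provider, model_name = "openai", model_id
    # 2. Resolve credentials for the provider.
    api_key = (credentials or {}).get(provider, "")
    # 3. Transform messages into the provider payload.
    payload = {"model": model_name, "messages": [dict(m) for m in messages]}
    # 4-5. Send the request and receive the raw response.
    raw = transport(provider, payload, api_key)
    # 6-7. Normalize to the unified format and return.
    return ModelResponse(model=raw["model"], content=raw["text"])
```

Calling `completion` with a base model name and with a fine-tuned identifier exercises exactly the same code; only the `model` field of the payload differs.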
Provider-Specific Model Identifiers
Each provider uses a different naming convention for fine-tuned models:
| Provider | Fine-Tuned Model Identifier Pattern | Example |
|---|---|---|
| OpenAI | ft:{base_model}:{org}:{suffix}:{id} | ft:gpt-3.5-turbo:my-org:custom:abc123 |
| Azure OpenAI | Deployment name (user-defined) | my-fine-tuned-deployment |
| Vertex AI | Tuned model resource path | projects/{project}/locations/{location}/models/{model_id} |
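As an illustration of the OpenAI pattern in the table, the identifier can be split into its fields. This sketch assumes the five-part colon-separated form shown above; the `OpenAIFineTune` type and `parse_openai_fine_tune` function are hypothetical helpers, and identifiers in other shapes simply return `None`.

```python
from typing import NamedTuple, Optional


class OpenAIFineTune(NamedTuple):
    """Fields of an identifier shaped like ft:{base_model}:{org}:{suffix}:{id}."""
    base_model: str
    org: str
    suffix: str
    job_id: str


def parse_openai_fine_tune(identifier: str) -> Optional[OpenAIFineTune]:
    """Split an OpenAI fine-tuned identifier; return None if it does not match."""
    parts = identifier.split(":")
    if len(parts) != 5 or parts[0] != "ft":
        return None
    return OpenAIFineTune(*parts[1:])
```

A helper like this can be useful for logging or cost attribution, for example to record which base model a fine-tuned variant derives from.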
Router Integration
Fine-tuned models integrate seamlessly with LiteLLM's router system for load balancing and fallback:
    FUNCTION configure_router_with_fine_tuned_models():
        model_list = [
            {
                "model_name": "my-fine-tuned-model",
                "litellm_params": {
                    "model": "ft:gpt-3.5-turbo:my-org:custom:abc123",
                    "api_key": "...",
                }
            },
            {
                "model_name": "my-fine-tuned-model",
                "litellm_params": {
                    "model": "azure/my-ft-deployment",
                    "api_base": "...",
                    "api_key": "...",
                }
            },
        ]
        router = Router(model_list=model_list)
        response = router.completion(
            model="my-fine-tuned-model",
            messages=[...]
        )
        RETURN response
This allows fine-tuned models from different providers to be grouped under a single logical model name with automatic failover.
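The grouping-and-failover idea can be illustrated without the router itself. The sketch below is not how LiteLLM's Router is implemented (it applies its own routing and retry strategies); `complete_with_failover` and the injected `call` function are hypothetical, and the only thing borrowed from the configuration above is the `model_name` / `litellm_params` entry shape.

```python
def complete_with_failover(model_list, model_name, messages, call):
    """Try each deployment registered under model_name until one succeeds.

    `call` is an injected completion function (an assumption of this sketch)
    that accepts `messages` plus the deployment's parameters as keywords.
    """
    deployments = [
        entry["litellm_params"]
        for entry in model_list
        if entry["model_name"] == model_name
    ]
    if not deployments:
        raise ValueError(f"no deployments registered for {model_name!r}")

    last_error = None
    for params in deployments:
        try:
            return call(messages=messages, **params)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            last_error = exc
    raise RuntimeError(f"all deployments failed for {model_name!r}") from last_error
```

Injecting `call` keeps the sketch testable offline; in a real application the router would play this role and the caller would only ever see the logical model name.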