
Workflow:BerriAI Litellm Fine Tuning Job

From Leeroopedia
Knowledge Sources
Domains LLM_Ops, Fine_Tuning, Model_Training
Last Updated 2026-02-15 16:00 GMT

Overview

End-to-end process for creating and managing LLM fine-tuning jobs across providers through LiteLLM's unified fine-tuning API.

Description

This workflow covers the use of LiteLLM's unified fine-tuning API to create, monitor, and manage fine-tuning jobs across multiple LLM providers (OpenAI, Azure OpenAI, Vertex AI). The API follows OpenAI's fine-tuning interface while routing to the appropriate provider handler. It supports uploading training data, configuring hyperparameters, tracking job progress, and using the resulting fine-tuned model for inference.

Key outputs:

  • A fine-tuned model registered with the provider, ready for inference
  • Job status tracking with event logs
  • Support for custom hyperparameters (learning rate, epochs, batch size)
  • Cross-provider compatibility for fine-tuning operations

Usage

Execute this workflow when you have domain-specific training data and need to fine-tune a base LLM to improve performance on your particular task. This is appropriate when prompt engineering alone is insufficient and you need the model to learn from examples in your dataset.

Execution Steps

Step 1: Training Data Preparation

Prepare training data in JSONL format following the provider's expected schema. For chat models, each line contains a messages array with role/content pairs representing a complete conversation example. Optionally prepare a separate validation dataset for evaluating training progress.

Key considerations:

  • OpenAI expects JSONL with messages arrays for chat fine-tuning
  • Each example should represent a complete, high-quality interaction
  • Validation data helps monitor overfitting during training
  • Data quality directly impacts fine-tuning effectiveness
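The JSONL layout described above can be produced with a few lines of standard-library Python. This is a minimal sketch; the file name and example conversations are placeholders, not part of any LiteLLM API.

```python
import json

# Two illustrative chat examples (contents are placeholders).
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a support assistant."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Go to Settings > Security and choose Reset Password."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a support assistant."},
            {"role": "user", "content": "Where can I download my invoice?"},
            {"role": "assistant", "content": "Invoices are listed under Billing > History."},
        ]
    },
]

# Write one JSON object per line -- the JSONL shape OpenAI expects for chat fine-tuning.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

A validation set, if used, is written the same way to a separate file.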

Step 2: Training File Upload

Upload the training data file using litellm.create_file() with purpose="fine-tune". This routes to the appropriate provider's file upload API and returns a file object with an ID. The file ID is used in the subsequent fine-tuning job creation.

Key considerations:

  • The file upload API supports OpenAI, Azure, Bedrock, and Vertex AI providers
  • File IDs are provider-specific and must be used with the same provider
  • Large files may take time to process after upload
  • Validation files are uploaded separately with the same purpose
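The upload step can be sketched as below. The helper name is hypothetical; the call itself uses litellm.create_file() as described above, and requires the provider's API key in the environment (e.g. OPENAI_API_KEY), so the usage line is shown commented out.

```python
def upload_fine_tune_file(path: str, provider: str = "openai"):
    """Upload a JSONL training (or validation) file; returns a file object with an .id."""
    import litellm  # imported lazily so the module loads even without credentials configured

    with open(path, "rb") as f:
        return litellm.create_file(
            file=f,
            purpose="fine-tune",           # marks the file for fine-tuning use
            custom_llm_provider=provider,  # routes to the matching provider handler
        )

# Usage (requires provider credentials, so not run here):
# training_file = upload_fine_tune_file("train.jsonl")
# print(training_file.id)
```

The returned file ID is what gets passed to the job-creation step, and it is only valid with the same provider it was uploaded to.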

Step 3: Fine-Tuning Job Creation

Create a fine-tuning job using litellm.create_fine_tuning_job() with the base model, training file ID, and optional hyperparameters. The function routes to the correct provider based on the custom_llm_provider parameter and returns a job object with a unique job ID and initial status.

Key considerations:

  • model specifies the base model to fine-tune (e.g., gpt-4o-mini-2024-07-18)
  • training_file is the file ID from the upload step
  • hyperparameters can include n_epochs, learning_rate_multiplier, batch_size
  • The suffix parameter adds a custom identifier to the fine-tuned model name
  • Job creation is asynchronous; training happens in the background
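Putting the parameters above together, job creation looks roughly like this. The helper name, suffix value, and hyperparameter numbers are illustrative assumptions; the litellm.create_fine_tuning_job() call and its parameter names follow the description above.

```python
def start_fine_tuning_job(training_file_id: str, provider: str = "openai"):
    """Create a fine-tuning job; returns a job object with an .id and .status."""
    import litellm  # lazy import: only needed when a job is actually created

    return litellm.create_fine_tuning_job(
        model="gpt-4o-mini-2024-07-18",  # base model to fine-tune
        training_file=training_file_id,  # file ID from the upload step
        custom_llm_provider=provider,    # selects the provider handler
        suffix="my-experiment",          # custom tag in the fine-tuned model name (example value)
        hyperparameters={
            "n_epochs": 3,
            "learning_rate_multiplier": 1.8,
            "batch_size": 4,
        },
    )

# Requires credentials and a previously uploaded file, so not run here:
# job = start_fine_tuning_job(training_file.id)
# print(job.id, job.status)
```

Because training runs in the background, the returned job object only reflects the initial status; progress is tracked in the next step.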

Step 4: Job Progress Monitoring

Monitor the fine-tuning job progress by polling job events with litellm.list_fine_tuning_events() and checking job status. Events include training loss metrics, validation results, and completion notifications. The job transitions through states: validating_files, queued, running, and finally succeeded or failed.

Key considerations:

  • Events provide training loss and validation metrics at each step
  • Job status can be checked via retrieve_fine_tuning_job()
  • Training duration depends on dataset size, model size, and number of epochs
  • Failed jobs include error messages describing the failure reason
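A simple polling loop for the status check might look like the following sketch. It uses retrieve_fine_tuning_job() as mentioned above; the helper name, polling interval, and terminal-state set are assumptions, and event streaming is omitted for brevity.

```python
import time

def wait_for_job(job_id: str, provider: str = "openai", poll_seconds: int = 60):
    """Poll the job's status until it reaches a terminal state; return the final job object."""
    import litellm  # lazy import so the module loads without credentials configured

    terminal = {"succeeded", "failed", "cancelled"}  # assumed terminal states
    while True:
        job = litellm.retrieve_fine_tuning_job(
            fine_tuning_job_id=job_id,
            custom_llm_provider=provider,
        )
        if job.status in terminal:
            return job
        time.sleep(poll_seconds)  # avoid hammering the provider's API

# final = wait_for_job(job.id)  # blocks until training finishes
```

For long-running jobs, a webhook or a scheduled check is often preferable to a blocking loop like this one.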

Step 5: Fine-Tuned Model Usage

Once the job succeeds, the provider returns a fine-tuned model identifier that can be used in regular completion() calls. The fine-tuned model retains the base model's capabilities while incorporating learned patterns from the training data.

Key considerations:

  • The fine-tuned model ID is available in the completed job object
  • Use the model ID with the same provider prefix for inference
  • Fine-tuned models have the same context window and capabilities as the base model
  • Pricing for fine-tuned models typically differs from the base model
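Inference with the resulting model is an ordinary completion() call, as sketched below. The helper name and the model-ID string are placeholders; the real identifier comes from the completed job object.

```python
def query_fine_tuned_model(model_id: str, prompt: str) -> str:
    """Send a single user message to the fine-tuned model and return the reply text."""
    import litellm  # lazy import so the module loads without credentials configured

    response = litellm.completion(
        model=model_id,  # fine-tuned model ID from the completed job, e.g. an "ft:..." name for OpenAI
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Requires credentials and a completed job, so not run here:
# answer = query_fine_tuned_model(final.fine_tuned_model, "How do I reset my password?")
```

Since the call is routed like any other completion, existing LiteLLM features (retries, fallbacks, logging) apply to the fine-tuned model unchanged.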

Execution Diagram

GitHub URL

Workflow Repository