Workflow:Run llama Llama index OpenAI LLM Finetuning

Knowledge Sources	LlamaIndex LlamaIndex Docs OpenAI Fine-tuning
Domains	LLMs, Fine_Tuning, LLM_Ops
Last Updated	2026-02-11 19:00 GMT

Overview

End-to-end process for fine-tuning an OpenAI LLM on custom training data and integrating the resulting model into a LlamaIndex application.

Description

This workflow provides a managed interface for fine-tuning OpenAI models (such as GPT-3.5 Turbo or GPT-4) on domain-specific training data. It handles JSONL data validation, file upload to the OpenAI API, job creation and monitoring, and retrieval of the fine-tuned model as a LlamaIndex LLM instance. The workflow also supports the FinetuningHandler callback, which can capture LLM interactions during normal usage and automatically format them as training examples.

Usage

Execute this workflow when you need to adapt an OpenAI model to follow specific response patterns, adhere to domain-specific formats, or improve performance on specialized tasks. This requires an OpenAI API key with fine-tuning access and a JSONL training dataset in the OpenAI chat completion format.

Execution Steps

Step 1: Prepare Training Data

Create a JSONL file with training examples in the OpenAI chat completion format. Each line contains a messages array with system, user, and assistant roles. Alternatively, use the FinetuningHandler callback to automatically capture LLM interactions during normal usage and export them as training data.

Key considerations:

Each JSONL line must have a messages array with role/content pairs
Minimum of 10 training examples required by OpenAI
The FinetuningHandler attaches to the callback manager to capture events
Captured events can be exported as JSONL with finetuning_handler.get_finetuning_events()

Step 2: Validate Training Data

Run the built-in JSON validator to check the training data format, token counts, and cost estimates before uploading. The validator reports formatting errors, missing keys, and provides statistics on token distribution.

Key considerations:

Set validate_json=True when creating the engine (default)
The validator checks message structure, role values, and content format
Token count warnings help estimate fine-tuning cost
Fix any validation errors before proceeding

Step 3: Launch Fine-tuning Job

Create the OpenAIFinetuneEngine with the base model name and training data path, then call finetune(). This uploads the training file to OpenAI and creates a fine-tuning job. The method returns immediately; training happens asynchronously on OpenAI servers.

Key considerations:

Supported base models include gpt-3.5-turbo and gpt-4 variants
The training file is uploaded via the OpenAI Files API
A start_job_id can be provided to resume monitoring an existing job
Additional hyperparameters can be passed via the OpenAI API

Step 4: Monitor Job Status

Poll the fine-tuning job status using get_current_job(). The job progresses through queued, running, and succeeded (or failed) states. Wait for the job to reach the succeeded state before proceeding.

Key considerations:

Job status values: queued, validating_files, running, succeeded, failed, cancelled
Training duration depends on dataset size and model
Job events and metrics are available via the OpenAI dashboard

Step 5: Retrieve Fine-tuned Model

Once the job succeeds, call get_finetuned_model() to create a LlamaIndex OpenAI LLM instance configured with the fine-tuned model ID. This model can be used as a drop-in replacement anywhere a standard LLM is used in LlamaIndex.

Key considerations:

The fine-tuned model ID is automatically extracted from the completed job
Additional model kwargs (temperature, max_tokens) can be passed
The returned LLM instance integrates with all LlamaIndex components

Step 6: Integrate into Application

Use the fine-tuned LLM in a LlamaIndex pipeline by passing it to Settings.llm, a query engine, or an agent. The fine-tuned model produces responses aligned with the training data patterns while retaining its base capabilities.

Key considerations:

Set Settings.llm = finetuned_llm for global usage
Or pass llm= directly to specific components for targeted usage
Evaluate the fine-tuned model against the base model to measure improvement

Execution Diagram

GitHub URL

Workflow Repository