Implementation:Run llama Llama index OpenAIFinetuneEngine Finetune
Overview
The OpenAIFinetuneEngine class manages the complete lifecycle of an OpenAI finetuning job, from data validation and upload through job creation. The finetune() method orchestrates the multi-step launch sequence, while the engine's constructor and classmethod provide flexible initialization options.
Source File
- File:
llama-index-finetuning/llama_index/finetuning/openai/base.py - Lines: 21-92
- Import:
from llama_index.finetuning import OpenAIFinetuneEngine
Class Definition
class OpenAIFinetuneEngine(BaseLLMFinetuneEngine):
"""OpenAI Finetuning Engine."""
Inherits from BaseLLMFinetuneEngine (defined in llama_index.finetuning.types), which defines the abstract interface for all LLM finetuning engines with finetune() and get_finetuned_model() methods.
Constructor
def __init__(
self,
base_model: str,
data_path: str,
verbose: bool = False,
start_job_id: Optional[str] = None,
validate_json: bool = True,
) -> None:
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
base_model |
str |
required | The OpenAI model ID to finetune (e.g., "gpt-3.5-turbo")
|
data_path |
str |
required | Path to the JSONL training data file |
verbose |
bool |
False |
Whether to print status messages to stdout |
start_job_id |
Optional[str] |
None |
ID of an existing finetuning job to resume monitoring |
validate_json |
bool |
True |
Whether to validate the JSONL file before uploading |
Internal State:
self.base_model: The base model stringself.data_path: Path to training dataself._verbose: Verbosity flagself._validate_json: Validation flagself._start_job: TheFineTuningJobobject (populated afterfinetune()or fromstart_job_id)self._client:SyncOpenAIclient instance, initialized withOPENAI_API_KEYfrom environment
If start_job_id is provided, the constructor immediately retrieves the existing job via client.fine_tuning.jobs.retrieve(start_job_id).
from_finetuning_handler (classmethod)
@classmethod
def from_finetuning_handler(
cls,
finetuning_handler: OpenAIFineTuningHandler,
base_model: str,
data_path: str,
**kwargs: Any,
) -> "OpenAIFinetuneEngine":
Parameters:
| Parameter | Type | Description |
|---|---|---|
finetuning_handler |
OpenAIFineTuningHandler |
The callback handler containing collected training events |
base_model |
str |
The OpenAI model ID to finetune |
data_path |
str |
Path where the handler should save the JSONL file |
**kwargs |
Any |
Additional keyword arguments passed to the constructor |
Behavior:
- Calls
finetuning_handler.save_finetuning_events(data_path)to persist collected events to disk - Constructs and returns a new
OpenAIFinetuneEngineinstance with the saved data path
finetune Method
def finetune(self) -> None:
Parameters: None
Returns: None
Behavior (sequential steps):
- Validation (optional): If
self._validate_jsonisTrue, callsvalidate_json(self.data_path)to check the training data format, count tokens, and estimate costs - File upload: Opens the training data file in binary mode and uploads it via
client.files.create(file=f, purpose="fine-tune") - Job creation with retry: Enters a retry loop that calls
client.fine_tuning.jobs.create(training_file=output.id, model=self.base_model). If aBadRequestErroroccurs (file not yet processed), waits 60 seconds and retries - State update: Stores the returned
FineTuningJobobject inself._start_job - Logging: Logs the job ID and notification message via both the logger and stdout (if verbose)
Usage Example
from llama_index.finetuning import OpenAIFinetuneEngine
# Direct construction with a pre-existing JSONL file
engine = OpenAIFinetuneEngine(
base_model="gpt-3.5-turbo",
data_path="training_data.jsonl",
verbose=True,
)
engine.finetune()
# Output:
# File uploaded...
# Training job file-abc123 launched. You will be emailed when it's complete.
# Or construct from a finetuning handler
engine = OpenAIFinetuneEngine.from_finetuning_handler(
finetuning_handler=handler,
base_model="gpt-3.5-turbo",
data_path="training_data.jsonl",
verbose=True,
)
engine.finetune()
# Resume monitoring an existing job
engine = OpenAIFinetuneEngine(
base_model="gpt-3.5-turbo",
data_path="training_data.jsonl",
start_job_id="ftjob-abc123",
)
job = engine.get_current_job()
print(job.status)