Implementation:Run llama Llama index OpenAIFinetuneEngine Finetune

Overview

The OpenAIFinetuneEngine class manages the complete lifecycle of an OpenAI finetuning job, from data validation and upload through job creation. The finetune() method orchestrates the multi-step launch sequence, while the engine's constructor and classmethod provide flexible initialization options.

Source File

File: llama-index-finetuning/llama_index/finetuning/openai/base.py
Lines: 21-92
Import: from llama_index.finetuning import OpenAIFinetuneEngine

Class Definition

class OpenAIFinetuneEngine(BaseLLMFinetuneEngine):
    """OpenAI Finetuning Engine."""

Inherits from BaseLLMFinetuneEngine (defined in llama_index.finetuning.types), which defines the abstract interface for all LLM finetuning engines with finetune() and get_finetuned_model() methods.

Constructor

def __init__(
    self,
    base_model: str,
    data_path: str,
    verbose: bool = False,
    start_job_id: Optional[str] = None,
    validate_json: bool = True,
) -> None:

Parameters:

Parameter	Type	Default	Description
`base_model`	`str`	required	The OpenAI model ID to finetune (e.g., `"gpt-3.5-turbo"`)
`data_path`	`str`	required	Path to the JSONL training data file
`verbose`	`bool`	`False`	Whether to print status messages to stdout
`start_job_id`	`Optional[str]`	`None`	ID of an existing finetuning job to resume monitoring
`validate_json`	`bool`	`True`	Whether to validate the JSONL file before uploading

Internal State:

self.base_model: The base model string
self.data_path: Path to training data
self._verbose: Verbosity flag
self._validate_json: Validation flag
self._start_job: The FineTuningJob object (populated after finetune() or from start_job_id)
self._client: SyncOpenAI client instance, initialized with OPENAI_API_KEY from environment

If start_job_id is provided, the constructor immediately retrieves the existing job via client.fine_tuning.jobs.retrieve(start_job_id).

from_finetuning_handler (classmethod)

@classmethod
def from_finetuning_handler(
    cls,
    finetuning_handler: OpenAIFineTuningHandler,
    base_model: str,
    data_path: str,
    **kwargs: Any,
) -> "OpenAIFinetuneEngine":

Parameters:

Parameter	Type	Description
`finetuning_handler`	`OpenAIFineTuningHandler`	The callback handler containing collected training events
`base_model`	`str`	The OpenAI model ID to finetune
`data_path`	`str`	Path where the handler should save the JSONL file
`**kwargs`	`Any`	Additional keyword arguments passed to the constructor

Behavior:

Calls finetuning_handler.save_finetuning_events(data_path) to persist collected events to disk
Constructs and returns a new OpenAIFinetuneEngine instance with the saved data path

finetune Method

def finetune(self) -> None:

Parameters: None

Returns: None

Behavior (sequential steps):

Validation (optional): If self._validate_json is True, calls validate_json(self.data_path) to check the training data format, count tokens, and estimate costs
File upload: Opens the training data file in binary mode and uploads it via client.files.create(file=f, purpose="fine-tune")
Job creation with retry: Enters a retry loop that calls client.fine_tuning.jobs.create(training_file=output.id, model=self.base_model). If a BadRequestError occurs (file not yet processed), waits 60 seconds and retries
State update: Stores the returned FineTuningJob object in self._start_job
Logging: Logs the job ID and notification message via both the logger and stdout (if verbose)

Usage Example

from llama_index.finetuning import OpenAIFinetuneEngine

# Direct construction with a pre-existing JSONL file
engine = OpenAIFinetuneEngine(
    base_model="gpt-3.5-turbo",
    data_path="training_data.jsonl",
    verbose=True,
)
engine.finetune()
# Output:
#   File uploaded...
#   Training job file-abc123 launched. You will be emailed when it's complete.

# Or construct from a finetuning handler
engine = OpenAIFinetuneEngine.from_finetuning_handler(
    finetuning_handler=handler,
    base_model="gpt-3.5-turbo",
    data_path="training_data.jsonl",
    verbose=True,
)
engine.finetune()

# Resume monitoring an existing job
engine = OpenAIFinetuneEngine(
    base_model="gpt-3.5-turbo",
    data_path="training_data.jsonl",
    start_job_id="ftjob-abc123",
)
job = engine.get_current_job()
print(job.status)

Knowledge Sources

Metadata

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment