Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Run llama Llama index OpenAIFinetuneEngine Finetune

From Leeroopedia

Overview

The OpenAIFinetuneEngine class manages the complete lifecycle of an OpenAI finetuning job, from data validation and upload through job creation. The finetune() method orchestrates the multi-step launch sequence, while the engine's constructor and classmethod provide flexible initialization options.

Source File

  • File: llama-index-finetuning/llama_index/finetuning/openai/base.py
  • Lines: 21-92
  • Import: from llama_index.finetuning import OpenAIFinetuneEngine

Class Definition

class OpenAIFinetuneEngine(BaseLLMFinetuneEngine):
    """OpenAI Finetuning Engine."""

Inherits from BaseLLMFinetuneEngine (defined in llama_index.finetuning.types), which defines the abstract interface for all LLM finetuning engines with finetune() and get_finetuned_model() methods.

Constructor

def __init__(
    self,
    base_model: str,
    data_path: str,
    verbose: bool = False,
    start_job_id: Optional[str] = None,
    validate_json: bool = True,
) -> None:

Parameters:

Parameter Type Default Description
base_model str required The OpenAI model ID to finetune (e.g., "gpt-3.5-turbo")
data_path str required Path to the JSONL training data file
verbose bool False Whether to print status messages to stdout
start_job_id Optional[str] None ID of an existing finetuning job to resume monitoring
validate_json bool True Whether to validate the JSONL file before uploading

Internal State:

  • self.base_model: The base model string
  • self.data_path: Path to training data
  • self._verbose: Verbosity flag
  • self._validate_json: Validation flag
  • self._start_job: The FineTuningJob object (populated after finetune() or from start_job_id)
  • self._client: SyncOpenAI client instance, initialized with OPENAI_API_KEY from environment

If start_job_id is provided, the constructor immediately retrieves the existing job via client.fine_tuning.jobs.retrieve(start_job_id).

from_finetuning_handler (classmethod)

@classmethod
def from_finetuning_handler(
    cls,
    finetuning_handler: OpenAIFineTuningHandler,
    base_model: str,
    data_path: str,
    **kwargs: Any,
) -> "OpenAIFinetuneEngine":

Parameters:

Parameter Type Description
finetuning_handler OpenAIFineTuningHandler The callback handler containing collected training events
base_model str The OpenAI model ID to finetune
data_path str Path where the handler should save the JSONL file
**kwargs Any Additional keyword arguments passed to the constructor

Behavior:

  • Calls finetuning_handler.save_finetuning_events(data_path) to persist collected events to disk
  • Constructs and returns a new OpenAIFinetuneEngine instance with the saved data path

finetune Method

def finetune(self) -> None:

Parameters: None

Returns: None

Behavior (sequential steps):

  1. Validation (optional): If self._validate_json is True, calls validate_json(self.data_path) to check the training data format, count tokens, and estimate costs
  2. File upload: Opens the training data file in binary mode and uploads it via client.files.create(file=f, purpose="fine-tune")
  3. Job creation with retry: Enters a retry loop that calls client.fine_tuning.jobs.create(training_file=output.id, model=self.base_model). If a BadRequestError occurs (file not yet processed), waits 60 seconds and retries
  4. State update: Stores the returned FineTuningJob object in self._start_job
  5. Logging: Logs the job ID and notification message via both the logger and stdout (if verbose)

Usage Example

from llama_index.finetuning import OpenAIFinetuneEngine

# Direct construction with a pre-existing JSONL file
engine = OpenAIFinetuneEngine(
    base_model="gpt-3.5-turbo",
    data_path="training_data.jsonl",
    verbose=True,
)
engine.finetune()
# Output:
#   File uploaded...
#   Training job file-abc123 launched. You will be emailed when it's complete.

# Or construct from a finetuning handler
engine = OpenAIFinetuneEngine.from_finetuning_handler(
    finetuning_handler=handler,
    base_model="gpt-3.5-turbo",
    data_path="training_data.jsonl",
    verbose=True,
)
engine.finetune()

# Resume monitoring an existing job
engine = OpenAIFinetuneEngine(
    base_model="gpt-3.5-turbo",
    data_path="training_data.jsonl",
    start_job_id="ftjob-abc123",
)
job = engine.get_current_job()
print(job.status)

Knowledge Sources

Metadata

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment