Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Run llama Llama index OpenAIFinetuneEngine Finetune

From Leeroopedia
Revision as of 11:48, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Run_llama_Llama_index_OpenAIFinetuneEngine_Finetune.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Overview

The OpenAIFinetuneEngine class manages the complete lifecycle of an OpenAI finetuning job, from data validation and upload through job creation. The finetune() method orchestrates the multi-step launch sequence, while the engine's constructor and classmethod provide flexible initialization options.

Source File

  • File: llama-index-finetuning/llama_index/finetuning/openai/base.py
  • Lines: 21-92
  • Import: from llama_index.finetuning import OpenAIFinetuneEngine

Class Definition

class OpenAIFinetuneEngine(BaseLLMFinetuneEngine):
    """OpenAI Finetuning Engine."""

Inherits from BaseLLMFinetuneEngine (defined in llama_index.finetuning.types), which defines the abstract interface for all LLM finetuning engines with finetune() and get_finetuned_model() methods.

Constructor

def __init__(
    self,
    base_model: str,
    data_path: str,
    verbose: bool = False,
    start_job_id: Optional[str] = None,
    validate_json: bool = True,
) -> None:

Parameters:

Parameter Type Default Description
base_model str required The OpenAI model ID to finetune (e.g., "gpt-3.5-turbo")
data_path str required Path to the JSONL training data file
verbose bool False Whether to print status messages to stdout
start_job_id Optional[str] None ID of an existing finetuning job to resume monitoring
validate_json bool True Whether to validate the JSONL file before uploading

Internal State:

  • self.base_model: The base model string
  • self.data_path: Path to training data
  • self._verbose: Verbosity flag
  • self._validate_json: Validation flag
  • self._start_job: The FineTuningJob object (populated after finetune() or from start_job_id)
  • self._client: SyncOpenAI client instance, initialized with OPENAI_API_KEY from environment

If start_job_id is provided, the constructor immediately retrieves the existing job via client.fine_tuning.jobs.retrieve(start_job_id).

from_finetuning_handler (classmethod)

@classmethod
def from_finetuning_handler(
    cls,
    finetuning_handler: OpenAIFineTuningHandler,
    base_model: str,
    data_path: str,
    **kwargs: Any,
) -> "OpenAIFinetuneEngine":

Parameters:

Parameter Type Description
finetuning_handler OpenAIFineTuningHandler The callback handler containing collected training events
base_model str The OpenAI model ID to finetune
data_path str Path where the handler should save the JSONL file
**kwargs Any Additional keyword arguments passed to the constructor

Behavior:

  • Calls finetuning_handler.save_finetuning_events(data_path) to persist collected events to disk
  • Constructs and returns a new OpenAIFinetuneEngine instance with the saved data path

finetune Method

def finetune(self) -> None:

Parameters: None

Returns: None

Behavior (sequential steps):

  1. Validation (optional): If self._validate_json is True, calls validate_json(self.data_path) to check the training data format, count tokens, and estimate costs
  2. File upload: Opens the training data file in binary mode and uploads it via client.files.create(file=f, purpose="fine-tune")
  3. Job creation with retry: Enters a retry loop that calls client.fine_tuning.jobs.create(training_file=output.id, model=self.base_model). If a BadRequestError occurs (file not yet processed), waits 60 seconds and retries
  4. State update: Stores the returned FineTuningJob object in self._start_job
  5. Logging: Logs the job ID and notification message via both the logger and stdout (if verbose)

Usage Example

from llama_index.finetuning import OpenAIFinetuneEngine

# Direct construction with a pre-existing JSONL file
engine = OpenAIFinetuneEngine(
    base_model="gpt-3.5-turbo",
    data_path="training_data.jsonl",
    verbose=True,
)
engine.finetune()
# Output:
#   File uploaded...
#   Training job file-abc123 launched. You will be emailed when it's complete.

# Or construct from a finetuning handler
engine = OpenAIFinetuneEngine.from_finetuning_handler(
    finetuning_handler=handler,
    base_model="gpt-3.5-turbo",
    data_path="training_data.jsonl",
    verbose=True,
)
engine.finetune()

# Resume monitoring an existing job
engine = OpenAIFinetuneEngine(
    base_model="gpt-3.5-turbo",
    data_path="training_data.jsonl",
    start_job_id="ftjob-abc123",
)
job = engine.get_current_job()
print(job.status)

Knowledge Sources

Metadata

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment