
Implementation: openai/openai-node FineTuning Methods

From Leeroopedia
Knowledge Sources
Domains SDK, Fine-Tuning
Last Updated 2026-02-15 12:00 GMT

Overview

The Methods module defines TypeScript type interfaces for the three fine-tuning training methods supported by the OpenAI API: supervised fine-tuning, Direct Preference Optimization (DPO), and reinforcement learning.

Description

The Methods class extends APIResource but defines no HTTP methods of its own. Its primary purpose is to serve as a namespace for the training method type definitions that are used when creating fine-tuning jobs. The module exports six key interfaces organized into three method/hyperparameter pairs.

Supervised fine-tuning is configured via SupervisedMethod and SupervisedHyperparameters, which accept standard hyperparameters: batch_size, learning_rate_multiplier, and n_epochs. DPO (Direct Preference Optimization) adds a beta parameter controlling the penalty weight between the policy and reference model via DpoMethod and DpoHyperparameters. Reinforcement learning is the most feature-rich, adding compute_multiplier, eval_interval, eval_samples, and reasoning_effort parameters, and requiring a grader (which can be a StringCheckGrader, TextSimilarityGrader, PythonGrader, ScoreModelGrader, or MultiGrader).

All hyperparameter fields accept either a literal 'auto' string (letting the API choose) or a numeric value. This pattern provides both convenience for default behavior and precision for advanced users.
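The union pattern can be seen directly in the type system. The sketch below copies the SupervisedHyperparameters interface from the Signature section and mixes 'auto' defaults with explicit numeric overrides in a single object:

```typescript
// Mirrors the SupervisedHyperparameters interface shown in the
// Signature section below.
interface SupervisedHyperparameters {
  batch_size?: 'auto' | number;
  learning_rate_multiplier?: 'auto' | number;
  n_epochs?: 'auto' | number;
}

// Each field independently accepts 'auto' (API chooses) or a number.
const hyperparameters: SupervisedHyperparameters = {
  batch_size: 'auto',             // defer to the API's default
  learning_rate_multiplier: 0.05, // explicit override
  n_epochs: 3,                    // explicit override
};

console.log(hyperparameters);
```

Because every field is optional, omitting a hyperparameter entirely is equivalent in spirit to passing 'auto': the API selects a value.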

Usage

Use these types when constructing the method parameter for fine-tuning job creation. The specific method interface chosen determines which training algorithm and hyperparameters the API will use for the job.

Code Reference

Source Location

Signature

export interface SupervisedHyperparameters {
  batch_size?: 'auto' | number;
  learning_rate_multiplier?: 'auto' | number;
  n_epochs?: 'auto' | number;
}

export interface SupervisedMethod {
  hyperparameters?: SupervisedHyperparameters;
}

export interface DpoHyperparameters {
  batch_size?: 'auto' | number;
  beta?: 'auto' | number;
  learning_rate_multiplier?: 'auto' | number;
  n_epochs?: 'auto' | number;
}

export interface DpoMethod {
  hyperparameters?: DpoHyperparameters;
}

export interface ReinforcementHyperparameters {
  batch_size?: 'auto' | number;
  compute_multiplier?: 'auto' | number;
  eval_interval?: 'auto' | number;
  eval_samples?: 'auto' | number;
  learning_rate_multiplier?: 'auto' | number;
  n_epochs?: 'auto' | number;
  reasoning_effort?: 'default' | 'low' | 'medium' | 'high';
}

export interface ReinforcementMethod {
  grader: StringCheckGrader | TextSimilarityGrader | PythonGrader | ScoreModelGrader | MultiGrader;
  hyperparameters?: ReinforcementHyperparameters;
}

Import

import OpenAI from 'openai';

I/O Contract

Inputs

Name Type Required Description
batch_size number No Number of examples per batch. Larger sizes reduce update frequency but lower variance.
learning_rate_multiplier number No Scaling factor for learning rate. Smaller values help avoid overfitting.
n_epochs number No Number of full passes through the training dataset.
beta number No (DPO only) Weight of the penalty between policy and reference model.
compute_multiplier number No (Reinforcement only) Multiplier on compute used for search space exploration.
eval_interval number No (Reinforcement only) Training steps between evaluation runs.
eval_samples number No (Reinforcement only) Evaluation samples generated per training step.
reasoning_effort 'default' | 'low' | 'medium' | 'high' No (Reinforcement only) Level of reasoning effort applied during training.
grader Grader union Yes (reinforcement) The grader used to evaluate reinforcement fine-tuning outputs.

Outputs

Name Type Description
SupervisedMethod interface Configuration object for supervised fine-tuning.
DpoMethod interface Configuration object for DPO fine-tuning.
ReinforcementMethod interface Configuration object for reinforcement fine-tuning with a required grader.

Usage Examples

import OpenAI from 'openai';

const client = new OpenAI();

// Create a supervised fine-tuning job
const supervisedJob = await client.fineTuning.jobs.create({
  training_file: 'file-abc123',
  model: 'gpt-4o-mini-2024-07-18',
  method: {
    type: 'supervised',
    supervised: {
      hyperparameters: {
        n_epochs: 3,
        batch_size: 'auto',
        learning_rate_multiplier: 0.1,
      },
    },
  },
});

// Create a reinforcement fine-tuning job with a grader
const rftJob = await client.fineTuning.jobs.create({
  training_file: 'file-def456',
  model: 'o4-mini',
  method: {
    type: 'reinforcement',
    reinforcement: {
      grader: {
        type: 'string_check',
        input: '{{item.output}}',
        name: 'match_check',
        reference: '{{item.reference}}',
        operation: 'eq',
      },
      hyperparameters: {
        reasoning_effort: 'medium',
      },
    },
  },
});
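
The examples above cover supervised and reinforcement jobs; the DPO payload follows the same shape. The sketch below builds a standalone method object, assuming (by analogy with the other two methods) that the API accepts type: 'dpo' alongside a dpo key:

```typescript
// A DPO method payload, analogous in shape to the supervised and
// reinforcement payloads above.
const dpoMethod = {
  type: 'dpo' as const,
  dpo: {
    hyperparameters: {
      beta: 0.1,                 // penalty weight between policy and reference model
      n_epochs: 'auto' as const, // defer to the API's default
    },
  },
};

console.log(dpoMethod);
```

Passing this object as the method field of client.fineTuning.jobs.create (with a training_file containing preference-pair data, as DPO requires) would start a DPO fine-tuning job.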
