Implementation: OpenAI Node FineTuning Methods
| Knowledge Sources | Details |
|---|---|
| Domains | SDK, Fine-Tuning |
| Last Updated | 2026-02-15 12:00 GMT |
Overview
The Methods module defines TypeScript type interfaces for the three fine-tuning training methods supported by the OpenAI API: supervised fine-tuning, Direct Preference Optimization (DPO), and reinforcement learning.
Description
The Methods class extends APIResource but defines no HTTP methods of its own. Its primary purpose is to serve as a namespace for the training method type definitions that are used when creating fine-tuning jobs. The module exports six key interfaces organized into three method/hyperparameter pairs.
Supervised fine-tuning is configured via SupervisedMethod and SupervisedHyperparameters, which accept the standard hyperparameters batch_size, learning_rate_multiplier, and n_epochs. DPO, configured via DpoMethod and DpoHyperparameters, adds a beta parameter that controls the weight of the penalty between the policy and the reference model. Reinforcement learning is the most feature-rich: ReinforcementHyperparameters adds compute_multiplier, eval_interval, eval_samples, and reasoning_effort, and ReinforcementMethod requires a grader (a StringCheckGrader, TextSimilarityGrader, PythonGrader, ScoreModelGrader, or MultiGrader).
All hyperparameter fields accept either a literal 'auto' string (letting the API choose) or a numeric value. This pattern provides both convenience for default behavior and precision for advanced users.
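A minimal sketch of this pattern in practice (the AutoOr alias and resolve helper below are illustrative, not part of the SDK):

```typescript
// Illustrative sketch: the 'auto' | number union used by hyperparameter fields.
type AutoOr<T> = 'auto' | T;

interface SupervisedHyperparameters {
  batch_size?: AutoOr<number>;
  learning_rate_multiplier?: AutoOr<number>;
  n_epochs?: AutoOr<number>;
}

// 'auto' defers the choice to the API; a number pins it explicitly.
const defaults: SupervisedHyperparameters = { n_epochs: 'auto' };
const tuned: SupervisedHyperparameters = { n_epochs: 3, learning_rate_multiplier: 0.1 };

// Hypothetical helper showing how a caller might resolve 'auto' locally.
function resolve(value: AutoOr<number> | undefined, fallback: number): number {
  return value === undefined || value === 'auto' ? fallback : value;
}

console.log(resolve(defaults.n_epochs, 4)); // 'auto' falls back: 4
console.log(resolve(tuned.n_epochs, 4));    // explicit value wins: 3
```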
Usage
Use these types when constructing the method parameter for fine-tuning job creation. The specific method interface chosen determines which training algorithm and hyperparameters the API will use for the job.
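The job-creation examples later in this document pass a method object whose type field selects the training algorithm. A simplified local sketch of that discriminated-union shape (types declared inline and hyperparameters abridged; the tag-plus-matching-key layout is inferred from the usage examples):

```typescript
// Simplified sketch of the `method` parameter as a discriminated union.
// Field names mirror this document's signatures; hyperparameters are abridged.
interface SupervisedMethod {
  hyperparameters?: { n_epochs?: 'auto' | number };
}
interface DpoMethod {
  hyperparameters?: { beta?: 'auto' | number; n_epochs?: 'auto' | number };
}

type MethodParam =
  | { type: 'supervised'; supervised: SupervisedMethod }
  | { type: 'dpo'; dpo: DpoMethod };

// The `type` tag determines which training algorithm the API runs.
function describe(method: MethodParam): string {
  switch (method.type) {
    case 'supervised':
      return 'supervised fine-tuning';
    case 'dpo':
      return 'direct preference optimization';
  }
}

const method: MethodParam = { type: 'dpo', dpo: { hyperparameters: { beta: 0.1 } } };
console.log(describe(method)); // direct preference optimization
```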
Code Reference
Source Location
- Repository: openai-node
- File: src/resources/fine-tuning/methods.ts
Signature
export interface SupervisedHyperparameters {
  batch_size?: 'auto' | number;
  learning_rate_multiplier?: 'auto' | number;
  n_epochs?: 'auto' | number;
}

export interface SupervisedMethod {
  hyperparameters?: SupervisedHyperparameters;
}

export interface DpoHyperparameters {
  batch_size?: 'auto' | number;
  beta?: 'auto' | number;
  learning_rate_multiplier?: 'auto' | number;
  n_epochs?: 'auto' | number;
}

export interface DpoMethod {
  hyperparameters?: DpoHyperparameters;
}

export interface ReinforcementHyperparameters {
  batch_size?: 'auto' | number;
  compute_multiplier?: 'auto' | number;
  eval_interval?: 'auto' | number;
  eval_samples?: 'auto' | number;
  learning_rate_multiplier?: 'auto' | number;
  n_epochs?: 'auto' | number;
  reasoning_effort?: 'default' | 'low' | 'medium' | 'high';
}

export interface ReinforcementMethod {
  grader: StringCheckGrader | TextSimilarityGrader | PythonGrader | ScoreModelGrader | MultiGrader;
  hyperparameters?: ReinforcementHyperparameters;
}
Import
import OpenAI from 'openai';
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| batch_size | number | No | Number of examples per batch. Larger sizes reduce update frequency but lower variance. |
| learning_rate_multiplier | number | No | Scaling factor for learning rate. Smaller values help avoid overfitting. |
| n_epochs | number | No | Number of full passes through the training dataset. |
| beta | number | No | (DPO only) Weight of the penalty between policy and reference model. |
| compute_multiplier | number | No | (Reinforcement only) Multiplier on compute used for search space exploration. |
| eval_interval | number | No | (Reinforcement only) Training steps between evaluation runs. |
| eval_samples | number | No | (Reinforcement only) Evaluation samples generated per training step. |
| reasoning_effort | 'default' \| 'low' \| 'medium' \| 'high' | No | (Reinforcement only) Level of reasoning effort applied during training. |
| grader | Grader union | Yes (reinforcement) | The grader used to evaluate reinforcement fine-tuning outputs. |
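As a minimal sketch, the string-check grader variant (the one used in the usage example in this document) could be typed as follows; field names are taken from that example, while the full operation union is an assumption:

```typescript
// Sketch of the string-check grader shape, inferred from this document's
// usage example. Only 'eq' appears there; the other operations are assumed.
interface StringCheckGrader {
  type: 'string_check';
  name: string;
  input: string;      // template, e.g. '{{item.output}}'
  reference: string;  // template, e.g. '{{item.reference}}'
  operation: 'eq' | 'ne' | 'like' | 'ilike';
}

const grader: StringCheckGrader = {
  type: 'string_check',
  name: 'match_check',
  input: '{{item.output}}',
  reference: '{{item.reference}}',
  operation: 'eq',
};

console.log(grader.operation); // eq
```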
Outputs
| Name | Type | Description |
|---|---|---|
| SupervisedMethod | interface | Configuration object for supervised fine-tuning. |
| DpoMethod | interface | Configuration object for DPO fine-tuning. |
| ReinforcementMethod | interface | Configuration object for reinforcement fine-tuning with a required grader. |
Usage Examples
import OpenAI from 'openai';

const client = new OpenAI();

// Create a supervised fine-tuning job
const supervisedJob = await client.fineTuning.jobs.create({
  training_file: 'file-abc123',
  model: 'gpt-4o-mini-2024-07-18',
  method: {
    type: 'supervised',
    supervised: {
      hyperparameters: {
        n_epochs: 3,
        batch_size: 'auto',
        learning_rate_multiplier: 0.1,
      },
    },
  },
});

// Create a reinforcement fine-tuning job with a grader
const rftJob = await client.fineTuning.jobs.create({
  training_file: 'file-def456',
  model: 'o4-mini',
  method: {
    type: 'reinforcement',
    reinforcement: {
      grader: {
        type: 'string_check',
        input: '{{item.output}}',
        name: 'match_check',
        reference: '{{item.reference}}',
        operation: 'eq',
      },
      hyperparameters: {
        reasoning_effort: 'medium',
      },
    },
  },
});
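The examples above cover supervised and reinforcement jobs but not DPO. A sketch of the corresponding method payload follows; the beta value is illustrative, and the type is declared locally so the snippet stands alone without the SDK:

```typescript
// Sketch: DPO method payload, mirroring this document's DpoHyperparameters.
interface DpoHyperparameters {
  batch_size?: 'auto' | number;
  beta?: 'auto' | number;
  learning_rate_multiplier?: 'auto' | number;
  n_epochs?: 'auto' | number;
}

const dpoMethod = {
  type: 'dpo' as const,
  dpo: {
    hyperparameters: {
      beta: 0.1,        // illustrative penalty weight between policy and reference model
      n_epochs: 'auto', // defer the epoch count to the API
    } as DpoHyperparameters,
  },
};

// This object would be passed as `method` to client.fineTuning.jobs.create(...)
// alongside `training_file` and `model`.
console.log(dpoMethod.type); // dpo
```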