Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:NVIDIA NeMo Aligner Train GPT KTO

From Leeroopedia


Knowledge Sources
Domains NLP, Alignment
Last Updated 2026-02-08 00:00 GMT

Overview

train_gpt_kto.py is the entry point script for launching KTO (Kahneman-Tversky Optimization) training of a GPT model using NeMo Aligner.

Description

This script wires together all components required for KTO training:

  1. Configuration loading: Uses Hydra (@hydra_runner) with config path conf and config name gpt_kto. Loads and overrides the model config from the pretrained checkpoint.
  2. Trainer and experiment setup: Creates a PyTorch Lightning trainer via resolve_and_create_trainer(cfg, "kto") and initializes experiment management.
  3. Model loading: Loads a pretrained MegatronGPTKTOModel from a NeMo checkpoint with load_base_model_only=False. Initializes PEFT adapters via init_peft(). For full fine-tuning (when peft_scheme == "none"), the reference policy state dict is initialized from the current model weights.
  4. Data preparation: Builds KTO train/validation/test datasets using build_train_valid_test_kto_datasets(). Uses kto_custom_collate (with functools.partial) as the collate function, which constructs both the original samples and the KL estimation samples from mismatched prompt-response pairs.
  5. Optimizer and scheduler: Extracts the optimizer and scheduler from the PTL model.
  6. KTOTrainer instantiation: Creates the KTOTrainer with all dependencies, optionally restores trainer state from a checkpoint, and calls kto_trainer.fit().

The script registers custom OmegaConf resolvers for multiply and int_div operations.

Usage

Run this script via the command line with Hydra configuration overrides to launch KTO training. It requires a pretrained NeMo GPT checkpoint and KTO-format data with binary preference labels.

Code Reference

Source Location

Signature

@hydra_runner(config_path="conf", config_name="gpt_kto")
def main(cfg) -> None:

Import

from nemo_aligner.algorithms.kto import KTOTrainer, kto_custom_collate
from nemo_aligner.models.nlp.gpt.megatron_gpt_kto_model import MegatronGPTKTOModel

I/O Contract

Inputs

Name Type Required Description
cfg DictConfig Yes Hydra configuration object containing pretrained_checkpoint.restore_from_path, model, trainer, exp_manager, and data configuration
cfg.pretrained_checkpoint.restore_from_path str Yes Path to the pretrained NeMo GPT model checkpoint
cfg.model.peft.peft_scheme str Yes PEFT scheme to use; set to "none" for full fine-tuning
cfg.model.kto.ref_policy_kl_penalty float Yes Beta parameter controlling the strength of the KL penalty
cfg.model.kto.desirable_loss_weight float No Weight for desirable (chosen) samples in the loss (default: 1.0)
cfg.model.kto.undesirable_loss_weight float No Weight for undesirable (rejected) samples in the loss (default: 1.0)
cfg.model.data.data_prefix str Yes Path prefix for the KTO training data
cfg.model.data.splits_string str Yes Train/val/test split specification

Outputs

Name Type Description
None (side effects) N/A Trains the model in-place, saves checkpoints, and logs metrics. No return value.

Usage Examples

# Command-line invocation:
# python examples/nlp/gpt/train_gpt_kto.py \
#     pretrained_checkpoint.restore_from_path=/path/to/model.nemo \
#     model.kto.ref_policy_kl_penalty=0.1 \
#     model.kto.desirable_loss_weight=1.0 \
#     model.kto.undesirable_loss_weight=1.0 \
#     model.peft.peft_scheme=none \
#     model.data.data_prefix=/path/to/kto_data

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment