Implementation:NVIDIA NeMo Aligner Train GPT RS Actor

From Leeroopedia


Knowledge Sources
Domains: NLP, Alignment
Last Updated: 2026-02-08 00:00 GMT

Overview

train_gpt_rs_actor.py is the entry-point script that launches Rejection Sampling (RS) training of a GPT model with NeMo Aligner.

Description

This script wires together all components required for RS training:

  1. Configuration loading: Uses Hydra (@hydra_runner) with config path conf and config name gpt_rs_actor. Loads the model config stored in the pretrained checkpoint and overrides it with the values in cfg.model.
  2. Trainer and experiment setup: Creates a PyTorch Lightning trainer via resolve_and_create_trainer(cfg, "rs") and initializes experiment management.
  3. Model loading: Loads a pretrained MegatronGPTRSModel from a NeMo checkpoint, then optionally initializes PEFT adapters via init_peft().
  4. Data preparation: Builds RLHF train/validation datasets and dataloaders using build_train_valid_test_rlhf_datasets(). The collate function pads sequences to the maximum generation length.
  5. Optimizer and scheduler: Extracts the optimizer and scheduler from the PyTorch Lightning (PTL) model. A dummy dataloader is passed to the model so that NeMo's internal max-steps calculation can run.
  6. Reward model client: Instantiates RemoteGPTRMClient to communicate with an external reward model service.
  7. RSTrainer instantiation: Creates the RSTrainer with all dependencies, optionally restores trainer state from a checkpoint, and calls rs_trainer.fit().
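Step 4 says that the collate function pads sequences to the maximum generation length. The following is a minimal sketch of that idea in plain Python, not the script's actual collate function. The name pad_collate, the pad id 0, the choice of right-padding, and the output keys "text" and "length" are all assumptions made for illustration. The real code works on tensors and takes the pad id from the tokenizer.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical pad token id; the real value comes from the tokenizer


def pad_collate(batch: List[List[int]], max_length: int, pad_id: int = PAD_ID) -> Dict[str, List]:
    """Right-pad every token sequence in the batch to the same fixed length.

    Returns the padded sequences together with their original lengths so that
    downstream code can tell real tokens from padding.
    """
    lengths = [len(seq) for seq in batch]
    if lengths and max(lengths) > max_length:
        raise ValueError(f"sequence of length {max(lengths)} exceeds max_length={max_length}")
    padded = [seq + [pad_id] * (max_length - len(seq)) for seq in batch]
    # "text" and "length" are illustrative key names, not confirmed by the source.
    return {"text": padded, "length": lengths}


batch = pad_collate([[5, 6, 7], [8]], max_length=4)
```

Padding to one fixed length, rather than to the longest sequence in each batch, keeps every batch the same shape.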

Usage

Run this script via the command line with Hydra configuration overrides to launch RS training. It requires a pretrained NeMo GPT checkpoint and a running remote reward model service.

Code Reference

Source Location

File: examples/nlp/gpt/train_gpt_rs_actor.py (the path used in the command-line invocation under Usage Examples)

Signature

@hydra_runner(config_path="conf", config_name="gpt_rs_actor")
def main(cfg) -> None:

Import

from nemo_aligner.algorithms.rs import RSTrainer
from nemo_aligner.models.nlp.gpt.megatron_gpt_rs_actor import MegatronGPTRSModel
from nemo_aligner.models.nlp.gpt.reward_critic_clients import RemoteGPTRMClient

I/O Contract

Inputs

Name | Type | Required | Description
cfg | DictConfig | Yes | Hydra configuration object containing pretrained_checkpoint.restore_from_path, model, trainer, exp_manager, remote_rm, and data configuration
cfg.pretrained_checkpoint.restore_from_path | str | Yes | Path to the pretrained NeMo GPT model checkpoint
cfg.remote_rm | DictConfig | Yes | Configuration for the remote reward model client (host, port, etc.)
cfg.model.rs.num_rollouts_per_prompt | int | Yes | Number of candidate responses to generate per prompt
cfg.model.rs.top_n_rollouts | int | Yes | Number of top-scoring responses to keep for training
cfg.model.rs.rollout_micro_batch_size | int | Yes | Micro batch size for rollout generation
cfg.model.rs.num_rollout_samples | int | Yes | Global batch size for rollout generation
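
The two parameters num_rollouts_per_prompt and top_n_rollouts describe the core selection step of rejection sampling: generate several candidates per prompt, score them with the reward model, and keep only the best ones for training. The sketch below illustrates that selection for a single prompt. The function select_top_rollouts and the example data are hypothetical and do not reproduce the RSTrainer code. The rewards stand in for scores that the remote reward model would return.

```python
from typing import List, Tuple


def select_top_rollouts(
    responses: List[str], rewards: List[float], top_n: int
) -> List[Tuple[str, float]]:
    """Keep the top_n highest-reward responses for one prompt.

    len(responses) plays the role of num_rollouts_per_prompt and
    top_n plays the role of top_n_rollouts.
    """
    if len(responses) != len(rewards):
        raise ValueError("each response needs exactly one reward")
    if not 1 <= top_n <= len(responses):
        raise ValueError("top_n must be between 1 and the number of rollouts")
    # Sort by reward, highest first; ties keep their original order.
    ranked = sorted(zip(rewards, responses), key=lambda pair: pair[0], reverse=True)
    return [(response, reward) for reward, response in ranked[:top_n]]


candidates = ["resp_a", "resp_b", "resp_c", "resp_d"]  # 4 rollouts for one prompt
scores = [0.1, 0.9, -0.3, 0.5]                          # illustrative reward values
kept = select_top_rollouts(candidates, scores, top_n=2)
```

With top_n set to 1, as in the usage example on this page, this reduces to best-of-n selection. In this sketch top_n cannot exceed the number of rollouts generated per prompt.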

Outputs

Name | Type | Description
None (side effects) | N/A | Trains the model in-place, saves checkpoints, and logs metrics. No return value.

Usage Examples

# Command-line invocation:
# python examples/nlp/gpt/train_gpt_rs_actor.py \
#     pretrained_checkpoint.restore_from_path=/path/to/model.nemo \
#     model.rs.num_rollouts_per_prompt=4 \
#     model.rs.top_n_rollouts=1 \
#     remote_rm.host=localhost \
#     remote_rm.port=5555
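
When the launch is driven from another Python program, for example a sweep over model.rs.top_n_rollouts, the same invocation can be assembled from a dictionary of overrides. The helper below is a hypothetical convenience and not part of NeMo Aligner. It only builds the argument list in Hydra's key=value override syntax and leaves running it to the caller (for example via subprocess.run).

```python
import shlex
from typing import Dict, List


def build_rs_command(script: str, overrides: Dict[str, object]) -> List[str]:
    """Turn a dict of Hydra overrides into an argv list of key=value tokens."""
    return ["python", script] + [f"{key}={value}" for key, value in overrides.items()]


argv = build_rs_command(
    "examples/nlp/gpt/train_gpt_rs_actor.py",
    {
        "pretrained_checkpoint.restore_from_path": "/path/to/model.nemo",
        "model.rs.num_rollouts_per_prompt": 4,
        "model.rs.top_n_rollouts": 1,
        "remote_rm.host": "localhost",
        "remote_rm.port": 5555,
    },
)

# A shell-safe, single-line rendering that is handy for logging.
command_line = shlex.join(argv)
```

Passing argv as a list to subprocess.run avoids shell quoting issues when a checkpoint path contains spaces.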
