Workflow:NVIDIA NeMo Aligner Supervised Fine Tuning

From Leeroopedia
Revision as of 11:03, 16 February 2026 by Admin (Auto-imported from workflows/NVIDIA_NeMo_Aligner_Supervised_Fine_Tuning.md)


Knowledge Sources
Domains: LLMs, Fine_Tuning, Model_Alignment
Last Updated: 2026-02-07 22:00 GMT

Overview

End-to-end process for supervised fine-tuning (SFT) of pretrained GPT models on instruction-following or chat datasets using NeMo-Aligner's distributed training framework.

Description

This workflow covers the standard supervised fine-tuning procedure that adapts a pretrained language model to follow instructions or engage in multi-turn conversations. It accepts either prompt-response or chat-format datasets and produces an aligned model checkpoint. SFT is typically the first alignment step and a prerequisite for more advanced techniques such as RLHF (PPO), DPO, or REINFORCE. Training uses Megatron-LM parallelism (tensor, pipeline, and data parallelism) through the NeMo Framework, so the same workflow scales from a single GPU to multi-node clusters.

Key outputs:

  • A fine-tuned .nemo model checkpoint (megatron_gpt_sft.nemo)
  • Training metrics logged via WandB or TensorBoard

Scope:

  • From a pretrained .nemo checkpoint and a formatted JSONL dataset to a fine-tuned model ready for inference or further alignment

Usage

Execute this workflow when you have a pretrained GPT-based model (e.g., LLaMA, Nemotron, GPT-2B) and a supervised dataset of instruction-response pairs or multi-turn conversations, and you need to adapt the model to follow instructions or chat naturally. This is the recommended first step before any further alignment method, whether reinforcement learning-based (PPO, REINFORCE) or preference-based (DPO).

Execution Steps

Step 1: Obtain pretrained model

Acquire a pretrained GPT model in NeMo format. This involves either downloading a pre-existing .nemo checkpoint (e.g., GPT-2B, LLaMA-3-8B, Nemotron-340B) or converting a Hugging Face model to .nemo format using NeMo's checkpoint conversion scripts. If starting from an older NeMo checkpoint, convert it to Megatron Core format.

Key considerations:

  • The model must be in Megatron Core format (mcore_gpt=True in config)
  • Ensure model.encoder_seq_length in the training config matches the pretrained model's sequence length
  • Conversion scripts are available in the NeMo repository for common model families
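
The two config checks above can be sketched without loading the model. This is a minimal sketch, assuming the .nemo file is a tar archive (optionally compressed) that contains a model_config.yaml member, and that mcore_gpt and encoder_seq_length appear as plain `key: value` lines in it. The helper names are hypothetical and are not part of NeMo; a real YAML parser would be more robust than the line scan used here.

```python
import re
import tarfile


def read_model_config(nemo_path, member_suffix="model_config.yaml"):
    """Return the text of model_config.yaml from a .nemo tar archive."""
    # "r:*" lets tarfile detect plain or compressed archives transparently.
    with tarfile.open(nemo_path, "r:*") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                return archive.extractfile(member).read().decode("utf-8")
    raise FileNotFoundError(f"{member_suffix} not found in {nemo_path}")


def find_scalar(config_text, key):
    """Return the raw value of the first `key: value` line, or None."""
    pattern = rf"^[ \t]*{re.escape(key)}:[ \t]*(\S+)[ \t]*$"
    match = re.search(pattern, config_text, re.MULTILINE)
    return match.group(1) if match else None


def check_pretrained_config(config_text, expected_seq_length):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if str(find_scalar(config_text, "mcore_gpt")).lower() != "true":
        problems.append("mcore_gpt is not true; convert to Megatron Core format")
    seq_length = find_scalar(config_text, "encoder_seq_length")
    if seq_length != str(expected_seq_length):
        problems.append(
            f"encoder_seq_length is {seq_length}, expected {expected_seq_length}"
        )
    return problems
```

Calling `check_pretrained_config(read_model_config(path), 4096)` with a hypothetical checkpoint path returns an empty list when the checkpoint looks ready for SFT at that sequence length.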

Step 2: Prepare training data

Format the training data into JSONL files with the structure expected by NeMo-Aligner. Two formats are supported: prompt-response format (with input and output fields) and chat format (multi-turn conversation with role annotations). Optionally, apply sequence packing to concatenate multiple short examples into longer sequences for improved GPU utilization.
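
As an illustration of the prompt-response format, the sketch below writes records with input and output fields to JSONL, validates a file line by line, and splits records into train and validation sets. The function names, the 10% validation fraction, and the fixed seed are illustrative assumptions, not NeMo-Aligner requirements. The chat format is not covered here because its exact schema is not given in this page.

```python
import json
import random


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def validate_prompt_response(path):
    """Return (line_number, message) for every malformed line."""
    errors = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                errors.append((line_number, "line is not a JSON object"))
                continue
            for field in ("input", "output"):
                value = record.get(field)
                if not isinstance(value, str) or not value:
                    errors.append((line_number, f"missing or empty '{field}'"))
    return errors


def split_train_val(records, val_fraction=0.1, seed=0):
    """Shuffle deterministically and hold out a validation slice."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

Running the validator on both output files before launching training catches missing fields early, which is cheaper than a failure inside a multi-node job.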

Key considerations:

  • Prompt-response format requires input and output JSON fields per line
  • Chat format requires multi-turn conversation structure with system/user/assistant roles
  • Sequence packing eliminates padding waste but requires a preprocessing step
  • Create separate train and validation JSONL files
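
To show why sequence packing reduces padding, the sketch below assigns examples to packed sequences with a first-fit-decreasing heuristic over token lengths. It is only an illustration of the bin-assignment idea: the function names are hypothetical, the heuristic is an assumption and may differ from what NeMo's own packing preprocessing does, and a real packed dataset must also record example boundaries so attention does not cross them.

```python
def pack_sequences(lengths, max_seq_length):
    """Group example indices so each group's total length fits max_seq_length."""
    bins = []  # each entry: [remaining_capacity, [example indices]]
    # Place the longest examples first; ties keep their original order.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for index in order:
        length = lengths[index]
        if length > max_seq_length:
            raise ValueError(f"example {index} exceeds max_seq_length")
        for bin_ in bins:
            if bin_[0] >= length:  # first bin with enough room wins
                bin_[0] -= length
                bin_[1].append(index)
                break
        else:
            bins.append([max_seq_length - length, [index]])
    return [indices for _, indices in bins]


def padding_fraction(lengths, groups, max_seq_length):
    """Fraction of token slots that are padding when each group is one sequence."""
    return 1 - sum(lengths) / (len(groups) * max_seq_length)
```

Comparing `padding_fraction` for the packed groups against one group per example gives a quick estimate of how much compute packing would save on a given dataset.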

Step 3: Configure training parameters

Set up the Hydra configuration for SFT training, including model parallelism settings (tensor parallel, pipeline parallel), batch sizes (micro and global), learning rate, precision (bf16 recommended), and data paths. The configuration is controlled via YAML files and command-line overrides.
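
The relationship between parallelism settings and batch sizes can be checked with simple arithmetic before launching a job. The sketch below follows the usual Megatron convention that the global batch size equals micro batch size times data-parallel size times the number of gradient accumulation steps. The function name and argument names are illustrative, not NeMo configuration keys.

```python
def derive_batch_layout(world_size, tensor_parallel, pipeline_parallel,
                        micro_batch_size, global_batch_size):
    """Derive data-parallel size and accumulation steps, or raise if inconsistent."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by TP * PP")
    # GPUs not used for model parallelism hold replicas of the model.
    data_parallel = world_size // model_parallel
    samples_per_pass = micro_batch_size * data_parallel
    if global_batch_size % samples_per_pass != 0:
        raise ValueError("global batch must be divisible by micro batch * DP")
    return {
        "data_parallel_size": data_parallel,
        "grad_accumulation_steps": global_batch_size // samples_per_pass,
    }
```

For example, 16 GPUs with tensor parallel 4 and pipeline parallel 2 leave a data-parallel size of 2, so a global batch of 128 with micro batch 1 implies 64 accumulation steps per optimizer update.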

Key considerations:

  • Use answer_only_loss=True to compute loss only on response tokens
  • Set model.data.chat=True when using chat-format datasets
  • Adjust parallelism settings based on model size and available GPU memory
  • Enable PEFT/LoRA if full-parameter tuning exceeds memory limits
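
Since the configuration accepts command-line overrides, one convenient pattern is to keep the settings in a nested dictionary and flatten it into dotted `key=value` strings. This is a sketch under stated assumptions: only model.data.chat, answer_only_loss, and model.encoder_seq_length are named in this page, and the remaining key paths below (including placing answer_only_loss under model) are assumptions that should be checked against the actual SFT YAML config.

```python
def to_hydra_overrides(config, prefix=""):
    """Flatten a nested dict into dotted key=value override strings."""
    overrides = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(to_hydra_overrides(value, dotted))
        else:
            overrides.append(f"{dotted}={value}")
    return overrides


# Key paths other than model.data.chat and model.encoder_seq_length are
# assumptions made for illustration; verify them against the real config.
sft_config = {
    "trainer": {"precision": "bf16"},
    "model": {
        "encoder_seq_length": 4096,
        "answer_only_loss": True,
        "tensor_model_parallel_size": 4,
        "pipeline_model_parallel_size": 2,
        "micro_batch_size": 1,
        "global_batch_size": 128,
        "data": {"chat": True},
    },
}

overrides = to_hydra_overrides(sft_config)
```

The resulting list can be appended to the training script's command line. Values containing spaces or special characters would need quoting, which this sketch does not handle.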

Step 4: Launch SFT training

Execute the training script which orchestrates the full supervised training loop: loading the pretrained model, initializing distributed training, building train and validation dataloaders, extracting the optimizer and scheduler, and running the SupervisedTrainer's fit loop. The trainer handles gradient accumulation, checkpointing, validation, and logging.
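
Gradient accumulation, which the trainer handles internally, can be illustrated with a toy one-parameter model. The sketch below is not NeMo-Aligner code: it uses a hypothetical linear model y = weight * x with mean squared error, and shows that averaging gradients over equal-sized micro-batches reproduces the gradient of the full batch.

```python
def mse_gradient(weight, batch):
    """Gradient of mean squared error for y = weight * x over one batch."""
    return sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(weight, batch, micro_batch_size):
    """Average per-micro-batch gradients, as gradient accumulation does."""
    if len(batch) % micro_batch_size != 0:
        raise ValueError("batch must split into equal-sized micro-batches")
    micro_batches = [
        batch[start:start + micro_batch_size]
        for start in range(0, len(batch), micro_batch_size)
    ]
    total = 0.0
    for micro_batch in micro_batches:
        # In a real trainer this is one forward/backward pass without an
        # optimizer step; gradients are summed until the batch is complete.
        total += mse_gradient(weight, micro_batch)
    return total / len(micro_batches)
```

Because the two results match, the micro batch size can be lowered to fit GPU memory without changing the effective global batch seen by the optimizer.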

What happens:

  • The pretrained model is loaded and wrapped as a GPTSFTModel
  • Distributed training is initialized across all GPUs/nodes
  • The SupervisedTrainer runs forward/backward passes with gradient synchronization
  • Checkpoints are saved at the configured interval, with validation loss used to decide which ones to keep
  • Training continues until max_steps or max_epochs is reached
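
The control flow described above (train until a step limit, validate periodically, keep the best checkpoints by validation loss) can be sketched as a plain loop. This is a schematic only and does not reproduce the SupervisedTrainer: `train_step` and `validate` are hypothetical callables supplied by the caller, and `val_interval` and `save_top_k` are illustrative parameter names.

```python
import heapq


def run_training(train_step, validate, max_steps, val_interval, save_top_k):
    """Run a toy fit loop; return the best (val_loss, step) pairs, lowest first."""
    kept = []
    for step in range(1, max_steps + 1):
        train_step(step)
        # Validate on the configured interval and once more at the final step.
        if step % val_interval == 0 or step == max_steps:
            val_loss = validate(step)
            kept.append((val_loss, step))
            # Retain only the save_top_k checkpoints with the lowest loss.
            kept = heapq.nsmallest(save_top_k, kept)
    return kept
```

A real trainer would write a checkpoint to disk at each retained step and delete the ones that fall out of the top-k list.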

Step 5: Export and validate

After training completes, a megatron_gpt_sft.nemo checkpoint is saved. This checkpoint can be used directly for inference or as the starting point for further alignment (reward model training, RLHF, DPO). Verify the model by checking validation loss convergence and optionally running inference with the correct prompt template.

Key considerations:

  • The saved model includes the prompt template used during training
  • Any downstream usage must follow the same prompt template format
  • Extract the prompt template from model_config.yaml inside the .nemo archive
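
Once the template string has been copied out of model_config.yaml, an inference prompt can be built by filling in the input and stopping where the model's response should begin. This is a sketch under an assumption: it treats the template as a string with `{input}` and `{output}` placeholders matching the JSONL field names, which should be confirmed against the actual template stored in the checkpoint. The function name is hypothetical.

```python
def build_inference_prompt(prompt_template, user_input,
                           input_placeholder="{input}",
                           output_placeholder="{output}"):
    """Fill the input slot and cut the template where the output would start."""
    if input_placeholder not in prompt_template:
        raise ValueError("template has no input placeholder")
    if output_placeholder not in prompt_template:
        raise ValueError("template has no output placeholder")
    # Everything before the output placeholder is what the model is given.
    prefix = prompt_template.split(output_placeholder, 1)[0]
    return prefix.replace(input_placeholder, user_input)
```

Using the same template for inference as for training keeps the prompt in the distribution the model was fine-tuned on, which is why downstream usage must follow it exactly.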

Execution Diagram

GitHub URL

Workflow Repository