Workflow:NVIDIA NeMo Aligner Supervised Fine Tuning
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Fine_Tuning, Model_Alignment |
| Last Updated | 2026-02-07 22:00 GMT |
Overview
End-to-end process for supervised fine-tuning (SFT) of pretrained GPT models on instruction-following or chat datasets using NeMo-Aligner's distributed training framework.
Description
This workflow covers the standard supervised fine-tuning procedure that adapts a pretrained language model to follow instructions or engage in multi-turn conversations. It accepts either prompt-response or chat-format datasets and produces an aligned model checkpoint. SFT is typically the first alignment step, serving as a prerequisite for more advanced techniques such as RLHF (PPO), DPO, or REINFORCE. The training leverages Megatron-LM parallelism (tensor, pipeline, and data parallelism) through the NeMo Framework, enabling efficient scaling from single-GPU setups to multi-node clusters.
Key outputs:
- A fine-tuned .nemo model checkpoint (megatron_gpt_sft.nemo)
- Training metrics logged via WandB or TensorBoard
Scope:
- From a pretrained .nemo checkpoint and a formatted JSONL dataset to a fine-tuned model ready for inference or further alignment
Usage
Use this workflow when you have a pretrained GPT-based model (e.g., LLaMA, Nemotron, GPT-2B) and a supervised dataset of instruction-response pairs or multi-turn conversations, and you need the model to follow instructions or hold a conversation. It is the recommended first step before preference-based or reinforcement-learning-based alignment (DPO, PPO, REINFORCE).
Execution Steps
Step 1: Obtain pretrained model
Acquire a pretrained GPT model in NeMo format. This involves either downloading a pre-existing .nemo checkpoint (e.g., GPT-2B, LLaMA-3-8B, Nemotron-340B) or converting a Hugging Face model to .nemo format using NeMo's checkpoint conversion scripts. If starting from an older NeMo checkpoint, convert it to Megatron Core format.
Key considerations:
- The model must be in Megatron Core format (mcore_gpt=True in config)
- Ensure model.encoder_seq_length in configs matches your model's sequence length
- Conversion scripts are available in the NeMo repository for common model families
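The two configuration checks above can be sketched as a small helper. This is a minimal illustration that assumes the model config has already been loaded into a plain dictionary; the function name and the example values are hypothetical and not part of NeMo.

```python
def check_pretrained_config(cfg: dict, expected_seq_length: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    # The workflow requires a Megatron Core checkpoint (mcore_gpt=True).
    if cfg.get("mcore_gpt") is not True:
        problems.append("mcore_gpt is not True; convert the checkpoint to Megatron Core format")
    # encoder_seq_length should match the sequence length of the model.
    seq_len = cfg.get("encoder_seq_length")
    if seq_len != expected_seq_length:
        problems.append(f"encoder_seq_length is {seq_len}, expected {expected_seq_length}")
    return problems


# Hypothetical config values, for illustration only.
example_cfg = {"mcore_gpt": True, "encoder_seq_length": 4096}
print(check_pretrained_config(example_cfg, expected_seq_length=4096))
```

Returning a list of problems rather than raising on the first one lets both issues be reported in a single pass.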
Step 2: Prepare training data
Format the training data into JSONL files with the structure expected by NeMo-Aligner. Two formats are supported: prompt-response format (with input and output fields) and chat format (multi-turn conversation with role annotations). Optionally, apply sequence packing to concatenate multiple short examples into longer sequences for improved GPU utilization.
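To make the idea of sequence packing concrete, the sketch below groups examples by token length so that each pack fits within a maximum sequence length. It uses a generic first-fit-decreasing heuristic and is not the packing algorithm NeMo ships; the function name and the token lengths are hypothetical.

```python
def pack_first_fit(lengths: list[int], max_seq_length: int) -> list[list[int]]:
    """Group example indices into packs whose total token count fits max_seq_length."""
    packs: list[list[int]] = []   # example indices per pack
    remaining: list[int] = []     # free token budget per pack
    # Placing the longest examples first tends to leave fewer half-empty packs.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for idx in order:
        n = lengths[idx]
        if n > max_seq_length:
            raise ValueError(f"example {idx} has {n} tokens, over the {max_seq_length} limit")
        for p, free in enumerate(remaining):
            if n <= free:
                # First existing pack with enough room takes the example.
                packs[p].append(idx)
                remaining[p] -= n
                break
        else:
            # No pack had room, so open a new one.
            packs.append([idx])
            remaining.append(max_seq_length - n)
    return packs


token_lengths = [900, 300, 700, 100, 1024, 50]   # hypothetical tokenized lengths
packs = pack_first_fit(token_lengths, max_seq_length=1024)
padded_slots = len(token_lengths) * 1024   # one padded sequence per example
packed_slots = len(packs) * 1024           # one padded sequence per pack
print(packs, f"{padded_slots} -> {packed_slots} token slots")
```

Fewer sequences of the same length means fewer padded positions per batch, which is where the utilization gain comes from.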
Key considerations:
- Prompt-response format requires input and output JSON fields per line
- Chat format requires multi-turn conversation structure with system/user/assistant roles
- Sequence packing removes most padding waste but requires a preprocessing step
- Create separate train and validation JSONL files
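A minimal sketch of preparing prompt-response data follows. It validates the input and output fields, shuffles deterministically, and writes separate train and validation JSONL files. The file names, the 10% validation fraction, and the sample records are assumptions for illustration; the chat format is not covered here.

```python
import json
import random
import tempfile
from pathlib import Path


def validate_prompt_response(record: dict) -> None:
    """Raise ValueError unless the record has non-empty string 'input' and 'output' fields."""
    for field in ("input", "output"):
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"record is missing a non-empty '{field}' field: {record!r}")


def write_jsonl(records: list[dict], path: Path) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def split_and_write(records: list[dict], out_dir: Path,
                    val_fraction: float = 0.1, seed: int = 0) -> tuple[int, int]:
    """Validate, shuffle deterministically, and write train.jsonl and val.jsonl."""
    for record in records:
        validate_prompt_response(record)
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))  # keep at least one validation row
    out_dir.mkdir(parents=True, exist_ok=True)
    write_jsonl(shuffled[n_val:], out_dir / "train.jsonl")
    write_jsonl(shuffled[:n_val], out_dir / "val.jsonl")
    return len(shuffled) - n_val, n_val


# Hypothetical records; a temporary directory keeps the sketch free of side effects.
examples = [
    {"input": "Translate to French: cat", "output": "chat"},
    {"input": "What is 2 + 2?", "output": "4"},
    {"input": "Name a primary color.", "output": "Red"},
]
with tempfile.TemporaryDirectory() as tmp:
    n_train, n_val = split_and_write(examples, Path(tmp))
    print(n_train, n_val)
```

Validating before writing means a malformed record fails fast instead of surfacing later as a data-loading error during training.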
Step 3: Configure training parameters
Set up the Hydra configuration for SFT training, including model parallelism settings (tensor parallel, pipeline parallel), batch sizes (micro and global), learning rate, precision (bf16 recommended), and data paths. The configuration is controlled via YAML files and command-line overrides.
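Command-line overrides use Hydra's dotted key=value form. The sketch below flattens a nested dictionary into such overrides. Apart from model.data.chat, which this page names, the key names are assumptions modelled on common NeMo configs and should be checked against the YAML file shipped with the NeMo-Aligner version in use.

```python
def to_hydra_overrides(config: dict, prefix: str = "") -> list[str]:
    """Flatten a nested dict into Hydra-style dotted key=value overrides."""
    overrides = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so nested sections become dotted paths.
            overrides.extend(to_hydra_overrides(value, dotted))
        else:
            overrides.append(f"{dotted}={value}")
    return overrides


# Assumed key names and hypothetical values; verify against your YAML config.
sft_settings = {
    "trainer": {"precision": "bf16"},
    "model": {
        "tensor_model_parallel_size": 2,
        "pipeline_model_parallel_size": 1,
        "optim": {"lr": 5e-6},
        "data": {"chat": True},
    },
}
# Print one override per line, joined with shell line continuations.
print(" \\\n  ".join(to_hydra_overrides(sft_settings)))
```

Keeping the settings in one dictionary makes it easy to log exactly which overrides were passed to a given run.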
Key considerations:
- Use answer_only_loss=True to compute loss only on response tokens
- Set model.data.chat=True when using chat-format datasets
- Adjust parallelism settings based on model size and available GPU memory
- Enable PEFT/LoRA if full-parameter tuning exceeds memory limits
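When adjusting parallelism, it helps to check that the batch sizes are consistent with the GPU layout. In Megatron-style training the data-parallel size is the number of GPUs divided by the product of the tensor-parallel and pipeline-parallel sizes, and the global batch size equals micro batch size × data-parallel size × gradient accumulation steps. The sketch below applies that arithmetic; it ignores other parallelism dimensions such as context parallelism, and the function name is hypothetical.

```python
def batch_layout(world_size: int, tensor_parallel: int, pipeline_parallel: int,
                 micro_batch_size: int, global_batch_size: int) -> tuple[int, int]:
    """Return (data_parallel_size, gradient_accumulation_steps) or raise if inconsistent."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"{world_size} GPUs cannot be split into model-parallel groups of {model_parallel}")
    data_parallel = world_size // model_parallel
    # Samples consumed by one forward/backward pass across all data-parallel replicas.
    samples_per_pass = micro_batch_size * data_parallel
    if global_batch_size % samples_per_pass != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not a multiple of {samples_per_pass}")
    return data_parallel, global_batch_size // samples_per_pass


# Hypothetical setup: 8 GPUs, TP=2, PP=1, micro batch 1, global batch 128.
dp, accum = batch_layout(8, 2, 1, micro_batch_size=1, global_batch_size=128)
print(f"data parallel = {dp}, gradient accumulation steps = {accum}")
```

Raising tensor or pipeline parallelism to fit a larger model shrinks the data-parallel size, so more accumulation steps are needed to reach the same global batch.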
Step 4: Launch SFT training
Run the training script, which orchestrates the full supervised training loop: loading the pretrained model, initializing distributed training, building train and validation dataloaders, extracting the optimizer and scheduler, and running the SupervisedTrainer's fit loop. The trainer handles gradient accumulation, checkpointing, validation, and logging.
What happens:
- The pretrained model is loaded and wrapped as a GPTSFTModel
- Distributed training is initialized across all GPUs/nodes
- The SupervisedTrainer runs forward/backward passes with gradient synchronization
- Checkpoints are saved at configured intervals, with validation loss as the monitored metric for ranking them
- Training continues until max_steps or max_epochs is reached
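The interaction between save intervals, validation loss, and max_steps can be pictured with a toy loop that keeps only the k checkpoints with the lowest validation loss. This is an illustrative sketch, not the SupervisedTrainer implementation: the class, the save_interval and k parameters, and the loss values are all hypothetical, and a lookup table stands in for running validation.

```python
import heapq


class TopKCheckpoints:
    """Track the k checkpoints with the lowest validation loss."""

    def __init__(self, k: int):
        self.k = k
        # Min-heap on negated loss, so the worst kept checkpoint sits at the top.
        self._heap: list[tuple[float, int]] = []

    def update(self, step: int, val_loss: float):
        """Record a checkpoint; return the step that was evicted, or None."""
        heapq.heappush(self._heap, (-val_loss, step))
        if len(self._heap) > self.k:
            _, evicted_step = heapq.heappop(self._heap)
            return evicted_step
        return None

    def kept_steps(self) -> list[int]:
        return sorted(step for _, step in self._heap)


def run_toy_loop(val_loss_at: dict, max_steps: int, save_interval: int, k: int) -> list[int]:
    """Walk through steps 1..max_steps and return the checkpoint steps that survive."""
    tracker = TopKCheckpoints(k)
    for step in range(1, max_steps + 1):
        # A real trainer would run forward/backward passes and an optimizer step here.
        if step % save_interval == 0:
            tracker.update(step, val_loss_at[step])
    return tracker.kept_steps()


# Hypothetical validation losses measured at each save point.
losses = {100: 2.0, 200: 1.5, 300: 1.7, 400: 1.4}
kept = run_toy_loop(losses, max_steps=400, save_interval=100, k=2)
print(kept)
```

With these values the checkpoints from steps 200 and 400 survive, since they have the two lowest losses.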
Step 5: Export and validate
After training completes, a megatron_gpt_sft.nemo checkpoint is saved. This checkpoint can be used directly for inference or as the starting point for further alignment (reward model training, RLHF, DPO). Verify the model by checking validation loss convergence and optionally running inference with the correct prompt template.
Key considerations:
- The saved model includes the prompt template used during training
- Any downstream usage must follow the same prompt template format
- Extract the prompt template from model_config.yaml inside the .nemo archive
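Because a .nemo file is a tar archive, the config can be read without unpacking the whole checkpoint. The sketch below uses Python's standard tarfile module to pull out model_config.yaml and scan it for prompt template entries. The assumption that the relevant key contains the substring prompt_template should be checked against the actual config; the function names are hypothetical.

```python
import tarfile


def read_model_config(nemo_path: str) -> str:
    """Return the text of model_config.yaml stored inside a .nemo archive."""
    # Mode "r:*" lets tarfile detect whether the archive is compressed.
    with tarfile.open(nemo_path, "r:*") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith("model_config.yaml"):
                extracted = archive.extractfile(member)
                return extracted.read().decode("utf-8")
    raise FileNotFoundError(f"model_config.yaml not found in {nemo_path}")


def find_prompt_template_lines(config_text: str) -> list[str]:
    """Return config lines mentioning a prompt template (a plain text scan, not YAML parsing)."""
    return [line.strip() for line in config_text.splitlines() if "prompt_template" in line]


# Example usage with the checkpoint name from this workflow:
# text = read_model_config("megatron_gpt_sft.nemo")
# print(find_prompt_template_lines(text))
```

A text scan avoids a dependency on a YAML parser; for anything beyond a quick look, loading the file with a YAML library gives the full nested structure.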