Workflow:NVIDIA NeMo Aligner Supervised Fine Tuning

Knowledge Sources	NeMo-Aligner, NeMo Aligner SFT Guide, NeMo-Aligner Paper
Domains	LLMs, Fine_Tuning, Model_Alignment
Last Updated	2026-02-07 22:00 GMT

Overview

End-to-end process for supervised fine-tuning (SFT) of pretrained GPT models on instruction-following or chat datasets using NeMo-Aligner's distributed training framework.

Description

This workflow covers the standard supervised fine-tuning procedure that adapts a pretrained language model to follow instructions or engage in multi-turn conversations. It accepts either prompt-response or chat-format datasets and produces an aligned model checkpoint. SFT is typically the first alignment step, serving as a prerequisite for more advanced techniques such as RLHF (PPO), DPO, or REINFORCE. The training leverages Megatron-LM parallelism (tensor, pipeline, and data parallelism) through the NeMo Framework, enabling efficient scaling from single-GPU setups to multi-node clusters.

Key outputs:

A fine-tuned .nemo model checkpoint (megatron_gpt_sft.nemo)
Training metrics logged via WandB or TensorBoard

Scope:

From a pretrained .nemo checkpoint and a formatted JSONL dataset to a fine-tuned model ready for inference or further alignment

Usage

Execute this workflow when you have a pretrained GPT-based model (e.g., LLaMA, Nemotron, GPT-2B) and a supervised dataset of instruction-response pairs or multi-turn conversations, and you need to adapt the model to follow instructions or chat naturally. This is the recommended first step before any reinforcement learning-based alignment (PPO, REINFORCE, DPO).

Execution Steps

Step 1: Obtain pretrained model

Acquire a pretrained GPT model in NeMo format. Either download an existing .nemo checkpoint (e.g., GPT-2B, LLaMA-3-8B, Nemotron-4-340B) or convert a Hugging Face model to .nemo format using NeMo's checkpoint conversion scripts. If starting from an older NeMo checkpoint, convert it to Megatron Core format first.

Key considerations:

The model must be in Megatron Core format (mcore_gpt=True in config)
Ensure model.encoder_seq_length in the training config matches the sequence length of the pretrained model
Conversion scripts are available in the NeMo repository for common model families

Step 2: Prepare training data

Format the training data into JSONL files with the structure expected by NeMo-Aligner. Two formats are supported: prompt-response format (with input and output fields) and chat format (multi-turn conversation with role annotations). Optionally, apply sequence packing to concatenate multiple short examples into longer sequences for improved GPU utilization.
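
As a minimal sketch of the prompt-response format described above, the following Python writes records with input and output fields to a JSONL file and checks each line before training. The helper names (write_jsonl, validate_prompt_response) are hypothetical and not part of NeMo-Aligner. The chat format is left out because its exact schema is not spelled out here.

```python
import json

# Field names taken from the prompt-response format described above.
REQUIRED_FIELDS = ("input", "output")


def write_jsonl(records, path):
    """Write one JSON object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def validate_prompt_response(path):
    """Return a list of (line_number, problem); an empty list means no problems found."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                problems.append((lineno, "blank line"))
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(rec, dict):
                problems.append((lineno, "line is not a JSON object"))
                continue
            for field in REQUIRED_FIELDS:
                value = rec.get(field)
                if not isinstance(value, str) or not value:
                    problems.append((lineno, f"missing or empty '{field}'"))
    return problems
```

Running validate_prompt_response on both the train and validation files before launching a job catches malformed lines early, which is cheaper than discovering them after the model has been loaded.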

Key considerations:

Prompt-response format requires input and output JSON fields per line
Chat format requires multi-turn conversation structure with system/user/assistant roles
Sequence packing removes most padding waste but requires a preprocessing step
Create separate train and validation JSONL files
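
To illustrate why sequence packing reduces padding, here is a small sketch that groups examples by token length into bins no longer than the maximum sequence length, using a first-fit-decreasing heuristic. This is a conceptual illustration only: the function names are hypothetical, the heuristic is an assumption, and it is not the preprocessing script that ships with NeMo.

```python
def pack_sequences(lengths, max_seq_length):
    """Greedy first-fit-decreasing packing.

    Returns a list of bins; each bin is a list of example indices whose
    token lengths sum to at most max_seq_length.
    """
    bins = []       # example indices per packed sequence
    remaining = []  # free token slots per packed sequence
    # Place the longest examples first so short ones can fill the gaps.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        n = lengths[i]
        if n > max_seq_length:
            raise ValueError(f"example {i} has {n} tokens, above {max_seq_length}")
        for b, free in enumerate(remaining):
            if n <= free:
                bins[b].append(i)
                remaining[b] -= n
                break
        else:
            # No existing bin has room: open a new packed sequence.
            bins.append([i])
            remaining.append(max_seq_length - n)
    return bins


def padding_fraction(lengths, bins, max_seq_length):
    """Fraction of token slots that would be padding for a given binning."""
    return 1.0 - sum(lengths) / (len(bins) * max_seq_length)
```

Comparing padding_fraction for the packed bins against one example per sequence gives a quick estimate of how much compute packing could save on a given dataset.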

Step 3: Configure training parameters

Set up the Hydra configuration for SFT training, including model parallelism settings (tensor parallel, pipeline parallel), batch sizes (micro and global), learning rate, precision (bf16 recommended), and data paths. The configuration is controlled via YAML files and command-line overrides.
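
Because the configuration is driven by YAML plus command-line overrides, it can help to keep the overrides in one structured place. The sketch below flattens a nested Python dict into Hydra-style key=value strings. The helper to_hydra_overrides is hypothetical, and apart from model.answer_only_loss, model.data.chat, and model.encoder_seq_length, the key names and the path are assumptions that should be checked against the YAML shipped with your NeMo-Aligner version.

```python
def to_hydra_overrides(config, prefix=""):
    """Flatten a nested dict into a list of 'dotted.key=value' strings."""
    overrides = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(to_hydra_overrides(value, dotted))
        elif isinstance(value, bool):
            # Checked before the generic branch so booleans render explicitly.
            overrides.append(f"{dotted}={'True' if value else 'False'}")
        elif isinstance(value, (list, tuple)):
            overrides.append(f"{dotted}=[{','.join(str(v) for v in value)}]")
        else:
            overrides.append(f"{dotted}={value}")
    return overrides


# Illustrative values only; keys marked "assumed" are not confirmed by this page.
sft_config = {
    "trainer": {"precision": "bf16"},
    "model": {
        "restore_from_path": "/path/to/pretrained.nemo",  # assumed key, placeholder path
        "tensor_model_parallel_size": 2,                  # assumed key
        "pipeline_model_parallel_size": 1,                # assumed key
        "encoder_seq_length": 4096,
        "answer_only_loss": True,
        "data": {"chat": False},
    },
}

overrides = to_hydra_overrides(sft_config)
```

The resulting list can be appended to the training command, which keeps experiment settings reviewable in code rather than scattered across shell history.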

Key considerations:

Use model.answer_only_loss=True to compute loss only on response tokens
Set model.data.chat=True when using chat-format datasets
Adjust parallelism settings based on model size and available GPU memory
Enable PEFT/LoRA if full-parameter tuning exceeds memory limits
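
The effect of computing loss only on response tokens can be shown with a toy example. The sketch below builds a 0/1 mask that zeroes out prompt positions and averages per-token losses over the response only. It is a conceptual illustration in plain Python with hypothetical function names; the real implementation operates on tensors and is not reproduced here.

```python
def answer_only_mask(prompt_len, response_len):
    """Mask with 0 for prompt tokens and 1 for response tokens."""
    return [0] * prompt_len + [1] * response_len


def masked_mean(per_token_loss, mask):
    """Average the loss over positions where mask == 1."""
    if len(per_token_loss) != len(mask):
        raise ValueError("loss and mask must have the same length")
    count = sum(mask)
    if count == 0:
        raise ValueError("mask selects no tokens")
    return sum(loss * m for loss, m in zip(per_token_loss, mask)) / count


# Toy per-token losses: three prompt tokens followed by two response tokens.
losses = [4.0, 4.0, 4.0, 1.0, 3.0]
answer_only = masked_mean(losses, answer_only_mask(3, 2))
all_tokens = masked_mean(losses, [1] * len(losses))
```

In this toy case the prompt tokens carry high loss, so averaging over all tokens would let the prompt dominate the gradient signal, while the answer-only average reflects just the response the model is being trained to produce.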

Step 4: Launch SFT training

Execute the training script, which orchestrates the full supervised training loop: loading the pretrained model, initializing distributed training, building the train and validation dataloaders, extracting the optimizer and scheduler, and running the SupervisedTrainer's fit loop. The trainer handles gradient accumulation, checkpointing, validation, and logging.

What happens:

The pretrained model is loaded and wrapped as a GPTSFTModel
Distributed training is initialized across all GPUs/nodes
The SupervisedTrainer runs forward/backward passes with gradient synchronization
Checkpoints are saved at configured intervals, with validation loss as the metric used to rank them
Training continues until max_steps or max_epochs is reached
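
The gradient accumulation that the trainer performs follows from the batch and parallelism settings. As a rough sketch, assuming the usual Megatron-style relationship (and ignoring other parallelism dimensions such as context parallelism), the data-parallel size is the GPU count divided by tensor-parallel times pipeline-parallel size, and the global batch is split into micro-batches across data-parallel ranks. The function name is hypothetical.

```python
def grad_accumulation_steps(world_size, tensor_parallel, pipeline_parallel,
                            micro_batch_size, global_batch_size):
    """Return (data_parallel_size, accumulation_steps) for a given setup."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by TP * PP")
    data_parallel = world_size // model_parallel

    # Samples processed by all data-parallel ranks in one micro-step.
    samples_per_micro_step = micro_batch_size * data_parallel
    if global_batch_size % samples_per_micro_step != 0:
        raise ValueError("global_batch_size must be divisible by micro_batch_size * DP")
    return data_parallel, global_batch_size // samples_per_micro_step
```

For example, 16 GPUs with tensor parallel 4 and pipeline parallel 2 leave a data-parallel size of 2, so a global batch of 128 with micro batch 1 implies 64 accumulation steps per optimizer update.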

Step 5: Export and validate

After training completes, a megatron_gpt_sft.nemo checkpoint is saved. This checkpoint can be used directly for inference or as the starting point for further alignment (reward model training, RLHF, DPO). Verify the model by checking validation loss convergence and optionally running inference with the correct prompt template.

Key considerations:

The saved model includes the prompt template used during training
Any downstream usage must follow the same prompt template format
Extract the prompt template from model_config.yaml inside the .nemo archive
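
One way to pull the prompt template out of the checkpoint is sketched below, on the assumption that the .nemo file is a tar archive containing model_config.yaml, as the step above implies. The helper names are hypothetical, and the key name prompt_template is an assumption; adjust it to whatever key your config actually uses. The lookup is a plain line scan, similar to grep, so it needs no YAML library.

```python
import tarfile


def read_model_config(nemo_path):
    """Return the text of model_config.yaml stored inside a .nemo archive."""
    # "r:*" lets tarfile detect whether the archive is compressed.
    with tarfile.open(nemo_path, "r:*") as archive:
        for member in archive.getmembers():
            # Member names may carry a leading "./" or a directory prefix.
            if member.isfile() and member.name.split("/")[-1] == "model_config.yaml":
                return archive.extractfile(member).read().decode("utf-8")
    raise FileNotFoundError(f"model_config.yaml not found in {nemo_path}")


def find_config_lines(config_text, key):
    """Return stripped lines of the form 'key: ...' at any nesting depth."""
    return [
        line.strip()
        for line in config_text.splitlines()
        if line.strip().startswith(f"{key}:")
    ]
```

Calling find_config_lines(read_model_config(path), "prompt_template") on the trained checkpoint shows the template string to reuse at inference time. The same scan works for other keys, for example to confirm mcore_gpt before training.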

GitHub URL

Workflow Repository