Principle:NVIDIA NeMo Aligner Supervised Training Loop
| Principle: Supervised Training Loop | |
|---|---|
| Type | Principle |
| Project | NVIDIA NeMo Aligner |
| Domains | Deep_Learning, Training |
| Related Implementations | Implementation:NVIDIA_NeMo_Aligner_SupervisedTrainer_Fit |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Generic training loop pattern for supervised alignment objectives including SFT and reward model training.
Description
The supervised training loop orchestrates the iterative optimization of a model on labeled data. It handles:
- Epoch iteration -- cycling through the training dataset for a configurable number of passes
- Micro-batch gradient accumulation -- accumulating gradients across multiple micro-batches before performing an optimizer step
- Validation -- evaluating the model at configurable intervals (steps or fractions of an epoch)
- Checkpoint saving -- persisting model state at regular intervals for fault tolerance
- Learning rate scheduling -- adjusting the learning rate according to a defined schedule (warmup, decay)
- Distributed metric logging -- aggregating and logging metrics (loss, accuracy) across all parallel ranks
The loop is agnostic to the specific loss function -- it delegates to the model's `get_loss_and_metrics` interface. This makes it reusable for both SFT (cross-entropy on response tokens) and reward model training (Bradley-Terry ranking loss).
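The delegation described above can be sketched in a few lines of plain Python. The class and function names below are illustrative stand-ins, not the actual NeMo Aligner classes; the point is that the loop body never inspects the objective, so the same step function drives both SFT and reward-model training.

```python
class SFTModel:
    """Toy stand-in for an SFT objective (cross-entropy in the real system)."""
    def get_loss_and_metrics(self, batch, forward_only):
        loss = sum(batch) / len(batch)          # placeholder "loss"
        return loss, {"loss": loss}

class RewardModel:
    """Toy stand-in for a Bradley-Terry ranking objective."""
    def get_loss_and_metrics(self, batch, forward_only):
        loss = max(batch) - min(batch)          # placeholder "loss"
        return loss, {"loss": loss}

def run_step(model, batch):
    # The loop only delegates; it never computes a loss itself.
    loss, metrics = model.get_loss_and_metrics(batch, forward_only=False)
    return metrics
```

Because `run_step` depends only on the `get_loss_and_metrics` signature, swapping the objective requires no change to the loop.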
Usage
Use for any alignment objective that trains on static labeled datasets (not online/RL). This covers:
- Supervised Fine-Tuning (SFT) -- cross-entropy loss on response tokens
- Reward model training -- ranking loss on preference pairs
- Knowledge distillation -- KL divergence from teacher outputs
- SteerLM -- attribute-conditioned response training
The model must implement the SupervisedInterface, which requires:
- `get_loss_and_metrics(batch, forward_only)` -- compute loss and return metrics
- `prepare_for_training_step()` -- setup before each training step (e.g., enable gradients)
- `finish_training_step()` -- cleanup after each training step (e.g., gradient clipping)
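One way to express this contract is as a structural type. The sketch below uses `typing.Protocol` for illustration; the real NeMo Aligner class hierarchy may define the interface differently.

```python
from typing import Any, Dict, Protocol, Tuple, runtime_checkable

@runtime_checkable
class SupervisedInterface(Protocol):
    """Structural sketch of the interface the supervised loop requires."""

    def get_loss_and_metrics(
        self, batch: Any, forward_only: bool
    ) -> Tuple[float, Dict[str, float]]:
        """Compute the objective's loss and the metrics to log."""
        ...

    def prepare_for_training_step(self) -> None:
        """Setup before each training step (e.g., enable gradients)."""
        ...

    def finish_training_step(self) -> None:
        """Cleanup after each training step (e.g., gradient clipping)."""
        ...
```

Any model class providing these three methods satisfies the protocol, which is what lets one loop serve SFT, reward models, distillation, and SteerLM alike.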
Theoretical Basis
The loop implements standard supervised learning with distributed training support. The core algorithm is:
    for each epoch in num_epochs:
        for each batch in training_dataloader:
            # Forward pass
            loss, metrics = model.get_loss_and_metrics(batch, forward_only=False)
            # Backward pass
            compute gradients of loss with respect to model parameters
            # Gradient accumulation
            if accumulated micro_batches == global_batch / micro_batch:
                # Optimizer step
                optimizer.step()
                learning_rate_scheduler.step()
                optimizer.zero_grad()
                # Periodic validation
                if step_count % val_check_interval == 0:
                    for each val_batch in validation_dataloader:
                        val_loss, val_metrics = model.get_loss_and_metrics(val_batch, forward_only=True)
                    log aggregated validation metrics
                # Periodic checkpoint saving
                if step_count % save_interval == 0:
                    save checkpoint (model weights, optimizer state, trainer state)
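The pseudocode above can be made concrete with a toy, framework-free sketch. The names (`ToyModel`, `fit`) and the hand-computed "gradient" are illustrative assumptions, not the actual SupervisedTrainer implementation, and checkpoint saving is omitted for brevity; the sketch does exercise gradient accumulation, the `prepare`/`finish` hooks, and periodic forward-only validation.

```python
class ToyModel:
    """Fits scalar w to minimize (w*x - y)^2; stands in for a real network."""
    def __init__(self):
        self.w = 0.0
        self.grad = 0.0

    def prepare_for_training_step(self):
        self.grad = 0.0                               # zero accumulated gradient

    def get_loss_and_metrics(self, batch, forward_only):
        x, y = batch
        err = self.w * x - y
        if not forward_only:                          # validation skips "backward"
            self.grad += 2 * err * x                  # accumulate micro-batch gradient
        return err * err, {"loss": err * err}

    def finish_training_step(self):
        self.grad = max(min(self.grad, 10.0), -10.0)  # toy gradient clipping


def fit(model, train_batches, val_batches, grad_accum=2,
        lr=0.05, val_check_interval=4):
    step, val_losses = 0, []
    for i, batch in enumerate(train_batches):
        if i % grad_accum == 0:
            model.prepare_for_training_step()
        model.get_loss_and_metrics(batch, forward_only=False)
        if (i + 1) % grad_accum == 0:                 # optimizer step after accumulation
            model.finish_training_step()
            model.w -= lr * model.grad / grad_accum   # "optimizer.step()"
            step += 1
            if step % val_check_interval == 0:        # periodic forward-only validation
                val_losses.append(sum(
                    model.get_loss_and_metrics(b, forward_only=True)[0]
                    for b in val_batches) / len(val_batches))
    return model.w, val_losses
```

Running `fit` on batches drawn from the line y = 2x drives `w` toward 2 while the logged validation loss falls, mirroring the structure of the real loop at toy scale.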
Key properties of this loop:
- Forward-only validation -- the `forward_only=True` flag disables gradient computation during validation for efficiency
- Micro-batch accumulation -- enables training with large effective batch sizes that would not fit in GPU memory as a single batch
- Loss agnosticism -- the loop never computes loss directly; it delegates entirely to the model, making it reusable across different objectives
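The accumulation count follows directly from the batch-size configuration. As a hedged sketch (the exact configuration keys and divisibility rules in NeMo/Megatron may differ), each data-parallel rank accumulates `global_batch_size / (micro_batch_size * data_parallel_size)` micro-batches per optimizer step:

```python
def grad_accum_steps(global_batch_size, micro_batch_size, data_parallel_size):
    """Micro-batches each rank accumulates before one optimizer step.

    Assumes global_batch_size divides evenly by
    micro_batch_size * data_parallel_size (Megatron-style setup).
    """
    per_step = micro_batch_size * data_parallel_size
    assert global_batch_size % per_step == 0, "batch sizes must divide evenly"
    return global_batch_size // per_step

# e.g. a global batch of 512 with micro-batches of 4 across 16
# data-parallel ranks means 8 accumulation steps per optimizer step
```

This is what lets an effective batch of 512 sequences train on hardware that can only hold 4 sequences per rank at a time.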