Principle:NVIDIA NeMo Aligner Supervised Training Loop
| Principle: Supervised Training Loop | |
|---|---|
| Type | Principle |
| Project | NVIDIA NeMo Aligner |
| Domains | Deep_Learning, Training |
| Related Implementations | Implementation:NVIDIA_NeMo_Aligner_SupervisedTrainer_Fit |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Generic training loop pattern for supervised alignment objectives including SFT and reward model training.
Description
The supervised training loop orchestrates the iterative optimization of a model on labeled data. It handles:
- Epoch iteration -- cycling through the training dataset for a configurable number of passes
- Micro-batch gradient accumulation -- accumulating gradients across multiple micro-batches before performing an optimizer step
- Validation -- evaluating the model at configurable intervals (steps or fractions of an epoch)
- Checkpoint saving -- persisting model state at regular intervals for fault tolerance
- Learning rate scheduling -- adjusting the learning rate according to a defined schedule (warmup, decay)
- Distributed metric logging -- aggregating and logging metrics (loss, accuracy) across all parallel ranks
The loop is agnostic to the specific loss function -- it delegates to the model's `get_loss_and_metrics` interface. This makes it reusable for both SFT (cross-entropy on response tokens) and reward model training (Bradley-Terry ranking loss).
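The delegation described above can be sketched in a few lines of plain Python. The class and function names below are illustrative stand-ins, not the actual NeMo Aligner classes; the point is that the loop body never inspects the objective, so the same step function drives both SFT and reward-model training.

```python
class SFTModel:
    """Toy stand-in for an SFT objective (cross-entropy in the real system)."""
    def get_loss_and_metrics(self, batch, forward_only):
        loss = sum(batch) / len(batch)          # placeholder "loss"
        return loss, {"loss": loss}

class RewardModel:
    """Toy stand-in for a Bradley-Terry ranking objective."""
    def get_loss_and_metrics(self, batch, forward_only):
        loss = max(batch) - min(batch)          # placeholder "loss"
        return loss, {"loss": loss}

def run_step(model, batch):
    # The loop only delegates; it never computes a loss itself.
    loss, metrics = model.get_loss_and_metrics(batch, forward_only=False)
    return metrics
```

Because `run_step` depends only on the `get_loss_and_metrics` signature, swapping the objective requires no change to the loop.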
Usage
Use for any alignment objective that trains on static labeled datasets (not online/RL). This covers:
- Supervised Fine-Tuning (SFT) -- cross-entropy loss on response tokens
- Reward model training -- ranking loss on preference pairs
- Knowledge distillation -- KL divergence from teacher outputs
- SteerLM -- attribute-conditioned response training
The model must implement the SupervisedInterface, which requires:
- `get_loss_and_metrics(batch, forward_only)` -- compute loss and return metrics
- `prepare_for_training_step()` -- setup before each training step (e.g., enable gradients)
- `finish_training_step()` -- cleanup after each training step (e.g., gradient clipping)
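One way to express this contract is as a structural type. The sketch below uses `typing.Protocol` for illustration; the real NeMo Aligner class hierarchy may define the interface differently.

```python
from typing import Any, Dict, Protocol, Tuple, runtime_checkable

@runtime_checkable
class SupervisedInterface(Protocol):
    """Structural sketch of the interface the supervised loop requires."""

    def get_loss_and_metrics(
        self, batch: Any, forward_only: bool
    ) -> Tuple[float, Dict[str, float]]:
        """Compute the objective's loss and the metrics to log."""
        ...

    def prepare_for_training_step(self) -> None:
        """Setup before each training step (e.g., enable gradients)."""
        ...

    def finish_training_step(self) -> None:
        """Cleanup after each training step (e.g., gradient clipping)."""
        ...
```

Any model class providing these three methods satisfies the protocol, which is what lets one loop serve SFT, reward models, distillation, and SteerLM alike.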
Theoretical Basis
The loop implements standard supervised learning with distributed training support. The core algorithm is:
    for each epoch in num_epochs:
        for each batch in training_dataloader:
            # Forward pass
            loss, metrics = model.get_loss_and_metrics(batch, forward_only=False)
            # Backward pass
            compute gradients of loss with respect to model parameters
            # Gradient accumulation
            if accumulated micro_batches == global_batch / micro_batch:
                # Optimizer step
                optimizer.step()
                learning_rate_scheduler.step()
                optimizer.zero_grad()
                # Periodic validation
                if step_count % val_check_interval == 0:
                    for each val_batch in validation_dataloader:
                        val_loss, val_metrics = model.get_loss_and_metrics(val_batch, forward_only=True)
                    log aggregated validation metrics
                # Periodic checkpoint saving
                if step_count % save_interval == 0:
                    save checkpoint (model weights, optimizer state, trainer state)
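The pseudocode above can be made concrete with a toy, framework-free sketch. The names (`ToyModel`, `fit`) and the hand-computed "gradient" are illustrative assumptions, not the actual SupervisedTrainer implementation, and checkpoint saving is omitted for brevity; the sketch does exercise gradient accumulation, the `prepare`/`finish` hooks, and periodic forward-only validation.

```python
class ToyModel:
    """Fits scalar w to minimize (w*x - y)^2; stands in for a real network."""
    def __init__(self):
        self.w = 0.0
        self.grad = 0.0

    def prepare_for_training_step(self):
        self.grad = 0.0                               # zero accumulated gradient

    def get_loss_and_metrics(self, batch, forward_only):
        x, y = batch
        err = self.w * x - y
        if not forward_only:                          # validation skips "backward"
            self.grad += 2 * err * x                  # accumulate micro-batch gradient
        return err * err, {"loss": err * err}

    def finish_training_step(self):
        self.grad = max(min(self.grad, 10.0), -10.0)  # toy gradient clipping


def fit(model, train_batches, val_batches, grad_accum=2,
        lr=0.05, val_check_interval=4):
    step, val_losses = 0, []
    for i, batch in enumerate(train_batches):
        if i % grad_accum == 0:
            model.prepare_for_training_step()
        model.get_loss_and_metrics(batch, forward_only=False)
        if (i + 1) % grad_accum == 0:                 # optimizer step after accumulation
            model.finish_training_step()
            model.w -= lr * model.grad / grad_accum   # "optimizer.step()"
            step += 1
            if step % val_check_interval == 0:        # periodic forward-only validation
                val_losses.append(sum(
                    model.get_loss_and_metrics(b, forward_only=True)[0]
                    for b in val_batches) / len(val_batches))
    return model.w, val_losses
```

Running `fit` on batches drawn from the line y = 2x drives `w` toward 2 while the logged validation loss falls, mirroring the structure of the real loop at toy scale.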
Key properties of this loop:
- Forward-only validation -- the `forward_only=True` flag disables gradient computation during validation for efficiency
- Micro-batch accumulation -- enables training with large effective batch sizes that would not fit in GPU memory as a single batch
- Loss agnosticism -- the loop never computes loss directly; it delegates entirely to the model, making it reusable across different objectives
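The accumulation count follows directly from the batch-size configuration. As a hedged sketch (the exact configuration keys and divisibility rules in NeMo/Megatron may differ), each data-parallel rank accumulates `global_batch_size / (micro_batch_size * data_parallel_size)` micro-batches per optimizer step:

```python
def grad_accum_steps(global_batch_size, micro_batch_size, data_parallel_size):
    """Micro-batches each rank accumulates before one optimizer step.

    Assumes global_batch_size divides evenly by
    micro_batch_size * data_parallel_size (Megatron-style setup).
    """
    per_step = micro_batch_size * data_parallel_size
    assert global_batch_size % per_step == 0, "batch sizes must divide evenly"
    return global_batch_size // per_step

# e.g. a global batch of 512 with micro-batches of 4 across 16
# data-parallel ranks means 8 accumulation steps per optimizer step
```

This is what lets an effective batch of 512 sequences train on hardware that can only hold 4 sequences per rank at a time.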