
Principle:NVIDIA NeMo Aligner Supervised Training Loop

From Leeroopedia


Principle: Supervised Training Loop
Type Principle
Project NVIDIA NeMo Aligner
Domains Deep_Learning, Training
Related Implementations Implementation:NVIDIA_NeMo_Aligner_SupervisedTrainer_Fit
Last Updated 2026-02-07 00:00 GMT

Overview

Generic training loop pattern for supervised alignment objectives including SFT and reward model training.

Description

The supervised training loop orchestrates the iterative optimization of a model on labeled data. It handles:

  • Epoch iteration -- cycling through the training dataset for a configurable number of passes
  • Micro-batch gradient accumulation -- accumulating gradients across multiple micro-batches before performing an optimizer step
  • Validation -- evaluating the model at configurable intervals (steps or fractions of an epoch)
  • Checkpoint saving -- persisting model state at regular intervals for fault tolerance
  • Learning rate scheduling -- adjusting the learning rate according to a defined schedule (warmup, decay)
  • Distributed metric logging -- aggregating and logging metrics (loss, accuracy) across all parallel ranks

The loop is agnostic to the specific loss function -- it delegates to the model's get_loss_and_metrics interface. This makes it reusable for both SFT (cross-entropy on response tokens) and reward model training (Bradley-Terry ranking loss).

Usage

Use for any alignment objective that trains on static labeled datasets (not online/RL). This covers:

  • Supervised Fine-Tuning (SFT) -- cross-entropy loss on response tokens
  • Reward model training -- ranking loss on preference pairs
  • Knowledge distillation -- KL divergence from teacher outputs
  • SteerLM -- attribute-conditioned response training

The model must implement the SupervisedInterface, which requires:

  • get_loss_and_metrics(batch, forward_only) -- compute loss and return metrics
  • prepare_for_training_step() -- setup before each training step (e.g., enable gradients)
  • finish_training_step() -- cleanup after each training step (e.g., gradient clipping)
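
The interface contract above can be sketched as a Python abstract base class. This is an illustrative sketch only; the actual NeMo Aligner interface may differ in method signatures and return types.

```python
from abc import ABC, abstractmethod

class SupervisedInterface(ABC):
    """Hypothetical sketch of the SupervisedInterface described above."""

    @abstractmethod
    def get_loss_and_metrics(self, batch, forward_only):
        """Return (loss, metrics) for the batch; when forward_only is True
        (validation), skip any backward-pass bookkeeping."""

    @abstractmethod
    def prepare_for_training_step(self):
        """Setup before each training step, e.g. enable gradients."""

    @abstractmethod
    def finish_training_step(self):
        """Cleanup after each training step, e.g. gradient clipping."""
```

Any model implementing these three methods can be driven by the loop, regardless of the loss it computes internally.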

Theoretical Basis

The loop implements standard supervised learning with distributed training support. The core algorithm is:

for each epoch in num_epochs:
    for each batch in training_dataloader:
        # Forward pass
        loss, metrics = model.get_loss_and_metrics(batch, forward_only=False)

        # Backward pass
        compute gradients of loss with respect to model parameters

        # Gradient accumulation
        if accumulated micro_batches == global_batch / micro_batch:
            # Optimizer step
            optimizer.step()
            learning_rate_scheduler.step()
            optimizer.zero_grad()

        # Periodic validation
        if step_count % val_check_interval == 0:
            for each val_batch in validation_dataloader:
                val_loss, val_metrics = model.get_loss_and_metrics(val_batch, forward_only=True)
            log aggregated validation metrics

        # Periodic checkpoint saving
        if step_count % save_interval == 0:
            save checkpoint (model weights, optimizer state, trainer state)
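
The pseudocode above can be made concrete with a framework-free Python sketch. The toy model, plain-SGD update, and checkpoint list below are illustrative stand-ins, not NeMo Aligner APIs; the point is the control flow: accumulate gradients across micro-batches, step the optimizer, then validate and checkpoint at configured intervals.

```python
class ToyModel:
    """Toy stand-in for a model implementing get_loss_and_metrics."""

    def __init__(self):
        self.w = 0.0      # single scalar parameter
        self.grad = 0.0   # gradient accumulated across micro-batches

    def get_loss_and_metrics(self, batch, forward_only):
        # Mean squared error against the batch values.
        loss = sum((self.w - x) ** 2 for x in batch) / len(batch)
        if not forward_only:
            # Accumulate gradients instead of applying them immediately.
            self.grad += sum(2 * (self.w - x) for x in batch) / len(batch)
        return loss, {"loss": loss}


def fit(model, train_batches, val_batches, num_epochs=2,
        grad_acc_steps=2, lr=0.1, val_check_interval=4, save_interval=4):
    checkpoints, step_count, micro_batches = [], 0, 0
    for _ in range(num_epochs):
        for batch in train_batches:
            model.get_loss_and_metrics(batch, forward_only=False)
            micro_batches += 1
            if micro_batches == grad_acc_steps:
                # Optimizer step: average accumulated gradient, then reset.
                model.w -= lr * model.grad / grad_acc_steps
                model.grad = 0.0
                micro_batches = 0
                step_count += 1
                # Periodic forward-only validation.
                if step_count % val_check_interval == 0:
                    for vb in val_batches:
                        model.get_loss_and_metrics(vb, forward_only=True)
                # Periodic checkpoint saving (here: just the weight).
                if step_count % save_interval == 0:
                    checkpoints.append(model.w)
    return checkpoints
```

Because the loop only calls `get_loss_and_metrics`, swapping the toy model for one with a ranking or distillation loss changes nothing in `fit` itself.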

Key properties of this loop:

  • Forward-only validation -- the forward_only=True flag disables gradient computation during validation for efficiency
  • Micro-batch accumulation -- enables training with large effective batch sizes that would not fit in GPU memory as a single batch
  • Loss agnosticism -- the loop never computes loss directly; it delegates entirely to the model, making it reusable across different objectives
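
The accumulation arithmetic implied by the loop can be stated directly. The function and parameter names below are ours for illustration, not NeMo Aligner config keys.

```python
def accumulation_steps(global_batch, micro_batch, data_parallel_size=1):
    """Number of micro-batches to accumulate before one optimizer step.

    Each forward/backward processes micro_batch examples per data-parallel
    rank, so one accumulation cycle covers micro_batch * data_parallel_size
    examples; the global batch must be an exact multiple of that.
    """
    per_cycle = micro_batch * data_parallel_size
    if global_batch % per_cycle != 0:
        raise ValueError("global batch must divide evenly into micro-batches")
    return global_batch // per_cycle
```

For example, a global batch of 256 with micro-batches of 4 on 8 data-parallel ranks accumulates 8 micro-batches per optimizer step.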

Related Pages

Knowledge Sources

Deep_Learning | Training
