Principle: NVIDIA NeMo Aligner Pretrained Model Loading
| Principle: Pretrained Model Loading | |
|---|---|
| Type | Principle |
| Project | NVIDIA NeMo Aligner |
| Domains | NLP, Transfer_Learning |
| Related Implementations | Implementation:NVIDIA_NeMo_Aligner_Load_From_NeMo |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Technique for restoring a pretrained language model from a serialized checkpoint and overriding its configuration for a new training task.
Description
In alignment training (SFT, DPO, RLHF), the first step is always loading a pretrained GPT model from a NeMo checkpoint (a .nemo archive). This involves deserializing the model weights, merging the checkpoint's original configuration with task-specific overrides (batch size, sequence length, optimizer settings), and instantiating the appropriate model class.
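A .nemo checkpoint is an ordinary tar archive that bundles the serialized weights with the configuration the model was originally trained with. The sketch below builds a tiny stand-in archive in memory and lists its members the same way one would inspect a real checkpoint; the member names shown are typical of NeMo checkpoints but not guaranteed for every model.

```python
import io
import tarfile

# Build a minimal mock .nemo archive in memory (illustrative only):
# real checkpoints bundle the original config alongside the weights.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, payload in [("model_config.yaml", b"hidden_size: 4096\n"),
                          ("model_weights.ckpt", b"\x00")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# Inspect the archive, as one would with a real checkpoint file.
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    names = tar.getnames()
print(names)
# → ['model_config.yaml', 'model_weights.ckpt']
```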
The principle ensures that pretrained knowledge is preserved while allowing architectural or hyperparameter changes for the downstream alignment objective. The loading process must handle:
- Weight restoration -- deserializing parameters from the checkpoint file into the model's state dict
- Configuration merging -- combining the original checkpoint config with user-specified overrides
- Model class instantiation -- creating the appropriate model class (e.g., GPT, reward model) with the merged configuration
- Parallelism setup -- respecting tensor and pipeline parallel settings during weight distribution
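The configuration-merging rule can be sketched in plain Python. The helper below is hypothetical (NeMo Aligner uses OmegaConf rather than raw dicts), but it mimics the relevant merge semantics: later values win, so task overrides replace checkpoint defaults, while architecture keys that must match the pretrained weights are rejected.

```python
# Hypothetical helper mimicking OmegaConf.merge precedence for flat keys.
# Architecture parameters are locked because they must match the weights.
LOCKED_KEYS = {"hidden_size", "num_layers"}

def merge_configs(checkpoint_cfg: dict, task_overrides: dict) -> dict:
    illegal = LOCKED_KEYS & set(task_overrides)
    if illegal:
        raise ValueError(f"cannot override architecture keys: {sorted(illegal)}")
    merged = dict(checkpoint_cfg)
    merged.update(task_overrides)  # task overrides take precedence
    return merged

checkpoint_cfg = {"hidden_size": 4096, "num_layers": 32,
                  "micro_batch_size": 4, "encoder_seq_length": 2048}
task_overrides = {"micro_batch_size": 1, "encoder_seq_length": 4096}

cfg = merge_configs(checkpoint_cfg, task_overrides)
print(cfg["micro_batch_size"], cfg["encoder_seq_length"], cfg["hidden_size"])
# → 1 4096 4096
```

Note how `micro_batch_size` and `encoder_seq_length` are replaced by the task values while `hidden_size` survives untouched, and an attempt to override a locked key raises an error.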
Usage
Use when initializing any alignment training workflow. Required as the first step in:
- Supervised Fine-Tuning (SFT)
- Reward model training
- Direct Preference Optimization (DPO)
- Proximal Policy Optimization (PPO)
- REINFORCE-based alignment
The pretrained checkpoint provides the starting weights that will be fine-tuned for the target alignment objective.
Theoretical Basis
This principle is grounded in transfer learning from pretrained checkpoints. The model's learned representations are preserved while the configuration is adapted for the new task.
The key steps in the loading process are:
1. Load checkpoint config from .nemo archive
2. Merge checkpoint config with task-specific overrides using OmegaConf
- Task overrides take precedence over checkpoint defaults
- Architecture parameters (hidden size, num layers) remain locked
3. Instantiate the target model class with merged config
4. Restore weights from checkpoint into the instantiated model
5. Prepare model for training (set requires_grad, initialize optimizer)
The OmegaConf merge operation is critical: it lets the user change hyperparameters such as learning rate, batch size, and sequence length without modifying the underlying architecture that encodes the pretrained knowledge.
```python
# Task overrides take precedence over the checkpoint's saved defaults.
merged_config = OmegaConf.merge(checkpoint_config, task_overrides)
# NeMo's restore_from accepts the merged config via override_config_path.
model = ModelClass.restore_from(checkpoint_path, override_config_path=merged_config)
```
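The five steps above can be mocked end to end without NeMo installed. The class below is a stand-in for the real model class (all names are illustrative, not NeMo Aligner's actual API); it shows the order of operations: merge, instantiate, restore weights, prepare for training.

```python
# Illustrative mock of the load-from-checkpoint flow; not NeMo's real API.
class MockGPTModel:
    def __init__(self, cfg):
        self.cfg = cfg          # merged configuration (step 3)
        self.state_dict = None  # weights filled in by restore (step 4)
        self.training = False

    @classmethod
    def restore_from(cls, checkpoint, task_overrides):
        # Steps 1-2: load checkpoint config and merge overrides (overrides win).
        cfg = {**checkpoint["config"], **task_overrides}
        # Step 3: instantiate the target model class with the merged config.
        model = cls(cfg)
        # Step 4: restore pretrained weights into the new instance.
        model.state_dict = checkpoint["state_dict"]
        # Step 5: prepare for training.
        model.training = True
        return model

checkpoint = {"config": {"hidden_size": 1024, "micro_batch_size": 8},
              "state_dict": {"embedding.weight": [0.1, 0.2]}}
model = MockGPTModel.restore_from(checkpoint, {"micro_batch_size": 2})
print(model.cfg["micro_batch_size"], model.training)
# → 2 True
```

A real implementation must additionally shard the restored weights across tensor- and pipeline-parallel ranks, which the mock omits.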
Related Pages
- Implementation:NVIDIA_NeMo_Aligner_Load_From_NeMo
- Heuristic:NVIDIA_NeMo_Aligner_Warning_Deprecated_Repository