Principle: NVIDIA NeMo Aligner Pretrained Model Loading
| Principle: Pretrained Model Loading | |
|---|---|
| Type | Principle |
| Project | NVIDIA NeMo Aligner |
| Domains | NLP, Transfer_Learning |
| Related Implementations | Implementation:NVIDIA_NeMo_Aligner_Load_From_NeMo |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Technique for restoring a pretrained language model from a serialized checkpoint and overriding its configuration for a new training task.
Description
In alignment training (SFT, DPO, RLHF), the first step is always loading a pretrained GPT model from a NeMo checkpoint (a .nemo archive). This involves deserializing the model weights, merging the checkpoint's original configuration with task-specific overrides (batch size, sequence length, optimizer settings), and instantiating the appropriate model class.
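A .nemo checkpoint is an ordinary tar archive that bundles the serialized weights with the configuration the model was originally trained with. The sketch below builds a tiny stand-in archive in memory and lists its members the same way one would inspect a real checkpoint; the member names shown are typical of NeMo checkpoints but not guaranteed for every model.

```python
import io
import tarfile

# Build a minimal mock .nemo archive in memory (illustrative only):
# real checkpoints bundle the original config alongside the weights.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, payload in [("model_config.yaml", b"hidden_size: 4096\n"),
                          ("model_weights.ckpt", b"\x00")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# Inspect the archive, as one would with a real checkpoint file.
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    names = tar.getnames()
print(names)
# → ['model_config.yaml', 'model_weights.ckpt']
```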
The principle ensures that pretrained knowledge is preserved while allowing architectural or hyperparameter changes for the downstream alignment objective. The loading process must handle:
- Weight restoration -- deserializing parameters from the checkpoint file into the model's state dict
- Configuration merging -- combining the original checkpoint config with user-specified overrides
- Model class instantiation -- creating the appropriate model class (e.g., GPT, reward model) with the merged configuration
- Parallelism setup -- respecting tensor and pipeline parallel settings during weight distribution
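The configuration-merging rule can be sketched in plain Python. The helper below is hypothetical (NeMo Aligner uses OmegaConf rather than raw dicts), but it mimics the relevant merge semantics: later values win, so task overrides replace checkpoint defaults, while architecture keys that must match the pretrained weights are rejected.

```python
# Hypothetical helper mimicking OmegaConf.merge precedence for flat keys.
# Architecture parameters are locked because they must match the weights.
LOCKED_KEYS = {"hidden_size", "num_layers"}

def merge_configs(checkpoint_cfg: dict, task_overrides: dict) -> dict:
    illegal = LOCKED_KEYS & set(task_overrides)
    if illegal:
        raise ValueError(f"cannot override architecture keys: {sorted(illegal)}")
    merged = dict(checkpoint_cfg)
    merged.update(task_overrides)  # task overrides take precedence
    return merged

checkpoint_cfg = {"hidden_size": 4096, "num_layers": 32,
                  "micro_batch_size": 4, "encoder_seq_length": 2048}
task_overrides = {"micro_batch_size": 1, "encoder_seq_length": 4096}

cfg = merge_configs(checkpoint_cfg, task_overrides)
print(cfg["micro_batch_size"], cfg["encoder_seq_length"], cfg["hidden_size"])
# → 1 4096 4096
```

Note how `micro_batch_size` and `encoder_seq_length` are replaced by the task values while `hidden_size` survives untouched, and an attempt to override a locked key raises an error.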
Usage
Use when initializing any alignment training workflow. Required as the first step in:
- Supervised Fine-Tuning (SFT)
- Reward model training
- Direct Preference Optimization (DPO)
- Proximal Policy Optimization (PPO)
- REINFORCE-based alignment
The pretrained checkpoint provides the starting weights that will be fine-tuned for the target alignment objective.
Theoretical Basis
This principle is grounded in transfer learning from pretrained checkpoints. The model's learned representations are preserved while the configuration is adapted for the new task.
The key steps in the loading process are:
1. Load checkpoint config from .nemo archive
2. Merge checkpoint config with task-specific overrides using OmegaConf
- Task overrides take precedence over checkpoint defaults
- Architecture parameters (hidden size, num layers) remain locked
3. Instantiate the target model class with merged config
4. Restore weights from checkpoint into the instantiated model
5. Prepare model for training (set requires_grad, initialize optimizer)
The OmegaConf merge operation is critical: it lets the user change hyperparameters such as learning rate, batch size, and sequence length without modifying the underlying architecture that encodes the pretrained knowledge.
```python
# Task overrides take precedence over the checkpoint's saved defaults.
merged_config = OmegaConf.merge(checkpoint_config, task_overrides)
# NeMo's restore_from accepts the merged config via override_config_path.
model = ModelClass.restore_from(checkpoint_path, override_config_path=merged_config)
```
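The five steps above can be mocked end to end without NeMo installed. The class below is a stand-in for the real model class (all names are illustrative, not NeMo Aligner's actual API); it shows the order of operations: merge, instantiate, restore weights, prepare for training.

```python
# Illustrative mock of the load-from-checkpoint flow; not NeMo's real API.
class MockGPTModel:
    def __init__(self, cfg):
        self.cfg = cfg          # merged configuration (step 3)
        self.state_dict = None  # weights filled in by restore (step 4)
        self.training = False

    @classmethod
    def restore_from(cls, checkpoint, task_overrides):
        # Steps 1-2: load checkpoint config and merge overrides (overrides win).
        cfg = {**checkpoint["config"], **task_overrides}
        # Step 3: instantiate the target model class with the merged config.
        model = cls(cfg)
        # Step 4: restore pretrained weights into the new instance.
        model.state_dict = checkpoint["state_dict"]
        # Step 5: prepare for training.
        model.training = True
        return model

checkpoint = {"config": {"hidden_size": 1024, "micro_batch_size": 8},
              "state_dict": {"embedding.weight": [0.1, 0.2]}}
model = MockGPTModel.restore_from(checkpoint, {"micro_batch_size": 2})
print(model.cfg["micro_batch_size"], model.training)
# → 2 True
```

A real implementation must additionally shard the restored weights across tensor- and pipeline-parallel ranks, which the mock omits.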
Related Pages
- Implementation:NVIDIA_NeMo_Aligner_Load_From_NeMo
- Heuristic:NVIDIA_NeMo_Aligner_Warning_Deprecated_Repository