Principle:NVIDIA NeMo Aligner Hydra Training Configuration
| Principle: Hydra Training Configuration | |
|---|---|
| Type | Principle |
| Project | NVIDIA NeMo Aligner |
| Domains | Configuration_Management, MLOps |
| Related Implementations | Implementation:NVIDIA_NeMo_Aligner_Hydra_Config_Loading |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Pattern for declaratively specifying all training hyperparameters, model architecture settings, and data paths through hierarchical YAML configuration files.
Description
NeMo Aligner uses Hydra and OmegaConf for configuration management. Every training script is decorated with `@hydra_runner`, which loads a YAML config file and allows CLI overrides. The configuration hierarchy covers:
- Trainer settings -- number of GPUs, precision mode, max epochs/steps
- Model architecture -- tensor/pipeline parallelism degrees, micro/global batch sizes, hidden dimensions
- Optimizer -- learning rate, weight decay, scheduler type and warmup
- Data -- file paths, sequence length, data formats, number of workers
- Algorithm-specific parameters -- KL penalty coefficient, loss type, reward scaling
This pattern decouples hyperparameter specification from code, enabling reproducible experiments and easy parameter sweeps without modifying source files.
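A minimal sketch of the entrypoint pattern (the script and config names here are illustrative; `hydra_runner` is imported from `nemo.core.config`, as in NeMo's example scripts):

```python
from omegaconf import DictConfig, OmegaConf

# NeMo's Hydra wrapper: the decorator loads conf/sft_config.yaml
# and merges any CLI overrides before calling main().
from nemo.core.config import hydra_runner


@hydra_runner(config_path="conf", config_name="sft_config")
def main(cfg: DictConfig) -> None:
    # cfg is the fully merged config: YAML defaults plus CLI overrides.
    # Interpolations such as ${model.encoder_seq_length} resolve on access.
    print(OmegaConf.to_yaml(cfg, resolve=True))

    # Hyperparameters are read from cfg rather than hard-coded:
    print(f"lr={cfg.model.optim.lr}, max_steps={cfg.trainer.max_steps}")


if __name__ == "__main__":
    main()
```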
Usage
Use this pattern in every training script. The workflow is:
- Define sensible defaults in a YAML configuration file
- Override specific values via CLI arguments for individual experiments
- Use Hydra interpolation (`${...}`) for derived values that depend on other config entries
This is critical for managing the complexity of distributed training configurations, where tensor parallelism (TP), pipeline parallelism (PP), and data parallelism (DP) sizes must be coordinated along with mixed precision settings and gradient accumulation.
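As a concrete illustration of that coordination, a small worked example in plain Python (values match the sample config shown in the next section): the data-parallel size is the world size divided by TP * PP, and the gradient-accumulation factor follows from the batch sizes.

```python
# Consistency check for Megatron-style parallelism settings
# (a sketch; these invariants mirror what Megatron-style trainers enforce).
world_size = 8          # trainer.devices (per node) * number of nodes
tp = 2                  # model.tensor_model_parallel_size
pp = 1                  # model.pipeline_model_parallel_size
micro_batch = 4         # model.micro_batch_size
global_batch = 32       # model.global_batch_size

assert world_size % (tp * pp) == 0
dp = world_size // (tp * pp)          # data-parallel size: 8 // 2 = 4

assert global_batch % (micro_batch * dp) == 0
grad_accum = global_batch // (micro_batch * dp)   # 32 // 16 = 2 accumulation steps
print(f"DP={dp}, gradient accumulation steps={grad_accum}")
```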
Theoretical Basis
The principle is grounded in hierarchical configuration management. Hydra resolves config groups, interpolations, and CLI overrides into a single unified DictConfig object.
The resolution pattern follows this order:
1. YAML file defines default values for all parameters
2. The `@hydra_runner` decorator loads and parses the YAML
3. CLI overrides are merged on top of YAML defaults
4. OmegaConf resolves interpolations (e.g., `${model.hidden_size}`)
5. The final resolved DictConfig drives all training parameters
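This resolution order can be demonstrated with OmegaConf alone (a minimal sketch outside of Hydra; the keys are illustrative):

```python
from omegaconf import OmegaConf

# Step 1: YAML defaults, inlined here for brevity.
defaults = OmegaConf.create(
    {"model": {"encoder_seq_length": 4096,
               "data": {"seq_length": "${model.encoder_seq_length}"}}}
)

# Step 3: a CLI override merged on top of the defaults.
overrides = OmegaConf.from_dotlist(["model.encoder_seq_length=2048"])
cfg = OmegaConf.merge(defaults, overrides)

# Steps 4-5: the interpolation resolves against the merged config,
# so the derived value tracks the override automatically.
assert cfg.model.data.seq_length == 2048
```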
A typical configuration structure:
```yaml
trainer:
  devices: 8
  precision: bf16
  max_steps: 1000
model:
  micro_batch_size: 4
  global_batch_size: 32
  tensor_model_parallel_size: 2
  pipeline_model_parallel_size: 1
  encoder_seq_length: 4096
  data:
    data_path: /data/sft_train.jsonl
    seq_length: ${model.encoder_seq_length}
  optim:
    name: fused_adam
    lr: 1e-5
    weight_decay: 0.01
exp_manager:
  checkpoint_callback_params:
    save_top_k: 3
```
Practical Guide
To use Hydra configuration in a training workflow:
- Define a YAML file with sections for `trainer`, `exp_manager`, and `model` (including nested `data`, `optim`, and algorithm-specific parameters)
- Use Hydra interpolation (`${...}`) for values derived from other config entries to avoid duplication
- Override at the CLI for experiment variations:
```bash
python train_sft.py \
  model.optim.lr=1e-5 \
  trainer.max_steps=1000 \
  model.data.data_path=/new/data/path.jsonl
```
- Leverage config groups to swap entire subsections (e.g., different optimizer configs) without rewriting the full YAML
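A sketch of the config-group idea using Hydra's standard compose API (the directory layout and group names below are hypothetical):

```python
# Hypothetical layout:
#   conf/config.yaml            defaults list contains "- optim: fused_adam"
#   conf/optim/fused_adam.yaml
#   conf/optim/adamw.yaml
from hydra import compose, initialize

with initialize(config_path="conf", version_base=None):
    # Swap the whole optimizer subsection by naming a different group
    # option; no YAML files are edited.
    cfg = compose(config_name="config", overrides=["optim=adamw"])
    print(cfg.optim)
```

At the command line, the same swap is just `optim=adamw` appended to the training command.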