Principle:NVIDIA NeMo Aligner Hydra Training Configuration

From Leeroopedia


Principle: Hydra Training Configuration
Type Principle
Project NVIDIA NeMo Aligner
Domains Configuration_Management, MLOps
Related Implementations Implementation:NVIDIA_NeMo_Aligner_Hydra_Config_Loading
Last Updated 2026-02-07 00:00 GMT

Overview

Pattern for declaratively specifying all training hyperparameters, model architecture settings, and data paths through hierarchical YAML configuration files.

Description

NeMo Aligner uses Hydra and OmegaConf for configuration management. Every training script is decorated with @hydra_runner, which loads a YAML config file and allows CLI overrides. The configuration hierarchy covers:

  • Trainer settings -- number of GPUs, precision mode, max epochs/steps
  • Model architecture -- tensor/pipeline parallelism degrees, micro/global batch sizes, hidden dimensions
  • Optimizer -- learning rate, weight decay, scheduler type and warmup
  • Data -- file paths, sequence length, data formats, number of workers
  • Algorithm-specific parameters -- KL penalty coefficient, loss type, reward scaling

This pattern decouples hyperparameter specification from code, enabling reproducible experiments and easy parameter sweeps without modifying source files.

Usage

Use this pattern in every training script. The workflow is:

  • Define sensible defaults in a YAML configuration file
  • Override specific values via CLI arguments for individual experiments
  • Use Hydra interpolation (${...}) for derived values that depend on other config entries

This is critical for managing the complexity of distributed training configurations, where the tensor-parallel (TP), pipeline-parallel (PP), and data-parallel (DP) sizes must be kept consistent with each other and with the mixed-precision and gradient-accumulation settings.
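The coordination constraint described above can be made concrete with a small stdlib-only sketch. The function below is illustrative, not part of the NeMo Aligner API; it shows the arithmetic a Hydra config must satisfy: DP size follows from the world size and the TP/PP degrees, and the gradient-accumulation count follows from the batch sizes.

```python
def derive_parallelism(world_size: int, tp: int, pp: int,
                       micro_batch: int, global_batch: int):
    """Derive the data-parallel size and gradient-accumulation steps
    implied by a distributed training config (illustrative helper)."""
    if world_size % (tp * pp) != 0:
        raise ValueError("world_size must be divisible by tp * pp")
    dp = world_size // (tp * pp)  # number of data-parallel replicas
    if global_batch % (micro_batch * dp) != 0:
        raise ValueError("global_batch must be divisible by micro_batch * dp")
    grad_accum = global_batch // (micro_batch * dp)  # accumulation steps per global batch
    return dp, grad_accum

# e.g. 8 GPUs, TP=2, PP=1, micro_batch=4, global_batch=32:
print(derive_parallelism(8, 2, 1, 4, 32))  # (4, 2)
```

A config that violates either divisibility check is invalid before training even starts, which is why these values are usually expressed together in one YAML section rather than scattered across flags.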

Theoretical Basis

The principle is grounded in hierarchical configuration management. Hydra resolves config groups, interpolations, and CLI overrides into a single unified DictConfig object.

The resolution pattern follows this order:

1. YAML file defines default values for all parameters
2. @hydra_runner decorator loads and parses the YAML
3. CLI overrides are merged on top of YAML defaults
4. OmegaConf resolves interpolations (e.g., ${model.hidden_size})
5. The final resolved DictConfig drives all training parameters

A typical configuration structure:

trainer:
  devices: 8
  precision: bf16
  max_steps: 1000

model:
  micro_batch_size: 4
  global_batch_size: 32
  tensor_model_parallel_size: 2
  pipeline_model_parallel_size: 1
  encoder_seq_length: 4096

  data:
    data_path: /data/sft_train.jsonl
    seq_length: ${model.encoder_seq_length}

  optim:
    name: fused_adam
    lr: 1e-5
    weight_decay: 0.01

exp_manager:
  checkpoint_callback_params:
    save_top_k: 3

Practical Guide

To use Hydra configuration in a training workflow:

  • Define a YAML file with sections for trainer, exp_manager, and model (including nested data, optim, and algorithm-specific parameters)
  • Use Hydra interpolation (${...}) for values derived from other config entries to avoid duplication
  • Override at the CLI for experiment variations:
python train_sft.py \
  model.optim.lr=1e-5 \
  trainer.max_steps=1000 \
  model.data.data_path=/new/data/path.jsonl
  • Leverage config groups to swap entire subsections (e.g., different optimizer configs) without rewriting the full YAML
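As an illustrative layout (the file names are hypothetical, not a NeMo Aligner convention), a config group places alternative subsections in their own directory; the top-level YAML picks one via Hydra's defaults list, and the CLI can swap it wholesale:

```text
conf/
  train_sft.yaml        # top-level config; its defaults list selects: optim: fused_adam
  optim/
    fused_adam.yaml     # name: fused_adam, lr: 1e-5, weight_decay: 0.01
    adamw.yaml          # name: adamw,      lr: 3e-5, weight_decay: 0.1

# Swap the entire optimizer block from the CLI:
#   python train_sft.py optim=adamw
```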

Related Pages

Knowledge Sources

Configuration_Management | MLOps
