Implementation:NVIDIA NeMo Aligner Hydra Config Loading



Implementation Metadata
Name Hydra_Config_Loading
Type Pattern Doc
Implements Principle Hydra_Training_Configuration
Repository NeMo Aligner
Files examples/nlp/gpt/conf/gpt_sft.yaml, examples/nlp/gpt/conf/gpt_dpo.yaml
Lines Full config files
Domains Configuration_Management, MLOps
Last Updated 2026-02-07 00:00 GMT

Overview

A concrete pattern for declaratively configuring NeMo Aligner training pipelines using Hydra YAML configuration files.

Description

Every NeMo Aligner training script uses the @hydra_runner decorator to load its hierarchical YAML configuration. The configuration files define all training parameters: trainer settings (GPUs, precision, epochs), model architecture (TP/PP sizes, batch sizes), optimizer (learning rate, scheduler), data (paths, formats, sequence lengths), and algorithm-specific parameters. CLI overrides merge with the YAML defaults to produce a single resolved DictConfig.
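
The merge semantics can be sketched with OmegaConf, the library Hydra builds on. This is an illustrative subset, not the actual config loading path; the keys and values are taken from the defaults shown under Key Config Sections below.

from omegaconf import OmegaConf

# Illustrative subset of the YAML defaults (cf. Key Config Sections below)
yaml_cfg = OmegaConf.create({
    "trainer": {"devices": 8, "precision": "bf16"},
    "model": {"optim": {"lr": 1e-5}},
})

# CLI overrides arrive as dotted key=value strings
cli_cfg = OmegaConf.from_dotlist(["trainer.devices=4", "model.optim.lr=0.000005"])

# Later arguments win, so overrides replace YAML defaults;
# untouched keys keep their default values
cfg = OmegaConf.merge(yaml_cfg, cli_cfg)
print(cfg.trainer.devices)    # 4
print(cfg.model.optim.lr)     # 5e-06
print(cfg.trainer.precision)  # bf16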

Usage

Use this pattern in every training entry point. Define a YAML config file in examples/nlp/gpt/conf/ and decorate the main function with @hydra_runner. Override parameters at launch time via CLI.

Code Reference

Source Location

  • Repository: NeMo Aligner
  • File: examples/nlp/gpt/conf/gpt_sft.yaml (SFT config example)
  • File: examples/nlp/gpt/conf/gpt_dpo.yaml (DPO config example)
  • Lines: Full config files

Interface

from nemo.core.config import hydra_runner
from omegaconf import DictConfig

@hydra_runner(config_path="conf", config_name="gpt_sft")
def main(cfg: DictConfig) -> None:
    # cfg is a resolved DictConfig with all parameters
    # Access via cfg.trainer.max_steps, cfg.model.optim.lr, etc.
    ...

Key Config Sections

trainer:
  num_nodes: 1
  devices: 8
  precision: bf16
  sft:  # algorithm-specific block; named dpo or ppo in those configs
    max_epochs: 1
    max_steps: -1
    val_check_interval: 100
    save_interval: 100
    gradient_clip_val: 1.0

model:
  micro_batch_size: 1
  global_batch_size: 64
  restore_from_path: ???  # Required: path to pretrained .nemo
  data:
    data_prefix: /path/to/data.jsonl  # DPO-style key; the SFT config nests paths under train_ds instead (see Usage Examples)
    seq_length: 4096
  optim:
    name: fused_adam
    lr: 1e-5
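
The ??? marker above is OmegaConf's mandatory-value sentinel: the key exists in the schema but must be supplied in the YAML file or via a CLI override before it is read. A minimal sketch of the failure mode, using OmegaConf directly:

from omegaconf import OmegaConf
from omegaconf.errors import MissingMandatoryValue

cfg = OmegaConf.create({"model": {"restore_from_path": "???"}})

try:
    _ = cfg.model.restore_from_path  # read before the value is provided
except MissingMandatoryValue:
    print("model.restore_from_path must be set in YAML or via a CLI override")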

I/O Contract

Inputs

  • config_path (str, required): Relative path to the directory containing YAML config files (e.g., "conf")
  • config_name (str, required): Name of the YAML config file without extension (e.g., "gpt_sft")
  • CLI overrides (str, optional): Command-line key=value overrides that merge with YAML defaults

Outputs

  • cfg (DictConfig): Fully resolved configuration object with all parameters accessible via dot notation

Usage Examples

Launching SFT Training with CLI Overrides

python examples/nlp/gpt/train_gpt_sft.py \
    model.restore_from_path=/models/gpt.nemo \
    model.data.train_ds.file_path=/data/train.jsonl \
    model.optim.lr=5e-6 \
    trainer.sft.max_steps=1000
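
Inside the decorated main function, it is common practice in NeMo example scripts to log the fully resolved configuration at startup, so each run records the exact merge of YAML defaults and CLI overrides. A minimal sketch, reusing the entry point from the Interface section:

from nemo.core.config import hydra_runner
from omegaconf import OmegaConf

@hydra_runner(config_path="conf", config_name="gpt_sft")
def main(cfg) -> None:
    # Dump the resolved config (YAML defaults merged with CLI overrides)
    # so the hyperparameters of this run are captured in the logs
    print(OmegaConf.to_yaml(cfg))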

Related Pages

Knowledge Sources

Configuration_Management | MLOps

