
Principle:Sail sg LongSpec Hydra Configuration

From Leeroopedia
Knowledge Sources
Domains Configuration_Management, Training
Last Updated 2026-02-14 05:00 GMT

Overview

A configuration-management principle that uses Hydra's composable YAML system to define training experiments with hierarchical overrides and direct object instantiation from config.

Description

Hydra Configuration in LongSpec defines all training parameters—model architecture, data paths, optimizer settings, DeepSpeed config, and training hyperparameters—in composable YAML files. The key benefits are:

  • Composability: Experiment configs inherit from and override base configs. A single CLI argument (+exp=config_name) selects the full experiment setup.
  • Object instantiation: The _target_ key in YAML specifies Python class paths, allowing Hydra to construct model, dataset, collator, and aligner objects directly from config.
  • Stage chaining: Multi-stage training is achieved by changing only the config override—each stage's YAML points to the previous stage's checkpoint via model_name_or_path and draft_model_name_or_path.
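The _target_ mechanism above can be sketched in plain Python. This is an illustrative re-implementation using importlib, not Hydra's actual internals (Hydra exposes this behavior as hydra.utils.instantiate): the dotted path in _target_ is resolved to a class, and the remaining config keys become constructor keyword arguments.

```python
# Illustrative sketch of _target_-style instantiation (not Hydra's
# actual internals): resolve the dotted class path in config["_target_"]
# and construct the object with the remaining keys as keyword arguments.
import importlib


def instantiate(config: dict, **overrides):
    """Build the object named by config['_target_']."""
    params = {k: v for k, v in config.items() if k != "_target_"}
    params.update(overrides)
    module_path, _, class_name = config["_target_"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**params)


# Example: a config whose _target_ names a standard-library class.
delta = instantiate({"_target_": "datetime.timedelta", "days": 2})
```

In the real system the same idea lets one YAML key swap the model, dataset, or collator class without touching Python code.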

The config hierarchy follows this structure:

  • conf/config.yaml — Root config
  • conf/hydra/default.yaml — Hydra behavior settings
  • conf/deepspeed/*.yaml — 17 DeepSpeed ZeRO/optimizer configs
  • conf/exp/*.yaml — 6 experiment configs (3 stages × 2 variants)
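The hierarchy above composes by merging: an experiment config overrides only the keys it changes, inheriting everything else from the base. A minimal sketch of the assumed merge semantics (Hydra/OmegaConf implement this far more generally) looks like:

```python
# Minimal sketch of hierarchical config composition (assumed semantics,
# not Hydra's actual merge code): the experiment config is deep-merged
# onto the base config, with the override winning on key conflicts.
def deep_merge(base: dict, override: dict) -> dict:
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical base and experiment fragments for illustration only.
base = {"trainer": {"learning_rate": 5e-4, "max_seq_length": 1024}}
exp = {"trainer": {"max_seq_length": 32768}}  # a long-context stage
cfg = deep_merge(base, exp)
# cfg["trainer"] == {"learning_rate": 5e-4, "max_seq_length": 32768}
```

This is why selecting `+exp=config_name` on the CLI is enough to switch the full experiment setup: the experiment YAML only needs to name the keys that differ.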

Usage

Use when defining or modifying GLIDE training experiments. The configuration system eliminates the need to modify Python code for different training setups:

  • Change model architecture: modify model._target_ in experiment YAML
  • Change data: modify file_path and collator references
  • Change optimization: swap DeepSpeed config reference
  • Change training stage: point to previous stage's output path
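Stage chaining from the last bullet can be sketched as follows. The function and path layout here are hypothetical, chosen only to illustrate the pattern of pointing model_name_or_path at the previous stage's output; the repository's actual keys and directories may differ.

```python
# Hypothetical sketch of stage chaining: each stage's config points
# model_name_or_path at the previous stage's output directory.
# Keys and paths are illustrative, not the repository's actual values.
def chain_stage(prev_cfg: dict, stage_name: str, output_root: str = "outputs") -> dict:
    nxt = dict(prev_cfg)
    # The next stage resumes from the previous stage's saved checkpoint.
    nxt["model_name_or_path"] = f"{output_root}/{prev_cfg['stage']}"
    nxt["stage"] = stage_name
    return nxt


stage1 = {"stage": "stage1", "model_name_or_path": "some/base-model"}
stage2 = chain_stage(stage1, "stage2")
# stage2["model_name_or_path"] == "outputs/stage1"
```

In the real configs the same chaining applies to draft_model_name_or_path as well, so both the target and draft models advance together across stages.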

Theoretical Basis

Hydra's configuration composition follows the override pattern:

# Abstract Hydra flow (not actual implementation)
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="conf", config_name="config", version_base="1.2")
def main(cfg: DictConfig) -> None:
    # cfg is fully resolved from the YAML defaults plus CLI overrides
    model = hydra.utils.call(cfg.model, cfg.model_name_or_path)
    dataset = hydra.utils.instantiate(cfg.dataset)
    collator = hydra.utils.instantiate(cfg.collator)

if __name__ == "__main__":
    main()

Key configuration parameters across training stages:

Parameter       | Stage 1            | Stage 2                   | Stage 3
max_seq_length  | 1024               | 32768                     | 32768
learning_rate   | 5e-4               | 5e-6                      | 5e-6
DeepSpeed ZeRO  | Stage 1            | Stage 3                   | Stage 3
Data            | SlimPajama-6B      | Long-context data         | Long-CoT data
Collator        | DPODataSFTCollator | LongDataNoMaskSFTCollator | LongCoTDataSFTCollator

Related Pages

Implemented By
