# Principle: sail-sg LongSpec Hydra Configuration
| Knowledge Sources | |
|---|---|
| Domains | Configuration_Management, Training |
| Last Updated | 2026-02-14 05:00 GMT |
## Overview
Configuration management principle using Hydra's composable YAML system to define training experiments with hierarchical overrides and object instantiation.
## Description
Hydra Configuration in LongSpec defines all training parameters—model architecture, data paths, optimizer settings, DeepSpeed config, and training hyperparameters—in composable YAML files. The key benefits are:
- Composability: Experiment configs inherit from and override base configs. A single CLI argument (`+exp=config_name`) selects the full experiment setup.
- Object instantiation: The `_target_` key in YAML specifies a Python class path, allowing Hydra to construct model, dataset, collator, and aligner objects directly from config.
- Stage chaining: Multi-stage training is achieved by changing only the config override: each stage's YAML points to the previous stage's checkpoint via `model_name_or_path` and `draft_model_name_or_path`.
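The pattern above can be illustrated with a sketch of a stage-2 experiment config. All file names, class paths, and values here are assumptions for illustration, not the repository's actual files:

```yaml
# Hypothetical conf/exp/stage2_glide.yaml (names illustrative)
defaults:
  - /deepspeed: zero3                 # swap optimization by changing this reference
  - _self_

model:
  _target_: models.GlideForCausalLM  # hypothetical class path for Hydra to instantiate
model_name_or_path: outputs/stage1/target       # previous stage's target checkpoint
draft_model_name_or_path: outputs/stage1/draft  # previous stage's draft checkpoint

dataset:
  _target_: data.LongContextDataset  # hypothetical dataset class
  file_path: data/long_context.jsonl

max_seq_length: 32768
learning_rate: 5e-6
```

Selecting this file with `+exp=stage2_glide` would apply every override at once, which is what makes stage chaining a one-argument change.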
The config hierarchy follows this structure:
- `conf/config.yaml` — Root config
- `conf/hydra/default.yaml` — Hydra behavior settings
- `conf/deepspeed/*.yaml` — 17 DeepSpeed ZeRO/optimizer configs
- `conf/exp/*.yaml` — 6 experiment configs (3 stages × 2 variants)
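Under this hierarchy, the root config typically declares a defaults list that `+exp=config_name` extends. A minimal sketch, assuming group names that are not verified against the repository:

```yaml
# Hypothetical conf/config.yaml defaults list (group names assumed)
defaults:
  - hydra: default      # pulls in conf/hydra/default.yaml
  - deepspeed: zero1    # one of the 17 configs under conf/deepspeed/
  - _self_
# `+exp=<name>` at the CLI appends conf/exp/<name>.yaml on top of these defaults
```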
## Usage
Use when defining or modifying GLIDE training experiments. The configuration system eliminates the need to modify Python code for different training setups:
- Change model architecture: modify `model._target_` in the experiment YAML
- Change data: modify `file_path` and collator references
- Change optimization: swap the DeepSpeed config reference
- Change training stage: point to the previous stage's output path
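Each of these changes reduces to a CLI override rather than a code edit. A hypothetical invocation, where the script name and config names are assumptions:

```shell
# Illustrative launch commands; script and config names are not the repo's actual ones
python train.py +exp=stage1_glide                      # select the full stage-1 setup
python train.py +exp=stage2_glide deepspeed=zero3 \
    model_name_or_path=outputs/stage1/target           # chain from stage 1's checkpoint
```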
## Theoretical Basis
Hydra's configuration composition follows the override pattern:
```python
# Abstract Hydra flow (not actual implementation)
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="conf", config_name="config", version_base="1.2")
def main(cfg: DictConfig):
    # cfg is fully resolved from YAML + CLI overrides
    model = hydra.utils.call(cfg.model, cfg.model_name_or_path)
    dataset = hydra.utils.instantiate(cfg.dataset)
    collator = hydra.utils.instantiate(cfg.collator)
```
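The override semantics behind this flow can be approximated without Hydra at all. The following sketch (plain dictionaries, not Hydra's actual implementation) shows how an experiment config layers over a base config: nested sections merge recursively, and leaf values from the override win while untouched base values survive.

```python
# Minimal sketch of Hydra-style config composition (not Hydra itself)
def compose(base: dict, override: dict) -> dict:
    """Recursively merge `override` on top of `base`, as `+exp=...` layers YAML files."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = compose(merged[key], value)  # recurse into nested sections
        else:
            merged[key] = value  # leaf values from the override win
    return merged

# Values mirror the stage table below; keys are illustrative
base = {"max_seq_length": 1024, "optim": {"learning_rate": 5e-4, "weight_decay": 0.1}}
stage2 = {"max_seq_length": 32768, "optim": {"learning_rate": 5e-6}}
cfg = compose(base, stage2)
# cfg["optim"]["weight_decay"] is inherited from the base config unchanged
```

This mirrors why an experiment YAML only needs to state what differs per stage: everything else flows through from the base config.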
Key configuration parameters across training stages:
| Parameter | Stage 1 | Stage 2 | Stage 3 |
|---|---|---|---|
| max_seq_length | 1024 | 32768 | 32768 |
| learning_rate | 5e-4 | 5e-6 | 5e-6 |
| DeepSpeed ZeRO | Stage 1 | Stage 3 | Stage 3 |
| Data | SlimPajama-6B | Long-context data | Long-CoT data |
| Collator | DPODataSFTCollator | LongDataNoMaskSFTCollator | LongCoTDataSFTCollator |