# Principle: sail-sg LongSpec Hydra Configuration
| Knowledge Sources | |
|---|---|
| Domains | Configuration_Management, Training |
| Last Updated | 2026-02-14 05:00 GMT |
## Overview
Configuration management principle using Hydra's composable YAML system to define training experiments with hierarchical overrides and object instantiation.
## Description
Hydra Configuration in LongSpec defines all training parameters—model architecture, data paths, optimizer settings, DeepSpeed config, and training hyperparameters—in composable YAML files. The key benefits are:
- Composability: Experiment configs inherit from and override base configs. A single CLI argument (`+exp=config_name`) selects the full experiment setup.
- Object instantiation: The `_target_` key in YAML specifies a Python class path, allowing Hydra to construct model, dataset, collator, and aligner objects directly from config.
- Stage chaining: Multi-stage training is achieved by changing only the config override: each stage's YAML points to the previous stage's checkpoint via `model_name_or_path` and `draft_model_name_or_path`.
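The pattern above can be illustrated with a sketch of a stage-2 experiment config. All file names, class paths, and values here are assumptions for illustration, not the repository's actual files:

```yaml
# Hypothetical conf/exp/stage2_glide.yaml (names illustrative)
defaults:
  - /deepspeed: zero3                 # swap optimization by changing this reference
  - _self_

model:
  _target_: models.GlideForCausalLM  # hypothetical class path for Hydra to instantiate
model_name_or_path: outputs/stage1/target       # previous stage's target checkpoint
draft_model_name_or_path: outputs/stage1/draft  # previous stage's draft checkpoint

dataset:
  _target_: data.LongContextDataset  # hypothetical dataset class
  file_path: data/long_context.jsonl

max_seq_length: 32768
learning_rate: 5e-6
```

Selecting this file with `+exp=stage2_glide` would apply every override at once, which is what makes stage chaining a one-argument change.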
The config hierarchy follows this structure:
- `conf/config.yaml` — Root config
- `conf/hydra/default.yaml` — Hydra behavior settings
- `conf/deepspeed/*.yaml` — 17 DeepSpeed ZeRO/optimizer configs
- `conf/exp/*.yaml` — 6 experiment configs (3 stages × 2 variants)
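Under this hierarchy, the root config typically declares a defaults list that `+exp=config_name` extends. A minimal sketch, assuming group names that are not verified against the repository:

```yaml
# Hypothetical conf/config.yaml defaults list (group names assumed)
defaults:
  - hydra: default      # pulls in conf/hydra/default.yaml
  - deepspeed: zero1    # one of the 17 configs under conf/deepspeed/
  - _self_
# `+exp=<name>` at the CLI appends conf/exp/<name>.yaml on top of these defaults
```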
## Usage
Use when defining or modifying GLIDE training experiments. The configuration system eliminates the need to modify Python code for different training setups:
- Change model architecture: modify `model._target_` in the experiment YAML
- Change data: modify `file_path` and collator references
- Change optimization: swap the DeepSpeed config reference
- Change training stage: point to the previous stage's output path
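Each of these changes reduces to a CLI override rather than a code edit. A hypothetical invocation, where the script name and config names are assumptions:

```shell
# Illustrative launch commands; script and config names are not the repo's actual ones
python train.py +exp=stage1_glide                      # select the full stage-1 setup
python train.py +exp=stage2_glide deepspeed=zero3 \
    model_name_or_path=outputs/stage1/target           # chain from stage 1's checkpoint
```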
## Theoretical Basis
Hydra's configuration composition follows the override pattern:
```python
# Abstract Hydra flow (not actual implementation)
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="conf", config_name="config", version_base="1.2")
def main(cfg: DictConfig):
    # cfg is fully resolved from YAML + CLI overrides
    model = hydra.utils.call(cfg.model, cfg.model_name_or_path)
    dataset = hydra.utils.instantiate(cfg.dataset)
    collator = hydra.utils.instantiate(cfg.collator)
```
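The override semantics behind this flow can be approximated without Hydra at all. The following sketch (plain dictionaries, not Hydra's actual implementation) shows how an experiment config layers over a base config: nested sections merge recursively, and leaf values from the override win while untouched base values survive.

```python
# Minimal sketch of Hydra-style config composition (not Hydra itself)
def compose(base: dict, override: dict) -> dict:
    """Recursively merge `override` on top of `base`, as `+exp=...` layers YAML files."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = compose(merged[key], value)  # recurse into nested sections
        else:
            merged[key] = value  # leaf values from the override win
    return merged

# Values mirror the stage table below; keys are illustrative
base = {"max_seq_length": 1024, "optim": {"learning_rate": 5e-4, "weight_decay": 0.1}}
stage2 = {"max_seq_length": 32768, "optim": {"learning_rate": 5e-6}}
cfg = compose(base, stage2)
# cfg["optim"]["weight_decay"] is inherited from the base config unchanged
```

This mirrors why an experiment YAML only needs to state what differs per stage: everything else flows through from the base config.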
Key configuration parameters across training stages:
| Parameter | Stage 1 | Stage 2 | Stage 3 |
|---|---|---|---|
| max_seq_length | 1024 | 32768 | 32768 |
| learning_rate | 5e-4 | 5e-6 | 5e-6 |
| DeepSpeed ZeRO | Stage 1 | Stage 3 | Stage 3 |
| Data | SlimPajama-6B | Long-context data | Long-CoT data |
| Collator | DPODataSFTCollator | LongDataNoMaskSFTCollator | LongCoTDataSFTCollator |