Implementation: sail-sg LongSpec Hydra YAML Composition
| Knowledge Sources | Details |
|---|---|
| Domains | Configuration_Management, Training |
| Last Updated | 2026-02-14 05:00 GMT |
Overview
A concrete tool for defining GLIDE training experiments through Hydra YAML composition: experiment configs, DeepSpeed presets, and object instantiation via _target_ keys.
Description
The LongSpec configuration system uses Hydra with OmegaConf to compose training experiments from YAML files. Each experiment config (in conf/exp/) defines the complete training setup including model class, data paths, collator, optimizer, and DeepSpeed config. The @hydra.main decorator on the trainer entry point resolves all configs into a single DictConfig object.
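Instantiation via _target_ is standard Hydra behavior: any config node carrying a _target_ key names a dotted import path that hydra.utils.instantiate imports and calls with the node's remaining keys as arguments. A minimal runnable sketch of the mechanism, using a stdlib class as a stand-in for models.qwen2_glide.Qwen2Glide.from_pretrained:
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Stand-in node: in the real configs, _target_ points to e.g.
# models.qwen2_glide.Qwen2Glide.from_pretrained.
node = OmegaConf.create({"_target_": "datetime.timedelta", "days": 2})
obj = instantiate(node)  # imports datetime.timedelta, calls it with days=2
print(obj)               # 2 days, 0:00:00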
Usage
Select an experiment by passing the config name as a CLI argument:
deepspeed --num_gpus=8 trainer_base_ds_mul_fs_tp.py +exp=qwq_glide_8gpu_slim6b
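To inspect what a given experiment resolves to without launching a training run, Hydra's compose API can be used programmatically. A hedged sketch, assuming it is executed from longspec/train/ so that config_path="conf" resolves:
# Inspect the composed config without training (run from longspec/train/).
from hydra import initialize, compose
from omegaconf import OmegaConf

with initialize(config_path="conf", version_base="1.2"):
    cfg = compose(
        config_name="config",
        overrides=["+exp=qwq_glide_8gpu_slim6b"],
    )
print(OmegaConf.to_yaml(cfg))  # full resolved training configuration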
Code Reference
Source Location
- Repository: LongSpec
- File (Stage 1): longspec/train/conf/exp/qwq_glide_8gpu_slim6b.yaml
- Lines: L1-199
- File (Stage 2): longspec/train/conf/exp/qwq_glide_8gpu_slim6b_longv2-32k-zero3_5e-6-ligce-nomask.yaml
- Lines: L1-177
- File (Stage 3): longspec/train/conf/exp/qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask_longcot_5e-6.yaml
- Lines: L1-177
- File (Hydra entry): longspec/train/trainer_base_ds_mul_fs_tp.py
- Lines: L337 (@hydra.main decorator)
Signature
# Hydra entry point (Pattern Doc - no single function signature)
@hydra.main(config_path="conf", config_name="config", version_base="1.2")
def main(cfg: DictConfig):
"""
Training entry point. Hydra resolves cfg from:
1. conf/config.yaml (base)
2. +exp=<name> (experiment override)
3. CLI overrides (e.g., learning_rate=1e-5)
"""
Import
import hydra
from omegaconf import DictConfig, OmegaConf
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| +exp=<name> | CLI argument | Yes | Selects experiment YAML from conf/exp/ directory |
| CLI overrides | key=value pairs | No | Override any config parameter from command line |
Outputs
| Name | Type | Description |
|---|---|---|
| cfg | DictConfig | Fully resolved configuration object with all training parameters |
| cfg.model | DictConfig | Model instantiation config with _target_ pointing to Qwen2Glide/LlamaGlide |
| cfg.model_name_or_path | str | HuggingFace model path for target LLM |
| cfg.per_gpu_train_batch_size | int | Per-GPU micro-batch size |
| cfg.gradient_accumulation_steps | int | Number of micro-batches before optimizer step |
| cfg.learning_rate | float | Peak learning rate |
| cfg.output_dir | str | Directory for checkpoints and logs |
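Together, the batch-related outputs determine the effective global batch size. A small sketch using the Stage 1 values shown below; the world size is an assumption tied to the --num_gpus=8 launch command in Usage:
# Effective global batch size implied by the resolved config fields.
per_gpu_train_batch_size = 2      # Stage 1 value below
gradient_accumulation_steps = 128
world_size = 8                    # from deepspeed --num_gpus=8

global_batch = per_gpu_train_batch_size * gradient_accumulation_steps * world_size
print(global_batch)  # 2048 sequences per optimizer step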
Usage Examples
Stage 1 Config Structure
# qwq_glide_8gpu_slim6b.yaml (key sections):
model:
  _target_: models.qwen2_glide.Qwen2Glide.from_pretrained
model_name_or_path: "Qwen/QwQ-32B-Preview"

# Training hyperparameters
per_gpu_train_batch_size: 2
gradient_accumulation_steps: 128
learning_rate: 5e-4
num_train_epochs: 1
warmup_proportion: 0.1

# Data
dataset:
  _target_: data.combine_dataset.MultiMappingDataset
  read_fn:
    _target_: data.input_utils.jsonl_read_fn

# Collator
collator:
  _target_: data.general_collator.DPODataSFTCollator
  max_seq_length: 1024

# DeepSpeed
ds_config: conf/deepspeed/train_hybrid_engine_zero1_optim_offload_cosine.yaml
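Note the nested read_fn node under dataset: hydra.utils.instantiate resolves such inner _target_ entries recursively, building the inner object first and passing it to the outer constructor. A minimal runnable sketch with stdlib stand-ins (Fraction and int stand in for MultiMappingDataset and jsonl_read_fn, which live in the repo):
# Recursive _target_ resolution: the inner node is instantiated first,
# then passed as an argument to the outer target. Stdlib stand-ins only.
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.create({
    "_target_": "fractions.Fraction",
    "numerator": {"_target_": "builtins.int", "_args_": ["7"]},  # inner node
    "denominator": 2,
})
print(instantiate(cfg))  # Fraction(7, 2), i.e. 7/2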
Stage 2 Config (Long-Context)
# Key differences from Stage 1:
model_name_or_path: "Qwen/QwQ-32B-Preview"
draft_model_name_or_path: "/output/stage1/draft_model_weights.pth" # From Stage 1
learning_rate: 5e-6 # Lower LR
collator:
  _target_: data.general_collator.LongDataNoMaskSFTCollator
  max_seq_length: 32768  # 32x longer context
ds_config: conf/deepspeed/train_hybrid_engine_zero3_optim_offload_cosine.yaml # ZeRO-3
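The ds_config entry points at a DeepSpeed preset YAML rather than embedding it inline. A hedged sketch of how such a preset could be loaded and handed to DeepSpeed; the actual wiring inside trainer_base_ds_mul_fs_tp.py is not reproduced here and may differ:
# Hedged sketch only: load the referenced preset into a plain dict and
# pass it to deepspeed.initialize; the real trainer's wiring may differ.
import deepspeed
from omegaconf import DictConfig, OmegaConf

def init_engine(cfg: DictConfig, model):
    ds_cfg = OmegaConf.to_container(OmegaConf.load(cfg.ds_config), resolve=True)
    engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_cfg,  # e.g. the ZeRO-3 offload preset above
    )
    return engine, optimizer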
CLI Override Examples
# Override learning rate:
deepspeed trainer_base_ds_mul_fs_tp.py +exp=qwq_glide_8gpu_slim6b learning_rate=1e-4
# Override output directory:
deepspeed trainer_base_ds_mul_fs_tp.py +exp=qwq_glide_8gpu_slim6b output_dir=/new/path