
Implementation:Sail sg LongSpec Hydra YAML Composition

From Leeroopedia
Knowledge Sources
Domains Configuration_Management, Training
Last Updated 2026-02-14 05:00 GMT

Overview

Concrete tool for defining GLIDE training experiments through Hydra YAML composition: experiment configs under conf/exp/, DeepSpeed presets, and object instantiation via _target_ keys.

Description

The LongSpec configuration system uses Hydra with OmegaConf to compose training experiments from YAML files. Each experiment config (in conf/exp/) defines the complete training setup including model class, data paths, collator, optimizer, and DeepSpeed config. The @hydra.main decorator on the trainer entry point resolves all configs into a single DictConfig object.
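The `_target_` mechanism can be illustrated with a simplified stand-in for `hydra.utils.instantiate` (a sketch only; real Hydra also handles nested configs, `_partial_`, positional args, and recursive instantiation):

```python
import importlib

def instantiate(config: dict):
    """Simplified _target_ resolution: import the dotted path named by
    `_target_` and call it with the remaining keys as kwargs."""
    path = config["_target_"]
    module_name, _, attr = path.rpartition(".")
    target = getattr(importlib.import_module(module_name), attr)
    kwargs = {k: v for k, v in config.items() if k != "_target_"}
    return target(**kwargs)

# Stdlib example (stands in for e.g. data.general_collator.DPODataSFTCollator):
cfg = {"_target_": "datetime.timedelta", "days": 1}
obj = instantiate(cfg)  # datetime.timedelta(days=1)
```

In LongSpec's configs the same pattern builds the model, dataset, and collator: each YAML node carries a `_target_` plus constructor kwargs, so swapping an implementation is a one-line config change.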

Usage

Select an experiment by passing the config name as a CLI argument:

deepspeed --num_gpus=8 trainer_base_ds_mul_fs_tp.py +exp=qwq_glide_8gpu_slim6b

Code Reference

Source Location

  • Repository: LongSpec
  • File (Stage 1): longspec/train/conf/exp/qwq_glide_8gpu_slim6b.yaml
  • Lines: L1-199
  • File (Stage 2): longspec/train/conf/exp/qwq_glide_8gpu_slim6b_longv2-32k-zero3_5e-6-ligce-nomask.yaml
  • Lines: L1-177
  • File (Stage 3): longspec/train/conf/exp/qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask_longcot_5e-6.yaml
  • Lines: L1-177
  • File (Hydra entry): longspec/train/trainer_base_ds_mul_fs_tp.py
  • Lines: L337 (@hydra.main decorator)

Signature

# Hydra entry point (Pattern Doc - no single function signature)
@hydra.main(config_path="conf", config_name="config", version_base="1.2")
def main(cfg: DictConfig):
    """
    Training entry point. Hydra resolves cfg from:
      1. conf/config.yaml (base)
      2. +exp=<name> (experiment override)
      3. CLI overrides (e.g., learning_rate=1e-5)
    """

Import

import hydra
from omegaconf import DictConfig, OmegaConf

I/O Contract

Inputs

  • +exp=<name> (CLI argument, required): selects the experiment YAML from the conf/exp/ directory
  • key=value overrides (CLI arguments, optional): override any config parameter from the command line

Outputs

  • cfg (DictConfig): fully resolved configuration object with all training parameters
  • cfg.model (DictConfig): model instantiation config with _target_ pointing to Qwen2Glide/LlamaGlide
  • cfg.model_name_or_path (str): HuggingFace model path for the target LLM
  • cfg.per_gpu_train_batch_size (int): per-GPU micro-batch size
  • cfg.gradient_accumulation_steps (int): number of micro-batches accumulated before each optimizer step
  • cfg.learning_rate (float): peak learning rate
  • cfg.output_dir (str): directory for checkpoints and logs

Usage Examples

Stage 1 Config Structure

# qwq_glide_8gpu_slim6b.yaml (key sections):
model:
  _target_: models.qwen2_glide.Qwen2Glide.from_pretrained
model_name_or_path: "Qwen/QwQ-32B-Preview"

# Training hyperparameters
per_gpu_train_batch_size: 2
gradient_accumulation_steps: 128
learning_rate: 5e-4
num_train_epochs: 1
warmup_proportion: 0.1

# Data
dataset:
  _target_: data.combine_dataset.MultiMappingDataset
  read_fn:
    _target_: data.input_utils.jsonl_read_fn

# Collator
collator:
  _target_: data.general_collator.DPODataSFTCollator
  max_seq_length: 1024

# DeepSpeed
ds_config: conf/deepspeed/train_hybrid_engine_zero1_optim_offload_cosine.yaml
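A quick back-of-envelope check of the effective batch implied by the hyperparameters above (num_gpus taken from the deepspeed launch command in Usage; assumes no sequence packing):

```python
# Stage 1 values from the config above
per_gpu_train_batch_size = 2
gradient_accumulation_steps = 128
num_gpus = 8  # from deepspeed --num_gpus=8

global_batch_size = per_gpu_train_batch_size * gradient_accumulation_steps * num_gpus
# 2048 sequences per optimizer step
```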

Stage 2 Config (Long-Context)

# Key differences from Stage 1:
model_name_or_path: "Qwen/QwQ-32B-Preview"
draft_model_name_or_path: "/output/stage1/draft_model_weights.pth"  # From Stage 1

learning_rate: 5e-6  # Lower LR
collator:
  _target_: data.general_collator.LongDataNoMaskSFTCollator
  max_seq_length: 32768  # 32x longer context

ds_config: conf/deepspeed/train_hybrid_engine_zero3_optim_offload_cosine.yaml  # ZeRO-3

CLI Override Examples

# Override learning rate:
deepspeed trainer_base_ds_mul_fs_tp.py +exp=qwq_glide_8gpu_slim6b learning_rate=1e-4

# Override output directory:
deepspeed trainer_base_ds_mul_fs_tp.py +exp=qwq_glide_8gpu_slim6b output_dir=/new/path
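Nested keys are overridden with dot notation (e.g. collator.max_seq_length=2048). A simplified sketch of how a dotted key=value pair maps onto the nested cfg (Hydra's real override grammar also handles type conversion, lists, and the +/~ prefixes):

```python
def apply_override(cfg: dict, override: str) -> dict:
    """Walk a dotted key into the nested config and set the final leaf."""
    key, _, value = override.partition("=")
    node = cfg
    parts = key.split(".")
    for p in parts[:-1]:
        node = node.setdefault(p, {})
    # note: this sketch keeps values as strings; OmegaConf converts
    # them to the type declared in the YAML
    node[parts[-1]] = value
    return cfg

cfg = {"collator": {"max_seq_length": 1024}}
apply_override(cfg, "collator.max_seq_length=32768")
```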

Related Pages

Implements Principle

Requires Environment
