Implementation:Sail sg LongSpec Stage Progression Config

From Leeroopedia
Knowledge Sources
Domains: Training, Configuration_Management
Last Updated: 2026-02-14 05:00 GMT

Overview

A configuration pattern for chaining GLIDE training stages through Hydra YAML configs: each stage's config references the draft checkpoint written by the previous stage.

Description

Stage progression is implemented entirely through YAML configuration chaining. Each stage's experiment config specifies:

  • model_name_or_path: The target LLM (unchanged across stages)
  • draft_model_name_or_path: Points to the previous stage's draft_model_weights.pth output (not set in Stage 1, where the draft model is freshly initialized)
  • Collator and data configuration: Appropriate for the stage's context length and data type
  • DeepSpeed configuration: ZeRO stage and optimizer settings appropriate for memory requirements

The trainer entry point (trainer_base_ds_mul_fs_tp.py) loads the draft weights at initialization time via the Qwen2Glide/LlamaGlide constructor's glide_path parameter.
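As a rough illustration of the difference between Stage 1 (no draft path) and later stages, the sketch below resolves the draft checkpoint path from a config mapping before any model is built. The function name resolve_glide_path and the dict-style config are hypothetical and are not the trainer's actual code; only the draft_model_name_or_path key and the glide_path parameter name come from this page.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_glide_path(cfg: Mapping[str, object]) -> Optional[Path]:
    """Return the draft checkpoint to load, or None for a fresh draft model.

    Hypothetical helper: the real trainer may resolve this differently.
    """
    raw = cfg.get("draft_model_name_or_path")
    if not raw:
        # Stage 1: key absent or empty, so the draft model starts from scratch.
        return None
    path = Path(str(raw))
    if not path.is_file():
        # Stage 2+: fail early instead of after the target LLM has been loaded.
        raise FileNotFoundError(f"draft weights not found: {path}")
    return path
```

The returned value is what one would hand to the constructor's glide_path parameter; checking it up front avoids discovering a stale path only after the target model has been loaded.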

Usage

Run the stages sequentially, selecting the experiment config for each one with +exp=. Before launching Stage 2 or Stage 3, update draft_model_name_or_path in that stage's config so that it points to the draft_model_weights.pth file actually written by the previous stage.
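Editing the path by hand between stages is easy to get wrong, so a small script can do it. The sketch below rewrites the draft_model_name_or_path line in a config's text. It assumes the key sits at the top level of the YAML, on its own line, which may not match the layout of the real experiment files; set_draft_path is a hypothetical helper name.

```python
import re

KEY = "draft_model_name_or_path"


def set_draft_path(yaml_text: str, new_path: str) -> str:
    """Replace the top-level draft path entry, or append it if missing.

    Assumption: the key is unindented and occupies a single line.
    """
    pattern = re.compile(rf"^{re.escape(KEY)}\s*:.*$", flags=re.MULTILINE)
    new_line = f'{KEY}: "{new_path}"'
    if pattern.search(yaml_text):
        # A callable replacement avoids backslash-escape surprises in paths.
        return pattern.sub(lambda _match: new_line, yaml_text, count=1)
    separator = "" if (not yaml_text or yaml_text.endswith("\n")) else "\n"
    return f"{yaml_text}{separator}{new_line}\n"
```

A line-based rewrite keeps comments and formatting in the config intact, which a full YAML load and dump would not necessarily do.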

Code Reference

Source Location

  • Repository: LongSpec
  • File (Stage 1): longspec/train/conf/exp/qwq_glide_8gpu_slim6b.yaml
  • File (Stage 2): longspec/train/conf/exp/qwq_glide_8gpu_slim6b_longv2-32k-zero3_5e-6-ligce-nomask.yaml
  • File (Stage 3): longspec/train/conf/exp/qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask_longcot_5e-6.yaml
  • File (Draft loading): longspec/train/trainer_base_ds_mul_fs_tp.py
  • Lines: L408-410 (draft model weight loading in main())

Signature

# Pattern Doc: No single function signature.
# Stage progression is defined by YAML config chaining.

# Stage 1 (fresh draft model):
model_name_or_path: "Qwen/QwQ-32B-Preview"
# draft_model_name_or_path: not set (fresh initialization)

# Stage 2 (load Stage 1 draft):
model_name_or_path: "Qwen/QwQ-32B-Preview"
draft_model_name_or_path: "/output/stage1/draft_model_weights.pth"

# Stage 3 (load Stage 2 draft):
model_name_or_path: "Qwen/QwQ-32B-Preview"
draft_model_name_or_path: "/output/stage2/draft_model_weights.pth"
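
The chaining rule in the configs above can be stated as a small function: stage N reads the draft weights that stage N-1 wrote. The sketch below encodes that rule under the assumption that each stage writes to <output_root>/stage<N>, as in the example paths on this page; expected_draft_path is a hypothetical name.

```python
from pathlib import PurePosixPath
from typing import Optional

WEIGHTS_FILE = "draft_model_weights.pth"
NUM_STAGES = 3


def expected_draft_path(stage: int, output_root: str = "/output") -> Optional[PurePosixPath]:
    """Return the draft path stage `stage` should load, or None for Stage 1."""
    if not 1 <= stage <= NUM_STAGES:
        raise ValueError(f"stage must be in 1..{NUM_STAGES}, got {stage}")
    if stage == 1:
        # Stage 1 initializes the draft model from scratch.
        return None
    # Assumed layout: every stage writes into <output_root>/stage<N>/.
    return PurePosixPath(output_root) / f"stage{stage - 1}" / WEIGHTS_FILE
```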

Import

# No Python import — configuration is YAML-based.
# Launch via:
deepspeed --num_gpus=8 trainer_base_ds_mul_fs_tp.py +exp=<stage_config_name>

I/O Contract

Inputs

Name | Type | Required | Description
+exp=<config> | CLI argument | Yes | Experiment config selecting the training stage
draft_model_name_or_path | str (in YAML) | Stage 2+ | Path to the previous stage's draft_model_weights.pth
model_name_or_path | str (in YAML) | Yes | HuggingFace path to the target LLM (constant across stages)

Outputs

Name | Type | Description
draft_model_weights.pth | File | Trained draft model weights for this stage (input to the next stage)
DeepSpeed checkpoints | Directory | Full training state for resume capability

Usage Examples

Full Three-Stage Pipeline

# Stage 1: Base training (1024 context, ZeRO-1)
deepspeed --num_gpus=8 trainer_base_ds_mul_fs_tp.py \
    +exp=qwq_glide_8gpu_slim6b \
    output_dir=/output/stage1

# Stage 2: Long-context (32k context, ZeRO-3)
# First, update stage2 config: draft_model_name_or_path -> /output/stage1/draft_model_weights.pth
deepspeed --num_gpus=8 trainer_base_ds_mul_fs_tp.py \
    +exp=qwq_glide_8gpu_slim6b_longv2-32k-zero3_5e-6-ligce-nomask \
    output_dir=/output/stage2

# Stage 3: Long-CoT (32k context, ZeRO-3, CoT data)
# Update stage3 config: draft_model_name_or_path -> /output/stage2/draft_model_weights.pth
deepspeed --num_gpus=8 trainer_base_ds_mul_fs_tp.py \
    +exp=qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask_longcot_5e-6 \
    output_dir=/output/stage3
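
The three launches above follow one pattern, so they can be generated rather than typed. The sketch below only builds the command lines as argument lists and does not run anything. The config names and trainer script are taken from this page, while build_stage_commands and the stage<N> output layout are illustrative assumptions.

```python
import shlex
from pathlib import PurePosixPath
from typing import List

TRAINER = "trainer_base_ds_mul_fs_tp.py"
STAGE_CONFIGS = [
    "qwq_glide_8gpu_slim6b",
    "qwq_glide_8gpu_slim6b_longv2-32k-zero3_5e-6-ligce-nomask",
    "qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask_longcot_5e-6",
]


def build_stage_commands(output_root: str, num_gpus: int = 8) -> List[List[str]]:
    """Build one deepspeed command per stage, in launch order."""
    commands = []
    for index, config in enumerate(STAGE_CONFIGS, start=1):
        output_dir = PurePosixPath(output_root) / f"stage{index}"
        commands.append([
            "deepspeed",
            f"--num_gpus={num_gpus}",
            TRAINER,
            f"+exp={config}",
            f"output_dir={output_dir}",
        ])
    return commands


if __name__ == "__main__":
    # Print shell-ready lines; update each stage's draft path before running it.
    for command in build_stage_commands("/output"):
        print(shlex.join(command))
```

Each list can be passed to subprocess.run(command, check=True) once the corresponding config's draft_model_name_or_path has been updated; running the stages strictly in order matters because each one consumes the previous stage's output file.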

Sinkpi-Slicing Variant

# Alternative training variant available at each stage:
# Stage 1: qwq_glide_8gpu_slim6b_sinkpi-slicing
# Stage 2: qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask
# Stage 3: qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask_longcot_5e-6

Related Pages

Implements Principle

Requires Environment
