Implementation: sail-sg/LongSpec Stage Progression Config
| Knowledge Sources | |
|---|---|
| Domains | Training, Configuration_Management |
| Last Updated | 2026-02-14 05:00 GMT |
Overview
Concrete configuration pattern for chaining GLIDE training stages through Hydra YAML, where each stage's experiment config references the previous stage's output checkpoint path.
Description
Stage progression is implemented entirely through YAML configuration chaining. Each stage's experiment config specifies:
- model_name_or_path: The target LLM (unchanged across stages)
- draft_model_name_or_path: Points to previous stage's draft_model_weights.pth output
- Collator and data configuration: settings matched to the stage's context length and data type
- DeepSpeed configuration: ZeRO stage and optimizer settings sized to the stage's memory requirements
The trainer entry point (trainer_base_ds_mul_fs_tp.py) loads the draft weights at initialization time via the Qwen2Glide/LlamaGlide constructor's glide_path parameter.
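The conditional loading behavior can be sketched in miniature. This is a hedged, self-contained illustration of the pattern (plain dicts stand in for real model weights; `build_draft_model` and `load_weights` are hypothetical names, not the actual LongSpec constructor or `torch.load`):

```python
# Sketch of the stage-chaining pattern: if a glide_path is supplied, the
# draft model starts from the previous stage's weights; otherwise it is
# initialized fresh (as in Stage 1).

def build_draft_model(glide_path=None, load_weights=None):
    """Return a draft 'model' as a plain dict of weights (a stand-in for a
    real nn.Module). load_weights is a loader callable, e.g. torch.load
    in the real trainer."""
    fresh = {"w": 0.0}          # fresh initialization (Stage 1)
    if glide_path is not None:  # Stage 2/3: resume from previous stage
        return load_weights(glide_path)
    return fresh

# Simulate the stage chain in memory: Stage 1 starts fresh, Stage 2 loads
# Stage 1's checkpoint.
checkpoints = {"/output/stage1/draft_model_weights.pth": {"w": 1.0}}
stage1 = build_draft_model()  # no glide_path -> fresh weights
stage2 = build_draft_model("/output/stage1/draft_model_weights.pth",
                           load_weights=checkpoints.__getitem__)
```

In the actual trainer the same branch is driven by whether `draft_model_name_or_path` is set in the stage's YAML config.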
Usage
Run the stages sequentially, selecting the appropriate experiment config for each. Before launching Stage 2 or Stage 3, update draft_model_name_or_path in that stage's config to point to the actual draft_model_weights.pth path produced by the previous stage.
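The manual path update can be scripted. Below is a hedged helper (an assumption, not shipped with LongSpec) that rewrites the draft_model_name_or_path value in raw YAML text; plain text substitution is used so the rest of the config is left byte-for-byte untouched:

```python
# Hypothetical helper: point a stage config at the previous stage's output
# checkpoint by rewriting the draft_model_name_or_path line in place.
import re

def point_at_previous_stage(config_text: str, checkpoint_path: str) -> str:
    """Replace the value of draft_model_name_or_path in raw YAML text."""
    return re.sub(
        r'^(draft_model_name_or_path:\s*).*$',
        lambda m: m.group(1) + f'"{checkpoint_path}"',
        config_text,
        flags=re.MULTILINE,
    )

# Example on an in-memory config fragment.
stage2_cfg = ('model_name_or_path: "Qwen/QwQ-32B-Preview"\n'
              'draft_model_name_or_path: "/placeholder.pth"\n')
updated = point_at_previous_stage(
    stage2_cfg, "/output/stage1/draft_model_weights.pth")
```

In practice you would read the stage config file, apply the substitution, and write it back before launching the next stage.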
Code Reference
Source Location
- Repository: LongSpec
- File (Stage 1): longspec/train/conf/exp/qwq_glide_8gpu_slim6b.yaml
- File (Stage 2): longspec/train/conf/exp/qwq_glide_8gpu_slim6b_longv2-32k-zero3_5e-6-ligce-nomask.yaml
- File (Stage 3): longspec/train/conf/exp/qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask_longcot_5e-6.yaml
- File (Draft loading): longspec/train/trainer_base_ds_mul_fs_tp.py
- Lines: L408-410 (draft model weight loading in main())
Signature
# Pattern Doc: No single function signature.
# Stage progression is defined by YAML config chaining.
# Stage 1 (fresh draft model):
model_name_or_path: "Qwen/QwQ-32B-Preview"
# draft_model_name_or_path: not set (fresh initialization)
# Stage 2 (load Stage 1 draft):
model_name_or_path: "Qwen/QwQ-32B-Preview"
draft_model_name_or_path: "/output/stage1/draft_model_weights.pth"
# Stage 3 (load Stage 2 draft):
model_name_or_path: "Qwen/QwQ-32B-Preview"
draft_model_name_or_path: "/output/stage2/draft_model_weights.pth"
Import
# No Python import — configuration is YAML-based.
# Launch via:
deepspeed --num_gpus=8 trainer_base_ds_mul_fs_tp.py +exp=<stage_config_name>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| +exp=<config> | CLI argument | Yes | Experiment config selecting the training stage |
| draft_model_name_or_path | str (in YAML) | Stage 2+ | Path to previous stage's draft_model_weights.pth |
| model_name_or_path | str (in YAML) | Yes | HuggingFace path to target LLM (constant across stages) |
Outputs
| Name | Type | Description |
|---|---|---|
| draft_model_weights.pth | File | Trained draft model weights for this stage (input to next stage) |
| DeepSpeed checkpoints | Directory | Full training state for resume capability |
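Since each stage's input is the previous stage's draft_model_weights.pth, a pre-flight existence check avoids a failed deepspeed launch. This is a hedged sketch (a convenience assumption, not part of the repository):

```python
# Hypothetical pre-flight check: before launching Stage N+1, confirm that
# Stage N actually produced draft_model_weights.pth in its output_dir.
from pathlib import Path
import tempfile

def previous_stage_checkpoint(output_dir: str) -> Path:
    """Return the expected draft checkpoint path, raising if it is missing."""
    ckpt = Path(output_dir) / "draft_model_weights.pth"
    if not ckpt.is_file():
        raise FileNotFoundError(
            f"Expected {ckpt} from the previous stage; run that stage first.")
    return ckpt

# Demonstrate against a temporary directory standing in for /output/stage1.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "draft_model_weights.pth").write_bytes(b"")  # fake output
    found = previous_stage_checkpoint(tmp)
```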
Usage Examples
Full Three-Stage Pipeline
# Stage 1: Base training (1024 context, ZeRO-1)
deepspeed --num_gpus=8 trainer_base_ds_mul_fs_tp.py \
+exp=qwq_glide_8gpu_slim6b \
output_dir=/output/stage1
# Stage 2: Long-context (32k context, ZeRO-3)
# First, update stage2 config: draft_model_name_or_path -> /output/stage1/draft_model_weights.pth
deepspeed --num_gpus=8 trainer_base_ds_mul_fs_tp.py \
+exp=qwq_glide_8gpu_slim6b_longv2-32k-zero3_5e-6-ligce-nomask \
output_dir=/output/stage2
# Stage 3: Long-CoT (32k context, ZeRO-3, CoT data)
# Update stage3 config: draft_model_name_or_path -> /output/stage2/draft_model_weights.pth
deepspeed --num_gpus=8 trainer_base_ds_mul_fs_tp.py \
+exp=qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask_longcot_5e-6 \
output_dir=/output/stage3
Sinkpi-Slicing Variant
# Alternative training variant available at each stage:
# Stage 1: qwq_glide_8gpu_slim6b_sinkpi-slicing
# Stage 2: qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask
# Stage 3: qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask_longcot_5e-6