Implementation: sail-sg/LongSpec Stage Progression Config
| Knowledge Sources | |
|---|---|
| Domains | Training, Configuration_Management |
| Last Updated | 2026-02-14 05:00 GMT |
Overview
Concrete configuration pattern for chaining GLIDE training stages through Hydra YAML, where each stage's experiment config references the previous stage's output checkpoint path.
Description
Stage progression is implemented entirely through YAML configuration chaining. Each stage's experiment config specifies:
- model_name_or_path: The target LLM (unchanged across stages)
- draft_model_name_or_path: Points to previous stage's draft_model_weights.pth output
- Collator and data configuration: settings matched to the stage's context length and data type
- DeepSpeed configuration: ZeRO stage and optimizer settings sized to the stage's memory requirements
The trainer entry point (trainer_base_ds_mul_fs_tp.py) loads the draft weights at initialization time via the Qwen2Glide/LlamaGlide constructor's glide_path parameter.
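The conditional loading behavior can be sketched in miniature. This is a hedged, self-contained illustration of the pattern (plain dicts stand in for real model weights; `build_draft_model` and `load_weights` are hypothetical names, not the actual LongSpec constructor or `torch.load`):

```python
# Sketch of the stage-chaining pattern: if a glide_path is supplied, the
# draft model starts from the previous stage's weights; otherwise it is
# initialized fresh (as in Stage 1).

def build_draft_model(glide_path=None, load_weights=None):
    """Return a draft 'model' as a plain dict of weights (a stand-in for a
    real nn.Module). load_weights is a loader callable, e.g. torch.load
    in the real trainer."""
    fresh = {"w": 0.0}          # fresh initialization (Stage 1)
    if glide_path is not None:  # Stage 2/3: resume from previous stage
        return load_weights(glide_path)
    return fresh

# Simulate the stage chain in memory: Stage 1 starts fresh, Stage 2 loads
# Stage 1's checkpoint.
checkpoints = {"/output/stage1/draft_model_weights.pth": {"w": 1.0}}
stage1 = build_draft_model()  # no glide_path -> fresh weights
stage2 = build_draft_model("/output/stage1/draft_model_weights.pth",
                           load_weights=checkpoints.__getitem__)
```

In the actual trainer the same branch is driven by whether `draft_model_name_or_path` is set in the stage's YAML config.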
Usage
Run the stages sequentially, selecting the appropriate experiment config for each. Before launching Stage 2 or Stage 3, update draft_model_name_or_path in that stage's config to point to the actual draft_model_weights.pth path produced by the previous stage.
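The manual path update can be scripted. Below is a hedged helper (an assumption, not shipped with LongSpec) that rewrites the draft_model_name_or_path value in raw YAML text; plain text substitution is used so the rest of the config is left byte-for-byte untouched:

```python
# Hypothetical helper: point a stage config at the previous stage's output
# checkpoint by rewriting the draft_model_name_or_path line in place.
import re

def point_at_previous_stage(config_text: str, checkpoint_path: str) -> str:
    """Replace the value of draft_model_name_or_path in raw YAML text."""
    return re.sub(
        r'^(draft_model_name_or_path:\s*).*$',
        lambda m: m.group(1) + f'"{checkpoint_path}"',
        config_text,
        flags=re.MULTILINE,
    )

# Example on an in-memory config fragment.
stage2_cfg = ('model_name_or_path: "Qwen/QwQ-32B-Preview"\n'
              'draft_model_name_or_path: "/placeholder.pth"\n')
updated = point_at_previous_stage(
    stage2_cfg, "/output/stage1/draft_model_weights.pth")
```

In practice you would read the stage config file, apply the substitution, and write it back before launching the next stage.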
Code Reference
Source Location
- Repository: LongSpec
- File (Stage 1): longspec/train/conf/exp/qwq_glide_8gpu_slim6b.yaml
- File (Stage 2): longspec/train/conf/exp/qwq_glide_8gpu_slim6b_longv2-32k-zero3_5e-6-ligce-nomask.yaml
- File (Stage 3): longspec/train/conf/exp/qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask_longcot_5e-6.yaml
- File (Draft loading): longspec/train/trainer_base_ds_mul_fs_tp.py
- Lines: L408-410 (draft model weight loading in main())
Signature
# Pattern Doc: No single function signature.
# Stage progression is defined by YAML config chaining.
# Stage 1 (fresh draft model):
model_name_or_path: "Qwen/QwQ-32B-Preview"
# draft_model_name_or_path: not set (fresh initialization)
# Stage 2 (load Stage 1 draft):
model_name_or_path: "Qwen/QwQ-32B-Preview"
draft_model_name_or_path: "/output/stage1/draft_model_weights.pth"
# Stage 3 (load Stage 2 draft):
model_name_or_path: "Qwen/QwQ-32B-Preview"
draft_model_name_or_path: "/output/stage2/draft_model_weights.pth"
Import
# No Python import — configuration is YAML-based.
# Launch via:
deepspeed --num_gpus=8 trainer_base_ds_mul_fs_tp.py +exp=<stage_config_name>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| +exp=<config> | CLI argument | Yes | Experiment config selecting the training stage |
| draft_model_name_or_path | str (in YAML) | Stage 2+ | Path to previous stage's draft_model_weights.pth |
| model_name_or_path | str (in YAML) | Yes | HuggingFace path to target LLM (constant across stages) |
Outputs
| Name | Type | Description |
|---|---|---|
| draft_model_weights.pth | File | Trained draft model weights for this stage (input to next stage) |
| DeepSpeed checkpoints | Directory | Full training state for resume capability |
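Since each stage's input is the previous stage's draft_model_weights.pth, a pre-flight existence check avoids a failed deepspeed launch. This is a hedged sketch (a convenience assumption, not part of the repository):

```python
# Hypothetical pre-flight check: before launching Stage N+1, confirm that
# Stage N actually produced draft_model_weights.pth in its output_dir.
from pathlib import Path
import tempfile

def previous_stage_checkpoint(output_dir: str) -> Path:
    """Return the expected draft checkpoint path, raising if it is missing."""
    ckpt = Path(output_dir) / "draft_model_weights.pth"
    if not ckpt.is_file():
        raise FileNotFoundError(
            f"Expected {ckpt} from the previous stage; run that stage first.")
    return ckpt

# Demonstrate against a temporary directory standing in for /output/stage1.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "draft_model_weights.pth").write_bytes(b"")  # fake output
    found = previous_stage_checkpoint(tmp)
```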
Usage Examples
Full Three-Stage Pipeline
# Stage 1: Base training (1024 context, ZeRO-1)
deepspeed --num_gpus=8 trainer_base_ds_mul_fs_tp.py \
+exp=qwq_glide_8gpu_slim6b \
output_dir=/output/stage1
# Stage 2: Long-context (32k context, ZeRO-3)
# First, update stage2 config: draft_model_name_or_path -> /output/stage1/draft_model_weights.pth
deepspeed --num_gpus=8 trainer_base_ds_mul_fs_tp.py \
+exp=qwq_glide_8gpu_slim6b_longv2-32k-zero3_5e-6-ligce-nomask \
output_dir=/output/stage2
# Stage 3: Long-CoT (32k context, ZeRO-3, CoT data)
# Update stage3 config: draft_model_name_or_path -> /output/stage2/draft_model_weights.pth
deepspeed --num_gpus=8 trainer_base_ds_mul_fs_tp.py \
+exp=qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask_longcot_5e-6 \
output_dir=/output/stage3
Sinkpi-Slicing Variant
# Alternative training variant available at each stage:
# Stage 1: qwq_glide_8gpu_slim6b_sinkpi-slicing
# Stage 2: qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask
# Stage 3: qwq_glide_8gpu_slim6b_sinkpi-slicing_longv2-32k-zero3_5e-6-ligce-nomask_longcot_5e-6