Principle: hpcaitech ColossalAI Continual Pretraining Loop
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A training loop pattern for continual pretraining that supports both standard data-parallel and pipeline-parallel execution with gradient accumulation and gradient clipping.
Description
The continual pretraining training loop extends a pretrained LLM's knowledge by training on new domain-specific data. Unlike SFT (which uses a trainer class), the Colossal-LLaMA training loop is implemented directly in the training script with explicit handling of two execution paths: standard (using booster.backward()) and pipeline-parallel (using booster.execute_pipeline()).
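The gradient accumulation and clipping mentioned in the overview can be sketched in plain Python. This is an illustrative stand-in, not the Colossal-LLaMA script itself: the scalar `grad` stands in for real parameter gradients, and the clipping is a simple value clamp rather than the norm-based clipping a real loop would use.

```python
def train_with_accumulation(micro_batch_losses, accumulation_steps, clip_value):
    """Illustrative gradient-accumulation loop.

    A scalar `grad` stands in for parameter gradients; `steps_taken` counts
    optimizer steps (one per `accumulation_steps` micro-batches).
    """
    steps_taken = 0
    grad = 0.0
    for i, loss in enumerate(micro_batch_losses, start=1):
        # Backward on a loss scaled by 1/accumulation_steps, so the
        # accumulated gradient matches a single large-batch gradient.
        grad += loss / accumulation_steps
        if i % accumulation_steps == 0:
            # Gradient clipping (value clamp here; real loops clip by norm).
            grad = max(-clip_value, min(clip_value, grad))
            steps_taken += 1  # optimizer.step()
            grad = 0.0        # optimizer.zero_grad()
    return steps_taken
```

The key design point is that the optimizer steps only once per accumulation window, so the effective batch size is `accumulation_steps` times the micro-batch size.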
Usage
Use this pattern for continual pretraining or domain adaptation of LLaMA models with ColossalAI.
Theoretical Basis
The training objective is standard causal language modeling, minimizing the negative log-likelihood of each token given its prefix:

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})$$
Two execution paths:
- Non-pipeline: Forward pass -> loss -> booster.backward() -> optimizer.step()
- Pipeline: booster.execute_pipeline() handles both forward and backward with micro-batching
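The two execution paths above can be sketched as a single dispatch in the training step. The stub classes below are hypothetical stand-ins for the real ColossalAI `Booster`, model, and optimizer objects; only the method names `booster.backward()` and `booster.execute_pipeline()` come from the source, and the `plugin.pp_size > 1` check is an assumed way of detecting pipeline parallelism.

```python
class StubBooster:
    """Hypothetical stand-in for colossalai.booster.Booster; records which path ran."""

    def __init__(self, use_pipeline):
        # Assumption: pipeline parallelism is detected via the plugin's pp_size.
        self.plugin = type("Plugin", (), {"pp_size": 2 if use_pipeline else 1})()
        self.calls = []

    def backward(self, loss, optimizer):
        self.calls.append("backward")

    def execute_pipeline(self, data_iter, model, criterion, optimizer, return_loss=True):
        # In the real API this runs forward and backward with micro-batching.
        self.calls.append("execute_pipeline")
        return {"loss": 0.0}


def train_step(booster, model, optimizer, batch_iter, criterion):
    """One training step with explicit handling of both execution paths."""
    use_pipeline = booster.plugin.pp_size > 1
    if use_pipeline:
        # Pipeline path: forward and backward handled internally.
        outputs = booster.execute_pipeline(
            batch_iter, model, criterion, optimizer, return_loss=True
        )
        loss = outputs["loss"]
    else:
        # Standard path: explicit forward, then booster-managed backward.
        batch = next(batch_iter)
        loss = criterion(model(batch))
        booster.backward(loss, optimizer)
    optimizer.step()
    optimizer.zero_grad()
    return loss
```

The point of the dispatch is that the non-pipeline path owns the forward pass and calls `booster.backward()` explicitly, while the pipeline path delegates the whole forward/backward to `booster.execute_pipeline()`; the optimizer step is shared by both.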