
Principle:Hpcaitech ColossalAI Continual Pretraining Loop

From Leeroopedia


Knowledge Sources
Domains NLP, Training
Last Updated 2026-02-09 00:00 GMT

Overview

A training loop pattern for continual pretraining that supports both standard data-parallel and pipeline-parallel execution with gradient accumulation and gradient clipping.

Description

The continual pretraining loop extends a pretrained LLM's knowledge by training it on new domain-specific data. Unlike SFT (which uses a trainer class), the Colossal-LLaMA loop is implemented directly in the training script, with explicit handling of two execution paths: standard (using booster.backward()) and pipeline-parallel (using booster.execute_pipeline()).

Usage

Use this pattern for continual pretraining or domain adaptation of LLaMA models with ColossalAI.

Theoretical Basis

The training objective is standard causal language modeling:

L(θ) = −∑_t log P(x_t | x_{<t}; θ)
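As a worked check of this objective, the sketch below computes the negative log-likelihood for a tiny sequence. The per-token probabilities are made-up values standing in for the model's softmax outputs at each position; they are not from any real model.

```python
import math

# Made-up per-token probabilities P(x_t | x_{<t}; θ) for a 4-token sequence.
token_probs = [0.9, 0.5, 0.25, 0.8]

# Causal LM loss: negative log-likelihood summed over positions.
loss = -sum(math.log(p) for p in token_probs)

# Commonly reported as a per-token mean, whose exponential is the perplexity.
mean_loss = loss / len(token_probs)
perplexity = math.exp(mean_loss)
print(round(loss, 4), round(perplexity, 4))  # → 2.4079 1.8257
```

Higher-probability tokens contribute less loss, so the sum is minimized when the model assigns high probability to each next token given its prefix.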

Two execution paths:

  • Non-pipeline: Forward pass -> loss -> booster.backward() -> optimizer.step()
  • Pipeline: booster.execute_pipeline() handles both forward and backward with micro-batching
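The two paths above reduce to a single dispatch inside the training step. The sketch below is a toy stand-in: ToyBooster, ToyOptimizer, and the clip_grad_norm hook are invented for illustration and are not ColossalAI's real API; only the control flow (path dispatch, gradient-accumulation loss scaling, and the clip → step → zero_grad ordering) mirrors the pattern described here.

```python
# Structural sketch of the two execution paths with gradient
# accumulation and clipping. All classes are toy stand-ins.

class ToyOptimizer:
    def __init__(self):
        self.calls = []
    def clip_grad_norm(self, max_norm):   # hypothetical clipping hook
        self.calls.append("clip")
    def step(self):
        self.calls.append("step")
    def zero_grad(self):
        self.calls.append("zero")

class ToyBooster:
    def __init__(self, use_pipeline):
        self.use_pipeline = use_pipeline
        self.log = []
    def backward(self, loss, optimizer):
        # Non-pipeline path: plain backward through the whole replica.
        self.log.append(("backward", round(loss, 4)))
    def execute_pipeline(self, data_iter, model, criterion, optimizer):
        # Pipeline path: schedules forward AND backward internally,
        # micro-batching across pipeline stages.
        batch = next(data_iter)
        loss = criterion(model(batch), batch)
        self.log.append(("pipeline", round(loss, 4)))
        return {"loss": loss}

def train_step(booster, model, criterion, optimizer, data_iter,
               accumulation_steps=2, clip_norm=1.0):
    """One optimizer step spanning `accumulation_steps` micro-steps."""
    for _ in range(accumulation_steps):
        if booster.use_pipeline:
            booster.execute_pipeline(data_iter, model, criterion, optimizer)
        else:
            batch = next(data_iter)
            # Scale the loss so accumulated gradients average correctly.
            loss = criterion(model(batch), batch) / accumulation_steps
            booster.backward(loss, optimizer)
    optimizer.clip_grad_norm(clip_norm)
    optimizer.step()
    optimizer.zero_grad()

# Toy model/criterion: model doubles the input; "loss" is pred - target.
model = lambda x: 2.0 * x
criterion = lambda pred, target: pred - target

booster = ToyBooster(use_pipeline=False)
optimizer = ToyOptimizer()
train_step(booster, model, criterion, optimizer, iter([1.0, 2.0]))
print(booster.log)       # two scaled backward calls
print(optimizer.calls)   # clip -> step -> zero
```

Flipping use_pipeline=True routes the same step through execute_pipeline, which owns both forward and backward; this is why the two paths cannot share a single backward call site.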

Related Pages

Implemented By

Heuristic Links
