Implementation: Hpcaitech ColossalAI Booster Training Loop
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Training loop implementation for continual pretraining with dual execution paths (standard and pipeline), provided by Colossal-LLaMA.
Description
The training loop in train.py directly uses ColossalAI's Booster for training execution. For non-pipeline plugins (DDP, ZeRO, Gemini), it calls booster.backward() and optimizer.step() explicitly. For pipeline plugins (HybridParallel with pp_size > 1), it calls booster.execute_pipeline() which handles micro-batch scheduling internally.
Usage
Used in the Colossal-LLaMA training script for continual pretraining. The execution path is selected automatically based on the chosen plugin.
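A minimal sketch of the selection rule described above. The helper name `use_pipeline_path` and the plugin-name strings are hypothetical; in train.py the check is made against the configured plugin and its `pp_size`:

```python
def use_pipeline_path(plugin_name: str, pp_size: int) -> bool:
    # Hypothetical helper mirroring the described logic: only the
    # HybridParallel plugin with pp_size > 1 takes the pipeline path.
    return plugin_name == "hybrid_parallel" and pp_size > 1

use_pipeline_path("hybrid_parallel", 2)  # pipeline path (execute_pipeline)
use_pipeline_path("zero2", 1)            # standard booster.backward path
```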
Code Reference
Source Location
- Repository: ColossalAI
- File: applications/Colossal-LLaMA/train.py
- Lines: 287-413
Signature
```python
# Non-pipeline path (train.py:L343-413):
for step, batch in enumerate(train_dataloader):
    outputs = model(input_ids=batch["input_ids"], labels=batch["labels"])
    loss = outputs.loss / accumulation_steps
    booster.backward(loss, optimizer)
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

# Pipeline path (train.py:L287-342):
outputs = booster.execute_pipeline(
    data_iter=iter(dataloader),
    model=model,
    criterion=lambda outputs, inputs: outputs.loss,
    optimizer=optimizer,
    return_loss=True,
)
```
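The `criterion` argument is any callable mapping (model outputs, batch inputs) to a scalar loss; for causal-LM pretraining it simply reads `outputs.loss`. A self-contained illustration of that contract (the `SimpleNamespace` stand-in for a HF-style model output is purely for demonstration):

```python
from types import SimpleNamespace

# Same criterion as in the signature above: ignore inputs, return outputs.loss.
criterion = lambda outputs, inputs: outputs.loss

# Stand-in for a CausalLM-style output object carrying a precomputed loss.
fake_outputs = SimpleNamespace(loss=1.25)
loss = criterion(fake_outputs, None)  # -> 1.25
```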
Import
```python
from colossalai.booster import Booster
# The training loop is implemented in the script, not as an importable class.
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | nn.Module | Yes | Boosted model |
| optimizer | Optimizer | Yes | Boosted optimizer |
| train_dataloader | DataLoader | Yes | Packed pretraining data |
| booster | Booster | Yes | ColossalAI Booster |
| accumulation_steps | int | No | Gradient accumulation steps (default: 1) |
| grad_clip | float | No | Max gradient norm for clipping |
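With gradient accumulation, the effective global batch size is the per-device micro-batch times `accumulation_steps` times the number of data-parallel ranks. A hypothetical helper (not part of train.py) making the arithmetic explicit:

```python
def effective_batch_size(micro_batch: int, accumulation_steps: int, dp_size: int) -> int:
    # Each optimizer step sees gradients averaged over this many samples.
    return micro_batch * accumulation_steps * dp_size

effective_batch_size(4, 8, 16)  # 512 samples per optimizer step
```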
Outputs
| Name | Type | Description |
|---|---|---|
| Trained model | nn.Module | Model with updated weights |
| Training logs | Dict | Loss, learning rate, throughput metrics |
| Checkpoints | Files | Periodic model/optimizer/scheduler checkpoints |
Usage Examples
```python
# Standard non-pipeline training
for epoch in range(num_epochs):
    for step, batch in enumerate(train_dataloader):
        batch = {k: v.cuda() for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss / accumulation_steps
        booster.backward(loss, optimizer)
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
        if (step + 1) % save_interval == 0:
            save_checkpoint(save_dir, booster, model, optimizer,
                            lr_scheduler, epoch, step, batch_size, coordinator)
```
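Why the example divides `loss` by `accumulation_steps`: `backward()` sums gradients across micro-batches, so pre-scaling each micro-batch loss makes the accumulated gradient equal the mean over the full effective batch. A plain-Python sketch of the arithmetic (the loss values are illustrative):

```python
micro_losses = [2.0, 4.0, 6.0, 8.0]  # one loss per micro-batch (illustrative)
accumulation_steps = len(micro_losses)

# backward() accumulates: summing the pre-scaled losses...
accumulated = sum(loss / accumulation_steps for loss in micro_losses)
# ...reproduces the mean loss over the whole effective batch.
full_batch_mean = sum(micro_losses) / len(micro_losses)
assert abs(accumulated - full_batch_mean) < 1e-12
```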