Principle: hpcaitech ColossalAI SFT Training Execution
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A supervised learning training loop that fine-tunes a language model on instruction-response pairs using next-token prediction loss with loss masking.
Description
SFT Training Execution implements the standard supervised fine-tuning loop for language models. The model learns to generate appropriate responses by minimizing the cross-entropy loss on response tokens while ignoring instruction/prompt tokens through a loss mask. The training loop supports gradient accumulation, distributed training via ColossalAI's Booster, periodic evaluation, and checkpoint saving.
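The loss-masking idea above can be sketched in plain Python (a minimal illustration with hypothetical helper names; real implementations compute this over token logits with PyTorch):

```python
import math

def masked_cross_entropy(token_log_probs, loss_mask):
    """Mean negative log-likelihood over masked (response) tokens only.

    token_log_probs: log p(y_t | y_<t, x) at each target position
    loss_mask: 1 for response tokens, 0 for prompt tokens
    """
    masked_nll = [-lp * m for lp, m in zip(token_log_probs, loss_mask)]
    n_response = sum(loss_mask)
    return sum(masked_nll) / n_response

# The prompt token (mask 0) contributes nothing; only response tokens are penalized.
log_probs = [math.log(0.9), math.log(0.5), math.log(0.25)]
mask = [0, 1, 1]
loss = masked_cross_entropy(log_probs, mask)
```

Note that the first token's probability (0.9) never enters the loss; changing it leaves `loss` unchanged, which is exactly the behavior the loss mask is meant to provide.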
The trainer handles both standard data-parallel training (using booster.backward()) and pipeline-parallel training (using booster.execute_pipeline()).
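A hedged sketch of that branch (the `booster.backward` and `booster.execute_pipeline` names come from the source; exact signatures vary across ColossalAI versions, so treat this as pseudocode, not the actual trainer):

```
if use_pipeline:
    # pipeline parallelism: the booster schedules micro-batches internally
    outputs = booster.execute_pipeline(
        data_iter, model,
        criterion=lambda out, batch: out.loss,
        optimizer=optimizer, return_loss=True)
    loss = outputs["loss"]
else:
    # standard data parallelism: forward here, backward through the booster
    loss = model(input_ids, labels, loss_mask)
    booster.backward(loss / accumulation_steps, optimizer)
```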
Usage
Use this principle after all components (model, optimizer, dataloader, booster) are configured. It is the core training loop for instruction-tuning language models and produces a fine-tuned model checkpoint.
Theoretical Basis
The SFT objective minimizes the conditional language modeling loss over response tokens:

$$\mathcal{L}(\theta) = -\frac{1}{\sum_{t} m_t} \sum_{t=1}^{T} m_t \log p_\theta(y_t \mid y_{<t}, x)$$

where $m_t$ is the loss mask (1 for response tokens, 0 for prompt tokens), $x$ is the instruction/prompt, and $y_1, \dots, y_T$ are the target tokens.
Training loop pseudo-code:
```
for epoch in range(max_epochs):
    for step, batch in enumerate(train_dataloader):
        # forward pass returns the masked cross-entropy loss
        loss = model(input_ids, labels, loss_mask)
        # scale so accumulated gradients match the full-batch gradient
        booster.backward(loss / accumulation_steps, optimizer)
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
```
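The `loss / accumulation_steps` scaling in the pseudo-code can be checked numerically: accumulating gradients of the scaled per-micro-batch losses reproduces the gradient of the mean loss over the full batch. A pure-Python sketch with a scalar stand-in for the model (hypothetical names, not ColossalAI code):

```python
# Scalar "model": loss_i(w) = (w - y_i)^2, so dloss_i/dw = 2 * (w - y_i)
def grad(w, y):
    return 2.0 * (w - y)

w = 0.5
batch = [1.0, 2.0, 3.0, 4.0]   # four micro-batches of one sample each
accumulation_steps = len(batch)

# Accumulate gradients of loss_i / accumulation_steps, as the loop above does
acc_grad = sum(grad(w, y) / accumulation_steps for y in batch)

# Gradient of the mean loss over the whole batch, computed in one shot
full_grad = sum(grad(w, y) for y in batch) / len(batch)
```

The two quantities are identical, which is why a single `optimizer.step()` after `accumulation_steps` scaled backward passes behaves like one large-batch update.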