Principle: hpcaitech ColossalAI SFT Training Execution
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A supervised learning training loop that fine-tunes a language model on instruction-response pairs using next-token prediction loss with loss masking.
Description
SFT Training Execution implements the standard supervised fine-tuning loop for language models. The model learns to generate appropriate responses by minimizing the cross-entropy loss on response tokens while ignoring instruction/prompt tokens through a loss mask. The training loop supports gradient accumulation, distributed training via ColossalAI's Booster, periodic evaluation, and checkpoint saving.
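The loss-masking idea above can be sketched in plain Python (a minimal illustration with hypothetical helper names; real implementations compute this over token logits with PyTorch):

```python
import math

def masked_cross_entropy(token_log_probs, loss_mask):
    """Mean negative log-likelihood over masked (response) tokens only.

    token_log_probs: log p(y_t | y_<t, x) at each target position
    loss_mask: 1 for response tokens, 0 for prompt tokens
    """
    masked_nll = [-lp * m for lp, m in zip(token_log_probs, loss_mask)]
    n_response = sum(loss_mask)
    return sum(masked_nll) / n_response

# The prompt token (mask 0) contributes nothing; only response tokens are penalized.
log_probs = [math.log(0.9), math.log(0.5), math.log(0.25)]
mask = [0, 1, 1]
loss = masked_cross_entropy(log_probs, mask)
```

Note that the first token's probability (0.9) never enters the loss; changing it leaves `loss` unchanged, which is exactly the behavior the loss mask is meant to provide.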
The trainer handles both standard data-parallel training (using booster.backward()) and pipeline-parallel training (using booster.execute_pipeline()).
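A hedged sketch of that branch (the `booster.backward` and `booster.execute_pipeline` names come from the source; exact signatures vary across ColossalAI versions, so treat this as pseudocode, not the actual trainer):

```
if use_pipeline:
    # pipeline parallelism: the booster schedules micro-batches internally
    outputs = booster.execute_pipeline(
        data_iter, model,
        criterion=lambda out, batch: out.loss,
        optimizer=optimizer, return_loss=True)
    loss = outputs["loss"]
else:
    # standard data parallelism: forward here, backward through the booster
    loss = model(input_ids, labels, loss_mask)
    booster.backward(loss / accumulation_steps, optimizer)
```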
Usage
Use this principle after all components (model, optimizer, dataloader, booster) are configured. It is the core training loop for instruction-tuning language models and produces a fine-tuned model checkpoint.
Theoretical Basis
The SFT objective minimizes the conditional language modeling loss over response tokens:

$$\mathcal{L}(\theta) = -\frac{1}{\sum_{t} m_t} \sum_{t=1}^{T} m_t \log p_\theta(y_t \mid y_{<t}, x)$$

where $m_t$ is the loss mask (1 for response tokens, 0 for prompt tokens), $x$ is the instruction/prompt, and $y_1, \dots, y_T$ are the target tokens.
Training loop pseudo-code:
```
for epoch in range(max_epochs):
    for step, batch in enumerate(train_dataloader):
        # forward pass returns the masked cross-entropy loss
        loss = model(input_ids, labels, loss_mask)
        # scale so accumulated gradients match the full-batch gradient
        booster.backward(loss / accumulation_steps, optimizer)
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
```
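The `loss / accumulation_steps` scaling in the pseudo-code can be checked numerically: accumulating gradients of the scaled per-micro-batch losses reproduces the gradient of the mean loss over the full batch. A pure-Python sketch with a scalar stand-in for the model (hypothetical names, not ColossalAI code):

```python
# Scalar "model": loss_i(w) = (w - y_i)^2, so dloss_i/dw = 2 * (w - y_i)
def grad(w, y):
    return 2.0 * (w - y)

w = 0.5
batch = [1.0, 2.0, 3.0, 4.0]   # four micro-batches of one sample each
accumulation_steps = len(batch)

# Accumulate gradients of loss_i / accumulation_steps, as the loop above does
acc_grad = sum(grad(w, y) / accumulation_steps for y in batch)

# Gradient of the mean loss over the whole batch, computed in one shot
full_grad = sum(grad(w, y) for y in batch) / len(batch)
```

The two quantities are identical, which is why a single `optimizer.step()` after `accumulation_steps` scaled backward passes behaves like one large-batch update.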