
Principle:Hpcaitech ColossalAI SFT Training Execution

From Leeroopedia


Knowledge Sources
Domains NLP, Training
Last Updated 2026-02-09 00:00 GMT

Overview

A supervised training loop that fine-tunes a language model on instruction-response pairs using a next-token prediction loss with loss masking.

Description

SFT Training Execution implements the standard supervised fine-tuning loop for language models. The model learns to generate appropriate responses by minimizing the cross-entropy loss on response tokens while ignoring instruction/prompt tokens through a loss mask. The training loop supports gradient accumulation, distributed training via ColossalAI's Booster, periodic evaluation, and checkpoint saving.
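The loss mask described above can be sketched as follows; `build_loss_mask` is a hypothetical helper (not from the source), shown only to illustrate how prompt tokens are excluded from the loss while response tokens are kept:

```python
def build_loss_mask(prompt_ids, response_ids):
    """Concatenate a prompt and response into one sequence and build the
    loss mask: 0 over prompt tokens (ignored by the loss), 1 over response
    tokens (trained on). Hypothetical illustration, not the library's API."""
    input_ids = list(prompt_ids) + list(response_ids)
    loss_mask = [0] * len(prompt_ids) + [1] * len(response_ids)
    return input_ids, loss_mask

# Token ids invented for illustration: a 2-token prompt, a 2-token response.
ids, mask = build_loss_mask([101, 7592], [2088, 102])
# mask -> [0, 0, 1, 1]: only the two response tokens contribute to the loss
```

In practice the same effect is often achieved by setting prompt positions in the label tensor to an ignore index, but the mask form above matches the objective as written on this page.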

The trainer handles both standard data-parallel training (using booster.backward()) and pipeline-parallel training (using booster.execute_pipeline()).
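The two code paths can be sketched as below. The method names `backward` and `execute_pipeline` are the ColossalAI Booster calls named above; the stub booster, model, and criterion are placeholders added so the sketch is self-contained and should not be read as the real ColossalAI implementation:

```python
class StubBooster:
    """Placeholder standing in for ColossalAI's Booster in this sketch."""

    def backward(self, loss, optimizer):
        # The real booster backpropagates the loss for the chosen plugin.
        self.last = ("backward", loss)

    def execute_pipeline(self, data_iter, model, criterion, optimizer,
                         return_loss=True):
        # The real booster schedules pipeline-parallel forward/backward
        # passes over micro-batches; here we fake one forward pass.
        batch = next(data_iter)
        return {"loss": criterion(model(batch))}


def run_step(booster, model, optimizer, criterion, data_iter, use_pipeline):
    if use_pipeline:
        # Pipeline parallelism: the booster drives forward and backward itself.
        out = booster.execute_pipeline(data_iter, model, criterion,
                                       optimizer, return_loss=True)
        return out["loss"]
    # Standard data parallelism: compute the loss, then hand it to the booster.
    loss = criterion(model(next(data_iter)))
    booster.backward(loss, optimizer)
    return loss
```

The key design point is that under pipeline parallelism the trainer never sees a whole forward pass, so loss computation and backward must be delegated to the booster rather than called explicitly.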

Usage

Use this principle after all components (model, optimizer, dataloader, booster) are configured. It is the core training loop for instruction-tuning language models and produces a fine-tuned model checkpoint.

Theoretical Basis

The SFT objective minimizes the conditional language modeling loss:

\mathcal{L}_{\mathrm{SFT}} = -\sum_{t} m_t \log P(x_t \mid x_{<t}; \theta)

where $m_t$ is the loss mask (1 for response tokens, 0 for prompt tokens).
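As a tiny worked example of the objective (all probabilities invented for illustration):

```python
import math

# Probability the model assigns to the correct next token at each position
# (values invented for illustration).
token_probs = [0.5, 0.9, 0.8, 0.25]
# Loss mask: first two tokens are prompt (masked out), last two are response.
loss_mask = [0, 0, 1, 1]

# L_SFT = -sum_t m_t * log P(x_t | x_<t; theta)
loss = -sum(m * math.log(p) for m, p in zip(loss_mask, token_probs))
# Only the response tokens contribute: -(log 0.8 + log 0.25)
```

Note that the prompt probabilities (0.5 and 0.9) have no effect on the result; only the masked-in response positions are penalized.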

Training loop pseudo-code:

for epoch in range(max_epochs):
    for step, batch in enumerate(train_dataloader):
        # Forward pass: masked cross-entropy over response tokens only
        loss = model(input_ids, labels, loss_mask)
        # Scale so accumulated gradients average over accumulation_steps
        booster.backward(loss / accumulation_steps, optimizer)
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
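The accumulation condition above can be checked in isolation. This sketch (a standalone illustration, not part of the trainer) simulates which batch steps trigger an optimizer update:

```python
def update_steps(num_batches, accumulation_steps):
    """Return the 0-indexed batch steps at which optimizer.step() fires,
    per the `(step + 1) % accumulation_steps == 0` condition above."""
    return [s for s in range(num_batches)
            if (s + 1) % accumulation_steps == 0]

# With 10 batches and accumulation_steps=4, updates fire after batches 3
# and 7; the final two batches leave gradients accumulated but unapplied
# unless the loop flushes them at epoch end.
```

This shows the effective batch size is `accumulation_steps` times the dataloader batch size, and that a dataloader length not divisible by `accumulation_steps` leaves a partial accumulation at the end of each epoch.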

Related Pages

Implemented By
