Principle: Hugging Face Alignment Handbook Supervised Fine-Tuning
| Knowledge Sources | |
|---|---|
| Domains | NLP, Deep_Learning, Training |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A training technique that adapts a pretrained language model to follow instructions by training on curated demonstration data with a standard cross-entropy language modeling objective.
Description
Supervised Fine-Tuning (SFT) is the first stage of the RLHF alignment pipeline. It takes a pretrained base model and trains it on high-quality instruction-response pairs to teach the model to follow human instructions. The training uses standard next-token prediction (causal language modeling) on formatted conversation data.
SFT addresses the gap between a pretrained model's capability (predicting next tokens in web text) and the desired behavior (following user instructions helpfully and safely). By training on curated demonstrations, the model learns the expected input-output format and develops instruction-following ability.
In the alignment-handbook, SFT serves as the foundation for subsequent preference optimization stages (DPO, ORPO). The SFT checkpoint becomes the starting point for preference learning.
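Before training, each conversation is rendered into a single string using the tokenizer's chat template. The sketch below illustrates the idea with hand-written ChatML-style markers; the marker strings and the `format_conversation` helper are illustrative assumptions, since the real template is the Jinja2 `chat_template` shipped with each tokenizer.

```python
# Illustrative sketch of chat-template formatting (NOT a real tokenizer
# template). Renders a list of {role, content} messages into one training
# string using ChatML-style markers.
def format_conversation(messages):
    """Render role-tagged messages into a single flat string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    return "".join(parts)

conversation = [
    {"role": "user", "content": "What is SFT?"},
    {"role": "assistant", "content": "Supervised fine-tuning on demonstrations."},
]
text = format_conversation(conversation)
```

The flat string is what gets tokenized; the role markers are also what makes it possible to identify assistant spans for assistant-only loss masking.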
Usage
Use supervised fine-tuning when:
- Adapting a base pretrained model to follow conversational instructions
- Creating the first stage of a multi-stage alignment pipeline (SFT → DPO)
- Fine-tuning on domain-specific instruction data
- The training data consists of demonstration conversations with clear input-output pairs
Theoretical Basis
SFT minimizes the standard cross-entropy loss over the training data:
$$\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})$$

where $x_t$ is the token at position $t$ and $\theta$ are the model parameters. When assistant_only_loss is enabled, the loss is computed only over assistant response tokens, not the prompt/user tokens:
```python
# Abstract SFT algorithm (NOT real implementation)
for batch in training_data:
    tokens = tokenize(format_conversation(batch))
    if assistant_only_loss:
        loss_mask = create_assistant_mask(tokens)
    else:
        loss_mask = ones_like(tokens)
    logits = model(tokens)
    # Causal LM objective: targets are the input tokens shifted by one position
    loss = cross_entropy(logits, tokens, mask=loss_mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
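The effect of the loss mask can be made concrete with a toy example in plain Python. The `softmax` and `masked_cross_entropy` helpers below are illustrative, not library code: given per-position logits over a 4-token vocabulary, the mask zeroes out the prompt position so only the assistant positions contribute to the mean negative log-likelihood.

```python
import math

def softmax(xs):
    """Numerically stable softmax over one logit row."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def masked_cross_entropy(logits, targets, mask):
    """Mean negative log-likelihood over positions where mask == 1."""
    total, count = 0.0, 0
    for logit_row, target, m in zip(logits, targets, mask):
        if m:
            probs = softmax(logit_row)
            total += -math.log(probs[target])
            count += 1
    return total / count

logits = [
    [2.0, 0.1, 0.1, 0.1],  # prompt token position
    [0.1, 3.0, 0.1, 0.1],  # assistant token position
    [0.1, 0.1, 3.0, 0.1],  # assistant token position
]
targets = [0, 1, 2]
mask = [0, 1, 1]           # assistant_only_loss: skip the prompt position
loss = masked_cross_entropy(logits, targets, mask)
```

With assistant-only masking, gradients push the model toward reproducing the demonstrated responses without also training it to regenerate user prompts.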
Key training features in the alignment-handbook:
- Sequence packing: Multiple short conversations are packed into a single sequence to maximize GPU utilization
- Gradient checkpointing: Trades compute for memory by recomputing activations during the backward pass
- Chat template formatting: Conversations are formatted using Jinja2 templates before tokenization