Principle: Hugging Face Alignment Handbook Supervised Fine-Tuning
| Knowledge Sources | |
|---|---|
| Domains | NLP, Deep_Learning, Training |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A training technique that adapts a pretrained language model to follow instructions by training on curated demonstration data with a standard cross-entropy language modeling objective.
Description
Supervised Fine-Tuning (SFT) is the first stage of the RLHF alignment pipeline. It takes a pretrained base model and trains it on high-quality instruction-response pairs to teach the model to follow human instructions. The training uses standard next-token prediction (causal language modeling) on formatted conversation data.
SFT addresses the gap between a pretrained model's capability (predicting next tokens in web text) and the desired behavior (following user instructions helpfully and safely). By training on curated demonstrations, the model learns the expected input-output format and develops instruction-following ability.
In the alignment-handbook, SFT serves as the foundation for subsequent preference optimization stages (DPO, ORPO). The SFT checkpoint becomes the starting point for preference learning.
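Before training, each conversation is rendered into a single string using the tokenizer's chat template. The sketch below illustrates the idea with hand-written ChatML-style markers; the marker strings and the `format_conversation` helper are illustrative assumptions, since the real template is the Jinja2 `chat_template` shipped with each tokenizer.

```python
# Illustrative sketch of chat-template formatting (NOT a real tokenizer
# template). Renders a list of {role, content} messages into one training
# string using ChatML-style markers.
def format_conversation(messages):
    """Render role-tagged messages into a single flat string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    return "".join(parts)

conversation = [
    {"role": "user", "content": "What is SFT?"},
    {"role": "assistant", "content": "Supervised fine-tuning on demonstrations."},
]
text = format_conversation(conversation)
```

The flat string is what gets tokenized; the role markers are also what makes it possible to identify assistant spans for assistant-only loss masking.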
Usage
Use supervised fine-tuning when:
- Adapting a base pretrained model to follow conversational instructions
- Creating the first stage of a multi-stage alignment pipeline (SFT → DPO)
- Fine-tuning on domain-specific instruction data
- The training data consists of demonstration conversations with clear input-output pairs
Theoretical Basis
SFT minimizes the standard cross-entropy loss over the training data:
$$\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})$$

where $x_t$ is the token at position $t$ and $\theta$ are the model parameters. When assistant_only_loss is enabled, the loss is computed only over assistant response tokens, not the prompt/user tokens:
```python
# Abstract SFT algorithm (NOT real implementation)
for batch in training_data:
    tokens = tokenize(format_conversation(batch))
    if assistant_only_loss:
        loss_mask = create_assistant_mask(tokens)
    else:
        loss_mask = ones_like(tokens)
    logits = model(tokens)
    # Causal LM objective: targets are the input tokens shifted by one position
    loss = cross_entropy(logits, tokens, mask=loss_mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
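The effect of the loss mask can be made concrete with a toy example in plain Python. The `softmax` and `masked_cross_entropy` helpers below are illustrative, not library code: given per-position logits over a 4-token vocabulary, the mask zeroes out the prompt position so only the assistant positions contribute to the mean negative log-likelihood.

```python
import math

def softmax(xs):
    """Numerically stable softmax over one logit row."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def masked_cross_entropy(logits, targets, mask):
    """Mean negative log-likelihood over positions where mask == 1."""
    total, count = 0.0, 0
    for logit_row, target, m in zip(logits, targets, mask):
        if m:
            probs = softmax(logit_row)
            total += -math.log(probs[target])
            count += 1
    return total / count

logits = [
    [2.0, 0.1, 0.1, 0.1],  # prompt token position
    [0.1, 3.0, 0.1, 0.1],  # assistant token position
    [0.1, 0.1, 3.0, 0.1],  # assistant token position
]
targets = [0, 1, 2]
mask = [0, 1, 1]           # assistant_only_loss: skip the prompt position
loss = masked_cross_entropy(logits, targets, mask)
```

With assistant-only masking, gradients push the model toward reproducing the demonstrated responses without also training it to regenerate user prompts.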
Key training features in the alignment-handbook:
- Sequence packing: Multiple short conversations are packed into a single sequence to maximize GPU utilization
- Gradient checkpointing: Trades compute for memory by recomputing activations during the backward pass
- Chat template formatting: Conversations are formatted using Jinja2 templates before tokenization