Principle: CarperAI trlx Supervised Fine-Tuning
| Knowledge Sources | Details |
|---|---|
| Domains | Supervised_Learning, NLP, Training |
| Last Updated | 2026-02-07 16:00 GMT |
Overview
A training principle for fine-tuning language models on curated text or instruction-following datasets using the standard next-token prediction objective.
Description
Supervised Fine-Tuning (SFT) adapts a pre-trained language model to a specific task or style by training on demonstration data with cross-entropy loss. In the RLHF pipeline, SFT is the first stage that teaches the model the basic format and quality of desired outputs before RL optimization refines it further. SFT can also be used standalone for instruction tuning, domain adaptation, or style transfer.
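A minimal sketch of standalone SFT with trlx. This assumes the `trlx.train` entry point and the `default_sft_config` helper from recent trlx releases; exact helper names and config fields may differ across versions, and the toy samples are made up:

```python
import trlx
from trlx.data.default_configs import default_sft_config

# Hypothetical toy demonstration data; in practice, load a curated dataset.
samples = [
    "Question: What is the capital of France?\nAnswer: Paris.",
    "Question: What is 2 + 2?\nAnswer: 4.",
]

config = default_sft_config()
config.model.model_path = "gpt2"           # base model to fine-tune
config.tokenizer.tokenizer_path = "gpt2"
config.train.total_steps = 100             # short run for illustration

# Plain-text samples: every token in each string is a training target.
trainer = trlx.train(samples=samples, config=config)
```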
trlx supports two SFT data formats: plain text strings (where the entire sequence is used as training target) and prompt-completion pairs (where loss is masked to only compute on completion tokens). The latter uses a DialogStore that tracks which tokens are prompt vs. output via a dialogue tokenization scheme.
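To make the two formats concrete, here is what each looks like as Python data (the sample content itself is invented for illustration):

```python
# Plain text: the full sequence is the training target.
samples_plain = [
    "def add(a, b):\n    return a + b",
    "def sub(a, b):\n    return a - b",
]

# Prompt-completion pairs: loss is computed only on the output halves.
# Each inner list alternates [prompt, output, prompt, output, ...].
samples_dialogue = [
    ["Translate to French: cat", " chat"],
    ["Translate to French: dog", " chien"],
]
```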
Usage
Use SFT when you have high-quality demonstration data and want the model to learn to produce similar outputs. SFT is appropriate for: (1) the first stage of an RLHF pipeline, (2) standalone instruction tuning, (3) domain adaptation with in-domain text. Prefer SFT over RL when you have enough demonstration data and do not need to optimize against a specific reward signal.
Theoretical Basis
SFT minimizes the masked negative log-likelihood of target tokens:

$$\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t=1}^{T} m_t \log p_\theta(x_t \mid x_{<t})$$

where $m_t = 1$ for completion tokens and $m_t = 0$ (label = -100) for prompt tokens in dialogue format; in plain-text mode, $m_t = 1$ for all tokens.
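The label = -100 convention matches PyTorch's `cross_entropy(ignore_index=-100)`, which the Hugging Face causal-LM loss also uses internally. A small sketch of how prompt masking affects the loss (all tensor values are toy data):

```python
import torch
import torch.nn.functional as F

# Toy logits for a 6-token sequence over a 10-token vocabulary.
logits = torch.randn(1, 6, 10)
labels = torch.tensor([[3, 7, 2, 5, 1, 4]])

# Mask the first three positions (the prompt) with -100 so they
# contribute nothing to the loss; only completion tokens are scored.
masked_labels = labels.clone()
masked_labels[:, :3] = -100

# Shift so the logits at position t predict the token at t+1,
# as in standard causal LM training.
shift_logits = logits[:, :-1, :].reshape(-1, 10)
shift_labels = masked_labels[:, 1:].reshape(-1)

loss = F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)
print(loss)
```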
Two data modes in trlx (a masking sketch follows the list):
- Plain text: List[str] → all tokens used as targets via PromptPipeline
- Dialogue pairs: List[List[str]] → alternating [prompt, output, prompt, output, ...] → loss masked on prompt tokens via DialogStore
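A minimal sketch of dialogue-style label masking in the spirit of DialogStore. This reimplements the masking logic for illustration only and is not trlx's actual code; the helper name `build_masked_labels` is hypothetical:

```python
from typing import Dict, List

from transformers import AutoTokenizer


def build_masked_labels(dialogue: List[str], tokenizer) -> Dict[str, List[int]]:
    """Tokenize alternating [prompt, output, ...] segments and build
    labels in which prompt tokens are masked with -100."""
    input_ids: List[int] = []
    labels: List[int] = []
    for i, segment in enumerate(dialogue):
        ids = tokenizer(segment, add_special_tokens=False).input_ids
        input_ids.extend(ids)
        # Even-indexed segments are prompts (masked out of the loss);
        # odd-indexed segments are outputs (kept as targets).
        labels.extend([-100] * len(ids) if i % 2 == 0 else ids)
    return {"input_ids": input_ids, "labels": labels}


tok = AutoTokenizer.from_pretrained("gpt2")
example = build_masked_labels(["Q: 2+2=", " 4"], tok)
print(example["labels"])  # -100s for the prompt, token ids for " 4"
```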