# Principle: OpenRLHF Supervised Fine-Tuning (SFT) Training
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training |
| Last Updated | 2026-02-07 00:00 GMT |
## Overview
A training methodology that fine-tunes a pretrained language model on instruction-response demonstrations using supervised cross-entropy loss on response tokens.
## Description
Supervised Fine-Tuning (SFT) is typically the first stage of RLHF pipelines. It adapts a pretrained language model to follow instructions by training on curated demonstration data. The model learns to generate appropriate responses to prompts by minimizing the negative log-likelihood of response tokens, with prompt tokens masked from the loss.
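The prompt masking described above is commonly implemented by copying the input IDs into a label tensor and replacing prompt positions with an ignore index (the `-100` convention used by many frameworks). A minimal, framework-agnostic sketch (the function name and list-based representation are illustrative, not OpenRLHF's actual API):

```python
def build_sft_labels(input_ids, prompt_len, ignore_index=-100):
    """Build SFT labels from input token IDs.

    Prompt positions are set to ignore_index so the cross-entropy
    loss is computed only on response tokens.
    """
    return [ignore_index] * prompt_len + input_ids[prompt_len:]

# Example: 4 prompt tokens followed by 3 response tokens.
ids = [11, 12, 13, 14, 21, 22, 23]
labels = build_sft_labels(ids, prompt_len=4)
# labels -> [-100, -100, -100, -100, 21, 22, 23]
```

The loss function then skips every position whose label equals the ignore index, so gradients flow only through response predictions.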
SFT provides the initial policy for subsequent alignment stages (reward model training, PPO/DPO). The quality and diversity of the SFT dataset directly impacts the final aligned model's capabilities.
## Usage
Use SFT as the starting point for any RLHF pipeline, or as a standalone training method when sufficient high-quality demonstration data is available. SFT is also used in iterative training loops (e.g., rejection sampling, iterative DPO) to retrain the model on filtered data.
## Theoretical Basis
The SFT objective minimizes the token-level negative log-likelihood on response tokens:

$$\mathcal{L}_{\text{SFT}} = -\frac{1}{|R|} \sum_{t \in R} \log \pi_\theta\left(y_t \mid x, y_{<t}\right)$$

where $R$ is the set of response token indices and $\pi_\theta$ is the model's output distribution.
OpenRLHF supports two loss computation modes:
- Token-level: average the loss over all unmasked tokens across the batch
- Sequence-level: average the loss within each sequence, then average those per-sequence means over the batch
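The two modes differ whenever sequences have unequal response lengths: token-level weights every response token equally, while sequence-level weights every sequence equally. A small sketch using plain Python lists of per-token NLL values and a response mask (the function names and data layout are illustrative, not OpenRLHF's internal implementation):

```python
def token_level_loss(nll, mask):
    """Average NLL over all unmasked tokens in the batch."""
    total = sum(l for seq_l, seq_m in zip(nll, mask)
                for l, m in zip(seq_l, seq_m) if m)
    count = sum(m for seq_m in mask for m in seq_m)
    return total / count

def sequence_level_loss(nll, mask):
    """Compute each sequence's mean NLL, then average over sequences."""
    per_seq = []
    for seq_l, seq_m in zip(nll, mask):
        vals = [l for l, m in zip(seq_l, seq_m) if m]
        per_seq.append(sum(vals) / len(vals))
    return sum(per_seq) / len(per_seq)

# Two sequences with different response lengths:
nll  = [[0.5, 1.5], [2.0, 1.0, 3.0]]
mask = [[1, 1],     [1, 1, 1]]
# token-level:    (0.5 + 1.5 + 2.0 + 1.0 + 3.0) / 5 = 1.6
# sequence-level: ((0.5 + 1.5)/2 + (2.0 + 1.0 + 3.0)/3) / 2 = 1.5
```

In the example, the longer sequence has a higher mean loss, so token-level averaging (which gives it more weight) yields a larger value than sequence-level averaging.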