Principle:OpenRLHF Supervised Fine-Tuning Training

From Leeroopedia


Knowledge Sources
Domains: NLP, Training
Last Updated: 2026-02-07 00:00 GMT

Overview

A training methodology that fine-tunes a pretrained language model on instruction-response demonstrations using supervised cross-entropy loss on response tokens.

Description

Supervised Fine-Tuning (SFT) is typically the first stage of RLHF pipelines. It adapts a pretrained language model to follow instructions by training on curated demonstration data. The model learns to generate appropriate responses to prompts by minimizing the negative log-likelihood of response tokens, with prompt tokens masked from the loss.
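The prompt masking described above is commonly implemented by copying the input token IDs into a label tensor and overwriting prompt positions with an ignore index so they contribute no loss. The following is a minimal sketch of that idea; the helper name `build_sft_labels` and the ignore value of -100 (a widespread convention, e.g. in PyTorch's cross-entropy loss) are illustrative assumptions, not OpenRLHF's exact API.

```python
# Sketch of prompt-token masking for SFT labels.
# Positions set to IGNORE_INDEX are excluded from the cross-entropy loss.
IGNORE_INDEX = -100

def build_sft_labels(input_ids, prompt_len):
    """Copy input_ids as labels, masking the first prompt_len positions
    so that only response tokens contribute to the loss."""
    labels = list(input_ids)
    for t in range(min(prompt_len, len(labels))):
        labels[t] = IGNORE_INDEX
    return labels

# Example: a 3-token prompt followed by a 2-token response.
tokens = [101, 2054, 2003, 3437, 102]
print(build_sft_labels(tokens, prompt_len=3))
# [-100, -100, -100, 3437, 102]
```

In frameworks that shift labels by one position for next-token prediction, the same masking applies after the shift; only the response positions remain as supervision targets.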

SFT provides the initial policy for subsequent alignment stages (reward model training, PPO/DPO). The quality and diversity of the SFT dataset directly impacts the final aligned model's capabilities.

Usage

Use SFT as the starting point for any RLHF pipeline, or as a standalone training method when sufficient high-quality demonstration data is available. It is also used in iterative training loops (rejection sampling, iterative DPO) to retrain on filtered data.

Theoretical Basis

The SFT objective minimizes the token-level negative log-likelihood of response tokens:

\[ \mathcal{L}_{\text{SFT}} = -\frac{1}{|R|} \sum_{t \in R} \log \pi_\theta(x_t \mid x_{<t}) \]

where \(R\) is the set of response token indices and \(\pi_\theta\) is the model's output distribution.

OpenRLHF supports two loss computation modes:

  • Token-level: Average loss over all unmasked tokens across the batch
  • Sequence-level: Average per-sequence loss, then average over the batch
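The two modes above differ whenever sequences in a batch have different numbers of unmasked tokens: token-level averaging weights long responses more heavily, while sequence-level averaging gives each sequence equal weight. A minimal sketch contrasting them, using precomputed per-token negative log-likelihoods and a 0/1 response mask (function names are illustrative, not OpenRLHF's API):

```python
def token_level_loss(nll, mask):
    """Average loss over all unmasked tokens across the batch."""
    total = sum(l * m for row_l, row_m in zip(nll, mask)
                for l, m in zip(row_l, row_m))
    count = sum(m for row in mask for m in row)
    return total / count

def sequence_level_loss(nll, mask):
    """Average each sequence's loss over its own unmasked tokens,
    then average those per-sequence losses over the batch."""
    per_seq = []
    for row_l, row_m in zip(nll, mask):
        s = sum(l * m for l, m in zip(row_l, row_m))
        n = sum(row_m)
        per_seq.append(s / n)
    return sum(per_seq) / len(per_seq)

# Two sequences with different response lengths: the modes disagree
# because token-level weighting favors the longer sequence.
nll = [[0.5, 1.0, 1.5], [2.0, 0.0, 0.0]]
mask = [[1, 1, 1], [1, 0, 0]]
print(token_level_loss(nll, mask))     # (0.5 + 1.0 + 1.5 + 2.0) / 4 = 1.25
print(sequence_level_loss(nll, mask))  # (3.0/3 + 2.0/1) / 2 = 1.5
```

When all sequences have the same number of response tokens, the two modes coincide; the choice matters mainly for datasets with highly variable response lengths.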

Related Pages

Implemented By
