Principle:Alibaba ROLL SFT Configuration
| Knowledge Sources | |
|---|---|
| Domains | Supervised_Learning, Configuration |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
A configuration principle for setting up supervised fine-tuning (SFT) of LLMs on instruction-response datasets with distributed training support.
Description
SFT Configuration manages the settings for supervised fine-tuning, including the model path, dataset field mappings (instruction/output keys), training hyperparameters (learning rate, batch size, gradient accumulation steps), and the choice of distributed training strategy (Megatron, DeepSpeed, FSDP2).
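A minimal sketch of such a configuration, expressed as a plain Python dict. The field names below are illustrative placeholders, not the actual ROLL configuration schema, and the model path is only an example value.

```python
# Hypothetical SFT configuration sketch; keys are NOT the real ROLL schema.
sft_config = {
    "model_path": "Qwen/Qwen2.5-7B-Instruct",  # base model to fine-tune (example value)
    "instruction_key": "instruction",          # dataset field holding the prompt
    "output_key": "output",                    # dataset field holding the response
    "learning_rate": 2e-5,
    "per_device_batch_size": 4,
    "gradient_accumulation_steps": 8,          # effective per-device batch = 4 * 8
    "strategy": "deepspeed",                   # e.g. megatron, deepspeed, fsdp2
}

def effective_batch_size(cfg, num_devices=1):
    """Global batch size implied by the batch/accumulation settings."""
    return (cfg["per_device_batch_size"]
            * cfg["gradient_accumulation_steps"]
            * num_devices)
```

Gradient accumulation trades memory for throughput: with the values above, one optimizer step on 4 devices corresponds to a global batch of 128 examples.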
Usage
Use when setting up an SFT training pipeline to fine-tune an LLM on instruction-response data.
Theoretical Basis
SFT minimizes the cross-entropy loss over response tokens only:

L_SFT(θ) = −Σ_{t ∈ response} log p_θ(y_t | x, y_{<t})

Prompt tokens are masked with IGNORE_INDEX (-100) so that only response tokens contribute to the loss.
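The masking scheme can be sketched in plain Python. This is a simplified illustration, not ROLL's implementation: it assumes the model's probability for each correct token is already available, whereas real trainers compute the loss from logits (e.g. with an `ignore_index` argument to the cross-entropy function).

```python
import math

IGNORE_INDEX = -100  # label value marking masked (prompt) positions

def build_labels(prompt_ids, response_ids):
    """Mask every prompt position; keep response token ids as labels."""
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)

def masked_cross_entropy(correct_token_probs, labels):
    """Mean negative log-likelihood over unmasked positions only.

    correct_token_probs[t] is the model's probability of the correct
    token at step t (a simplification for illustration).
    """
    losses = [-math.log(p)
              for p, y in zip(correct_token_probs, labels)
              if y != IGNORE_INDEX]
    return sum(losses) / len(losses)
```

Because prompt positions carry the IGNORE_INDEX label, their (possibly poor) predictions never move the gradient; the model is trained only to reproduce the response.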
Related Pages
Implemented By
Related Heuristics
No specific heuristics inform this principle.