
Principle:Alibaba ROLL SFT Configuration

From Leeroopedia


Knowledge Sources
Domains: Supervised_Learning, Configuration
Last Updated: 2026-02-07 20:00 GMT

Overview

A configuration principle for setting up supervised fine-tuning of LLMs on instruction-response datasets with distributed training support.

Description

SFT Configuration manages the hyperparameters for supervised fine-tuning, including model path, dataset field mappings (instruction/output keys), training hyperparameters (learning rate, batch size, gradient accumulation), and distributed training strategy selection (Megatron, DeepSpeed, FSDP2).
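The pieces listed above can be sketched as a single configuration object. The field names below are illustrative stand-ins, not ROLL's actual config schema; the model and dataset paths are placeholders.

```python
# Hypothetical SFT configuration sketch; field names are illustrative,
# not ROLL's actual schema.
sft_config = {
    "model_path": "Qwen/Qwen2.5-7B-Instruct",    # pretrained checkpoint to fine-tune
    "dataset": {
        "path": "data/instructions.jsonl",       # placeholder path
        "instruction_key": "instruction",        # field holding the prompt
        "output_key": "output",                  # field holding the reference response
    },
    "training": {
        "learning_rate": 2e-5,
        "per_device_batch_size": 4,
        "gradient_accumulation_steps": 8,        # effective batch = 4 * 8 per device
        "num_epochs": 3,
    },
    "strategy": "deepspeed",                     # or "megatron" / "fsdp2"
}

# Gradient accumulation multiplies the per-device batch into the
# effective batch size seen by each optimizer step.
effective_batch = (sft_config["training"]["per_device_batch_size"]
                   * sft_config["training"]["gradient_accumulation_steps"])
```

The strategy key is where the distributed backend (Megatron, DeepSpeed, or FSDP2) would be selected; everything else stays the same across backends.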

Usage

Use when setting up an SFT training pipeline to fine-tune an LLM on instruction-response data.

Theoretical Basis

SFT minimizes the cross-entropy loss over response tokens only:

L = −Σₜ 𝟙[t ∈ response] · log P_θ(y_t | y_{<t}, x)

Prompt tokens are masked with IGNORE_INDEX (-100) so only response tokens contribute to the loss.
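The masking scheme can be shown with a toy example. This is a minimal sketch in pure Python: `build_labels` and `masked_cross_entropy` are hypothetical helpers, and the per-token probabilities stand in for model outputs.

```python
import math

IGNORE_INDEX = -100  # prompt positions get this label and are skipped by the loss

def build_labels(prompt_ids, response_ids):
    """Mask prompt tokens so only response tokens contribute to the loss."""
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)

def masked_cross_entropy(token_probs, labels):
    """Average negative log-likelihood over unmasked (response) positions.

    token_probs[t] is the probability P_θ(y_t | y_{<t}, x) the model assigns
    to the reference token at position t (a toy stand-in for real logits).
    """
    losses = [-math.log(p)
              for p, y in zip(token_probs, labels)
              if y != IGNORE_INDEX]
    return sum(losses) / len(losses)

# Toy sequence: 2 prompt tokens, 3 response tokens.
labels = build_labels(prompt_ids=[101, 102], response_ids=[7, 8, 9])
probs = [0.1, 0.2, 0.5, 0.25, 0.125]   # the first two (prompt) probs are ignored
loss = masked_cross_entropy(probs, labels)
# Only -(ln 0.5 + ln 0.25 + ln 0.125)/3 = 2·ln 2 enters the loss.
```

Changing the prompt probabilities has no effect on the result, which is exactly the behavior the indicator 𝟙[t ∈ response] encodes.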

Related Pages

Implemented By

Related Heuristics

No specific heuristics inform this principle.
