Principle:Alibaba ROLL SFT Configuration
| Knowledge Sources | |
|---|---|
| Domains | Supervised_Learning, Configuration |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
A configuration principle for setting up supervised fine-tuning (SFT) of LLMs on instruction-response datasets with distributed training support.
Description
SFT Configuration manages the settings for supervised fine-tuning, including the model path, dataset field mappings (instruction/output keys), training hyperparameters (learning rate, batch size, gradient accumulation steps), and the choice of distributed training strategy (Megatron, DeepSpeed, FSDP2).
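A minimal sketch of such a configuration, expressed as a plain Python dict. The field names below are illustrative placeholders, not the actual ROLL configuration schema, and the model path is only an example value.

```python
# Hypothetical SFT configuration sketch; keys are NOT the real ROLL schema.
sft_config = {
    "model_path": "Qwen/Qwen2.5-7B-Instruct",  # base model to fine-tune (example value)
    "instruction_key": "instruction",          # dataset field holding the prompt
    "output_key": "output",                    # dataset field holding the response
    "learning_rate": 2e-5,
    "per_device_batch_size": 4,
    "gradient_accumulation_steps": 8,          # effective per-device batch = 4 * 8
    "strategy": "deepspeed",                   # e.g. megatron, deepspeed, fsdp2
}

def effective_batch_size(cfg, num_devices=1):
    """Global batch size implied by the batch/accumulation settings."""
    return (cfg["per_device_batch_size"]
            * cfg["gradient_accumulation_steps"]
            * num_devices)
```

Gradient accumulation trades memory for throughput: with the values above, one optimizer step on 4 devices corresponds to a global batch of 128 examples.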
Usage
Use when setting up an SFT training pipeline to fine-tune an LLM on instruction-response data.
Theoretical Basis
SFT minimizes the cross-entropy loss over response tokens only:

L_SFT(θ) = −Σ_{t ∈ response} log p_θ(y_t | x, y_{<t})

Prompt tokens are masked with IGNORE_INDEX (-100) so that only response tokens contribute to the loss.
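The masking scheme can be sketched in plain Python. This is a simplified illustration, not ROLL's implementation: it assumes the model's probability for each correct token is already available, whereas real trainers compute the loss from logits (e.g. with an `ignore_index` argument to the cross-entropy function).

```python
import math

IGNORE_INDEX = -100  # label value marking masked (prompt) positions

def build_labels(prompt_ids, response_ids):
    """Mask every prompt position; keep response token ids as labels."""
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)

def masked_cross_entropy(correct_token_probs, labels):
    """Mean negative log-likelihood over unmasked positions only.

    correct_token_probs[t] is the model's probability of the correct
    token at step t (a simplification for illustration).
    """
    losses = [-math.log(p)
              for p, y in zip(correct_token_probs, labels)
              if y != IGNORE_INDEX]
    return sum(losses) / len(losses)
```

Because prompt positions carry the IGNORE_INDEX label, their (possibly poor) predictions never move the gradient; the model is trained only to reproduce the response.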
Related Pages
Implemented By
Related Heuristics
No specific heuristics inform this principle.