Principle: Alibaba ROLL Supervised Training Loop
| Knowledge Sources | |
|---|---|
| Domains | Supervised_Learning, Distributed_Training |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
A supervised training principle for fine-tuning LLMs on instruction-response data with cross-entropy loss and distributed gradient computation.
Description
The Supervised Training Loop iterates over batched instruction-response data, computing cross-entropy loss on response tokens only (prompt tokens are masked). The training step handles gradient computation, accumulation across micro-batches, optimizer stepping, and learning rate scheduling through the configured distributed training strategy.
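The step described above can be sketched as follows. This is a minimal illustration assuming PyTorch, not the ROLL API itself; `sft_train_step` and the batch layout (`input_ids`, `labels` with prompt positions set to `-100`) are hypothetical names chosen for the example.

```python
import torch
import torch.nn.functional as F

def sft_train_step(model, optimizer, scheduler, micro_batches):
    """One supervised step: accumulate gradients over micro-batches,
    then apply a single optimizer update and advance the LR schedule.
    Hypothetical sketch, not the ROLL implementation."""
    optimizer.zero_grad()
    total_loss = 0.0
    for batch in micro_batches:
        # batch["input_ids"]: (B, T); batch["labels"]: (B, T) with prompt
        # positions set to -100 so cross-entropy ignores them.
        logits = model(batch["input_ids"])       # (B, T, V)
        # Shift so the logits at position t predict the token at t+1.
        shift_logits = logits[:, :-1, :]
        shift_labels = batch["labels"][:, 1:]
        loss = F.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_labels.reshape(-1),
            ignore_index=-100,                   # masks prompt tokens
        )
        # Scale so the accumulated gradient equals the mean over micro-batches.
        (loss / len(micro_batches)).backward()
        total_loss += loss.item()
    optimizer.step()
    scheduler.step()
    return total_loss / len(micro_batches)
```

In a distributed setting, the same structure holds; the strategy (e.g. DeepSpeed or Megatron) wraps the backward pass and optimizer step, but the masking and accumulation logic is unchanged.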
Usage
Use as the core training loop for SFT pipelines. Each step processes one batch through a forward-backward pass followed by a parameter update.
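The overall loop shape might look like the following. This is a self-contained sketch assuming PyTorch; `run_sft_epoch` is a hypothetical name, and a real pipeline would delegate the step to the configured distributed strategy.

```python
import torch
import torch.nn.functional as F

def run_sft_epoch(model, optimizer, dataloader):
    """Minimal SFT epoch: one forward-backward pass and one parameter
    update per batch. Illustrative only; labels use -100 to mask
    prompt positions, as in the masked cross-entropy objective."""
    model.train()
    losses = []
    for batch in dataloader:
        optimizer.zero_grad()
        logits = model(batch["input_ids"])       # (B, T, V)
        loss = F.cross_entropy(
            logits[:, :-1, :].reshape(-1, logits.size(-1)),
            batch["labels"][:, 1:].reshape(-1),
            ignore_index=-100,
        )
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses
```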
Theoretical Basis
The SFT objective minimizes the next-token prediction loss:

$$\mathcal{L}_{\text{SFT}}(\theta) = -\frac{1}{|R|} \sum_{t \in R} \log p_\theta(y_t \mid y_{<t}, x)$$

where R is the set of response token positions (non-masked), x is the instruction prompt, and y_t is the target token at position t.
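Concretely, the loss is the mean negative log-probability over the positions in R only; masked prompt tokens contribute nothing. A small worked example with hypothetical per-token probabilities:

```python
import math

def sft_loss(token_probs, response_mask):
    """Mean negative log-likelihood over response positions only.
    token_probs[t] is the model probability of the target token at t;
    response_mask[t] is True for positions in R, False for prompt tokens."""
    response = [p for p, m in zip(token_probs, response_mask) if m]
    return -sum(math.log(p) for p in response) / len(response)

probs = [0.01, 0.02, 0.5, 0.25]     # hypothetical per-token probabilities
mask  = [False, False, True, True]  # R = {2, 3}: only response tokens count
loss = sft_loss(probs, mask)        # -(ln 0.5 + ln 0.25) / 2 ≈ 1.0397
```

Note that the very unlikely prompt tokens (0.01, 0.02) do not affect the loss at all, which is exactly the effect of the masking described above.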