Principle: Alibaba ROLL Supervised Training Loop
| Knowledge Sources | |
|---|---|
| Domains | Supervised_Learning, Distributed_Training |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
A supervised training principle for fine-tuning LLMs on instruction-response data with cross-entropy loss and distributed gradient computation.
Description
The Supervised Training Loop iterates over batched instruction-response data, computing cross-entropy loss on response tokens only (prompt tokens are masked). The training step handles gradient computation, accumulation across micro-batches, optimizer stepping, and learning rate scheduling through the configured distributed training strategy.
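The step described above can be sketched as follows. This is a minimal illustration assuming PyTorch, not the ROLL API itself; `sft_train_step` and the batch layout (`input_ids`, `labels` with prompt positions set to `-100`) are hypothetical names chosen for the example.

```python
import torch
import torch.nn.functional as F

def sft_train_step(model, optimizer, scheduler, micro_batches):
    """One supervised step: accumulate gradients over micro-batches,
    then apply a single optimizer update and advance the LR schedule.
    Hypothetical sketch, not the ROLL implementation."""
    optimizer.zero_grad()
    total_loss = 0.0
    for batch in micro_batches:
        # batch["input_ids"]: (B, T); batch["labels"]: (B, T) with prompt
        # positions set to -100 so cross-entropy ignores them.
        logits = model(batch["input_ids"])       # (B, T, V)
        # Shift so the logits at position t predict the token at t+1.
        shift_logits = logits[:, :-1, :]
        shift_labels = batch["labels"][:, 1:]
        loss = F.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_labels.reshape(-1),
            ignore_index=-100,                   # masks prompt tokens
        )
        # Scale so the accumulated gradient equals the mean over micro-batches.
        (loss / len(micro_batches)).backward()
        total_loss += loss.item()
    optimizer.step()
    scheduler.step()
    return total_loss / len(micro_batches)
```

In a distributed setting, the same structure holds; the strategy (e.g. DeepSpeed or Megatron) wraps the backward pass and optimizer step, but the masking and accumulation logic is unchanged.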
Usage
Use as the core training loop for SFT pipelines. Each step processes one batch through a forward-backward pass followed by a parameter update.
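The overall loop shape might look like the following. This is a self-contained sketch assuming PyTorch; `run_sft_epoch` is a hypothetical name, and a real pipeline would delegate the step to the configured distributed strategy.

```python
import torch
import torch.nn.functional as F

def run_sft_epoch(model, optimizer, dataloader):
    """Minimal SFT epoch: one forward-backward pass and one parameter
    update per batch. Illustrative only; labels use -100 to mask
    prompt positions, as in the masked cross-entropy objective."""
    model.train()
    losses = []
    for batch in dataloader:
        optimizer.zero_grad()
        logits = model(batch["input_ids"])       # (B, T, V)
        loss = F.cross_entropy(
            logits[:, :-1, :].reshape(-1, logits.size(-1)),
            batch["labels"][:, 1:].reshape(-1),
            ignore_index=-100,
        )
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses
```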
Theoretical Basis
The SFT objective minimizes the next-token prediction loss:

$$\mathcal{L}_{\text{SFT}}(\theta) = -\frac{1}{|R|} \sum_{t \in R} \log p_\theta(y_t \mid y_{<t}, x)$$

where R is the set of response token positions (non-masked), x is the instruction prompt, and y_t is the target token at position t.
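Concretely, the loss is the mean negative log-probability over the positions in R only; masked prompt tokens contribute nothing. A small worked example with hypothetical per-token probabilities:

```python
import math

def sft_loss(token_probs, response_mask):
    """Mean negative log-likelihood over response positions only.
    token_probs[t] is the model probability of the target token at t;
    response_mask[t] is True for positions in R, False for prompt tokens."""
    response = [p for p, m in zip(token_probs, response_mask) if m]
    return -sum(math.log(p) for p in response) / len(response)

probs = [0.01, 0.02, 0.5, 0.25]     # hypothetical per-token probabilities
mask  = [False, False, True, True]  # R = {2, 3}: only response tokens count
loss = sft_loss(probs, mask)        # -(ln 0.5 + ln 0.25) / 2 ≈ 1.0397
```

Note that the very unlikely prompt tokens (0.01, 0.02) do not affect the loss at all, which is exactly the effect of the masking described above.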