
Principle:Alibaba ROLL Supervised Training Loop

From Leeroopedia


Knowledge Sources
Domains: Supervised_Learning, Distributed_Training
Last Updated: 2026-02-07 20:00 GMT

Overview

A supervised training principle for fine-tuning LLMs on instruction-response data with cross-entropy loss and distributed gradient computation.

Description

The Supervised Training Loop iterates over batched instruction-response data, computing cross-entropy loss on response tokens only (prompt tokens are masked). The training step handles gradient computation, accumulation across micro-batches, optimizer stepping, and learning rate scheduling through the configured distributed training strategy.
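The masking described above can be sketched as a standalone function. This is a minimal illustration, not ROLL's actual API: the function names and the per-position log-probability representation are assumptions for clarity.

```python
import math

def masked_cross_entropy(token_logprobs, response_mask):
    """Mean negative log-probability over response tokens only.

    token_logprobs: per-position log P(target token) for the sequence
    response_mask:  1 for response positions, 0 for masked prompt positions
    (Illustrative names, not ROLL's actual API.)
    """
    # Sum NLL only where the mask marks a response token.
    total = -sum(lp for lp, m in zip(token_logprobs, response_mask) if m)
    count = sum(response_mask)
    return total / max(count, 1)

# One prompt token (masked) followed by two response tokens:
loss = masked_cross_entropy(
    [math.log(0.9), math.log(0.5), math.log(0.25)],
    [0, 1, 1],
)
```

Here the prompt token's log-probability is ignored entirely, so the loss averages only over the two response positions.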

Usage

Use as the core training loop for SFT pipelines. Each step processes a batch through a forward-backward pass and a parameter update.
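The step structure (accumulate gradients across micro-batches, then apply one optimizer update) can be sketched as follows. The function and parameter names are hypothetical, and plain SGD stands in for whatever optimizer and distributed strategy the pipeline configures.

```python
def sft_step(micro_batches, params, grad_fn, lr):
    """One training step: accumulate gradients over micro-batches,
    average them, then apply a plain SGD update.

    grad_fn(batch, params) -> per-parameter gradient list.
    (Illustrative sketch, not ROLL's actual training-step API.)
    """
    accum = [0.0] * len(params)
    for mb in micro_batches:
        # Backward pass per micro-batch; gradients are summed in place.
        for i, g in enumerate(grad_fn(mb, params)):
            accum[i] += g
    n = len(micro_batches)
    # Average over micro-batches, then update parameters.
    return [p - lr * (a / n) for p, a in zip(params, accum)]
```

In a real pipeline the accumulation and update are delegated to the configured distributed strategy (e.g. sharded optimizers), and the learning rate `lr` would come from the scheduler at the current step.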

Theoretical Basis

The SFT objective minimizes the mean next-token negative log-likelihood over response tokens:

\[ \mathcal{L} = -\frac{1}{|R|} \sum_{t \in R} \log P_\theta(y_t \mid y_{<t}, x) \]

where R is the set of response token positions (prompt tokens are masked out).
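The objective can be checked numerically on a hypothetical example: a two-token response in which the model assigns the correct tokens probabilities 0.5 and 0.25 (prompt positions fall outside R).

```python
import math

# Hypothetical per-token probabilities of the correct response tokens.
probs = [0.5, 0.25]

# L = -(1/|R|) * sum of log-probabilities over response positions.
loss = -sum(math.log(p) for p in probs) / len(probs)
# = (ln 2 + ln 4) / 2 = 1.5 * ln 2
```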

Related Pages

Implemented By

Related Heuristics

The following heuristics inform this principle:
