Implementation:Alibaba ROLL SFTWorker Train Step

From Leeroopedia


Knowledge Sources
Domains: Supervised_Learning, Distributed_Training
Last Updated: 2026-02-07 20:00 GMT

Overview

Concrete SFT training step and loss function provided by the Alibaba ROLL library.

Description

SFTWorker.train_step dispatches a single training step through the configured strategy (Megatron, DeepSpeed, or FSDP2). SFTWorker.loss_func computes the cross-entropy loss over response tokens only: prompt positions in the labels tensor are set to -100, so they are excluded from the loss.
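The effect of the -100 mask can be illustrated with a small, framework-free sketch. This is not the ROLL implementation: the function name `masked_cross_entropy`, the `IGNORE_INDEX` constant, and the choice to return a summed loss together with a token count are assumptions made for illustration. The page does not say whether labels are shifted by one position before the loss is computed, so the sketch assumes they are already aligned with the logits.

```python
import math
from typing import Sequence, Tuple

IGNORE_INDEX = -100  # label value marking prompt (non-response) positions


def masked_cross_entropy(
    logits: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> Tuple[float, int]:
    """Sum cross-entropy over positions whose label is not IGNORE_INDEX.

    logits: one row of vocabulary scores per sequence position.
    labels: target token id per position, or IGNORE_INDEX to skip it.
    Returns (summed_loss, number_of_counted_tokens).
    """
    total, counted = 0.0, 0
    for row, target in zip(logits, labels):
        if target == IGNORE_INDEX:
            continue  # prompt token: excluded from the loss
        # Numerically stable log-sum-exp over the vocabulary.
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[target]  # equals -log softmax(row)[target]
        counted += 1
    return total, counted


# Two prompt positions (masked) followed by two response positions.
logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
labels = [IGNORE_INDEX, IGNORE_INDEX, 2, 0]
loss_sum, n_tokens = masked_cross_entropy(logits, labels)
```

Only the last two positions contribute here. In PyTorch, the same skipping behaviour is what `torch.nn.functional.cross_entropy` provides through its `ignore_index` argument, whose default value is -100.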

Usage

Called by the SFT pipeline for each training batch.

Code Reference

Source Location

  • Repository: Alibaba ROLL
  • File: roll/pipeline/sft/sft_worker.py
  • Lines: L31-73

Signature

class SFTWorker(Worker):
    @register(Dispatch.DP_MP_DISPATCH_FIRST, clear_cache=False)
    def train_step(self, data: DataProto) -> DataProto:
        """
        Single SFT training step.

        Args:
            data: DataProto with input_ids, attention_mask, position_ids, labels

        Returns:
            DataProto with metrics (sft_train/loss@sum, learning_rate)
        """

    def loss_func(
        self,
        data: DataProto,
        output_tensor: torch.Tensor
    ) -> Tuple[torch.Tensor, Dict]:
        """
        Compute SFT cross-entropy loss on response tokens.

        Args:
            data: DataProto with labels
            output_tensor: Model logits

        Returns:
            (loss, metrics_dict)
        """

Import

from roll.pipeline.sft.sft_worker import SFTWorker

I/O Contract

Inputs

Name Type Required Description
data DataProto Yes Batch with input_ids, attention_mask, position_ids, and labels (prompt positions masked as -100)
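How the fields of such a batch relate to one another can be sketched for a single example. The helper `build_sft_example`, its plain-list return format, the right-padding scheme, and the `pad_id` default are all hypothetical; the real pipeline packs tensors into a DataProto. Only the field names and the -100 prompt mask come from this page.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value for positions excluded from the loss


def build_sft_example(
    prompt_ids: List[int],
    response_ids: List[int],
    max_len: int,
    pad_id: int = 0,
) -> Dict[str, List[int]]:
    """Build one right-padded SFT example as plain Python lists (illustrative)."""
    ids = (prompt_ids + response_ids)[:max_len]  # truncate to max_len
    n_prompt = min(len(prompt_ids), len(ids))
    n_real = len(ids)
    n_pad = max_len - n_real
    # Prompt positions are masked; response positions keep their token ids.
    labels = [IGNORE_INDEX] * n_prompt + ids[n_prompt:]
    return {
        "input_ids": ids + [pad_id] * n_pad,
        "attention_mask": [1] * n_real + [0] * n_pad,
        "position_ids": list(range(n_real)) + [0] * n_pad,
        # Padding is masked as well (an assumption of this sketch).
        "labels": labels + [IGNORE_INDEX] * n_pad,
    }


example = build_sft_example([11, 12, 13], [21, 22], max_len=6)
```

With a three-token prompt and a two-token response, only the two response positions carry real label ids; the prompt and the single padding position are all -100.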

Outputs

Name Type Description
metrics Dict sft_train/loss@sum and learning_rate, returned inside a DataProto

Usage Examples

# Called via cluster dispatch in the SFT pipeline:
results = sft_train.execute_all_sync("train_step", batch)
