Implementation:Alibaba ROLL SFTWorker Train Step

Knowledge Sources	Alibaba ROLL
Domains	Supervised_Learning, Distributed_Training
Last Updated	2026-02-07 20:00 GMT

Overview

The concrete supervised fine-tuning (SFT) training step and loss function provided by the Alibaba ROLL library.

Description

The SFTWorker.train_step method runs a single training step through the configured strategy (Megatron, DeepSpeed, or FSDP2). The loss_func method computes cross-entropy loss over response tokens only: prompt positions in the labels tensor are set to -100 so that they are excluded from the loss.
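The masking rule described above can be illustrated with a minimal, dependency-free sketch. This is not the ROLL implementation: the function name masked_cross_entropy, the list-based inputs, and the choice to return a summed loss plus a token count are assumptions made for illustration. It also assumes that any next-token shift between logits and labels has already been applied upstream, which the source does not specify. In PyTorch, the same rule is what torch.nn.functional.cross_entropy applies with ignore_index=-100.

```python
import math
from typing import List, Tuple

IGNORE_INDEX = -100  # label value marking positions excluded from the loss


def masked_cross_entropy(
    logits: List[List[float]], labels: List[int]
) -> Tuple[float, int]:
    """Return (summed loss, number of scored tokens) for one sequence.

    logits[t] holds unnormalized scores over the vocabulary at position t;
    labels[t] is the target token id, or IGNORE_INDEX to skip that position.
    Hypothetical helper for illustration, not part of ROLL.
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, labels):
        if target == IGNORE_INDEX:
            continue  # prompt position: contributes nothing to the loss
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[target]  # equals -log p(target)
        count += 1
    return total, count


# Three positions over a vocabulary of size 3; the first is a prompt token.
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
labels = [IGNORE_INDEX, 1, 1]
loss_sum, n_tokens = masked_cross_entropy(logits, labels)
mean_loss = loss_sum / max(n_tokens, 1)  # guard against fully masked input
```

Returning the sum and the token count separately leaves the choice of normalization (per token, per sequence, or per global batch) to the caller, which matters when batches are sharded across data-parallel ranks.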

Usage

The SFT pipeline calls train_step once for each training batch.

Code Reference

Source Location

Repository: Alibaba ROLL
File: roll/pipeline/sft/sft_worker.py
Lines: L31-73

Signature

class SFTWorker(Worker):
    @register(Dispatch.DP_MP_DISPATCH_FIRST, clear_cache=False)
    def train_step(self, data: DataProto) -> DataProto:
        """
        Single SFT training step.

        Args:
            data: DataProto with input_ids, attention_mask, position_ids, labels

        Returns:
            DataProto with metrics (sft_train/loss@sum, learning_rate)
        """

    def loss_func(
        self,
        data: DataProto,
        output_tensor: torch.Tensor
    ) -> Tuple[torch.Tensor, Dict]:
        """
        Compute SFT cross-entropy loss on response tokens.

        Args:
            data: DataProto with labels
            output_tensor: Model logits

        Returns:
            (loss, metrics_dict)
        """

Import

from roll.pipeline.sft.sft_worker import SFTWorker

I/O Contract

Inputs

Name	Type	Required	Description
data	DataProto	Yes	Batch with input_ids, attention_mask, position_ids, and labels (prompt positions masked as -100)
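As a rough illustration of what "masked" labels look like, the sketch below builds a labels list for one tokenized example. It is not taken from ROLL: the helper name build_sft_labels and the prompt_len argument are hypothetical, and masking padding positions in addition to prompt positions is an assumption (the source only states that prompt positions are set to -100).

```python
from typing import List

IGNORE_INDEX = -100  # value the loss function skips


def build_sft_labels(
    input_ids: List[int], attention_mask: List[int], prompt_len: int
) -> List[int]:
    """Copy response token ids into labels and mask everything else.

    Hypothetical helper: positions before prompt_len are prompt tokens;
    positions with attention_mask == 0 are treated as padding (assumption).
    """
    labels = []
    for pos, (token_id, attended) in enumerate(zip(input_ids, attention_mask)):
        if pos < prompt_len or attended == 0:
            labels.append(IGNORE_INDEX)  # excluded from the loss
        else:
            labels.append(token_id)  # response token: scored by the loss
    return labels


# Three prompt tokens, two response tokens, one padding token.
input_ids = [101, 7, 8, 42, 43, 0]
attention_mask = [1, 1, 1, 1, 1, 0]
labels = build_sft_labels(input_ids, attention_mask, prompt_len=3)
```

Only the two response tokens keep their ids in labels, so only they contribute to the cross-entropy loss.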

Outputs

Name	Type	Description
metrics	DataProto	Returned DataProto carrying the metrics sft_train/loss@sum and learning_rate

Usage Examples

# Called via cluster dispatch in the SFT pipeline:
results = sft_train.execute_all_sync("train_step", batch)

Related Pages

Implements Principle

Principle:Alibaba_ROLL_Supervised_Training_Loop

Requires Environment

Environment Dependencies

This implementation requires the following environment constraints:

Environment:Alibaba_ROLL_CUDA_GPU_Environment

Heuristics Applied

This implementation uses the following heuristics:

Heuristic:Alibaba_ROLL_Gradient_Checkpointing_Recomputation
