Implementation:Alibaba ROLL SFT Cluster Setup

From Leeroopedia


Knowledge Sources
Domains Distributed_Systems, Supervised_Learning
Last Updated 2026-02-07 20:00 GMT

Overview

Concrete single-cluster initialization for SFT training using the Alibaba ROLL library.

Description

The SFT pipeline creates a single sft_train Cluster with SFTWorker instances configured for the specified training strategy.

Usage

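The cluster is created inside SFTPipeline.__init__, so constructing an SFTPipeline is enough to initialize it; no separate setup call is needed.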

Code Reference

Source Location

  • Repository: Alibaba ROLL
  • File: roll/pipeline/sft/sft_pipeline.py
  • Lines: L136-142

Signature

# Within SFTPipeline.__init__:
sft_train = Cluster(
    name="sft_train",
    worker_cls="roll.pipeline.sft.sft_worker.SFTWorker",
    resource_manager=resource_manager,
    worker_config=config.sft_train,
)
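
The worker_cls argument above is a dotted-path string rather than a class object. How ROLL turns that string into a class is not shown on this page; the sketch below only illustrates the general technique of resolving a dotted path with the standard library. The helper name resolve_dotted_path is hypothetical and is not part of ROLL.

```python
import importlib


def resolve_dotted_path(path: str):
    """Import and return the object named by a 'package.module.Name' string.

    Hypothetical helper for illustration; not ROLL's own loader.
    """
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated with a standard-library class so the snippet runs without ROLL installed.
ordered_dict_cls = resolve_dotted_path("collections.OrderedDict")
print(ordered_dict_cls.__name__)
```

With ROLL installed, the same helper could presumably be pointed at "roll.pipeline.sft.sft_worker.SFTWorker", though that is an assumption rather than something this page confirms.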

Import

from roll.distributed.executor.cluster import Cluster
from roll.pipeline.sft.sft_pipeline import SFTPipeline

I/O Contract

Inputs

Name    Type       Required  Description
config  SFTConfig  Yes       SFT configuration with sft_train WorkerConfig

Outputs

Name       Type     Description
sft_train  Cluster  Single training cluster with SFTWorker instances

Usage Examples

# sft_config is an SFTConfig instance (see Inputs) prepared by the caller.
pipeline = SFTPipeline(pipeline_config=sft_config)
# The sft_train cluster is initialized automatically inside __init__.
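
To make the construct-in-__init__ pattern concrete without a ROLL installation, here is a minimal sketch using stand-in classes. ToyWorkerConfig, ToyConfig, ToyCluster, and ToyPipeline are all hypothetical, as are the world_size and strategy fields; they mirror only the shape of the call shown in the Signature section and do not reproduce ROLL's actual behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorkerConfig:
    # Stand-in for a worker config; both field names are hypothetical.
    world_size: int = 2
    strategy: str = "example_strategy"


@dataclass
class ToyConfig:
    # Stand-in for the pipeline config; only the sft_train field is modelled.
    sft_train: ToyWorkerConfig = field(default_factory=ToyWorkerConfig)


class ToyCluster:
    """Stand-in cluster: stores its arguments and creates one label per rank."""

    def __init__(self, name, worker_cls, resource_manager, worker_config):
        self.name = name
        self.worker_cls = worker_cls  # kept as a string; never imported here
        self.resource_manager = resource_manager
        self.worker_config = worker_config
        self.workers = [f"{name}-{rank}" for rank in range(worker_config.world_size)]


class ToyPipeline:
    def __init__(self, pipeline_config, resource_manager=None):
        self.pipeline_config = pipeline_config
        # The single training cluster is built as part of construction.
        self.sft_train = ToyCluster(
            name="sft_train",
            worker_cls="roll.pipeline.sft.sft_worker.SFTWorker",
            resource_manager=resource_manager,
            worker_config=pipeline_config.sft_train,
        )


pipeline = ToyPipeline(pipeline_config=ToyConfig())
print(pipeline.sft_train.name, pipeline.sft_train.workers)
```

The point of the sketch is that the caller only constructs the pipeline; the cluster is available as an attribute afterwards.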
