Implementation:Alibaba ROLL DPO Cluster Setup


Knowledge Sources
Domains: Distributed_Systems, Alignment
Last Updated: 2026-02-07 20:00 GMT

Overview

Concrete two-cluster initialization for DPO training using the Cluster class from the Alibaba ROLL library.

Description

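DPO pipeline initialization creates two Cluster instances: actor_train, which hosts the trainable policy, and reference, which hosts the frozen reference model under an inference-only strategy. Both clusters are built from the same DPO ActorWorker class; they differ in their name and in the worker config they receive (config.actor_train versus config.reference). For each batch, the reference cluster computes log probabilities once, and the results are cached and reused when the training loss is computed.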
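To make the role of the cached reference log probabilities concrete, here is a minimal sketch of the standard sigmoid DPO loss for a single preference pair. It is not taken from the ROLL source: the function name dpo_loss, the beta value of 0.1, and the example numbers are assumptions chosen for illustration, and ROLL's actual loss code may differ.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid DPO loss for one preference pair (illustrative, not ROLL code).

    All four arguments are sequence-level log probabilities. The two ref_*
    values come from the frozen reference model and never change during a
    training step, which is why they can be computed once and cached.
    """
    # How much more the policy favors the chosen response than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow for large |x|.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

# Reference log probabilities computed once for the batch and cached (made-up values).
ref_cache = {"chosen": -12.0, "rejected": -11.5}

# Policy log probabilities from the trainable model (made-up values).
loss = dpo_loss(-10.0, -13.0, ref_cache["chosen"], ref_cache["rejected"])
baseline = dpo_loss(-12.0, -11.5, ref_cache["chosen"], ref_cache["rejected"])
```

When the policy matches the reference exactly, the margin is zero and the loss equals log 2; the loss falls below that value as the policy shifts probability toward the chosen response relative to the reference.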

Usage

This code runs inside DPOPipeline.__init__ to set up the distributed workers, so constructing the pipeline is what creates both clusters.

Code Reference

Source Location

  • Repository: Alibaba ROLL
  • File: roll/pipeline/dpo/dpo_pipeline.py
  • Lines: L152-176

Signature

# Within DPOPipeline.__init__:
actor_train = Cluster(
    name="actor_train",
    worker_cls="roll.pipeline.dpo.actor_worker.ActorWorker",
    resource_manager=resource_manager,
    worker_config=config.actor_train,
)
reference = Cluster(
    name="reference",
    worker_cls="roll.pipeline.dpo.actor_worker.ActorWorker",
    resource_manager=resource_manager,
    worker_config=config.reference,
)

Import

from roll.distributed.executor.cluster import Cluster
from roll.pipeline.dpo.dpo_pipeline import DPOPipeline

I/O Contract

Inputs

Name | Type | Required | Description
config | DPOConfig | Yes | DPO configuration with worker configs
resource_manager | ResourceManager | Yes | GPU resource manager

Outputs

Name | Type | Description
actor_train | Cluster | Trainable policy cluster
reference | Cluster | Frozen reference model cluster

Usage Examples

pipeline = DPOPipeline(pipeline_config=dpo_config)
# actor_train and reference clusters are initialized automatically
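The per-batch interaction between the two clusters can be sketched with stand-in objects. Everything in this block is hypothetical: StubCluster is not the ROLL Cluster class, and the method names compute_log_probs and train_step, the batch keys, and the fake log probabilities are assumptions made so the example runs without ROLL or GPUs. It only illustrates the order of operations described above: one reference pass per batch, then a training step that reuses the cached values.

```python
from dataclasses import dataclass

@dataclass
class StubCluster:
    """Stand-in for a worker cluster. NOT the ROLL Cluster class."""
    name: str
    calls: int = 0

    def compute_log_probs(self, batch):  # hypothetical method name
        self.calls += 1
        # Fake log probability: the negative character length of each response.
        return {
            "ref_chosen": [-float(len(t)) for t in batch["chosen"]],
            "ref_rejected": [-float(len(t)) for t in batch["rejected"]],
        }

    def train_step(self, batch):  # hypothetical method name
        self.calls += 1
        # A real worker would compute the DPO loss and update the policy here.
        # This stub only checks that the cached reference values are present.
        if "ref_chosen" not in batch or "ref_rejected" not in batch:
            raise KeyError("reference log probabilities missing from batch")
        return {"num_pairs": len(batch["chosen"])}

def run_batch(actor_train, reference, batch):
    # 1. The reference cluster scores the batch exactly once.
    ref_out = reference.compute_log_probs(batch)
    # 2. The results are cached alongside the batch.
    cached_batch = {**batch, **ref_out}
    # 3. The trainable cluster consumes the cached values.
    return actor_train.train_step(cached_batch)

actor_train = StubCluster("actor_train")
reference = StubCluster("reference")
metrics = run_batch(
    actor_train,
    reference,
    {"chosen": ["a helpful answer"], "rejected": ["a poor answer"]},
)
```

Keeping the reference pass separate from the training step means the frozen model never needs gradients or optimizer state, which fits the inference-only strategy used for the reference cluster.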
