Implementation:Alibaba ROLL DPO Cluster Setup
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Systems, Alignment |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
Concrete two-cluster initialization for DPO training using the Cluster class from the Alibaba ROLL library.
Description
DPO pipeline initialization creates two Cluster instances: actor_train (with the DPO ActorWorker) and reference (with an inference-only strategy). The reference cluster computes log probabilities once per batch; the results are cached and reused when the training loss is computed, so the frozen reference model is not re-evaluated for the same batch.
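To make the role of the cached reference log probabilities concrete, the following is a minimal sketch of the standard DPO loss in plain Python. It is not ROLL's code: the function names, the dictionary cache keyed by batch id, and the default beta of 0.1 are all assumptions chosen for illustration, and ROLL's actual caching mechanism may differ.

```python
import math
from typing import Callable, Dict, List, Tuple

LogProbs = Tuple[List[float], List[float]]  # (chosen, rejected) sequence log-probs


def get_reference_logps(
    batch_id: int,
    compute_fn: Callable[[], LogProbs],
    cache: Dict[int, LogProbs],
) -> LogProbs:
    """Hypothetical helper: run the reference model at most once per batch."""
    if batch_id not in cache:
        cache[batch_id] = compute_fn()
    return cache[batch_id]


def dpo_loss(
    policy_chosen: List[float],
    policy_rejected: List[float],
    ref_chosen: List[float],
    ref_rejected: List[float],
    beta: float = 0.1,  # assumed value, not taken from ROLL's config
) -> float:
    """Mean of -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))."""
    losses = []
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        margin = beta * ((pc - rc) - (pr - rr))
        x = -margin
        # -log(sigmoid(margin)) == softplus(-margin), computed in a stable form
        losses.append(max(x, 0.0) + math.log1p(math.exp(-abs(x))))
    return sum(losses) / len(losses)
```

When the policy and reference assign identical log probabilities, the margin is zero and the loss equals log 2 (about 0.693). Because the reference terms are constants with respect to the policy parameters, they can be computed once and cached.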
Usage
Called during DPOPipeline.__init__ to set up distributed workers.
Code Reference
Source Location
- Repository: Alibaba ROLL
- File: roll/pipeline/dpo/dpo_pipeline.py
- Lines: L152-176
Signature
# Within DPOPipeline.__init__:
actor_train = Cluster(
    name="actor_train",
    worker_cls="roll.pipeline.dpo.actor_worker.ActorWorker",
    resource_manager=resource_manager,
    worker_config=config.actor_train,
)
reference = Cluster(
    name="reference",
    worker_cls="roll.pipeline.dpo.actor_worker.ActorWorker",
    resource_manager=resource_manager,
    worker_config=config.reference,
)
Import
from roll.distributed.executor.cluster import Cluster
from roll.pipeline.dpo.dpo_pipeline import DPOPipeline
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | DPOConfig | Yes | DPO configuration with worker configs |
| resource_manager | ResourceManager | Yes | GPU resource manager |
Outputs
| Name | Type | Description |
|---|---|---|
| actor_train | Cluster | Trainable policy cluster |
| reference | Cluster | Frozen reference model cluster |
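The contract above can be exercised without a GPU cluster by using stand-in classes. The sketch below is an illustration only: StubCluster, StubDPOConfig, and build_clusters are hypothetical names that mirror the keyword arguments shown in the signature, and the config values are placeholders rather than real ROLL worker settings.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class StubCluster:
    """Stand-in for roll's Cluster; it only records its constructor arguments."""
    name: str
    worker_cls: str
    resource_manager: Any
    worker_config: Any


@dataclass
class StubDPOConfig:
    """Stand-in for DPOConfig holding one worker config per cluster."""
    actor_train: Any
    reference: Any


def build_clusters(config: StubDPOConfig, resource_manager: Any) -> Tuple[StubCluster, StubCluster]:
    """Create the trainable and reference clusters from one config (hypothetical helper)."""
    worker_cls = "roll.pipeline.dpo.actor_worker.ActorWorker"
    actor_train = StubCluster(
        name="actor_train",
        worker_cls=worker_cls,
        resource_manager=resource_manager,
        worker_config=config.actor_train,
    )
    reference = StubCluster(
        name="reference",
        worker_cls=worker_cls,
        resource_manager=resource_manager,
        worker_config=config.reference,
    )
    return actor_train, reference


# Placeholder values; real worker configs carry strategy and device settings.
config = StubDPOConfig(actor_train={"role": "train"}, reference={"role": "infer"})
resource_manager = object()
actor_train, reference = build_clusters(config, resource_manager)
```

Both clusters share the same resource manager and worker class in this sketch; only the name and the worker config distinguish the trainable policy from the frozen reference.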
Usage Examples
pipeline = DPOPipeline(pipeline_config=dpo_config)
# actor_train and reference clusters are initialized automatically