Principle:Alibaba ROLL Diffusion Worker Initialization
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Systems, Diffusion_Models |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
A distributed initialization principle for deploying diffusion model training workers with a DeepSpeed-based training strategy.
Description
Diffusion Worker Initialization creates a single actor_train cluster configured with a diffusion-specific DeepSpeed training strategy. Unlike LLM training, which may use Megatron-Core, diffusion training uses a simplified DeepSpeed wrapper that handles the forward/loss interface specific to WanTrainingModule.
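The idea of a thin strategy wrapper around a module with its own forward/loss interface can be sketched in plain Python. This is a minimal illustration, not ROLL's implementation: `ToyTrainingModule` and `SimpleStrategy` are hypothetical names, and the sketch assumes that the module's forward pass returns the loss directly. In a real DeepSpeed setup the manual update would be replaced by the engine's `backward(loss)` and `step()` calls.

```python
class ToyTrainingModule:
    """Hypothetical stand-in for a module whose forward pass returns the loss."""

    def __init__(self, weight=0.0):
        self.weight = weight
        self.grad = 0.0

    def forward(self, batch):
        # batch is a list of (x, y) pairs; the loss is the mean squared error.
        n = len(batch)
        errors = [self.weight * x - y for x, y in batch]
        # Store the analytic gradient of the loss with respect to the weight.
        self.grad = sum(2 * e * x for e, (x, _) in zip(errors, batch)) / n
        return sum(e * e for e in errors) / n


class SimpleStrategy:
    """Hypothetical wrapper that only assumes forward(batch) -> loss."""

    def __init__(self, module, lr=0.1):
        self.module = module
        self.lr = lr

    def train_step(self, batch):
        # The wrapper never inspects model outputs; it consumes the loss only.
        loss = self.module.forward(batch)
        # Plain gradient descent stands in for the optimizer step.
        self.module.weight -= self.lr * self.module.grad
        return loss


strategy = SimpleStrategy(ToyTrainingModule())
batch = [(1.0, 2.0), (2.0, 4.0)]
losses = [strategy.train_step(batch) for _ in range(3)]
```

Because the wrapper depends only on the loss returned by `forward`, the strategy stays unaware of how the module computes that loss internally.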
Usage
Apply this principle when initializing the reward flow pipeline.
Theoretical Basis
Diffusion training requires specialized handling because each training step contains a multi-step denoising loop.
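A toy sketch of what a denoising loop inside one training step can look like is given below. Everything here is an assumption made for illustration: the scalar "latent", the single parameter `w`, and the finite-difference gradient are hypothetical simplifications and do not reflect the actual WanTrainingModule or ROLL code.

```python
def denoise(x, w, num_steps):
    """Toy denoising loop: each step removes a fraction w of the current value."""
    for _ in range(num_steps):
        x = x - w * x
    return x


def step_loss(w, noisy, target, num_steps):
    """Loss is computed only on the final output of the whole loop."""
    final = denoise(noisy, w, num_steps)
    return (final - target) ** 2


def train_step(w, noisy, target, num_steps=4, lr=0.05, eps=1e-5):
    """One training step: run the loop, then update w through the full loop."""
    loss = step_loss(w, noisy, target, num_steps)
    # Central finite difference stands in for backpropagation through the loop.
    grad = (
        step_loss(w + eps, noisy, target, num_steps)
        - step_loss(w - eps, noisy, target, num_steps)
    ) / (2 * eps)
    return w - lr * grad, loss


w = 0.1
history = []
for _ in range(5):
    w, loss = train_step(w, noisy=1.0, target=0.0)
    history.append(loss)
```

The point of the sketch is that the gradient depends on every iteration of the inner loop, which is why a single training step is heavier than a single forward pass.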
Related Pages
Implemented By
Related Heuristics
The following heuristics inform this principle: