
Environment:Alibaba ROLL Diffusion Video Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Video_Generation, Diffusion_Models
Last Updated: 2026-02-07 19:00 GMT

Overview

A diffusion-model training environment providing DiffSynth, Diffusers, ONNX, and video-processing dependencies for training the Wan2.2 video generation model with Reward Flow Learning.

Description

This environment provides the dependencies needed for the Reward Flow Diffusion pipeline, which trains diffusion models (specifically Wan2.2 video generation) against reward scorers using reinforcement learning. It adds video processing libraries (decord), diffusion frameworks (DiffSynth, Diffusers), ONNX model support, and image processing tools on top of the common ROLL dependencies. The pipeline uses DeepSpeed as its training backend with LoRA parameter-efficient fine-tuning.

Usage

Use this environment when running the Reward Flow Diffusion pipeline for video generation model optimization. This is a specialized environment layered on top of the CUDA GPU and DeepSpeed environments.
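The pipeline pairs DeepSpeed with LoRA for parameter-efficient fine-tuning. As a minimal illustrative sketch (not ROLL's actual implementation; the layer sizes and rank here are hypothetical), a LoRA-wrapped linear layer freezes the pretrained weight and trains only two small low-rank matrices:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: y = Wx + (alpha/r) * B(A x).
    Illustrative only; ROLL's actual LoRA implementation may differ."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weight
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # B = 0, so the adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(64, 64), r=4)
out = layer(torch.randn(2, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
```

With rank 4, only 2 x 4 x 64 = 512 parameters are trainable versus 4,160 in the frozen base layer, which is why LoRA keeps VRAM usage manageable for video-scale models.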

System Requirements

Category   Requirement            Notes
Hardware   NVIDIA GPU with CUDA   High VRAM recommended for video generation
VRAM       40GB+ per GPU          Video generation is memory-intensive
Disk       100GB+ SSD             For video data and model checkpoints

Dependencies

Python Packages

  • `torch` == 2.6.0
  • `deepspeed` == 0.16.4
  • `flash-attn`
  • `diffsynth`
  • `diffusers` == 0.31.0
  • `transformers` == 4.52.4
  • `decord` (video decoding)
  • `pyext`
  • `pycocotools`
  • `scikit-image`
  • `onnx`
  • `onnx2torch`

Credentials

No additional credentials beyond the base CUDA and DeepSpeed environments.

Quick Install

pip install -r requirements_torch260_diffsynth.txt
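To confirm the pins resolved after installation, a quick sanity check can query installed versions. This is a sketch covering only the version-pinned packages from the list above, not an official ROLL utility:

```python
from importlib.metadata import version, PackageNotFoundError

# Version pins from requirements_torch260_diffsynth.txt
PINS = {"torch": "2.6.0", "deepspeed": "0.16.4",
        "diffusers": "0.31.0", "transformers": "4.52.4"}

def check_pins(pins=PINS):
    """Return {package: status} for each pinned dependency."""
    report = {}
    for pkg, expected in pins.items():
        try:
            installed = version(pkg)
            report[pkg] = "ok" if installed.startswith(expected) \
                else f"mismatch:{installed}"
        except PackageNotFoundError:
            report[pkg] = "missing"
    return report

if __name__ == "__main__":
    for pkg, status in check_pins().items():
        print(f"{pkg:<14} {status}")
```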

Code Evidence

Requirements file `requirements_torch260_diffsynth.txt:1-25`:

-r requirements_common.txt
torch==2.6.0.*
deepspeed==0.16.4
flash-attn
diffsynth
transformers==4.52.4
decord
diffusers==0.31.0
onnx
onnx2torch

Log-variance clamping for numerical stability, from `roll/pipeline/diffusion/reward_fl/wan_video_vae.py:1080`:

std = torch.exp(0.5 * log_var.clamp(-30.0, 20.0))
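The clamp bounds matter because exponentiation overflows quickly: without them, an extreme predicted log-variance produces an infinite or zero standard deviation. The same guard can be sketched in plain Python (using `math.exp` in place of the torch call, purely for illustration):

```python
import math

def stable_std(log_var, lo=-30.0, hi=20.0):
    """Clamp the log-variance before exponentiating, mirroring
    torch.exp(0.5 * log_var.clamp(-30.0, 20.0))."""
    return math.exp(0.5 * min(max(log_var, lo), hi))

# math.exp(0.5 * 2000) would raise OverflowError; with the clamp,
# std stays finite and strictly positive at both extremes:
print(stable_std(2000.0))   # exp(10.0)
print(stable_std(-2000.0))  # exp(-15.0), tiny but non-zero
```

The bounds [-30, 20] keep `std` within roughly [3e-7, 2e4], well inside the representable range of float32.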

Common Errors

Error Message                                        Cause                         Solution
`ModuleNotFoundError: No module named 'diffsynth'`   DiffSynth not installed       `pip install diffsynth`
`ModuleNotFoundError: No module named 'decord'`      Video decoder not installed   `pip install decord`
CUDA OOM during video generation                     Insufficient VRAM             Use LoRA with a lower rank; reduce video resolution

Compatibility Notes

  • NVIDIA only: Video generation not tested on AMD ROCm or Ascend NPU.
  • LoRA required: Full fine-tuning not supported; LoRA is mandatory for memory efficiency.
  • Transformers version: Pinned to 4.52.4 (different from other ROLL pipelines).
