Environment:Alibaba ROLL Diffusion Video Environment

Knowledge Sources	Alibaba ROLL
Domains	Infrastructure, Video_Generation, Diffusion_Models
Last Updated	2026-02-07 19:00 GMT

Overview

Training environment for diffusion models that adds DiffSynth, Diffusers, ONNX, and video processing dependencies for Wan2.2 video generation with Reward Flow Learning.

Description

This environment provides the dependencies needed for the Reward Flow Diffusion pipeline, which trains diffusion models (specifically Wan2.2 video generation) against reward scorers using reinforcement learning. It adds video processing libraries (decord), diffusion frameworks (DiffSynth, Diffusers), ONNX model support, and image processing tools on top of the common ROLL dependencies. The pipeline uses DeepSpeed as its training backend and LoRA for parameter-efficient fine-tuning.

Usage

Use this environment when running the Reward Flow Diffusion pipeline for video generation model optimization. This is a specialized environment layered on top of the CUDA GPU and DeepSpeed environments.

System Requirements

Category	Requirement	Notes
Hardware	NVIDIA GPU with CUDA	High VRAM recommended for video generation
VRAM	40GB+ per GPU	Video generation is memory-intensive
Disk	100GB+ SSD	For video data and model checkpoints
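
The requirements above can be checked before launching a run. The following is a minimal sketch, not part of ROLL: the function names and the decimal-gigabyte convention (1 GB = 1e9 bytes) are assumptions made for illustration, and the GPU query is skipped silently when `torch` is not installed or no CUDA device is visible.

```python
import shutil

# Thresholds taken from the System Requirements table; decimal GB is an assumption.
REQUIRED_DISK_GB = 100
REQUIRED_VRAM_GB = 40


def free_disk_gb(path="."):
    """Return free space on the filesystem holding `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9


def gpu_vram_gb():
    """Return total memory per visible CUDA device in decimal GB.

    Returns an empty list if torch is missing or CUDA is unavailable.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_properties(i).total_memory / 1e9
        for i in range(torch.cuda.device_count())
    ]


def preflight(path=".", disk_gb=REQUIRED_DISK_GB, vram_gb=REQUIRED_VRAM_GB):
    """Collect human-readable warnings; an empty list means all checks passed."""
    warnings = []
    free = free_disk_gb(path)
    if free < disk_gb:
        warnings.append(f"disk: {free:.0f} GB free, want {disk_gb}+ GB")
    vram = gpu_vram_gb()
    if not vram:
        warnings.append("gpu: no CUDA device detected")
    for i, total in enumerate(vram):
        if total < vram_gb:
            warnings.append(f"gpu {i}: {total:.0f} GB VRAM, want {vram_gb}+ GB")
    return warnings


if __name__ == "__main__":
    for line in preflight() or ["all checks passed"]:
        print(line)
```

Pass the directory that will hold video data and checkpoints as `path`, since it may sit on a different filesystem than the working directory.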

Dependencies

Python Packages

`torch` == 2.6.0.*
`deepspeed` == 0.16.4
`flash-attn`
`diffsynth`
`diffusers` == 0.31.0
`transformers` == 4.52.4
`decord` (video decoding)
`pyext`
`pycocotools`
`scikit-image`
`onnx`
`onnx2torch`

Credentials

No credentials are required beyond those of the base CUDA and DeepSpeed environments.

Quick Install

pip install -r requirements_torch260_diffsynth.txt
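
After installation, the pinned versions can be verified from Python. This is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical, and the package lists are copied from this page rather than read from the requirements file. Versions are compared by prefix so that local build suffixes such as `2.6.0+cu124` still match.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins and unpinned packages as listed on this page.
PINNED = {
    "torch": "2.6.0",
    "deepspeed": "0.16.4",
    "diffusers": "0.31.0",
    "transformers": "4.52.4",
}
UNPINNED = ["flash-attn", "diffsynth", "decord", "onnx", "onnx2torch"]


def check_environment(pinned=PINNED, unpinned=UNPINNED):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, want in pinned.items():
        try:
            got = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (want {want})")
            continue
        # Prefix match tolerates suffixes like "+cu124" on the installed build.
        if not got.startswith(want):
            problems.append(f"{name}: found {got}, want {want}")
    for name in unpinned:
        try:
            version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the listed pins"]:
        print(line)
```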

Code Evidence

Requirements file `requirements_torch260_diffsynth.txt:1-25`:

-r requirements_common.txt
torch==2.6.0.*
deepspeed==0.16.4
flash-attn
diffsynth
transformers==4.52.4
decord
diffusers==0.31.0
onnx
onnx2torch

Log variance clipping for numerical stability from `roll/pipeline/diffusion/reward_fl/wan_video_vae.py:1080`:

std = torch.exp(0.5 * log_var.clamp(-30.0, 20.0))
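
To see what the clamp bounds imply, the arithmetic can be reproduced on scalars. The sketch below is a pure-Python stand-in for the tensor expression above, not the ROLL code itself; `clamped_std` is a hypothetical name. With the bounds -30.0 and 20.0, the standard deviation is confined to the range exp(-15) to exp(10), so extreme log-variance values can neither collapse to zero nor overflow.

```python
import math

# Bounds copied from the clamp call in the line above.
LOG_VAR_MIN = -30.0
LOG_VAR_MAX = 20.0


def clamped_std(log_var):
    """Scalar equivalent of exp(0.5 * clamp(log_var, -30, 20))."""
    clamped = min(max(log_var, LOG_VAR_MIN), LOG_VAR_MAX)
    return math.exp(0.5 * clamped)


if __name__ == "__main__":
    for value in (-1000.0, -30.0, 0.0, 20.0, 1000.0):
        print(f"log_var={value:>8}: std={clamped_std(value):.6g}")
```

Without the clamp, a log variance of 1000.0 would require exp(500), which exceeds the range of 32-bit and 64-bit floats.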

Common Errors

Error Message	Cause	Solution
`ModuleNotFoundError: No module named 'diffsynth'`	DiffSynth not installed	`pip install diffsynth`
`ModuleNotFoundError: No module named 'decord'`	Video decoder not installed	`pip install decord`
CUDA OOM during video generation	Insufficient VRAM	Use a lower LoRA rank or reduce the video resolution

Compatibility Notes

NVIDIA only: Video generation has not been tested on AMD ROCm or Ascend NPU.
LoRA required: Full fine-tuning is not supported; LoRA is mandatory for memory efficiency.
Transformers version: Pinned to 4.52.4, which differs from other ROLL pipelines.
