Environment:Alibaba ROLL Diffusion Video Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Video_Generation, Diffusion_Models |
| Last Updated | 2026-02-07 19:00 GMT |
Overview
Diffusion model training environment with DiffSynth, Diffusers, ONNX, and video processing dependencies for Wan2.2 video generation with Reward Flow Learning.
Description
This environment provides the dependencies needed for the Reward Flow Diffusion pipeline, which trains diffusion models (specifically Wan2.2 video generation) against reward scorers using reinforcement learning. On top of the common ROLL dependencies, it adds a video decoding library (decord), diffusion frameworks (DiffSynth, Diffusers), ONNX model support (onnx, onnx2torch), and image processing tools such as scikit-image. The pipeline uses DeepSpeed as its training backend and LoRA for parameter-efficient fine-tuning.
Usage
Use this environment when running the Reward Flow Diffusion pipeline for video generation model optimization. This is a specialized environment layered on top of the CUDA GPU and DeepSpeed environments.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | NVIDIA GPU with CUDA | High VRAM recommended for video generation |
| VRAM | 40 GB+ per GPU | Video generation is memory-intensive |
| Disk | 100 GB+ SSD | For video data and model checkpoints |
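The thresholds in the table above can be checked before launching a run. The following is a minimal sketch, not part of ROLL: the helper names are hypothetical, GB is assumed to mean 10^9 bytes, and GPU memory is only queried when `torch` is importable and CUDA is available.

```python
import shutil

GB = 10**9  # assumption: the table's "GB" is decimal gigabytes


def has_free_disk(path=".", min_gb=100):
    """Return True if the filesystem holding `path` has at least `min_gb` GB free."""
    return shutil.disk_usage(path).free >= min_gb * GB


def gpus_below_vram(totals_bytes, min_gb=40):
    """Return the indices of GPUs whose total memory is below `min_gb` GB."""
    return [i for i, total in enumerate(totals_bytes) if total < min_gb * GB]


def query_gpu_totals():
    """Return total memory in bytes for each visible CUDA device, or [] if none."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_properties(i).total_memory
        for i in range(torch.cuda.device_count())
    ]
```

Calling `gpus_below_vram(query_gpu_totals())` lists the devices that fall short of the 40 GB guideline; an empty list means either every GPU passes or no GPU was visible, so the two cases should be told apart by checking `query_gpu_totals()` first.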
Dependencies
Python Packages
- `torch` == 2.6.0.* (any 2.6.0 build, as pinned in the requirements file)
- `deepspeed` == 0.16.4
- `flash-attn`
- `diffsynth`
- `diffusers` == 0.31.0
- `transformers` == 4.52.4
- `decord` (video decoding)
- `pyext`
- `pycocotools`
- `scikit-image`
- `onnx`
- `onnx2torch`
Credentials
No additional credentials beyond the base CUDA and DeepSpeed environments.
Quick Install
pip install -r requirements_torch260_diffsynth.txt
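After installation, it can be useful to confirm that the pinned packages resolved to the expected versions. The sketch below is one possible check using the standard library's `importlib.metadata`; the function names are hypothetical, and the matching logic is a simplified stand-in for full PEP 440 handling (it ignores a local suffix such as `+cu124` and supports only exact pins and a trailing `.*`).

```python
from importlib import metadata

# Pins copied from requirements_torch260_diffsynth.txt
PINS = {
    "torch": "2.6.0.*",
    "deepspeed": "0.16.4",
    "transformers": "4.52.4",
    "diffusers": "0.31.0",
}


def matches_pin(found, pin):
    """Simplified check: exact match, or prefix match when the pin ends in '.*'."""
    public = found.split("+", 1)[0]  # drop a local version label like '+cu124'
    if pin.endswith(".*"):
        prefix = pin[:-2]
        return public == prefix or public.startswith(prefix + ".")
    return public == pin


def find_mismatches(pins=PINS, get_version=metadata.version):
    """Return {package: (pin, found_version_or_None)} for every unmet pin."""
    problems = {}
    for name, pin in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found is None or not matches_pin(found, pin):
            problems[name] = (pin, found)
    return problems


if __name__ == "__main__":
    for name, (pin, found) in find_mismatches().items():
        print(f"{name}: expected {pin}, found {found}")
```

An empty result from `find_mismatches()` means all four pins are satisfied; unpinned packages such as `diffsynth` or `decord` are not covered by this check.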
Code Evidence
Requirements file `requirements_torch260_diffsynth.txt:1-25`:
-r requirements_common.txt
torch==2.6.0.*
deepspeed==0.16.4
flash-attn
diffsynth
transformers==4.52.4
decord
diffusers==0.31.0
onnx
onnx2torch
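Log-variance clamping for numerical stability, from `roll/pipeline/diffusion/reward_fl/wan_video_vae.py:1080`. The log-variance is clamped to [-30, 20] before exponentiation so that the resulting standard deviation stays finite and nonzero: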
std = torch.exp(0.5 * log_var.clamp(-30.0, 20.0))
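To make the effect of the clamp concrete, the following plain-Python sketch reproduces the same arithmetic on scalars so that it runs without `torch`. The function name is hypothetical; only the bounds -30.0 and 20.0 come from the line above. With these bounds the standard deviation is confined to the range from `exp(-15)` (about 3.06e-7) to `exp(10)` (about 22026.5).

```python
import math

LOG_VAR_MIN, LOG_VAR_MAX = -30.0, 20.0  # bounds taken from the clamp call above


def clamped_std(log_var):
    """Scalar equivalent of torch.exp(0.5 * log_var.clamp(-30.0, 20.0))."""
    clamped = min(max(log_var, LOG_VAR_MIN), LOG_VAR_MAX)
    return math.exp(0.5 * clamped)


# An extreme log-variance no longer produces an extreme standard deviation.
largest = clamped_std(1e6)    # equals exp(10)
smallest = clamped_std(-1e6)  # equals exp(-15)
```

Without the clamp, a log-variance of 1e6 would make `math.exp` raise `OverflowError` in plain Python, and the tensor version would yield `inf`, which then propagates through sampling and the loss.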
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'diffsynth'` | DiffSynth not installed | `pip install diffsynth` |
| `ModuleNotFoundError: No module named 'decord'` | Video decoder not installed | `pip install decord` |
| CUDA OOM during video generation | Insufficient VRAM | Lower the LoRA rank or reduce the video resolution |
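The two `ModuleNotFoundError` cases above can be caught before a run starts. The sketch below is an illustrative preflight check, not part of ROLL: the helper name is hypothetical, and the module list uses import names, which can differ from pip names (for example, `scikit-image` is imported as `skimage` and `flash-attn` as `flash_attn`).

```python
import importlib.util

# Import names (not pip names) for the video/diffusion-specific dependencies.
REQUIRED_MODULES = [
    "diffsynth",
    "diffusers",
    "decord",
    "flash_attn",
    "skimage",
    "onnx",
    "onnx2torch",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

`find_spec` only locates a module without importing it, so this check is fast but does not detect packages that are installed yet fail at import time (for instance, a `flash_attn` build compiled against a different torch version).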
Compatibility Notes
- NVIDIA only: Video generation not tested on AMD ROCm or Ascend NPU.
- LoRA required: Full fine-tuning not supported; LoRA is mandatory for memory efficiency.
- Transformers version: Pinned to 4.52.4 (different from other ROLL pipelines).
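To illustrate why LoRA and a lower rank reduce memory, the sketch below counts trainable parameters under the standard LoRA formulation, where a weight of shape `d_out x d_in` gets two low-rank factors of shapes `r x d_in` and `d_out x r`. The function names and the 4096 x 4096 layer size are illustrative assumptions, not Wan2.2 dimensions.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter: rank * (d_in + d_out)."""
    return rank * (d_in + d_out)


def full_trainable_params(d_in, d_out):
    """Trainable parameters when the full weight matrix is fine-tuned."""
    return d_in * d_out


# Hypothetical 4096 x 4096 projection layer.
full = full_trainable_params(4096, 4096)        # 16_777_216
r16 = lora_trainable_params(4096, 4096, 16)     # 131_072
r4 = lora_trainable_params(4096, 4096, 4)       # 32_768
```

The adapter size scales linearly with the rank, so going from rank 16 to rank 4 cuts trainable parameters, and the gradients and optimizer state kept for them, by a factor of four. Activation memory is not affected by the rank, which is why reducing video resolution is listed as a separate remedy for out-of-memory errors.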