Implementation:Alibaba ROLL WanTrainingModule Forward

Knowledge Sources	Alibaba ROLL
Domains	Diffusion_Models, Reinforcement_Learning
Last Updated	2026-02-07 20:00 GMT

Overview

Concrete video generation and reward scoring forward pass from the WanTrainingModule provided by the Alibaba ROLL library.

Description

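The WanTrainingModule.forward method runs the full generation-and-reward pipeline in a single differentiable pass. The stages are:

- multi-step Euler denoising, with the early steps frozen and the final steps gradient-enabled
- VAE decoding of the latents into frames
- face detection and face-embedding extraction
- cosine-similarity scoring of face identity
- KL-divergence regularization

Because the pass is differentiable, the reward can be backpropagated through the decoding and face-embedding stages into the gradient-enabled denoising steps.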
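The frozen-then-gradient denoising schedule can be sketched as follows. This is a minimal illustration, not ROLL's code. It assumes a flow-matching (rectified-flow) velocity model and a decreasing noise schedule `sigmas`. The names `euler_denoise`, `ToyVelocity` and `num_grad_steps` are hypothetical. Early steps run with autograd disabled, so only the last `num_grad_steps` Euler steps keep a graph for backpropagation.

```python
import torch
import torch.nn as nn


class ToyVelocity(nn.Module):
    """Hypothetical stand-in for the Wan denoiser: predicts a velocity from (x, sigma)."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
        return self.net(x) * sigma


def euler_denoise(model: nn.Module, noise: torch.Tensor, sigmas: torch.Tensor,
                  num_grad_steps: int) -> torch.Tensor:
    """Euler sampler where only the last `num_grad_steps` steps are differentiable.

    Assumes flow matching: `sigmas` decreases from 1.0 to 0.0 and `model(x, sigma)`
    predicts v = dx/dsigma, so each step is x <- x + (sigma_next - sigma) * v.
    """
    num_steps = len(sigmas) - 1
    x = noise
    for i in range(num_steps):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        track_grad = i >= num_steps - num_grad_steps  # frozen prefix, trainable suffix
        with torch.set_grad_enabled(track_grad):
            v = model(x, sigma)
            x = x + (sigma_next - sigma) * v
    return x


torch.manual_seed(0)
model = ToyVelocity(8)
sigmas = torch.linspace(1.0, 0.0, 11)  # 10 Euler steps
noise = torch.randn(2, 8)
latents = euler_denoise(model, noise, sigmas, num_grad_steps=2)
latents.pow(2).mean().backward()       # stand-in for the reward loss
```

Restricting gradients to the final steps means that only those steps store activations for backpropagation. Memory then grows with `num_grad_steps` rather than with the total step count, which is one common reason to freeze the early steps.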

Usage

Called by the ActorWorker's loss_func during each training step.

Code Reference

Source Location

Repository: Alibaba ROLL
File: roll/pipeline/diffusion/modules/wan_module.py
Lines: L232-296

Signature

def forward(
    self,
    data: dict,
    inputs: Optional[dict] = None
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """
    Forward pass with reward computation.

    Args:
        data: dict with "prompt": List[str] and "video": torch.Tensor (B,T,C,H,W)
        inputs: Optional preprocessed inputs

    Returns:
        Tuple of:
        - loss: Total reward-weighted loss
        - face_score: Face identity cosine similarity score
        - kl_loss: KL divergence between LoRA-on and LoRA-off predictions
    """

Import

from roll.pipeline.diffusion.modules.wan_module import WanTrainingModule

I/O Contract

Inputs

Name	Type	Required	Description
data	dict	Yes	Contains "prompt" (List[str]) and "video" (torch.Tensor of shape (B, T, C, H, W))
inputs	dict	No	Optional preprocessed inputs (defaults to None)
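A caller may want to check `data` against this contract before calling `forward`. The helper below is hypothetical. Its check that the batch dimension B equals the number of prompts is an assumption drawn from the one-prompt, B=1 example, not a documented requirement.

```python
import torch


def check_forward_inputs(data: dict) -> None:
    """Raise if `data` does not match the documented forward() input contract (hypothetical)."""
    prompts, video = data["prompt"], data["video"]
    if not isinstance(prompts, list) or not all(isinstance(p, str) for p in prompts):
        raise TypeError('"prompt" must be a List[str]')
    if not isinstance(video, torch.Tensor) or video.dim() != 5:
        raise ValueError('"video" must be a 5-D tensor shaped (B, T, C, H, W)')
    # Assumption: one prompt per video in the batch.
    if video.shape[0] != len(prompts):
        raise ValueError(f"{len(prompts)} prompts but video batch size is {video.shape[0]}")


check_forward_inputs({"prompt": ["a person talking"],
                      "video": torch.zeros(1, 16, 3, 256, 256)})  # passes silently
```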

Outputs

Name	Type	Description
loss	torch.Tensor	Total reward-weighted training loss
face_score	torch.Tensor	Face identity similarity (cosine)
kl_loss	torch.Tensor	KL divergence between LoRA-on and LoRA-off predictions (regularization term)
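The `face_score` output is a cosine similarity between face embeddings. The sketch below assumes one embedding per decoded frame of the generated video and a single reference embedding per sample; the face detector and embedding network are not shown. It also shows one possible way of combining the score with `kl_loss`. The names `face_identity_score`, `combined_loss` and `kl_weight` are hypothetical, and ROLL's actual reward weighting may differ.

```python
import torch
import torch.nn.functional as F


def face_identity_score(gen_emb: torch.Tensor, ref_emb: torch.Tensor) -> torch.Tensor:
    """Mean per-frame cosine similarity between generated and reference face embeddings.

    gen_emb: (B, T, D) embeddings of the faces found in the decoded frames.
    ref_emb: (B, D) embedding of the reference face.
    Returns a (B,) tensor with values in [-1, 1].
    """
    sims = F.cosine_similarity(gen_emb, ref_emb.unsqueeze(1), dim=-1)  # (B, T)
    return sims.mean(dim=1)


def combined_loss(face_score: torch.Tensor, kl_loss: torch.Tensor,
                  kl_weight: float = 0.1) -> torch.Tensor:
    """Hypothetical total loss: maximise the identity reward, penalise KL drift."""
    return -face_score.mean() + kl_weight * kl_loss


ref = torch.tensor([[1.0, 0.0, 0.0]])                      # B=1, D=3
gen = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])   # B=1, T=2: one match, one orthogonal
face_score = face_identity_score(gen, ref)                 # tensor([0.5])
loss = combined_loss(face_score, torch.tensor(0.5))        # -0.5 + 0.1 * 0.5 = -0.45
```

Negating the score turns reward maximization into loss minimization. The KL term keeps the LoRA-adapted model close to its frozen counterpart.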

Usage Examples

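import torch

# `module`: an already-constructed WanTrainingModule (construction not shown)
video_tensor = torch.randn(1, 16, 3, 256, 256)  # (B, T, C, H, W); stand-in for a real face video
loss, face_score, kl_loss = module.forward({
    "prompt": ["a person talking"],
    "video": video_tensor,
})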
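The `kl_loss` value returned here compares the model's predictions with the LoRA adapter enabled and disabled. The sketch below illustrates the idea under stated assumptions and is not ROLL's implementation. The names `LoRALinear` and `lora_kl` are hypothetical. Each prediction is treated as the mean of an isotropic Gaussian with a shared, fixed variance `var`. Under that assumption, KL(N(mu_on, var * I) || N(mu_off, var * I)) reduces to ||mu_on - mu_off||^2 / (2 * var). The LoRA-off pass runs under `torch.no_grad()` so that it acts as a fixed reference.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical frozen linear layer with a toggleable low-rank (LoRA) adapter."""

    def __init__(self, dim: int, rank: int = 4):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.base.requires_grad_(False)              # base weights stay frozen
        self.lora_a = nn.Linear(dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.lora_b.weight)           # adapter starts as a no-op
        self.lora_enabled = True

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        if self.lora_enabled:
            out = out + self.lora_b(self.lora_a(x))
        return out


def lora_kl(layer: LoRALinear, x: torch.Tensor, var: float = 1.0) -> torch.Tensor:
    """Batch-mean KL(N(mu_on, var*I) || N(mu_off, var*I)) = ||mu_on - mu_off||^2 / (2*var)."""
    layer.lora_enabled = False
    with torch.no_grad():
        mu_off = layer(x)                            # reference prediction, LoRA off
    layer.lora_enabled = True
    mu_on = layer(x)                                 # trainable prediction, LoRA on
    sq_dist = (mu_on - mu_off).pow(2).flatten(1).sum(dim=1)
    return (sq_dist / (2.0 * var)).mean()


torch.manual_seed(0)
layer = LoRALinear(8)
x = torch.randn(4, 8)
kl_init = lora_kl(layer, x)                          # zero: lora_b is zero-initialised
nn.init.normal_(layer.lora_b.weight, std=0.1)        # pretend training moved the adapter
kl_after = lora_kl(layer, x)
kl_after.backward()
```

Because `lora_b` is zero-initialized, the adapter starts as a no-op. The KL term is therefore exactly zero until training moves the LoRA weights, and gradients reach only the adapter, never the frozen base layer.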

Related Pages

Implements Principle

Principle:Alibaba_ROLL_Video_Generation_and_Reward

Requires Environment

Environment Dependencies

This implementation requires the following environment constraints:

Heuristics Applied

This implementation uses the following heuristics:

Heuristic:Alibaba_ROLL_Reward_Clipping_Normalization
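This page does not spell out what the Reward_Clipping_Normalization heuristic does. The sketch below assumes, purely for illustration, a common pattern that matches its name: clip raw rewards to a fixed range, then standardize them over the batch. The names `clip_and_normalize`, `low`, `high` and `eps` are hypothetical, and the clip-then-normalize order is also an assumption.

```python
import torch


def clip_and_normalize(rewards: torch.Tensor, low: float = -1.0, high: float = 1.0,
                       eps: float = 1e-6) -> torch.Tensor:
    """Clip rewards to [low, high], then standardise to zero mean, unit std over the batch."""
    clipped = rewards.clamp(min=low, max=high)       # bound the influence of outliers
    centered = clipped - clipped.mean()
    std = centered.pow(2).mean().sqrt()              # population standard deviation
    return centered / (std + eps)


raw = torch.tensor([0.2, 0.4, 3.0, -5.0])           # two outliers outside [-1, 1]
normed = clip_and_normalize(raw)
```

Clipping before standardizing stops a single extreme reward from dominating the batch statistics. The `eps` term avoids division by zero when all rewards in a batch are equal.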
