
Implementation:Alibaba ROLL RewardFL ActorWorker Train Step

From Leeroopedia


Knowledge Sources
Domains: Diffusion_Models, Optimization
Last Updated: 2026-02-07 20:00 GMT

Overview

A concrete training-step implementation for the reward-flow (RewardFL) actor worker used in diffusion-model LoRA optimization, provided by the Alibaba ROLL library.

Description

The ActorWorker.train_step and loss_func methods compute the reward-flow loss, which combines a face-identity score with a KL regularization term, and then dispatch the gradient update through the diffusion DeepSpeed strategy.

Usage

Called by the reward flow pipeline for each training batch.

Code Reference

Source Location

  • Repository: Alibaba ROLL
  • File: roll/pipeline/diffusion/reward_fl/actor_worker.py
  • Lines: L15-60

Signature

class ActorWorker(BaseActorWorker):
    @register(dispatch_mode=Dispatch.DP_MP_DISPATCH_FIRST, clear_cache=False)
    def train_step(self, data: DataProto) -> DataProto:
        """
        Training step for reward FL.

        Args:
            data: DataProto with video tensors and prompts

        Returns:
            DataProto with metrics (actor/loss, actor/face_score, actor/kl_loss)
        """

    def loss_func(self, data, loss, face_score, kl_loss) -> Tuple[torch.Tensor, dict]:
        """
        Compute reward FL loss.

        Loss formula: -(face_score - 0.54) / 0.16 * 0.1 + kl_loss
        """
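The loss formula in the docstring above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the library's implementation: the real loss_func operates on torch tensors inside ActorWorker, while this sketch uses plain floats. The constants 0.54, 0.16, and 0.1 come directly from the documented formula, and the metric keys match the I/O contract below.

```python
def reward_fl_loss(face_score: float, kl_loss: float):
    """Sketch of the documented RewardFL loss:
    -(face_score - 0.54) / 0.16 * 0.1 + kl_loss
    """
    # Center the face-identity score at 0.54 and scale by 0.16, negate it so
    # a higher identity score lowers the loss, down-weight the term by 0.1,
    # then add the KL regularizer.
    loss = -(face_score - 0.54) / 0.16 * 0.1 + kl_loss
    metrics = {
        "actor/loss": loss,
        "actor/face_score": face_score,
        "actor/kl_loss": kl_loss,
    }
    return loss, metrics
```

A face_score at the 0.54 baseline with zero KL gives a loss of 0; scores above the baseline push the face term negative, so minimizing the loss raises face identity while the KL term penalizes drift.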

Import

from roll.pipeline.diffusion.reward_fl.actor_worker import ActorWorker

I/O Contract

Inputs

Name   Type        Required   Description
data   DataProto   Yes        Batch with video tensors and prompt strings

Outputs

Name      Type   Description
metrics   Dict   actor/loss, actor/face_score, actor/kl_loss

Usage Examples

results = actor_train.execute_all_sync("train_step", batch)
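execute_all_sync returns one result per worker in the cluster, so a common follow-up is to average the scalar metrics across workers before logging. The helper below is a hypothetical illustration (average_metrics is not part of ROLL), and it uses plain dicts keyed by the metric names from the I/O contract in place of the DataProto objects the real call returns.

```python
def average_metrics(worker_metrics: list) -> dict:
    """Average scalar metrics (e.g. actor/loss) over per-worker result dicts."""
    # Assumes every worker reports the same metric keys.
    keys = worker_metrics[0].keys()
    return {k: sum(m[k] for m in worker_metrics) / len(worker_metrics)
            for k in keys}
```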

