Implementation:Alibaba ROLL RewardFLConfig
| Knowledge Sources | |
|---|---|
| Domains | Diffusion_Models, Configuration |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
RewardFLConfig is the concrete configuration dataclass for reward flow video diffusion training in the Alibaba ROLL library.
Description
RewardFLConfig extends BaseConfig with the settings needed for diffusion model training: the training batch size (train_batch_size), the gradient-clipping norm (max_grad_norm), and an actor_train WorkerConfig whose strategy_name is "diffusion_deepspeed_train".
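To make the shape of this configuration concrete, the following is a minimal, standard-library-only sketch of a dataclass with the three documented fields. `SketchWorkerConfig` and `SketchRewardFLConfig` are hypothetical stand-ins, not the ROLL classes: the real `BaseConfig` and `WorkerConfig` carry many more fields, and where `strategy_name` sits inside the real `WorkerConfig` is an assumption here.

```python
from dataclasses import dataclass, field


@dataclass
class SketchWorkerConfig:
    """Hypothetical stand-in for ROLL's WorkerConfig (the real class has more fields)."""
    strategy_name: str = "diffusion_deepspeed_train"


@dataclass
class SketchRewardFLConfig:
    """Hypothetical stand-in mirroring the three documented attributes."""
    train_batch_size: int = 8    # batch size
    max_grad_norm: float = 1.0   # gradient clipping
    # default_factory gives every config instance its own worker config object
    actor_train: SketchWorkerConfig = field(default_factory=SketchWorkerConfig)


default_cfg = SketchRewardFLConfig()
custom_cfg = SketchRewardFLConfig(train_batch_size=16)
print(default_cfg)
print(custom_cfg.train_batch_size)
```

Using `default_factory` for the nested worker config avoids sharing one mutable object across instances, which is the usual reason nested dataclass fields are declared this way.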
Usage
Use RewardFLConfig when configuring a reward flow pipeline: the values are loaded from a YAML file via Hydra and then converted into the dataclass.
Code Reference
Source Location
- Repository: Alibaba ROLL
- File: roll/pipeline/diffusion/reward_fl/reward_fl_config.py
- Lines: L12-48
Signature
@dataclass
class RewardFLConfig(BaseConfig):
    """
    Configuration for reward flow diffusion training.

    Attributes:
        train_batch_size: int = 8 - batch size
        max_grad_norm: float = 1.0 - gradient clipping
        actor_train: WorkerConfig - worker config with diffusion_deepspeed_train strategy
    """
Import
from roll.pipeline.diffusion.reward_fl.reward_fl_config import RewardFLConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| YAML config file | str | Yes | Hydra-managed YAML with diffusion model paths |
Outputs
| Name | Type | Description |
|---|---|---|
| RewardFLConfig | RewardFLConfig | Config with actor_train WorkerConfig |
Usage Examples
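from hydra import compose, initialize
import dacite
from omegaconf import OmegaConf

from roll.pipeline.diffusion.reward_fl.reward_fl_config import RewardFLConfig

# config_path is resolved relative to the file that calls initialize()
initialize(config_path="examples/wan2.2-14B-reward_fl_ds")
cfg = compose(config_name="reward_fl_config")
# Resolve interpolations, then convert the plain dict into the dataclass
config = dacite.from_dict(data_class=RewardFLConfig, data=OmegaConf.to_container(cfg, resolve=True))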
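The last step of the example above turns a plain nested dict into a nested dataclass. As a rough illustration of what that conversion involves, here is a simplified, standard-library-only sketch. `from_dict_sketch`, `SketchWorkerConfig`, and `SketchRewardFLConfig` are hypothetical names introduced for this illustration. This is not how dacite is implemented, and it skips the type checking and other features dacite provides.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


@dataclass
class SketchWorkerConfig:
    """Hypothetical stand-in for ROLL's WorkerConfig."""
    strategy_name: str = "diffusion_deepspeed_train"


@dataclass
class SketchRewardFLConfig:
    """Hypothetical stand-in for RewardFLConfig."""
    train_batch_size: int = 8
    max_grad_norm: float = 1.0
    actor_train: SketchWorkerConfig = field(default_factory=SketchWorkerConfig)


def from_dict_sketch(data_class: Type[T], data: Dict[str, Any]) -> T:
    """Recursively build data_class from a plain dict (simplified, no type checks)."""
    kwargs = {}
    for f in fields(data_class):
        if f.name not in data:
            continue  # key absent: the dataclass default is used
        value = data[f.name]
        # Nested dicts become nested dataclasses when the field type is a dataclass
        if is_dataclass(f.type) and isinstance(value, dict):
            value = from_dict_sketch(f.type, value)
        kwargs[f.name] = value
    return data_class(**kwargs)


# A plain dict such as OmegaConf.to_container(cfg, resolve=True) might return
raw = {
    "train_batch_size": 4,
    "actor_train": {"strategy_name": "diffusion_deepspeed_train"},
}
config = from_dict_sketch(SketchRewardFLConfig, raw)
print(config)
```

Keys missing from the dict (here `max_grad_norm`) fall back to the dataclass defaults, so a YAML file only needs to list the values it overrides.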
Related Pages
Implements Principle
Requires Environment
Environment Dependencies
This implementation requires the following environment constraints:
- Environment:Alibaba_ROLL_Python_Runtime_Environment
- Environment:Alibaba_ROLL_Diffusion_Video_Environment
Heuristics Applied
No specific heuristics apply to this implementation.