Implementation:Alibaba ROLL DPOConfig

From Leeroopedia


Knowledge Sources
Domains: Alignment, Configuration
Last Updated: 2026-02-07 20:00 GMT

Overview

Concrete DPO configuration dataclass provided by the Alibaba ROLL library.

Description

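The DPOConfig class extends BaseConfig with the parameters specific to DPO preference-alignment training: the loss settings (beta, ipo, label_smoothing), the dataset keys for chosen and rejected responses (chosen_key, rejected_key), and the worker configs for the trainable policy (actor_train) and the frozen reference model (reference).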
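To make the loss-related settings concrete, here is a minimal sketch of how beta, ipo, and label_smoothing typically enter a DPO-style loss for a single preference pair. It follows the commonly published formulations (sigmoid DPO, conservative DPO with label smoothing, and IPO) and is not taken from the ROLL source; the function name preference_loss and its argument names are assumptions made for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
    ipo: bool = False,
    label_smoothing: float = 0.0,
) -> float:
    """Hypothetical helper: loss for one (chosen, rejected) pair."""
    # How much more the policy prefers "chosen" than the reference does.
    margin = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    if ipo:
        # IPO: squared distance between the margin and the target 1 / (2 * beta).
        return (margin - 1.0 / (2.0 * beta)) ** 2
    # Sigmoid DPO; label_smoothing > 0 gives the conservative (cDPO) variant.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(beta * margin)
        - label_smoothing * log_sigmoid(-beta * margin)
    )
```

In this sketch a larger beta penalises deviation from the reference model more sharply, and label_smoothing models the probability that a preference label is flipped. Whether ROLL computes its loss in exactly this form should be checked against the library source.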

Usage

DPOConfig is loaded from a Hydra-managed YAML file when a DPO training pipeline is launched.

Code Reference

Source Location

  • Repository: Alibaba ROLL
  • File: roll/pipeline/dpo/dpo_config.py
  • Lines: L12-93

Signature

@dataclass
class DPOConfig(BaseConfig):
    """
    Configuration for DPO training.

    Attributes:
        beta: float = 0.1 - DPO temperature parameter
        ipo: bool = False - Use IPO loss variant
        label_smoothing: float = 0.0 - cDPO label smoothing
        chosen_key: str = "chosen" - Dataset key for chosen responses
        rejected_key: str = "rejected" - Dataset key for rejected responses
        actor_train: WorkerConfig - Trainable policy worker config
        reference: WorkerConfig - Frozen reference model worker config
    """

Import

from roll.pipeline.dpo.dpo_config import DPOConfig

I/O Contract

Inputs

Name | Type | Required | Description
YAML config file | str | Yes | Hydra-managed YAML configuration

Outputs

Name | Type | Description
DPOConfig instance | DPOConfig | Validated config with actor_train and reference WorkerConfigs
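The input-to-output mapping described above can be sketched with the standard library alone. The classes below are stand-ins that mirror only the attributes documented on this page; WorkerConfigSketch, its fields name and options, the from_mapping helper, and the validation rules in __post_init__ are all assumptions for illustration, not ROLL's actual API.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class WorkerConfigSketch:
    # Hypothetical stand-in for ROLL's WorkerConfig; real fields differ.
    name: str = ""
    options: Dict[str, Any] = field(default_factory=dict)


@dataclass
class DPOConfigSketch:
    # Defaults copied from the documented DPOConfig attributes.
    beta: float = 0.1
    ipo: bool = False
    label_smoothing: float = 0.0
    chosen_key: str = "chosen"
    rejected_key: str = "rejected"
    actor_train: WorkerConfigSketch = field(default_factory=WorkerConfigSketch)
    reference: WorkerConfigSketch = field(default_factory=WorkerConfigSketch)

    def __post_init__(self) -> None:
        # Assumed sanity checks; the real validation is not documented here.
        if self.beta <= 0:
            raise ValueError("beta must be positive")
        if not 0.0 <= self.label_smoothing < 0.5:
            raise ValueError("label_smoothing must be in [0, 0.5)")


def from_mapping(raw: Dict[str, Any]) -> DPOConfigSketch:
    """Build the sketch config from a plain dict, as parsed from YAML."""
    known = {f.name for f in fields(DPOConfigSketch)}
    unknown = set(raw) - known
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    kwargs = dict(raw)
    # Nested mappings become worker config objects.
    for key in ("actor_train", "reference"):
        if key in kwargs:
            kwargs[key] = WorkerConfigSketch(**kwargs[key])
    return DPOConfigSketch(**kwargs)


config = from_mapping(
    {
        "beta": 0.2,
        "actor_train": {"name": "policy"},
        "reference": {"name": "frozen_ref"},
    }
)
```

Keys missing from the mapping fall back to the dataclass defaults, so config.ipo stays False and config.chosen_key stays "chosen" in this example.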

Usage Examples

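from hydra import compose, initialize
import dacite
from omegaconf import OmegaConf

from roll.pipeline.dpo.dpo_config import DPOConfig

# config_path is resolved relative to the file that calls initialize().
initialize(config_path="examples/qwen2.5-3B-dpo_megatron")
cfg = compose(config_name="dpo_config")
# Resolve interpolations to a plain dict, then build the typed dataclass.
config = dacite.from_dict(data_class=DPOConfig, data=OmegaConf.to_container(cfg, resolve=True))
print(config.beta)  # 0.1 unless the YAML overrides the default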

Related Pages

Implements Principle

Requires Environment

Environment Dependencies

This implementation requires the following environment constraints:

Heuristics Applied

No specific heuristics apply to this implementation.
