Implementation:Alibaba ROLL DPOConfig
| Knowledge Sources | |
|---|---|
| Domains | Alignment, Configuration |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
Concrete DPO configuration dataclass provided by the Alibaba ROLL library.
Description
The DPOConfig class extends BaseConfig with DPO-specific parameters for preference alignment training.
Usage
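A DPOConfig is normally not built by hand. DPO training pipelines load it from a Hydra-managed YAML file and convert the composed config into the dataclass.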
Code Reference
Source Location
- Repository: Alibaba ROLL
- File: roll/pipeline/dpo/dpo_config.py
- Lines: L12-93
Signature
@dataclass
class DPOConfig(BaseConfig):
    """
    Configuration for DPO training.

    Attributes:
        beta: float = 0.1 - DPO temperature parameter
        ipo: bool = False - Use IPO loss variant
        label_smoothing: float = 0.0 - cDPO label smoothing
        chosen_key: str = "chosen" - Dataset key for chosen responses
        rejected_key: str = "rejected" - Dataset key for rejected responses
        actor_train: WorkerConfig - Trainable policy worker config
        reference: WorkerConfig - Frozen reference model worker config
    """
Import
from roll.pipeline.dpo.dpo_config import DPOConfig
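To make the roles of beta, ipo, and label_smoothing concrete, here is a minimal sketch of how these three settings enter the commonly published DPO, cDPO, and IPO objectives for a single preference pair. It is not taken from the ROLL source: the function name preference_loss and its arguments are hypothetical, and ROLL's actual loss code may differ in details such as batching and reduction.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
    ipo: bool = False,
    label_smoothing: float = 0.0,
) -> float:
    """Illustrative per-pair loss (hypothetical helper, not ROLL's API)."""
    # How much more the policy prefers "chosen" over "rejected"
    # than the frozen reference model does.
    margin = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    if ipo:
        # IPO: squared distance between the margin and the target 1 / (2 * beta).
        return (margin - 1.0 / (2.0 * beta)) ** 2
    # DPO with cDPO smoothing: label_smoothing is the assumed probability
    # that the preference label is flipped; 0.0 recovers plain DPO.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(beta * margin)
        - label_smoothing * log_sigmoid(-beta * margin)
    )
```

In this formulation a larger beta penalizes deviation from the reference model more strongly, and a margin of zero gives a DPO loss of log 2 regardless of the smoothing value.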
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| YAML config file | str | Yes | Hydra-managed YAML configuration |
Outputs
| Name | Type | Description |
|---|---|---|
| DPOConfig instance | DPOConfig | Validated config with actor_train and reference WorkerConfigs |
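The input-to-output mapping above can be illustrated without installing ROLL. The sketch below uses stand-in dataclasses (DemoWorkerConfig and DemoDPOConfig, both hypothetical) that mirror only the fields listed in the signature, plus a hand-written build_config helper in place of the real loading path. The real WorkerConfig has its own schema; the single name field here is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class DemoWorkerConfig:
    # Hypothetical field; the real WorkerConfig defines its own schema.
    name: str = "worker"


@dataclass
class DemoDPOConfig:
    # Defaults copied from the documented DPOConfig attributes.
    beta: float = 0.1
    ipo: bool = False
    label_smoothing: float = 0.0
    chosen_key: str = "chosen"
    rejected_key: str = "rejected"
    actor_train: DemoWorkerConfig = field(default_factory=DemoWorkerConfig)
    reference: DemoWorkerConfig = field(default_factory=DemoWorkerConfig)


def build_config(raw: Dict[str, Any]) -> DemoDPOConfig:
    """Turn a nested dict (as parsed from YAML) into the typed stand-in config."""
    known = {f.name for f in fields(DemoDPOConfig)}
    unknown = set(raw) - known
    if unknown:
        # Fail early on misspelled keys instead of silently ignoring them.
        raise KeyError(f"unexpected keys: {sorted(unknown)}")
    values = dict(raw)
    for key in ("actor_train", "reference"):
        if key in values:
            values[key] = DemoWorkerConfig(**values[key])
    return DemoDPOConfig(**values)


# A dict shaped like a small YAML file; omitted keys fall back to defaults.
raw = {
    "beta": 0.2,
    "actor_train": {"name": "policy"},
    "reference": {"name": "frozen_ref"},
}
demo = build_config(raw)
print(demo.beta, demo.actor_train.name, demo.reference.name)
```

Keys left out of the input keep the documented defaults, while the two worker sections become nested config objects, one for the trainable policy and one for the frozen reference model.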
Usage Examples
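import dacite
from hydra import compose, initialize
from omegaconf import OmegaConf
from roll.pipeline.dpo.dpo_config import DPOConfig

# config_path is resolved relative to the calling file
initialize(config_path="examples/qwen2.5-3B-dpo_megatron")
cfg = compose(config_name="dpo_config")

# Resolve interpolations, then build the typed dataclass from the plain dict
config = dacite.from_dict(data_class=DPOConfig, data=OmegaConf.to_container(cfg, resolve=True))
print(config.beta)  # 0.1 unless the YAML overrides the default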
Related Pages
Implements Principle
Requires Environment
Environment Dependencies
This implementation requires the following environment constraints:
Heuristics Applied
No specific heuristics apply to this implementation.