
Implementation:Allenai Open instruct DPO ExperimentConfig

From Leeroopedia


Component Type Dataclass
Source open_instruct/dpo_utils.py (Lines 307-464)
Repository Open Instruct
Dependencies dataclasses, json, transformers
Last Updated 2026-02-07 00:00 GMT

Overview

Concrete tool for defining and validating the complete set of hyperparameters for a DPO training experiment, provided by the Open Instruct library.

Description

ExperimentConfig is a Python dataclass that composes multiple focused configuration dataclasses via multiple inheritance, producing a single unified configuration object for DPO training. It inherits from:

  • TrackingConfig -- Experiment name, seed, run name.
  • ModelConfig -- Model path, attention backend, revision.
  • DPOConfig -- Beta, loss type, gamma/beta ratio, label smoothing, concatenated forward, packing.
  • TrainingConfig -- Epochs, batch size, gradient accumulation, learning rate, scheduler, gradient clipping, optimizer settings.
  • DatasetConfig -- Mixer list, transforms, caching mode.
  • LoRAConfig -- LoRA rank, alpha, dropout.
  • LoggingConfig -- Logging steps, W&B project/entity.
  • HubConfig -- HF Hub push settings, repo ID, revision.
  • CheckpointConfig -- Output directory, checkpoint steps, retention.
  • EvalConfig -- Beaker evaluation tasks, workspace, priority.
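In miniature, this multiple-inheritance composition works as follows. The sketch below is illustrative only: it shows two abbreviated config groups with a small subset of fields, not the real class bodies.

```python
from dataclasses import dataclass

# Each focused config group owns a small set of defaulted fields.
@dataclass
class DPOConfig:
    beta: float = 0.1
    loss_type: str = "dpo"

@dataclass
class TrainingConfig:
    learning_rate: float = 2e-5
    num_epochs: int = 2

# Multiple inheritance merges every field into one flat namespace;
# because all base fields have defaults, dataclass inheritance composes cleanly.
@dataclass
class ExperimentConfig(DPOConfig, TrainingConfig):
    exp_name: str = "dpo_experiment"

# Any inherited field can be overridden by keyword at construction time.
cfg = ExperimentConfig(beta=0.05)
```

The payoff of this pattern is that each group stays independently testable and reusable, while training code sees one object with every hyperparameter as a flat attribute.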

In addition to inherited fields, ExperimentConfig adds experiment-level fields:

  • exp_name: Base experiment name (defaults to the script filename).
  • dataset_name, dataset_mixer, dataset_mix_dir: Legacy dataset specification options.
  • max_seq_length, max_train_samples: Data truncation controls.
  • use_qlora: Quantized LoRA support.
  • zero_stage, offload_optimizer, offload_param: DeepSpeed ZeRO configuration.
  • save_to_hub, gs_bucket_path: Model export destinations.
  • oe_eval_tasks, oe_eval_max_length: Evaluation settings.

The forward_fn property dynamically selects the appropriate forward function (concatenated or separate, with or without packing).
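A minimal sketch of how such a dispatching property can be written is shown below. The stand-in function names and the flag names (`use_concatenated`, `packing`) are assumptions for illustration, not the actual identifiers in dpo_utils.py.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative stand-ins for the real forward implementations.
def concatenated_forward(batch): ...
def separate_forward(batch): ...
def packed_forward(batch): ...

@dataclass
class ForwardSelector:
    # Mirrors the kind of flags the property inspects.
    use_concatenated: bool = True
    packing: bool = False

    @property
    def forward_fn(self) -> Callable:
        # Packing selects its own code path; otherwise choose
        # between the concatenated and separate variants.
        if self.packing:
            return packed_forward
        return concatenated_forward if self.use_concatenated else separate_forward
```

Because the selection is a property rather than a stored field, the choice always reflects the current flag values and never needs to be kept in sync manually.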

The __post_init__ method performs validation:

  • Converts string loss_type to DPOLossType enum.
  • Ensures exactly one dataset source is provided.
  • Ensures Beaker evaluation jobs are only requested when Hub pushing is enabled.
  • Parses JSON-encoded dictionary fields.
  • Validates DeepSpeed ZeRO stage constraints.
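The first, second, and fourth checks above can be sketched as follows. This is a simplified illustration, not the actual implementation: the enum members and error messages are assumptions, and only three fields are modeled.

```python
import json
from dataclasses import dataclass
from enum import Enum

class DPOLossType(Enum):
    # Illustrative subset of loss types.
    DPO = "dpo"
    DPO_NORM = "dpo_norm"

@dataclass
class ConfigSketch:
    loss_type: "str | DPOLossType" = "dpo"
    dataset_name: "str | None" = None
    dataset_mixer: "dict | str | None" = None

    def __post_init__(self):
        # Convert a string loss_type into the enum (raises on unknown values).
        if isinstance(self.loss_type, str):
            self.loss_type = DPOLossType(self.loss_type)
        # JSON-encoded dict fields arrive as strings from the command line.
        if isinstance(self.dataset_mixer, str):
            self.dataset_mixer = json.loads(self.dataset_mixer)
        # Exactly one dataset source must be provided.
        if (self.dataset_name is None) == (self.dataset_mixer is None):
            raise ValueError("Provide exactly one of dataset_name or dataset_mixer.")
```

Doing this work in `__post_init__` means every construction path, whether direct instantiation or CLI parsing, yields an object that is already normalized and validated.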

Usage

Import ExperimentConfig when you need to define, parse, or validate a DPO training configuration. It is typically instantiated from command-line arguments via ArgumentParserPlus.

Code Reference

Source Location

  • Repository: Open Instruct
  • File: open_instruct/dpo_utils.py (Lines 307-464)

Signature

@dataclass
class ExperimentConfig(
    TrackingConfig,
    ModelConfig,
    DPOConfig,
    TrainingConfig,
    DatasetConfig,
    LoRAConfig,
    LoggingConfig,
    HubConfig,
    CheckpointConfig,
    EvalConfig,
):
    """Full arguments class for all fine-tuning jobs."""

    exp_name: str = os.path.basename(__file__)[: -len(".py")]
    dataset_name: str | None = None
    dataset_mixer: dict | None = None
    max_seq_length: int | None = None
    max_train_samples: int | None = None
    use_qlora: bool = False
    zero_stage: int | None = None
    offload_optimizer: bool = False
    offload_param: bool = False
    ...

    @property
    def forward_fn(self) -> Callable: ...
    def __post_init__(self): ...

Import

from open_instruct.dpo_utils import ExperimentConfig

I/O Contract

Inherited Configuration Groups

Config Group -- Key Fields (Defaults)

  • TrackingConfig -- exp_name="dpo_experiment", seed=42, run_name=None
  • DPOConfig -- beta=0.1, loss_type="dpo", gamma_beta_ratio=0.3, label_smoothing=0.0
  • TrainingConfig -- num_epochs=2, per_device_train_batch_size=8, gradient_accumulation_steps=1, learning_rate=2e-5, max_grad_norm=-1
  • DatasetConfig -- mixer_list=Tulu-3 defaults, transform_fn=preference transforms, cache_mode="local"
  • LoRAConfig -- use_lora=False, lora_rank=64, lora_alpha=16, lora_dropout=0.1
  • LoggingConfig -- logging_steps=None, with_tracking=False, wandb_project="open_instruct_internal"
  • HubConfig -- push_to_hub=True, hf_entity=None, hf_repo_id=None
  • CheckpointConfig -- output_dir="output/", checkpointing_steps=500, keep_last_n_checkpoints=3
  • EvalConfig -- try_launch_beaker_eval_jobs=True, oe_eval_tasks=None
  • ModelConfig -- model_name_or_path=None, use_flash_attn=True, model_revision=None

Computed Properties

  • forward_fn (Callable) -- Returns the appropriate forward function based on the concatenated_forward and packing settings.

Usage Examples

from open_instruct.dpo_utils import ExperimentConfig

# Create configuration with custom settings
config = ExperimentConfig(
    model_name_or_path="allenai/tulu-2-7b",
    loss_type="dpo_norm",
    beta=0.1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    num_epochs=1,
    max_grad_norm=1.0,
    use_lora=True,
    lora_rank=16,
    with_tracking=True,
    wandb_project="my_dpo_experiments",
)

# Access inherited fields
print(config.beta)           # 0.1 (from DPOConfig)
print(config.learning_rate)  # 5e-7 (from TrainingConfig)
print(config.use_lora)       # True (from LoRAConfig)

# Access computed property
fwd_fn = config.forward_fn   # Returns concatenated_forward or separate_forward

Alternatively, parse from command line:

from open_instruct.utils import ArgumentParserPlus
from open_instruct.dpo_utils import ExperimentConfig

parser = ArgumentParserPlus((ExperimentConfig,))
args = parser.parse()
