
Implementation:Allenai Open instruct DPO ExperimentConfig

From Leeroopedia


Component Type Dataclass
Source open_instruct/dpo_utils.py (Lines 307-464)
Repository Open Instruct
Dependencies dataclasses, json, transformers
Last Updated 2026-02-07 00:00 GMT

Overview

Concrete tool for defining and validating the complete set of hyperparameters for a DPO training experiment, provided by the Open Instruct library.

Description

ExperimentConfig is a Python dataclass that composes multiple focused configuration dataclasses via multiple inheritance, producing a single unified configuration object for DPO training. It inherits from:

  • TrackingConfig -- Experiment name, seed, run name.
  • ModelConfig -- Model path, attention backend, revision.
  • DPOConfig -- Beta, loss type, gamma/beta ratio, label smoothing, concatenated forward, packing.
  • TrainingConfig -- Epochs, batch size, gradient accumulation, learning rate, scheduler, gradient clipping, optimizer settings.
  • DatasetConfig -- Mixer list, transforms, caching mode.
  • LoRAConfig -- LoRA rank, alpha, dropout.
  • LoggingConfig -- Logging steps, W&B project/entity.
  • HubConfig -- HF Hub push settings, repo ID, revision.
  • CheckpointConfig -- Output directory, checkpoint steps, retention.
  • EvalConfig -- Beaker evaluation tasks, workspace, priority.
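In miniature, this multiple-inheritance composition works as follows. The sketch below is illustrative only: it shows two abbreviated config groups with a small subset of fields, not the real class bodies.

```python
from dataclasses import dataclass

# Each focused config group owns a small set of defaulted fields.
@dataclass
class DPOConfig:
    beta: float = 0.1
    loss_type: str = "dpo"

@dataclass
class TrainingConfig:
    learning_rate: float = 2e-5
    num_epochs: int = 2

# Multiple inheritance merges every field into one flat namespace;
# because all base fields have defaults, dataclass inheritance composes cleanly.
@dataclass
class ExperimentConfig(DPOConfig, TrainingConfig):
    exp_name: str = "dpo_experiment"

# Any inherited field can be overridden by keyword at construction time.
cfg = ExperimentConfig(beta=0.05)
```

The payoff of this pattern is that each group stays independently testable and reusable, while training code sees one object with every hyperparameter as a flat attribute.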

In addition to inherited fields, ExperimentConfig adds experiment-level fields:

  • exp_name: Base experiment name (defaults to the script filename).
  • dataset_name, dataset_mixer, dataset_mix_dir: Legacy dataset specification options.
  • max_seq_length, max_train_samples: Data truncation controls.
  • use_qlora: Quantized LoRA support.
  • zero_stage, offload_optimizer, offload_param: DeepSpeed ZeRO configuration.
  • save_to_hub, gs_bucket_path: Model export destinations.
  • oe_eval_tasks, oe_eval_max_length: Evaluation settings.

The forward_fn property dynamically selects the appropriate forward function (concatenated or separate, with or without packing).
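A minimal sketch of how such a dispatching property can be written is shown below. The stand-in function names and the flag names (`use_concatenated`, `packing`) are assumptions for illustration, not the actual identifiers in dpo_utils.py.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative stand-ins for the real forward implementations.
def concatenated_forward(batch): ...
def separate_forward(batch): ...
def packed_forward(batch): ...

@dataclass
class ForwardSelector:
    # Mirrors the kind of flags the property inspects.
    use_concatenated: bool = True
    packing: bool = False

    @property
    def forward_fn(self) -> Callable:
        # Packing selects its own code path; otherwise choose
        # between the concatenated and separate variants.
        if self.packing:
            return packed_forward
        return concatenated_forward if self.use_concatenated else separate_forward
```

Because the selection is a property rather than a stored field, the choice always reflects the current flag values and never needs to be kept in sync manually.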

The __post_init__ method performs validation:

  • Converts string loss_type to DPOLossType enum.
  • Ensures exactly one dataset source is provided.
  • Ensures Beaker evaluation jobs are only requested when Hub pushing is enabled.
  • Parses JSON-encoded dictionary fields.
  • Validates DeepSpeed ZeRO stage constraints.
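The first, second, and fourth checks above can be sketched as follows. This is a simplified illustration, not the actual implementation: the enum members and error messages are assumptions, and only three fields are modeled.

```python
import json
from dataclasses import dataclass
from enum import Enum

class DPOLossType(Enum):
    # Illustrative subset of loss types.
    DPO = "dpo"
    DPO_NORM = "dpo_norm"

@dataclass
class ConfigSketch:
    loss_type: "str | DPOLossType" = "dpo"
    dataset_name: "str | None" = None
    dataset_mixer: "dict | str | None" = None

    def __post_init__(self):
        # Convert a string loss_type into the enum (raises on unknown values).
        if isinstance(self.loss_type, str):
            self.loss_type = DPOLossType(self.loss_type)
        # JSON-encoded dict fields arrive as strings from the command line.
        if isinstance(self.dataset_mixer, str):
            self.dataset_mixer = json.loads(self.dataset_mixer)
        # Exactly one dataset source must be provided.
        if (self.dataset_name is None) == (self.dataset_mixer is None):
            raise ValueError("Provide exactly one of dataset_name or dataset_mixer.")
```

Doing this work in `__post_init__` means every construction path, whether direct instantiation or CLI parsing, yields an object that is already normalized and validated.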

Usage

Import ExperimentConfig when you need to define, parse, or validate a DPO training configuration. It is typically instantiated from command-line arguments via ArgumentParserPlus.

Code Reference

Source Location

  • Repository: Open Instruct
  • File: open_instruct/dpo_utils.py (Lines 307-464)

Signature

@dataclass
class ExperimentConfig(
    TrackingConfig,
    ModelConfig,
    DPOConfig,
    TrainingConfig,
    DatasetConfig,
    LoRAConfig,
    LoggingConfig,
    HubConfig,
    CheckpointConfig,
    EvalConfig,
):
    """Full arguments class for all fine-tuning jobs."""

    exp_name: str = os.path.basename(__file__)[: -len(".py")]
    dataset_name: str | None = None
    dataset_mixer: dict | None = None
    max_seq_length: int | None = None
    max_train_samples: int | None = None
    use_qlora: bool = False
    zero_stage: int | None = None
    offload_optimizer: bool = False
    offload_param: bool = False
    ...

    @property
    def forward_fn(self) -> Callable: ...
    def __post_init__(self): ...

Import

from open_instruct.dpo_utils import ExperimentConfig

I/O Contract

Inherited Configuration Groups

Config Group -- Key Fields (Defaults)

  • TrackingConfig -- exp_name="dpo_experiment", seed=42, run_name=None
  • DPOConfig -- beta=0.1, loss_type="dpo", gamma_beta_ratio=0.3, label_smoothing=0.0
  • TrainingConfig -- num_epochs=2, per_device_train_batch_size=8, gradient_accumulation_steps=1, learning_rate=2e-5, max_grad_norm=-1
  • DatasetConfig -- mixer_list=Tulu-3 defaults, transform_fn=preference transforms, cache_mode="local"
  • LoRAConfig -- use_lora=False, lora_rank=64, lora_alpha=16, lora_dropout=0.1
  • LoggingConfig -- logging_steps=None, with_tracking=False, wandb_project="open_instruct_internal"
  • HubConfig -- push_to_hub=True, hf_entity=None, hf_repo_id=None
  • CheckpointConfig -- output_dir="output/", checkpointing_steps=500, keep_last_n_checkpoints=3
  • EvalConfig -- try_launch_beaker_eval_jobs=True, oe_eval_tasks=None
  • ModelConfig -- model_name_or_path=None, use_flash_attn=True, model_revision=None

Computed Properties

  • forward_fn (Callable) -- Returns the appropriate forward function based on the concatenated_forward and packing settings.

Usage Examples

from open_instruct.dpo_utils import ExperimentConfig

# Create configuration with custom settings
config = ExperimentConfig(
    model_name_or_path="allenai/tulu-2-7b",
    loss_type="dpo_norm",
    beta=0.1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    num_epochs=1,
    max_grad_norm=1.0,
    use_lora=True,
    lora_rank=16,
    with_tracking=True,
    wandb_project="my_dpo_experiments",
)

# Access inherited fields
print(config.beta)           # 0.1 (from DPOConfig)
print(config.learning_rate)  # 5e-7 (from TrainingConfig)
print(config.use_lora)       # True (from LoRAConfig)

# Access computed property
fwd_fn = config.forward_fn   # Returns concatenated_forward or separate_forward

Alternatively, parse from command line:

from open_instruct.utils import ArgumentParserPlus
from open_instruct.dpo_utils import ExperimentConfig

parser = ArgumentParserPlus((ExperimentConfig,))
args = parser.parse()
