Implementation: AllenAI Open Instruct DPO ExperimentConfig
| Component Type | Dataclass |
|---|---|
| Source | open_instruct/dpo_utils.py (Lines 307-464) |
| Repository | Open Instruct |
| Dependencies | dataclasses, json, transformers |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for defining and validating the complete set of hyperparameters for a DPO training experiment, provided by the Open Instruct library.
Description
ExperimentConfig is a Python dataclass that composes multiple focused configuration dataclasses via multiple inheritance, producing a single unified configuration object for DPO training. It inherits from:
- TrackingConfig -- Experiment name, seed, run name.
- ModelConfig -- Model path, attention backend, revision.
- DPOConfig -- Beta, loss type, gamma/beta ratio, label smoothing, concatenated forward, packing.
- TrainingConfig -- Epochs, batch size, gradient accumulation, learning rate, scheduler, gradient clipping, optimizer settings.
- DatasetConfig -- Mixer list, transforms, caching mode.
- LoRAConfig -- LoRA rank, alpha, dropout.
- LoggingConfig -- Logging steps, W&B project/entity.
- HubConfig -- HF Hub push settings, repo ID, revision.
- CheckpointConfig -- Output directory, checkpoint steps, retention.
- EvalConfig -- Beaker evaluation tasks, workspace, priority.
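The composition pattern itself is plain dataclass multiple inheritance: each parent contributes its fields, and the child adds its own. A minimal sketch (these are stand-in classes, not the real Open Instruct definitions):

```python
from dataclasses import dataclass

# Stand-in config groups; the real classes carry many more fields.
@dataclass
class TrackingConfig:
    seed: int = 42

@dataclass
class DPOConfig:
    beta: float = 0.1
    loss_type: str = "dpo"

# The child class inherits every field from every parent, producing
# one unified configuration object.
@dataclass
class ExperimentConfig(TrackingConfig, DPOConfig):
    exp_name: str = "dpo_experiment"

cfg = ExperimentConfig(beta=0.05)
print(cfg.seed, cfg.beta, cfg.exp_name)  # 42 0.05 dpo_experiment
```

Because every field has a default, the dataclass machinery can merge the parents' fields without ordering conflicts.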
In addition to inherited fields, ExperimentConfig adds experiment-level fields:
- exp_name: Base experiment name (defaults to the script filename).
- dataset_name, dataset_mixer, dataset_mix_dir: Legacy dataset specification options.
- max_seq_length, max_train_samples: Data truncation controls.
- use_qlora: Quantized LoRA support.
- zero_stage, offload_optimizer, offload_param: DeepSpeed ZeRO configuration.
- save_to_hub, gs_bucket_path: Model export destinations.
- oe_eval_tasks, oe_eval_max_length: Evaluation settings.
The forward_fn property dynamically selects the appropriate forward function (concatenated or separate, with or without packing).
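The dispatch can be sketched as follows; the function names and exact branch order are illustrative, not the actual implementation in open_instruct/dpo_utils.py:

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative stand-ins for the real forward implementations.
def concatenated_forward(*args): ...
def packed_forward(*args): ...
def separate_forward(*args): ...

@dataclass
class DPOSettingsSketch:
    concatenated_forward: bool = True
    packing: bool = False

    @property
    def forward_fn(self) -> Callable:
        # Packing selects a packed variant of the forward pass;
        # otherwise choose between concatenated and separate forwards.
        if self.packing:
            return packed_forward
        if self.concatenated_forward:
            return concatenated_forward
        return separate_forward
```

Exposing the choice as a property keeps the training loop free of flag checks: it simply calls `config.forward_fn(...)`.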
The __post_init__ method performs validation:
- Converts string loss_type to the DPOLossType enum.
- Ensures exactly one dataset source is provided.
- Validates that Beaker evaluation requires Hub pushing.
- Parses JSON-encoded dictionary fields.
- Validates DeepSpeed ZeRO stage constraints.
Usage
Import ExperimentConfig when you need to define, parse, or validate a DPO training configuration. It is typically instantiated from command-line arguments via ArgumentParserPlus.
Code Reference
Source Location
- Repository: Open Instruct
- File: open_instruct/dpo_utils.py (Lines 307-464)
Signature
```python
@dataclass
class ExperimentConfig(
    TrackingConfig,
    ModelConfig,
    DPOConfig,
    TrainingConfig,
    DatasetConfig,
    LoRAConfig,
    LoggingConfig,
    HubConfig,
    CheckpointConfig,
    EvalConfig,
):
    """Full arguments class for all fine-tuning jobs."""

    exp_name: str = os.path.basename(__file__)[: -len(".py")]
    dataset_name: str | None = None
    dataset_mixer: dict | None = None
    max_seq_length: int | None = None
    max_train_samples: int | None = None
    use_qlora: bool = False
    zero_stage: int | None = None
    offload_optimizer: bool = False
    offload_param: bool = False
    ...

    @property
    def forward_fn(self) -> Callable: ...

    def __post_init__(self): ...
```
Import
```python
from open_instruct.dpo_utils import ExperimentConfig
```
I/O Contract
Inherited Configuration Groups
| Config Group | Key Fields | Defaults |
|---|---|---|
| TrackingConfig | exp_name, seed, run_name | "dpo_experiment", 42, None |
| DPOConfig | beta, loss_type, gamma_beta_ratio, label_smoothing | 0.1, "dpo", 0.3, 0.0 |
| TrainingConfig | num_epochs, per_device_train_batch_size, gradient_accumulation_steps, learning_rate, max_grad_norm | 2, 8, 1, 2e-5, -1 |
| DatasetConfig | mixer_list, transform_fn, cache_mode | Tulu-3 defaults, preference transforms, "local" |
| LoRAConfig | use_lora, lora_rank, lora_alpha, lora_dropout | False, 64, 16, 0.1 |
| LoggingConfig | logging_steps, with_tracking, wandb_project | None, False, "open_instruct_internal" |
| HubConfig | push_to_hub, hf_entity, hf_repo_id | True, None, None |
| CheckpointConfig | output_dir, checkpointing_steps, keep_last_n_checkpoints | "output/", 500, 3 |
| EvalConfig | try_launch_beaker_eval_jobs, oe_eval_tasks | True, None |
| ModelConfig | model_name_or_path, use_flash_attn, model_revision | None, True, None |
Computed Properties
| Property | Type | Description |
|---|---|---|
| forward_fn | Callable | Returns the appropriate forward function based on concatenated_forward and packing settings. |
Usage Examples
```python
from open_instruct.dpo_utils import ExperimentConfig

# Create configuration with custom settings
config = ExperimentConfig(
    model_name_or_path="allenai/tulu-2-7b",
    loss_type="dpo_norm",
    beta=0.1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    num_epochs=1,
    max_grad_norm=1.0,
    use_lora=True,
    lora_rank=16,
    with_tracking=True,
    wandb_project="my_dpo_experiments",
)

# Access inherited fields
print(config.beta)           # 0.1 (from DPOConfig)
print(config.learning_rate)  # 5e-7 (from TrainingConfig)
print(config.use_lora)       # True (from LoRAConfig)

# Access computed property
fwd_fn = config.forward_fn  # Returns concatenated_forward or separate_forward
```
Alternatively, parse from command line:
```python
from open_instruct.utils import ArgumentParserPlus
from open_instruct.dpo_utils import ExperimentConfig

parser = ArgumentParserPlus((ExperimentConfig,))
args = parser.parse()
```