# Implementation: Hydra Main Config (eric-mitchell/direct-preference-optimization)
| Knowledge Sources | |
|---|---|
| Domains | Configuration_Management, Experiment_Management |
| Last Updated | 2026-02-08 02:00 GMT |
## Overview
Wrapper for Hydra's `@hydra.main` decorator and the OmegaConf configuration system, as used in this repository for composable training configuration.
## Description
The repository decorates its main entry point with `@hydra.main`, which automatically loads and composes YAML configuration files; `OmegaConf.resolve` then resolves interpolations and any registered custom resolvers. The config system supports (see the sketch after this list):
- Defaults list: composes loss and model sub-configs into the main config
- CLI overrides: every parameter can be overridden from the command line
- Missing-key validation: `OmegaConf.missing_keys` ensures all required parameters are provided
- Config serialization: the resolved config is saved to the run directory
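
A minimal, self-contained sketch of these mechanics using OmegaConf alone; the resolver name `uppercase` and every key except `exp_name` are illustrative, not taken from the repository:

```python
from omegaconf import OmegaConf

# Register a custom resolver; interpolations like ${uppercase:...} invoke it.
OmegaConf.register_new_resolver("uppercase", lambda s: s.upper())

cfg = OmegaConf.create(
    """
    exp_name: ???                      # required: stays MISSING until provided
    run_dir: runs/${exp_name}          # plain interpolation
    banner: ${uppercase:${exp_name}}   # custom-resolver interpolation
    """
)

cfg.exp_name = "test"                    # stands in for a CLI override exp_name=test
OmegaConf.resolve(cfg)                   # resolve all ${...} interpolations in place
assert not OmegaConf.missing_keys(cfg)   # every ??? key must now be filled
print(OmegaConf.to_yaml(cfg))            # run_dir: runs/test, banner: TEST
```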
## Usage
This is the entry-point mechanism for all training runs: users invoke training from the CLI with parameter overrides, and the defaults list selects which sub-configs are composed in (a hypothetical example follows).
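
A hypothetical `config/config.yaml` defaults list showing how the `loss` and `model` groups compose; the concrete entries here are assumptions, not copied from the repository:

```yaml
# config/config.yaml -- hypothetical defaults list
defaults:
  - _self_
  - model: blank_model   # selected on the CLI, e.g. model=pythia28
  - loss: sft            # selected on the CLI, e.g. loss=dpo
```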
## Code Reference
### Source Location
- Repository: direct-preference-optimization
- Files: `train.py` (L48-73), `config/config.yaml` (L1-97), `config/loss/sft.yaml`, `config/loss/dpo.yaml` (L1-14)
### Signature
```python
@hydra.main(version_base=None, config_path="config", config_name="config")
def main(config: DictConfig):
    """Main entry point for training.

    Validates config, creates/initializes model(s), and kicks off worker process(es).
    """
    OmegaConf.resolve(config)

    missing_keys = OmegaConf.missing_keys(config)
    if missing_keys:
        raise ValueError(f"Got missing keys in config:\n{missing_keys}")
    # ...
```
### Import
```python
import hydra
from omegaconf import OmegaConf, DictConfig
```
## I/O Contract
### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| `config/config.yaml` | YAML | Yes | Main config with defaults list, hyperparameters, wandb settings |
| `config/loss/sft.yaml` | YAML | Conditional | SFT loss config (`name: sft`) |
| `config/loss/dpo.yaml` | YAML | Conditional | DPO loss config (`name`, `beta`, `label_smoothing`, `reference_free`) |
| `config/model/*.yaml` | YAML | Yes | Model config (`name_or_path`, `block_name`, dtypes) |
| CLI arguments | str | No | Override any parameter (e.g., `loss.beta=0.1`, `exp_name=test`) |
### Outputs

| Name | Type | Description |
|---|---|---|
| `config` | DictConfig | Fully resolved configuration object with all hyperparameters |
| `config.yaml` | File | Serialized config saved to `{local_run_dir}/config.yaml` |
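
A minimal sketch of the serialization step, assuming the resolved config and a `local_run_dir` value are in hand (the function name is hypothetical; the repository's actual save code may differ):

```python
import os
from omegaconf import DictConfig, OmegaConf

def save_resolved_config(config: DictConfig, local_run_dir: str) -> None:
    """Write the fully resolved config to {local_run_dir}/config.yaml."""
    os.makedirs(local_run_dir, exist_ok=True)
    with open(os.path.join(local_run_dir, "config.yaml"), "w") as f:
        OmegaConf.save(config, f)
```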
## Usage Examples
### SFT Training Command
```bash
python train.py loss=sft model=pythia28 datasets=[hh] exp_name=pythia28_sft
```
### DPO Training Command
```bash
python train.py loss=dpo loss.beta=0.1 model=pythia28 \
    model.archive=/path/to/sft/LATEST/policy.pt \
    datasets=[hh] exp_name=pythia28_dpo
```
### Key Config Parameters
```yaml
# config/config.yaml -- key parameters
exp_name: ???           # REQUIRED: experiment name
batch_size: 4           # training batch size
lr: 5e-7                # learning rate
optimizer: RMSprop      # optimizer class
warmup_steps: 150       # LR warmup steps
max_length: 512         # max sequence length
max_prompt_length: 256  # max prompt length
n_epochs: 1             # training epochs
eval_every: 20000       # eval frequency (in examples)
trainer: BasicTrainer   # trainer class name
```
```yaml
# config/loss/dpo.yaml -- key parameters
name: dpo
beta: ???               # REQUIRED: DPO temperature
label_smoothing: 0      # conservative DPO noise
reference_free: false   # if true, use a uniform reference
```
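
Once resolved, these values are read via attribute access on the `DictConfig`. The helper below is hypothetical, though the `getattr`-based lookup is the natural way to turn an `optimizer: RMSprop` string into a class:

```python
import torch
from omegaconf import DictConfig

def build_optimizer(config: DictConfig, policy: torch.nn.Module) -> torch.optim.Optimizer:
    """Hypothetical helper: instantiate the optimizer class named in the config."""
    optimizer_cls = getattr(torch.optim, config.optimizer)  # e.g. torch.optim.RMSprop
    return optimizer_cls(policy.parameters(), lr=config.lr)
```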
## Related Pages
- Implements Principle
- Requires Environment