Implementation:Hiyouga LLaMA Factory DPO Workflow
| Knowledge Sources | |
|---|---|
| Domains | Preference Optimization, Training Pipeline |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
Orchestrates the end-to-end Direct Preference Optimization (DPO) training and evaluation workflow.
Description
The run_dpo function implements the complete DPO training pipeline. It first loads the tokenizer, the template, and the pairwise dataset; the dataset is requested with stage="rm" because DPO consumes the same chosen/rejected pairs as reward modeling. It then loads the model and, if the finetuning arguments require one, creates a reference model. The trainer is selected by the use_kt flag: KDPOTrainer for KTransformers-accelerated training, CustomDPOTrainer otherwise. When training is enabled, the function runs training, saves the model, logs and saves the metrics, optionally computes effective tokens per second, and optionally plots loss curves for keys including "loss" and "rewards/accuracies". When evaluation is enabled, it handles one special case: if the reference model is the same object as the model itself, the reward-related metrics are removed from the evaluation results before they are logged. Finally, it creates a model card and optionally pushes it to the Hugging Face Hub.
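The reference-model handling and the evaluation special case can be sketched in isolation. This is a minimal, self-contained illustration rather than the code in workflow.py: select_ref_model, drop_reward_metrics, and the create_ref_model callable are hypothetical names, and the rule that the policy model is reused only when no ref_model path is given and training is disabled is an assumption about the selection logic.

```python
from typing import Any, Callable, Optional


def select_ref_model(
    model: Any,
    use_ref_model: bool,
    ref_model_path: Optional[str],
    do_train: bool,
    create_ref_model: Callable[[], Any],
) -> Optional[Any]:
    """Pick the reference model for a DPO run (hypothetical helper)."""
    if not use_ref_model:
        return None  # no reference model requested by the finetuning arguments
    if ref_model_path is None and not do_train:
        return model  # assumption: evaluation-only runs reuse the policy model
    return create_ref_model()  # otherwise build a separate reference model


def drop_reward_metrics(
    metrics: dict[str, float], model: Any, ref_model: Any
) -> dict[str, float]:
    """Remove reward metrics when the reference model is the policy model itself."""
    if model is not ref_model:
        return dict(metrics)
    # With identical policy and reference models the reward metrics carry no
    # information, so every key that mentions "rewards" is filtered out.
    return {key: value for key, value in metrics.items() if "rewards" not in key}
```

The identity check (`is`) mirrors the "same object" wording above: two separately loaded copies of the same weights would still count as distinct models and keep their reward metrics.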
Usage
Use this workflow as the main entry point for DPO preference optimization training. It is called by the training tuner when the training stage is set to "dpo".
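The stage-based dispatch can be pictured as a lookup from stage name to workflow function. The sketch below is only an assumption about the shape of that dispatch, not the tuner's actual code; the WORKFLOWS table, the dispatch function, and both stub functions are hypothetical.

```python
from typing import Callable


def run_dpo_stub(**kwargs: object) -> str:
    # Stand-in for run_dpo; the real function returns None.
    return "ran dpo"


def run_kto_stub(**kwargs: object) -> str:
    # Stand-in for a second workflow, included to show the routing.
    return "ran kto"


# Hypothetical table mapping a training stage to its workflow entry point.
WORKFLOWS: dict[str, Callable[..., str]] = {
    "dpo": run_dpo_stub,
    "kto": run_kto_stub,
}


def dispatch(stage: str, **kwargs: object) -> str:
    """Route a training run to the workflow registered for `stage`."""
    try:
        workflow = WORKFLOWS[stage]
    except KeyError:
        raise ValueError(f"Unknown training stage: {stage}") from None
    return workflow(**kwargs)
```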
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: src/llamafactory/train/dpo/workflow.py
- Lines: 1-119
Signature
def run_dpo(
    model_args: "ModelArguments",
    data_args: "DataArguments",
    training_args: "Seq2SeqTrainingArguments",
    finetuning_args: "FinetuningArguments",
    callbacks: Optional[list["TrainerCallback"]] = None,
) -> None
Import
from llamafactory.train.dpo.workflow import run_dpo
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_args | ModelArguments | Yes | Model loading configuration including path, quantization, and compute settings |
| data_args | DataArguments | Yes | Dataset configuration including data path, template name, and padding options |
| training_args | Seq2SeqTrainingArguments | Yes | HuggingFace training arguments including output_dir, do_train, do_eval, resume_from_checkpoint |
| finetuning_args | FinetuningArguments | Yes | DPO-specific args including use_ref_model, ref_model, pref_beta, plot_loss, include_effective_tokens_per_second, use_kt |
| callbacks | Optional[list[TrainerCallback]] | No | Additional trainer callbacks to register |
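The include_effective_tokens_per_second option adds a throughput figure to the training metrics. One plausible way to compute such a figure for pairwise data is sketched below. The function name, the chosen_input_ids and rejected_input_ids field names, and the per-device normalization by world_size are assumptions made for illustration, not a description of the library's helper.

```python
from typing import Iterable, Mapping, Sequence


def effective_tokens_per_second(
    dataset: Iterable[Mapping[str, Sequence[int]]],
    epochs: float,
    train_runtime: float,
    world_size: int = 1,
) -> float:
    """Estimate per-device throughput over a pairwise preference dataset."""
    # Each example contributes both of its sequences: chosen and rejected.
    tokens_per_epoch = sum(
        len(example["chosen_input_ids"]) + len(example["rejected_input_ids"])
        for example in dataset
    )
    # Total tokens seen during training, divided by wall-clock seconds and devices.
    return tokens_per_epoch * epochs / train_runtime / world_size
```

Counting tokenized sequence lengths rather than padded batch shapes is what makes the figure "effective": padding tokens do not inflate the throughput.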
Outputs
| Name | Type | Description |
|---|---|---|
| None | None | Side effects: saves model, metrics, and trainer state to output_dir; optionally pushes to Hub |
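Because the function returns None, results have to be read back from output_dir. The sketch below assumes the metrics are written through the Hugging Face Trainer.save_metrics method, which stores each split as <split>_results.json; load_saved_metrics is a hypothetical helper, not part of LLaMA Factory.

```python
import json
from pathlib import Path
from typing import Optional


def load_saved_metrics(output_dir: str, split: str = "train") -> Optional[dict]:
    """Read the metrics file for one split, or return None if it is absent."""
    # Assumption: the file name follows the Trainer.save_metrics convention.
    path = Path(output_dir) / f"{split}_results.json"
    if not path.is_file():
        return None  # e.g. do_eval was disabled, so no eval metrics were saved
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Calling load_saved_metrics(training_args.output_dir, "eval") after a run with do_eval enabled would return the evaluation metrics under this assumption.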
Usage Examples
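# Running DPO training from the tuner.
# The four argument objects are assumed to have been parsed already
# (for example by the tuner) before this call.
from llamafactory.train.dpo.workflow import run_dpo

run_dpo(
    model_args=model_args,
    data_args=data_args,
    training_args=training_args,
    finetuning_args=finetuning_args,
)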
Related Pages
- Hiyouga_LLaMA_Factory_KTO_Workflow - Similar workflow for KTO preference training
- Hiyouga_LLaMA_Factory_MCA_Workflow - Megatron-Core Adapter variant with DPO support
- Hiyouga_LLaMA_Factory_Model_Patcher - Model patching applied during model loading