Implementation:Hiyouga LLaMA Factory DPO Workflow

Knowledge Sources	Hiyouga_LLaMA_Factory
Domains	Preference Optimization, Training Pipeline
Last Updated	2026-02-06 19:00 GMT

Overview

Orchestrates the end-to-end Direct Preference Optimization (DPO) training and evaluation workflow.

Description

The run_dpo function implements the complete DPO training pipeline. It first loads the tokenizer, the template, and the pairwise preference dataset (with stage="rm", the same pairwise format used for reward modeling), then loads the model and, if the finetuning arguments require one, a reference model. Depending on the use_kt flag, it builds either CustomDPOTrainer for standard training or KDPOTrainer for KTransformers-accelerated training. During training it logs and saves metrics, saves the model and trainer state, optionally computes effective tokens per second, and optionally plots loss curves for keys including "loss" and "rewards/accuracies". During evaluation it handles one special case: when the reference model is the same object as the model itself, the reward-related metrics are removed from the results, because rewards measured against the model itself are not meaningful. Finally, it creates a model card and optionally pushes the model to the HuggingFace Hub.
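The reference-model handling and the evaluation special case described above can be sketched in plain Python. This is a minimal illustration, not the code in workflow.py: select_reference_model, drop_reward_metrics, and the create_ref callable are hypothetical names, and the condition under which the model doubles as its own reference (no explicit ref_model and no training) is an assumption.

```python
from typing import Any, Callable, Optional


def select_reference_model(
    model: Any,
    use_ref_model: bool,
    ref_model_path: Optional[str],
    do_train: bool,
    create_ref: Callable[[], Any],
) -> Optional[Any]:
    """Pick the reference model for DPO (hypothetical helper)."""
    if not use_ref_model:
        return None  # reference-free objective: no reference model needed
    if ref_model_path is None and not do_train:
        return model  # assumption: evaluation-only runs reuse the model itself
    return create_ref()  # otherwise build a separate reference model


def drop_reward_metrics(metrics: dict[str, float], model: Any, ref_model: Any) -> dict[str, float]:
    """Remove reward metrics when the reference is the very same object as the model."""
    if ref_model is not model:
        return dict(metrics)
    # Rewards compare the model against the reference, so they carry no
    # information when both are the same object.
    return {key: value for key, value in metrics.items() if "rewards" not in key}


policy = object()
ref = select_reference_model(policy, True, None, False, create_ref=object)
eval_metrics = {"eval_loss": 0.69, "eval_rewards/accuracies": 0.0}
print(drop_reward_metrics(eval_metrics, policy, ref))
```

The identity check (is) rather than an equality check mirrors the "same object" wording: a separately loaded copy of the same weights would still count as a distinct reference model.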

Usage

Use this workflow as the main entry point for DPO preference optimization training. It is called by the training tuner when the training stage is set to "dpo".
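Stage-based dispatch of this kind can be illustrated with a small registry. The sketch below is an assumption about the general pattern, not the tuner's actual code: WORKFLOWS, dispatch, and the two stub functions are hypothetical stand-ins for the real workflows.

```python
from typing import Callable


def run_sft_stub(**kwargs) -> str:
    """Stand-in for a supervised fine-tuning workflow."""
    return "sft"


def run_dpo_stub(**kwargs) -> str:
    """Stand-in for the DPO workflow described on this page."""
    return "dpo"


# Hypothetical registry mapping the configured stage name to its workflow.
WORKFLOWS: dict[str, Callable[..., str]] = {
    "sft": run_sft_stub,
    "dpo": run_dpo_stub,
}


def dispatch(stage: str, **kwargs) -> str:
    """Call the workflow registered for `stage`, or fail loudly."""
    try:
        workflow = WORKFLOWS[stage]
    except KeyError:
        raise ValueError(f"Unknown training stage: {stage!r}") from None
    return workflow(**kwargs)


result = dispatch("dpo", model_args=None, data_args=None)
print(result)
```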

Code Reference

Source Location

Repository: Hiyouga_LLaMA_Factory
File: src/llamafactory/train/dpo/workflow.py
Lines: 1-119

Signature

def run_dpo(
    model_args: "ModelArguments",
    data_args: "DataArguments",
    training_args: "Seq2SeqTrainingArguments",
    finetuning_args: "FinetuningArguments",
    callbacks: Optional[list["TrainerCallback"]] = None,
) -> None

Import

from llamafactory.train.dpo.workflow import run_dpo

I/O Contract

Inputs

Name	Type	Required	Description
model_args	ModelArguments	Yes	Model loading configuration including path, quantization, and compute settings
data_args	DataArguments	Yes	Dataset configuration including data path, template name, and padding options
training_args	Seq2SeqTrainingArguments	Yes	HuggingFace training arguments including output_dir, do_train, do_eval, resume_from_checkpoint
finetuning_args	FinetuningArguments	Yes	DPO-specific args including use_ref_model, ref_model, pref_beta, plot_loss, include_effective_tokens_per_second, use_kt
callbacks	Optional[list[TrainerCallback]]	No	Additional trainer callbacks to register
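To make the include_effective_tokens_per_second option more concrete, here is one plausible way such a figure could be computed for pairwise data. The helper name, the chosen_input_ids and rejected_input_ids field names, and the formula (tokens per epoch times epochs, divided by training runtime) are assumptions for illustration; the library's own calculation may differ, for example by accounting for multiple devices.

```python
from typing import Sequence


def effective_tokens_per_second(
    dataset: Sequence[dict[str, list[int]]],
    epochs: float,
    train_runtime_s: float,
) -> float:
    """Estimate throughput over real (non-padding) tokens for pairwise data."""
    # Each pairwise example contributes both its chosen and its rejected sequence.
    tokens_per_epoch = sum(
        len(example["chosen_input_ids"]) + len(example["rejected_input_ids"])
        for example in dataset
    )
    return tokens_per_epoch * epochs / train_runtime_s


# Two toy preference pairs with hypothetical token ids.
pairs = [
    {"chosen_input_ids": [1, 2, 3, 4], "rejected_input_ids": [1, 2, 5]},
    {"chosen_input_ids": [6, 7], "rejected_input_ids": [6, 8, 9]},
]
tps = effective_tokens_per_second(pairs, epochs=2.0, train_runtime_s=4.0)
print(tps)
```

Counting tokens from the tokenized examples rather than from padded batches is what makes the figure "effective": padding does not inflate the throughput.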

Outputs

Name	Type	Description
None	None	Side effects: saves model, metrics, and trainer state to output_dir; optionally pushes to Hub

Usage Examples

# Running DPO training the way the tuner does. The four argument objects
# are assumed to have been parsed already by the caller.
from llamafactory.train.dpo.workflow import run_dpo

run_dpo(
    model_args=model_args,
    data_args=data_args,
    training_args=training_args,
    finetuning_args=finetuning_args,
)

Related Pages

Hiyouga_LLaMA_Factory_KTO_Workflow - Similar workflow for KTO preference training
Hiyouga_LLaMA_Factory_MCA_Workflow - Megatron-Core Adapter variant with DPO support
Hiyouga_LLaMA_Factory_Model_Patcher - Model patching applied during model loading
