Implementation:Hiyouga LLaMA Factory KTO Workflow
| Knowledge Sources | |
|---|---|
| Domains | Preference Optimization, Training Pipeline |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
Orchestrates the end-to-end Kahneman-Tversky Optimization (KTO) training and evaluation workflow.
Description
The run_kto function implements the complete KTO training pipeline. KTO is a preference optimization method that learns from binary feedback (thumbs up/down) on individual responses rather than from pairwise comparisons. The function loads the tokenizer, template, and dataset (with stage="kto"), then loads the model and creates a reference model; unlike DPO, KTO always requires a reference model. KTODataCollatorWithPadding handles data collation and CustomKTOTrainer runs the training loop. During training, the function logs metrics and optionally plots the "loss" and "rewards/chosen" curves. During evaluation, it drops reward-related metrics when the reference model is the model itself. Finally, it creates a model card and optionally pushes to the HuggingFace Hub.
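The evaluation-time metric filtering described above can be sketched as follows. This is a minimal illustration rather than the code in workflow.py: the helper name drop_reward_metrics, the identity check, the substring match on "rewards", and the metric values are all assumptions made for the example.

```python
from typing import Any


def drop_reward_metrics(metrics: dict[str, float], model: Any, ref_model: Any) -> dict[str, float]:
    """Return a copy of `metrics`, minus reward entries when there is no separate reference model."""
    # Assumption: "the reference model is the model itself" is detected by object identity.
    if model is not ref_model:
        return dict(metrics)
    # Assumption: reward-related metrics are exactly the keys containing "rewards".
    return {key: value for key, value in metrics.items() if "rewards" not in key}


policy = object()  # stand-in for the loaded model
eval_metrics = {  # made-up values, for illustration only
    "eval_loss": 0.42,
    "eval_rewards/chosen": 1.3,
    "eval_rewards/rejected": -0.7,
}

same_model = drop_reward_metrics(eval_metrics, policy, policy)
separate_ref = drop_reward_metrics(eval_metrics, policy, object())
```

With the same object passed as both model and reference, only the loss entry survives; with a distinct reference object, all metrics are kept.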
Usage
Use this workflow as the main entry point for KTO preference training. It is called by the training tuner when the training stage is set to "kto". KTO is preferred over DPO when only binary preference labels (good/bad) are available rather than pairwise preferences.
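To make the difference in supervision concrete, the sketch below splits one pairwise preference record into two binary-labeled records of the kind KTO learns from. The class BinaryExample, the helper pair_to_binary, and the field names are illustrative only and are not LLaMA Factory's dataset schema.

```python
from dataclasses import dataclass


@dataclass
class BinaryExample:
    prompt: str
    completion: str
    label: bool  # True = desirable (thumbs up), False = undesirable (thumbs down)


def pair_to_binary(prompt: str, chosen: str, rejected: str) -> list[BinaryExample]:
    """Turn one pairwise comparison into two independent binary-feedback examples."""
    return [
        BinaryExample(prompt, chosen, True),
        BinaryExample(prompt, rejected, False),
    ]


examples = pair_to_binary(
    prompt="What is 2 + 2?",
    chosen="2 + 2 equals 4.",
    rejected="2 + 2 equals 5.",
)
```

The reverse conversion is not generally possible: an isolated thumbs-up or thumbs-down has no paired counterpart, which is the situation where KTO applies and a pairwise method does not.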
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: src/llamafactory/train/kto/workflow.py
- Lines: 1-101
Signature
def run_kto(
    model_args: "ModelArguments",
    data_args: "DataArguments",
    training_args: "Seq2SeqTrainingArguments",
    finetuning_args: "FinetuningArguments",
    callbacks: Optional[list["TrainerCallback"]] = None,
) -> None
Import
from llamafactory.train.kto.workflow import run_kto
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_args | ModelArguments | Yes | Model loading configuration including path, quantization, and compute settings |
| data_args | DataArguments | Yes | Dataset configuration including data path, template name, ignore_pad_token_for_loss |
| training_args | Seq2SeqTrainingArguments | Yes | HuggingFace training arguments including output_dir, do_train, do_eval, resume_from_checkpoint |
| finetuning_args | FinetuningArguments | Yes | KTO-specific args including ref_model path, plot_loss flag |
| callbacks | Optional[list[TrainerCallback]] | No | Additional trainer callbacks to register |
Outputs
| Name | Type | Description |
|---|---|---|
| (return value) | None | Returns nothing. Side effects: saves the model, metrics, and trainer state to output_dir; optionally pushes to the Hub |
Usage Examples
# Running KTO training from the tuner
from llamafactory.train.kto.workflow import run_kto

run_kto(
    model_args=model_args,
    data_args=data_args,
    training_args=training_args,
    finetuning_args=finetuning_args,
)
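The call above is normally reached through the tuner, which selects a workflow by training stage. A rough sketch of that kind of dispatch is shown below, using stub functions in place of the real workflows so that it runs without LLaMA Factory installed. The registry WORKFLOWS, the dispatch helper, and the stubs are hypothetical and do not mirror the tuner's actual code; the stubs return a string only so the selection is observable, whereas run_kto itself returns None.

```python
from typing import Any, Callable


def run_kto_stub(**kwargs: Any) -> str:
    """Stand-in for run_kto; returns a tag instead of training."""
    return "kto"


def run_dpo_stub(**kwargs: Any) -> str:
    """Stand-in for the DPO workflow."""
    return "dpo"


# Hypothetical stage-to-workflow registry.
WORKFLOWS: dict[str, Callable[..., str]] = {
    "kto": run_kto_stub,
    "dpo": run_dpo_stub,
}


def dispatch(stage: str, **kwargs: Any) -> str:
    """Look up the workflow registered for `stage` and call it with the argument objects."""
    try:
        workflow = WORKFLOWS[stage]
    except KeyError:
        raise ValueError(f"Unknown training stage: {stage!r}") from None
    return workflow(**kwargs)


selected = dispatch(
    "kto",
    model_args=None,       # placeholders; real calls pass the parsed argument objects
    data_args=None,
    training_args=None,
    finetuning_args=None,
)
```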
Related Pages
- Hiyouga_LLaMA_Factory_DPO_Workflow - Similar workflow for DPO pairwise preference training
- Hiyouga_LLaMA_Factory_Model_Patcher - Model patching applied during model loading