Implementation:Hiyouga LLaMA Factory KTO Workflow

Knowledge Sources	Hiyouga_LLaMA_Factory
Domains	Preference Optimization, Training Pipeline
Last Updated	2026-02-06 19:00 GMT

Overview

Orchestrates the end-to-end Kahneman-Tversky Optimization (KTO) training and evaluation workflow.

Description

The run_kto function implements the complete KTO training pipeline. KTO is a preference optimization method that learns from binary feedback (thumbs up/down) on individual responses rather than from pairwise comparisons. The function loads the tokenizer, template, and dataset (using stage="kto"), then loads the model and creates a reference model. Unlike the DPO workflow, the KTO workflow always requires a reference model. The trainer uses KTODataCollatorWithPadding for data collation and CustomKTOTrainer for the training loop. When training is enabled, the function logs and saves the training metrics and, if requested, plots the "loss" and "rewards/chosen" curves. During evaluation, reward-related metrics are removed when the reference model is the model itself, because rewards measured against an identical reference carry no information. Finally, the function creates a model card and optionally pushes the result to the Hugging Face Hub.
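The reference-model handling during evaluation can be sketched as follows. This is a minimal illustration rather than the library's code: pick_reference_model, filter_eval_metrics, and the make_ref factory are hypothetical names, the metric keys are made up for the example, and two rules are assumptions: that the policy model doubles as its own reference only when no ref_model path is given and training is disabled, and that reward metrics are identified by the substring "rewards".

```python
from typing import Any, Callable, Optional


def pick_reference_model(
    model: Any,
    ref_model_path: Optional[str],
    do_train: bool,
    make_ref: Callable[[], Any],
) -> Any:
    """Return the reference model (assumed selection rule, see lead-in)."""
    if ref_model_path is None and not do_train:
        return model  # evaluation-only run: reuse the policy model
    return make_ref()  # otherwise build a separate reference model


def filter_eval_metrics(metrics: dict, model: Any, ref_model: Any) -> dict:
    """Drop reward metrics when policy and reference are the same object."""
    if model is ref_model:
        return {k: v for k, v in metrics.items() if "rewards" not in k}
    return dict(metrics)


policy = object()  # stand-in for a loaded model
reference = pick_reference_model(policy, None, do_train=False, make_ref=object)

# Hypothetical metric names, for illustration only.
raw = {"eval_loss": 0.41, "eval_rewards/chosen": 0.0, "eval_rewards/rejected": 0.0}
print(filter_eval_metrics(raw, policy, reference))  # {'eval_loss': 0.41}
```

Comparing by identity rather than equality keeps the check cheap and unambiguous: the metrics are dropped only when the very same object serves as both policy and reference.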

Usage

Use this workflow as the main entry point for KTO preference training. It is called by the training tuner when the training stage is set to "kto". KTO is preferred over DPO when only binary preference labels (good/bad) are available rather than pairwise preferences.
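The difference between pairwise and binary preference data can be illustrated with a small sketch. The record types and field names below (prompt, chosen, rejected, response, label) are hypothetical and are not the dataset schema used by LLaMA Factory; the point is only that one pairwise comparison can be split into two independently labeled examples, whereas a lone thumbs-up or thumbs-down record has no pairwise equivalent.

```python
from typing import TypedDict


class PairwiseExample(TypedDict):
    """DPO-style record: two responses to the same prompt, ranked."""
    prompt: str
    chosen: str
    rejected: str


class BinaryExample(TypedDict):
    """KTO-style record: one response with a good/bad label."""
    prompt: str
    response: str
    label: bool  # True = desirable (thumbs up), False = undesirable


def pairwise_to_binary(example: PairwiseExample) -> list[BinaryExample]:
    """Split one ranked pair into two independently labeled examples."""
    return [
        {"prompt": example["prompt"], "response": example["chosen"], "label": True},
        {"prompt": example["prompt"], "response": example["rejected"], "label": False},
    ]


pair: PairwiseExample = {
    "prompt": "Summarize the report.",
    "chosen": "A concise, accurate summary.",
    "rejected": "An off-topic reply.",
}
for record in pairwise_to_binary(pair):
    print(record["label"], "-", record["response"])
```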

Code Reference

Source Location

Repository: Hiyouga_LLaMA_Factory
File: src/llamafactory/train/kto/workflow.py
Lines: 1-101

Signature

def run_kto(
    model_args: "ModelArguments",
    data_args: "DataArguments",
    training_args: "Seq2SeqTrainingArguments",
    finetuning_args: "FinetuningArguments",
    callbacks: Optional[list["TrainerCallback"]] = None,
) -> None

Import

from llamafactory.train.kto.workflow import run_kto

I/O Contract

Inputs

Name	Type	Required	Description
model_args	ModelArguments	Yes	Model loading configuration including path, quantization, and compute settings
data_args	DataArguments	Yes	Dataset configuration including the data path, the template name, and ignore_pad_token_for_loss
training_args	Seq2SeqTrainingArguments	Yes	Hugging Face training arguments including output_dir, do_train, do_eval, and resume_from_checkpoint
finetuning_args	FinetuningArguments	Yes	KTO-specific arguments including the ref_model path and the plot_loss flag
callbacks	Optional[list[TrainerCallback]]	No	Additional trainer callbacks to register
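As an illustration of how a flag such as ignore_pad_token_for_loss is commonly consumed, the sketch below picks the value used to pad label sequences. It assumes the usual Hugging Face convention, in which label positions set to -100 are skipped by the loss; the helper name label_pad_token_id is hypothetical, and whether run_kto applies exactly this rule should be checked against the source file.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def label_pad_token_id(ignore_pad_token_for_loss: bool, pad_token_id: int) -> int:
    """Choose the padding value for label tensors.

    With the flag set, padded positions use IGNORE_INDEX so they do not
    contribute to the loss; otherwise the tokenizer's pad id is used.
    """
    return IGNORE_INDEX if ignore_pad_token_for_loss else pad_token_id


print(label_pad_token_id(True, pad_token_id=0))   # -100
print(label_pad_token_id(False, pad_token_id=0))  # 0
```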

Outputs

Name	Type	Description
(return value)	None	Returns nothing. Side effects: saves the model, metrics, and trainer state to output_dir; optionally pushes to the Hub

Usage Examples

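# Running KTO training, as the tuner does when the stage is "kto".
# model_args, data_args, training_args, and finetuning_args are assumed to be
# already-parsed argument objects (see the Inputs table above).
from llamafactory.train.kto.workflow import run_kto

run_kto(
    model_args=model_args,
    data_args=data_args,
    training_args=training_args,
    finetuning_args=finetuning_args,
    callbacks=None,  # optional: a list of TrainerCallback instances
)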

Related Pages

Hiyouga_LLaMA_Factory_DPO_Workflow - Similar workflow for DPO pairwise preference training
Hiyouga_LLaMA_Factory_Model_Patcher - Model patching applied during model loading
