Implementation:Hiyouga LLaMA Factory KTO Workflow

Knowledge Sources
Domains: Preference Optimization, Training Pipeline
Last Updated: 2026-02-06 19:00 GMT

Overview

Orchestrates the end-to-end Kahneman-Tversky Optimization (KTO) training and evaluation workflow.

Description

The run_kto function implements the complete training pipeline for KTO, a preference optimization method that learns from binary feedback (thumbs up or thumbs down) on individual responses rather than from pairwise comparisons. It first loads the tokenizer, template, and dataset (with stage="kto"), then loads the model and creates a reference model; unlike DPO, KTO always requires one. Data collation is handled by KTODataCollatorWithPadding and the training loop by CustomKTOTrainer. When training is enabled, the workflow logs metrics and, if plot_loss is set, plots the "loss" and "rewards/chosen" curves. During evaluation, reward-related metrics are removed when the reference model is the model itself. Finally, the workflow creates a model card and optionally pushes the result to the Hugging Face Hub.
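
The evaluation-phase filtering described above can be sketched as follows. This is a minimal, self-contained illustration rather than the library's code: the helper name `drop_reward_metrics` and the sample metric keys are assumptions, and plain Python objects stand in for the policy and reference models.

```python
from typing import Any


def drop_reward_metrics(metrics: dict[str, float], model: Any, ref_model: Any) -> dict[str, float]:
    """Return a copy of `metrics`, minus reward entries when no separate reference model exists.

    Hypothetical helper: it mirrors the behaviour described in the text, where
    reward-related metrics are removed if the reference model is the model itself.
    """
    if model is ref_model:
        # Same object on both sides: drop every key that mentions rewards.
        return {key: value for key, value in metrics.items() if "rewards" not in key}
    return dict(metrics)


# Stand-in objects; real code would pass the loaded models instead.
policy = object()
separate_ref = object()

# Illustrative metric names and values, not real training output.
sample_metrics = {
    "eval_loss": 0.42,
    "eval_rewards/chosen": 0.0,
    "eval_rewards/rejected": 0.0,
}

shared = drop_reward_metrics(sample_metrics, policy, policy)
distinct = drop_reward_metrics(sample_metrics, policy, separate_ref)

print(sorted(shared))    # only the non-reward keys remain
print(sorted(distinct))  # all keys are kept
```

Returning a new dictionary instead of mutating the input keeps the sketch easy to test; an in-place removal would work equally well for logging purposes.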

Usage

Use this workflow as the main entry point for KTO preference training. It is called by the training tuner when the training stage is set to "kto". KTO is preferred over DPO when only binary preference labels (good/bad) are available rather than pairwise preferences.
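
To make the difference between the two label formats concrete, the sketch below splits one pairwise preference record into two binary-feedback records, and also shows a standalone thumbs-down record that has no pairwise counterpart. The field names (`prompt`, `chosen`, `rejected`, `completion`, `label`) and the helper `unpair` are assumptions chosen for illustration; they do not describe LLaMA Factory's dataset schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryExample:
    """One response with a single good/bad judgement (hypothetical schema)."""

    prompt: str
    completion: str
    label: bool  # True = desirable (thumbs up), False = undesirable (thumbs down)


def unpair(prompt: str, chosen: str, rejected: str) -> list[BinaryExample]:
    """Turn one pairwise comparison into two independent binary examples."""
    return [
        BinaryExample(prompt, chosen, True),
        BinaryExample(prompt, rejected, False),
    ]


# A pairwise record, as a DPO-style dataset might store it.
pairwise_record = {
    "prompt": "Summarize the report in one sentence.",
    "chosen": "Revenue grew 12% while costs stayed flat.",
    "rejected": "The report is about many things.",
}
binary_records = unpair(**pairwise_record)

# Binary feedback does not need a counterpart: a lone thumbs-down is still usable.
lone_feedback = BinaryExample("Translate 'hello' to French.", "Hola", False)

for example in binary_records + [lone_feedback]:
    print(example.label, "|", example.completion)
```

The conversion only goes one way: pairwise data can always be unpaired, but unpaired binary feedback such as `lone_feedback` cannot be turned into a comparison without a second response to the same prompt.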

Code Reference

Source Location

Signature

def run_kto(
    model_args: "ModelArguments",
    data_args: "DataArguments",
    training_args: "Seq2SeqTrainingArguments",
    finetuning_args: "FinetuningArguments",
    callbacks: Optional[list["TrainerCallback"]] = None,
) -> None

Import

from llamafactory.train.kto.workflow import run_kto

I/O Contract

Inputs

Name | Type | Required | Description
--- | --- | --- | ---
model_args | ModelArguments | Yes | Model loading configuration, including path, quantization, and compute settings
data_args | DataArguments | Yes | Dataset configuration, including data path, template name, and ignore_pad_token_for_loss
training_args | Seq2SeqTrainingArguments | Yes | HuggingFace training arguments, including output_dir, do_train, do_eval, and resume_from_checkpoint
finetuning_args | FinetuningArguments | Yes | KTO-specific arguments, including the ref_model path and the plot_loss flag
callbacks | Optional[list[TrainerCallback]] | No | Additional trainer callbacks to register
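
As an illustration of what a flag like `ignore_pad_token_for_loss` commonly controls, the sketch below chooses the value used to pad label sequences. It is a sketch under stated assumptions, not the workflow's code: the helpers `pick_label_pad_id` and `pad_labels` are hypothetical, and `IGNORE_INDEX = -100` is assumed because it is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # assumed sentinel; equals PyTorch's default ignore_index


def pick_label_pad_id(ignore_pad_token_for_loss: bool, pad_token_id: int) -> int:
    """Choose the padding value for labels (hypothetical helper).

    With the flag on, padded positions get the sentinel so the loss skips them;
    with it off, they get the tokenizer's pad token id and count toward the loss.
    """
    return IGNORE_INDEX if ignore_pad_token_for_loss else pad_token_id


def pad_labels(labels: list[int], length: int, pad_id: int) -> list[int]:
    """Right-pad `labels` to `length`; longer sequences are returned unchanged."""
    return labels + [pad_id] * (length - len(labels))


pad_id = pick_label_pad_id(True, pad_token_id=0)
padded = pad_labels([5, 6, 7], 5, pad_id)
print(padded)
```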

Outputs

Name | Type | Description
--- | --- | ---
None | None | Side effects: saves model, metrics, and trainer state to output_dir; optionally pushes to Hub

Usage Examples

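# Running KTO training, as the tuner does when the stage is "kto".
# model_args, data_args, training_args, and finetuning_args are assumed to be
# already-parsed argument objects of the types listed under Inputs.
from llamafactory.train.kto.workflow import run_kto

run_kto(
    model_args=model_args,
    data_args=data_args,
    training_args=training_args,
    finetuning_args=finetuning_args,
    callbacks=None,  # optional: a list of extra TrainerCallback instances
)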
