Principle:Hiyouga LLaMA Factory Kahneman Tversky Optimization

From Leeroopedia


Knowledge Sources
Domains Natural Language Processing, Language Model Alignment, Preference Learning, Behavioral Economics
Last Updated 2026-02-06 19:00 GMT

Overview

A preference alignment technique inspired by Kahneman and Tversky's prospect theory that aligns language models using per-example binary feedback (desirable/undesirable) rather than pairwise preference comparisons.

Description

Kahneman-Tversky Optimization (KTO), introduced by Ethayarajh et al. (2024), is an alignment method that draws on insights from behavioral economics. Unlike DPO, which requires paired chosen/rejected responses for the same prompt, KTO operates on unpaired binary feedback: each response is independently labeled as either desirable or undesirable. This is grounded in prospect theory's observation that humans evaluate outcomes relative to a reference point, with losses being weighted more heavily than equivalent gains.

KTO is significant because:

  • Lower data requirements: It does not require paired preferences; each example needs only a single binary label (thumbs up or thumbs down).
  • More natural feedback signal: Binary approval/disapproval is easier to collect at scale than pairwise comparisons.
  • Asymmetric loss weighting: Desirable and undesirable examples can be weighted differently, reflecting the empirical finding that humans are more sensitive to losses than to gains.
  • KL-anchored alignment: A KL divergence term computed over separate KL examples prevents the policy from deviating too far from the reference distribution.

Usage

Use KTO when you want to:

  • Align a language model using binary feedback data (approve/reject per response) rather than paired comparisons.
  • Leverage existing datasets where responses are independently rated without paired alternatives.
  • Apply asymmetric weighting so that bad outputs are penalized more heavily than good outputs are rewarded.
  • Avoid the paired data requirement of DPO while maintaining stable alignment.

KTO is particularly suitable when collecting pairwise preferences is impractical, such as in production settings where user feedback is collected as binary thumbs-up/thumbs-down signals.
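To make the data requirement concrete, the following minimal sketch turns paired preference records into independently labeled examples of the kind KTO consumes. The field names (`prompt`, `chosen`, `rejected`, `completion`, `label`) are hypothetical and chosen for illustration only; they are not taken from any particular library's dataset schema.

```python
from typing import Dict, List


def unpair_preferences(pairs: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Split each (prompt, chosen, rejected) record into two unpaired examples.

    Field names are illustrative assumptions, not a real dataset schema.
    """
    examples: List[Dict[str, object]] = []
    for pair in pairs:
        # The chosen response becomes a desirable example (label=True).
        examples.append(
            {"prompt": pair["prompt"], "completion": pair["chosen"], "label": True}
        )
        # The rejected response becomes an undesirable example (label=False).
        examples.append(
            {"prompt": pair["prompt"], "completion": pair["rejected"], "label": False}
        )
    return examples


paired = [{"prompt": "Summarize the report.", "chosen": "A short, accurate summary.", "rejected": "An off-topic reply."}]
unpaired = unpair_preferences(paired)
```

Thumbs-up/thumbs-down logs already have this shape, so no conversion is needed for them; each logged response simply carries its own label.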

Theoretical Basis

Prospect Theory Foundation

KTO is grounded in Kahneman and Tversky's prospect theory, which models human decision-making under uncertainty. The key insight is that humans evaluate outcomes as gains or losses relative to a reference point rather than in absolute terms, and that the value function is asymmetric: losses loom larger than gains.

The implicit reward for a response y given prompt x is:

$$r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}$$
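As a rough illustration, the implicit reward can be computed from per-token log-probabilities of the response under the policy and the reference model. The function names and the default `beta=0.1` below are assumptions made for this sketch, not values taken from the source.

```python
from typing import Sequence


def sequence_logprob(token_logprobs: Sequence[float]) -> float:
    """Log-probability of a whole response: the sum of its token log-probabilities."""
    return sum(token_logprobs)


def implicit_reward(policy_logp: float, ref_logp: float, beta: float = 0.1) -> float:
    """beta * log(pi_theta(y|x) / pi_ref(y|x)), computed in log space."""
    return beta * (policy_logp - ref_logp)


# Hypothetical token log-probabilities for one response under each model.
policy_logp = sequence_logprob([-1.0, -2.0, -0.5])
ref_logp = sequence_logprob([-1.5, -2.5, -1.0])
reward = implicit_reward(policy_logp, ref_logp, beta=0.1)
```

A positive reward means the policy assigns the response more probability than the reference model does; a negative reward means less.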

KTO Loss Function

The KTO loss separates the treatment of desirable and undesirable examples:

$$\mathcal{L}_{\text{KTO}}(\theta) = \mathbb{E}_{(x, y)}\left[ w(y) \left( 1 - v_\theta(x, y) \right) \right]$$

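where the value function $v_\theta$ is defined differently for desirable and undesirable examples:

$$v_\theta(x, y) = \begin{cases} \sigma\left( r_\theta(x, y) - z_{\text{ref}} \right) & \text{if } y \text{ is desirable} \\ \sigma\left( z_{\text{ref}} - r_\theta(x, y) \right) & \text{if } y \text{ is undesirable} \end{cases}$$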

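Here $z_{\text{ref}} = \mathbb{E}_{x \sim \mathcal{D}}\left[ \beta \, \mathrm{KL}\left( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\text{ref}}(\cdot \mid x) \right) \right]$ is the KL-based reference point, $\sigma$ is the sigmoid function, and $w(y)$ is the per-class weight:

$$w(y) = \begin{cases} \lambda_D & \text{if } y \text{ is desirable} \\ \lambda_U & \text{if } y \text{ is undesirable} \end{cases}$$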

The weights $\lambda_D$ (desirable weight) and $\lambda_U$ (undesirable weight) allow asymmetric treatment, reflecting prospect theory's loss aversion. Setting $\lambda_U > \lambda_D$ penalizes undesirable behavior more strongly.
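The loss, value function, and per-class weights described above can be sketched together in a few lines of plain Python. This is a minimal scalar illustration, not a training implementation: it takes precomputed implicit rewards and a fixed reference point as inputs, and the parameter names `lambda_d` and `lambda_u` (both defaulting to 1.0) are assumptions for this example.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def kto_value(reward: float, z_ref: float, desirable: bool) -> float:
    """Value of one example: sigma(r - z_ref) if desirable, sigma(z_ref - r) otherwise."""
    return sigmoid(reward - z_ref) if desirable else sigmoid(z_ref - reward)


def kto_loss(
    rewards: Sequence[float],
    labels: Sequence[bool],
    z_ref: float,
    lambda_d: float = 1.0,
    lambda_u: float = 1.0,
) -> float:
    """Batch mean of w(y) * (1 - v(x, y)) over independently labeled examples."""
    total = 0.0
    for reward, desirable in zip(rewards, labels):
        weight = lambda_d if desirable else lambda_u
        total += weight * (1.0 - kto_value(reward, z_ref, desirable))
    return total / len(rewards)


# One desirable and one undesirable example, with undesirable outputs weighted double.
loss = kto_loss([0.0, 0.0], [True, False], z_ref=0.0, lambda_d=1.0, lambda_u=2.0)
```

Raising the reward of a desirable example above the reference point, or lowering the reward of an undesirable one below it, pushes the corresponding value toward 1 and the loss contribution toward 0.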

KL Reference Point

The KL reference point $z_{\text{ref}}$ is estimated using separate KL examples. For each training example, an additional KL response is sampled to compute the implicit KL divergence, ensuring the policy stays anchored to the reference model distribution.
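One way to sketch this estimate is shown below, under two assumptions that the text above does not spell out: the KL response for a prompt is borrowed from a different example in the same batch (here by rotating the responses by one position), and the batch-level estimate is clamped at zero because a KL divergence cannot be negative. Both helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_kl_examples(prompts: List[str], responses: List[str]) -> List[Tuple[str, str]]:
    """Pair each prompt with the response of the next example (assumed mismatching scheme)."""
    rotated = responses[1:] + responses[:1]
    return list(zip(prompts, rotated))


def estimate_z_ref(
    policy_logps: Sequence[float],
    ref_logps: Sequence[float],
    beta: float = 0.1,
) -> float:
    """Batch estimate of beta * KL(pi_theta || pi_ref) from KL-example log-probabilities.

    Averages the log-ratio over the KL examples and clamps the result at zero
    (the clamp is an assumption of this sketch).
    """
    log_ratios = [p - r for p, r in zip(policy_logps, ref_logps)]
    mean_log_ratio = sum(log_ratios) / len(log_ratios)
    return max(0.0, beta * mean_log_ratio)


kl_pairs = make_kl_examples(["q1", "q2", "q3"], ["a1", "a2", "a3"])
z_ref = estimate_z_ref([-5.0, -7.0], [-6.0, -9.0], beta=0.5)
```

In a real training loop the log-probabilities would come from forward passes of the policy and reference models on the KL pairs, and the resulting reference point would typically be treated as a constant when differentiating the loss.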

Auxiliary SFT Loss

Similar to DPO, an optional auxiliary SFT loss on desirable examples can be added:

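$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{KTO}} + \gamma_{\text{ftx}} \cdot \mathcal{L}_{\text{SFT}}(y_{\text{desirable}})$$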

This helps maintain generation quality while aligning to preference feedback.
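The combination can be sketched as follows, assuming the SFT term is the length-normalized negative log-likelihood of each desirable response, averaged over the desirable examples in the batch. The normalization choice and the function names are assumptions for illustration.

```python
from typing import Sequence


def sft_loss(token_logprobs: Sequence[float]) -> float:
    """Length-normalized negative log-likelihood of one response (assumed normalization)."""
    return -sum(token_logprobs) / len(token_logprobs)


def total_loss(
    kto: float,
    desirable_token_logprobs: Sequence[Sequence[float]],
    gamma_ftx: float = 0.0,
) -> float:
    """KTO loss plus gamma_ftx times the mean SFT loss over desirable responses."""
    if gamma_ftx == 0.0 or not desirable_token_logprobs:
        # The auxiliary term is optional; with no weight or no desirable examples it vanishes.
        return kto
    mean_sft = sum(sft_loss(lp) for lp in desirable_token_logprobs) / len(
        desirable_token_logprobs
    )
    return kto + gamma_ftx * mean_sft


combined = total_loss(0.5, [[-1.0, -3.0]], gamma_ftx=0.5)
```

With `gamma_ftx=0.0` the function returns the KTO loss unchanged, which matches the auxiliary term being optional.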
