Principle:NVIDIA NeMo Aligner KTO Data Preparation
| Knowledge Sources | |
|---|---|
| Domains | KTO, Data Preprocessing, Preference Learning |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
KTO (Kahneman-Tversky Optimization) data preparation converts paired preference data into the binary feedback format required by KTO, where each sample is independently labeled as desirable or undesirable.
Description
KTO is an alignment method that, unlike DPO (Direct Preference Optimization), does not require paired comparisons (chosen vs. rejected for the same prompt). Instead, KTO operates on independently labeled samples, where each (prompt, response) pair is annotated with a binary signal indicating whether the response is "chosen" (desirable) or "rejected" (undesirable).
The KTO data preparation pipeline in NeMo Aligner converts the Anthropic Helpful-Harmless (HH-RLHF) dataset from its native paired preference format into the binary feedback format. Specifically:
- Dataset loading: The Anthropic HH-RLHF dataset, which contains paired chosen/rejected conversations, is downloaded from Hugging Face.
- Conversation parsing: Each raw conversation string (with `\n\nHuman:` and `\n\nAssistant:` delimiters) is parsed into a structured prompt-response format using `Human:\n{body}\nAssistant:\n{response}` templates.
- Unpacking pairs: Each paired comparison is unpacked into two independent samples:
  - The chosen response gets `"preference": "chosen"`
  - The rejected response gets `"preference": "rejected"`
  - Each sample retains the shared prompt and its own response.
- Output: The samples are saved as JSONL files with train and test splits.
This transformation is the key distinction from DPO data preparation: while DPO needs (prompt, chosen, rejected) tuples, KTO needs individual (prompt, response, preference_label) samples.
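The parsing and unpacking steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the NeMo Aligner script itself: the function names (`split_turns`, `to_prompt_and_response`, `unpack_pair`) and the `prompt` / `response` field names are assumptions made for the example; only the `preference` field and its two values come from the description.

```python
import re

# Matches the speaker delimiters used in raw HH-RLHF conversation strings.
TURN_RE = re.compile(r"\n\n(Human|Assistant):")


def split_turns(conversation):
    """Split a raw conversation string into (speaker, text) turns."""
    parts = TURN_RE.split(conversation)
    # parts looks like [prefix, speaker1, text1, speaker2, text2, ...]
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]


def to_prompt_and_response(conversation):
    """Return (prompt, response), where response is the final Assistant turn."""
    turns = split_turns(conversation)
    if not turns or turns[-1][0] != "Assistant":
        raise ValueError("conversation must end with an Assistant turn")
    *context, (_, response) = turns
    # Re-render earlier turns as "Speaker:\ntext\n" and leave the last
    # Assistant slot open so the response can be stored separately.
    prompt = "".join(f"{speaker}:\n{text}\n" for speaker, text in context)
    return prompt + "Assistant:\n", response


def unpack_pair(record):
    """Turn one paired record into two independently labeled samples."""
    samples = []
    for label in ("chosen", "rejected"):
        prompt, response = to_prompt_and_response(record[label])
        samples.append({"prompt": prompt, "response": response, "preference": label})
    return samples


record = {
    "chosen": "\n\nHuman: Hi\n\nAssistant: Hello!",
    "rejected": "\n\nHuman: Hi\n\nAssistant: Go away.",
}
samples = unpack_pair(record)
```

Here `samples` holds two dictionaries that share the same prompt and differ only in `response` and `preference`, which is the (prompt, response, preference_label) shape KTO consumes.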
Usage
Use KTO data preparation when:
- You are training a model using the KTO alignment algorithm
- You need to convert paired preference data into binary feedback format
- You want to use the Anthropic HH dataset with KTO training
Theoretical Basis
KTO is based on Kahneman-Tversky prospect theory from behavioral economics. The key insights are:
- Loss aversion: Humans weigh losses more heavily than equivalent gains. KTO incorporates this asymmetry by treating desirable and undesirable examples differently in the loss function.
- Binary feedback sufficiency: Unlike DPO, which requires explicit pairwise comparisons, KTO can learn from independent binary signals (good/bad) about individual responses. This is practically advantageous because binary feedback is often easier and cheaper to collect than pairwise comparisons.
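- Pairing-free supervision: Each sample is labeled and scored on its own, so the training data does not need to preserve the pairing between chosen and rejected responses for the same prompt. This independence concerns the data only: the KTO objective itself still compares the policy against a reference model.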
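For orientation, the asymmetric treatment of desirable and undesirable samples can be written out explicitly. The form below follows the objective stated in the KTO paper (Ethayarajh et al., 2024); it is not taken from the NeMo Aligner source, whose implementation may differ in detail. Here $\pi_\theta$ is the policy being trained, $\pi_{\text{ref}}$ the reference model, $\sigma$ the logistic function, $z_0$ a KL-based reference point, and $\beta$, $\lambda_D$, $\lambda_U$ are hyperparameters.

```latex
r_\theta(x, y) = \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}

v(x, y) =
\begin{cases}
  \lambda_D \, \sigma\!\bigl(\beta \, (r_\theta(x, y) - z_0)\bigr) & \text{if } y \text{ is desirable} \\
  \lambda_U \, \sigma\!\bigl(\beta \, (z_0 - r_\theta(x, y))\bigr) & \text{if } y \text{ is undesirable}
\end{cases}

\mathcal{L}_{\text{KTO}}(\pi_\theta, \pi_{\text{ref}}) = \mathbb{E}_{(x, y) \sim D}\bigl[\lambda_y - v(x, y)\bigr]
```

Each term depends on a single (prompt, response) pair and its binary label, which is why the data only needs the `preference` field per sample and no link between a chosen response and its rejected counterpart.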
The data preparation step matters because it transforms the commonly available paired preference format (as used in RLHF and DPO) into the unpacked binary format that KTO expects. Each original comparison pair yields two training samples, so N pairs become 2N samples, and the supervision signal changes from comparative to absolute.
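The effect on dataset size can be checked with a small sketch that unpacks already-parsed pairs and writes them as JSONL. Everything named here is hypothetical and chosen for illustration: the helper functions, the `train.jsonl` file name, and the `prompt` / `response` keys are assumptions, not the layout used by NeMo Aligner.

```python
import json
import os
import tempfile


def unpack(pairs):
    """Yield two labeled samples for every (prompt, chosen, rejected) tuple."""
    for prompt, chosen, rejected in pairs:
        yield {"prompt": prompt, "response": chosen, "preference": "chosen"}
        yield {"prompt": prompt, "response": rejected, "preference": "rejected"}


def write_jsonl(samples, path):
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample, ensure_ascii=False) + "\n")
            count += 1
    return count


# Two toy comparison pairs standing in for one split of the dataset.
pairs = [
    ("Human:\nHi\nAssistant:\n", "Hello!", "Go away."),
    ("Human:\nThanks\nAssistant:\n", "You're welcome.", "Whatever."),
]

out_dir = tempfile.mkdtemp()
train_path = os.path.join(out_dir, "train.jsonl")  # hypothetical file name
n_written = write_jsonl(unpack(pairs), train_path)

# Read the file back to confirm each line is a standalone labeled sample.
with open(train_path, encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
```

With two input pairs, the file contains four lines, alternating between `chosen` and `rejected`. A test split would be written the same way to a second file.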