
Implementation: axolotl-ai-cloud/axolotl: Prepare Preference Datasets

From Leeroopedia


Knowledge Sources
Domains Data_Preparation, Alignment
Last Updated 2026-02-06 23:00 GMT

Overview

A concrete tool, provided by the Axolotl framework, for loading and formatting preference datasets for DPO/IPO/KTO alignment training.

Description

The prepare_preference_datasets function loads preference data, applies the appropriate prompt strategy based on the RL type (DPO, KTO, ORPO), formats chosen/rejected pairs with chat templates, handles deduplication, and splits into train/eval sets. It delegates to format-specific loaders in axolotl.prompt_strategies.dpo, axolotl.prompt_strategies.kto, and axolotl.prompt_strategies.orpo.
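To illustrate the delegation described above, here is a minimal sketch of how an RL-type-to-strategy dispatch might look. The helper name `select_strategy_module` and the IPO/SimPO groupings are assumptions for illustration; the real loaders live in `axolotl.prompt_strategies.dpo`, `.kto`, and `.orpo`.

```python
# Hypothetical sketch: map an RL training type to the prompt-strategy
# module that formats its data. Names and groupings are illustrative,
# not Axolotl's actual dispatch code.
def select_strategy_module(rl_type: str) -> str:
    """Return the module path that would format data for this RL type."""
    mapping = {
        "dpo": "axolotl.prompt_strategies.dpo",
        "ipo": "axolotl.prompt_strategies.dpo",    # assumption: IPO reuses DPO pairs
        "simpo": "axolotl.prompt_strategies.dpo",  # assumption: SimPO reuses DPO pairs
        "kto": "axolotl.prompt_strategies.kto",
        "orpo": "axolotl.prompt_strategies.orpo",
    }
    try:
        return mapping[rl_type.lower()]
    except KeyError:
        raise ValueError(f"Unsupported rl type: {rl_type}")
```
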

Usage

This function is called by load_datasets when cfg.rl is set (DPO, IPO, KTO, ORPO, SimPO). It replaces the standard SFT dataset preparation path.
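The branch that load_datasets takes can be sketched as follows. This is a simplified illustration of the decision, not Axolotl's actual code; the function name `choose_dataset_path` and the string return values are made up for this example.

```python
# Illustrative sketch of the branch load_datasets takes: when cfg.rl
# is set, the preference-dataset path replaces the standard SFT path.
def choose_dataset_path(cfg) -> str:
    """Return which dataset-preparation path applies to this config."""
    if getattr(cfg, "rl", None):  # e.g. "dpo", "ipo", "kto", "orpo", "simpo"
        return "preference"       # prepare_preference_datasets path
    return "sft"                  # standard SFT dataset preparation
```
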

Code Reference

Source Location

  • Repository: axolotl
  • File: src/axolotl/utils/data/rl.py
  • Lines: L37-83

Signature

def prepare_preference_datasets(
    cfg: DictDefault,
    tokenizer: PreTrainedTokenizer,
) -> tuple[Dataset, Dataset | None]:
    """Prepare preference datasets for alignment training.

    Args:
        cfg: Configuration with datasets list, rl type, val_set_size.
        tokenizer: Tokenizer for prompt formatting.

    Returns:
        Tuple of (train_dataset, eval_dataset or None).
    """

Import

from axolotl.utils.data.rl import prepare_preference_datasets

I/O Contract

Inputs

  • cfg (DictDefault, required): Config with datasets (list of preference dataset specs), rl (RL type: dpo/ipo/kto/orpo), val_set_size, dataset_exact_deduplication
  • tokenizer (PreTrainedTokenizer, required): Tokenizer for prompt formatting and chat template application

Outputs

  • train_dataset (Dataset): Formatted preference pairs ready for the DPO/KTO/ORPO trainer
  • eval_dataset (Dataset or None): Evaluation preference pairs (None if val_set_size=0)
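The deduplication and train/eval split behavior can be sketched in plain Python. This is a simplified stand-in, assuming exact deduplication keys on the (prompt, chosen, rejected) triple; the real function operates on Hugging Face Dataset objects, and the helper name `dedupe_and_split` is hypothetical.

```python
import random

def dedupe_and_split(rows, val_set_size=0.0, seed=42):
    """Sketch of exact dedup plus val_set_size split on preference rows."""
    # Exact deduplication: keep the first occurrence of each triple.
    seen, unique = set(), []
    for r in rows:
        key = (r["prompt"], r["chosen"], r["rejected"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # val_set_size=0 means no held-out eval split (eval_dataset is None).
    if not val_set_size:
        return unique, None
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_val = max(1, int(len(unique) * val_set_size))
    return unique[n_val:], unique[:n_val]
```
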

Usage Examples

Preparing DPO Datasets

from axolotl.utils.data.rl import prepare_preference_datasets
from axolotl.loaders.tokenizer import load_tokenizer

tokenizer = load_tokenizer(cfg)

# cfg.rl = "dpo"
# cfg.datasets = [{"path": "Intel/orca_dpo_pairs", "type": "chatml.intel"}]
train_dataset, eval_dataset = prepare_preference_datasets(cfg, tokenizer)

print(f"Training pairs: {len(train_dataset)}")
print(f"Sample keys: {train_dataset[0].keys()}")  # chosen, rejected, prompt
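
To make the chosen/rejected/prompt output shape concrete, here is a rough sketch of what a chatml-style formatter might produce for one preference pair. The function `format_chatml_pair` is invented for illustration; the real formatting is handled by the prompt strategies in axolotl.prompt_strategies.dpo (e.g. the chatml.intel type above).

```python
# Illustrative only: approximate the record shape a chatml-style
# preference formatter would emit. Not Axolotl's actual formatter.
def format_chatml_pair(system, question, chosen, rejected):
    """Render one preference pair as prompt/chosen/rejected strings."""
    prompt = (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    return {
        "prompt": prompt,
        "chosen": f"{chosen}<|im_end|>",
        "rejected": f"{rejected}<|im_end|>",
    }
```
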

Related Pages

Implements Principle

Requires Environment
