
Implementation: axolotl-ai-cloud/axolotl: Prepare Preference Datasets

From Leeroopedia


Knowledge Sources
Domains Data_Preparation, Alignment
Last Updated 2026-02-06 23:00 GMT

Overview

A concrete tool, provided by the Axolotl framework, for loading and formatting preference datasets for DPO/IPO/KTO alignment training.

Description

The prepare_preference_datasets function loads preference data, applies the appropriate prompt strategy based on the RL type (DPO, KTO, ORPO), formats chosen/rejected pairs with chat templates, handles deduplication, and splits into train/eval sets. It delegates to format-specific loaders in axolotl.prompt_strategies.dpo, axolotl.prompt_strategies.kto, and axolotl.prompt_strategies.orpo.
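To illustrate the delegation described above, here is a minimal sketch of how an RL-type-to-strategy dispatch might look. The helper name `select_strategy_module` and the IPO/SimPO groupings are assumptions for illustration; the real loaders live in `axolotl.prompt_strategies.dpo`, `.kto`, and `.orpo`.

```python
# Hypothetical sketch: map an RL training type to the prompt-strategy
# module that formats its data. Names and groupings are illustrative,
# not Axolotl's actual dispatch code.
def select_strategy_module(rl_type: str) -> str:
    """Return the module path that would format data for this RL type."""
    mapping = {
        "dpo": "axolotl.prompt_strategies.dpo",
        "ipo": "axolotl.prompt_strategies.dpo",    # assumption: IPO reuses DPO pairs
        "simpo": "axolotl.prompt_strategies.dpo",  # assumption: SimPO reuses DPO pairs
        "kto": "axolotl.prompt_strategies.kto",
        "orpo": "axolotl.prompt_strategies.orpo",
    }
    try:
        return mapping[rl_type.lower()]
    except KeyError:
        raise ValueError(f"Unsupported rl type: {rl_type}")
```
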

Usage

This function is called by load_datasets when cfg.rl is set (DPO, IPO, KTO, ORPO, SimPO). It replaces the standard SFT dataset preparation path.
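The branch that load_datasets takes can be sketched as follows. This is a simplified illustration of the decision, not Axolotl's actual code; the function name `choose_dataset_path` and the string return values are made up for this example.

```python
# Illustrative sketch of the branch load_datasets takes: when cfg.rl
# is set, the preference-dataset path replaces the standard SFT path.
def choose_dataset_path(cfg) -> str:
    """Return which dataset-preparation path applies to this config."""
    if getattr(cfg, "rl", None):  # e.g. "dpo", "ipo", "kto", "orpo", "simpo"
        return "preference"       # prepare_preference_datasets path
    return "sft"                  # standard SFT dataset preparation
```
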

Code Reference

Source Location

  • Repository: axolotl
  • File: src/axolotl/utils/data/rl.py
  • Lines: L37-83

Signature

def prepare_preference_datasets(
    cfg: DictDefault,
    tokenizer: PreTrainedTokenizer,
) -> tuple[Dataset, Dataset | None]:
    """Prepare preference datasets for alignment training.

    Args:
        cfg: Configuration with datasets list, rl type, val_set_size.
        tokenizer: Tokenizer for prompt formatting.

    Returns:
        Tuple of (train_dataset, eval_dataset or None).
    """

Import

from axolotl.utils.data.rl import prepare_preference_datasets

I/O Contract

Inputs

  • cfg (DictDefault, required): Config with datasets (list of preference dataset specs), rl (RL type: dpo/ipo/kto/orpo), val_set_size, dataset_exact_deduplication
  • tokenizer (PreTrainedTokenizer, required): Tokenizer for prompt formatting and chat template application

Outputs

  • train_dataset (Dataset): Formatted preference pairs ready for the DPO/KTO/ORPO trainer
  • eval_dataset (Dataset or None): Evaluation preference pairs (None if val_set_size=0)
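The deduplication and train/eval split behavior can be sketched in plain Python. This is a simplified stand-in, assuming exact deduplication keys on the (prompt, chosen, rejected) triple; the real function operates on Hugging Face Dataset objects, and the helper name `dedupe_and_split` is hypothetical.

```python
import random

def dedupe_and_split(rows, val_set_size=0.0, seed=42):
    """Sketch of exact dedup plus val_set_size split on preference rows."""
    # Exact deduplication: keep the first occurrence of each triple.
    seen, unique = set(), []
    for r in rows:
        key = (r["prompt"], r["chosen"], r["rejected"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # val_set_size=0 means no held-out eval split (eval_dataset is None).
    if not val_set_size:
        return unique, None
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_val = max(1, int(len(unique) * val_set_size))
    return unique[n_val:], unique[:n_val]
```
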

Usage Examples

Preparing DPO Datasets

from axolotl.utils.data.rl import prepare_preference_datasets
from axolotl.loaders.tokenizer import load_tokenizer

tokenizer = load_tokenizer(cfg)

# cfg.rl = "dpo"
# cfg.datasets = [{"path": "Intel/orca_dpo_pairs", "type": "chatml.intel"}]
train_dataset, eval_dataset = prepare_preference_datasets(cfg, tokenizer)

print(f"Training pairs: {len(train_dataset)}")
print(f"Sample keys: {train_dataset[0].keys()}")  # chosen, rejected, prompt
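
To make the chosen/rejected/prompt output shape concrete, here is a rough sketch of what a chatml-style formatter might produce for one preference pair. The function `format_chatml_pair` is invented for illustration; the real formatting is handled by the prompt strategies in axolotl.prompt_strategies.dpo (e.g. the chatml.intel type above).

```python
# Illustrative only: approximate the record shape a chatml-style
# preference formatter would emit. Not Axolotl's actual formatter.
def format_chatml_pair(system, question, chosen, rejected):
    """Render one preference pair as prompt/chosen/rejected strings."""
    prompt = (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    return {
        "prompt": prompt,
        "chosen": f"{chosen}<|im_end|>",
        "rejected": f"{rejected}<|im_end|>",
    }
```
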

Related Pages

Implements Principle

Requires Environment
