Implementation: Axolotl (axolotl-ai-cloud) Prepare Preference Datasets
| Knowledge Sources | Details |
|---|---|
| Domains | Data_Preparation, Alignment |
| Last Updated | 2026-02-06 23:00 GMT |
Overview
A concrete utility provided by the Axolotl framework for loading and formatting preference datasets for DPO/IPO/KTO alignment training.
Description
The prepare_preference_datasets function loads preference data, applies the appropriate prompt strategy based on the RL type (DPO, KTO, ORPO), formats chosen/rejected pairs with chat templates, handles deduplication, and splits into train/eval sets. It delegates to format-specific loaders in axolotl.prompt_strategies.dpo, axolotl.prompt_strategies.kto, and axolotl.prompt_strategies.orpo.
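The deduplication and train/eval split steps can be sketched in plain Python. This is an illustrative re-implementation under assumed semantics (exact-match deduplication on the record fields, and a val_set_size that is either a fraction below 1 or an absolute count), not Axolotl's actual code:

```python
import random

def dedup_and_split(pairs, val_set_size=0.05, seed=42):
    """Sketch of exact deduplication followed by a train/eval split.

    Illustrative only; Axolotl's implementation in rl.py may differ.
    """
    # Exact deduplication: keep the first occurrence of each identical pair.
    seen, unique = set(), []
    for p in pairs:
        key = (p["prompt"], p["chosen"], p["rejected"])
        if key not in seen:
            seen.add(key)
            unique.append(p)
    # Shuffle, then split off the eval set. val_set_size is treated as a
    # fraction when below 1, otherwise as an absolute example count.
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_eval = int(len(unique) * val_set_size) if val_set_size < 1 else int(val_set_size)
    return unique[n_eval:], unique[:n_eval] or None
```

Returning None for the eval split when val_set_size yields zero examples mirrors the (train_dataset, eval_dataset or None) contract described below.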
Usage
This function is called by load_datasets when cfg.rl is set (DPO, IPO, KTO, ORPO, SimPO). It replaces the standard SFT dataset preparation path.
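The branch taken by load_datasets can be illustrated with a small dispatch helper. PREFERENCE_RL_TYPES and choose_preparation_path are hypothetical names used here for illustration, not Axolotl's API:

```python
# RL types listed above that route to the preference-dataset path.
PREFERENCE_RL_TYPES = {"dpo", "ipo", "kto", "orpo", "simpo"}

def choose_preparation_path(cfg: dict) -> str:
    """Return which dataset-preparation path a config selects (sketch)."""
    rl = (cfg.get("rl") or "").lower()
    if rl in PREFERENCE_RL_TYPES:
        return "preference"  # -> prepare_preference_datasets
    return "sft"             # -> standard SFT dataset preparation

print(choose_preparation_path({"rl": "dpo"}))  # preference
print(choose_preparation_path({}))             # sft
```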
Code Reference
Source Location
- Repository: axolotl
- File: src/axolotl/utils/data/rl.py
- Lines: L37-83
Signature
def prepare_preference_datasets(
cfg: DictDefault,
tokenizer: PreTrainedTokenizer,
) -> tuple[Dataset, Dataset | None]:
"""Prepare preference datasets for alignment training.
Args:
cfg: Configuration with datasets list, rl type, val_set_size.
tokenizer: Tokenizer for prompt formatting.
Returns:
Tuple of (train_dataset, eval_dataset or None).
"""
Import
from axolotl.utils.data.rl import prepare_preference_datasets
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| cfg | DictDefault | Yes | Config with datasets (list of preference dataset specs), rl (RL type: dpo/ipo/kto/orpo), val_set_size, dataset_exact_deduplication |
| tokenizer | PreTrainedTokenizer | Yes | Tokenizer for prompt formatting and chat template application |
Outputs
| Name | Type | Description |
|---|---|---|
| train_dataset | Dataset | Formatted preference pairs ready for DPO/KTO/ORPO trainer |
| eval_dataset | Dataset or None | Evaluation preference pairs (None if val_set_size=0) |
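A record in train_dataset for DPO can be pictured as follows. The field names follow the chosen/rejected/prompt convention from the table above; the example strings are invented and the exact formatting depends on the prompt strategy and chat template:

```python
# Illustrative shape of one formatted DPO preference record (values invented).
train_sample = {
    "prompt": "<|user|>\nWhat is 2 + 2?\n<|assistant|>\n",
    "chosen": "2 + 2 = 4.",
    "rejected": "2 + 2 = 5.",
}
# The trainer expects all three fields to be present on every record.
assert {"prompt", "chosen", "rejected"} <= set(train_sample)
```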
Usage Examples
Preparing DPO Datasets
from axolotl.utils.data.rl import prepare_preference_datasets
from axolotl.loaders.tokenizer import load_tokenizer
from axolotl.utils.dict import DictDefault
# Minimal config for illustration; a real run also needs model settings
# such as base_model and sequence_len.
cfg = DictDefault(
    {
        "rl": "dpo",
        "datasets": [{"path": "Intel/orca_dpo_pairs", "type": "chatml.intel"}],
        "val_set_size": 0.05,
    }
)
tokenizer = load_tokenizer(cfg)
train_dataset, eval_dataset = prepare_preference_datasets(cfg, tokenizer)
print(f"Training pairs: {len(train_dataset)}")
print(f"Sample keys: {train_dataset[0].keys()}") # chosen, rejected, prompt
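KTO, by contrast, trains on unpaired examples rather than chosen/rejected pairs. The sketch below converts paired records into KTO-style ones, assuming the common prompt/completion/label schema; the helper name and schema are assumptions for illustration, not Axolotl's API:

```python
def pairs_to_kto(pairs):
    """Split paired (chosen/rejected) records into unpaired records with a
    desirability label. Schema is assumed, not taken from Axolotl's code."""
    records = []
    for p in pairs:
        # The chosen completion becomes a desirable example...
        records.append({"prompt": p["prompt"], "completion": p["chosen"], "label": True})
        # ...and the rejected completion an undesirable one.
        records.append({"prompt": p["prompt"], "completion": p["rejected"], "label": False})
    return records
```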