Implementation:Eric mitchell Direct preference optimization Canonical Dataset Format
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, NLP, Preference_Learning |
| Last Updated | 2026-02-08 02:00 GMT |
Overview
Pattern documentation for the canonical preference dataset format, with get_hh as the reference implementation.
Description
This Pattern Doc describes the interface that every dataset loader must implement. Three reference implementations demonstrate the pattern: get_hh (Anthropic HH-RLHF), get_shp (Stanford Human Preferences), and get_se (StackExchange). Each converts a different raw data source into the same canonical format.
The format contract is enforced by an assertion in get_dataset at preference_datasets.py:L174-175 that checks the keys are exactly {responses, pairs, sft_target}.
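The contract can also be checked outside the pipeline, for example while developing a new loader. The following is a minimal sketch, not code from the repository: the helper name check_canonical_format is hypothetical, and the extra checks on pair indices and on the type of sft_target are assumptions about what a well-formed entry should satisfy, going beyond the key check described above.

```python
from typing import Any, Dict

EXPECTED_KEYS = {'responses', 'pairs', 'sft_target'}


def check_canonical_format(data: Dict[str, Dict[str, Any]]) -> None:
    """Raise ValueError if any entry deviates from the canonical format.

    Hypothetical helper for illustration; not part of preference_datasets.py.
    """
    for prompt, entry in data.items():
        # Key check: exactly the three canonical keys, no more and no fewer.
        if set(entry.keys()) != EXPECTED_KEYS:
            raise ValueError(
                f"Unexpected keys for prompt {prompt!r}: {sorted(entry.keys())}"
            )
        # Assumed extra check: every pair index must refer to an existing response.
        n_responses = len(entry['responses'])
        for winner_idx, loser_idx in entry['pairs']:
            if not (0 <= winner_idx < n_responses and 0 <= loser_idx < n_responses):
                raise ValueError(
                    f"Pair ({winner_idx}, {loser_idx}) out of range for prompt {prompt!r}"
                )
        # Assumed extra check: the SFT target is a single string.
        if not isinstance(entry['sft_target'], str):
            raise ValueError(f"sft_target must be a str for prompt {prompt!r}")
```

Calling this on a loader's output before training surfaces malformed entries with the offending prompt in the error message.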
Usage
Follow this pattern when implementing a custom dataset loader (get_xyz function) for integration into the training pipeline.
Code Reference
Source Location
- Repository: direct-preference-optimization
- File: preference_datasets.py
- Lines: 120-161 (get_hh reference), 85-117 (get_shp), 46-83 (get_se), 174-175 (assertion)
Signature
# Pattern interface: all dataset loaders must match this signature
from typing import Dict, List, Tuple, Union

def get_xyz(
    split: str,
    silent: bool = False,
    cache_dir: str = None,
) -> Dict[str, Dict[str, Union[List[Tuple[int, int]], List[str], str]]]:
    """Load a dataset and return it in canonical format.

    Args:
        split: "train" or "test"
        silent: Suppress progress bars
        cache_dir: Dataset cache directory

    Returns:
        Dict mapping prompt strings to dicts with keys:
            'responses': List[str] - all response texts
            'pairs': List[Tuple[int, int]] - (winner_idx, loser_idx) tuples
            'sft_target': str - best response for SFT
    """
Import
# Reference implementations
from preference_datasets import get_hh, get_shp, get_se
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| split | str | Yes | "train" or "test" |
| silent | bool | No | Suppress progress bars (default False) |
| cache_dir | str | No | HuggingFace dataset cache directory |
Outputs
| Name | Type | Description |
|---|---|---|
| data | Dict[str, Dict] | Keys are prompt strings formatted as "\n\nHuman: ...\n\nAssistant:". Values are dicts with exactly three keys: responses (List[str]), pairs (List[Tuple[int, int]]), sft_target (str) |
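To make the index semantics of pairs concrete, the sketch below expands a canonical dictionary into (prompt, chosen, rejected) triples. The helper name iter_preference_triples and the sample entry are illustrative assumptions, not code or data from the repository.

```python
from typing import Dict, Iterator, List, Tuple, Union

Entry = Dict[str, Union[List[Tuple[int, int]], List[str], str]]


def iter_preference_triples(data: Dict[str, Entry]) -> Iterator[Tuple[str, str, str]]:
    """Yield (prompt, chosen, rejected) for every pair in every entry."""
    for prompt, entry in data.items():
        responses = entry['responses']
        # Each pair holds indices into `responses`: (winner_idx, loser_idx).
        for winner_idx, loser_idx in entry['pairs']:
            yield prompt, responses[winner_idx], responses[loser_idx]


# Made-up entry with three responses and two preference pairs.
sample = {
    '\n\nHuman: Name a prime number.\n\nAssistant:': {
        'responses': [' 7 is prime.', ' 8 is prime.', ' 9 is prime.'],
        'pairs': [(0, 1), (0, 2)],
        'sft_target': ' 7 is prime.',
    }
}

triples = list(iter_preference_triples(sample))
```

One prompt can therefore contribute several preference examples, one per entry in pairs, while sft_target stays a single string per prompt.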
Usage Examples
Reference Implementation (get_hh)
from preference_datasets import get_hh
data = get_hh(split='train', cache_dir='.cache')
# Example entry (the values in the comments below are illustrative)
prompt = next(iter(data))
# prompt: "\n\nHuman: What is the capital of France?\n\nAssistant:"
entry = data[prompt]
# entry['responses']: [" Paris is the capital.", " London is the capital."]
# entry['pairs']: [(0, 1)] # response 0 preferred over response 1
# entry['sft_target']: " Paris is the capital."
Custom Dataset Template
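from collections import defaultdict

import datasets
import tqdm

def get_custom(split, silent=False, cache_dir=None):
    """Custom dataset loader following the canonical format."""
    # 'my_org/my_dataset' and the column names below are placeholders.
    dataset = datasets.load_dataset('my_org/my_dataset', split=split, cache_dir=cache_dir)
    # Nested defaultdict so 'responses' and 'pairs' start as empty lists.
    data = defaultdict(lambda: defaultdict(list))
    for row in tqdm.tqdm(dataset, desc='Processing custom', disable=silent):
        prompt = '\n\nHuman: ' + row['question'] + '\n\nAssistant:'
        # Append instead of overwriting, so a prompt that appears in
        # several rows keeps all of its responses and pairs.
        n_responses = len(data[prompt]['responses'])
        data[prompt]['responses'].extend([row['good_answer'], row['bad_answer']])
        data[prompt]['pairs'].append((n_responses, n_responses + 1))
        data[prompt]['sft_target'] = row['good_answer']
    return data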
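A loader's conversion logic can be exercised without downloading a dataset. The sketch below is a variant that accumulates responses when the same prompt occurs in several rows. It assumes the same hypothetical column names (question, good_answer, bad_answer) and uses a plain list of dicts in place of a HuggingFace dataset; build_canonical is an illustrative name, not a repository function.

```python
from collections import defaultdict
from typing import Dict, Iterable


def build_canonical(rows: Iterable[Dict[str, str]]) -> dict:
    """Convert rows with hypothetical columns into the canonical format."""
    data = defaultdict(lambda: defaultdict(list))
    for row in rows:
        prompt = '\n\nHuman: ' + row['question'] + '\n\nAssistant:'
        # Offset new pair indices by the number of responses already stored.
        n_responses = len(data[prompt]['responses'])
        data[prompt]['responses'].extend([row['good_answer'], row['bad_answer']])
        data[prompt]['pairs'].append((n_responses, n_responses + 1))
        # The most recent preferred answer becomes the SFT target.
        data[prompt]['sft_target'] = row['good_answer']
    return data


# Two rows sharing one question, to exercise the accumulation path.
rows = [
    {'question': 'Q1', 'good_answer': ' A', 'bad_answer': ' B'},
    {'question': 'Q1', 'good_answer': ' C', 'bad_answer': ' D'},
]
data = build_canonical(rows)
entry = data['\n\nHuman: Q1\n\nAssistant:']
```

Here both rows land under one prompt key, with the second pair's indices shifted past the first pair's responses.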