Implementation:Eric mitchell Direct preference optimization Canonical Dataset Format

Knowledge Sources	Direct Preference Optimization
Domains	Data_Engineering, NLP, Preference_Learning
Last Updated	2026-02-08 02:00 GMT

Overview

Pattern documentation for the canonical preference dataset format, with get_hh as the reference implementation.

Description

This pattern document describes the interface that every dataset loader must implement. Three reference implementations demonstrate it: get_hh (Anthropic HH-RLHF), get_shp (Stanford Human Preferences), and get_se (StackExchange). Each converts a different raw data source into the same canonical format.

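The format contract is enforced by an assertion in get_dataset (preference_datasets.py, lines 174-175), which checks that the keys of a per-prompt entry are exactly {responses, pairs, sft_target}. A loader that returns extra keys, or omits one of the three, fails this check.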
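A custom loader can be checked against the same contract before it is wired into training. The following is a minimal sketch, not code from the repository: validate_canonical and REQUIRED_KEYS are hypothetical names, and the index-range check on pairs goes beyond the key check described above.

```python
from typing import Any, Dict

# The three keys every per-prompt entry must have, and no others
REQUIRED_KEYS = {"responses", "pairs", "sft_target"}


def validate_canonical(data: Dict[str, Dict[str, Any]]) -> None:
    """Raise ValueError on the first entry that breaks the canonical format.

    Hypothetical helper for illustration; it is not part of preference_datasets.py.
    """
    for prompt, entry in data.items():
        if set(entry.keys()) != REQUIRED_KEYS:
            raise ValueError(f"unexpected keys for {prompt!r}: {sorted(entry.keys())}")
        if not isinstance(entry["sft_target"], str):
            raise ValueError(f"sft_target must be a str for {prompt!r}")
        n = len(entry["responses"])
        for winner, loser in entry["pairs"]:
            # Both indices must point into 'responses' and must differ
            if not (0 <= winner < n and 0 <= loser < n) or winner == loser:
                raise ValueError(f"invalid pair {(winner, loser)} for {prompt!r}")
```

Calling validate_canonical(get_custom('train')) right after writing a new loader would surface a malformed entry early, with the offending prompt in the error message.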

Usage

Follow this pattern when implementing a custom dataset loader (get_xyz function) for integration into the training pipeline.

Code Reference

Source Location

Repository: direct-preference-optimization
File: preference_datasets.py
Lines: 120-161 (get_hh reference), 85-117 (get_shp), 46-83 (get_se), 174-175 (assertion)

Signature

# Pattern interface: all dataset loaders must match this signature
from typing import Dict, List, Optional, Tuple, Union

def get_xyz(
    split: str,
    silent: bool = False,
    cache_dir: Optional[str] = None,
) -> Dict[str, Dict[str, Union[List[Tuple[int, int]], List[str], str]]]:
    """Load a dataset and return it in canonical format.

    Args:
        split: "train" or "test"
        silent: Suppress progress bars
        cache_dir: Dataset cache directory

    Returns:
        Dict mapping prompt strings to dicts with keys:
            'responses': List[str] - all response texts
            'pairs': List[Tuple[int, int]] - (winner_idx, loser_idx) tuples
            'sft_target': str - best response for SFT
    """

Import

# Reference implementations
from preference_datasets import get_hh, get_shp, get_se

I/O Contract

Inputs

Name	Type	Required	Description
split	str	Yes	"train" or "test"
silent	bool	No	Suppress progress bars (default False)
cache_dir	str	No	HuggingFace dataset cache directory

Outputs

Name	Type	Description
data	Dict[str, Dict]	Keys are prompt strings formatted as "\n\nHuman: ...\n\nAssistant:". Values are dicts with exactly three keys: responses (List[str]), pairs (List[Tuple[int, int]]), sft_target (str)
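Because pairs stores indices rather than text, a consumer has to look each index up in responses to recover the preferred and dispreferred strings. The sketch below shows that lookup under the contract described above; iter_preference_triples is a hypothetical helper name and the sample entry is hand-written, not taken from any dataset.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_preference_triples(
    data: Dict[str, Dict[str, Any]],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (prompt, chosen, rejected) for every pair in a canonical dataset."""
    for prompt, entry in data.items():
        responses = entry["responses"]
        for winner_idx, loser_idx in entry["pairs"]:
            # Each pair indexes into this prompt's own 'responses' list
            yield prompt, responses[winner_idx], responses[loser_idx]


# Hand-written sample for illustration only
sample = {
    "\n\nHuman: What is the capital of France?\n\nAssistant:": {
        "responses": [" Paris is the capital.", " London is the capital."],
        "pairs": [(0, 1)],
        "sft_target": " Paris is the capital.",
    }
}
triples = list(iter_preference_triples(sample))
```

A prompt with several pairs yields one triple per pair, so the number of triples equals the total number of entries across all pairs lists.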

Usage Examples

Reference Implementation (get_hh)

from preference_datasets import get_hh

data = get_hh(split='train', cache_dir='.cache')

# Example entry (illustrative values, not actual dataset content)
prompt = list(data.keys())[0]
# prompt: "\n\nHuman: What is the capital of France?\n\nAssistant:"
entry = data[prompt]
# entry['responses']: [" Paris is the capital.", " London is the capital."]
# entry['pairs']: [(0, 1)]  # response 0 preferred over response 1
# entry['sft_target']: " Paris is the capital."

Custom Dataset Template

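from collections import defaultdict
import datasets
import tqdm

def get_custom(split, silent=False, cache_dir=None):
    """Custom dataset loader following the canonical format."""
    # 'my_org/my_dataset' and its column names are placeholders
    dataset = datasets.load_dataset('my_org/my_dataset', split=split, cache_dir=cache_dir)

    # Inner defaultdict(list) lets rows that share a prompt accumulate
    # instead of overwriting each other
    data = defaultdict(lambda: defaultdict(list))
    for row in tqdm.tqdm(dataset, desc='Processing custom', disable=silent):
        prompt = '\n\nHuman: ' + row['question'] + '\n\nAssistant:'
        # Leading space separates each response from the trailing "Assistant:"
        chosen, rejected = ' ' + row['good_answer'], ' ' + row['bad_answer']
        # Offset pair indices by the responses already stored for this prompt
        n = len(data[prompt]['responses'])
        data[prompt]['pairs'].append((n, n + 1))
        data[prompt]['responses'].extend([chosen, rejected])
        data[prompt]['sft_target'] = chosen

    return data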

Related Pages

Implements Principle

Principle:Eric_mitchell_Direct_preference_optimization_Preference_Data_Format
