
Implementation: Get Data for DPO (LLMBook-zh, llmbook-zh.github.io)

From Leeroopedia


Knowledge Sources

Domains: NLP, Alignment, Data_Engineering
Last Updated: 2026-02-08 00:00 GMT

Overview

A concrete tool from the LLMBook repository for loading and processing Anthropic HH-RLHF preference data for DPO training.

Description

The get_data function loads a preference dataset and applies split_prompt_and_responses_hh to each example. The inner function splits each conversation at the "\n\nAssistant:" delimiter, extracting the shared prompt and separating the chosen and rejected responses. The resulting dataset has three columns: prompt, chosen, and rejected.
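The splitting step can be sketched in pure Python. This is a hypothetical reconstruction mirroring the documented signature, not the repository source; in particular, splitting at the *last* occurrence of the delimiter (the usual convention for multi-turn HH-RLHF conversations) is an assumption here.

```python
# Hypothetical reconstruction of the documented splitting logic.
# Assumption: the split happens at the LAST "\n\nAssistant:" marker,
# so multi-turn history stays in the prompt.
DELIMITER = "\n\nAssistant:"

def split_prompt_and_responses_hh(sample: dict) -> dict:
    """Split full HH-RLHF conversations into a shared prompt and responses."""
    chosen = sample["chosen"]
    # Everything up to and including the last assistant marker is the prompt;
    # chosen and rejected share the same prompt prefix.
    cut = chosen.rfind(DELIMITER) + len(DELIMITER)
    return {
        "prompt": chosen[:cut],
        "chosen": chosen[cut:],
        "rejected": sample["rejected"][cut:],
    }

sample = {
    "chosen": "\n\nHuman: Hi?\n\nAssistant: Hello!",
    "rejected": "\n\nHuman: Hi?\n\nAssistant: Go away.",
}
out = split_prompt_and_responses_hh(sample)
# out["prompt"] ends with "\n\nAssistant:"; the responses keep their
# leading space: " Hello!" and " Go away."
```

In the real pipeline this function would be applied per-example via the datasets library's map, which is how the three-column schema described above is produced.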

Usage

Use this function to prepare the Anthropic/hh-rlhf dataset (or any dataset in the same conversation format) for TRL's DPOTrainer.
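A minimal wiring sketch for feeding the processed dataset to TRL's DPOTrainer follows. The model name, hyperparameters, and config fields are illustrative assumptions, not part of the repository code, and TRL argument names vary across versions (older releases take `tokenizer=` where newer ones take `processing_class=`).

```python
# Sketch only: assumes the trl, transformers, and datasets packages are
# installed; model choice and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

from dpo_training import get_data  # the import documented on this page

model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

train_dataset = get_data("train", "Anthropic/hh-rlhf")

config = DPOConfig(
    output_dir="dpo-out",
    beta=0.1,  # strength of the implicit KL penalty; a common default
    per_device_train_batch_size=2,
)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,  # columns: prompt, chosen, rejected
    processing_class=tokenizer,
)
trainer.train()
```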

Code Reference

Source Location

  • Repository: LLMBook-zh
  • File: code/8.2 DPO实践.py
  • Lines: 36-50

Signature

def get_data(split: str, data_path: str) -> Dataset:
    """
    Loads and processes preference data for DPO.

    Args:
        split: Dataset split (e.g., "train").
        data_path: HuggingFace dataset path (e.g., "Anthropic/hh-rlhf").

    Returns:
        Dataset with columns: prompt, chosen, rejected.
    """

def split_prompt_and_responses_hh(sample: dict) -> dict:
    """
    Inner function that splits a sample at '\\n\\nAssistant:' delimiter.

    Args:
        sample: Dict with 'chosen' and 'rejected' full conversation strings.

    Returns:
        Dict with 'prompt', 'chosen' (response only), 'rejected' (response only).
    """

Import

from dpo_training import get_data

I/O Contract

Inputs

Name       Type  Required  Description
split      str   Yes       Dataset split (e.g., "train")
data_path  str   Yes       HuggingFace dataset path (e.g., "Anthropic/hh-rlhf")

Outputs

Name    Type     Description
return  Dataset  Dataset with columns: prompt, chosen (response only), rejected (response only)

Usage Examples

from dpo_training import get_data

# Load and process Anthropic HH-RLHF data
train_dataset = get_data("train", "Anthropic/hh-rlhf")

# Inspect an example
example = train_dataset[0]
print(f"Prompt: {example['prompt'][:100]}...")
print(f"Chosen: {example['chosen'][:100]}...")
print(f"Rejected: {example['rejected'][:100]}...")

Related Pages

Implements Principle

Requires Environment
