
Implementation: Get Data for DPO (LLMBook-zh, llmbook-zh.github.io)

From Leeroopedia


Knowledge Sources

Domains: NLP, Alignment, Data_Engineering
Last Updated: 2026-02-08 00:00 GMT

Overview

A concrete tool from the LLMBook repository for loading and processing Anthropic HH-RLHF preference data for DPO training.

Description

The get_data function loads a preference dataset and applies split_prompt_and_responses_hh to each example. The inner function splits each conversation at the "\n\nAssistant:" delimiter, extracting the shared prompt and separating the chosen and rejected responses. The resulting dataset has three columns: prompt, chosen, and rejected.
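The splitting step can be sketched in pure Python. This is a hypothetical reconstruction mirroring the documented signature, not the repository source; in particular, splitting at the *last* occurrence of the delimiter (the usual convention for multi-turn HH-RLHF conversations) is an assumption here.

```python
# Hypothetical reconstruction of the documented splitting logic.
# Assumption: the split happens at the LAST "\n\nAssistant:" marker,
# so multi-turn history stays in the prompt.
DELIMITER = "\n\nAssistant:"

def split_prompt_and_responses_hh(sample: dict) -> dict:
    """Split full HH-RLHF conversations into a shared prompt and responses."""
    chosen = sample["chosen"]
    # Everything up to and including the last assistant marker is the prompt;
    # chosen and rejected share the same prompt prefix.
    cut = chosen.rfind(DELIMITER) + len(DELIMITER)
    return {
        "prompt": chosen[:cut],
        "chosen": chosen[cut:],
        "rejected": sample["rejected"][cut:],
    }

sample = {
    "chosen": "\n\nHuman: Hi?\n\nAssistant: Hello!",
    "rejected": "\n\nHuman: Hi?\n\nAssistant: Go away.",
}
out = split_prompt_and_responses_hh(sample)
# out["prompt"] ends with "\n\nAssistant:"; the responses keep their
# leading space: " Hello!" and " Go away."
```

In the real pipeline this function would be applied per-example via the datasets library's map, which is how the three-column schema described above is produced.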

Usage

Use this function to prepare the Anthropic/hh-rlhf dataset (or any dataset in the same conversation format) for TRL's DPOTrainer.
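A minimal wiring sketch for feeding the processed dataset to TRL's DPOTrainer follows. The model name, hyperparameters, and config fields are illustrative assumptions, not part of the repository code, and TRL argument names vary across versions (older releases take `tokenizer=` where newer ones take `processing_class=`).

```python
# Sketch only: assumes the trl, transformers, and datasets packages are
# installed; model choice and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

from dpo_training import get_data  # the import documented on this page

model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

train_dataset = get_data("train", "Anthropic/hh-rlhf")

config = DPOConfig(
    output_dir="dpo-out",
    beta=0.1,  # strength of the implicit KL penalty; a common default
    per_device_train_batch_size=2,
)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,  # columns: prompt, chosen, rejected
    processing_class=tokenizer,
)
trainer.train()
```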

Code Reference

Source Location

  • Repository: LLMBook-zh
  • File: code/8.2 DPO实践.py
  • Lines: 36-50

Signature

def get_data(split: str, data_path: str) -> Dataset:
    """
    Loads and processes preference data for DPO.

    Args:
        split: Dataset split (e.g., "train").
        data_path: HuggingFace dataset path (e.g., "Anthropic/hh-rlhf").

    Returns:
        Dataset with columns: prompt, chosen, rejected.
    """

def split_prompt_and_responses_hh(sample: dict) -> dict:
    """
    Inner function that splits a sample at '\\n\\nAssistant:' delimiter.

    Args:
        sample: Dict with 'chosen' and 'rejected' full conversation strings.

    Returns:
        Dict with 'prompt', 'chosen' (response only), 'rejected' (response only).
    """

Import

from dpo_training import get_data

I/O Contract

Inputs

Name       Type  Required  Description
split      str   Yes       Dataset split (e.g., "train")
data_path  str   Yes       HuggingFace dataset path (e.g., "Anthropic/hh-rlhf")

Outputs

Name    Type     Description
return  Dataset  Dataset with columns: prompt, chosen (response only), rejected (response only)

Usage Examples

from dpo_training import get_data

# Load and process Anthropic HH-RLHF data
train_dataset = get_data("train", "Anthropic/hh-rlhf")

# Inspect an example
example = train_dataset[0]
print(f"Prompt: {example['prompt'][:100]}...")
print(f"Chosen: {example['chosen'][:100]}...")
print(f"Rejected: {example['rejected'][:100]}...")

Related Pages

Implements Principle

Requires Environment
