Implementation:LLMBook zh LLMBook zh github io Get Data DPO
| Knowledge Sources | |
|---|---|
| Domains | NLP, Alignment, Data_Engineering |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete tool, provided by the LLMBook repository, for loading and processing Anthropic HH-RLHF preference data for DPO training.
Description
The get_data function loads a preference dataset and applies split_prompt_and_responses_hh to each example. This inner function splits each conversation at the "\n\nAssistant:" delimiter, extracting the shared prompt and leaving only the response text in the chosen and rejected fields. The resulting dataset has three columns: prompt, chosen, rejected.
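The splitting step can be sketched in plain Python. This is a minimal illustration, not the repository's code: it assumes the split happens at the last occurrence of the delimiter (via str.rfind), that the delimiter stays at the end of the prompt, and that both conversations begin with the same prompt. The sample dict is made up for illustration.

```python
SEARCH_TERM = "\n\nAssistant:"


def split_prompt_and_responses_hh(sample: dict) -> dict:
    """Split an HH-RLHF style sample into prompt, chosen, and rejected."""
    # Assumption: split at the LAST delimiter, so earlier turns stay in the prompt.
    idx = sample["chosen"].rfind(SEARCH_TERM)
    if idx == -1:
        raise ValueError(f"sample does not contain {SEARCH_TERM!r}")
    # Assumption: the prompt keeps the trailing delimiter.
    prompt = sample["chosen"][: idx + len(SEARCH_TERM)]
    return {
        "prompt": prompt,
        "chosen": sample["chosen"][len(prompt):],
        # Assumption: the rejected conversation starts with the same prompt.
        "rejected": sample["rejected"][len(prompt):],
    }


# Hypothetical sample in the HH-RLHF layout (not taken from the dataset).
sample = {
    "chosen": "\n\nHuman: Hi\n\nAssistant: Hello! How can I help?",
    "rejected": "\n\nHuman: Hi\n\nAssistant: Go away.",
}
result = split_prompt_and_responses_hh(sample)
print(result)
```

Splitting at the last delimiter keeps any earlier dialogue turns inside the prompt, so only the final assistant reply differs between chosen and rejected.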
Usage
Use this to prepare the Anthropic/hh-rlhf dataset (or similar) for DPOTrainer.
Code Reference
Source Location
- Repository: LLMBook-zh
- File: code/8.2 DPO实践.py
- Lines: 36-50
Signature
def get_data(split: str, data_path: str) -> Dataset:
    """
    Loads and processes preference data for DPO.

    Args:
        split: Dataset split (e.g., "train").
        data_path: HuggingFace dataset path (e.g., "Anthropic/hh-rlhf").

    Returns:
        Dataset with columns: prompt, chosen, rejected.
    """
    def split_prompt_and_responses_hh(sample: dict) -> dict:
        """
        Inner function that splits a sample at '\\n\\nAssistant:' delimiter.

        Args:
            sample: Dict with 'chosen' and 'rejected' full conversation strings.

        Returns:
            Dict with 'prompt', 'chosen' (response only), 'rejected' (response only).
        """
Import
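# Assumes the source file has been saved as dpo_training.py; the original
# file name (code/8.2 DPO实践.py) cannot be used in an import statement.
from dpo_training import get_data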
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| split | str | Yes | Dataset split (e.g., "train") |
| data_path | str | Yes | HuggingFace dataset path (e.g., "Anthropic/hh-rlhf") |
Outputs
| Name | Type | Description |
|---|---|---|
| return | Dataset | Dataset with columns: prompt, chosen (response), rejected (response) |
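To make the output contract concrete, the following sketch checks that a processed record has the three expected string fields. The helper name check_dpo_record and the sample records are hypothetical and not part of the LLMBook repository.

```python
REQUIRED_COLUMNS = ("prompt", "chosen", "rejected")


def check_dpo_record(record: dict) -> list:
    """Return a list of problems found in one processed record (empty if none)."""
    problems = []
    for column in REQUIRED_COLUMNS:
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], str):
            problems.append(f"column {column} is not a string")
    return problems


# Hypothetical records shaped like the documented output.
good = {"prompt": "\n\nHuman: Hi\n\nAssistant:", "chosen": " Hello!", "rejected": " No."}
bad = {"prompt": "\n\nHuman: Hi\n\nAssistant:", "chosen": " Hello!"}
print(check_dpo_record(good))
print(check_dpo_record(bad))
```

A check like this could be run on a few rows of the returned dataset before passing it to a trainer.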
Usage Examples
from dpo_training import get_data
# Load and process Anthropic HH-RLHF data
train_dataset = get_data("train", "Anthropic/hh-rlhf")
# Inspect an example
example = train_dataset[0]
print(f"Prompt: {example['prompt'][:100]}...")
print(f"Chosen: {example['chosen'][:100]}...")
print(f"Rejected: {example['rejected'][:100]}...")
Related Pages
Implements Principle
Requires Environment