Implementation:NVIDIA NeMo Aligner Preprocess AnthropicHH Data
| Knowledge Sources | |
|---|---|
| Domains | KTO, Data Preprocessing, Preference Learning |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A script that downloads the Anthropic Helpful-Harmless dataset and converts it from paired preference format into the binary feedback format required by KTO (Kahneman-Tversky Optimization) training.
Description
preprocess_anthropichh_data.py processes the Anthropic HH-RLHF dataset for KTO training. The script performs:
- Dataset loading: Downloads the full Anthropic/hh-rlhf dataset from Hugging Face. If the validation split is requested, it falls back to the test split (since Anthropic HH has no validation set).
- Conversation parsing: Each conversation string is split on the \n\nHuman: and \n\nAssistant: delimiters to extract the prompt body and response. The parsed text is formatted using simple templates: Human:\n{body}\nAssistant:\n{response}.
- Preference unpacking: Each paired comparison (chosen and rejected) is unpacked into two separate dictionaries:
  - {"prompt": "...", "response": "...", "preference": "chosen"}
  - {"prompt": "...", "response": "...", "preference": "rejected"}
- Pairs where the chosen and rejected prompts do not match are discarded.
- Output saving: Saves train.jsonl and test.jsonl to the specified output directory, with one JSON object per line.
Note that this script uses a simpler prompt format (Human:\n{body}\nAssistant:\n{response}) compared to the chat-template-based format used in the CAI preprocessing scripts.
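The parsing and unpacking steps described above can be illustrated with a minimal sketch. This is not the script's actual code: the helper names split_conversation and unpack_pair are hypothetical, and treating the final assistant turn as the response (with all earlier turns re-rendered into the prompt) is an assumption about how multi-turn conversations are handled.

```python
import re

ASSISTANT = "\n\nAssistant:"


def split_conversation(text):
    """Hypothetical helper: split an HH conversation into (prompt, response).

    Assumption: the final assistant turn is the response, and every earlier
    turn is re-rendered with the simple "Speaker:\\n{body}\\n" template.
    """
    head, sep, response = text.rpartition(ASSISTANT)
    if not sep:
        raise ValueError("conversation has no assistant turn")
    # The capture group keeps the speaker names in the split result:
    # ["", "Human", " body", "Assistant", " body", ...]
    turns = re.split(r"\n\n(Human|Assistant):", head)
    parts = [
        f"{speaker}:\n{body.strip()}\n"
        for speaker, body in zip(turns[1::2], turns[2::2])
    ]
    prompt = "".join(parts) + "Assistant:\n"
    return prompt, response.strip()


def unpack_pair(chosen, rejected):
    """Hypothetical helper: turn one paired comparison into two KTO records.

    Returns an empty list when the two prompts differ, mirroring the
    "discard mismatched pairs" rule described above.
    """
    c_prompt, c_resp = split_conversation(chosen)
    r_prompt, r_resp = split_conversation(rejected)
    if c_prompt != r_prompt:
        return []
    return [
        {"prompt": c_prompt, "response": c_resp, "preference": "chosen"},
        {"prompt": r_prompt, "response": r_resp, "preference": "rejected"},
    ]
```

Each valid pair therefore contributes two records to the output, one labelled chosen and one labelled rejected, both sharing the same prompt.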
Usage
Use this script when:
- You need to prepare training data for KTO alignment
- You want to convert Anthropic HH paired preferences into binary feedback format
- You are setting up the KTO training pipeline in NeMo Aligner
Code Reference
Source Location
- Repository: NVIDIA_NeMo_Aligner
- File: examples/nlp/data/kto/preprocess_anthropichh_data.py
- Lines: 1-123
Signature
process_hh:
def process_hh(split):
save_dataset_for_kto:
def save_dataset_for_kto(list_of_dicts, split, save_dir):
prepare_args:
def prepare_args():
convert_list_of_dict_to_jsonl:
def convert_list_of_dict_to_jsonl(list_of_dict):
Import
from preprocess_anthropichh_data import process_hh, save_dataset_for_kto
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --output-dir | str | No | Output directory for the generated JSONL files (default: "./") |
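For reference, an argument of this shape can be declared with the standard argparse module. The sketch below only mirrors the table row above; it is an assumption about what prepare_args might look like, not a copy of it, and the function name build_parser and the help text are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for the script's argument setup."""
    parser = argparse.ArgumentParser(
        description="Prepare Anthropic HH data for KTO training."
    )
    parser.add_argument(
        "--output-dir",
        type=str,
        default="./",  # default taken from the Inputs table
        help="Output directory for the generated JSONL files.",
    )
    return parser
```

argparse exposes the flag as args.output_dir, so omitting it on the command line writes the files to the current directory.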
Outputs
| Name | Type | Description |
|---|---|---|
| train.jsonl | JSONL file | Training split with unpacked binary preference labels |
| test.jsonl | JSONL file | Test split with unpacked binary preference labels |
Each output line is a JSON object with the following structure:
{
"prompt": "Human:\nWhat is the meaning of life?\nAssistant:\n",
"response": "The meaning of life is a philosophical question...",
"preference": "chosen"
}
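The object above is shown pretty-printed for readability; in the output files each record occupies a single line. A minimal sketch of writing and sanity-checking such a file follows. The helper names write_jsonl and count_preferences are hypothetical and are not part of the script.

```python
import json
from collections import Counter


def write_jsonl(records, path):
    """Write one JSON object per line (newlines inside strings are escaped)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def count_preferences(path):
    """Count how many records carry each preference label."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                counts[json.loads(line)["preference"]] += 1
    return counts
```

Because every retained pair is unpacked into one chosen and one rejected record, the two counts are expected to be equal for a file produced as described on this page.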
Usage Examples
# Command-line usage:
python preprocess_anthropichh_data.py --output-dir /data/kto_processed
# This produces:
# /data/kto_processed/train.jsonl
# /data/kto_processed/test.jsonl