Implementation:NVIDIA NeMo Aligner Preprocess AnthropicHH Data

Knowledge Sources
Domains: KTO, Data Preprocessing, Preference Learning
Last Updated: 2026-02-08 00:00 GMT

Overview

A script that downloads the Anthropic Helpful-Harmless dataset and converts it from paired preference format into the binary feedback format required by KTO (Kahneman-Tversky Optimization) training.

Description

preprocess_anthropichh_data.py processes the Anthropic HH-RLHF dataset for KTO training. The script performs:

  1. Dataset loading: Downloads the Anthropic/hh-rlhf dataset from Hugging Face. If the validation split is requested, the script uses the test split instead, because Anthropic HH has no validation set.
  2. Conversation parsing: Each conversation string is split on \n\nHuman: and \n\nAssistant: delimiters to extract the prompt body and response. The parsed text is formatted using simple templates: Human:\n{body}\nAssistant:\n{response}.
  3. Preference unpacking: Each paired comparison (chosen and rejected) is unpacked into two separate dictionaries:
    • {"prompt": "...", "response": "...", "preference": "chosen"}
    • {"prompt": "...", "response": "...", "preference": "rejected"}
    Pairs where the chosen and rejected prompts do not match are discarded.
  4. Output saving: Saves train.jsonl and test.jsonl to the specified output directory, with one JSON object per line.
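
Steps 2 and 3 can be illustrated with a minimal sketch. This is not the script's actual code: the helper names parse_conversation and unpack_pair are hypothetical, and the way earlier turns of a multi-turn conversation are folded into the prompt is an assumption made for illustration.

```python
# Illustrative sketch only; function names and multi-turn handling are assumptions.
HUMAN = "\n\nHuman: "
ASSISTANT = "\n\nAssistant: "


def parse_conversation(text):
    """Split one HH conversation string into (prompt, final_response).

    Returns None when a turn does not contain exactly one assistant reply.
    """
    turns = [t for t in text.split(HUMAN) if t]
    if not turns:
        return None
    prompt = ""
    response = None
    for i, turn in enumerate(turns):
        parts = turn.split(ASSISTANT)
        if len(parts) != 2:
            return None  # malformed turn
        body, reply = parts
        prompt += f"Human:\n{body}\nAssistant:\n"
        if i < len(turns) - 1:
            prompt += f"{reply}\n"  # earlier replies stay in the prompt (assumption)
        else:
            response = reply  # the last reply is the labeled response
    return prompt, response


def unpack_pair(chosen_text, rejected_text):
    """Turn one paired comparison into two binary-feedback records."""
    chosen = parse_conversation(chosen_text)
    rejected = parse_conversation(rejected_text)
    if chosen is None or rejected is None:
        return []
    if chosen[0] != rejected[0]:
        return []  # prompts do not match, so the pair is discarded
    prompt = chosen[0]
    return [
        {"prompt": prompt, "response": chosen[1], "preference": "chosen"},
        {"prompt": prompt, "response": rejected[1], "preference": "rejected"},
    ]
```

Each valid pair therefore contributes two records that share one prompt, which is the shape KTO expects: a single response with a binary label rather than a side-by-side comparison.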

Note that this script uses a simpler prompt format (Human:\n{body}\nAssistant:\n{response}) than the chat-template-based format used in the CAI preprocessing scripts.

Usage

Use this script when:

  • You need to prepare training data for KTO alignment
  • You want to convert Anthropic HH paired preferences into binary feedback format
  • You are setting up the KTO training pipeline in NeMo Aligner

Code Reference

Source Location

  • Repository: NVIDIA_NeMo_Aligner
  • File: examples/nlp/data/kto/preprocess_anthropichh_data.py
  • Lines: 1-123

Signature

process_hh:

def process_hh(split):

save_dataset_for_kto:

def save_dataset_for_kto(list_of_dicts, split, save_dir):

prepare_args:

def prepare_args():

convert_list_of_dict_to_jsonl:

def convert_list_of_dict_to_jsonl(list_of_dict):

Import

from preprocess_anthropichh_data import process_hh, save_dataset_for_kto

I/O Contract

Inputs

Name          Type  Required  Description
--output-dir  str   No        Output directory for the generated JSONL files (default: "./")
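
A flag with this name, type, and default could be declared with argparse roughly as follows. This is a sketch under stated assumptions: build_parser is a hypothetical name, and the real prepare_args may be structured differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented --output-dir flag."""
    parser = argparse.ArgumentParser(
        description="Prepare Anthropic HH data for KTO (illustrative sketch)"
    )
    parser.add_argument(
        "--output-dir",
        type=str,
        default="./",
        help="Output directory for the generated JSONL files",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--output-dir", "/data/kto_processed"])
print(args.output_dir)  # prints /data/kto_processed
```

argparse converts the dash in --output-dir to an underscore, so the value is read as args.output_dir.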

Outputs

Name         Type        Description
train.jsonl  JSONL file  Training split with unpacked binary preference labels
test.jsonl   JSONL file  Test split with unpacked binary preference labels

Each output line is a JSON object with the following structure:

{
  "prompt": "Human:\nWhat is the meaning of life?\nAssistant:\n",
  "response": "The meaning of life is a philosophical question...",
  "preference": "chosen"
}
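
The one-object-per-line layout can be written and checked with the standard json module. The sketch below is illustrative: write_jsonl, read_jsonl, and is_valid_record are hypothetical helpers, not functions from the script.

```python
import json

EXPECTED_KEYS = {"prompt", "response", "preference"}
ALLOWED_LABELS = {"chosen", "rejected"}


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def is_valid_record(record):
    """Check that a record has exactly the documented keys and a known label."""
    return set(record) == EXPECTED_KEYS and record["preference"] in ALLOWED_LABELS
```

Running is_valid_record over every line of train.jsonl and test.jsonl is a cheap sanity check before starting KTO training.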

Usage Examples

# Command-line usage:
python preprocess_anthropichh_data.py --output-dir /data/kto_processed

# This produces:
#   /data/kto_processed/train.jsonl
#   /data/kto_processed/test.jsonl
