Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:NVIDIA NeMo Aligner Preprocess HelpSteer2 Data

From Leeroopedia
Revision as of 15:56, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/NVIDIA_NeMo_Aligner_Preprocess_HelpSteer2_Data.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains SteerLM, Data Preprocessing, HelpSteer
Last Updated 2026-02-08 00:00 GMT

Overview

A script that downloads the NVIDIA HelpSteer2 dataset from HuggingFace and converts it into the attribute-conditioned SFT training format, with support for both standard and preference variants.

Description

preprocess_helpsteer2_data.py is an enhanced version of the HelpSteer preprocessing script that supports the HelpSteer2 dataset and its preference variant. Key differences from the HelpSteer preprocessor:

  1. Standard mode: Downloads nvidia/HelpSteer2 and processes it identically to HelpSteer, using the full set of 9 SteerLM attributes (quality, toxicity, humor, creativity, helpfulness, correctness, coherence, complexity, verbosity). Only attributes present in the dataset are included in the label string.
  2. Preference mode (--preference): Downloads the nvidia/HelpSteer2 preference data directory, which contains paired responses (response_1, response_2) with a preference_strength score and an explicit train/validation split field. Each response is converted into a separate training sample with only a quality attribute set to the preference strength value.
  3. Helpfulness-only mode (--only_helpfulness): When enabled, filters labels to include only the helpfulness attribute, useful for simplified reward model training.

The output format is identical to the HelpSteer preprocessor: conversation objects with system prompt, User/Assistant turns, attribute labels, mask, and type fields.

Usage

Use this script when:

  • You need to prepare HelpSteer2 data for SteerLM training
  • You want to train a Bradley-Terry reward model using HelpSteer2 preference data
  • You need helpfulness-only labels for simplified reward modeling

Code Reference

Source Location

  • Repository: NVIDIA_NeMo_Aligner
  • File: examples/nlp/data/steerlm/preprocess_helpsteer2_data.py
  • Lines: 1-124

Signature

download_helpsteer2:

def download_helpsteer2():

download_helpsteer2_preference:

def download_helpsteer2_preference():

format_label:

def format_label(dp, only_helpfulness=False):

process_dataset:

def process_dataset(data, only_helpfulness=False):

main:

def main(output_dir, preference=False, only_helpfulness=False):

Import

from preprocess_helpsteer2_data import download_helpsteer2, download_helpsteer2_preference, process_dataset

I/O Contract

Inputs

Name Type Required Description
-dir / --output_directory str Yes Output folder for train.jsonl and val.jsonl; created if it does not exist
-oh / --only_helpfulness flag No When set, uses only the helpfulness attribute in labels
-pref / --preference flag No When set, uses HelpSteer2-preference data for Bradley-Terry reward modeling instead of regular HelpSteer2

Outputs

Name Type Description
train.jsonl JSONL file Training split in attribute-conditioned SFT format with SteerLM attribute labels
val.jsonl JSONL file Validation split in attribute-conditioned SFT format with SteerLM attribute labels

Usage Examples

# Standard HelpSteer2 preprocessing:
python preprocess_helpsteer2_data.py --output_directory /data/helpsteer2_processed

# HelpSteer2 with helpfulness-only labels:
python preprocess_helpsteer2_data.py --output_directory /data/helpsteer2_helpfulness --only_helpfulness

# HelpSteer2 preference data for Bradley-Terry reward modeling:
python preprocess_helpsteer2_data.py --output_directory /data/helpsteer2_preference --preference

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment