Implementation:NVIDIA NeMo Aligner Build RLHF Datasets

From Leeroopedia


Implementation Details

  • Name: Build_RLHF_Datasets
  • Type: API Doc
  • Implements: RLHF_Prompt_Data_Preparation
  • Repository: NeMo Aligner
  • Primary File: nemo_aligner/data/nlp/builders.py
  • Domains: NLP, Data_Engineering
  • Last Updated: 2026-02-07 00:00 GMT

Overview

A concrete tool, provided by the NeMo Aligner data builders module, for constructing prompt-only datasets for RLHF training.

Description

The build_train_valid_test_rlhf_datasets function is a partial application of the generic build_train_valid_test_datasets factory, specialized to create RLHFDataset instances. The RLHFDataset class handles tokenization and padding of prompt-only JSONL data for PPO and REINFORCE training. Unlike SFT or DPO datasets, this dataset returns only prompts; responses are generated online by the actor model during rollouts.
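The partial-application pattern described above can be illustrated with a minimal, self-contained sketch. The names ToyPromptDataset, build_datasets, and build_toy_prompt_datasets are hypothetical stand-ins invented for this illustration; they are not part of NeMo Aligner, and the real factory takes more arguments.

```python
from functools import partial


class ToyPromptDataset:
    """Hypothetical stand-in for a dataset class such as RLHFDataset."""

    def __init__(self, name, seq_length):
        self.name = name
        self.seq_length = seq_length


def build_datasets(cls, names, seq_length):
    """Generic factory (hypothetical): builds one `cls` instance per split name."""
    return [cls(name=n, seq_length=seq_length) for n in names]


# Bind the dataset class as the first positional argument, mirroring
# partial(build_train_valid_test_datasets, RLHFDataset) in the source.
build_toy_prompt_datasets = partial(build_datasets, ToyPromptDataset)

train_ds, val_ds, test_ds = build_toy_prompt_datasets(
    names=["train", "validation", "test"], seq_length=512
)
print(type(train_ds).__name__, val_ds.name, test_ds.seq_length)
```

Because only the class is bound, callers of the specialized builder pass the remaining arguments exactly as they would to the generic factory.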

Usage

Import this builder when setting up PPO or REINFORCE training data. The same builder and dataset class are shared by both workflows.

Code Reference

Source Location

  • Repository: NeMo Aligner
  • File: nemo_aligner/data/nlp/builders.py (line 392, the partial application), nemo_aligner/data/nlp/datasets.py (RLHFDataset)

Signature

from functools import partial

# Partial application in builders.py
build_train_valid_test_rlhf_datasets = partial(build_train_valid_test_datasets, RLHFDataset)

# RLHFDataset (in datasets.py) returns tokenized prompts:
class RLHFDataset(Dataset):
    def __getitem__(self, idx) -> dict:
        # Returns: {"text": Tensor, "length": int} (prompt tokens only)
        ...
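To make the prompt-only contract concrete, here is a simplified sketch of a dataset that reads JSONL prompts, tokenizes them, and returns a "text"/"length" pair, plus a padding helper for batching. Everything here is an assumption made for illustration: the "text" JSONL field name, the truncation policy, the whitespace toy_tokenize function, and the use of plain lists instead of tensors. It is not the NeMo Aligner implementation.

```python
import json


class ToyPromptOnlyDataset:
    """Hypothetical, simplified stand-in; not the NeMo Aligner RLHFDataset."""

    def __init__(self, jsonl_lines, tokenize, seq_length):
        # Assumption: each line is a JSON object with a "text" field.
        self.prompts = [json.loads(line)["text"] for line in jsonl_lines]
        self.tokenize = tokenize
        self.seq_length = seq_length

    def __len__(self):
        return len(self.prompts)

    def __getitem__(self, idx):
        # Tokenize the prompt only and truncate to the maximum length;
        # there is no response field to read.
        tokens = self.tokenize(self.prompts[idx])[: self.seq_length]
        return {"text": tokens, "length": len(tokens)}


def pad_batch(items, pad_id=0):
    """Right-pad every prompt to the longest prompt in the batch."""
    max_len = max(item["length"] for item in items)
    return [item["text"] + [pad_id] * (max_len - item["length"]) for item in items]


def toy_tokenize(text):
    # Toy whitespace "tokenizer": each word maps to len(word) + 1,
    # which keeps id 0 free for padding.
    return [len(word) + 1 for word in text.split()]


lines = ['{"text": "Explain PPO briefly"}', '{"text": "Hello"}']
ds = ToyPromptOnlyDataset(lines, toy_tokenize, seq_length=8)
batch = pad_batch([ds[0], ds[1]])
print(ds[0], batch)
```

The "length" entry lets downstream code tell real prompt tokens from padding once prompts of different lengths are batched together.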

Import

from nemo_aligner.data.nlp.builders import build_train_valid_test_rlhf_datasets

I/O Contract

Inputs

Name | Type | Required | Description
cfg | DictConfig | Yes | Data config with paths, sequence length
data_prefix | str | Yes | Path to prompt JSONL files
data_impl | str | Yes | Data format (jsonl)
splits_string | str | Yes | Train/val/test split ratios
train_valid_test_num_samples | list | Yes | Number of samples per split
seq_length | int | Yes | Maximum prompt length
seed | int | Yes | Random seed
tokenizer | TokenizerSpec | Yes | Model tokenizer
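As a worked example of how a split string such as "980,10,10" can be read as ratios, the sketch below normalizes the comma-separated weights and converts them into document index ranges. The function split_boundaries is hypothetical, and the rounding scheme is a simplification; whether NeMo Aligner computes its splits in exactly this way is an assumption.

```python
def split_boundaries(splits_string, num_documents):
    """Turn a weight string such as "980,10,10" into document index ranges."""
    weights = [float(w) for w in splits_string.split(",")]
    total = sum(weights)
    fractions = [w / total for w in weights]

    boundaries = [0]
    for frac in fractions:
        boundaries.append(boundaries[-1] + int(round(frac * num_documents)))
    # Absorb rounding drift so the last boundary equals num_documents.
    boundaries[-1] = num_documents
    return fractions, boundaries


fractions, boundaries = split_boundaries("980,10,10", num_documents=2000)
print(fractions)
print(boundaries)
```

With 2000 documents, the weights 980, 10, and 10 normalize to 98%, 1%, and 1%, so the train, validation, and test ranges cover 1960, 20, and 20 documents respectively.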

Outputs

Name | Type | Description
train_ds | RLHFDataset | Training prompt dataset
val_ds | RLHFDataset | Validation prompt dataset
test_ds | RLHFDataset | Test prompt dataset

Usage Examples

from nemo_aligner.data.nlp.builders import build_train_valid_test_rlhf_datasets

# cfg and model are assumed to be defined by the surrounding training script.
train_ds, val_ds, test_ds = build_train_valid_test_rlhf_datasets(
    cfg=cfg.model.data,
    data_prefix=cfg.model.data.data_prefix,
    data_impl="jsonl",
    splits_string="980,10,10",
    train_valid_test_num_samples=[50000, 1000, 1000],
    seq_length=cfg.model.data.seq_length,
    seed=cfg.model.seed,
    tokenizer=model.tokenizer,
)

