Implementation:NVIDIA NeMo Aligner Build RLHF Datasets
| Implementation Details | |
|---|---|
| Name | Build_RLHF_Datasets |
| Type | API Doc |
| Implements | RLHF_Prompt_Data_Preparation |
| Repository | NeMo Aligner |
| Primary File | nemo_aligner/data/nlp/builders.py |
| Domains | NLP, Data_Engineering |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool, provided by the NeMo Aligner data builders module, for constructing prompt-only datasets for RLHF training.
Description
The build_train_valid_test_rlhf_datasets function is a partial application of the generic build_train_valid_test_datasets factory, specialized to create RLHFDataset instances. The RLHFDataset class handles tokenization and padding of prompt-only JSONL data for PPO and REINFORCE training. Unlike SFT or DPO datasets, this dataset returns only prompts -- responses are generated online by the actor model during rollouts.
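The partial-application pattern described above can be sketched in isolation. This is a minimal illustration, not NeMo Aligner code: `ToyPromptDataset` and `build_datasets` are hypothetical stand-ins for `RLHFDataset` and the generic factory, and their signatures are invented for the example.

```python
from functools import partial


class ToyPromptDataset:
    """Hypothetical stand-in for RLHFDataset; not the NeMo Aligner class."""

    def __init__(self, name, prompts):
        self.name = name
        self.prompts = prompts

    def __len__(self):
        return len(self.prompts)


def build_datasets(cls, splits):
    """Hypothetical generic factory: builds one `cls` instance per split."""
    return tuple(cls(name, prompts) for name, prompts in splits.items())


# Binding the first positional argument yields a builder dedicated to one class.
build_prompt_datasets = partial(build_datasets, ToyPromptDataset)

train_ds, val_ds, test_ds = build_prompt_datasets(
    {"train": ["a", "b", "c"], "validation": ["d"], "test": ["e"]}
)
```

Callers of the specialized builder never name the dataset class, so the same factory can back several dataset types with one code path.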
Usage
Import this builder when setting up PPO or REINFORCE training data. Both workflows share the same builder and dataset class.
Code Reference
Source Location
- Repository: NeMo Aligner
- File: nemo_aligner/data/nlp/builders.py (L392, partial application); nemo_aligner/data/nlp/datasets.py (RLHFDataset)
Signature
# Partial application in builders.py
build_train_valid_test_rlhf_datasets = partial(build_train_valid_test_datasets, RLHFDataset)

# RLHFDataset returns tokenized prompts:
class RLHFDataset(Dataset):
    def __getitem__(self, idx) -> dict:
        # Returns: {"text": Tensor, "length": int} (prompt tokens only)
        ...
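To make the sample contract concrete, here is a toy prompt-only dataset that returns the same two keys. It is a sketch under several assumptions: the `"text"` JSONL field name, the whitespace tokenizer, truncation to `seq_length`, padding inside `__getitem__`, and plain lists in place of tensors are all chosen for illustration and are not taken from the NeMo Aligner source.

```python
import json

PAD_ID = 0  # Hypothetical padding id; real tokenizers define their own.


class ToyPromptOnlyDataset:
    """Illustrative only; the real class is RLHFDataset in datasets.py."""

    def __init__(self, jsonl_lines, seq_length):
        # Assumption: each JSONL record stores its prompt under a "text" key.
        self.prompts = [json.loads(line)["text"] for line in jsonl_lines]
        self.seq_length = seq_length
        self.vocab = {}

    def _encode(self, text):
        # Hypothetical whitespace tokenizer; ids start at 1, 0 is padding.
        return [self.vocab.setdefault(tok, len(self.vocab) + 1) for tok in text.split()]

    def __len__(self):
        return len(self.prompts)

    def __getitem__(self, idx):
        # Truncate, then right-pad to seq_length (both are assumptions here).
        ids = self._encode(self.prompts[idx])[: self.seq_length]
        length = len(ids)
        padded = ids + [PAD_ID] * (self.seq_length - length)
        # Only the prompt is returned; no response field exists in the sample.
        return {"text": padded, "length": length}


lines = ['{"text": "hello world"}', '{"text": "hello there general"}']
ds = ToyPromptOnlyDataset(lines, seq_length=4)
sample = ds[0]
```

The absence of any response field is the point: in PPO or REINFORCE the actor produces the continuation during rollout, so the dataset only needs to supply prompts and their unpadded lengths.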
Import
from nemo_aligner.data.nlp.builders import build_train_valid_test_rlhf_datasets
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| cfg | DictConfig | Yes | Data config with paths, sequence length |
| data_prefix | str | Yes | Path to prompt JSONL files |
| data_impl | str | Yes | Data format (jsonl) |
| splits_string | str | Yes | Train/val/test split ratios |
| train_valid_test_num_samples | list | Yes | Number of samples per split |
| seq_length | int | Yes | Maximum prompt length |
| seed | int | Yes | Random seed |
| tokenizer | TokenizerSpec | Yes | Model tokenizer |
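A comma-separated split string such as `"980,10,10"` is most naturally read as relative weights. The helper below is a hypothetical sketch of that reading; `split_fractions` is not a NeMo Aligner function, and the library's own parsing may differ in validation and rounding.

```python
def split_fractions(splits_string):
    """Turn a comma-separated weight string into train/val/test fractions."""
    weights = [float(part) for part in splits_string.split(",")]
    if len(weights) != 3:
        raise ValueError("expected exactly three comma-separated weights")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Normalize so the three fractions sum to 1.
    return [w / total for w in weights]


fractions = split_fractions("980,10,10")
```

Under this reading, `"980,10,10"` assigns 98% of the documents to training and 1% each to validation and test.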
Outputs
| Name | Type | Description |
|---|---|---|
| train_ds | RLHFDataset | Training prompt dataset |
| val_ds | RLHFDataset | Validation prompt dataset |
| test_ds | RLHFDataset | Test prompt dataset |
Usage Examples
from nemo_aligner.data.nlp.builders import build_train_valid_test_rlhf_datasets

# cfg and model are assumed to have been loaded earlier by the training script.
train_ds, val_ds, test_ds = build_train_valid_test_rlhf_datasets(
    cfg=cfg.model.data,
    data_prefix=cfg.model.data.data_prefix,
    data_impl="jsonl",
    splits_string="980,10,10",
    train_valid_test_num_samples=[50000, 1000, 1000],
    seq_length=cfg.model.data.seq_length,
    seed=cfg.model.seed,
    tokenizer=model.tokenizer,
)
Related Pages
- Principle:NVIDIA_NeMo_Aligner_RLHF_Prompt_Data_Preparation
- Environment:NVIDIA_NeMo_Aligner_NeMo_Framework_GPU_Environment