Implementation:NVIDIA NeMo Aligner Generate RL CAI Dataset

Knowledge Sources	NVIDIA_NeMo_Aligner
Domains	Constitutional AI, RLAIF, Dataset Generation
Last Updated	2026-02-08 00:00 GMT

Overview

A script that generates RL-based Constitutional AI preference datasets by producing multiple candidate responses at varying temperatures and using an NGC-hosted judge model to select chosen/rejected pairs.

Description

generate_rl_cai_dataset.py implements the RLAIF (Reinforcement Learning from AI Feedback) variant of Constitutional AI dataset generation. The pipeline consists of three main stages:

Candidate generation: The script reads adversarial red-teaming prompts and generates multiple candidate responses using a local NeMo Megatron inference service at different temperatures (by default, ranging from 0.01 to 2.01 in 0.5 increments). This produces diverse responses spanning from near-deterministic to highly stochastic.
AI preference labeling: Each set of candidate responses is sent to an NGC-hosted judge model (e.g., Mixtral-8x7B) along with the full constitution as a system prompt. The judge model selects the most harmless response as "chosen" and the most harmful as "rejected", outputting a structured JSON decision.
Dataset splitting and blending: The preference dataset is split into configurable train/test splits, formatted with chat prompt templates, and optionally blended with external preference datasets.

The script deviates from the original CAI paper by feeding the entire constitution at once (rather than one principle at a time) and using direct selection (rather than normalized logprobs).

Usage

Use this script when:

You need to generate RLAIF preference datasets for reward model training
You have a local NeMo inference service running for candidate generation
You have an NGC API key for the judge model
You want to create chosen/rejected pairs from red-teaming prompts

Code Reference

Source Location

Repository: NVIDIA_NeMo_Aligner
File: examples/nlp/cai/generate_rl_cai_dataset.py
Lines: 1-693

Signature

generate_cai_rlaif_candidate_dataset:

def generate_cai_rlaif_candidate_dataset(
    batch_size: int,
    temperatures: Union[List, int],
    red_teaming_dataset_path: str,
    inference_config: dict,
    prompt_template_config: dict,
):

generate_ai_preference:

def generate_ai_preference(
    sample: dict,
    ngc_api_key: str,
    system_prompt: str,
    seed: int,
    ngc_url: str,
    ngc_model: str,
):

main:

def main():

Import

from generate_rl_cai_dataset import generate_cai_rlaif_candidate_dataset, generate_ai_preference

I/O Contract

Inputs

Name	Type	Required	Description
--batch-size	`int`	Yes	Inference batch size for candidate generation
--red-teaming-file-path	`str`	Yes	Path to Anthropic red-teaming prompt dataset (JSONL)
--sys-prompt-constitution-file-path	`str`	Yes	Path to constitution text file used as system prompt for the judge
--ngc-api-key	`str`	Yes	NGC API key for the judge model
--ngc-url	`str`	No	NGC API endpoint URL (default: https://integrate.api.nvidia.com/v1/chat/completions)
--ngc-model	`str`	No	NGC model name (default: mistralai/mixtral-8x7b-instruct-v0.1)
--output-dir	`str`	No	Output directory (default: ".")
--output-filename-prefix	`str`	No	Prefix for output filenames (default: "cai_rlaif")
--splits	`str`	No	Dataset split ratios as dict string (default: "{'train': 0.8, 'test': 0.2}")
--shuffle	`str`	No	Whether to shuffle the dataset (default: "True")
--seed	`int`	No	Random seed (default: 1234)
--blend-with	`str`	No	External dataset blending configuration as dict string
--host	`str`	No	Inference service hostname (default: "localhost")
--port	`int`	No	Inference service port (default: 5656)
--temperature	`float`	No	Base sampling temperature (default: 1.0)
--tokens_to_generate	`int`	No	Max tokens to generate (default: 1024)

Outputs

Name	Type	Description
cai_candidate_dataset.json	JSON file	Raw candidate responses at multiple temperatures
cai_preference_dataset.json	JSON file	AI-labeled preference pairs (chosen/rejected with metadata)
{prefix}_{split}_prompts_with_chat_prompt.jsonl	JSONL file	Formatted prompts for each split
{prefix}_{split}_comparisons_with_chat_prompt.jsonl	JSONL file	Formatted chosen/rejected comparisons for each split
blend_*.jsonl	JSONL file	Blended datasets (if --blend-with is specified)

Usage Examples

# Command-line usage:
python generate_rl_cai_dataset.py \
    --batch-size 128 \
    --red-teaming-file-path /data/red_team_attempts.jsonl \
    --sys-prompt-constitution-file-path /data/constitution.txt \
    --ngc-api-key "your-ngc-api-key" \
    --output-dir /output/rlaif \
    --output-filename-prefix cai_rlaif \
    --splits "{'train': 0.8, 'test': 0.2}" \
    --user_format "<extra_id_1>User\n{MESSAGE}\n<extra_id_1>Assistant\n" \
    --assistant_format "{MESSAGE}\n" \
    --system_format "<extra_id_0>System\n{MESSAGE}\n" \
    --system_default_message "" \
    --eos_token "<extra_id_1>" \
    --response_extract_pattern "<extra_id_1>Assistant\n"

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment