Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:NVIDIA NeMo Aligner Generate RL CAI Dataset

From Leeroopedia


Knowledge Sources
Domains Constitutional AI, RLAIF, Dataset Generation
Last Updated 2026-02-08 00:00 GMT

Overview

A script that generates RL-based Constitutional AI preference datasets by producing multiple candidate responses at varying temperatures and using an NGC-hosted judge model to select chosen/rejected pairs.

Description

generate_rl_cai_dataset.py implements the RLAIF (Reinforcement Learning from AI Feedback) variant of Constitutional AI dataset generation. The pipeline consists of three main stages:

  1. Candidate generation: The script reads adversarial red-teaming prompts and generates multiple candidate responses using a local NeMo Megatron inference service at different temperatures (by default, ranging from 0.01 to 2.01 in 0.5 increments). This produces diverse responses spanning from near-deterministic to highly stochastic.
  2. AI preference labeling: Each set of candidate responses is sent to an NGC-hosted judge model (e.g., Mixtral-8x7B) along with the full constitution as a system prompt. The judge model selects the most harmless response as "chosen" and the most harmful as "rejected", outputting a structured JSON decision.
  3. Dataset splitting and blending: The preference dataset is split into configurable train/test splits, formatted with chat prompt templates, and optionally blended with external preference datasets.

The script deviates from the original CAI paper by feeding the entire constitution at once (rather than one principle at a time) and using direct selection (rather than normalized logprobs).

Usage

Use this script when:

  • You need to generate RLAIF preference datasets for reward model training
  • You have a local NeMo inference service running for candidate generation
  • You have an NGC API key for the judge model
  • You want to create chosen/rejected pairs from red-teaming prompts

Code Reference

Source Location

Signature

generate_cai_rlaif_candidate_dataset:

def generate_cai_rlaif_candidate_dataset(
    batch_size: int,
    temperatures: Union[List, int],
    red_teaming_dataset_path: str,
    inference_config: dict,
    prompt_template_config: dict,
):

generate_ai_preference:

def generate_ai_preference(
    sample: dict,
    ngc_api_key: str,
    system_prompt: str,
    seed: int,
    ngc_url: str,
    ngc_model: str,
):

main:

def main():

Import

from generate_rl_cai_dataset import generate_cai_rlaif_candidate_dataset, generate_ai_preference

I/O Contract

Inputs

Name Type Required Description
--batch-size int Yes Inference batch size for candidate generation
--red-teaming-file-path str Yes Path to Anthropic red-teaming prompt dataset (JSONL)
--sys-prompt-constitution-file-path str Yes Path to constitution text file used as system prompt for the judge
--ngc-api-key str Yes NGC API key for the judge model
--ngc-url str No NGC API endpoint URL (default: https://integrate.api.nvidia.com/v1/chat/completions)
--ngc-model str No NGC model name (default: mistralai/mixtral-8x7b-instruct-v0.1)
--output-dir str No Output directory (default: ".")
--output-filename-prefix str No Prefix for output filenames (default: "cai_rlaif")
--splits str No Dataset split ratios as dict string (default: "{'train': 0.8, 'test': 0.2}")
--shuffle str No Whether to shuffle the dataset (default: "True")
--seed int No Random seed (default: 1234)
--blend-with str No External dataset blending configuration as dict string
--host str No Inference service hostname (default: "localhost")
--port int No Inference service port (default: 5656)
--temperature float No Base sampling temperature (default: 1.0)
--tokens_to_generate int No Max tokens to generate (default: 1024)

Outputs

Name Type Description
cai_candidate_dataset.json JSON file Raw candidate responses at multiple temperatures
cai_preference_dataset.json JSON file AI-labeled preference pairs (chosen/rejected with metadata)
{prefix}_{split}_prompts_with_chat_prompt.jsonl JSONL file Formatted prompts for each split
{prefix}_{split}_comparisons_with_chat_prompt.jsonl JSONL file Formatted chosen/rejected comparisons for each split
blend_*.jsonl JSONL file Blended datasets (if --blend-with is specified)

Usage Examples

# Command-line usage:
python generate_rl_cai_dataset.py \
    --batch-size 128 \
    --red-teaming-file-path /data/red_team_attempts.jsonl \
    --sys-prompt-constitution-file-path /data/constitution.txt \
    --ngc-api-key "your-ngc-api-key" \
    --output-dir /output/rlaif \
    --output-filename-prefix cai_rlaif \
    --splits "{'train': 0.8, 'test': 0.2}" \
    --user_format "<extra_id_1>User\n{MESSAGE}\n<extra_id_1>Assistant\n" \
    --assistant_format "{MESSAGE}\n" \
    --system_format "<extra_id_0>System\n{MESSAGE}\n" \
    --system_default_message "" \
    --eos_token "<extra_id_1>" \
    --response_extract_pattern "<extra_id_1>Assistant\n"

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment