Implementation:NVIDIA NeMo Aligner Generate RL CAI Dataset
| Knowledge Sources | |
|---|---|
| Domains | Constitutional AI, RLAIF, Dataset Generation |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A script that generates RL-based Constitutional AI preference datasets by producing multiple candidate responses at varying temperatures and using an NGC-hosted judge model to select chosen/rejected pairs.
Description
generate_rl_cai_dataset.py implements the RLAIF (Reinforcement Learning from AI Feedback) variant of Constitutional AI dataset generation. The pipeline consists of three main stages:
- Candidate generation: The script reads adversarial red-teaming prompts and generates multiple candidate responses using a local NeMo Megatron inference service at different temperatures (by default, ranging from 0.01 to 2.01 in 0.5 increments). This produces diverse responses spanning from near-deterministic to highly stochastic.
- AI preference labeling: Each set of candidate responses is sent to an NGC-hosted judge model (e.g., Mixtral-8x7B) along with the full constitution as a system prompt. The judge model selects the most harmless response as "chosen" and the most harmful as "rejected", outputting a structured JSON decision.
- Dataset splitting and blending: The preference dataset is split into configurable train/test splits, formatted with chat prompt templates, and optionally blended with external preference datasets.
The script deviates from the original CAI paper by feeding the entire constitution at once (rather than one principle at a time) and using direct selection (rather than normalized logprobs).
Usage
Use this script when:
- You need to generate RLAIF preference datasets for reward model training
- You have a local NeMo inference service running for candidate generation
- You have an NGC API key for the judge model
- You want to create chosen/rejected pairs from red-teaming prompts
Code Reference
Source Location
- Repository: NVIDIA_NeMo_Aligner
- File:
examples/nlp/cai/generate_rl_cai_dataset.py - Lines: 1-693
Signature
generate_cai_rlaif_candidate_dataset:
def generate_cai_rlaif_candidate_dataset(
batch_size: int,
temperatures: Union[List, int],
red_teaming_dataset_path: str,
inference_config: dict,
prompt_template_config: dict,
):
generate_ai_preference:
def generate_ai_preference(
sample: dict,
ngc_api_key: str,
system_prompt: str,
seed: int,
ngc_url: str,
ngc_model: str,
):
main:
def main():
Import
from generate_rl_cai_dataset import generate_cai_rlaif_candidate_dataset, generate_ai_preference
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --batch-size | int |
Yes | Inference batch size for candidate generation |
| --red-teaming-file-path | str |
Yes | Path to Anthropic red-teaming prompt dataset (JSONL) |
| --sys-prompt-constitution-file-path | str |
Yes | Path to constitution text file used as system prompt for the judge |
| --ngc-api-key | str |
Yes | NGC API key for the judge model |
| --ngc-url | str |
No | NGC API endpoint URL (default: https://integrate.api.nvidia.com/v1/chat/completions) |
| --ngc-model | str |
No | NGC model name (default: mistralai/mixtral-8x7b-instruct-v0.1) |
| --output-dir | str |
No | Output directory (default: ".") |
| --output-filename-prefix | str |
No | Prefix for output filenames (default: "cai_rlaif") |
| --splits | str |
No | Dataset split ratios as dict string (default: "{'train': 0.8, 'test': 0.2}") |
| --shuffle | str |
No | Whether to shuffle the dataset (default: "True") |
| --seed | int |
No | Random seed (default: 1234) |
| --blend-with | str |
No | External dataset blending configuration as dict string |
| --host | str |
No | Inference service hostname (default: "localhost") |
| --port | int |
No | Inference service port (default: 5656) |
| --temperature | float |
No | Base sampling temperature (default: 1.0) |
| --tokens_to_generate | int |
No | Max tokens to generate (default: 1024) |
Outputs
| Name | Type | Description |
|---|---|---|
| cai_candidate_dataset.json | JSON file | Raw candidate responses at multiple temperatures |
| cai_preference_dataset.json | JSON file | AI-labeled preference pairs (chosen/rejected with metadata) |
| {prefix}_{split}_prompts_with_chat_prompt.jsonl | JSONL file | Formatted prompts for each split |
| {prefix}_{split}_comparisons_with_chat_prompt.jsonl | JSONL file | Formatted chosen/rejected comparisons for each split |
| blend_*.jsonl | JSONL file | Blended datasets (if --blend-with is specified) |
Usage Examples
# Command-line usage:
python generate_rl_cai_dataset.py \
--batch-size 128 \
--red-teaming-file-path /data/red_team_attempts.jsonl \
--sys-prompt-constitution-file-path /data/constitution.txt \
--ngc-api-key "your-ngc-api-key" \
--output-dir /output/rlaif \
--output-filename-prefix cai_rlaif \
--splits "{'train': 0.8, 'test': 0.2}" \
--user_format "<extra_id_1>User\n{MESSAGE}\n<extra_id_1>Assistant\n" \
--assistant_format "{MESSAGE}\n" \
--system_format "<extra_id_0>System\n{MESSAGE}\n" \
--system_default_message "" \
--eos_token "<extra_id_1>" \
--response_extract_pattern "<extra_id_1>Assistant\n"