Implementation: CarperAI Trlx NeMo Scaling Benchmark
| Knowledge Sources | |
|---|---|
| Domains | Benchmarking, Reinforcement_Learning, Megatron |
| Last Updated | 2026-02-07 16:00 GMT |
Overview
Concrete tool for benchmarking PPO training throughput across multiple model scales (1.3B to 66B) using trlx with NeMo's Megatron backend.
Description
The nemo_vs_ds_chat script runs PPO training on the Dahoas/rm-static dataset using NeMo as the trainer backend. It supports six model sizes (1.3B, 6.7B, 13B, 20B, 33B, 66B), selectable via the NEMO_CONFIG environment variable, each with pre-tuned batch sizes, mini-batch sizes, chunk sizes, and unfrozen layer counts. A dummy reward function (which always returns 0.5) is used so the benchmark measures throughput rather than training quality. The script also configures SLURM-based distributed training with automatic rank and device detection.
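The per-scale tuning described above can be sketched as a simple lookup keyed by NEMO_CONFIG. The field names and numeric values below are illustrative assumptions, not the script's actual config keys or tuned values:

```python
import os

# Hypothetical per-scale settings; the real script ships its own tuned values
# for all six sizes (1.3B, 6.7B, 13B, 20B, 33B, 66B).
SCALE_CONFIGS = {
    "1.3B": {"batch_size": 32, "chunk_size": 16, "num_layers_unfrozen": 2},
    "6.7B": {"batch_size": 16, "chunk_size": 8, "num_layers_unfrozen": 2},
    "20B": {"batch_size": 8, "chunk_size": 4, "num_layers_unfrozen": 2},
}


def select_config() -> dict:
    """Pick tuned hyperparameters from the NEMO_CONFIG env var (default 1.3B)."""
    name = os.environ.get("NEMO_CONFIG", "1.3B")
    if name not in SCALE_CONFIGS:
        raise ValueError(
            f"Unknown NEMO_CONFIG {name!r}; expected one of {sorted(SCALE_CONFIGS)}"
        )
    return SCALE_CONFIGS[name]
```

Keeping every scale's settings in one table makes it easy to add a new size without touching the training loop.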
Usage
Use this script to benchmark NeMo PPO training throughput at different model scales. Set the NEMO_CONFIG environment variable to select the model size before launching.
Code Reference
Source Location
- Repository: CarperAI_Trlx
- File: examples/nemo_vs_ds_chat.py
- Lines: 1-202
Signature
def main(hparams: dict = {}) -> None:
"""
Run PPO benchmark with NeMo at a specified model scale.
Reads NEMO_CONFIG env var to select model size:
"1.3B", "6.7B", "13B", "20B", "33B", or "66B".
"""
def reward_fn(samples: List[str], **kwargs) -> List[float]:
"""Dummy reward function returning 0.5 for all samples."""
Import
# CLI usage:
# NEMO_CONFIG=1.3B python examples/nemo_vs_ds_chat.py
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| NEMO_CONFIG | env var | No | Model size: "1.3B", "6.7B", "13B", "20B", "33B", "66B" (default "1.3B") |
| SLURM_PROCID | env var | No | SLURM process rank for multi-node setup |
| hparams | dict | No | Override hyperparameters (for Ray Tune integration) |
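The SLURM_PROCID input above drives rank detection for multi-node runs. A minimal sketch of how such detection typically works; the script's actual logic may differ:

```python
import os


def detect_rank() -> int:
    """Read the SLURM process rank, falling back to 0 for single-process runs.

    srun exports SLURM_PROCID per task; outside SLURM the variable is
    absent, so rank 0 is assumed.
    """
    return int(os.environ.get("SLURM_PROCID", "0"))
```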
Outputs
| Name | Type | Description |
|---|---|---|
| Trained model | NeMoPPOTrainer | PPO-trained model with throughput metrics |
| W&B logs | Metrics | Training metrics logged to Weights & Biases |
Usage Examples
Single-Node and Multi-Node Benchmarks
# Single-node benchmark
NEMO_CONFIG=1.3B python examples/nemo_vs_ds_chat.py
# Multi-node via SLURM
NEMO_CONFIG=20B srun --nodes=4 --gpus-per-node=8 python examples/nemo_vs_ds_chat.py