
Implementation:CarperAI Trlx NeMo Scaling Benchmark

From Leeroopedia


Knowledge Sources
Domains Benchmarking, Reinforcement_Learning, Megatron
Last Updated 2026-02-07 16:00 GMT

Overview

A concrete tool for benchmarking PPO training throughput across multiple model scales (1.3B to 66B) using trlx with NeMo's Megatron backend.

Description

The nemo_vs_ds_chat script runs PPO training on the Dahoas/rm-static dataset using NeMo as the trainer backend. It supports six model sizes (1.3B, 6.7B, 13B, 20B, 33B, 66B), selectable via the NEMO_CONFIG environment variable, each with pre-tuned batch sizes, mini-batch sizes, chunk sizes, and unfrozen layer counts. A dummy reward function (always returning 0.5) is used so the benchmark measures throughput rather than training quality. The script also configures SLURM-based distributed training with automatic rank/device detection.
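The size-selection and dummy-reward pattern described above can be sketched as follows. The per-size values below are illustrative placeholders, not the script's actual pre-tuned numbers, and the helper name select_config is hypothetical:

```python
import os
from typing import List

# Hypothetical per-size settings; the real script ships pre-tuned
# batch, mini-batch, chunk, and unfrozen-layer values for each size.
MEGATRON_CONFIGS = {
    "1.3B": {"batch_size": 512, "minibatch_size": 32, "num_layers_unfrozen": 2},
    "6.7B": {"batch_size": 256, "minibatch_size": 16, "num_layers_unfrozen": 2},
    # ... entries for "13B", "20B", "33B", "66B"
}

def select_config() -> dict:
    """Pick a model-size config from the NEMO_CONFIG env var (default "1.3B")."""
    size = os.environ.get("NEMO_CONFIG", "1.3B")
    if size not in MEGATRON_CONFIGS:
        raise ValueError(f"Unknown NEMO_CONFIG {size!r}")
    return MEGATRON_CONFIGS[size]

def reward_fn(samples: List[str], **kwargs) -> List[float]:
    """Constant reward: the benchmark measures throughput, not quality."""
    return [0.5] * len(samples)
```

Because the reward is constant, any throughput differences between sizes come purely from the model and parallelism configuration, not from reward-model latency.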

Usage

Use this script to benchmark NeMo PPO training throughput at different model scales. Set the NEMO_CONFIG environment variable to select the model size before launching.

Code Reference

Source Location

Signature

def main(hparams: dict = {}) -> None:
    """
    Run PPO benchmark with NeMo at a specified model scale.

    Reads NEMO_CONFIG env var to select model size:
    "1.3B", "6.7B", "13B", "20B", "33B", or "66B".
    """

def reward_fn(samples: List[str], **kwargs) -> List[float]:
    """Dummy reward function returning 0.5 for all samples."""

Import

# CLI usage:
# NEMO_CONFIG=1.3B python examples/nemo_vs_ds_chat.py

I/O Contract

Inputs

Name Type Required Description
NEMO_CONFIG env var No Model size: "1.3B", "6.7B", "13B", "20B", "33B", "66B" (default "1.3B")
SLURM_PROCID env var No SLURM process rank for multi-node setup
hparams dict No Override hyperparameters (for Ray Tune integration)
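A minimal sketch of the SLURM rank/device detection implied by the inputs above, assuming standard SLURM environment variables (SLURM_PROCID, SLURM_NTASKS, SLURM_GPUS_ON_NODE); the exact detection logic in the script may differ:

```python
import os

def detect_distributed() -> tuple:
    """Derive (rank, world_size, local_rank) from standard SLURM env vars.

    Falls back to a single-process setup when not launched under srun.
    """
    rank = int(os.environ.get("SLURM_PROCID", 0))
    world_size = int(os.environ.get("SLURM_NTASKS", 1))
    gpus_per_node = int(os.environ.get("SLURM_GPUS_ON_NODE", 1))
    local_rank = rank % gpus_per_node  # which GPU on this node to bind to
    return rank, world_size, local_rank
```

Under `srun --nodes=4 --gpus-per-node=8`, each of the 32 tasks would see a distinct SLURM_PROCID and bind to local GPU `rank % 8`.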

Outputs

Name Type Description
Trained model NeMoPPOTrainer PPO-trained model with throughput metrics
W&B logs Metrics Training metrics logged to Weights & Biases

Usage Examples

Benchmark 1.3B Model

# Single-node benchmark
NEMO_CONFIG=1.3B python examples/nemo_vs_ds_chat.py

# Multi-node via SLURM
NEMO_CONFIG=20B srun --nodes=4 --gpus-per-node=8 python examples/nemo_vs_ds_chat.py
