Implementation: CarperAI Trlx NeMo Scaling Benchmark
| Knowledge Sources | |
|---|---|
| Domains | Benchmarking, Reinforcement_Learning, Megatron |
| Last Updated | 2026-02-07 16:00 GMT |
Overview
Concrete tool for benchmarking PPO training throughput across multiple model scales (1.3B to 66B) using trlx with NeMo's Megatron backend.
Description
The nemo_vs_ds_chat script runs PPO training on the Dahoas/rm-static dataset using NeMo as the trainer backend. It supports six model sizes (1.3B, 6.7B, 13B, 20B, 33B, 66B), selectable via the NEMO_CONFIG environment variable, each with pre-tuned batch sizes, mini-batch sizes, chunk sizes, and unfrozen layer counts. A dummy reward function (which always returns 0.5) is used so the benchmark measures throughput rather than training quality. The script also configures SLURM-based distributed training with automatic rank and device detection.
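The per-scale tuning described above can be sketched as a simple lookup keyed by NEMO_CONFIG. The field names and numeric values below are illustrative assumptions, not the script's actual config keys or tuned values:

```python
import os

# Hypothetical per-scale settings; the real script ships its own tuned values
# for all six sizes (1.3B, 6.7B, 13B, 20B, 33B, 66B).
SCALE_CONFIGS = {
    "1.3B": {"batch_size": 32, "chunk_size": 16, "num_layers_unfrozen": 2},
    "6.7B": {"batch_size": 16, "chunk_size": 8, "num_layers_unfrozen": 2},
    "20B": {"batch_size": 8, "chunk_size": 4, "num_layers_unfrozen": 2},
}


def select_config() -> dict:
    """Pick tuned hyperparameters from the NEMO_CONFIG env var (default 1.3B)."""
    name = os.environ.get("NEMO_CONFIG", "1.3B")
    if name not in SCALE_CONFIGS:
        raise ValueError(
            f"Unknown NEMO_CONFIG {name!r}; expected one of {sorted(SCALE_CONFIGS)}"
        )
    return SCALE_CONFIGS[name]
```

Keeping every scale's settings in one table makes it easy to add a new size without touching the training loop.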
Usage
Use this script to benchmark NeMo PPO training throughput at different model scales. Set the NEMO_CONFIG environment variable to select the model size before launching.
Code Reference
Source Location
- Repository: CarperAI_Trlx
- File: examples/nemo_vs_ds_chat.py
- Lines: 1-202
Signature
def main(hparams: dict = {}) -> None:
"""
Run PPO benchmark with NeMo at a specified model scale.
Reads NEMO_CONFIG env var to select model size:
"1.3B", "6.7B", "13B", "20B", "33B", or "66B".
"""
def reward_fn(samples: List[str], **kwargs) -> List[float]:
"""Dummy reward function returning 0.5 for all samples."""
Import
# CLI usage:
# NEMO_CONFIG=1.3B python examples/nemo_vs_ds_chat.py
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| NEMO_CONFIG | env var | No | Model size: "1.3B", "6.7B", "13B", "20B", "33B", "66B" (default "1.3B") |
| SLURM_PROCID | env var | No | SLURM process rank for multi-node setup |
| hparams | dict | No | Override hyperparameters (for Ray Tune integration) |
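The SLURM_PROCID input above drives rank detection for multi-node runs. A minimal sketch of how such detection typically works; the script's actual logic may differ:

```python
import os


def detect_rank() -> int:
    """Read the SLURM process rank, falling back to 0 for single-process runs.

    srun exports SLURM_PROCID per task; outside SLURM the variable is
    absent, so rank 0 is assumed.
    """
    return int(os.environ.get("SLURM_PROCID", "0"))
```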
Outputs
| Name | Type | Description |
|---|---|---|
| Trained model | NeMoPPOTrainer | PPO-trained model with throughput metrics |
| W&B logs | Metrics | Training metrics logged to Weights & Biases |
Usage Examples
Single-Node and Multi-Node Benchmarks
# Single-node benchmark
NEMO_CONFIG=1.3B python examples/nemo_vs_ds_chat.py
# Multi-node via SLURM
NEMO_CONFIG=20B srun --nodes=4 --gpus-per-node=8 python examples/nemo_vs_ds_chat.py