Environment:CarperAI Trlx NeMo Megatron
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Distributed_Training, NLP |
| Last Updated | 2026-02-07 18:00 GMT |
Overview
NVIDIA NeMo Toolkit r1.15.0 with Megatron-LM and Apex for large-scale model-parallel RLHF training on multi-node GPU clusters.
Description
This environment provides the runtime context for NeMo-based trainers and models in trlx. It extends the base Python/Accelerate environment with NVIDIA's NeMo Toolkit for Megatron-style model parallelism (tensor parallel, pipeline parallel), NVIDIA Apex for fused kernels and mixed-precision training, and the Megatron batch sampler for distributed data loading. NeMo trainers are registered as optional backends; if NeMo is not installed, stub trainers are registered in their place that raise ImportError when invoked.
Usage
Use this environment when running NeMo-based PPO, ILQL, or SFT training at scale (1.3B to 65B+ parameters) with tensor and pipeline parallelism. It is required for all NeMo model variants (NeMoPPOModel, NeMoILQLModel, NeMoSFTModel) and their corresponding trainers. This is a separate environment from the Accelerate-based stack and typically runs on HPC clusters with SLURM.
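For orientation, the shape of a NeMo-backed trlx run configuration can be sketched as a plain dictionary. Only `train.trainer` and `train.trainer_kwargs.pretrained_model` are documented on this page; everything else here is an illustrative assumption, not trlx's actual config schema:

```python
# Sketch of a trlx config selecting a NeMo trainer (illustrative, not the real schema).
def make_nemo_train_config(pretrained_dir: str, trainer: str = "NeMoILQLTrainer") -> dict:
    return {
        "train": {
            # One of: NeMoPPOTrainer / NeMoILQLTrainer / NeMoSFTTrainer
            "trainer": trainer,
            "trainer_kwargs": {
                # Path to an extracted (un-tarred) .nemo checkpoint directory
                "pretrained_model": pretrained_dir,
            },
        },
    }

cfg = make_nemo_train_config("/ckpts/gpt-1.3b-extracted")
print(cfg["train"]["trainer"])  # → NeMoILQLTrainer
```

The key point is that trainer selection is by registered name, which is why the stub registration described under Code Evidence works at all.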
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | HPC cluster with SLURM recommended |
| Python | 3.9 - 3.10 | NeMo r1.15.0 compatibility |
| Hardware | Multi-GPU NVIDIA (A100/H100 recommended) | Tensor parallelism requires NVLink |
| CUDA | 11.7 - 11.8 | Required by Apex and NeMo |
| Disk | 100GB+ | Large model checkpoints in .nemo format |
Dependencies
System Packages
- CUDA Toolkit 11.7 or 11.8
- NVIDIA drivers compatible with CUDA 11.x
- NCCL for multi-node communication
Python Packages (Core)
- `nemo_toolkit[all]` at tag `r1.15.0` (NVIDIA NeMo; `r1.15.0` is a git tag, not a pip version, so install from source as shown below)
- `apex` (NVIDIA Apex with CUDA extensions)
- `torch` >= 1.13.0 (with CUDA)
- `transformers` >= 4.27.1
- `einops` >= 0.4.1
- `wandb` >= 0.13.5
- `omegaconf` (NeMo configuration)
- `pytorch-lightning` (NeMo training backend)
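The pinned minimums above can be sanity-checked with a small version comparison. This helper is a generic sketch (not part of trlx) that handles plain dotted version strings only, not pre-release suffixes:

```python
def meets_minimum(installed: str, required: str) -> bool:
    """True if a dotted version string satisfies a minimum, e.g. 1.13.1 >= 1.13.0.

    Assumes purely numeric dot-separated versions (no rc/dev suffixes).
    """
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(installed) >= parse(required)

# Minimums from the dependency list above
assert meets_minimum("1.13.1", "1.13.0")    # torch
assert meets_minimum("4.28.0", "4.27.1")    # transformers
assert not meets_minimum("0.3.9", "0.4.1")  # einops too old
```

For anything beyond simple pins (e.g. `4.27.1.dev0`), prefer `packaging.version.parse`.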
Build from Source (Required)
Apex must be built from source with CUDA extensions:
git clone https://github.com/NVIDIA/apex/
cd apex
pip install -v --disable-pip-version-check --no-cache-dir \
--global-option="--cpp_ext" \
--global-option="--cuda_ext" \
--global-option="--fast_layer_norm" \
--global-option="--distributed_adam" \
--global-option="--deprecated_fused_adam" ./
NeMo must be installed from source at the correct version:
git clone https://github.com/NVIDIA/NeMo/
cd NeMo
git checkout r1.15.0
pip install '.[all]'
Credentials
- `WANDB_API_KEY`: Weights & Biases API key for experiment tracking
- `HF_TOKEN`: HuggingFace API token for accessing gated models and checkpoints
Distributed Training Variables
- `WORLD_SIZE`: Total number of processes (set by SLURM/torchrun)
- `LOCAL_RANK`: Local GPU rank (auto-set)
- `RANK`: Global process rank (auto-set)
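These variables are typically read at process start-up to bind each process to its GPU. A minimal, launcher-agnostic sketch (assumes the three variables above are exported by SLURM or torchrun; the function name is illustrative):

```python
import os

def read_dist_env(env=os.environ) -> dict:
    """Read WORLD_SIZE / RANK / LOCAL_RANK, defaulting to single-process values."""
    return {
        "world_size": int(env.get("WORLD_SIZE", "1")),
        "rank": int(env.get("RANK", "0")),
        "local_rank": int(env.get("LOCAL_RANK", "0")),
    }

# With torchrun --nproc-per-node=8 on two nodes, the 10th process would see:
info = read_dist_env({"WORLD_SIZE": "16", "RANK": "9", "LOCAL_RANK": "1"})
print(info["local_rank"])  # → 1
```

`LOCAL_RANK` is what gets passed to `torch.cuda.set_device` so each process claims a distinct GPU on its node.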
Quick Install
# 1. Create conda environment
conda env create -f env.yaml
# 2. Install NeMo r1.15.0
git clone https://github.com/NVIDIA/NeMo/ && cd NeMo
git checkout r1.15.0
pip install '.[all]'
cd ..
# 3. Build Apex from source
git clone https://github.com/NVIDIA/apex/ && cd apex
pip install -v --disable-pip-version-check --no-cache-dir \
--global-option="--cpp_ext" --global-option="--cuda_ext" \
--global-option="--fast_layer_norm" --global-option="--distributed_adam" \
--global-option="--deprecated_fused_adam" ./
# 4. Install trlx
pip install git+https://github.com/CarperAI/trlx.git
Code Evidence
NeMo optional import handling from `trlx/utils/loading.py:14-28`:
try:
    from trlx.trainer.nemo_ilql_trainer import NeMoILQLTrainer
    from trlx.trainer.nemo_ppo_trainer import NeMoPPOTrainer
    from trlx.trainer.nemo_sft_trainer import NeMoSFTTrainer
except ImportError:

    def _trainers_unavailble(names: List[str]):
        def log_error(*args, **kwargs):
            raise ImportError("NeMo is not installed.")

        for name in names:
            register_trainer(name)(log_error)
NeMo model imports from `trlx/models/modeling_nemo_ppo.py:13-30`:
from apex.transformer import parallel_state, tensor_parallel
from apex.transformer.pipeline_parallel.utils import _reconfigure_microbatch_calculator
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.collections.nlp.models.language_modeling.megatron_base_model import MegatronBaseModel
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: NeMo is not installed` | NeMo toolkit not present | Install NeMo r1.15.0 from source (see Quick Install) |
| `ModuleNotFoundError: apex` | NVIDIA Apex not built | Build Apex from source with CUDA extensions |
| `megatron_legacy` config warning | Older NeMo checkpoint format | Set `megatron_legacy: True` in model config |
Compatibility Notes
- Version pinned: Only NeMo `r1.15.0` is supported. Later versions may have breaking API changes.
- Separate environment: NeMo and Apex have strict version requirements that may conflict with the base Accelerate environment. A dedicated conda/virtual environment is recommended.
- Pretrained models: NeMo `.nemo` checkpoints must be un-tarred before use. Set `train.trainer_kwargs.pretrained_model` to the extracted directory path.
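On the last point: a `.nemo` file is a tar archive, so it can be extracted with Python's standard library. A generic sketch (the directory layout inside the archive depends on the NeMo version that produced it):

```python
import tarfile
from pathlib import Path

def extract_nemo(nemo_path: str, out_dir: str) -> str:
    """Un-tar a .nemo checkpoint so train.trainer_kwargs.pretrained_model
    can point at the extracted directory."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile auto-detect compression
    with tarfile.open(nemo_path, "r:*") as tar:
        tar.extractall(out)
    return str(out)
```

Usage: `extract_nemo("megatron_gpt.nemo", "/ckpts/megatron_gpt")`, then set `train.trainer_kwargs.pretrained_model: /ckpts/megatron_gpt` in the training config.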
Related Pages
- Implementation:CarperAI_Trlx_NeMo_PPO_Model
- Implementation:CarperAI_Trlx_NeMo_ILQL_Model
- Implementation:CarperAI_Trlx_NeMo_SFT_Model
- Implementation:CarperAI_Trlx_NeMo_PPO_Trainer
- Implementation:CarperAI_Trlx_NeMo_ILQL_Trainer
- Implementation:CarperAI_Trlx_NeMo_SFT_Trainer
- Implementation:CarperAI_Trlx_Convert_LLaMA_To_NeMo
- Implementation:CarperAI_Trlx_NeMo_Scaling_Benchmark