Environment:CarperAI Trlx NeMo Megatron
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Distributed_Training, NLP |
| Last Updated | 2026-02-07 18:00 GMT |
Overview
NVIDIA NeMo Toolkit r1.15.0 with Megatron-LM and Apex for large-scale model-parallel RLHF training on multi-node GPU clusters.
Description
This environment provides the runtime context for NeMo-based trainers and models in trlx. It extends the base Python/Accelerate environment with NVIDIA's NeMo Toolkit for Megatron-style model parallelism (tensor parallel, pipeline parallel), NVIDIA Apex for fused kernels and mixed-precision training, and the Megatron batch sampler for distributed data loading. NeMo trainers are registered as optional backends; if NeMo is not installed, stub trainers are registered in their place that raise ImportError when invoked.
Usage
Use this environment when running NeMo-based PPO, ILQL, or SFT training at scale (1.3B to 65B+ parameters) with tensor and pipeline parallelism. It is required for all NeMo model variants (NeMoPPOModel, NeMoILQLModel, NeMoSFTModel) and their corresponding trainers. This is a separate environment from the Accelerate-based stack and typically runs on HPC clusters with SLURM.
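For orientation, the shape of a NeMo-backed trlx run configuration can be sketched as a plain dictionary. Only `train.trainer` and `train.trainer_kwargs.pretrained_model` are documented on this page; everything else here is an illustrative assumption, not trlx's actual config schema:

```python
# Sketch of a trlx config selecting a NeMo trainer (illustrative, not the real schema).
def make_nemo_train_config(pretrained_dir: str, trainer: str = "NeMoILQLTrainer") -> dict:
    return {
        "train": {
            # One of: NeMoPPOTrainer / NeMoILQLTrainer / NeMoSFTTrainer
            "trainer": trainer,
            "trainer_kwargs": {
                # Path to an extracted (un-tarred) .nemo checkpoint directory
                "pretrained_model": pretrained_dir,
            },
        },
    }

cfg = make_nemo_train_config("/ckpts/gpt-1.3b-extracted")
print(cfg["train"]["trainer"])  # → NeMoILQLTrainer
```

The key point is that trainer selection is by registered name, which is why the stub registration described under Code Evidence works at all.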
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | HPC cluster with SLURM recommended |
| Python | 3.9 - 3.10 | NeMo r1.15.0 compatibility |
| Hardware | Multi-GPU NVIDIA (A100/H100 recommended) | Tensor parallelism requires NVLink |
| CUDA | 11.7 - 11.8 | Required by Apex and NeMo |
| Disk | 100GB+ | Large model checkpoints in .nemo format |
Dependencies
System Packages
- CUDA Toolkit 11.7 or 11.8
- NVIDIA drivers compatible with CUDA 11.x
- NCCL for multi-node communication
Python Packages (Core)
- `nemo_toolkit[all]` at tag `r1.15.0` (NVIDIA NeMo; `r1.15.0` is a git tag, not a pip version, so install from source as shown below)
- `apex` (NVIDIA Apex with CUDA extensions)
- `torch` >= 1.13.0 (with CUDA)
- `transformers` >= 4.27.1
- `einops` >= 0.4.1
- `wandb` >= 0.13.5
- `omegaconf` (NeMo configuration)
- `pytorch-lightning` (NeMo training backend)
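The pinned minimums above can be sanity-checked with a small version comparison. This helper is a generic sketch (not part of trlx) that handles plain dotted version strings only, not pre-release suffixes:

```python
def meets_minimum(installed: str, required: str) -> bool:
    """True if a dotted version string satisfies a minimum, e.g. 1.13.1 >= 1.13.0.

    Assumes purely numeric dot-separated versions (no rc/dev suffixes).
    """
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(installed) >= parse(required)

# Minimums from the dependency list above
assert meets_minimum("1.13.1", "1.13.0")    # torch
assert meets_minimum("4.28.0", "4.27.1")    # transformers
assert not meets_minimum("0.3.9", "0.4.1")  # einops too old
```

For anything beyond simple pins (e.g. `4.27.1.dev0`), prefer `packaging.version.parse`.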
Build from Source (Required)
Apex must be built from source with CUDA extensions:
git clone https://github.com/NVIDIA/apex/
cd apex
pip install -v --disable-pip-version-check --no-cache-dir \
--global-option="--cpp_ext" \
--global-option="--cuda_ext" \
--global-option="--fast_layer_norm" \
--global-option="--distributed_adam" \
--global-option="--deprecated_fused_adam" ./
NeMo must be installed from source at the correct version:
git clone https://github.com/NVIDIA/NeMo/
cd NeMo
git checkout r1.15.0
pip install '.[all]'
Credentials
- `WANDB_API_KEY`: Weights & Biases API key for experiment tracking
- `HF_TOKEN`: HuggingFace API token for accessing gated models and checkpoints
Distributed Training Variables
- `WORLD_SIZE`: Total number of processes (set by SLURM/torchrun)
- `LOCAL_RANK`: Local GPU rank (auto-set)
- `RANK`: Global process rank (auto-set)
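These variables are typically read at process start-up to bind each process to its GPU. A minimal, launcher-agnostic sketch (assumes the three variables above are exported by SLURM or torchrun; the function name is illustrative):

```python
import os

def read_dist_env(env=os.environ) -> dict:
    """Read WORLD_SIZE / RANK / LOCAL_RANK, defaulting to single-process values."""
    return {
        "world_size": int(env.get("WORLD_SIZE", "1")),
        "rank": int(env.get("RANK", "0")),
        "local_rank": int(env.get("LOCAL_RANK", "0")),
    }

# With torchrun --nproc-per-node=8 on two nodes, the 10th process would see:
info = read_dist_env({"WORLD_SIZE": "16", "RANK": "9", "LOCAL_RANK": "1"})
print(info["local_rank"])  # → 1
```

`LOCAL_RANK` is what gets passed to `torch.cuda.set_device` so each process claims a distinct GPU on its node.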
Quick Install
# 1. Create conda environment
conda env create -f env.yaml
# 2. Install NeMo r1.15.0
git clone https://github.com/NVIDIA/NeMo/ && cd NeMo
git checkout r1.15.0
pip install '.[all]'
cd ..
# 3. Build Apex from source
git clone https://github.com/NVIDIA/apex/ && cd apex
pip install -v --disable-pip-version-check --no-cache-dir \
--global-option="--cpp_ext" --global-option="--cuda_ext" \
--global-option="--fast_layer_norm" --global-option="--distributed_adam" \
--global-option="--deprecated_fused_adam" ./
# 4. Install trlx
pip install git+https://github.com/CarperAI/trlx.git
Code Evidence
NeMo optional import handling from `trlx/utils/loading.py:14-28`:
try:
    from trlx.trainer.nemo_ilql_trainer import NeMoILQLTrainer
    from trlx.trainer.nemo_ppo_trainer import NeMoPPOTrainer
    from trlx.trainer.nemo_sft_trainer import NeMoSFTTrainer
except ImportError:

    def _trainers_unavailble(names: List[str]):
        def log_error(*args, **kwargs):
            raise ImportError("NeMo is not installed.")

        for name in names:
            register_trainer(name)(log_error)
NeMo model imports from `trlx/models/modeling_nemo_ppo.py:13-30`:
from apex.transformer import parallel_state, tensor_parallel
from apex.transformer.pipeline_parallel.utils import _reconfigure_microbatch_calculator
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.collections.nlp.models.language_modeling.megatron_base_model import MegatronBaseModel
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: NeMo is not installed` | NeMo toolkit not present | Install NeMo r1.15.0 from source (see Quick Install) |
| `ModuleNotFoundError: apex` | NVIDIA Apex not built | Build Apex from source with CUDA extensions |
| `megatron_legacy` config warning | Older NeMo checkpoint format | Set `megatron_legacy: True` in model config |
Compatibility Notes
- Version pinned: Only NeMo `r1.15.0` is supported. Later versions may have breaking API changes.
- Separate environment: NeMo and Apex have strict version requirements that may conflict with the base Accelerate environment. A dedicated conda/virtual environment is recommended.
- Pretrained models: NeMo `.nemo` checkpoints must be un-tarred before use. Set `train.trainer_kwargs.pretrained_model` to the extracted directory path.
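On the last point: a `.nemo` file is a tar archive, so it can be extracted with Python's standard library. A generic sketch (the directory layout inside the archive depends on the NeMo version that produced it):

```python
import tarfile
from pathlib import Path

def extract_nemo(nemo_path: str, out_dir: str) -> str:
    """Un-tar a .nemo checkpoint so train.trainer_kwargs.pretrained_model
    can point at the extracted directory."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile auto-detect compression
    with tarfile.open(nemo_path, "r:*") as tar:
        tar.extractall(out)
    return str(out)
```

Usage: `extract_nemo("megatron_gpt.nemo", "/ckpts/megatron_gpt")`, then set `train.trainer_kwargs.pretrained_model: /ckpts/megatron_gpt` in the training config.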
Related Pages
- Implementation:CarperAI_Trlx_NeMo_PPO_Model
- Implementation:CarperAI_Trlx_NeMo_ILQL_Model
- Implementation:CarperAI_Trlx_NeMo_SFT_Model
- Implementation:CarperAI_Trlx_NeMo_PPO_Trainer
- Implementation:CarperAI_Trlx_NeMo_ILQL_Trainer
- Implementation:CarperAI_Trlx_NeMo_SFT_Trainer
- Implementation:CarperAI_Trlx_Convert_LLaMA_To_NeMo
- Implementation:CarperAI_Trlx_NeMo_Scaling_Benchmark