# Heuristic: NVIDIA NeMo Aligner PPO NCCL Algorithm Setting
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Training, Debugging, PPO |
| Last Updated | 2026-02-07 22:00 GMT |
## Overview
Critical stability fix requiring `export NCCL_ALGO=Tree` when running PPO training to prevent NCCL communication hangs and deadlocks.
## Description
PPO training in NeMo-Aligner involves complex multi-process communication between actor, critic, and reward model servers. The default NCCL algorithm selection can cause deadlocks or hangs during collective operations (all-reduce, broadcast) in this multi-server setup. Setting `NCCL_ALGO=Tree` forces NCCL to use the tree-based algorithm for all collective operations, which is more stable in the presence of the mixed communication patterns that PPO requires.
## Usage
Apply this setting whenever running PPO or REINFORCE training. It is a required stability fix documented in the official RLHF user guide and the CHANGELOG. Without it, training may hang indefinitely during distributed communication.
## The Insight (Rule of Thumb)
- Action: Add `export NCCL_ALGO=Tree` to your training launch script before running PPO or REINFORCE training.
- Value: `Tree` (other options include `Ring`, but `Tree` is required for stability).
- Trade-off: Minimal performance impact. The tree algorithm may be slightly slower than ring for certain communication patterns, but it prevents hangs.
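As a sketch, a launch script sets the variable before invoking any training command, so that NCCL reads it when the first communicator is created (the launch command below is a commented placeholder; substitute your actual entry point):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Required stability setting: force NCCL's tree algorithm for all
# collectives. Must be exported before any process group is created.
export NCCL_ALGO=Tree

# Placeholder launch command -- replace with your actual PPO script, e.g.:
# python examples/nlp/gpt/train_gpt_ppo_actor.py "$@"

echo "Launching PPO with NCCL_ALGO=${NCCL_ALGO}"
```

Because NCCL only reads the variable at communicator initialization, exporting it after training has started has no effect.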
## Reasoning
PPO training involves multiple independent process groups (actor DP group, critic server, reward server) that perform interleaved collective operations. The default NCCL algorithm auto-selection can choose different algorithms for different operations, creating deadlock conditions when process groups overlap. The tree algorithm provides a consistent communication pattern that avoids these deadlocks. This was identified as a bug fix in the NeMo-Aligner CHANGELOG and is now a documented requirement.
## Code Evidence
CHANGELOG bug fix entry from `CHANGELOG.md:52`:
> It is now required, for stability, to add `export NCCL_ALGO=...` to scripts launching PPO training loop. Please see the [RLHF docs](./docs/user-guide/rlhf.rst) for information.
RLHF documentation requirement from `docs/user-guide/rlhf.rst:260-262`:
```shell
# recommended to set NCCL_ALGO. See https://github.com/NVIDIA/Megatron-LM/blob/
# b3375a0e38c10e2300ef4be031f7dcabab52b448/megatron/training/arguments.py#L593-L595
export NCCL_ALGO=Tree
```
Functional test script setting NCCL_ALGO from `tests/functional/ppo.sh:21`:
```shell
export NCCL_ALGO=Tree
```