Environment:NVIDIA NeMo Aligner PyTriton Serving Environment
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Serving, Distributed_Training |
| Last Updated | 2026-02-07 22:00 GMT |
Overview
NVIDIA PyTriton 0.5.10 serving environment for deploying Reward Model and Critic servers in PPO and REINFORCE RLHF training pipelines.
Description
NeMo-Aligner uses NVIDIA PyTriton to serve the Reward Model and PPO Critic as inference servers during RLHF training. The actor training process communicates with these servers via HTTP using `FuturesModelClient` for asynchronous inference requests. PyTriton wraps the model inference functions with Triton Inference Server's batching and scheduling capabilities, enabling efficient multi-client serving with dynamic batching.
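As a minimal sketch of this pattern (the model name, port, tensor names, and scoring logic below are illustrative placeholders, not NeMo-Aligner's actual configuration), a PyTriton server wraps a batched inference function and binds it with a dynamic batcher:

```python
import numpy as np


def infer_fn(prompts: np.ndarray) -> dict:
    # Placeholder scoring: returns one float per prompt. A real reward
    # model would run GPU inference here.
    rewards = np.array([[float(len(p))] for p in prompts], dtype=np.float32)
    return {"rewards": rewards}


def serve() -> None:
    # Imports are kept local so the sketch can be read without pytriton installed.
    from pytriton.decorators import batch
    from pytriton.model_config import ModelConfig, Tensor
    from pytriton.model_config.common import DynamicBatcher
    from pytriton.triton import Triton, TritonConfig

    with Triton(config=TritonConfig(http_port=5555)) as triton:
        triton.bind(
            model_name="reward_model",           # illustrative name
            infer_func=batch(infer_fn),          # @batch adapts batched numpy I/O
            inputs=[Tensor(name="prompts", dtype=bytes, shape=(1,))],
            outputs=[Tensor(name="rewards", dtype=np.float32, shape=(1,))],
            config=ModelConfig(max_batch_size=8, batcher=DynamicBatcher()),
        )
        triton.serve()  # blocks, serving HTTP/gRPC requests
```

The `@batch` decorator is what lets Triton's scheduler coalesce concurrent client requests into a single batched call to `infer_fn`.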
Usage
Use this environment when deploying Reward Model servers (for serving trained reward models and for REINFORCE) or Critic servers (for PPO training). The servers are launched as separate processes that the actor training script connects to via HTTP. Required for any workflow involving `RewardModelServer.run_server()` or `CriticServerTrainer.run_server()`.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | PyTriton requires Linux for Triton Inference Server |
| Hardware | NVIDIA GPU | Server runs model inference on GPU |
| Network | HTTP connectivity between actor and server processes | Default ports configured in training scripts |
Dependencies
Python Packages
- `nvidia-pytriton` == 0.5.10
- `protobuf` == 4.24.4
- All dependencies from Environment:NVIDIA_NeMo_Aligner_NeMo_Framework_GPU_Environment
Credentials
No additional credentials required beyond the base environment. Network ports must be accessible between server and client processes.
Quick Install
```shell
# PyTriton ships preinstalled in the NeMo-Aligner Docker container; to install manually:
pip install --upgrade-strategy only-if-needed nvidia-pytriton==0.5.10
pip install -U --no-deps protobuf==4.24.4
```
Code Evidence
PyTriton imports for Reward Model server from `nemo_aligner/algorithms/reward_server.py:21-24`:
```python
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.model_config.common import DynamicBatcher
from pytriton.triton import Triton, TritonConfig
```
HTTP client for actor-server communication from `nemo_aligner/servers/http_communicator.py:15`:
```python
from pytriton.client import FuturesModelClient
```
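On the client side, a hedged sketch of issuing an asynchronous request (the URL, model name, and tensor names here are illustrative, not the identifiers NeMo-Aligner actually uses):

```python
import numpy as np


def request_rewards(url: str, prompts: np.ndarray) -> np.ndarray:
    # Local import so the sketch can be read without pytriton installed.
    from pytriton.client import FuturesModelClient

    # FuturesModelClient returns concurrent.futures.Future objects, so the
    # actor can keep multiple requests in flight while training continues.
    with FuturesModelClient(url, "reward_model") as client:
        future = client.infer_batch(prompts=prompts)  # non-blocking submit
        return future.result()["rewards"]             # block until done
```

In the actual pipeline, the actor issues such requests for every rollout batch, which is why the futures-based (rather than synchronous) client matters.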
Server signal constants for coordination from `nemo_aligner/servers/constants.py:15-27`:
```python
COMMUNICATE_NUM = 0
SERVER_SIGNAL_TRAIN = 1
SERVER_SIGNAL_VALIDATE = 2
SERVER_SIGNAL_SAVE = 3
SERVER_SIGNAL_EXIT = 4
SERVER_SIGNAL_OFFLOAD = 5
SERVER_SIGNAL_ONLOAD = 6
```
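To illustrate how such integer signals coordinate the two processes, here is a hypothetical dispatch table (the constants are real, copied from `constants.py`; the dispatch helper and action names are an assumption for illustration, not NeMo-Aligner's actual control loop):

```python
# Signal constants mirrored from nemo_aligner/servers/constants.py.
SERVER_SIGNAL_TRAIN = 1
SERVER_SIGNAL_VALIDATE = 2
SERVER_SIGNAL_SAVE = 3
SERVER_SIGNAL_EXIT = 4
SERVER_SIGNAL_OFFLOAD = 5
SERVER_SIGNAL_ONLOAD = 6

# Hypothetical mapping from signal value to action name; a server-side
# loop would receive one of these integers and branch accordingly.
SIGNAL_ACTIONS = {
    SERVER_SIGNAL_TRAIN: "train",
    SERVER_SIGNAL_VALIDATE: "validate",
    SERVER_SIGNAL_SAVE: "save",
    SERVER_SIGNAL_EXIT: "exit",
    SERVER_SIGNAL_OFFLOAD: "offload",
    SERVER_SIGNAL_ONLOAD: "onload",
}


def describe_signal(signal: int) -> str:
    """Return the action name for a server signal, or 'unknown'."""
    return SIGNAL_ACTIONS.get(signal, "unknown")
```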
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'pytriton'` | PyTriton not installed | `pip install nvidia-pytriton==0.5.10` |
| `Connection refused` to reward/critic server | Server not yet started or wrong port | Ensure server process is launched before actor training begins |
| `protobuf` version conflicts | Incompatible protobuf version | Pin `protobuf==4.24.4` |
Compatibility Notes
- Inference Micro Batch Size: Can be specified as a list to provide PyTriton with preferred batch sizes for dynamic batching.
- Multi-Process: The reward/critic server and actor training run as separate processes, potentially on different nodes.
- Port Configuration: Server ports are configured in the YAML config files (e.g., `inference_rm.yaml`).
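The micro-batch-size note above maps onto PyTriton's `DynamicBatcher`. A hedged sketch (the sizes and queue delay are illustrative values, not NeMo-Aligner defaults):

```python
def make_batcher(preferred_sizes):
    # Local import so the sketch can be read without pytriton installed.
    from pytriton.model_config.common import DynamicBatcher

    # Triton's scheduler will try to assemble batches of the preferred
    # sizes before the queue delay expires, then dispatch whatever it has.
    return DynamicBatcher(
        preferred_batch_size=list(preferred_sizes),
        max_queue_delay_microseconds=1000,
    )
```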
Related Pages
- Implementation:NVIDIA_NeMo_Aligner_RewardModelServer_Run
- Implementation:NVIDIA_NeMo_Aligner_CriticServerTrainer_Run
- Implementation:NVIDIA_NeMo_Aligner_MegatronGPT_Actor_And_Critic_Client
- Implementation:NVIDIA_NeMo_Aligner_MegatronGPT_Reinforce_Actor_And_RM_Client
- Implementation:NVIDIA_NeMo_Aligner_PPOTrainer_Fit