Environment:NVIDIA NeMo Aligner PyTriton Serving Environment
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Serving, Distributed_Training |
| Last Updated | 2026-02-07 22:00 GMT |
Overview
NVIDIA PyTriton 0.5.10 serving environment for deploying Reward Model and Critic servers in PPO and REINFORCE RLHF training pipelines.
Description
NeMo-Aligner uses NVIDIA PyTriton to serve the Reward Model and PPO Critic as inference servers during RLHF training. The actor training process communicates with these servers via HTTP using `FuturesModelClient` for asynchronous inference requests. PyTriton wraps the model inference functions with Triton Inference Server's batching and scheduling capabilities, enabling efficient multi-client serving with dynamic batching.
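As a minimal sketch of this pattern (the model name, port, tensor names, and scoring logic below are illustrative placeholders, not NeMo-Aligner's actual configuration), a PyTriton server wraps a batched inference function and binds it with a dynamic batcher:

```python
import numpy as np


def infer_fn(prompts: np.ndarray) -> dict:
    # Placeholder scoring: returns one float per prompt. A real reward
    # model would run GPU inference here.
    rewards = np.array([[float(len(p))] for p in prompts], dtype=np.float32)
    return {"rewards": rewards}


def serve() -> None:
    # Imports are kept local so the sketch can be read without pytriton installed.
    from pytriton.decorators import batch
    from pytriton.model_config import ModelConfig, Tensor
    from pytriton.model_config.common import DynamicBatcher
    from pytriton.triton import Triton, TritonConfig

    with Triton(config=TritonConfig(http_port=5555)) as triton:
        triton.bind(
            model_name="reward_model",           # illustrative name
            infer_func=batch(infer_fn),          # @batch adapts batched numpy I/O
            inputs=[Tensor(name="prompts", dtype=bytes, shape=(1,))],
            outputs=[Tensor(name="rewards", dtype=np.float32, shape=(1,))],
            config=ModelConfig(max_batch_size=8, batcher=DynamicBatcher()),
        )
        triton.serve()  # blocks, serving HTTP/gRPC requests
```

The `@batch` decorator is what lets Triton's scheduler coalesce concurrent client requests into a single batched call to `infer_fn`.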
Usage
Use this environment when deploying Reward Model servers (for serving trained reward models and for REINFORCE) or Critic servers (for PPO training). The servers are launched as separate processes that the actor training script connects to via HTTP. Required for any workflow involving `RewardModelServer.run_server()` or `CriticServerTrainer.run_server()`.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | PyTriton requires Linux for Triton Inference Server |
| Hardware | NVIDIA GPU | Server runs model inference on GPU |
| Network | HTTP connectivity between actor and server processes | Default ports configured in training scripts |
Dependencies
Python Packages
- `nvidia-pytriton` == 0.5.10
- `protobuf` == 4.24.4
- All dependencies from Environment:NVIDIA_NeMo_Aligner_NeMo_Framework_GPU_Environment
Credentials
No additional credentials required beyond the base environment. Network ports must be accessible between server and client processes.
Quick Install
```shell
# PyTriton ships preinstalled in the NeMo-Aligner Docker container; to install manually:
pip install --upgrade-strategy only-if-needed nvidia-pytriton==0.5.10
pip install -U --no-deps protobuf==4.24.4
```
Code Evidence
PyTriton imports for Reward Model server from `nemo_aligner/algorithms/reward_server.py:21-24`:
```python
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.model_config.common import DynamicBatcher
from pytriton.triton import Triton, TritonConfig
```
HTTP client for actor-server communication from `nemo_aligner/servers/http_communicator.py:15`:
```python
from pytriton.client import FuturesModelClient
```
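On the client side, a hedged sketch of issuing an asynchronous request (the URL, model name, and tensor names here are illustrative, not the identifiers NeMo-Aligner actually uses):

```python
import numpy as np


def request_rewards(url: str, prompts: np.ndarray) -> np.ndarray:
    # Local import so the sketch can be read without pytriton installed.
    from pytriton.client import FuturesModelClient

    # FuturesModelClient returns concurrent.futures.Future objects, so the
    # actor can keep multiple requests in flight while training continues.
    with FuturesModelClient(url, "reward_model") as client:
        future = client.infer_batch(prompts=prompts)  # non-blocking submit
        return future.result()["rewards"]             # block until done
```

In the actual pipeline, the actor issues such requests for every rollout batch, which is why the futures-based (rather than synchronous) client matters.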
Server signal constants for coordination from `nemo_aligner/servers/constants.py:15-27`:
```python
COMMUNICATE_NUM = 0
SERVER_SIGNAL_TRAIN = 1
SERVER_SIGNAL_VALIDATE = 2
SERVER_SIGNAL_SAVE = 3
SERVER_SIGNAL_EXIT = 4
SERVER_SIGNAL_OFFLOAD = 5
SERVER_SIGNAL_ONLOAD = 6
```
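To illustrate how such integer signals coordinate the two processes, here is a hypothetical dispatch table (the constants are real, copied from `constants.py`; the dispatch helper and action names are an assumption for illustration, not NeMo-Aligner's actual control loop):

```python
# Signal constants mirrored from nemo_aligner/servers/constants.py.
SERVER_SIGNAL_TRAIN = 1
SERVER_SIGNAL_VALIDATE = 2
SERVER_SIGNAL_SAVE = 3
SERVER_SIGNAL_EXIT = 4
SERVER_SIGNAL_OFFLOAD = 5
SERVER_SIGNAL_ONLOAD = 6

# Hypothetical mapping from signal value to action name; a server-side
# loop would receive one of these integers and branch accordingly.
SIGNAL_ACTIONS = {
    SERVER_SIGNAL_TRAIN: "train",
    SERVER_SIGNAL_VALIDATE: "validate",
    SERVER_SIGNAL_SAVE: "save",
    SERVER_SIGNAL_EXIT: "exit",
    SERVER_SIGNAL_OFFLOAD: "offload",
    SERVER_SIGNAL_ONLOAD: "onload",
}


def describe_signal(signal: int) -> str:
    """Return the action name for a server signal, or 'unknown'."""
    return SIGNAL_ACTIONS.get(signal, "unknown")
```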
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'pytriton'` | PyTriton not installed | `pip install nvidia-pytriton==0.5.10` |
| `Connection refused` to reward/critic server | Server not yet started or wrong port | Ensure server process is launched before actor training begins |
| `protobuf` version conflicts | Incompatible protobuf version | Pin `protobuf==4.24.4` |
Compatibility Notes
- Inference Micro Batch Size: Can be specified as a list to provide PyTriton with preferred batch sizes for dynamic batching.
- Multi-Process: The reward/critic server and actor training run as separate processes, potentially on different nodes.
- Port Configuration: Server ports are configured in the YAML config files (e.g., `inference_rm.yaml`).
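The micro-batch-size note above maps onto PyTriton's `DynamicBatcher`. A hedged sketch (the sizes and queue delay are illustrative values, not NeMo-Aligner defaults):

```python
def make_batcher(preferred_sizes):
    # Local import so the sketch can be read without pytriton installed.
    from pytriton.model_config.common import DynamicBatcher

    # Triton's scheduler will try to assemble batches of the preferred
    # sizes before the queue delay expires, then dispatch whatever it has.
    return DynamicBatcher(
        preferred_batch_size=list(preferred_sizes),
        max_queue_delay_microseconds=1000,
    )
```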
Related Pages
- Implementation:NVIDIA_NeMo_Aligner_RewardModelServer_Run
- Implementation:NVIDIA_NeMo_Aligner_CriticServerTrainer_Run
- Implementation:NVIDIA_NeMo_Aligner_MegatronGPT_Actor_And_Critic_Client
- Implementation:NVIDIA_NeMo_Aligner_MegatronGPT_Reinforce_Actor_And_RM_Client
- Implementation:NVIDIA_NeMo_Aligner_PPOTrainer_Fit