
Environment:NVIDIA NeMo Aligner PyTriton Serving Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Serving, Distributed_Training
Last Updated: 2026-02-07 22:00 GMT

Overview

NVIDIA PyTriton 0.5.10 serving environment for deploying Reward Model and Critic servers in PPO and REINFORCE RLHF training pipelines.

Description

NeMo-Aligner uses NVIDIA PyTriton to serve the Reward Model and PPO Critic as inference servers during RLHF training. The actor training process communicates with these servers via HTTP using `FuturesModelClient` for asynchronous inference requests. PyTriton wraps the model inference functions with Triton Inference Server's batching and scheduling capabilities, enabling efficient multi-client serving with dynamic batching.

Usage

Use this environment when deploying Reward Model servers (for RM training serving and REINFORCE) or Critic servers (for PPO training). The servers are launched as separate processes that the actor training script connects to via HTTP. Required for any workflow involving `RewardModelServer.run_server()` or `CriticServerTrainer.run_server()`.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | PyTriton requires Linux for Triton Inference Server |
| Hardware | NVIDIA GPU | Server runs model inference on GPU |
| Network | HTTP connectivity between actor and server processes | Default ports configured in training scripts |

Dependencies

Python Packages

  • `nvidia-pytriton==0.5.10`
  • `protobuf==4.24.4` (pinned to avoid version conflicts)

Credentials

No additional credentials required beyond the base environment. Network ports must be accessible between server and client processes.

Quick Install

# PyTriton comes preinstalled in the Docker container; to install or upgrade manually:
pip install --upgrade-strategy only-if-needed nvidia-pytriton==0.5.10
pip install -U --no-deps protobuf==4.24.4

Code Evidence

PyTriton imports for Reward Model server from `nemo_aligner/algorithms/reward_server.py:21-24`:

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.model_config.common import DynamicBatcher
from pytriton.triton import Triton, TritonConfig

HTTP client for actor-server communication from `nemo_aligner/servers/http_communicator.py:15`:

from pytriton.client import FuturesModelClient

Server signal constants for coordination from `nemo_aligner/servers/constants.py:15-27`:

COMMUNICATE_NUM = 0
SERVER_SIGNAL_TRAIN = 1
SERVER_SIGNAL_VALIDATE = 2
SERVER_SIGNAL_SAVE = 3
SERVER_SIGNAL_EXIT = 4
SERVER_SIGNAL_OFFLOAD = 5
SERVER_SIGNAL_ONLOAD = 6
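
These integer signals let the actor drive the server's lifecycle remotely. The dispatch pattern can be sketched in plain Python (the handler names below are illustrative, not NeMo-Aligner's actual methods):

```python
SERVER_SIGNAL_TRAIN = 1
SERVER_SIGNAL_VALIDATE = 2
SERVER_SIGNAL_SAVE = 3
SERVER_SIGNAL_EXIT = 4


def handle_signal(signal: int, log: list) -> bool:
    """Record the action for `signal`; return False when the loop should stop."""
    actions = {
        SERVER_SIGNAL_TRAIN: "train",
        SERVER_SIGNAL_VALIDATE: "validate",
        SERVER_SIGNAL_SAVE: "save_checkpoint",
    }
    if signal == SERVER_SIGNAL_EXIT:
        return False
    log.append(actions.get(signal, "noop"))
    return True


log = []
for sig in (SERVER_SIGNAL_TRAIN, SERVER_SIGNAL_SAVE, SERVER_SIGNAL_EXIT):
    if not handle_signal(sig, log):
        break  # EXIT terminates the server loop
```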

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: pytriton` | PyTriton not installed | `pip install nvidia-pytriton==0.5.10` |
| `Connection refused` to reward/critic server | Server not yet started or wrong port | Ensure the server process is launched before actor training begins |
| `protobuf` version conflicts | Incompatible protobuf version | Pin `protobuf==4.24.4` |

Compatibility Notes

  • Inference Micro Batch Size: Can be specified as a list to provide PyTriton with preferred batch sizes for dynamic batching.
  • Multi-Process: The reward/critic server and actor training run as separate processes, potentially on different nodes.
  • Port Configuration: Server ports are configured in the YAML config files (e.g., `inference_rm.yaml`).
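
What preferred batch sizes buy can be illustrated with a stdlib sketch of the grouping a dynamic batcher performs: queued requests are coalesced into the largest preferred size available, with any remainder flushed after the queue delay (a deliberate simplification of Triton's actual scheduler):

```python
def group_into_batches(num_queued: int, preferred: list) -> list:
    """Greedily split queued requests into preferred batch sizes, largest first."""
    batches = []
    for size in sorted(preferred, reverse=True):
        while num_queued >= size:
            batches.append(size)
            num_queued -= size
    if num_queued:
        batches.append(num_queued)  # leftover flushed after the queue delay
    return batches
```

With preferred sizes `[4, 8]`, thirteen queued requests would be served as batches of 8, 4, and 1 rather than thirteen single-request inferences.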
