Environment: OpenRLHF Ray Distributed Environment
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Distributed_Training |
| Last Updated | 2026-02-07 10:00 GMT |
Overview
A Ray 2.48.0 cluster with placement groups for multi-model PPO training across distributed GPU resources.
Description
This environment provides the Ray distributed computing framework required for OpenRLHF's PPO and online RL training workflows. Ray manages actor placement, GPU resource allocation, and inter-process communication for the multi-model training setup (actor, critic, reference model, reward model, and vLLM engines). It uses placement groups with the PACK strategy to colocate related actors, and supports both colocated (hybrid) and separate model deployment patterns.
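The two deployment patterns differ mainly in how GPU bundles are laid out. A minimal sketch of the resource-bundle shapes (the helper names are illustrative, not OpenRLHF's actual API):

```python
# Hypothetical sketch of Ray resource bundles for the two deployment
# patterns; function names are ours, not OpenRLHF's.

def separate_bundles(models: dict) -> list:
    # Separate mode: one {GPU, CPU} bundle per worker, concatenated per
    # model, e.g. {"actor": 4, "critic": 2} -> 6 bundles.
    return [{"GPU": 1, "CPU": 1} for _, n in models.items() for _ in range(n)]

def colocated_bundles(num_gpus: int) -> list:
    # Colocated (hybrid) mode: one bundle per physical GPU; the models
    # share each bundle's GPU via fractional scheduling.
    return [{"GPU": 1, "CPU": 1} for _ in range(num_gpus)]
```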
Usage
Use this environment for PPO Training, Math-GRPO Training, Rejection Sampling, and Iterative DPO workflows. These workflows distribute multiple models across a Ray cluster with placement groups for GPU scheduling. Non-Ray workflows (SFT, DPO, RM, KD) use DeepSpeed directly without Ray.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | Multiple NVIDIA GPUs | Typically 4-16+ GPUs across nodes |
| Network | High-bandwidth, low-latency | For Ray object store and NCCL communication |
| Head Node | Accessible from all workers | For Ray GCS and placement group coordination |
Dependencies
Python Packages
- `ray[default]` == 2.48.0 (pinned in requirements.txt)
- `grpcio` >= 1.74.0 (for Ray communication)
Credentials
The following environment variables are used:
- `RAY_ADDRESS`: Address of the Ray cluster GCS (auto-detected if not set)
- `RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES`: Prevents Ray from managing CUDA device visibility
- `RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES`: Same for AMD ROCm
- `RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES`: Same for HIP
- `RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES`: Same for Ascend NPU
- `MASTER_ADDR`: Set by Ray launcher for distributed training within actor groups
- `MASTER_PORT`: Set by Ray launcher for distributed training within actor groups
- `WORLD_SIZE`: Set by Ray launcher for distributed training within actor groups
- `RANK`: Set by Ray launcher for distributed training within actor groups
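Inside a Ray actor these rendezvous variables can be consumed just as under `torchrun`; a minimal sketch (the helper name is ours):

```python
import os

def dist_config_from_env() -> dict:
    # Read the rendezvous variables the Ray launcher exports into each actor.
    return {
        "master_addr": os.environ["MASTER_ADDR"],
        "master_port": int(os.environ["MASTER_PORT"]),
        "world_size": int(os.environ["WORLD_SIZE"]),
        "rank": int(os.environ["RANK"]),
    }
```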
Quick Install
```shell
pip install "ray[default]==2.48.0" "grpcio>=1.74.0"
```
Code Evidence
Ray initialization from `openrlhf/cli/train_ppo_ray.py:21-22`:
```python
if not ray.is_initialized():
    ray.init(runtime_env={"env_vars": {"TOKENIZERS_PARALLELISM": "true", "NCCL_DEBUG": "WARN"}})
```
Placement group creation from `openrlhf/trainer/ray/launcher.py:233-240`:
```python
pg = placement_group(bundles, strategy="PACK")
scheduling_strategy = PlacementGroupSchedulingStrategy(
    placement_group=pg,
    placement_group_bundle_index=...
)
```
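A self-contained sketch of the same pattern (assumes a running Ray cluster with enough GPUs; `gpu_bundles` and `pack_and_schedule` are our own helper names):

```python
def gpu_bundles(n: int) -> list:
    # One GPU + one CPU reserved per worker slot.
    return [{"GPU": 1, "CPU": 1} for _ in range(n)]

def pack_and_schedule(n: int) -> list:
    # Requires ray and a running cluster; imports kept local so the
    # pure helper above stays importable without Ray installed.
    import ray
    from ray.util.placement_group import placement_group
    from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

    pg = placement_group(gpu_bundles(n), strategy="PACK")
    ray.get(pg.ready())  # block until all bundles are reserved
    return [
        PlacementGroupSchedulingStrategy(placement_group=pg, placement_group_bundle_index=i)
        for i in range(n)
    ]
```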
Distributed env setup within Ray actors from `openrlhf/trainer/ray/launcher.py:28-36`:
```python
os.environ["MASTER_ADDR"] = self._master_addr
os.environ["MASTER_PORT"] = str(self._master_port)
os.environ["WORLD_SIZE"] = str(self._world_size)
os.environ["RANK"] = str(self._rank)
os.environ["LOCAL_RANK"] = str(ray.get_gpu_ids()[0]) if ray_noset_visible_devices() else "0"
```
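The `LOCAL_RANK` choice above hinges on whether Ray still rewrites device visibility. A simplified restatement, plus the `torch.distributed` handoff that consumes these variables (both functions are our sketch, not OpenRLHF code):

```python
import os

def local_rank_from_ray(gpu_ids: list, noset: bool) -> str:
    # With a NOSET flag set, Ray leaves *_VISIBLE_DEVICES alone, so the
    # physical GPU id doubles as LOCAL_RANK; otherwise each actor sees
    # its one assigned GPU as device 0.
    return str(gpu_ids[0]) if noset else "0"

def init_torch_distributed() -> None:
    # Requires torch with NCCL; the default env:// rendezvous reads
    # MASTER_ADDR / MASTER_PORT set by the launcher.
    import torch.distributed as dist
    dist.init_process_group(
        backend="nccl",
        world_size=int(os.environ["WORLD_SIZE"]),
        rank=int(os.environ["RANK"]),
    )
```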
NOSET visible devices detection from `openrlhf/trainer/ray/utils.py:31-37`:
```python
NOSET_VISIBLE_DEVICES_ENV_VARS_LIST = [
    "RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES",
]
```
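A simplified reimplementation of the detection logic (the real `ray_noset_visible_devices` in `utils.py` may parse truthiness differently):

```python
import os

NOSET_VISIBLE_DEVICES_ENV_VARS_LIST = [
    "RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES",
]

def ray_noset_visible_devices(env=os.environ) -> bool:
    # True if any accelerator family's NOSET flag is enabled.
    return any(env.get(v, "0").lower() in ("1", "true")
               for v in NOSET_VISIBLE_DEVICES_ENV_VARS_LIST)
```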
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| Ray placement group timeout | Insufficient GPU resources in cluster | Add more GPU nodes or reduce resource requirements |
| `ray.init()` connection refused | Ray head node not running | Start Ray cluster first: `ray start --head` |
| NCCL timeout in actor group | Network issues between Ray workers | Check inter-node networking; increase NCCL timeout |
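A minimal bring-up sequence for the head-node and worker errors above (the address and port are placeholders for your cluster):

```shell
# On the head node (GCS listens on port 6379 by default):
ray start --head --port=6379

# On each worker node, join using the head node's address:
ray start --address=<head-node-ip>:6379

# Verify the cluster sees all nodes and GPUs before launching training:
ray status
```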
Compatibility Notes
- Ascend NPU: Code includes `RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES` support, indicating Ascend NPU compatibility is in progress.
- AMD ROCm: Code handles `ROCR_VISIBLE_DEVICES` and `HIP_VISIBLE_DEVICES`, suggesting partial AMD support.
- Slurm: Example scripts include Slurm integration for multi-node Ray cluster setup.
- PACK Strategy: All placement groups use PACK strategy to minimize inter-node communication.