Environment: OpenRLHF Ray Distributed Environment
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Distributed_Training |
| Last Updated | 2026-02-07 10:00 GMT |
Overview
A Ray 2.48.0 cluster with placement groups for multi-model PPO training across distributed GPU resources.
Description
This environment provides the Ray distributed computing framework required for OpenRLHF's PPO and online RL training workflows. Ray manages actor placement, GPU resource allocation, and inter-process communication for the multi-model training setup (actor, critic, reference model, reward model, and vLLM engines). It uses placement groups with the PACK strategy to colocate related actors, and supports both colocated (hybrid) and separate model deployment patterns.
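The two deployment patterns differ mainly in how GPU bundles are laid out. A minimal sketch of the resource-bundle shapes (the helper names are illustrative, not OpenRLHF's actual API):

```python
# Hypothetical sketch of Ray resource bundles for the two deployment
# patterns; function names are ours, not OpenRLHF's.

def separate_bundles(models: dict) -> list:
    # Separate mode: one {GPU, CPU} bundle per worker, concatenated per
    # model, e.g. {"actor": 4, "critic": 2} -> 6 bundles.
    return [{"GPU": 1, "CPU": 1} for _, n in models.items() for _ in range(n)]

def colocated_bundles(num_gpus: int) -> list:
    # Colocated (hybrid) mode: one bundle per physical GPU; the models
    # share each bundle's GPU via fractional scheduling.
    return [{"GPU": 1, "CPU": 1} for _ in range(num_gpus)]
```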
Usage
Use this environment for PPO Training, Math-GRPO Training, Rejection Sampling, and Iterative DPO workflows. These workflows distribute multiple models across a Ray cluster with placement groups for GPU scheduling. Non-Ray workflows (SFT, DPO, RM, KD) use DeepSpeed directly without Ray.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | Multiple NVIDIA GPUs | Typically 4-16+ GPUs across nodes |
| Network | High-bandwidth, low-latency | For Ray object store and NCCL communication |
| Head Node | Accessible from all workers | For Ray GCS and placement group coordination |
Dependencies
Python Packages
- `ray[default]` == 2.48.0 (pinned in requirements.txt)
- `grpcio` >= 1.74.0 (for Ray communication)
Credentials
The following environment variables are used:
- `RAY_ADDRESS`: Address of the Ray cluster GCS (auto-detected if not set)
- `RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES`: Prevents Ray from managing CUDA device visibility
- `RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES`: Same for AMD ROCm
- `RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES`: Same for HIP
- `RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES`: Same for Ascend NPU
- `MASTER_ADDR`: Set by Ray launcher for distributed training within actor groups
- `MASTER_PORT`: Set by Ray launcher for distributed training within actor groups
- `WORLD_SIZE`: Set by Ray launcher for distributed training within actor groups
- `RANK`: Set by Ray launcher for distributed training within actor groups
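Inside a Ray actor these rendezvous variables can be consumed just as under `torchrun`; a minimal sketch (the helper name is ours):

```python
import os

def dist_config_from_env() -> dict:
    # Read the rendezvous variables the Ray launcher exports into each actor.
    return {
        "master_addr": os.environ["MASTER_ADDR"],
        "master_port": int(os.environ["MASTER_PORT"]),
        "world_size": int(os.environ["WORLD_SIZE"]),
        "rank": int(os.environ["RANK"]),
    }
```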
Quick Install
```shell
pip install "ray[default]==2.48.0" "grpcio>=1.74.0"
```
Code Evidence
Ray initialization from `openrlhf/cli/train_ppo_ray.py:21-22`:
```python
if not ray.is_initialized():
    ray.init(runtime_env={"env_vars": {"TOKENIZERS_PARALLELISM": "true", "NCCL_DEBUG": "WARN"}})
```
Placement group creation from `openrlhf/trainer/ray/launcher.py:233-240`:
```python
pg = placement_group(bundles, strategy="PACK")
scheduling_strategy = PlacementGroupSchedulingStrategy(
    placement_group=pg,
    placement_group_bundle_index=...
)
```
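A self-contained sketch of the same pattern (assumes a running Ray cluster with enough GPUs; `gpu_bundles` and `pack_and_schedule` are our own helper names):

```python
def gpu_bundles(n: int) -> list:
    # One GPU + one CPU reserved per worker slot.
    return [{"GPU": 1, "CPU": 1} for _ in range(n)]

def pack_and_schedule(n: int) -> list:
    # Requires ray and a running cluster; imports kept local so the
    # pure helper above stays importable without Ray installed.
    import ray
    from ray.util.placement_group import placement_group
    from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

    pg = placement_group(gpu_bundles(n), strategy="PACK")
    ray.get(pg.ready())  # block until all bundles are reserved
    return [
        PlacementGroupSchedulingStrategy(placement_group=pg, placement_group_bundle_index=i)
        for i in range(n)
    ]
```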
Distributed env setup within Ray actors from `openrlhf/trainer/ray/launcher.py:28-36`:
```python
os.environ["MASTER_ADDR"] = self._master_addr
os.environ["MASTER_PORT"] = str(self._master_port)
os.environ["WORLD_SIZE"] = str(self._world_size)
os.environ["RANK"] = str(self._rank)
os.environ["LOCAL_RANK"] = str(ray.get_gpu_ids()[0]) if ray_noset_visible_devices() else "0"
```
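The `LOCAL_RANK` choice above hinges on whether Ray still rewrites device visibility. A simplified restatement, plus the `torch.distributed` handoff that consumes these variables (both functions are our sketch, not OpenRLHF code):

```python
import os

def local_rank_from_ray(gpu_ids: list, noset: bool) -> str:
    # With a NOSET flag set, Ray leaves *_VISIBLE_DEVICES alone, so the
    # physical GPU id doubles as LOCAL_RANK; otherwise each actor sees
    # its one assigned GPU as device 0.
    return str(gpu_ids[0]) if noset else "0"

def init_torch_distributed() -> None:
    # Requires torch with NCCL; the default env:// rendezvous reads
    # MASTER_ADDR / MASTER_PORT set by the launcher.
    import torch.distributed as dist
    dist.init_process_group(
        backend="nccl",
        world_size=int(os.environ["WORLD_SIZE"]),
        rank=int(os.environ["RANK"]),
    )
```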
NOSET visible devices detection from `openrlhf/trainer/ray/utils.py:31-37`:
```python
NOSET_VISIBLE_DEVICES_ENV_VARS_LIST = [
    "RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES",
]
```
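A simplified reimplementation of the detection logic (the real `ray_noset_visible_devices` in `utils.py` may parse truthiness differently):

```python
import os

NOSET_VISIBLE_DEVICES_ENV_VARS_LIST = [
    "RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES",
]

def ray_noset_visible_devices(env=os.environ) -> bool:
    # True if any accelerator family's NOSET flag is enabled.
    return any(env.get(v, "0").lower() in ("1", "true")
               for v in NOSET_VISIBLE_DEVICES_ENV_VARS_LIST)
```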
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| Ray placement group timeout | Insufficient GPU resources in cluster | Add more GPU nodes or reduce resource requirements |
| `ray.init()` connection refused | Ray head node not running | Start Ray cluster first: `ray start --head` |
| NCCL timeout in actor group | Network issues between Ray workers | Check inter-node networking; increase NCCL timeout |
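A minimal bring-up sequence for the head-node and worker errors above (the address and port are placeholders for your cluster):

```shell
# On the head node (GCS listens on port 6379 by default):
ray start --head --port=6379

# On each worker node, join using the head node's address:
ray start --address=<head-node-ip>:6379

# Verify the cluster sees all nodes and GPUs before launching training:
ray status
```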
Compatibility Notes
- Ascend NPU: Code includes `RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES` support, indicating Ascend NPU compatibility is in progress.
- AMD ROCm: Code handles `ROCR_VISIBLE_DEVICES` and `HIP_VISIBLE_DEVICES`, suggesting partial AMD support.
- Slurm: Example scripts include Slurm integration for multi-node Ray cluster setup.
- PACK Strategy: All placement groups use PACK strategy to minimize inter-node communication.