
Environment: OpenRLHF Ray Distributed Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Distributed_Training
Last Updated: 2026-02-07 10:00 GMT

Overview

A Ray 2.48.0 cluster using placement groups for multi-model PPO training across distributed GPU resources.

Description

This environment provides the Ray distributed computing framework required for OpenRLHF's PPO and online RL training workflows. Ray manages actor placement, GPU resource allocation, and inter-process communication for the multi-model training setup (actor, critic, reference model, reward model, vLLM engines). It uses placement groups with PACK strategy to colocate related actors, and supports both colocated (hybrid) and separate model deployment patterns.

Usage

Use this environment for PPO Training, Math-GRPO Training, Rejection Sampling, and Iterative DPO workflows. These workflows distribute multiple models across a Ray cluster with placement groups for GPU scheduling. Non-Ray workflows (SFT, DPO, RM, KD) use DeepSpeed directly without Ray.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| Hardware | Multiple NVIDIA GPUs | Typically 4-16+ GPUs across nodes |
| Network | High-bandwidth, low-latency interconnect | For Ray object store and NCCL communication |
| Head Node | Accessible from all workers | For Ray GCS and placement group coordination |

Dependencies

Python Packages

  • `ray[default]` == 2.48.0 (pinned in requirements.txt)
  • `grpcio` >= 1.74.0 (for Ray communication)

Credentials

The following environment variables are used:

  • `RAY_ADDRESS`: Address of the Ray cluster GCS (auto-detected if not set)
  • `RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES`: Prevents Ray from managing CUDA device visibility
  • `RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES`: Same for AMD ROCm
  • `RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES`: Same for HIP
  • `RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES`: Same for Ascend NPU
  • `MASTER_ADDR`: Set by Ray launcher for distributed training within actor groups
  • `MASTER_PORT`: Set by Ray launcher for distributed training within actor groups
  • `WORLD_SIZE`: Set by Ray launcher for distributed training within actor groups
  • `RANK`: Set by Ray launcher for distributed training within actor groups
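Inside each Ray actor, these variables are what `torch.distributed` reads to join the process group. As a minimal stdlib-only sketch (the `setup_distributed_env` helper is a hypothetical illustration, not part of OpenRLHF), the wiring looks like:

```python
import os

def setup_distributed_env(master_addr: str, master_port: int,
                          world_size: int, rank: int,
                          local_rank: str = "0") -> dict:
    """Populate the env vars torch.distributed expects inside a Ray actor.

    Hypothetical helper mirroring the variable names listed above.
    """
    env = {
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),  # os.environ values must be strings
        "WORLD_SIZE": str(world_size),
        "RANK": str(rank),
        "LOCAL_RANK": local_rank,
    }
    os.environ.update(env)
    return env

env = setup_distributed_env("10.0.0.1", 29500, world_size=8, rank=3)
```

After this, a call such as `torch.distributed.init_process_group(backend="nccl")` would pick the values up from the environment.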

Quick Install

pip install "ray[default]==2.48.0" "grpcio>=1.74.0"

Code Evidence

Ray initialization from `openrlhf/cli/train_ppo_ray.py:21-22`:

if not ray.is_initialized():
    ray.init(runtime_env={"env_vars": {"TOKENIZERS_PARALLELISM": "true", "NCCL_DEBUG": "WARN"}})

Placement group creation from `openrlhf/trainer/ray/launcher.py:233-240`:

pg = placement_group(bundles, strategy="PACK")
scheduling_strategy = PlacementGroupSchedulingStrategy(
    placement_group=pg,
    placement_group_bundle_index=...
)
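The bundle list passed to `placement_group` is just a list of per-actor resource dicts; with `strategy="PACK"`, Ray tries to place all bundles on as few nodes as possible. A pure-Python sketch of building such a spec (the `make_bundles` helper and its resource numbers are illustrative assumptions; the commented lines show where the real Ray calls would go):

```python
def make_bundles(num_actors: int, gpus_per_actor: int = 1,
                 cpus_per_actor: int = 4) -> list:
    # One bundle per actor; each dict names the resources that actor needs.
    return [{"GPU": gpus_per_actor, "CPU": cpus_per_actor}
            for _ in range(num_actors)]

bundles = make_bundles(num_actors=4)

# On a live cluster this spec would be handed to Ray, e.g.:
#   from ray.util.placement_group import placement_group
#   pg = placement_group(bundles, strategy="PACK")
#   ray.get(pg.ready())  # block until the resources are reserved
```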

Distributed env setup within Ray actors from `openrlhf/trainer/ray/launcher.py:28-36`:

os.environ["MASTER_ADDR"] = self._master_addr
os.environ["MASTER_PORT"] = str(self._master_port)
os.environ["WORLD_SIZE"] = str(self._world_size)
os.environ["RANK"] = str(self._rank)
os.environ["LOCAL_RANK"] = str(ray.get_gpu_ids()[0]) if ray_noset_visible_devices() else "0"
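The `LOCAL_RANK` branch above can be isolated as a small pure function (a sketch for illustration, not OpenRLHF's actual helper): when a NOSET flag is set, Ray is not narrowing device visibility, so the actor sees every GPU and must use the physical id Ray assigned; otherwise Ray has already restricted `CUDA_VISIBLE_DEVICES` to a single GPU, which the process sees as device 0.

```python
def resolve_local_rank(gpu_ids: list, noset_visible_devices: bool) -> str:
    # NOSET flag set: visibility unmanaged, use the Ray-assigned physical id.
    # NOSET flag unset: Ray exposed exactly one GPU, visible as device 0.
    return str(gpu_ids[0]) if noset_visible_devices else "0"
```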

NOSET visible devices detection from `openrlhf/trainer/ray/utils.py:31-37`:

NOSET_VISIBLE_DEVICES_ENV_VARS_LIST = [
    "RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES",
]
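The detection itself reduces to checking whether any flag in that list is set. A minimal sketch of the check (the exact return semantics of the real `ray_noset_visible_devices` in `openrlhf/trainer/ray/utils.py` may differ):

```python
import os

NOSET_VISIBLE_DEVICES_ENV_VARS_LIST = [
    "RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES",
    "RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES",
]

def ray_noset_visible_devices(environ=os.environ) -> bool:
    # True if any accelerator-visibility override flag is set (non-empty).
    return any(environ.get(var) for var in NOSET_VISIBLE_DEVICES_ENV_VARS_LIST)
```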

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| Ray placement group timeout | Insufficient GPU resources in cluster | Add more GPU nodes or reduce resource requirements |
| `ray.init()` connection refused | Ray head node not running | Start the Ray cluster first: `ray start --head` |
| NCCL timeout in actor group | Network issues between Ray workers | Check inter-node networking; increase the NCCL timeout |

Compatibility Notes

  • Ascend NPU: Code includes `RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES` support, indicating Ascend NPU compatibility is in progress.
  • AMD ROCm: Code handles `ROCR_VISIBLE_DEVICES` and `HIP_VISIBLE_DEVICES`, suggesting partial AMD support.
  • Slurm: Example scripts include Slurm integration for multi-node Ray cluster setup.
  • PACK Strategy: All placement groups use PACK strategy to minimize inter-node communication.
