
Implementation:Volcengine Verl Ray Init Cluster

From Leeroopedia


Field Value
Knowledge Sources External Tool Doc (Ray, Hydra, verl)
Domains Distributed Computing, Cluster Initialization, GPU Resource Management
Last Updated 2026-02-07

Overview

Description

This implementation initializes a Ray cluster and allocates GPU resource pools for distributed reinforcement learning training with the verl framework. It is the primary entry point for launching PPO-based training runs. The function run_ppo(config) checks whether Ray has already been initialized, and if not, calls ray.init() with a runtime environment that controls tokenizer parallelism, NCCL debug levels, VLLM logging, and optional transfer queue settings. After initialization, it creates a remote TaskRunner actor that orchestrates the entire training workflow, including worker registration, resource pool management, dataset creation, and the PPO training loop.
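The guard-then-init flow described above can be sketched in plain Python. This is a hypothetical helper, not verl's actual code; only the three environment variable names and values are taken from the page, and the transfer-queue variable shown is an illustrative placeholder:

```python
# Sketch of assembling the runtime_env dict that run_ppo passes to
# ray.init() when no cluster is running yet. Hypothetical helper;
# only the three env var names/values below come from the page.

def build_runtime_env(transfer_queue_enabled: bool) -> dict:
    """Assemble env_vars for ray.init(runtime_env=...)."""
    env_vars = {
        "TOKENIZERS_PARALLELISM": "true",  # avoid HF tokenizer fork warnings
        "NCCL_DEBUG": "WARN",              # limit NCCL output to warnings
        "VLLM_LOGGING_LEVEL": "WARN",      # quiet vLLM per-step logging
    }
    if transfer_queue_enabled:
        # Illustrative placeholder: the actual transfer-queue variables
        # depend on the config and are not documented here.
        env_vars["TQ_ENABLE"] = "1"
    return {"env_vars": env_vars}
```

In the real entry point this dict would be passed as `ray.init(runtime_env=build_runtime_env(...), **config.ray_kwargs.ray_init)` only when `ray.is_initialized()` is false, so an externally launched cluster is reused rather than replaced.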

The Hydra decorator @hydra.main(config_path="config", config_name="ppo_trainer", version_base=None) on the main() function provides configuration management, enabling hierarchical config composition and command-line overrides.
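The override semantics this enables can be illustrated with a toy re-implementation of Hydra's dotted-key syntax (this is not Hydra's actual parser, just a sketch of how `trainer.nnodes=2` maps onto a nested config):

```python
# Toy illustration of Hydra's "a.b.c=value" command-line override
# semantics on a nested dict. Not Hydra's real parser.

def apply_override(cfg: dict, override: str) -> None:
    """Apply a dotted 'path=value' override to a nested dict in place."""
    path, value = override.split("=", 1)
    *parents, leaf = path.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    # Coerce bare integers so trainer.nnodes=2 becomes an int,
    # mirroring Hydra's type resolution for simple cases.
    node[leaf] = int(value) if value.isdigit() else value

cfg = {"trainer": {"n_gpus_per_node": 8, "nnodes": 1}}
apply_override(cfg, "trainer.nnodes=2")
# cfg["trainer"]["nnodes"] is now 2
```

Real Hydra additionally composes the base `ppo_trainer` config from the `config/` directory before applying such overrides, so the command-line flags in the Usage section only need to name the fields they change.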

Usage

To launch a PPO training run with verl:

  1. Install the verl package:
pip install verl
  2. Run the training entry point with Hydra config overrides:
python -m verl.trainer.main_ppo \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
    data.train_files=~/data/gsm8k/train.parquet \
    data.val_files=~/data/gsm8k/test.parquet

Code Reference

Attribute Detail
Source Location verl/trainer/main_ppo.py, Lines 49-106
Signature def run_ppo(config, task_runner_class=None) -> None
Import from verl.trainer.main_ppo import run_ppo

Additional entry point:

@hydra.main(config_path="config", config_name="ppo_trainer", version_base=None)
def main(config):
    auto_set_device(config)
    run_ppo(config)

I/O Contract

Inputs

Parameter Type Description
config OmegaConf DictConfig Hydra configuration object containing all training parameters
config.trainer.n_gpus_per_node int Number of GPUs to allocate per node in the resource pool
config.trainer.nnodes int Number of nodes in the distributed cluster
config.ray_kwargs.ray_init dict Keyword arguments passed directly to ray.init()
config.transfer_queue.enable bool Whether to enable transfer queue runtime env vars
task_runner_class Optional[ray.remote class] Custom Ray remote class for task execution (defaults to built-in TaskRunner)

Outputs

Output Type Description
Return value None The function runs the training process to completion via Ray remote execution
Side effect Ray cluster A Ray cluster is initialized (if not already running) with the configured runtime environment
Side effect Resource pools GPU resource pools are allocated according to n_gpus_per_node * nnodes
Side effect Timeline file Optional Ray timeline JSON file if config.ray_kwargs.timeline_json_file is set

Usage Examples

Example 1: Basic PPO training launch

from verl.trainer.main_ppo import run_ppo
from omegaconf import OmegaConf

config = OmegaConf.load("config/ppo_trainer.yaml")
config.trainer.n_gpus_per_node = 8
config.trainer.nnodes = 1
run_ppo(config)

Example 2: Resource pool allocation within TaskRunner

# Inside TaskRunner.init_resource_pool_mgr():
resource_pool_spec = {
    "global_pool": [config.trainer.n_gpus_per_node] * config.trainer.nnodes,
}
# e.g., for 8 GPUs on 2 nodes: {"global_pool": [8, 8]}
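The pool spec above determines the total world size: summing the per-node entries recovers the n_gpus_per_node * nnodes product listed in the I/O contract. A minimal check:

```python
# Total GPU count implied by the resource pool spec above
# (values match the 8-GPU, 2-node example).
n_gpus_per_node, nnodes = 8, 2
resource_pool_spec = {"global_pool": [n_gpus_per_node] * nnodes}
total_gpus = sum(resource_pool_spec["global_pool"])
assert total_gpus == n_gpus_per_node * nnodes  # 16 GPUs across the cluster
```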

Example 3: Ray initialization with custom runtime env

import ray

ray.init(
    runtime_env={
        "env_vars": {
            "TOKENIZERS_PARALLELISM": "true",
            "NCCL_DEBUG": "WARN",
            "VLLM_LOGGING_LEVEL": "WARN",
        }
    }
)
