
Implementation:Volcengine Verl Ray Init Cluster

From Leeroopedia


Field Value
Knowledge Sources External Tool Doc (Ray, Hydra, verl)
Domains Distributed Computing, Cluster Initialization, GPU Resource Management
Last Updated 2026-02-07

Overview

Description

This implementation initializes a Ray cluster and allocates GPU resource pools for distributed reinforcement learning training with the verl framework. It is the primary entry point for launching PPO-based training runs. The function run_ppo(config) checks whether Ray has already been initialized, and if not, calls ray.init() with a runtime environment that controls tokenizer parallelism, NCCL debug levels, VLLM logging, and optional transfer queue settings. After initialization, it creates a remote TaskRunner actor that orchestrates the entire training workflow, including worker registration, resource pool management, dataset creation, and the PPO training loop.
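The guard-then-init flow described above can be sketched in plain Python. This is a hypothetical helper, not verl's actual code; only the three environment variable names and values are taken from the page, and the transfer-queue variable shown is an illustrative placeholder:

```python
# Sketch of assembling the runtime_env dict that run_ppo passes to
# ray.init() when no cluster is running yet. Hypothetical helper;
# only the three env var names/values below come from the page.

def build_runtime_env(transfer_queue_enabled: bool) -> dict:
    """Assemble env_vars for ray.init(runtime_env=...)."""
    env_vars = {
        "TOKENIZERS_PARALLELISM": "true",  # avoid HF tokenizer fork warnings
        "NCCL_DEBUG": "WARN",              # limit NCCL output to warnings
        "VLLM_LOGGING_LEVEL": "WARN",      # quiet vLLM per-step logging
    }
    if transfer_queue_enabled:
        # Illustrative placeholder: the actual transfer-queue variables
        # depend on the config and are not documented here.
        env_vars["TQ_ENABLE"] = "1"
    return {"env_vars": env_vars}
```

In the real entry point this dict would be passed as `ray.init(runtime_env=build_runtime_env(...), **config.ray_kwargs.ray_init)` only when `ray.is_initialized()` is false, so an externally launched cluster is reused rather than replaced.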

The Hydra decorator @hydra.main(config_path="config", config_name="ppo_trainer", version_base=None) on the main() function provides configuration management, enabling hierarchical config composition and command-line overrides.
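The override semantics this enables can be illustrated with a toy re-implementation of Hydra's dotted-key syntax (this is not Hydra's actual parser, just a sketch of how `trainer.nnodes=2` maps onto a nested config):

```python
# Toy illustration of Hydra's "a.b.c=value" command-line override
# semantics on a nested dict. Not Hydra's real parser.

def apply_override(cfg: dict, override: str) -> None:
    """Apply a dotted 'path=value' override to a nested dict in place."""
    path, value = override.split("=", 1)
    *parents, leaf = path.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    # Coerce bare integers so trainer.nnodes=2 becomes an int,
    # mirroring Hydra's type resolution for simple cases.
    node[leaf] = int(value) if value.isdigit() else value

cfg = {"trainer": {"n_gpus_per_node": 8, "nnodes": 1}}
apply_override(cfg, "trainer.nnodes=2")
# cfg["trainer"]["nnodes"] is now 2
```

Real Hydra additionally composes the base `ppo_trainer` config from the `config/` directory before applying such overrides, so the command-line flags in the Usage section only need to name the fields they change.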

Usage

To launch a PPO training run with verl:

  1. Install the verl package:
pip install verl
  2. Run the training entry point with Hydra config overrides:
python -m verl.trainer.main_ppo \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
    data.train_files=~/data/gsm8k/train.parquet \
    data.val_files=~/data/gsm8k/test.parquet

Code Reference

Attribute Detail
Source Location verl/trainer/main_ppo.py, Lines 49-106
Signature def run_ppo(config, task_runner_class=None) -> None
Import from verl.trainer.main_ppo import run_ppo

Additional entry point:

@hydra.main(config_path="config", config_name="ppo_trainer", version_base=None)
def main(config):
    auto_set_device(config)
    run_ppo(config)

I/O Contract

Inputs

Parameter Type Description
config OmegaConf DictConfig Hydra configuration object containing all training parameters
config.trainer.n_gpus_per_node int Number of GPUs to allocate per node in the resource pool
config.trainer.nnodes int Number of nodes in the distributed cluster
config.ray_kwargs.ray_init dict Keyword arguments passed directly to ray.init()
config.transfer_queue.enable bool Whether to enable transfer queue runtime env vars
task_runner_class Optional[ray.remote class] Custom Ray remote class for task execution (defaults to built-in TaskRunner)

Outputs

Output Type Description
Return value None The function runs the training process to completion via Ray remote execution
Side effect Ray cluster A Ray cluster is initialized (if not already running) with the configured runtime environment
Side effect Resource pools GPU resource pools are allocated according to n_gpus_per_node * nnodes
Side effect Timeline file Optional Ray timeline JSON file if config.ray_kwargs.timeline_json_file is set

Usage Examples

Example 1: Basic PPO training launch

from verl.trainer.main_ppo import run_ppo
from omegaconf import OmegaConf

config = OmegaConf.load("config/ppo_trainer.yaml")
config.trainer.n_gpus_per_node = 8
config.trainer.nnodes = 1
run_ppo(config)

Example 2: Resource pool allocation within TaskRunner

# Inside TaskRunner.init_resource_pool_mgr():
resource_pool_spec = {
    "global_pool": [config.trainer.n_gpus_per_node] * config.trainer.nnodes,
}
# e.g., for 8 GPUs on 2 nodes: {"global_pool": [8, 8]}
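The pool spec above determines the total world size: summing the per-node entries recovers the n_gpus_per_node * nnodes product listed in the I/O contract. A minimal check:

```python
# Total GPU count implied by the resource pool spec above
# (values match the 8-GPU, 2-node example).
n_gpus_per_node, nnodes = 8, 2
resource_pool_spec = {"global_pool": [n_gpus_per_node] * nnodes}
total_gpus = sum(resource_pool_spec["global_pool"])
assert total_gpus == n_gpus_per_node * nnodes  # 16 GPUs across the cluster
```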

Example 3: Ray initialization with custom runtime env

import ray

ray.init(
    runtime_env={
        "env_vars": {
            "TOKENIZERS_PARALLELISM": "true",
            "NCCL_DEBUG": "WARN",
            "VLLM_LOGGING_LEVEL": "WARN",
        }
    }
)
