Implementation:Volcengine Verl Ray Init Cluster
| Field | Value |
|---|---|
| Knowledge Sources | External Tool Doc (Ray, Hydra, verl) |
| Domains | Distributed Computing, Cluster Initialization, GPU Resource Management |
| Last Updated | 2026-02-07 |
Overview
Description
This implementation initializes a Ray cluster and allocates GPU resource pools for distributed reinforcement learning training with the verl framework, and it is the primary entry point for launching PPO training runs. The function run_ppo(config) checks whether Ray has already been initialized and, if not, calls ray.init() with a runtime environment that controls tokenizer parallelism, NCCL debug verbosity, vLLM logging, and optional transfer queue settings. After initialization, it creates a remote TaskRunner actor that orchestrates the entire training workflow: worker registration, resource pool management, dataset creation, and the PPO training loop.
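In outline, the launch logic looks like the sketch below. The placeholder actor and its run() method are illustrative simplifications, not verl's literal API; only the Ray calls (ray.is_initialized, ray.init, .remote, ray.get) are guaranteed:
import ray

@ray.remote
class TaskRunner:  # stand-in for verl's built-in TaskRunner actor
    def run(self, config):
        ...  # worker registration, resource pools, datasets, PPO loop

def run_ppo_sketch(config, task_runner_class=None):
    # Initialize Ray once per driver process; env vars propagate to all workers.
    if not ray.is_initialized():
        ray.init(
            runtime_env={
                "env_vars": {
                    "TOKENIZERS_PARALLELISM": "true",
                    "NCCL_DEBUG": "WARN",
                    "VLLM_LOGGING_LEVEL": "WARN",
                }
            }
        )
    # Delegate the whole training workflow to a remote actor and block on it.
    runner_cls = task_runner_class or TaskRunner
    runner = runner_cls.remote()
    ray.get(runner.run.remote(config))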
The Hydra decorator @hydra.main(config_path="config", config_name="ppo_trainer", version_base=None) on the main() function provides configuration management, enabling hierarchical config composition and command-line overrides.
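A command-line override such as trainer.n_gpus_per_node=8 is merged into the composed config before main() receives it. The same merge can be reproduced programmatically with OmegaConf; the base values below are placeholders, not verl defaults:
from omegaconf import OmegaConf

base = OmegaConf.create({"trainer": {"n_gpus_per_node": 1, "nnodes": 1}})
overrides = OmegaConf.from_dotlist(["trainer.n_gpus_per_node=8", "trainer.nnodes=2"])
config = OmegaConf.merge(base, overrides)
assert config.trainer.n_gpus_per_node == 8 and config.trainer.nnodes == 2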
Usage
To launch a PPO training run with verl:
- Install the verl package:
pip install verl
- Run the training entry point with Hydra config overrides:
python -m verl.trainer.main_ppo \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
    data.train_files=~/data/gsm8k/train.parquet \
    data.val_files=~/data/gsm8k/test.parquet
Code Reference
| Attribute | Detail |
|---|---|
| Source Location | verl/trainer/main_ppo.py, Lines 49-106 |
| Signature | def run_ppo(config, task_runner_class=None) -> None |
| Import | from verl.trainer.main_ppo import run_ppo |
Additional entry point:
@hydra.main(config_path="config", config_name="ppo_trainer", version_base=None)
def main(config):
    auto_set_device(config)
    run_ppo(config)
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| config | OmegaConf DictConfig | Hydra configuration object containing all training parameters |
| config.trainer.n_gpus_per_node | int | Number of GPUs to allocate per node in the resource pool |
| config.trainer.nnodes | int | Number of nodes in the distributed cluster |
| config.ray_kwargs.ray_init | dict | Keyword arguments passed directly to ray.init() |
| config.transfer_queue.enable | bool | Whether to enable transfer queue runtime env vars |
| task_runner_class | Optional[ray.remote class] | Custom Ray remote class for task execution; defaults to the built-in TaskRunner (see the sketch below) |
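Any Ray remote class that matches the built-in TaskRunner's interface can be passed as task_runner_class. A minimal sketch, assuming the runner exposes a run(config) method; LoggingTaskRunner and its body are hypothetical:
import ray

@ray.remote(num_cpus=1)
class LoggingTaskRunner:
    # Hypothetical custom runner that logs the cluster shape before training.
    def run(self, config):
        print(f"PPO on {config.trainer.nnodes} node(s) x "
              f"{config.trainer.n_gpus_per_node} GPU(s)")
        # ... delegate to the actual training workflow here ...

# run_ppo(config, task_runner_class=LoggingTaskRunner)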
Outputs
| Output | Type | Description |
|---|---|---|
| Return value | None | The function runs the training process to completion via Ray remote execution |
| Side effect | Ray cluster | A Ray cluster is initialized (if not already running) with the configured runtime environment |
| Side effect | Resource pools | GPU resource pools are allocated according to n_gpus_per_node * nnodes |
| Side effect | Timeline file | Optional Ray timeline JSON file written if config.ray_kwargs.timeline_json_file is set (see the sketch below) |
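The timeline side effect corresponds to Ray's built-in chrome-tracing dump, which can also be triggered manually with ray.timeline(); the output path below is illustrative:
import ray

ray.init()
# ... run training or any other remote work ...
ray.timeline(filename="/tmp/ray_timeline.json")  # open in chrome://tracing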
Usage Examples
Example 1: Basic PPO training launch
from verl.trainer.main_ppo import run_ppo
from omegaconf import OmegaConf
config = OmegaConf.load("config/ppo_trainer.yaml")
config.trainer.n_gpus_per_node = 8
config.trainer.nnodes = 1
run_ppo(config)
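Continuing Example 1, extra keyword arguments can be forwarded to ray.init() via the ray_kwargs.ray_init field from the Inputs table; num_cpus is just one example of a valid ray.init() option:
from omegaconf import OmegaConf

config = OmegaConf.merge(
    config,
    OmegaConf.create({"ray_kwargs": {"ray_init": {"num_cpus": 32}}}),
)
run_ppo(config)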
Example 2: Resource pool allocation within TaskRunner
# Inside TaskRunner.init_resource_pool_mgr():
resource_pool_spec = {
    "global_pool": [config.trainer.n_gpus_per_node] * config.trainer.nnodes,
}
# e.g., for 8 GPUs on 2 nodes: {"global_pool": [8, 8]}
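This spec is then consumed by verl's ResourcePoolManager (see verl/trainer/ppo/ray_trainer.py under Related Pages). A hedged sketch of the typical construction; the Role members and the mapping argument shown here are assumptions about the verl API rather than guaranteed signatures:
from verl.trainer.ppo.ray_trainer import ResourcePoolManager, Role

resource_pool_spec = {"global_pool": [8, 8]}  # 2 nodes x 8 GPUs each
mapping = {
    Role.ActorRollout: "global_pool",  # assumed Role enum member
    Role.Critic: "global_pool",
}
resource_pool_manager = ResourcePoolManager(
    resource_pool_spec=resource_pool_spec, mapping=mapping
)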
Example 3: Ray initialization with custom runtime env
import ray
ray.init(
    runtime_env={
        "env_vars": {
            "TOKENIZERS_PARALLELISM": "true",
            "NCCL_DEBUG": "WARN",
            "VLLM_LOGGING_LEVEL": "WARN",
        }
    }
)
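Because runtime_env env vars are inherited by every Ray worker, their propagation can be verified from a remote task once the ray.init() call above has run:
import os
import ray

@ray.remote
def read_nccl_debug():
    # Executes in a worker process that inherited the runtime_env env vars.
    return os.environ.get("NCCL_DEBUG")

print(ray.get(read_nccl_debug.remote()))  # expected: "WARN"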
Related Pages
- Principle:Volcengine_Verl_Environment_Setup
- Environment:Volcengine_Verl_CUDA_GPU_Environment
- Environment:Volcengine_Verl_Ray_Distributed_Environment
- Environment:Volcengine_Verl_Python_Core_Dependencies
- verl/trainer/main_ppo.py -- Main entry point for PPO training
- verl/trainer/ppo/ray_trainer.py -- RayPPOTrainer and ResourcePoolManager
- verl/trainer/constants_ppo.py -- Default Ray runtime environment configuration