Principle: Volcengine verl Environment Setup
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Systems, Infrastructure, Training_Infrastructure |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
The process of initializing a distributed computing cluster with Ray, allocating GPU resource pools, and configuring the runtime environment for RL or SFT training.
Description
Environment Setup prepares the distributed execution context required by verl's training pipelines. verl uses Ray as its distributed computing framework, with a "single controller" pattern where a driver process orchestrates multiple GPU workers.
The setup involves:
- Installing verl and its dependencies (vLLM/SGLang for rollout, PyTorch for training)
- Initializing a Ray cluster with appropriate resource allocation
- Partitioning GPUs into resource pools for different roles (actor, critic, rollout, reward model)
- Configuring Hydra for experiment management
This is the first step in every training workflow and must complete before any training logic begins.
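The GPU-partitioning step above can be sketched as a small, framework-free helper that splits a node's GPU ids into named role pools. All names here (`build_resource_pools`, the role keys, the share counts) are illustrative, not verl's actual API:

```python
# Illustrative sketch: partition a node's GPUs into role-based resource
# pools, mirroring the allocation verl performs before launching workers.
# The function name, role names, and shares are hypothetical.

def build_resource_pools(total_gpus: int, shares: dict[str, int]) -> dict[str, list[int]]:
    """Split GPU ids 0..total_gpus-1 into named pools by requested share."""
    if sum(shares.values()) > total_gpus:
        raise ValueError("requested more GPUs than available")
    pools, next_gpu = {}, 0
    for role, count in shares.items():
        pools[role] = list(range(next_gpu, next_gpu + count))
        next_gpu += count
    return pools

pools = build_resource_pools(
    total_gpus=8,
    shares={"actor_rollout": 4, "critic": 2, "reward_model": 2},
)
print(pools["actor_rollout"])  # first four GPU ids go to the actor/rollout pool
```

In verl itself, roles can also be colocated on one pool (e.g. actor and rollout sharing GPUs); the point of the sketch is only that each role is mapped to an explicit slice of the cluster before any worker starts.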
Usage
Environment setup is required before any verl training. The setup method varies by workflow:
- RL training (GRPO/PPO): uses `ray.init()` with GPU resource pools, launched via `verl.trainer.main_ppo`
- SFT training: uses `torchrun` with `torch.distributed.init_process_group`, launched via `verl.trainer.fsdp_sft_trainer`
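As a rough sketch, the two launch paths look like the following. The entry-point module paths come from the list above; the specific Hydra override keys shown are assumptions and should be checked against the config of your installed verl version:

```shell
# RL training (PPO/GRPO): Ray-based single-controller entry point.
# The Hydra overrides (trainer.n_gpus_per_node, trainer.nnodes) are
# illustrative -- verify the exact keys in your verl version's config.
python3 -m verl.trainer.main_ppo \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1

# SFT training: SPMD entry point launched with torchrun,
# which calls torch.distributed.init_process_group internally.
torchrun --nproc_per_node=8 \
    -m verl.trainer.fsdp_sft_trainer
```

Note the asymmetry: the RL path starts one driver process that then spawns Ray workers, while `torchrun` starts all eight SFT processes up front.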
Theoretical Basis
Distributed training setup follows the single-program-multiple-data (SPMD) paradigm for data parallelism and the single-controller pattern for heterogeneous worker orchestration:
Pseudo-code:
# Abstract environment setup
# 1. Install dependencies
pip_install("verl", "vllm", "ray", "torch")
# 2. Initialize cluster
ray.init()
# 3. Allocate GPU pools
resource_pool_actor = GPUPool(gpus_per_node * nodes_for_actor)
resource_pool_critic = GPUPool(gpus_per_node * nodes_for_critic)
# 4. Launch workers
actor_worker = spawn(ActorRolloutRefWorker, resource_pool_actor)
critic_worker = spawn(CriticWorker, resource_pool_critic)
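The single-controller control flow in the pseudo-code above can be mimicked with only the standard library: one driver owns per-role worker pools, dispatches work to each, and gathers results. This is a toy sketch of the orchestration pattern, not verl's implementation, which uses Ray actors bound to real GPU pools:

```python
# Toy single-controller sketch: a driver dispatches shards to per-role
# worker pools and collects results. Thread pools stand in for Ray's
# remote GPU workers; only the control flow is being illustrated.
from concurrent.futures import ThreadPoolExecutor

def actor_step(batch: list[int]) -> list[int]:
    # stand-in for a rollout/actor forward pass on one data shard
    return [x * 2 for x in batch]

def critic_step(batch: list[int]) -> int:
    # stand-in for a critic value estimate over one shard
    return sum(batch)

def drive(shards: list[list[int]]) -> tuple[list[list[int]], list[int]]:
    # the "single controller": owns both pools and sequences both roles
    with ThreadPoolExecutor(max_workers=2) as actor_pool, \
         ThreadPoolExecutor(max_workers=2) as critic_pool:
        rollouts = list(actor_pool.map(actor_step, shards))
        values = list(critic_pool.map(critic_step, rollouts))
    return rollouts, values

rollouts, values = drive([[1, 2], [3, 4]])
print(rollouts, values)  # [[2, 4], [6, 8]] [6, 14]
```

The key property the sketch preserves is that workers never talk to each other directly: the driver sees every intermediate result and decides what runs next, which is what lets verl compose heterogeneous roles (actor, critic, rollout, reward model) in one training loop.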