Principle:Volcengine Verl Split Resource Placement
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Training, Resource_Management, Reinforcement_Learning |
| Last Updated | 2026-02-07 18:00 GMT |
Overview
A resource allocation strategy that assigns actor and critic models to separate GPU pools to enable parallel updates and reduce memory contention.
Description
Split Resource Placement is a distributed training strategy where the actor (policy model + rollout engine) and critic (value function) are placed on physically separate GPU resource pools. In the default verl setup, actor and critic share the same GPU pool and execute sequentially. Split placement provides two benefits:
- Memory isolation: each model has dedicated GPU memory, which avoids out-of-memory (OOM) errors caused by the combined footprint of both models
- Parallel execution: actor and critic update steps can overlap because they run on different hardware
The split is configured by creating two ResourcePool entries in the ResourcePoolManager, dividing available nodes or GPUs in half between actor_rollout_ref_pool and critic_pool.
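The halving described above can be sketched in plain Python. This is a minimal illustration, not verl's implementation: the helper name `split_pool_spec` is hypothetical, and the spec layout (each pool name mapped to a list with one GPU count per node in that pool) is an assumption. Only the pool names come from the text.

```python
from typing import Dict, List

# Pool names taken from the description above.
ACTOR_POOL_ID = "actor_rollout_ref_pool"
CRITIC_POOL_ID = "critic_pool"


def split_pool_spec(nnodes: int, gpus_per_node: int) -> Dict[str, List[int]]:
    """Hypothetical helper: return {pool_name: [GPUs used on each node of the pool]}."""
    if nnodes >= 2 and nnodes % 2 == 0:
        # Even number of nodes: each pool gets half of the nodes, with all GPUs on each.
        per_pool = [gpus_per_node] * (nnodes // 2)
    elif gpus_per_node >= 2 and gpus_per_node % 2 == 0:
        # Otherwise split every node's GPUs in half; both pools span all nodes.
        per_pool = [gpus_per_node // 2] * nnodes
    else:
        raise ValueError("need an even number of nodes or an even number of GPUs per node")
    return {ACTOR_POOL_ID: list(per_pool), CRITIC_POOL_ID: list(per_pool)}


print(split_pool_spec(nnodes=1, gpus_per_node=8))
# {'actor_rollout_ref_pool': [4], 'critic_pool': [4]}
print(split_pool_spec(nnodes=4, gpus_per_node=8))
# {'actor_rollout_ref_pool': [8, 8], 'critic_pool': [8, 8]}
```

Splitting by whole nodes first keeps each model's collective communication within its own nodes. Falling back to a per-node GPU split covers the single-node case.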
Usage
Use this principle when training with PPO (which requires both actor and critic) and you have enough GPU resources to dedicate separate pools to each. It is most beneficial when the combined memory footprint of actor and critic exceeds single-pool capacity, or when you want to overlap actor and critic updates for faster training.
Theoretical Basis
Pseudo-code Logic:
# Abstract resource allocation (NOT real implementation)
gpu_ids = list(range(nnodes * gpus_per_node))
# Split placement divides the GPUs into two equal halves:
half = len(gpu_ids) // 2
actor_pool = gpu_ids[:half]
critic_pool = gpu_ids[half:]
# Each pool runs its model independently:
actor_worker_group = create_workers(actor_pool, ActorRolloutRefWorker)
critic_worker_group = create_workers(critic_pool, CriticWorker)
# Updates can now run in parallel:
actor_future = actor_worker_group.update_actor(batch) # non-blocking
critic_future = critic_worker_group.update_critic(batch) # non-blocking
actor_result = actor_future.get() # wait
critic_result = critic_future.get() # wait
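
The dispatch-then-wait pattern in the pseudo-code can be tried with the Python standard library alone. The sketch below is an assumption-laden stand-in: `update_actor`, `update_critic`, and `run_split_update` are hypothetical stubs that sleep instead of training, and threads stand in for the two GPU pools. Standard-library futures expose `.result()` rather than `.get()`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def update_actor(batch):
    # Hypothetical stub for the actor update running on the actor pool.
    time.sleep(0.2)
    return {"model": "actor", "samples": len(batch)}


def update_critic(batch):
    # Hypothetical stub for the critic update running on the critic pool.
    time.sleep(0.2)
    return {"model": "critic", "samples": len(batch)}


def run_split_update(batch):
    # Two workers stand in for the two independent resource pools.
    with ThreadPoolExecutor(max_workers=2) as executor:
        actor_future = executor.submit(update_actor, batch)    # non-blocking
        critic_future = executor.submit(update_critic, batch)  # non-blocking
        # Block only after both updates have been dispatched.
        return actor_future.result(), critic_future.result()


batch = list(range(16))
start = time.perf_counter()
actor_result, critic_result = run_split_update(batch)
elapsed = time.perf_counter() - start
print(actor_result, critic_result, f"elapsed: {elapsed:.2f}s")
```

Because both calls are submitted before either result is awaited, the wall-clock time approaches that of the slower update rather than the sum of the two, which is the overlap that split placement aims for.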