Principle:Volcengine Verl Split Resource Placement

Knowledge Sources	Volcengine_Verl
Domains	Distributed_Training, Resource_Management, Reinforcement_Learning
Last Updated	2026-02-07 18:00 GMT

Overview

A resource allocation strategy that assigns actor and critic models to separate GPU pools to enable parallel updates and reduce memory contention.

Description

Split Resource Placement is a distributed training strategy where the actor (policy model, rollout engine, and reference policy, hosted together by ActorRolloutRefWorker) and the critic (value function) are placed on physically separate GPU resource pools. In the default verl setup, actor and critic share the same GPU pool and execute sequentially. Split placement provides two benefits:

Memory isolation: each model has dedicated GPU memory, avoiding out-of-memory (OOM) errors caused by the combined footprint of both models
Parallel execution: actor and critic update steps can overlap because they run on different GPUs
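The parallel-execution benefit can be made concrete with a back-of-the-envelope model of one update step. This sketch assumes the update phase is the only thing being compared and that each per-update time is measured on the pool the model actually runs on. In split placement each model gets only part of the GPUs, so its update may be slower than on the full shared pool. The function names and timings are illustrative, not part of verl.

```python
def colocated_update_time(t_actor: float, t_critic: float) -> float:
    """Shared pool: the actor and critic updates run one after the other."""
    return t_actor + t_critic


def split_update_time(t_actor: float, t_critic: float) -> float:
    """Separate pools: the two updates overlap, so the slower one dominates."""
    return max(t_actor, t_critic)


if __name__ == "__main__":
    # Hypothetical timings in seconds on the full shared pool, not measurements.
    shared = colocated_update_time(t_actor=30.0, t_critic=20.0)
    # Each model now has half the GPUs, so assume each update is somewhat slower.
    split = split_update_time(t_actor=45.0, t_critic=30.0)
    print(f"colocated: {shared:.1f}s, split: {split:.1f}s")  # colocated: 50.0s, split: 45.0s
```

With these made-up numbers split placement saves 5 s per update step. If halving the GPUs instead doubled each update time (60 s and 40 s), the split step would take 60 s and lose to the colocated 50 s. The benefit therefore depends on how well each update scales down to fewer GPUs.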

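The split is configured by defining two resource pools for the ResourcePoolManager, actor_rollout_ref_pool and critic_pool, and dividing the available hardware evenly between them. On a single node, each pool gets half of that node's GPUs. On multiple nodes, each pool gets half of the nodes.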
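A minimal sketch of how the split spec could be computed. It assumes each pool is described by a list of per-node GPU counts, which is the shape verl's trainers pass to ResourcePoolManager. The helper split_resource_pool_spec, the role strings, and the even-count validation are hypothetical illustrations. In verl itself, roles are an enum and the pool spec is built inline in the entry script.

```python
from typing import Dict, List


def split_resource_pool_spec(nnodes: int, n_gpus_per_node: int) -> Dict[str, List[int]]:
    """Divide the cluster into two equal, disjoint pools (hypothetical helper).

    Each value lists how many GPUs the pool uses on each of its nodes.
    """
    if nnodes >= 2:
        # Multi-node: each pool gets half of the nodes, using every GPU on them.
        if nnodes % 2:
            raise ValueError("splitting by nodes needs an even number of nodes")
        per_node, pool_nodes = n_gpus_per_node, nnodes // 2
    else:
        # Single node: each pool gets half of that node's GPUs.
        if n_gpus_per_node < 2 or n_gpus_per_node % 2:
            raise ValueError("splitting by GPUs needs an even GPU count of at least 2")
        per_node, pool_nodes = n_gpus_per_node // 2, nnodes
    return {
        "actor_rollout_ref_pool": [per_node] * pool_nodes,
        "critic_pool": [per_node] * pool_nodes,
    }


# Hypothetical role-to-pool mapping: the reference policy shares the actor's pool.
ROLE_TO_POOL = {
    "actor_rollout": "actor_rollout_ref_pool",
    "ref_policy": "actor_rollout_ref_pool",
    "critic": "critic_pool",
}

print(split_resource_pool_spec(nnodes=1, n_gpus_per_node=8))
# {'actor_rollout_ref_pool': [4], 'critic_pool': [4]}
print(split_resource_pool_spec(nnodes=4, n_gpus_per_node=8))
# {'actor_rollout_ref_pool': [8, 8], 'critic_pool': [8, 8]}
```

Raising an error on odd counts keeps the two pools the same size. Silently rounding down would leave some GPUs unused.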

Usage

Use this principle when training with PPO (which requires both actor and critic) and you have enough GPU resources to dedicate separate pools to each. It is most beneficial when the combined memory footprint of actor and critic exceeds single-pool capacity, or when you want to overlap actor and critic updates for faster training.

Theoretical Basis

Pseudo-code Logic:

# Abstract resource allocation (NOT the real implementation)
gpu_ids = list(range(nnodes * gpus_per_node))
half = len(gpu_ids) // 2

# Split placement divides the GPUs into two disjoint pools:
actor_pool = gpu_ids[:half]
critic_pool = gpu_ids[half:]

# Each pool hosts its own worker group:
actor_worker_group = create_workers(actor_pool, ActorRolloutRefWorker)
critic_worker_group = create_workers(critic_pool, CriticWorker)

# Both updates are dispatched without blocking, so they run concurrently:
actor_future = actor_worker_group.update_actor(batch)    # returns immediately
critic_future = critic_worker_group.update_critic(batch)  # returns immediately
actor_result = actor_future.get()    # block until the actor update finishes
critic_result = critic_future.get()  # block until the critic update finishes
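The future-based pattern in the pseudo-code can be reproduced locally with Python's concurrent.futures as a stand-in for the remote worker groups. Here each single-thread executor plays the role of one GPU pool. The functions update_actor and update_critic are hypothetical placeholders that only sleep and return a dict. In verl, the equivalent non-blocking calls go through Ray rather than this module.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def update_actor(batch: list) -> dict:
    # Placeholder for a policy update; the sleep stands in for GPU work.
    time.sleep(0.3)
    return {"role": "actor", "num_samples": len(batch)}


def update_critic(batch: list) -> dict:
    # Placeholder for a value-function update.
    time.sleep(0.3)
    return {"role": "critic", "num_samples": len(batch)}


def split_update_step(batch: list) -> tuple:
    # One executor per "pool", so neither update queues behind the other.
    with ThreadPoolExecutor(max_workers=1) as actor_pool:
        with ThreadPoolExecutor(max_workers=1) as critic_pool:
            actor_future = actor_pool.submit(update_actor, batch)     # returns immediately
            critic_future = critic_pool.submit(update_critic, batch)  # returns immediately
            # Block until both updates are done.
            return actor_future.result(), critic_future.result()


start = time.perf_counter()
actor_result, critic_result = split_update_step(batch=list(range(16)))
elapsed = time.perf_counter() - start
# Roughly 0.3 s rather than 0.6 s, because the two sleeps overlap.
print(actor_result, critic_result, f"{elapsed:.2f}s")
```

Threads are sufficient here because time.sleep releases the GIL. Real GPU updates in separate processes or Ray actors overlap for the same reason: neither caller waits on the other until the final result() (or .get()) calls.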

Related Pages

Implementation:Volcengine_Verl_Split_Placement_PPO_Entry
