
Principle:Volcengine Verl Split Resource Placement

From Leeroopedia


Knowledge Sources
Domains: Distributed_Training, Resource_Management, Reinforcement_Learning
Last Updated: 2026-02-07 18:00 GMT

Overview

A resource allocation strategy that assigns actor and critic models to separate GPU pools to enable parallel updates and reduce memory contention.

Description

Split Resource Placement is a distributed training strategy where the actor (policy model + rollout engine) and critic (value function) are placed on physically separate GPU resource pools. In the default verl setup, actor and critic share the same GPU pool and execute sequentially. Split placement provides two benefits:

  • Memory isolation: each model has dedicated GPU memory, which avoids out-of-memory (OOM) errors caused by the combined footprint of both models on one pool
  • Parallel execution: actor and critic update steps can overlap because they run on different hardware

The split is configured by creating two ResourcePool entries in the ResourcePoolManager, dividing available nodes or GPUs in half between actor_rollout_ref_pool and critic_pool.
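
The halving rule can be sketched in plain Python. This is a minimal sketch, not verl's implementation: it assumes a pool specification is a mapping from pool name to a list of per-node GPU counts, and the helper name split_pool_spec is hypothetical. Only the pool names actor_rollout_ref_pool and critic_pool come from the description above.

```python
from typing import Dict, List


def split_pool_spec(nnodes: int, gpus_per_node: int) -> Dict[str, List[int]]:
    """Hypothetical helper: divide a cluster in half between two pools.

    Each value lists how many GPUs the pool takes on each node it uses.
    """
    if nnodes < 1 or gpus_per_node < 1:
        raise ValueError("nnodes and gpus_per_node must be positive")
    if nnodes % 2 == 0:
        # Even node count: give each pool half of the nodes, whole.
        per_pool = [gpus_per_node] * (nnodes // 2)
    elif gpus_per_node % 2 == 0:
        # Odd node count (including a single node): give each pool
        # half of the GPUs on every node.
        per_pool = [gpus_per_node // 2] * nnodes
    else:
        # Odd nodes times odd GPUs per node is an odd total.
        raise ValueError("cannot split an odd number of GPUs evenly")
    return {
        "actor_rollout_ref_pool": list(per_pool),
        "critic_pool": list(per_pool),
    }


print(split_pool_spec(nnodes=1, gpus_per_node=8))
print(split_pool_spec(nnodes=4, gpus_per_node=8))
```

Splitting by whole nodes when possible keeps each pool's GPUs on the same machines; splitting within a node is the fallback when the node count is odd. Whether a real deployment should prefer one over the other depends on the interconnect and is not addressed here.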

Usage

Use this principle when training with PPO (which requires both actor and critic) and you have enough GPU resources to dedicate separate pools to each. It is most beneficial when the combined memory footprint of actor and critic exceeds single-pool capacity, or when you want to overlap actor and critic updates for faster training.
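
A rough way to judge whether overlapping is worth it is to compare the sequential update time with the overlapped one. The sketch below is an idealized model, not a measurement: it assumes the update phase costs the sum of both updates on a shared pool and the slower of the two on split pools, and it ignores communication overhead and the fact that each split pool has fewer GPUs than the shared pool. The function name overlap_speedup and the timings are illustrative.

```python
def overlap_speedup(actor_update_s: float, critic_update_s: float) -> float:
    """Idealized speedup of the update phase when the two updates overlap."""
    if actor_update_s <= 0 or critic_update_s <= 0:
        raise ValueError("update times must be positive")
    # Shared pool: the updates run one after the other.
    sequential = actor_update_s + critic_update_s
    # Split pools: both run at once, so the slower update dominates.
    overlapped = max(actor_update_s, critic_update_s)
    return sequential / overlapped


# Illustrative timings only (seconds per update step).
print(overlap_speedup(30.0, 30.0))
print(overlap_speedup(30.0, 10.0))
```

Under this model the speedup lies between 1 and 2, and it approaches 2 only when the two updates take about the same time.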

Theoretical Basis

Pseudo-code Logic:

# Abstract resource allocation (NOT real implementation)
# Enumerate GPU indices so the pool can be sliced (an integer count cannot be)
gpu_ids = list(range(nnodes * gpus_per_node))
half = len(gpu_ids) // 2

# Split placement divides resources in half:
actor_pool = gpu_ids[:half]
critic_pool = gpu_ids[half:]

# Each pool runs its model independently:
actor_worker_group = create_workers(actor_pool, ActorRolloutRefWorker)
critic_worker_group = create_workers(critic_pool, CriticWorker)

# Updates can now run in parallel:
actor_future = actor_worker_group.update_actor(batch)  # non-blocking
critic_future = critic_worker_group.update_critic(batch)  # non-blocking
actor_result = actor_future.get()  # wait
critic_result = critic_future.get()  # wait
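
The submit-then-wait pattern in the pseudo-code can be tried locally with the standard library. This is a minimal sketch under stated assumptions: update_actor and update_critic are stand-in functions that sleep to simulate work, threads stand in for the two GPU pools, and concurrent.futures exposes the wait as .result() rather than the .get() used in the pseudo-code above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def update_actor(batch):
    """Stand-in for an actor update; sleeps to simulate work."""
    time.sleep(0.2)
    return {"actor_loss": sum(batch) / len(batch)}


def update_critic(batch):
    """Stand-in for a critic update; sleeps to simulate work."""
    time.sleep(0.2)
    return {"critic_loss": max(batch)}


def run_overlapped(batch):
    """Submit both updates first, then wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        actor_future = pool.submit(update_actor, batch)    # returns immediately
        critic_future = pool.submit(update_critic, batch)  # returns immediately
        return actor_future.result(), critic_future.result()  # block here


batch = [1.0, 2.0, 3.0]
start = time.perf_counter()
actor_result, critic_result = run_overlapped(batch)
elapsed = time.perf_counter() - start
print(actor_result, critic_result, f"{elapsed:.2f}s")
```

The key point is ordering: both calls are issued before either result is awaited. Waiting on the actor result before submitting the critic update would serialize the two steps even on separate pools.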
