Implementation:Volcengine Verl Split Placement Fit

Knowledge Sources	Volcengine_Verl
Domains	Reinforcement_Learning, Distributed_Training, Optimization
Last Updated	2026-02-07 18:00 GMT

Overview

A concrete tool, provided by the verl framework, for running a modified PPO training loop that updates the actor and the critic in parallel on separate resource pools.

Description

The split_monkey_patch.py module provides a replacement fit() function for RayPPOTrainer that enables parallel execution of actor and critic update steps. In the standard verl training loop, actor and critic updates run sequentially. This implementation exploits the split resource placement (actor and critic on separate GPU pools) to overlap these two heavy compute phases.

Key differences from the standard fit() loop:

The actor update (update_actor) and the critic update (update_critic) are issued concurrently using non-blocking RPC calls.
The function then waits for both to complete by calling .get() on each returned handle.
When the actor and critic reside on different hardware, the update phase ideally takes about as long as the slower of the two updates rather than the sum of both, which reduces wall-clock time per training step.
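The issue-then-wait pattern described above can be sketched in isolation. The following is a minimal illustration, not the code in split_monkey_patch.py: FakeWorkerGroup and parallel_update are hypothetical names, threads from the standard library stand in for separate GPU pools, and concurrent.futures' .result() plays the role that .get() plays in the text.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FakeWorkerGroup:
    """Hypothetical stand-in for a worker group that owns its own resources."""

    def __init__(self, name, step_seconds):
        self.name = name
        self.step_seconds = step_seconds
        # One worker thread models one dedicated resource pool.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def update(self, batch):
        # Non-blocking: returns a future immediately, work runs in the background.
        return self._pool.submit(self._run, batch)

    def _run(self, batch):
        time.sleep(self.step_seconds)  # simulate a heavy update step
        return {f"{self.name}/loss": sum(batch) / len(batch)}


def parallel_update(actor_wg, critic_wg, batch):
    # Issue both updates first, so they overlap on their separate pools.
    actor_future = actor_wg.update(batch)
    critic_future = critic_wg.update(batch)
    # Only then block on the results (analogous to the .get() calls).
    metrics = {}
    metrics.update(critic_future.result())
    metrics.update(actor_future.result())
    return metrics


actor_wg = FakeWorkerGroup("actor", 0.2)
critic_wg = FakeWorkerGroup("critic", 0.2)

start = time.perf_counter()
metrics = parallel_update(actor_wg, critic_wg, [1.0, 2.0, 3.0])
elapsed = time.perf_counter() - start
print(metrics, f"elapsed={elapsed:.2f}s")
```

Because both futures are created before either is awaited, the elapsed time in this sketch is close to the duration of one update, not two. Waiting on the first future before issuing the second call would restore the sequential behavior.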

The function is designed to be monkey-patched onto RayPPOTrainer via RayPPOTrainer.fit = fit.
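As a minimal sketch of the monkey-patching mechanism itself, the example below uses a hypothetical Trainer class and parallel_fit function (neither is part of verl). It shows that assigning a plain function to a class attribute replaces the method for every instance, including instances created before the patch.

```python
class Trainer:
    """Hypothetical stand-in for a trainer class with a stock fit() loop."""

    def __init__(self, name):
        self.name = name

    def fit(self):
        return f"{self.name}: sequential updates"


def parallel_fit(self):
    # A module-level function; `self` is bound once it is attached to the class.
    return f"{self.name}: parallel updates"


trainer = Trainer("ppo")
before = trainer.fit()

# Monkey-patch: replace the method on the class, not on the instance.
Trainer.fit = parallel_fit
after = trainer.fit()

print(before)
print(after)
```

Patching the class rather than an instance means the replacement must accept self as its first parameter, which is why the fit function in this module has the signature def fit(self).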

Usage

Use this modified fit function when running PPO with split placement (actor and critic on separate GPU resource pools). The main_ppo_split.py entry point applies it automatically; the module is not intended to be imported on its own outside that entry point.

Code Reference

Source Location

Repository: Volcengine_Verl
File: examples/split_placement/split_monkey_patch.py
Lines: 1-237

Signature

def fit(self):
    """
    The training loop of PPO with parallel actor/critic updates.

    The driver process calls compute functions of the worker group through RPC
    to construct the PPO dataflow. Light-weight advantage computation is done
    on the driver process.

    Key difference from standard fit(): actor and critic updates run in parallel
    using non-blocking RPC calls when split placement is active.
    """

Import

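from verl.trainer.ppo.ray_trainer import RayPPOTrainer

from split_monkey_patch import fit

# Applied via monkey-patching: replace the stock training loop on the class
RayPPOTrainer.fit = fit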

I/O Contract

Inputs

Name	Type	Required	Description
self	RayPPOTrainer	Yes	The trainer instance (monkey-patched method)
self.config	OmegaConf	Yes	Full training configuration
self.actor_rollout_wg	WorkerGroup	Yes	Actor rollout worker group
self.critic_wg	WorkerGroup	Yes (if use_critic)	Critic worker group on separate pool
self.train_dataloader	DataLoader	Yes	Training data iterator

Outputs

Name	Type	Description
metrics	dict	Training metrics logged each step (actor loss, critic loss, rewards, timing)
checkpoints	files	Saved at configured save_freq intervals
val_metrics	dict	Validation metrics at configured test_freq intervals

Usage Examples

Applying the Monkey Patch

from split_monkey_patch import fit
from verl.trainer.ppo.ray_trainer import RayPPOTrainer

# Monkey-patch the fit method to enable parallel updates
RayPPOTrainer.fit = fit

# Create trainer with split resource pools
trainer = RayPPOTrainer(
    config=config,
    tokenizer=tokenizer,
    role_worker_mapping=role_worker_mapping,
    resource_pool_manager=resource_pool_manager,
    ray_worker_group_cls=ray_worker_group_cls,
    reward_fn=reward_fn,
    val_reward_fn=val_reward_fn,
)
trainer.init_workers()
trainer.fit()  # Uses the parallel actor/critic update loop
