Implementation:Volcengine Verl Split Placement Fit
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Distributed_Training, Optimization |
| Last Updated | 2026-02-07 18:00 GMT |
Overview
Concrete tool for running a modified PPO training loop with parallel actor and critic updates on separate resource pools, provided by the verl framework.
Description
The split_monkey_patch.py module provides a replacement fit() function for RayPPOTrainer that enables parallel execution of actor and critic update steps. In the standard verl training loop, actor and critic updates run sequentially. This implementation exploits the split resource placement (actor and critic on separate GPU pools) to overlap these two heavy compute phases.
Key differences from the standard fit() loop:
- Actor update (update_actor) and critic update (update_critic) are issued back to back as non-blocking RPC calls, so both start before either one finishes
- The driver then blocks on both results using .get() calls
- When actor and critic reside on different hardware, the update phase takes roughly as long as the slower of the two updates rather than their sum, which can noticeably reduce wall-clock training time
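The issue-then-wait pattern in the list above can be illustrated with a small stand-in that uses only the Python standard library. This is a sketch under stated assumptions, not the verl code: `FakeWorkerGroup`, its `update` method, and the `FutureLike.get()` wrapper are hypothetical substitutes for the Ray worker groups and their non-blocking calls, and the sleeps stand in for the heavy update compute.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FutureLike:
    """Hypothetical handle mimicking a non-blocking RPC result with .get()."""

    def __init__(self, future):
        self._future = future

    def get(self):
        # Blocks until the background work finishes, then returns its result.
        return self._future.result()


class FakeWorkerGroup:
    """Hypothetical stand-in for a worker group on its own resource pool."""

    def __init__(self, name, seconds):
        self.name = name
        self.seconds = seconds
        self._pool = ThreadPoolExecutor(max_workers=1)

    def update(self, batch):
        # Returns immediately; the "update" runs in the background.
        return FutureLike(self._pool.submit(self._run, batch))

    def _run(self, batch):
        time.sleep(self.seconds)  # placeholder for the heavy update step
        return {f"{self.name}/num_samples": len(batch)}


def parallel_update(actor_wg, critic_wg, batch):
    # Issue both updates before waiting on either one.
    actor_handle = actor_wg.update(batch)
    critic_handle = critic_wg.update(batch)
    # Only now block on the results and merge the returned metrics.
    metrics = {}
    metrics.update(critic_handle.get())
    metrics.update(actor_handle.get())
    return metrics


actor_wg = FakeWorkerGroup("actor", seconds=0.3)
critic_wg = FakeWorkerGroup("critic", seconds=0.3)

start = time.perf_counter()
metrics = parallel_update(actor_wg, critic_wg, batch=[1, 2, 3, 4])
elapsed = time.perf_counter() - start
print(metrics, f"elapsed={elapsed:.2f}s")
```

Because each fake worker group owns its own single-thread pool, the two updates overlap only when they are placed on different groups, which loosely mirrors the requirement that actor and critic sit on separate resource pools.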
The function is designed to be monkey-patched onto RayPPOTrainer via RayPPOTrainer.fit = fit.
Usage
Use this modified fit function when running PPO with split placement (actor and critic on separate GPU resource pools). It is automatically applied by main_ppo_split.py and should not be imported independently.
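The patching mechanism itself is ordinary Python attribute assignment on a class. A minimal sketch, using a hypothetical `Trainer` class and `parallel_fit` function rather than the real RayPPOTrainer, shows that the replacement receives the instance as `self` and takes effect for instances that already exist:

```python
class Trainer:
    """Hypothetical stand-in for a trainer class with a default fit()."""

    def __init__(self, name):
        self.name = name

    def fit(self):
        return f"{self.name}: sequential updates"


def parallel_fit(self):
    # Receives the instance as `self`, like a method defined inside the class.
    return f"{self.name}: parallel updates"


trainer = Trainer("ppo")
before = trainer.fit()

# Replace the class attribute; existing and future instances use the new method.
Trainer.fit = parallel_fit
after = trainer.fit()

print(before)
print(after)
```

The patch only needs to be in place before fit() is called, which is why an entry script can apply it once at startup.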
Code Reference
Source Location
- Repository: Volcengine_Verl
- File: examples/split_placement/split_monkey_patch.py
- Lines: 1-237
Signature
def fit(self):
    """
    The training loop of PPO with parallel actor/critic updates.
    The driver process calls compute functions of the worker group through RPC
    to construct the PPO dataflow. Light-weight advantage computation is done
    on the driver process.
    Key difference from standard fit(): actor and critic updates run in parallel
    using non-blocking RPC calls when split placement is active.
    """
Import
from split_monkey_patch import fit
from verl.trainer.ppo.ray_trainer import RayPPOTrainer
# Applied via monkey-patching:
RayPPOTrainer.fit = fit
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| self | RayPPOTrainer | Yes | The trainer instance (monkey-patched method) |
| self.config | OmegaConf | Yes | Full training configuration |
| self.actor_rollout_wg | WorkerGroup | Yes | Actor rollout worker group |
| self.critic_wg | WorkerGroup | Yes (if use_critic) | Critic worker group on separate pool |
| self.train_dataloader | DataLoader | Yes | Training data iterator |
Outputs
| Name | Type | Description |
|---|---|---|
| metrics | dict | Training metrics logged each step (actor loss, critic loss, rewards, timing) |
| checkpoints | files | Saved at configured save_freq intervals |
| val_metrics | dict | Validation metrics at configured test_freq intervals |
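How save_freq and test_freq gate checkpointing and validation is not spelled out on this page. One common convention, assumed in the sketch below, is to trigger on steps that are multiples of a positive frequency and to treat non-positive values as disabled. The helper `is_due` is hypothetical and is not part of verl.

```python
def is_due(step: int, freq: int) -> bool:
    """Hypothetical helper: True when `step` is a positive multiple of `freq`."""
    return freq > 0 and step > 0 and step % freq == 0


save_freq, test_freq = 4, 2  # illustrative values, not defaults
steps = range(1, 9)

save_steps = [s for s in steps if is_due(s, save_freq)]
val_steps = [s for s in steps if is_due(s, test_freq)]

print("checkpoint at steps:", save_steps)
print("validate at steps:", val_steps)
```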
Usage Examples
Applying the Monkey Patch
from split_monkey_patch import fit
from verl.trainer.ppo.ray_trainer import RayPPOTrainer
# Monkey-patch the fit method to enable parallel updates
RayPPOTrainer.fit = fit
# Create trainer with split resource pools
trainer = RayPPOTrainer(
    config=config,
    tokenizer=tokenizer,
    role_worker_mapping=role_worker_mapping,
    resource_pool_manager=resource_pool_manager,
    ray_worker_group_cls=ray_worker_group_cls,
    reward_fn=reward_fn,
    val_reward_fn=val_reward_fn,
)
trainer.init_workers()
trainer.fit() # Uses the parallel actor/critic update loop