Heuristic: Volcengine Verl Layered Summon Memory Tradeoff
Metadata:
- Sources: Repo|verl|https://github.com/volcengine/verl
- Domains: Optimization, Infrastructure
- Last Updated: 2026-02-07 17:00 GMT
Overview
Use layered_summon for huge models to prevent OOM at the cost of increased synchronization latency.
Description
The layered_summon feature controls how model weights are synchronized between training and rollout engines. When disabled (default), all weights are synchronized at once, which is fast but requires enough memory to hold both copies. When enabled, weights are synchronized layer by layer, dramatically reducing peak memory but adding latency for the layer-by-layer transfer.
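To make the tradeoff concrete, here is a rough back-of-the-envelope sketch. All numbers (bf16 precision, parameter and layer counts) are illustrative assumptions, not measurements from verl:

```python
BYTES_PER_PARAM = 2  # bf16 (illustrative assumption)
PARAMS = 70e9        # hypothetical 70B-parameter model
LAYERS = 80          # hypothetical layer count

full_copy_gb = PARAMS * BYTES_PER_PARAM / 1e9        # one full weight copy
peak_all_at_once = 2 * full_copy_gb                  # training + rollout copies resident together
peak_layered = full_copy_gb + full_copy_gb / LAYERS  # one copy + a single layer in flight

print(f"all-at-once peak: {peak_all_at_once:.0f} GB")  # 280 GB
print(f"layered peak:     {peak_layered:.0f} GB")      # 142 GB
```

Under these assumptions, all-at-once sync needs roughly double the weight footprint, while layered sync only adds one layer's worth on top of a single copy.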
Usage
Enable when training huge models (70B+) that cause OOM during the weight synchronization phase between training and rollout.
The Insight
- Action: Set trainer.layered_summon: True in config
- Value: Default is False. Enable only for huge models where OOM occurs during weight sync.
- Trade-off: Saves significant memory (prevents OOM) but makes the training-to-rollout transition slower.
- Interaction: When layered_summon is enabled, vLLM sleep_level falls back to 1, which may also cause OOM in some cases.
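The action above corresponds to a one-line config override. The field name comes from the repo's ppo_trainer.yaml quoted below; the surrounding structure is shown only for context:

```yaml
trainer:
  layered_summon: True  # default is False; enable for huge models hitting OOM during weight sync
```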
Reasoning
During RL training, model weights must be transferred from the training engine (FSDP/Megatron) to the rollout engine (vLLM/SGLang). Without layered_summon, this requires temporarily holding both copies in GPU memory. For huge models, this exceeds available VRAM. Layered summon instead transfers one layer at a time, so only a single layer's extra copy is resident at any moment, at the cost of serialized per-layer transfers that slow the training-to-rollout transition.
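A minimal sketch of the layer-by-layer pattern described above. The helper names and data shapes here are hypothetical stand-ins, not verl's actual API (which lives in its FSDP/Megatron worker code):

```python
def sync_layered(train_layers, load_into_rollout):
    """Transfer weights one layer at a time so that only one
    layer's gathered copy is resident at any moment."""
    peak_extra = 0
    for name, layer_weights in train_layers:
        gathered = dict(layer_weights)       # stand-in for an all-gather of one layer
        peak_extra = max(peak_extra, sum(gathered.values()))
        load_into_rollout(name, gathered)    # hand the layer to the rollout engine
        del gathered                         # free before gathering the next layer
    return peak_extra

# Toy model: 4 layers of 10 memory "units" each.
layers = [(f"layer{i}", {"w": 10}) for i in range(4)]
received = {}
peak = sync_layered(layers, lambda n, w: received.update({n: sum(w.values())}))
print(peak)  # 10: one layer in flight at a time, not 40 for the whole model
```

The all-at-once variant would gather every layer before loading any of them, so its peak extra memory would be the full model size instead of one layer.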
Code Evidence
From trainer/config/ppo_trainer.yaml:55:
# for huge model, layered summon can save memory (prevent OOM) but make it slower
layered_summon: False
And from verl/workers/rollout/vllm_rollout/vllm_rollout.py:90-91:
if config.layered_summon or (config.expert_parallel_size > 1 and not _check_vllm_version_for_sleep_level()):
    logger.warning("Setting the sleep level to 1 may cause a memory overflow.")