

Heuristic:Volcengine Verl Inplace Operations OOM Prevention

From Leeroopedia





Overview

Use in-place tensor operations and aggressive cache clearing to prevent out-of-memory (OOM) failures during RL training of LLMs.

Description

verl employs several memory-optimization strategies: (1) in-place division of logits, avoiding a temporary tensor the size of the logits; (2) aggressive cache clearing with retry logic that stops once a pass frees less than 1 GB; (3) memory-fraction management (95% of GPU memory for inference, 90% for training); (4) the reference policy forced to CPU offload to save GPU memory.

Usage

Apply these patterns when encountering OOM errors during training, especially with large batch sizes or long sequences.

The Insight

  • Action 1: Use tensor.div_(value) instead of tensor / value for large logit tensors
  • Action 2: Call aggressive_empty_cache() between training and rollout phases
  • Action 3: Force reference policy to CPUOffload
  • Trade-off: In-place ops break gradient computation through that path (autograd raises an error if the overwritten values are needed for backward). CPU offload adds host-device transfer latency.

Reasoning

LLM logit tensors are massive (batch × seq_len × vocab_size). Creating a temporary copy during division can trigger OOM. In-place operations modify the tensor directly. The aggressive cache clearing retries up to 3 times but stops early if less than 1GB is freed (diminishing returns). Reference policy does not need gradients, so CPU offload is free from a correctness perspective.
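To see why the temporary copy matters, a back-of-the-envelope calculation (the batch size, sequence length, and vocabulary size below are illustrative assumptions, not verl defaults):

```python
# Memory footprint of a single fp32 logit tensor.
# These dimensions are illustrative, not verl defaults.
batch, seq_len, vocab_size = 8, 4096, 128256  # e.g. a Llama-3-class vocab
bytes_per_elem = 4  # float32

logit_bytes = batch * seq_len * vocab_size * bytes_per_elem
print(f"logits: {logit_bytes / 1024**3:.1f} GiB")  # ~15.7 GiB

# An out-of-place `logits / temperature` would briefly need a second
# tensor of the same size, roughly doubling peak memory for this step.
```

At these sizes, the transient second allocation alone can exceed the headroom left on an 80 GB GPU that is already holding model weights, optimizer state, and activations.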

Code Evidence

From verl/utils/torch_functional.py:674:

logits = logits.div_(temperature)  # inplace operation to avoid OOM
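A minimal sketch of the difference, assuming standard PyTorch semantics: `div_` mutates the existing storage, while the out-of-place `/` allocates a new tensor of the same size.

```python
import torch

logits = torch.randn(2, 8, 16)  # toy shape; real logits are batch x seq x vocab
storage_ptr = logits.data_ptr()

# Out-of-place: allocates a brand-new tensor the same size as `logits`.
scaled = logits / 0.7
assert scaled.data_ptr() != storage_ptr

# In-place: reuses the original storage, no extra allocation.
logits.div_(0.7)
assert logits.data_ptr() == storage_ptr
```

Note that mutating a tensor whose original values autograd still needs raises a RuntimeError, which is the trade-off listed above.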

And from verl/utils/memory_utils.py:72-74:

# Stop retrying if little memory was freed
if reserved_freed < 1024**3:  # less than 1GB
    break
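A hypothetical reconstruction of that retry loop, with the CUDA calls injected as callables so it can be exercised without a GPU (the function and parameter names here are assumptions, not verl's actual API):

```python
GIB = 1024**3

def drain_cache(empty_cache, reserved_bytes, max_retries=3):
    """Repeatedly clear the allocator cache, stopping early once a
    pass frees less than 1 GiB (diminishing returns)."""
    total_freed = 0
    for _ in range(max_retries):
        before = reserved_bytes()
        empty_cache()
        reserved_freed = before - reserved_bytes()
        total_freed += reserved_freed
        if reserved_freed < GIB:  # little memory was freed: stop retrying
            break
    return total_freed
```

On a real GPU this would be invoked as `drain_cache(torch.cuda.empty_cache, torch.cuda.memory_reserved)` between the training and rollout phases.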

And from verl/workers/fsdp_workers.py:522:

# We force reference policy to use CPUOffload to save memory.
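In FSDP terms this corresponds to wrapping the reference model with a `CPUOffload` policy; a configuration sketch assuming PyTorch's FSDP API (`ref_model` is a placeholder, and the actual verl wiring in `fsdp_workers.py` differs in detail):

```python
from torch.distributed.fsdp import CPUOffload, FullyShardedDataParallel as FSDP

# The reference policy is frozen (no gradients), so its parameters can
# live on CPU and be streamed to the GPU per forward pass, trading
# transfer latency for resident GPU memory.
ref_policy = FSDP(
    ref_model,  # assumed: the frozen reference module
    cpu_offload=CPUOffload(offload_params=True),
)
```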
