Heuristic: Volcengine verl In-Place Operations OOM Prevention
Metadata:
- Sources: Repo|verl|https://github.com/volcengine/verl
- Domains: Optimization, Debugging
- Last Updated: 2026-02-07 17:00 GMT
Overview
Use in-place tensor operations and aggressive cache clearing to prevent OOM during RL training of LLMs.
Description
verl employs several memory optimization strategies: (1) In-place division for logits to avoid allocating temporary tensors; (2) Aggressive cache clearing with retry logic that stops when less than 1GB is freed; (3) Memory fraction management (95% for inference, 90% for training); (4) Reference policy forced to CPU offload to save GPU memory.
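Strategy (3) can be sketched with PyTorch's real allocator-capping API. The function and constant names below are assumptions for illustration; only the 0.95/0.90 fractions come from the description above.

```python
import torch

# Hedged sketch of strategy (3), memory fraction management. The function
# and constant names are assumptions; the 0.95 / 0.90 values come from
# the description above.
INFERENCE_MEM_FRACTION = 0.95
TRAIN_MEM_FRACTION = 0.90

def set_phase_memory_fraction(phase: str) -> float:
    """Cap this process's CUDA caching allocator for the given phase."""
    frac = INFERENCE_MEM_FRACTION if phase == "rollout" else TRAIN_MEM_FRACTION
    if torch.cuda.is_available():
        # Real PyTorch API: limits allocations to frac * total device memory.
        torch.cuda.set_per_process_memory_fraction(frac)
    return frac
```

Capping the allocator below 100% leaves headroom for the NCCL communicator and other non-PyTorch consumers of device memory.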
Usage
Apply these patterns when encountering OOM errors during training, especially with large batch sizes or long sequences.
The Insight
- Action 1: Use `tensor.div_(value)` instead of `tensor / value` for large logit tensors
- Action 2: Call `aggressive_empty_cache()` between training and rollout phases
- Action 3: Force the reference policy to use CPUOffload
- Trade-off: In-place ops mutate the tensor, so they are unsafe when autograd still needs the original values for the backward pass. CPU offload adds host-device transfer latency.
Reasoning
LLM logit tensors are massive (batch × seq_len × vocab_size). For example, a batch of 8 sequences of length 4096 over a 128k-token vocabulary is roughly 4.2 billion fp32 values, about 16 GB, so a single temporary copy during division can trigger OOM on a nearly full GPU. In-place operations modify the tensor's existing storage instead. The aggressive cache clearing retries up to 3 times but stops early if less than 1 GB is freed (diminishing returns). The reference policy does not need gradients, so CPU offload is free from a correctness perspective and costs only transfer latency.
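A minimal sketch of the in-place pattern (tensor shapes are illustrative and deliberately small, not taken from verl):

```python
import torch

# Illustrative logit tensor: batch x seq_len x vocab_size (kept small here;
# real LLM logits can be tens of GB, so the temporary copy matters).
logits = torch.randn(2, 8, 50_000)
temperature = 0.7
expected = logits / temperature   # out-of-place: allocates a full second tensor

# In-place: reuses the existing allocation, no temporary copy.
logits.div_(temperature)
```

Both paths compute the same values; the in-place variant just never holds two copies of the logits at once.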
Code Evidence
From `verl/utils/torch_functional.py:674`:

```python
logits = logits.div_(temperature)  # inplace operation to avoid OOM
```
And from `verl/utils/memory_utils.py:72-74`:

```python
# Stop retrying if little memory was freed
if reserved_freed < 1024**3:  # less than 1GB
    break
```
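The retry loop around that early-exit check might look like the following sketch. The function name `aggressive_empty_cache` appears in this card, but the loop body here is an assumption, not verl's exact implementation:

```python
import gc
import torch

def aggressive_empty_cache(max_retries: int = 3) -> int:
    """Repeatedly run GC + empty_cache, stopping when returns diminish.

    Returns total bytes of reserved CUDA memory freed. Sketch only;
    verl's actual implementation may differ.
    """
    total_freed = 0
    for _ in range(max_retries):
        before = torch.cuda.memory_reserved() if torch.cuda.is_available() else 0
        gc.collect()                      # drop dead Python references first
        if torch.cuda.is_available():
            torch.cuda.empty_cache()      # release cached blocks back to the driver
        after = torch.cuda.memory_reserved() if torch.cuda.is_available() else 0
        reserved_freed = before - after
        total_freed += reserved_freed
        # Stop retrying if little memory was freed (diminishing returns)
        if reserved_freed < 1024**3:      # less than 1 GB
            break
    return total_freed
```

Running `gc.collect()` before `empty_cache()` matters: cached blocks can only be released once no live Python tensor still points at them.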
And from `verl/workers/fsdp_workers.py:522`:

```python
# We force reference policy to use CPUOffload to save memory.
```
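The idea behind CPUOffload for a frozen reference policy can be shown without FSDP machinery. The model, sizes, and function name below are illustrative, not verl's code:

```python
import torch
from torch import nn

# Hedged sketch of the CPU-offload idea for a frozen reference policy.
# Model, sizes, and function name are illustrative, not verl's code.
ref_policy = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))
ref_policy.requires_grad_(False)   # frozen: no gradients ever needed
ref_policy.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"

@torch.no_grad()
def ref_log_probs(batch: torch.Tensor) -> torch.Tensor:
    # Weights live on CPU between calls; move them to the accelerator
    # only for the forward pass, then park them back on the host.
    ref_policy.to(device)
    out = torch.log_softmax(ref_policy(batch.to(device)), dim=-1)
    ref_policy.to("cpu")           # free accelerator memory between calls
    return out.cpu()

logp = ref_log_probs(torch.randn(2, 16))
```

Because the reference policy is only read, never updated, the round-trip costs latency but can never change training results.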