Heuristic: Volcengine verl In-Place Operations OOM Prevention
Metadata:
- Sources: Repo|verl|https://github.com/volcengine/verl
- Domains: Optimization, Debugging
- Last Updated: 2026-02-07 17:00 GMT
Overview
Use in-place tensor operations and aggressive cache clearing to prevent OOM during RL training of LLMs.
Description
verl employs several memory optimization strategies: (1) In-place division for logits to avoid allocating temporary tensors; (2) Aggressive cache clearing with retry logic that stops when less than 1GB is freed; (3) Memory fraction management (95% for inference, 90% for training); (4) Reference policy forced to CPU offload to save GPU memory.
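Strategy (3) can be sketched with PyTorch's real allocator-capping API. The function and constant names below are assumptions for illustration; only the 0.95/0.90 fractions come from the description above.

```python
import torch

# Hedged sketch of strategy (3), memory fraction management. The function
# and constant names are assumptions; the 0.95 / 0.90 values come from
# the description above.
INFERENCE_MEM_FRACTION = 0.95
TRAIN_MEM_FRACTION = 0.90

def set_phase_memory_fraction(phase: str) -> float:
    """Cap this process's CUDA caching allocator for the given phase."""
    frac = INFERENCE_MEM_FRACTION if phase == "rollout" else TRAIN_MEM_FRACTION
    if torch.cuda.is_available():
        # Real PyTorch API: limits allocations to frac * total device memory.
        torch.cuda.set_per_process_memory_fraction(frac)
    return frac
```

Capping the allocator below 100% leaves headroom for the NCCL communicator and other non-PyTorch consumers of device memory.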
Usage
Apply these patterns when encountering OOM errors during training, especially with large batch sizes or long sequences.
The Insight
- Action 1: Use `tensor.div_(value)` instead of `tensor / value` for large logit tensors
- Action 2: Call `aggressive_empty_cache()` between training and rollout phases
- Action 3: Force the reference policy to use CPUOffload
- Trade-off: In-place ops mutate the tensor, so they are unsafe when autograd still needs the original values for the backward pass. CPU offload adds host-device transfer latency.
Reasoning
LLM logit tensors are massive (batch × seq_len × vocab_size). For example, a batch of 8 sequences of length 4096 over a 128k-token vocabulary is roughly 4.2 billion fp32 values, about 16 GB, so a single temporary copy during division can trigger OOM on a nearly full GPU. In-place operations modify the tensor's existing storage instead. The aggressive cache clearing retries up to 3 times but stops early if less than 1 GB is freed (diminishing returns). The reference policy does not need gradients, so CPU offload is free from a correctness perspective and costs only transfer latency.
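A minimal sketch of the in-place pattern (tensor shapes are illustrative and deliberately small, not taken from verl):

```python
import torch

# Illustrative logit tensor: batch x seq_len x vocab_size (kept small here;
# real LLM logits can be tens of GB, so the temporary copy matters).
logits = torch.randn(2, 8, 50_000)
temperature = 0.7
expected = logits / temperature   # out-of-place: allocates a full second tensor

# In-place: reuses the existing allocation, no temporary copy.
logits.div_(temperature)
```

Both paths compute the same values; the in-place variant just never holds two copies of the logits at once.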
Code Evidence
From `verl/utils/torch_functional.py:674`:

```python
logits = logits.div_(temperature)  # inplace operation to avoid OOM
```
And from `verl/utils/memory_utils.py:72-74`:

```python
# Stop retrying if little memory was freed
if reserved_freed < 1024**3:  # less than 1GB
    break
```
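The retry loop around that early-exit check might look like the following sketch. The function name `aggressive_empty_cache` appears in this card, but the loop body here is an assumption, not verl's exact implementation:

```python
import gc
import torch

def aggressive_empty_cache(max_retries: int = 3) -> int:
    """Repeatedly run GC + empty_cache, stopping when returns diminish.

    Returns total bytes of reserved CUDA memory freed. Sketch only;
    verl's actual implementation may differ.
    """
    total_freed = 0
    for _ in range(max_retries):
        before = torch.cuda.memory_reserved() if torch.cuda.is_available() else 0
        gc.collect()                      # drop dead Python references first
        if torch.cuda.is_available():
            torch.cuda.empty_cache()      # release cached blocks back to the driver
        after = torch.cuda.memory_reserved() if torch.cuda.is_available() else 0
        reserved_freed = before - after
        total_freed += reserved_freed
        # Stop retrying if little memory was freed (diminishing returns)
        if reserved_freed < 1024**3:      # less than 1 GB
            break
    return total_freed
```

Running `gc.collect()` before `empty_cache()` matters: cached blocks can only be released once no live Python tensor still points at them.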
And from `verl/workers/fsdp_workers.py:522`:

```python
# We force reference policy to use CPUOffload to save memory.
```
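The idea behind CPUOffload for a frozen reference policy can be shown without FSDP machinery. The model, sizes, and function name below are illustrative, not verl's code:

```python
import torch
from torch import nn

# Hedged sketch of the CPU-offload idea for a frozen reference policy.
# Model, sizes, and function name are illustrative, not verl's code.
ref_policy = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))
ref_policy.requires_grad_(False)   # frozen: no gradients ever needed
ref_policy.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"

@torch.no_grad()
def ref_log_probs(batch: torch.Tensor) -> torch.Tensor:
    # Weights live on CPU between calls; move them to the accelerator
    # only for the forward pass, then park them back on the host.
    ref_policy.to(device)
    out = torch.log_softmax(ref_policy(batch.to(device)), dim=-1)
    ref_policy.to("cpu")           # free accelerator memory between calls
    return out.cpu()

logp = ref_log_probs(torch.randn(2, 16))
```

Because the reference policy is only read, never updated, the round-trip costs latency but can never change training results.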