Heuristic: OpenRLHF Adam Offload Memory Tip
| Knowledge Sources | |
|---|---|
| Domains | Optimization, LLMs, Distributed_Training |
| Last Updated | 2026-02-07 10:00 GMT |
Overview
Use `--adam_offload` to move optimizer states to CPU, freeing GPU VRAM at the cost of training speed.
Description
The Adam optimizer maintains two state tensors per parameter (the first and second moments), typically kept in fp32, adding roughly 8 bytes per parameter on top of the model weights. With `--adam_offload`, OpenRLHF switches from `FusedAdam` (GPU-resident, fast) to `DeepSpeedCPUAdam` (CPU-resident, slower), keeping these optimizer states in system RAM. This frees substantial GPU VRAM, enabling training of larger models or larger batch sizes. When `adam_offload` is active, additional state offloading via `offload_deepspeed_states()` is automatically skipped, since the states already live on CPU.
Usage
Use this heuristic when you are VRAM constrained during training, especially with large models (7B+). It pairs well with gradient checkpointing for maximum memory savings. Disable this (do not pass `--adam_offload`) when you have sufficient GPU memory and want maximum training speed.
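As a minimal sketch of applying the heuristic (the module path, model name, and companion flags below are illustrative placeholders, not prescribed by this note; only `--adam_offload` is the flag under discussion):

```shell
# Hypothetical launch command: pairs --adam_offload with gradient
# checkpointing for maximum memory savings, as suggested above.
deepspeed --module openrlhf.cli.train_sft \
    --pretrain meta-llama/Llama-2-7b-hf \
    --gradient_checkpointing \
    --adam_offload
```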
The Insight (Rule of Thumb)
- Action: Add `--adam_offload` to the training command.
- Value: Moves the Adam state tensors (~8 bytes per parameter in fp32, i.e. roughly 4x the bf16 weight memory) from GPU to CPU RAM.
- Trade-off: Slower training due to CPU-GPU data transfer for optimizer steps. FusedAdam is significantly faster than DeepSpeedCPUAdam.
- Interaction: When adam_offload is active, additional state offloading is redundant and automatically skipped.
Reasoning
For a 7B-parameter model in bf16, the model weights occupy ~14 GB of VRAM. Adam's momentum and variance tensors, stored in fp32, add another 2 × 28 GB ≈ 56 GB. Offloading these states to CPU frees that VRAM for larger batches, gradient accumulation, or activation storage. The trade-off is PCIe bandwidth spent on CPU-GPU transfers during each optimizer step.
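The arithmetic above can be sketched as a small helper (an illustrative estimator, not part of OpenRLHF; it assumes bf16 weights and fp32 Adam moments):

```python
def adam_memory_gb(num_params: int, weight_bytes: int = 2, state_bytes: int = 4):
    """Rough memory estimate: bf16 weights vs. fp32 Adam moment tensors.

    Returns (weights_gb, optimizer_states_gb) in decimal gigabytes.
    """
    weights_gb = num_params * weight_bytes / 1e9
    # Adam keeps two fp32 tensors per parameter: first and second moments.
    states_gb = 2 * num_params * state_bytes / 1e9
    return weights_gb, states_gb

weights, states = adam_memory_gb(7_000_000_000)
# weights stay on GPU; with --adam_offload the states move to CPU RAM
```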
Code evidence from `openrlhf/utils/deepspeed/deepspeed.py:138`:
```python
AdamOptimizer = DeepSpeedCPUAdam if self.adam_offload else FusedAdam
```
DeepSpeed config from `openrlhf/utils/deepspeed/deepspeed_utils.py:24-26`:
```python
"offload_optimizer": {
    "device": "cpu" if adam_offload else "none",
    "pin_memory": True,
},
```
Auto-skip of redundant state offloading from `openrlhf/utils/deepspeed/deepspeed_utils.py:147-149`:
```python
# state offloading not required when using Adam optimizer offloading
if adam_offload:
    return
```