Heuristic:Huggingface Transformers Dataloader Pin Memory NonBlocking
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Training, Data_Loading |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
Enable both dataloader_pin_memory and non_blocking together for optimal CPU-to-GPU data transfer performance.
Description
Pinned (page-locked) memory allows faster CPU-to-GPU data transfers because the GPU can read it directly via DMA (Direct Memory Access); pageable memory must first be copied into a temporary pinned staging buffer. The non_blocking=True flag allows these transfers to happen asynchronously, overlapping data movement with computation. However, non_blocking only provides a benefit when the source tensors are in pinned memory. The Trainer warns when non_blocking is enabled without dataloader_pin_memory.
Usage
Apply this whenever you are training on GPU and want to maximize data loading throughput. It matters most when training is input-bound (the GPU waits for batches to arrive from the host) or when batches are large, so each host-to-device transfer moves more data.
The Insight (Rule of Thumb)
- Action: Set dataloader_pin_memory=True in TrainingArguments (this is the default). If using non_blocking=True in accelerator config, ensure pin_memory is also enabled.
- Value: dataloader_pin_memory=True (default), non_blocking=True in accelerator config.
- Trade-off: Slightly higher CPU memory usage (pinned memory is not swappable). Negligible for most workloads.
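The settings above can be written down as keyword arguments. This is a minimal sketch, assuming a transformers version whose accelerator config accepts a non_blocking key; the variable name pinned_async_kwargs and the "out" output directory are illustrative placeholders, not part of the library.

```python
# Keyword arguments intended for transformers.TrainingArguments.
pinned_async_kwargs = {
    # Already the default; stated explicitly so the pairing is visible.
    "dataloader_pin_memory": True,
    # Assumption: the installed version reads `non_blocking` from the accelerator config.
    "accelerator_config": {"non_blocking": True},
}

# Usage (requires transformers, accelerate, and torch to be installed):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="out", **pinned_async_kwargs)
```

Keeping the two flags in one place makes it harder to enable non_blocking while pinning has been switched off elsewhere.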
Reasoning
When DataLoader pin_memory is enabled, PyTorch copies each collated batch into page-locked memory, which allows the CUDA driver to transfer it to the GPU via DMA without an intermediate staging copy. Adding non_blocking lets these transfers be issued asynchronously on a CUDA stream, so the host does not wait for the copy to finish and data movement can overlap with computation. Without pin_memory, non_blocking transfers still go through the pageable memory path, generally cannot overlap in this way, and provide minimal benefit.
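Outside the Trainer, the same mechanism can be sketched in plain PyTorch. The example below is illustrative only: the toy dataset, the linear model, and the batch size are assumptions, and pinning is requested only when a CUDA device is present so the script also runs on CPU-only machines.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Toy dataset: 64 samples with 8 features each, plus binary labels.
dataset = TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,)))

# pin_memory=True asks the loader to place each collated batch in
# page-locked host memory; it is only requested when a GPU is available.
loader = DataLoader(dataset, batch_size=16, pin_memory=use_cuda)

model = torch.nn.Linear(8, 2).to(device)
batches_seen = 0
for features, labels in loader:
    # With a pinned source tensor, non_blocking=True lets the copy be queued
    # without blocking the host; on CPU-only machines the flag has no effect.
    features = features.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # Kernels launched on the same stream run after the copy completes.
    logits = model(features)
    batches_seen += 1
```

Whether the overlap yields a measurable speedup depends on the workload, so it is worth profiling before and after.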
Code Evidence
Warning from src/transformers/trainer.py:731-736:
    non_blocking = accelerator_config.pop("non_blocking")
    if non_blocking and not self.args.dataloader_pin_memory:
        logger.warning(
            "`non_blocking` is enabled but `dataloader_pin_memory` is not. For the best performance, it's recommended to enable both."
        )
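The check above can be reproduced in isolation, for example to validate a configuration before launching a run. This is a hedged standalone sketch, not the Trainer's code: the function name warn_if_unpinned, the logger name, and the use of a plain dict for the accelerator config are all hypothetical.

```python
import logging

logger = logging.getLogger("pin_memory_check")  # hypothetical logger name


def warn_if_unpinned(accelerator_config: dict, dataloader_pin_memory: bool) -> bool:
    """Log a warning and return True when non_blocking is set without pinned memory."""
    non_blocking = bool(accelerator_config.get("non_blocking", False))
    mismatch = non_blocking and not dataloader_pin_memory
    if mismatch:
        logger.warning(
            "`non_blocking` is enabled but `dataloader_pin_memory` is not. "
            "For the best performance, it's recommended to enable both."
        )
    return mismatch


# Example: this combination triggers the warning.
needs_attention = warn_if_unpinned({"non_blocking": True}, dataloader_pin_memory=False)
```

Unlike the snippet from trainer.py, this version reads the key with get rather than pop, so the caller's dict is left unchanged.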