
Heuristic:Huggingface Transformers Dataloader Pin Memory NonBlocking

From Leeroopedia
Knowledge Sources
Domains Optimization, Training, Data_Loading
Last Updated 2026-02-13 20:00 GMT

Overview

Enable both dataloader_pin_memory and non_blocking together for optimal CPU-to-GPU data transfer performance.

Description

Pinned (page-locked) memory speeds up CPU-to-GPU transfers because the CUDA driver can DMA directly from host memory instead of first staging the data through a pageable-memory bounce buffer. The non_blocking=True flag lets these transfers run asynchronously, overlapping data movement with computation. However, non_blocking only provides a benefit when the source tensors are in pinned memory, so the Trainer warns when non_blocking is enabled without dataloader_pin_memory.

Usage

Apply this whenever you are training on GPU and want to maximize data loading throughput. This is particularly impactful when your training is I/O bound (data loading is the bottleneck) or when you have large batch sizes.

The Insight (Rule of Thumb)

  • Action: Set dataloader_pin_memory=True in TrainingArguments (this is the default). If using non_blocking=True in accelerator config, ensure pin_memory is also enabled.
  • Value: dataloader_pin_memory=True (default), non_blocking=True in accelerator config.
  • Trade-off: Slightly higher CPU memory usage (pinned memory is not swappable). Negligible for most workloads.
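As a minimal sketch, the two settings above combine like this in TrainingArguments (assuming a recent transformers version in which accelerator_config accepts a non_blocking key; the output_dir value is a placeholder):

```python
from transformers import TrainingArguments

# dataloader_pin_memory defaults to True; shown explicitly for clarity.
args = TrainingArguments(
    output_dir="out",                            # placeholder path
    dataloader_pin_memory=True,                  # page-locked host buffers
    accelerator_config={"non_blocking": True},   # async host-to-device copies
)
```

If either flag is flipped off, the other loses most of its value, which is exactly the mismatch the Trainer's warning targets.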

Reasoning

When DataLoader pin_memory is enabled, the loader copies each batch into page-locked buffers, which the CUDA driver can transfer to the GPU via DMA without CPU involvement. Adding non_blocking=True queues these transfers asynchronously on a CUDA stream, so the next batch can start copying while the current batch is being processed. Without pin_memory, a non_blocking transfer falls back to the synchronous pageable-memory path and provides little benefit.
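A plain-PyTorch sketch of this mechanism (not the Trainer's internals) looks like the loop below; it falls back to an ordinary CPU copy when no GPU is present, and the dataset shapes are made up for illustration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Toy dataset: 256 samples of 16 features each (illustrative shapes).
dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
loader = DataLoader(
    dataset,
    batch_size=32,
    pin_memory=use_cuda,  # copy each batch into page-locked host buffers
)

for inputs, labels in loader:
    # With pinned source tensors, non_blocking=True queues the host-to-device
    # copy on the current CUDA stream and returns immediately, letting the
    # copy overlap with compute. Without pinning, the copy is effectively
    # synchronous through pageable memory.
    inputs = inputs.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward/backward pass would go here ...
```

On CPU-only machines both flags are no-ops, which is why leaving dataloader_pin_memory at its default is harmless.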

Code Evidence

Warning from src/transformers/trainer.py:731-736:

non_blocking = accelerator_config.pop("non_blocking")

if non_blocking and not self.args.dataloader_pin_memory:
    logger.warning(
        "`non_blocking` is enabled but `dataloader_pin_memory` is not. For the best performance, it's recommended to enable both."
    )
