Principle: LaurentMazare tch-rs Memory-Mapped Weight Loading
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Memory_Optimization |
| Last Updated | 2026-02-08 14:00 GMT |
Overview
Pattern for loading large model weights using memory-mapped file I/O combined with zero-copy tensor construction to minimize peak memory usage.
Description
Standard weight loading allocates memory for the file contents and for the model parameters separately, so peak memory reaches roughly twice the model size. Memory-mapped loading avoids this by: (1) memory-mapping the safetensors file, (2) using Tensor::from_blob to create tensor views directly over the mapped memory, and (3) copying into VarStore variables via f_copy_. This approach is essential for loading multi-gigabyte LLM weights on memory-constrained systems.
Usage
Use when loading large model weights (>1GB) where peak memory is a concern. Requires safetensors format and direct access to VarStore internals.
Theoretical Basis
Standard Loading: File → [buffer in RAM] → [VarStore tensors] = 2x model size in RAM
Memory-Mapped: File → [mmap: virtual memory] → from_blob → copy_ → only model size in RAM
Steps:
1. mmap the safetensors file (OS-level virtual memory, not physical allocation)
2. SafeTensors::deserialize(mmap) — parse tensor metadata
3. For each tensor: Tensor::from_blob(data_ptr, shape, strides, kind, device)
→ Creates a tensor VIEW over mmap'd memory (no copy)
4. var.f_copy_(tensor_view) — copies data into VarStore
5. mmap is dropped after all copies complete; its pages are released back to the OS