Principle: LaurentMazare tch-rs Memory-Mapped Weight Loading
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Memory_Optimization |
| Last Updated | 2026-02-08 14:00 GMT |
Overview
Pattern for loading large model weights using memory-mapped file I/O combined with zero-copy tensor construction to minimize peak memory usage.
Description
Standard weight loading allocates memory for the file contents and for the model parameters separately, so peak memory reaches roughly twice the model size. Memory-mapped loading avoids this by: (1) memory-mapping the safetensors file, (2) using Tensor::from_blob to create tensor views directly over the mapped memory, and (3) copying into VarStore variables via f_copy_. This approach is essential for loading multi-gigabyte LLM weights on memory-constrained systems.
Usage
Use when loading large model weights (>1GB) where peak memory is a concern. Requires safetensors format and direct access to VarStore internals.
Theoretical Basis
Standard Loading: File → [buffer in RAM] → [VarStore tensors] = 2x model size in RAM
Memory-Mapped: File → [mmap: virtual memory] → from_blob → copy_ → only model size in RAM
Steps:
1. mmap the safetensors file (OS-level virtual memory, not physical allocation)
2. SafeTensors::deserialize(mmap) — parse tensor metadata
3. For each tensor: Tensor::from_blob(data_ptr, shape, strides, kind, device)
→ Creates a tensor VIEW over mmap'd memory (no copy)
4. var.f_copy_(tensor_view) — copies data into VarStore
5. mmap is dropped after all copies complete; its pages are released back to the OS