

Implementation: ggml-org/llama.cpp llama-mmap

From Leeroopedia
Knowledge Sources
Domains File_IO, Memory_Mapping
Last Updated 2026-02-15 00:00 GMT

Overview

Implements cross-platform file I/O, memory-mapped file access, and memory locking primitives for efficient model loading.

Description

This file uses the pimpl pattern with platform-specific implementations for Windows (Win32 API with CreateFileMapping/MapViewOfFile) and POSIX (mmap/munmap). `llama_file` wraps file operations with support for direct I/O and aligned reads. `llama_mmap` maps model files into virtual memory with optional NUMA-aware prefetching and supports partial unmapping of fragments as tensors are copied to GPU. `llama_mlock` uses mlock/VirtualLock to pin mapped memory in RAM, preventing page-outs. The file also provides `llama_path_max()` for querying the system path length limit.

Usage

Use this module for zero-copy model loading via memory mapping, which avoids reading entire multi-gigabyte model files into heap memory. The memory locking support prevents performance degradation from page swapping during inference.

Code Reference

Source Location

Signature

// File I/O wrapper (pimpl pattern)
struct llama_file::impl {
    impl(const char * fname, const char * mode, bool use_direct_io = false);
    // Platform-specific file handle (HANDLE on Windows, FILE * on POSIX)
};

// Memory-mapped file access
struct llama_mmap {
    // Maps file into virtual memory
    // Supports NUMA-aware prefetching
    // Supports partial unmapping of fragments
};

// Memory locking
struct llama_mlock {
    // Pins mapped memory in RAM using mlock/VirtualLock
};

// Utility
size_t llama_path_max();

Import

#include "llama-mmap.h"
#include "llama-impl.h"
#include "ggml.h"
#include <cstring>
#include <climits>
#include <stdexcept>
#include <cerrno>
#include <algorithm>
// Platform-specific: unistd.h, fcntl.h, sys/mman.h, windows.h

I/O Contract

Inputs

Name Type Required Description
fname const char * Yes File path to open or memory-map
mode const char * Yes File open mode (e.g., "rb" for read-binary)
use_direct_io bool No Whether to use O_DIRECT for bypassing OS page cache
prefetch bool No Whether to prefetch mapped pages into memory (NUMA-aware)

Outputs

Name Type Description
mapped memory void * Pointer to the memory-mapped file contents
file size size_t Size of the opened file in bytes
path_max size_t System-specific maximum path length

Usage Examples

// Open a file for reading
llama_file file("model.gguf", "rb");

// Memory-map a model file
llama_mmap mmap(&file, /*prefetch=*/true);
void * data = mmap.addr;  // non-const so it can be handed to mlock.init()
size_t size = mmap.size;

// Lock mapped memory to prevent paging
llama_mlock mlock;
mlock.init(data);
mlock.grow_to(size);
