Implementation: llama-mmap (ggml-org/llama.cpp)
| Knowledge Sources | |
|---|---|
| Domains | File I/O, Memory Mapping |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Implements cross-platform file I/O, memory-mapped file access, and memory locking primitives for efficient model loading.
Description
This file uses the pimpl pattern with platform-specific implementations for Windows (Win32 API with CreateFileMapping/MapViewOfFile) and POSIX (mmap/munmap). `llama_file` wraps file operations with support for direct I/O and aligned reads. `llama_mmap` maps model files into virtual memory with optional NUMA-aware prefetching and supports partial unmapping of fragments as tensors are copied to GPU. `llama_mlock` uses mlock/VirtualLock to pin mapped memory in RAM, preventing page-outs. The file also provides `llama_path_max()` for querying the system path length limit.
Usage
Use this module for zero-copy model loading via memory mapping, which avoids reading entire multi-gigabyte model files into heap memory. The memory locking support prevents performance degradation from page swapping during inference.
Code Reference
Source Location
- Repository: ggml-org/llama.cpp
- File: src/llama-mmap.cpp
- Lines: 1-752
Signature
// File I/O wrapper (pimpl pattern)
struct llama_file::impl {
impl(const char * fname, const char * mode, bool use_direct_io = false);
// Platform-specific file handle (HANDLE on Windows, FILE * on POSIX)
};
// Memory-mapped file access
struct llama_mmap {
// Maps file into virtual memory
// Supports NUMA-aware prefetching
// Supports partial unmapping of fragments
};
// Memory locking
struct llama_mlock {
// Pins mapped memory in RAM using mlock/VirtualLock
};
// Utility
size_t llama_path_max();
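A plausible shape for a path-limit query like `llama_path_max()` is shown below. This is a guess at the technique, not the actual implementation: it asks `pathconf` for the runtime limit and falls back to the compile-time `PATH_MAX` constant.

```cpp
#include <climits>
#include <unistd.h>
#include <cstddef>

// Hypothetical sketch of a system path-length query: prefer the
// runtime value from pathconf(3), fall back to PATH_MAX if defined.
size_t path_max_sketch() {
    long r = pathconf("/", _PC_PATH_MAX);
    if (r > 0) {
        return (size_t) r;
    }
#ifdef PATH_MAX
    return PATH_MAX;
#else
    return 4096; // common Linux default when no limit is reported
#endif
}
```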
Import
#include "llama-mmap.h"
#include "llama-impl.h"
#include "ggml.h"
#include <cstring>
#include <climits>
#include <stdexcept>
#include <cerrno>
#include <algorithm>
// Platform-specific: unistd.h, fcntl.h, sys/mman.h, windows.h
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| fname | const char * | Yes | File path to open or memory-map |
| mode | const char * | Yes | File open mode (e.g., "rb" for read-binary) |
| use_direct_io | bool | No | Whether to use O_DIRECT for bypassing OS page cache |
| prefetch | bool | No | Whether to prefetch mapped pages into memory (NUMA-aware) |
Outputs
| Name | Type | Description |
|---|---|---|
| mapped memory | void * | Pointer to the memory-mapped file contents |
| file size | size_t | Size of the opened file in bytes |
| path_max | size_t | System-specific maximum path length |
Usage Examples
// Open a model file for binary reading
llama_file file("model.gguf", "rb");
// Memory-map the model file, prefetching pages into memory
llama_mmap mmap(&file, /*prefetch=*/true);
void * data = mmap.addr;
size_t size = mmap.size;
// Lock the mapped region to prevent it from being paged out
llama_mlock mlock;
mlock.init(data);
mlock.grow_to(size);
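The "partial unmapping of fragments" mentioned in the description relies on a POSIX property: `munmap` may be applied to any page-aligned sub-range of an existing mapping, splitting it into surviving fragments. A hedged sketch of that idea, with an illustrative (non-llama.cpp) function name:

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Illustrative sketch: release the pages of a mapping between byte
// offsets [first, last) once the data there is no longer needed
// (e.g. after a tensor has been copied to the GPU). The range is
// shrunk inward to page boundaries so neighboring bytes survive.
void unmap_fragment(void * base, size_t first, size_t last) {
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t a = (first + page - 1) & ~(page - 1); // round first up
    size_t b = last & ~(page - 1);               // round last down
    if (b > a) {
        munmap((char *) base + a, b - a);
    }
}
```

Rounding inward (up at the start, down at the end) is the conservative choice: it may leave a partial page mapped at each edge, but it never unmaps bytes outside the requested range.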