Implementation: llama-mmap (ggml-org/llama.cpp)
| Knowledge Sources | |
|---|---|
| Domains | File I/O, Memory Mapping |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Implements cross-platform file I/O, memory-mapped file access, and memory locking primitives for efficient model loading.
Description
This file uses the pimpl pattern with platform-specific implementations for Windows (Win32 API with CreateFileMapping/MapViewOfFile) and POSIX (mmap/munmap). `llama_file` wraps file operations with support for direct I/O and aligned reads. `llama_mmap` maps model files into virtual memory with optional NUMA-aware prefetching and supports partial unmapping of fragments as tensors are copied to GPU. `llama_mlock` uses mlock/VirtualLock to pin mapped memory in RAM, preventing page-outs. The file also provides `llama_path_max()` for querying the system path length limit.
Usage
Use this module for zero-copy model loading via memory mapping, which avoids reading entire multi-gigabyte model files into heap memory. The memory locking support prevents performance degradation from page swapping during inference.
Code Reference
Source Location
- Repository: ggml-org/llama.cpp
- File: src/llama-mmap.cpp
- Lines: 1-752
Signature
// File I/O wrapper (pimpl pattern)
struct llama_file::impl {
impl(const char * fname, const char * mode, bool use_direct_io = false);
// Platform-specific file handle (HANDLE on Windows, FILE * on POSIX)
};
// Memory-mapped file access
struct llama_mmap {
// Maps file into virtual memory
// Supports NUMA-aware prefetching
// Supports partial unmapping of fragments
};
// Memory locking
struct llama_mlock {
// Pins mapped memory in RAM using mlock/VirtualLock
};
// Utility
size_t llama_path_max();
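A plausible shape for a path-limit query like `llama_path_max()` is shown below. This is a guess at the technique, not the actual implementation: it asks `pathconf` for the runtime limit and falls back to the compile-time `PATH_MAX` constant.

```cpp
#include <climits>
#include <unistd.h>
#include <cstddef>

// Hypothetical sketch of a system path-length query: prefer the
// runtime value from pathconf(3), fall back to PATH_MAX if defined.
size_t path_max_sketch() {
    long r = pathconf("/", _PC_PATH_MAX);
    if (r > 0) {
        return (size_t) r;
    }
#ifdef PATH_MAX
    return PATH_MAX;
#else
    return 4096; // common Linux default when no limit is reported
#endif
}
```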
Import
#include "llama-mmap.h"
#include "llama-impl.h"
#include "ggml.h"
#include <cstring>
#include <climits>
#include <stdexcept>
#include <cerrno>
#include <algorithm>
// Platform-specific: unistd.h, fcntl.h, sys/mman.h, windows.h
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| fname | const char * | Yes | File path to open or memory-map |
| mode | const char * | Yes | File open mode (e.g., "rb" for read-binary) |
| use_direct_io | bool | No | Whether to use O_DIRECT for bypassing OS page cache |
| prefetch | bool | No | Whether to prefetch mapped pages into memory (NUMA-aware) |
Outputs
| Name | Type | Description |
|---|---|---|
| mapped memory | void * | Pointer to the memory-mapped file contents |
| file size | size_t | Size of the opened file in bytes |
| path_max | size_t | System-specific maximum path length |
Usage Examples
// Open a model file for binary reading
llama_file file("model.gguf", "rb");
// Memory-map the model file, prefetching pages into memory
llama_mmap mmap(&file, /*prefetch=*/true);
void * data = mmap.addr;
size_t size = mmap.size;
// Lock the mapped region to prevent it from being paged out
llama_mlock mlock;
mlock.init(data);
mlock.grow_to(size);
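The "partial unmapping of fragments" mentioned in the description relies on a POSIX property: `munmap` may be applied to any page-aligned sub-range of an existing mapping, splitting it into surviving fragments. A hedged sketch of that idea, with an illustrative (non-llama.cpp) function name:

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Illustrative sketch: release the pages of a mapping between byte
// offsets [first, last) once the data there is no longer needed
// (e.g. after a tensor has been copied to the GPU). The range is
// shrunk inward to page boundaries so neighboring bytes survive.
void unmap_fragment(void * base, size_t first, size_t last) {
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t a = (first + page - 1) & ~(page - 1); // round first up
    size_t b = last & ~(page - 1);               // round last down
    if (b > a) {
        munmap((char *) base + a, b - a);
    }
}
```

Rounding inward (up at the start, down at the end) is the conservative choice: it may leave a partial page mapped at each edge, but it never unmaps bytes outside the requested range.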