Implementation: ggml-org/llama.cpp Mmap Header
| Knowledge Sources | Details |
|---|---|
| Domains | File_IO, Memory_Mapping |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares cross-platform abstractions for file I/O, memory-mapped file access, and memory locking used during model loading.
Description
This header defines three structs using the pimpl (pointer-to-implementation) pattern: `llama_file` provides file operations (read, write, seek, tell) with direct I/O and aligned read support; `llama_mmap` wraps memory-mapped file access with prefetching, NUMA awareness, and fragment unmapping; `llama_mlock` pins memory ranges in physical RAM to prevent paging. Type aliases for vectors of unique pointers (`llama_files`, `llama_mmaps`, `llama_mlocks`) and a `llama_path_max()` utility function are also provided.
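The pimpl pattern mentioned above keeps platform-specific state (Win32 handles on Windows, POSIX file descriptors elsewhere) out of the public header, so `llama-mmap.h` can stay platform-neutral. A minimal, self-contained sketch of the same idiom follows; the `demo_file` class and its members are hypothetical illustrations, not the actual llama.cpp implementation:

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <stdexcept>

// Public header exposes only the interface; the platform-specific
// state lives in the opaque `impl` struct defined out of line.
struct demo_file {
    struct impl;                  // forward declaration only
    std::unique_ptr<impl> pimpl;  // pointer-to-implementation

    demo_file(const char * fname, const char * mode);
    ~demo_file();
    size_t size() const;
};

// --- normally placed in the .cpp translation unit ---
struct demo_file::impl {
    FILE * fp = nullptr;
    size_t sz = 0;
};

demo_file::demo_file(const char * fname, const char * mode)
    : pimpl(new impl()) {
    pimpl->fp = std::fopen(fname, mode);
    if (!pimpl->fp) throw std::runtime_error("failed to open file");
    std::fseek(pimpl->fp, 0, SEEK_END);
    pimpl->sz = (size_t) std::ftell(pimpl->fp);
    std::fseek(pimpl->fp, 0, SEEK_SET);
}

demo_file::~demo_file() {
    if (pimpl->fp) std::fclose(pimpl->fp);
}

size_t demo_file::size() const { return pimpl->sz; }
```

Because callers only ever see the forward-declared `impl`, the implementation file can hold a `HANDLE` on Windows and an `int` descriptor on POSIX without the header changing.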
Usage
Include this header when working with the model loading pipeline. It provides the platform abstraction layer that enables efficient memory-mapped model loading across Windows, Linux, and macOS.
Code Reference
Source Location
- Repository: ggml-org/llama.cpp
- File: src/llama-mmap.h
- Lines: 1-73
Signature
struct llama_file {
    llama_file(const char * fname, const char * mode, bool use_direct_io = false);
    ~llama_file();

    size_t tell() const;
    size_t size() const;
    int file_id() const;

    void seek(size_t offset, int whence) const;

    void read_raw(void * ptr, size_t len);
    void read_raw_unsafe(void * ptr, size_t len);
    void read_aligned_chunk(void * dest, size_t size);
    uint32_t read_u32();

    void write_raw(const void * ptr, size_t len) const;
    void write_u32(uint32_t val) const;

    size_t read_alignment() const;
    bool has_direct_io() const;
};

struct llama_mmap {
    llama_mmap(struct llama_file * file, size_t prefetch = (size_t) -1, bool numa = false);
    ~llama_mmap();

    size_t size() const;
    void * addr() const;

    void unmap_fragment(size_t first, size_t last);

    static const bool SUPPORTED;
};

struct llama_mlock {
    llama_mlock();
    ~llama_mlock();

    void init(void * ptr);
    void grow_to(size_t target_size);

    static const bool SUPPORTED;
};
using llama_files = std::vector<std::unique_ptr<llama_file>>;
using llama_mmaps = std::vector<std::unique_ptr<llama_mmap>>;
using llama_mlocks = std::vector<std::unique_ptr<llama_mlock>>;
size_t llama_path_max();
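On POSIX platforms, `llama_mmap` is essentially a wrapper over the `mmap` and `madvise` system calls. The following is a rough, POSIX-only sketch of the underlying calls, with a hypothetical `map_file` helper; it is not the actual llama.cpp implementation and omits its Windows path and error reporting:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a whole file read-only and hint the kernel to prefetch its
// pages -- roughly what llama_mmap does on POSIX (simplified).
void * map_file(const char * path, size_t * out_size) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
    *out_size = (size_t) st.st_size;
    void * addr = mmap(nullptr, *out_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return nullptr;
    // Prefetch hint: ask the kernel to read the pages in ahead of use.
    madvise(addr, *out_size, MADV_WILLNEED);
    return addr;
}
```

The `prefetch` constructor parameter in the real header presumably bounds how many bytes receive this kind of hint; the `(size_t) -1` default means the entire mapping.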
Import
#include "llama-mmap.h"
// Dependencies:
#include <cstdint>
#include <memory>
#include <vector>
#include <cstdio>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| fname | const char * | Yes | File path for llama_file constructor |
| mode | const char * | Yes | File open mode (e.g., "rb", "wb") |
| use_direct_io | bool | No | Bypass the OS page cache via direct I/O (default: false) |
| file | llama_file * | Yes | File handle for llama_mmap constructor |
| prefetch | size_t | No | Number of bytes to prefetch (default: all) |
| numa | bool | No | Enable NUMA-aware mapping (default: false) |
| ptr | void * | Yes | Memory address for llama_mlock::init |
| target_size | size_t | Yes | Target locked memory size for grow_to |
Outputs
| Name | Type | Description |
|---|---|---|
| tell() | size_t | Current file position |
| size() | size_t | File or mapping size in bytes |
| addr() | void * | Base address of the memory-mapped region |
| read_u32() | uint32_t | 32-bit unsigned integer read from file |
| SUPPORTED | bool | Whether the platform supports mmap/mlock |
| llama_path_max() | size_t | Maximum path length for the platform |
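The `grow_to` entry above implies that `llama_mlock` locks memory incrementally: as more of the model is loaded, only the newly needed bytes are pinned. A small sketch of that incremental-locking idea (hypothetical `demo_mlock` struct, not the llama.cpp implementation; on POSIX, `mlock` may fail under a low `RLIMIT_MEMLOCK`, which the sketch treats as non-fatal):

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Sketch of grow_to-style incremental locking: pin only the bytes
// beyond what is already locked, rounded up to whole pages.
struct demo_mlock {
    void * base = nullptr;
    size_t locked = 0; // bytes currently pinned in RAM

    static size_t page_size() { return (size_t) sysconf(_SC_PAGESIZE); }

    bool grow_to(size_t target) {
        if (target <= locked) return true; // already covered, nothing to do
        size_t ps = page_size();
        size_t aligned = ((target + ps - 1) / ps) * ps; // round up to a page
        // mlock can fail if the process memlock limit is exhausted;
        // report failure instead of aborting, as a loader would.
        if (mlock((char *) base + locked, aligned - locked) != 0) {
            return false;
        }
        locked = aligned;
        return true;
    }
};
```

Growing by deltas avoids re-locking (and re-counting against the rlimit) ranges that are already pinned.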
Usage Examples
#include "llama-mmap.h"
// Open a model file
llama_file file("model.gguf", "rb");
// Memory-map the file
if (llama_mmap::SUPPORTED) {
    llama_mmap mapping(&file);
    void * data = mapping.addr();
    size_t len  = mapping.size();
    // access model data through mapped memory

    // Lock the mapped region in RAM to prevent paging
    if (llama_mlock::SUPPORTED) {
        llama_mlock mlock;
        mlock.init(data);
        mlock.grow_to(len);
    }
}
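`unmap_fragment(first, last)` releases a byte range of the mapping that is no longer needed, for example after tensor data has been copied elsewhere. On POSIX this amounts to a partial `munmap`, which only works on whole, page-aligned pages; a rough sketch of that alignment logic (hypothetical helper returning the number of bytes released, not the actual implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Unmap only the whole pages fully contained in [first, last),
// mirroring the page-alignment rules a partial munmap must follow.
// Returns the number of bytes actually unmapped.
size_t unmap_fragment_sketch(void * base, size_t first, size_t last) {
    size_t ps = (size_t) sysconf(_SC_PAGESIZE);
    size_t frag_start = ((first + ps - 1) / ps) * ps; // round up
    size_t frag_end   = (last / ps) * ps;             // round down
    if (frag_end <= frag_start) {
        return 0; // range does not cover a complete page
    }
    munmap((char *) base + frag_start, frag_end - frag_start);
    return frag_end - frag_start;
}
```

Rounding inward (start up, end down) guarantees the call never touches a page that still holds live data on either side of the fragment.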