Implementation: ggml-org/llama.cpp Mmap Header
| Knowledge Sources | Details |
|---|---|
| Domains | File_IO, Memory_Mapping |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares cross-platform abstractions for file I/O, memory-mapped file access, and memory locking used during model loading.
Description
This header defines three structs using the pimpl (pointer-to-implementation) pattern: `llama_file` provides file operations (read, write, seek, tell) with direct I/O and aligned read support; `llama_mmap` wraps memory-mapped file access with prefetching, NUMA awareness, and fragment unmapping; `llama_mlock` pins memory ranges in physical RAM to prevent paging. Type aliases for vectors of unique pointers (`llama_files`, `llama_mmaps`, `llama_mlocks`) and a `llama_path_max()` utility function are also provided.
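The pimpl pattern mentioned above keeps platform-specific state (Win32 handles on Windows, POSIX file descriptors elsewhere) out of the public header, so `llama-mmap.h` can stay platform-neutral. A minimal, self-contained sketch of the same idiom follows; the `demo_file` class and its members are hypothetical illustrations, not the actual llama.cpp implementation:

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <stdexcept>

// Public header exposes only the interface; the platform-specific
// state lives in the opaque `impl` struct defined out of line.
struct demo_file {
    struct impl;                  // forward declaration only
    std::unique_ptr<impl> pimpl;  // pointer-to-implementation

    demo_file(const char * fname, const char * mode);
    ~demo_file();
    size_t size() const;
};

// --- normally placed in the .cpp translation unit ---
struct demo_file::impl {
    FILE * fp = nullptr;
    size_t sz = 0;
};

demo_file::demo_file(const char * fname, const char * mode)
    : pimpl(new impl()) {
    pimpl->fp = std::fopen(fname, mode);
    if (!pimpl->fp) throw std::runtime_error("failed to open file");
    std::fseek(pimpl->fp, 0, SEEK_END);
    pimpl->sz = (size_t) std::ftell(pimpl->fp);
    std::fseek(pimpl->fp, 0, SEEK_SET);
}

demo_file::~demo_file() {
    if (pimpl->fp) std::fclose(pimpl->fp);
}

size_t demo_file::size() const { return pimpl->sz; }
```

Because callers only ever see the forward-declared `impl`, the implementation file can hold a `HANDLE` on Windows and an `int` descriptor on POSIX without the header changing.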
Usage
Include this header when working with the model loading pipeline. It provides the platform abstraction layer that enables efficient memory-mapped model loading across Windows, Linux, and macOS.
Code Reference
Source Location
- Repository: ggml-org/llama.cpp
- File: src/llama-mmap.h
- Lines: 1-73
Signature
struct llama_file {
    llama_file(const char * fname, const char * mode, bool use_direct_io = false);
    ~llama_file();

    size_t tell() const;
    size_t size() const;
    int file_id() const;

    void seek(size_t offset, int whence) const;

    void read_raw(void * ptr, size_t len);
    void read_raw_unsafe(void * ptr, size_t len);
    void read_aligned_chunk(void * dest, size_t size);
    uint32_t read_u32();

    void write_raw(const void * ptr, size_t len) const;
    void write_u32(uint32_t val) const;

    size_t read_alignment() const;
    bool has_direct_io() const;
};

struct llama_mmap {
    llama_mmap(struct llama_file * file, size_t prefetch = (size_t) -1, bool numa = false);
    ~llama_mmap();

    size_t size() const;
    void * addr() const;

    void unmap_fragment(size_t first, size_t last);

    static const bool SUPPORTED;
};

struct llama_mlock {
    llama_mlock();
    ~llama_mlock();

    void init(void * ptr);
    void grow_to(size_t target_size);

    static const bool SUPPORTED;
};
using llama_files = std::vector<std::unique_ptr<llama_file>>;
using llama_mmaps = std::vector<std::unique_ptr<llama_mmap>>;
using llama_mlocks = std::vector<std::unique_ptr<llama_mlock>>;
size_t llama_path_max();
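On POSIX platforms, `llama_mmap` is essentially a wrapper over the `mmap` and `madvise` system calls. The following is a rough, POSIX-only sketch of the underlying calls, with a hypothetical `map_file` helper; it is not the actual llama.cpp implementation and omits its Windows path and error reporting:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a whole file read-only and hint the kernel to prefetch its
// pages -- roughly what llama_mmap does on POSIX (simplified).
void * map_file(const char * path, size_t * out_size) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
    *out_size = (size_t) st.st_size;
    void * addr = mmap(nullptr, *out_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return nullptr;
    // Prefetch hint: ask the kernel to read the pages in ahead of use.
    madvise(addr, *out_size, MADV_WILLNEED);
    return addr;
}
```

The `prefetch` constructor parameter in the real header presumably bounds how many bytes receive this kind of hint; the `(size_t) -1` default means the entire mapping.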
Import
#include "llama-mmap.h"
// Dependencies:
#include <cstdint>
#include <memory>
#include <vector>
#include <cstdio>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| fname | const char * | Yes | File path for llama_file constructor |
| mode | const char * | Yes | File open mode (e.g., "rb", "wb") |
| use_direct_io | bool | No | Bypass the OS page cache via direct I/O (default: false) |
| file | llama_file * | Yes | File handle for llama_mmap constructor |
| prefetch | size_t | No | Number of bytes to prefetch (default: all) |
| numa | bool | No | Enable NUMA-aware mapping (default: false) |
| ptr | void * | Yes | Memory address for llama_mlock::init |
| target_size | size_t | Yes | Target locked memory size for grow_to |
Outputs
| Name | Type | Description |
|---|---|---|
| tell() | size_t | Current file position |
| size() | size_t | File or mapping size in bytes |
| addr() | void * | Base address of the memory-mapped region |
| read_u32() | uint32_t | 32-bit unsigned integer read from file |
| SUPPORTED | bool | Whether the platform supports mmap/mlock |
| llama_path_max() | size_t | Maximum path length for the platform |
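The `grow_to` entry above implies that `llama_mlock` locks memory incrementally: as more of the model is loaded, only the newly needed bytes are pinned. A small sketch of that incremental-locking idea (hypothetical `demo_mlock` struct, not the llama.cpp implementation; on POSIX, `mlock` may fail under a low `RLIMIT_MEMLOCK`, which the sketch treats as non-fatal):

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Sketch of grow_to-style incremental locking: pin only the bytes
// beyond what is already locked, rounded up to whole pages.
struct demo_mlock {
    void * base = nullptr;
    size_t locked = 0; // bytes currently pinned in RAM

    static size_t page_size() { return (size_t) sysconf(_SC_PAGESIZE); }

    bool grow_to(size_t target) {
        if (target <= locked) return true; // already covered, nothing to do
        size_t ps = page_size();
        size_t aligned = ((target + ps - 1) / ps) * ps; // round up to a page
        // mlock can fail if the process memlock limit is exhausted;
        // report failure instead of aborting, as a loader would.
        if (mlock((char *) base + locked, aligned - locked) != 0) {
            return false;
        }
        locked = aligned;
        return true;
    }
};
```

Growing by deltas avoids re-locking (and re-counting against the rlimit) ranges that are already pinned.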
Usage Examples
#include "llama-mmap.h"
// Open a model file
llama_file file("model.gguf", "rb");
// Memory-map the file
if (llama_mmap::SUPPORTED) {
    llama_mmap mapping(&file);
    void * data = mapping.addr();
    size_t len  = mapping.size();
    // access model data through mapped memory

    // Lock the mapped region in RAM to prevent paging
    if (llama_mlock::SUPPORTED) {
        llama_mlock mlock;
        mlock.init(data);
        mlock.grow_to(len);
    }
}
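`unmap_fragment(first, last)` releases a byte range of the mapping that is no longer needed, for example after tensor data has been copied elsewhere. On POSIX this amounts to a partial `munmap`, which only works on whole, page-aligned pages; a rough sketch of that alignment logic (hypothetical helper returning the number of bytes released, not the actual implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Unmap only the whole pages fully contained in [first, last),
// mirroring the page-alignment rules a partial munmap must follow.
// Returns the number of bytes actually unmapped.
size_t unmap_fragment_sketch(void * base, size_t first, size_t last) {
    size_t ps = (size_t) sysconf(_SC_PAGESIZE);
    size_t frag_start = ((first + ps - 1) / ps) * ps; // round up
    size_t frag_end   = (last / ps) * ps;             // round down
    if (frag_end <= frag_start) {
        return 0; // range does not cover a complete page
    }
    munmap((char *) base + frag_start, frag_end - frag_start);
    return frag_end - frag_start;
}
```

Rounding inward (start up, end down) guarantees the call never touches a page that still holds live data on either side of the fragment.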