
Implementation:Ggml org Llama cpp Mmap Header

From Leeroopedia
Knowledge Sources
Domains File_IO, Memory_Mapping
Last Updated 2026-02-15 00:00 GMT

Overview

Declares cross-platform abstractions for file I/O, memory-mapped file access, and memory locking used during model loading.

Description

This header defines three structs using the pimpl (pointer-to-implementation) pattern: `llama_file` provides file operations (read, write, seek, tell) with direct I/O and aligned read support; `llama_mmap` wraps memory-mapped file access with prefetching, NUMA awareness, and fragment unmapping; `llama_mlock` pins memory ranges in physical RAM to prevent paging. Type aliases for vectors of unique pointers (`llama_files`, `llama_mmaps`, `llama_mlocks`) and a `llama_path_max()` utility function are also provided.

Usage

Include this header when working with the model loading pipeline. It provides the platform abstraction layer that enables efficient memory-mapped model loading across Windows, Linux, and macOS.

Code Reference

Source Location

Signature

struct llama_file {
    llama_file(const char * fname, const char * mode, bool use_direct_io = false);
    ~llama_file();

    size_t tell() const;
    size_t size() const;
    int file_id() const;
    void seek(size_t offset, int whence) const;
    void read_raw(void * ptr, size_t len);
    void read_raw_unsafe(void * ptr, size_t len);
    void read_aligned_chunk(void * dest, size_t size);
    uint32_t read_u32();
    void write_raw(const void * ptr, size_t len) const;
    void write_u32(uint32_t val) const;
    size_t read_alignment() const;
    bool has_direct_io() const;
};

struct llama_mmap {
    llama_mmap(struct llama_file * file, size_t prefetch = (size_t) -1, bool numa = false);
    ~llama_mmap();

    size_t size() const;
    void * addr() const;
    void unmap_fragment(size_t first, size_t last);
    static const bool SUPPORTED;
};

struct llama_mlock {
    llama_mlock();
    ~llama_mlock();

    void init(void * ptr);
    void grow_to(size_t target_size);
    static const bool SUPPORTED;
};

using llama_files  = std::vector<std::unique_ptr<llama_file>>;
using llama_mmaps  = std::vector<std::unique_ptr<llama_mmap>>;
using llama_mlocks = std::vector<std::unique_ptr<llama_mlock>>;

size_t llama_path_max();

Import

#include "llama-mmap.h"
// Dependencies:
#include <cstdint>
#include <memory>
#include <vector>
#include <cstdio>

I/O Contract

Inputs

Name           Type           Required  Description
fname          const char *   Yes       File path for the llama_file constructor
mode           const char *   Yes       File open mode (e.g., "rb", "wb")
use_direct_io  bool           No        Enable direct I/O, bypassing the page cache (default: false)
file           llama_file *   Yes       File handle for the llama_mmap constructor
prefetch       size_t         No        Number of bytes to prefetch (default: all)
numa           bool           No        Enable NUMA-aware mapping (default: false)
ptr            void *         Yes       Memory address for llama_mlock::init
target_size    size_t         Yes       Target locked memory size for grow_to

Outputs

Name              Type      Description
tell()            size_t    Current file position
size()            size_t    File or mapping size in bytes
addr()            void *    Base address of the memory-mapped region
read_u32()        uint32_t  32-bit unsigned integer read from the file
SUPPORTED         bool      Whether the platform supports mmap/mlock
llama_path_max()  size_t    Maximum path length for the platform

Usage Examples

#include "llama-mmap.h"

// Open a model file for reading
llama_file file("model.gguf", "rb");

// Memory-map the file, if the platform supports it
if (llama_mmap::SUPPORTED) {
    llama_mmap mapping(&file);
    void * data = mapping.addr();
    size_t len  = mapping.size();
    // access model data through the mapped memory

    // Optionally pin the mapped pages in RAM to prevent paging
    if (llama_mlock::SUPPORTED) {
        llama_mlock mlock;
        mlock.init(data);
        mlock.grow_to(len);
    }
}
