
Implementation:Ollama Ollama Llama Mmap

From Leeroopedia
Knowledge Sources
Domains: LLM Inference, File I/O
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements cross-platform file I/O, memory-mapped file access, and memory locking for efficient model loading.

Description

Contains platform-specific implementations for Windows, POSIX, and macOS. llama_file::impl wraps file operations (open, read, write, seek, tell) using either the Win32 API or POSIX stdio. llama_mmap::impl implements memory-mapped file access via mmap (POSIX) or CreateFileMapping (Windows), with optional prefetching of mapped pages and a NUMA mode in which prefetching is disabled and the kernel is advised to expect random access. llama_mlock::impl uses mlock (POSIX) or VirtualLock (Windows) to pin pages in RAM. Each class uses the pimpl pattern, so the shared header exposes no platform-specific types.
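As a rough illustration of the POSIX side of llama_mmap::impl, the sketch below maps a file read-only and applies the prefetch and NUMA hints described above. It is a minimal sketch, not llama.cpp's code: the names mapped_file, map_file and unmap_file are hypothetical, and the exact flags and advice calls llama.cpp uses may differ.

```cpp
// Minimal POSIX sketch of a read-only mapping with optional prefetch and a NUMA hint.
// mapped_file, map_file and unmap_file are hypothetical names, not llama.cpp's API.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

struct mapped_file {
    void * addr = nullptr;  // start of the mapping (nullptr for an empty file)
    size_t size = 0;        // mapped length in bytes
};

// Map `path` read-only. `prefetch` is how many leading bytes to hint for read-ahead
// (0 disables it); when `numa` is set, prefetch is skipped and random access is advised.
mapped_file map_file(const std::string & path, size_t prefetch, bool numa) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        throw std::runtime_error("fstat failed: " + path);
    }
    mapped_file m;
    m.size = static_cast<size_t>(st.st_size);
    if (m.size == 0) {      // mmap rejects zero-length mappings
        close(fd);
        return m;
    }
    m.addr = mmap(nullptr, m.size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);              // the mapping stays valid after the descriptor is closed
    if (m.addr == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);

    if (numa) prefetch = 0;
    if (prefetch > 0) {
        // Advisory only; a failed hint is not an error, so the result is ignored.
        posix_madvise(m.addr, std::min(m.size, prefetch), POSIX_MADV_WILLNEED);
    }
    if (numa) {
        posix_madvise(m.addr, m.size, POSIX_MADV_RANDOM);
    }
    return m;
}

void unmap_file(mapped_file & m) {
    if (m.addr != nullptr) munmap(m.addr, m.size);
    m.addr = nullptr;
    m.size = 0;
}
```

Passing static_cast<size_t>(-1) as prefetch hints the whole file, because the length is clamped with std::min. Since the mapping is read-only and MAP_SHARED, pages come from the OS page cache rather than being copied into a separate heap buffer, and untouched pages are read from disk only when first accessed.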

Usage

Used during model loading to map large model files into memory without copying them into separately allocated buffers. Optional memory locking (use_mlock) prevents the OS from paging model weights out to swap, avoiding page-fault stalls during inference with large models.
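Memory locking can be sketched in a similar way. The hypothetical page_lock struct below grows a lock incrementally over an already-mapped region, calling mlock only on the newly required pages. llama_mlock is assumed to follow a comparable grow-to-size approach, but this is not its actual implementation.

```cpp
// Minimal POSIX sketch of an incrementally grown memory lock over a mapped region.
// page_lock is a hypothetical type; it is not llama_mlock's actual implementation.
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>

struct page_lock {
    void * addr = nullptr;  // page-aligned start of the region (e.g. returned by mmap)
    size_t locked = 0;      // bytes currently locked, always a multiple of the page size

    page_lock() = default;
    page_lock(const page_lock &) = delete;             // avoid a double munlock
    page_lock & operator=(const page_lock &) = delete;

    static size_t page_size() { return static_cast<size_t>(sysconf(_SC_PAGESIZE)); }

    // Ensure [addr, addr + target) is locked. Only the pages beyond the current lock
    // are passed to mlock; on failure `locked` is left unchanged and false is returned.
    bool grow_to(size_t target) {
        assert(addr != nullptr);
        const size_t ps = page_size();
        target = (target + ps - 1) / ps * ps;          // round up to whole pages
        if (target <= locked) return true;
        char * start = static_cast<char *>(addr) + locked;
        if (mlock(start, target - locked) != 0) {
            std::fprintf(stderr, "mlock failed: %s\n", std::strerror(errno));
            return false;
        }
        locked = target;
        return true;
    }

    ~page_lock() {
        if (addr != nullptr && locked > 0) munlock(addr, locked);
    }
};
```

mlock commonly fails once a process exceeds its RLIMIT_MEMLOCK limit, so the sketch reports the error and leaves the extra pages unlocked instead of aborting; the mapped data stays usable either way, it just may be paged out.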

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/src/llama-mmap.cpp
  • Lines: 1-600

Signature

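struct llama_file::impl {
    impl(const char * fname, const char * mode);        // open fname with an fopen-style mode
    size_t tell() const;                                // current byte offset
    void seek(size_t offset, int whence) const;         // whence: SEEK_SET, SEEK_CUR or SEEK_END
    void read_raw(void * ptr, size_t len) const;        // read exactly len bytes into ptr
    uint32_t read_u32() const;                          // read one 32-bit value
    void write_raw(const void * ptr, size_t len) const; // write len bytes from ptr
    void write_u32(uint32_t val) const;                 // write one 32-bit value
    FILE * fp;                                          // underlying stdio handle (POSIX build)
    size_t size;                                        // file size in bytes, measured at open
};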
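To show how the pimpl split and the stdio-backed methods in this signature fit together, here is a self-contained sketch of a read-only reader. file_reader and its members are hypothetical stand-ins for llama_file and llama_file::impl; write support and the Win32 branch are omitted, and the error messages are illustrative.

```cpp
// Minimal sketch of a stdio-backed reader behind the pimpl idiom, mirroring the
// llama_file::impl signature. file_reader is hypothetical; writing and the Win32
// branch are omitted.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// --- what a public header would expose: no platform types appear here ---
class file_reader {
public:
    file_reader(const char * fname, const char * mode);
    ~file_reader();
    size_t size() const;
    size_t tell() const;
    void seek(size_t offset, int whence) const;
    void read_raw(void * ptr, size_t len) const;
    uint32_t read_u32() const;
private:
    struct impl;                       // defined only in the implementation file
    std::unique_ptr<impl> pimpl;
};

// --- implementation file: the POSIX stdio variant ---
struct file_reader::impl {
    FILE * fp = nullptr;
    size_t size = 0;

    impl(const char * fname, const char * mode) {
        fp = std::fopen(fname, mode);
        if (!fp) throw std::runtime_error(std::string("failed to open ") + fname);
        // Measure the size once, then rewind. Close before throwing, because
        // ~impl() does not run when the constructor itself throws.
        long end = -1;
        if (std::fseek(fp, 0, SEEK_END) != 0 || (end = std::ftell(fp)) < 0 ||
            std::fseek(fp, 0, SEEK_SET) != 0) {
            std::fclose(fp);
            throw std::runtime_error(std::string("failed to size ") + fname);
        }
        size = static_cast<size_t>(end);
    }
    ~impl() { std::fclose(fp); }
    impl(const impl &) = delete;
    impl & operator=(const impl &) = delete;

    // ftell/fseek take a long, which limits this sketch to 2 GiB files on some platforms.
    size_t tell() const {
        long pos = std::ftell(fp);
        if (pos < 0) throw std::runtime_error("ftell failed");
        return static_cast<size_t>(pos);
    }
    void seek(size_t offset, int whence) const {
        if (std::fseek(fp, static_cast<long>(offset), whence) != 0)
            throw std::runtime_error("fseek failed");
    }
    void read_raw(void * ptr, size_t len) const {
        if (len == 0) return;
        if (std::fread(ptr, len, 1, fp) != 1)
            throw std::runtime_error("unexpectedly reached end of file");
    }
    uint32_t read_u32() const {
        uint32_t v = 0;
        read_raw(&v, sizeof(v));       // host byte order
        return v;
    }
};

file_reader::file_reader(const char * fname, const char * mode)
    : pimpl(std::make_unique<impl>(fname, mode)) {}
file_reader::~file_reader() = default;
size_t file_reader::size() const { return pimpl->size; }
size_t file_reader::tell() const { return pimpl->tell(); }
void file_reader::seek(size_t offset, int whence) const { pimpl->seek(offset, whence); }
void file_reader::read_raw(void * ptr, size_t len) const { pimpl->read_raw(ptr, len); }
uint32_t file_reader::read_u32() const { return pimpl->read_u32(); }
```

Keeping FILE * inside the impl struct means callers never see platform headers, so an alternative implementation (for example one built on Win32 handles) can be swapped in without changing the public interface. Note that the out-of-line destructor is required: std::unique_ptr needs the complete impl type wherever it deletes it.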

Import

#include "llama-mmap.h"

I/O Contract

Inputs

Name | Type | Required | Description
fname | const char * | Yes | File path to open or memory-map
mode | const char * | Yes | File open mode (e.g., "rb")
prefetch | size_t | No | Number of bytes of the mapping to prefetch; 0 disables prefetching

Outputs

Name | Type | Description
addr | void * | Memory-mapped file address
size | size_t | File size in bytes

Usage Examples

// File reading: open the model and read its first 32-bit value (the GGUF magic)
llama_file file("model.gguf", "rb");
uint32_t magic = file.read_u32();

// Memory mapping is handled internally by llama_model_loader.
// Memory locking is optional (use_mlock) and keeps mapped model weights resident in RAM.
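The magic value read in the file-reading example is the four ASCII bytes "GGUF". The sketch below uses plain stdio rather than llama_file, so it compiles on its own, and checks for those bytes; it also spells out the integer that read_u32 would return for them on a little-endian host. The helper names are hypothetical.

```cpp
// Hypothetical helper: does a file start with the 4-byte GGUF magic "GGUF"?
// Uses plain stdio instead of llama_file so that it compiles on its own.
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// The integer a native-order 32-bit read yields for 'G' 'G' 'U' 'F' on a little-endian host.
constexpr uint32_t kGgufMagicLE =
    uint32_t('G') | uint32_t('G') << 8 | uint32_t('U') << 16 | uint32_t('F') << 24;
static_assert(kGgufMagicLE == 0x46554747u, "'G'=0x47, 'U'=0x55, 'F'=0x46");

bool starts_with_gguf_magic(const char * path) {
    std::FILE * fp = std::fopen(path, "rb");
    if (fp == nullptr) return false;
    unsigned char buf[4];
    // Compare raw bytes so the check does not depend on host byte order.
    const bool ok = std::fread(buf, 1, sizeof(buf), fp) == sizeof(buf) &&
                    std::memcmp(buf, "GGUF", sizeof(buf)) == 0;
    std::fclose(fp);
    return ok;
}
```

Comparing the raw bytes avoids any dependence on byte order; comparing a read_u32 result against 0x46554747 only works on little-endian machines.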
