Implementation:Ollama Ollama Llama Mmap

Knowledge Sources	Ollama
Domains	LLM Inference, File I/O
Last Updated	2025-02-15 00:00 GMT

Overview

Implements cross-platform file I/O, memory-mapped file access, and memory locking for efficient model loading.

Description

Contains platform-specific implementations for Windows, POSIX, and macOS, organized into three classes:

llama_file::impl wraps file operations (open, read, write, seek, tell) using either the Win32 API or POSIX stdio.
llama_mmap::impl maps a file into memory via mmap (POSIX) or CreateFileMapping (Windows), with optional prefetching and NUMA-aware behavior.
llama_mlock::impl pins pages in RAM using mlock (POSIX) or VirtualLock (Windows).

Each class uses the pimpl pattern, so the public header stays free of platform-specific types.
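To illustrate the memory-mapping step described above, here is a minimal POSIX-only sketch. It is not the llama_mmap code: the class name mapped_file and its members are hypothetical, the Windows path (CreateFileMapping) and NUMA handling are omitted, and prefetching is reduced to a single posix_madvise hint.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not part of llama.cpp): maps a whole file read-only.
class mapped_file {
public:
    explicit mapped_file(const char * fname, bool prefetch = true) {
        assert(fname != nullptr);

        int fd = open(fname, O_RDONLY);
        if (fd == -1) {
            throw std::runtime_error(std::string("open failed: ") + fname);
        }

        struct stat st;
        if (fstat(fd, &st) == -1) {
            close(fd);
            throw std::runtime_error(std::string("fstat failed: ") + fname);
        }
        size_ = static_cast<size_t>(st.st_size);

        // The kernel pages data in on demand; nothing is copied up front.
        // Note that mmap rejects a zero-length mapping, so empty files throw.
        addr_ = mmap(nullptr, size_, PROT_READ, MAP_SHARED, fd, 0);
        close(fd); // the mapping stays valid after the descriptor is closed
        if (addr_ == MAP_FAILED) {
            throw std::runtime_error(std::string("mmap failed: ") + fname);
        }

        if (prefetch) {
            // Advisory only: asks the kernel to start reading pages ahead of use.
            posix_madvise(addr_, size_, POSIX_MADV_WILLNEED);
        }
    }

    ~mapped_file() { munmap(addr_, size_); }

    mapped_file(const mapped_file &) = delete;
    mapped_file & operator=(const mapped_file &) = delete;

    const void * addr() const { return addr_; }
    size_t       size() const { return size_; }

private:
    void * addr_ = nullptr;
    size_t size_ = 0;
};
```

A caller would construct mapped_file with a path and then read the file contents directly through addr() and size(); the mapping is released when the object goes out of scope.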

Usage

Used during model loading to map large model files into memory without first copying them into a separate buffer. Memory locking prevents the OS from paging model weights out to swap, which avoids page-fault stalls during inference when the system is under memory pressure.
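The locking step can be sketched as a small RAII wrapper around mlock and munlock. This is an illustration under assumptions, not the llama_mlock implementation: the name locked_region and its interface are hypothetical, only POSIX is covered, and a failed lock is reported rather than retried. On many systems the call fails for large regions because of the RLIMIT_MEMLOCK resource limit, so the sketch treats failure as non-fatal.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>

#include <sys/mman.h>

// Hypothetical RAII wrapper (not the llama_mlock API): pins a region in RAM.
class locked_region {
public:
    locked_region() = default;
    ~locked_region() { unlock(); }

    locked_region(const locked_region &) = delete;
    locked_region & operator=(const locked_region &) = delete;

    // Tries to pin [addr, addr + len). Returns false and warns on failure,
    // so the caller can continue with unlocked (swappable) memory.
    bool lock(const void * addr, size_t len) {
        assert(addr_ == nullptr && "region already locked");
        if (mlock(addr, len) != 0) {
            std::fprintf(stderr, "warning: mlock failed: %s\n", std::strerror(errno));
            return false;
        }
        addr_ = addr;
        size_ = len;
        return true;
    }

    // Releases the lock if one is held; safe to call repeatedly.
    void unlock() {
        if (addr_ != nullptr) {
            munlock(addr_, size_);
            addr_ = nullptr;
            size_ = 0;
        }
    }

    size_t locked_size() const { return size_; }

private:
    const void * addr_ = nullptr;
    size_t       size_ = 0;
};
```

Treating a failed lock as a warning keeps loading usable on machines with a low lock limit, at the cost of possible paging later.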

Code Reference

Source Location

Repository: Ollama
File: llama/llama.cpp/src/llama-mmap.cpp
Lines: 1-600

Signature

struct llama_file::impl {
    impl(const char * fname, const char * mode);
    size_t tell() const;
    void seek(size_t offset, int whence) const;
    void read_raw(void * ptr, size_t len) const;
    uint32_t read_u32() const;
    void write_raw(const void * ptr, size_t len) const;
    void write_u32(uint32_t val) const;
    FILE * fp;
    size_t size;
};
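The following sketch shows how a stdio-backed reader with a similar shape could be arranged behind a pimpl. It is an assumption-laden illustration, not the source file's code: file_reader and its members are hypothetical names, only the read path is shown, and read_u32 returns the value in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// What a public header would expose (hypothetical names).
class file_reader {
public:
    file_reader(const char * fname, const char * mode);
    ~file_reader();

    size_t   size() const;
    void     read_raw(void * ptr, size_t len) const;
    uint32_t read_u32() const;

private:
    struct impl;                 // defined only in the implementation file
    std::unique_ptr<impl> pimpl;
};

// What the implementation file would contain (stdio variant).
struct file_reader::impl {
    impl(const char * fname, const char * mode) {
        assert(fname != nullptr && mode != nullptr);
        fp = std::fopen(fname, mode);
        if (fp == nullptr) {
            throw std::runtime_error(std::string("failed to open ") + fname);
        }
        // Determine the file size by seeking to the end and back.
        std::fseek(fp, 0, SEEK_END);
        long end = std::ftell(fp);
        if (end < 0) {
            std::fclose(fp);
            throw std::runtime_error(std::string("ftell failed: ") + fname);
        }
        size = static_cast<size_t>(end);
        std::fseek(fp, 0, SEEK_SET);
    }

    ~impl() { std::fclose(fp); }

    void read_raw(void * ptr, size_t len) const {
        if (len == 0) {
            return;
        }
        if (std::fread(ptr, len, 1, fp) != 1) {
            throw std::runtime_error("read error or unexpected end of file");
        }
    }

    FILE * fp   = nullptr;
    size_t size = 0;
};

file_reader::file_reader(const char * fname, const char * mode)
    : pimpl(new impl(fname, mode)) {}
file_reader::~file_reader() = default;

size_t file_reader::size() const { return pimpl->size; }
void   file_reader::read_raw(void * ptr, size_t len) const { pimpl->read_raw(ptr, len); }

uint32_t file_reader::read_u32() const {
    uint32_t val = 0;
    read_raw(&val, sizeof(val));
    return val;
}
```

Because the destructor is defined after impl is complete, std::unique_ptr can delete the hidden type without the header ever naming FILE or any platform handle.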

Import

#include "llama-mmap.h"

I/O Contract

Inputs

Name	Type	Required	Description
fname	const char *	Yes	File path to open or memory-map
mode	const char *	Yes	File open mode (e.g., "rb")
prefetch	size_t	No	Number of bytes of the mapping to prefetch (defaults to the whole file)

Outputs

Name	Type	Description
addr	void*	Memory-mapped file address
size	size_t	File size in bytes

Usage Examples

// File reading:
llama_file file("model.gguf", "rb");
uint32_t magic = file.read_u32();

// Memory mapping is handled internally by llama_model_loader
// Memory locking is applied to keep model weights in RAM

Related Pages

Principle:Ollama_Ollama_Model_Loading
