Implementation:Ollama Ollama Llama Mmap
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, File I/O |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements cross-platform file I/O, memory-mapped file access, and memory locking for efficient model loading.
Description
Contains platform-specific implementations for Windows and POSIX systems (including macOS). llama_file::impl wraps file operations (open, read, write, seek, tell) using either the Win32 API or POSIX stdio. llama_mmap::impl maps the file into memory via mmap (POSIX) or CreateFileMapping (Windows), with optional prefetching and an optional NUMA mode. llama_mlock::impl pins pages in RAM with mlock (POSIX) or VirtualLock (Windows). Each class uses the pimpl pattern, so the public header stays free of platform-specific types.
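The pimpl layout described above can be sketched as follows. This is a minimal illustration, not the llama.cpp source: demo_file and its members are hypothetical names, only the POSIX stdio path is shown, and a real implementation would keep the impl definition in the .cpp file, typically behind platform #ifdef blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical wrapper illustrating the pimpl layout (not the llama.cpp class).
class demo_file {
public:
    demo_file(const char * fname, const char * mode);
    ~demo_file();                      // defined below, once impl is a complete type

    size_t   size() const;
    uint32_t read_u32() const;

private:
    struct impl;                       // platform-specific details are hidden here
    std::unique_ptr<impl> pimpl;
};

// Implementation side: only this part would differ per platform.
struct demo_file::impl {
    FILE * fp   = nullptr;
    size_t size = 0;

    impl(const char * fname, const char * mode) {
        assert(fname != nullptr && mode != nullptr);
        fp = std::fopen(fname, mode);
        if (fp == nullptr) {
            throw std::runtime_error(std::string("failed to open ") + fname);
        }
        // Determine the file size by seeking to the end and back.
        std::fseek(fp, 0, SEEK_END);
        size = static_cast<size_t>(std::ftell(fp));
        std::fseek(fp, 0, SEEK_SET);
    }

    ~impl() { std::fclose(fp); }

    void read_raw(void * ptr, size_t len) const {
        if (len != 0 && std::fread(ptr, len, 1, fp) != 1) {
            throw std::runtime_error("read error or unexpected end of file");
        }
    }
};

demo_file::demo_file(const char * fname, const char * mode)
    : pimpl(std::make_unique<impl>(fname, mode)) {}

demo_file::~demo_file() = default;

size_t demo_file::size() const { return pimpl->size; }

uint32_t demo_file::read_u32() const {
    uint32_t value = 0;
    pimpl->read_raw(&value, sizeof(value));
    return value;
}
```

Callers only see demo_file; swapping the stdio calls for Win32 equivalents would not change the public interface, which is the main reason to use this pattern here.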
Usage
Used during model loading to map large model files into memory without copying them into a separate buffer. Memory locking keeps the OS from paging model weights out to swap, which avoids page-fault stalls during inference with large models.
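To make the map-then-lock sequence concrete, here is a minimal POSIX-only sketch. It is an illustration under stated assumptions rather than the code in llama-mmap.cpp: demo_mapping and its fields are hypothetical names, the whole file is mapped read-only, and a failed mlock (for example because of a low RLIMIT_MEMLOCK) is treated as non-fatal.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII helper (POSIX only); names are illustrative, not llama.cpp's.
struct demo_mapping {
    void * addr   = nullptr;
    size_t size   = 0;
    bool   locked = false;   // true only if mlock succeeded

    demo_mapping(const char * fname, bool prefetch, bool lock) {
        assert(fname != nullptr);

        int fd = open(fname, O_RDONLY);
        if (fd < 0) {
            throw std::runtime_error(std::string("failed to open ") + fname);
        }

        struct stat st;
        if (fstat(fd, &st) != 0 || st.st_size == 0) {
            close(fd);
            throw std::runtime_error("failed to stat file, or file is empty");
        }
        size = static_cast<size_t>(st.st_size);

        // Map the whole file read-only; no bytes are copied at this point.
        addr = mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);           // the mapping remains valid after the descriptor is closed
        if (addr == MAP_FAILED) {
            addr = nullptr;
            throw std::runtime_error("mmap failed");
        }

        if (prefetch) {
            // Advisory hint that the pages will be needed soon; failure is harmless.
            (void) posix_madvise(addr, size, POSIX_MADV_WILLNEED);
        }
        if (lock) {
            // Pin the pages in RAM; this can fail if the locked-memory limit is too low.
            locked = (mlock(addr, size) == 0);
        }
    }

    ~demo_mapping() {
        if (addr != nullptr) {
            if (locked) {
                munlock(addr, size);
            }
            munmap(addr, size);
        }
    }

    demo_mapping(const demo_mapping &) = delete;
    demo_mapping & operator=(const demo_mapping &) = delete;
};
```

Checking the locked flag after construction lets a caller warn the user instead of aborting when the system refuses to pin the pages.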
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/llama-mmap.cpp
- Lines: 1-600
Signature
struct llama_file::impl {
    impl(const char * fname, const char * mode);

    size_t   tell() const;
    void     seek(size_t offset, int whence) const;

    void     read_raw(void * ptr, size_t len) const;
    uint32_t read_u32() const;

    void     write_raw(const void * ptr, size_t len) const;
    void     write_u32(uint32_t val) const;

    FILE * fp;
    size_t size;
};
Import
#include "llama-mmap.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| fname | const char * | Yes | File path to open or memory-map |
| mode | const char * | Yes | File open mode (e.g., "rb") |
| prefetch | size_t | No | Number of bytes of the mapping to prefetch; 0 disables prefetching |
Outputs
| Name | Type | Description |
|---|---|---|
| addr | void* | Memory-mapped file address |
| size | size_t | File size in bytes |
Usage Examples
// File reading:
llama_file file("model.gguf", "rb");
uint32_t magic = file.read_u32();
// Memory mapping is handled internally by llama_model_loader
// Memory locking is applied to keep model weights in RAM