Implementation:Ollama Ollama Llama Model Loader

Knowledge Sources	Ollama
Domains	LLM Inference, Model Loading
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the GGUF model file loader that reads model metadata, creates tensors, and loads tensor data from GGUF files including split files.

Description

The constructor opens the GGUF file(s) with gguf_init_from_file, parses the metadata key-value pairs, records every tensor together with its source file and data offset, and sets up memory mapping. The file also implements the GGUFMeta template helpers, which provide type-safe access to metadata values. create_tensor creates ggml tensors that match the GGUF tensor definitions, using architecture-aware name lookup. load_all_data reads tensor data from the files, through either mmap-based or read-based loading, and reports progress through a callback.
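
The type-safe metadata access described above can be illustrated with a minimal C++17 sketch. It does not use the gguf API: MetaValue, MetaStore, get_meta, and make_demo_store are hypothetical stand-ins, and the key names are only illustrative. The point shown is that a lookup succeeds only when the stored value has exactly the requested C++ type.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <variant>

// Hypothetical stand-in for a GGUF key-value store; not the gguf API.
using MetaValue = std::variant<uint32_t, float, bool, std::string>;
using MetaStore = std::map<std::string, MetaValue>;

// Look up `key` and write it to `out` only if the stored value has type T.
// Returns false when the key is absent or holds a different type.
template <typename T>
bool get_meta(const MetaStore & store, const std::string & key, T & out) {
    auto it = store.find(key);
    if (it == store.end()) {
        return false;
    }
    if (const T * v = std::get_if<T>(&it->second)) {
        out = *v;
        return true;
    }
    return false;
}

// Build a small store with illustrative keys for experimentation.
inline MetaStore make_demo_store() {
    return MetaStore{
        {"llama.embedding_length", MetaValue{uint32_t{4096}}},
        {"general.name",           MetaValue{std::string{"demo"}}},
    };
}
```

Reading "llama.embedding_length" into a uint32_t succeeds in this sketch, while reading the same key into a float returns false instead of silently converting the value.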

Usage

This loader is the gateway for all model loading in llama.cpp. Every model that Ollama serves passes through it; it handles GGUF format parsing, tensor discovery, and efficient data loading.

Code Reference

Source Location

Repository: Ollama
File: llama/llama.cpp/src/llama-model-loader.cpp
Lines: 1-1169

Signature

const char * llama_file_version_name(llama_fver version);
static std::string llama_model_ftype_name(llama_ftype ftype);

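namespace GGUFMeta {
    // Binds a C++ type to its GGUF type tag and getter function.
    template <typename T, gguf_type gt_, T (*gfun)(const gguf_context *, const int64_t)>
    struct GKV_Base_Type { /* ... */ };

    // Specialized per value type in terms of GKV_Base_Type.
    template<typename T> struct GKV_Base;

    template<typename T>
    class GKV : public GKV_Base<T> { /* ... */ };
}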

// llama_model_loader methods:
struct ggml_tensor * create_tensor(struct ggml_context * ctx, const std::string & name,
    const std::initializer_list<int64_t> & ne, int flags = 0);
bool load_all_data(struct ggml_context * ctx, llama_buf_map & bufs,
    llama_mlocks * lmlocks, llama_progress_callback progress_callback, void * user_data);

Import

#include "llama-model-loader.h"

I/O Contract

Inputs

Name	Type	Required	Description
fname	const std::string &	Yes	Path to the GGUF model file
use_mmap	bool	Yes	Whether to use memory mapping
check_tensors	bool	Yes	Validate tensor data integrity
no_alloc	bool	Yes	Skip tensor data allocation

Outputs

Name	Type	Description
weights_map	std::map	Map of tensor name to weight metadata
meta	gguf_context_ptr	Parsed GGUF metadata context
ftype	llama_ftype	Model file type (quantization format)
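
The tensor discovery step that fills weights_map can be sketched as follows. This is a simplified illustration under stated assumptions, not the loader's code: TensorWeight, TensorInfo, WeightsMap, and add_file_tensors are hypothetical names, and the real weight metadata carries more than a file index and an offset. The sketch registers the tensors of each (split) file by name, rejects duplicates, and rejects any tensor whose data range would run past the end of its file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-tensor record: which file holds the data and where.
struct TensorWeight {
    uint16_t file_idx;  // index into the list of opened (split) files
    size_t   offset;    // byte offset of the tensor data within that file
    size_t   n_bytes;   // size of the tensor data in bytes
};

// Hypothetical description of one tensor as listed in a file's header.
struct TensorInfo {
    std::string name;
    size_t      offset;
    size_t      n_bytes;
};

using WeightsMap = std::map<std::string, TensorWeight>;

// Register every tensor of one file. Throws on a duplicate name or on a
// tensor whose data range extends past the end of the file.
inline void add_file_tensors(WeightsMap & weights, uint16_t file_idx, size_t file_size,
                             const std::vector<TensorInfo> & infos) {
    for (const TensorInfo & ti : infos) {
        // Written to avoid overflow in offset + n_bytes.
        if (ti.offset > file_size || ti.n_bytes > file_size - ti.offset) {
            throw std::runtime_error("tensor '" + ti.name + "' data is not within file bounds");
        }
        const bool inserted =
            weights.emplace(ti.name, TensorWeight{file_idx, ti.offset, ti.n_bytes}).second;
        if (!inserted) {
            throw std::runtime_error("duplicate tensor name '" + ti.name + "'");
        }
    }
}
```

Calling add_file_tensors once per split file, with an increasing file_idx, yields a single map that covers the whole model regardless of how many files it spans.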

Usage Examples

// Created during llama_model_load_from_file:
llama_model_loader ml(fname, splits, use_mmap, check_tensors, no_alloc,
    param_overrides, tensor_buft_overrides);

// Read hyperparameters:
uint32_t n_embd;
ml.get_key(LLM_KV_EMBEDDING_LENGTH, n_embd);

// Create tensors and load data (n_vocab is assumed to be known at this point):
ggml_tensor * tok_embd = ml.create_tensor(ctx, "token_embd.weight", {n_embd, n_vocab});
ml.load_all_data(ctx, bufs, lmlocks, progress_cb, user_data);
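
The read-based loading path with a progress callback can be approximated by the sketch below. It is a simplified model rather than the loader's implementation: an in-memory byte vector stands in for the file, TensorSpan and load_all are hypothetical names, and the callback shape (progress in [0, 1], return false to cancel) is an assumption chosen to mirror the progress_callback and user_data parameters shown above. The mmap-based path is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Callback shape assumed for this sketch: receives progress in [0, 1] and
// returns false to request cancellation.
using progress_callback = bool (*)(float progress, void * user_data);

// Hypothetical description of one tensor's data range in the source buffer.
struct TensorSpan {
    size_t offset;
    size_t n_bytes;
};

// Copy each tensor from `file` (standing in for file contents) into `dst`,
// packed back to back. Returns false if the callback cancels the load.
inline bool load_all(const std::vector<uint8_t> & file,
                     const std::vector<TensorSpan> & tensors,
                     std::vector<uint8_t> & dst,
                     progress_callback cb, void * user_data) {
    size_t size_data = 0;
    for (const TensorSpan & t : tensors) {
        if (t.offset > file.size() || t.n_bytes > file.size() - t.offset) {
            throw std::out_of_range("tensor data is not within file bounds");
        }
        size_data += t.n_bytes;
    }

    dst.resize(size_data);
    size_t size_done = 0;
    for (const TensorSpan & t : tensors) {
        // Report progress before each tensor; a false return aborts the load.
        const float progress = size_data > 0 ? (float) size_done / (float) size_data : 1.0f;
        if (cb && !cb(progress, user_data)) {
            return false;
        }
        if (t.n_bytes > 0) {
            std::memcpy(dst.data() + size_done, file.data() + t.offset, t.n_bytes);
        }
        size_done += t.n_bytes;
    }
    // Report completion once everything has been copied.
    if (cb && !cb(1.0f, user_data)) {
        return false;
    }
    return true;
}
```

Passing a null callback disables progress reporting in this sketch, and a callback that returns false stops the copy before the next tensor is read.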

Related Pages

Principle:Ollama_Ollama_Model_Loading
