Implementation:Ollama Ollama Llama Model Loader

From Leeroopedia
Knowledge Sources
Domains: LLM Inference, Model Loading
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements the GGUF model file loader that reads model metadata, creates tensors, and loads tensor data from GGUF files including split files.

Description

The llama_model_loader constructor opens the GGUF file(s) with gguf_init_from_file, parses the metadata key-value pairs, records every tensor together with the file and data offset that hold its weights, and sets up memory mapping. The GGUFMeta namespace provides template helpers for type-safe metadata access. create_tensor creates ggml tensors that match the GGUF tensor definitions, using architecture-aware name lookup. load_all_data reads the tensor data from the file(s), either through mmap or through explicit reads, and reports progress through a callback.
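The read-based path with a progress callback can be illustrated with a minimal sketch. This is not the loader's code: the progress_fn type, the read_tensor_data function, and the chunk size are hypothetical names chosen for illustration, and the sketch assumes a callback that receives a fraction in [0, 1] and cancels the load by returning false.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical callback type: receives a fraction in [0, 1];
// returning false cancels the load.
using progress_fn = bool (*)(float progress, void * user_data);

// Read `size` bytes starting at `offset` from `path` into `dst`, chunk by chunk,
// reporting progress after each chunk. Returns false on I/O error or cancel.
static bool read_tensor_data(const char * path, long offset, std::size_t size,
                             std::vector<unsigned char> & dst,
                             progress_fn cb, void * user_data,
                             std::size_t chunk = 4096) {
    assert(chunk > 0);
    std::FILE * f = std::fopen(path, "rb");
    if (!f) {
        return false;
    }
    bool ok = std::fseek(f, offset, SEEK_SET) == 0;
    dst.resize(size);
    std::size_t done = 0;
    while (ok && done < size) {
        const std::size_t want = std::min(chunk, size - done);
        const std::size_t got  = std::fread(dst.data() + done, 1, want, f);
        if (got != want) {
            ok = false;  // short read: the tensor data is not fully inside the file
            break;
        }
        done += got;
        if (cb && !cb(float(done) / float(size), user_data)) {
            ok = false;  // the caller asked to cancel
        }
    }
    std::fclose(f);
    return ok;
}
```

A memory-mapped path would skip the copy and point tensors directly into the mapping; the explicit read shown here is the fallback when mapping is disabled.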

Usage

This loader is the gateway for all model loading in llama.cpp. Every model that Ollama serves must pass through it: the loader handles GGUF file format parsing, tensor discovery, and efficient data loading.

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/src/llama-model-loader.cpp
  • Lines: 1-1169

Signature

const char * llama_file_version_name(llama_fver version);
static std::string llama_model_ftype_name(llama_ftype ftype);

namespace GGUFMeta {
    template <typename T, gguf_type gt_, T (*gfun)(const gguf_context *, const int64_t)>
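    struct GKV_Base_Type { /* ... */ };

    // Specialized per supported metadata type; each specialization derives from GKV_Base_Type.
    template<typename T> struct GKV_Base;

    template<typename T>
    class GKV : public GKV_Base<T> { /* ... */ };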
}

// llama_model_loader methods:
struct ggml_tensor * create_tensor(struct ggml_context * ctx, const std::string & name,
    const std::initializer_list<int64_t> & ne, int flags = 0);
bool load_all_data(struct ggml_context * ctx, llama_buf_map & bufs,
    llama_mlocks * lmlocks, llama_progress_callback progress_callback, void * user_data);
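
The GKV_Base_Type / GKV_Base pattern in the signature binds each C++ type to a metadata type tag and a getter, so a lookup with the wrong type is rejected instead of being silently reinterpreted. The following is a simplified sketch of that idea, not the llama.cpp implementation: kv_value, mock_meta, kv_base_type, kv_base, and the free function get_key are hypothetical stand-ins, and a std::map replaces the real gguf_context.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for parsed GGUF metadata; the real loader queries a gguf_context.
enum class kv_type { u32, f32, str };

struct kv_value {
    kv_type     type;
    uint32_t    u32;
    float       f32;
    std::string str;
};

using mock_meta = std::map<std::string, kv_value>;

// Binds a C++ type to its metadata type tag and its getter function.
template <typename T, kv_type tag, T (*getter)(const kv_value &)>
struct kv_base_type {
    static kv_type type() { return tag; }
    static T get(const kv_value & v) { return getter(v); }
};

static uint32_t    get_u32(const kv_value & v) { return v.u32; }
static float       get_f32(const kv_value & v) { return v.f32; }
static std::string get_str(const kv_value & v) { return v.str; }

// The primary template is only declared, so unsupported types fail to compile.
template <typename T> struct kv_base;
template <> struct kv_base<uint32_t>    : kv_base_type<uint32_t,    kv_type::u32, get_u32> {};
template <> struct kv_base<float>       : kv_base_type<float,       kv_type::f32, get_f32> {};
template <> struct kv_base<std::string> : kv_base_type<std::string, kv_type::str, get_str> {};

// Copy `key` into `out` only if the key exists and its stored type matches T.
template <typename T>
bool get_key(const mock_meta & meta, const std::string & key, T & out) {
    assert(!key.empty());
    const auto it = meta.find(key);
    if (it == meta.end() || it->second.type != kv_base<T>::type()) {
        return false;
    }
    out = kv_base<T>::get(it->second);
    return true;
}
```

With this shape, requesting a uint32_t for a key stored as a string returns false at run time, and requesting an unsupported type such as double is a compile-time error.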

Import

#include "llama-model-loader.h"

I/O Contract

Inputs

Name           Type                 Required  Description
fname          const std::string &  Yes       Path to the GGUF model file
use_mmap       bool                 Yes       Whether to use memory mapping
check_tensors  bool                 Yes       Validate tensor data integrity
no_alloc       bool                 Yes       Skip tensor data allocation

Outputs

Name         Type              Description
weights_map  std::map          Map of tensor name to weight metadata
meta         gguf_context_ptr  Parsed GGUF metadata context
ftype        llama_ftype       Model file type (quantization format)
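
To make the weights_map output more concrete, here is a hedged sketch of a name-to-weight index that spans several files. The tensor_weight fields, add_weight, and require_weight are hypothetical and chosen for illustration; the sketch assumes that each entry records which file holds the tensor, where its data starts, and how many bytes it occupies, and that duplicate names and out-of-bounds data are rejected while the index is built.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical weight metadata: which file holds the tensor and where its bytes live.
struct tensor_weight {
    std::uint16_t file_idx;  // index into the list of opened (split) files
    std::size_t   offset;    // byte offset of the tensor data within that file
    std::size_t   n_bytes;   // size of the tensor data in bytes
};

using weights_map_t = std::map<std::string, tensor_weight>;

// Register a tensor, rejecting duplicates and data that falls outside its file.
void add_weight(weights_map_t & weights, const std::vector<std::size_t> & file_sizes,
                const std::string & name, const tensor_weight & w) {
    assert(!name.empty());
    if (w.file_idx >= file_sizes.size()) {
        throw std::out_of_range("tensor '" + name + "' refers to an unknown file");
    }
    const std::size_t file_size = file_sizes[w.file_idx];
    // Written to avoid overflow in offset + n_bytes.
    if (w.offset > file_size || w.n_bytes > file_size - w.offset) {
        throw std::runtime_error("tensor '" + name + "' data is not within file bounds");
    }
    if (!weights.emplace(name, w).second) {
        throw std::runtime_error("tensor '" + name + "' is duplicated");
    }
}

// Look up a tensor that the model architecture requires.
const tensor_weight & require_weight(const weights_map_t & weights, const std::string & name) {
    const auto it = weights.find(name);
    if (it == weights.end()) {
        throw std::runtime_error("tensor '" + name + "' not found");
    }
    return it->second;
}
```

Validating bounds while building the index means a truncated or corrupted file is reported before any tensor data is read.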

Usage Examples

// Created during llama_model_load_from_file:
llama_model_loader ml(fname, splits, use_mmap, check_tensors, no_alloc,
    param_overrides, tensor_buft_overrides);

// Read hyperparameters from the GGUF metadata:
uint32_t n_embd = 0;
ml.get_key(LLM_KV_EMBEDDING_LENGTH, n_embd);

// Create tensors and load data.
// n_vocab, ctx, bufs, lmlocks, progress_cb and user_data are set up by the caller (not shown).
ggml_tensor * tok_embd = ml.create_tensor(ctx, "token_embd.weight", {n_embd, n_vocab});
if (!ml.load_all_data(ctx, bufs, lmlocks, progress_cb, user_data)) {
    // load_all_data returns bool; false means the load did not complete
}
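
The splits argument in the example above relates to models stored as several GGUF files. As a hedged illustration, the sketch below generates split file names under the assumption that they follow the "<prefix>-00001-of-00003.gguf" pattern; that pattern is not stated on this page, and split_path and all_split_paths are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Build the path of one split, assuming a "<prefix>-NNNNN-of-MMMMM.gguf" naming scheme.
// split_no is zero-based; the file name itself is one-based.
static std::string split_path(const std::string & prefix, int split_no, int split_count) {
    assert(split_no >= 0 && split_no < split_count);
    char suffix[64];
    std::snprintf(suffix, sizeof(suffix), "-%05d-of-%05d.gguf", split_no + 1, split_count);
    return prefix + suffix;
}

// List every split path for a model stored in split_count files.
static std::vector<std::string> all_split_paths(const std::string & prefix, int split_count) {
    std::vector<std::string> paths;
    for (int i = 0; i < split_count; ++i) {
        paths.push_back(split_path(prefix, i, split_count));
    }
    return paths;
}
```

When the files do not follow a predictable naming scheme, the paths would have to be supplied explicitly instead of being derived from the first file.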
