Implementation:Ollama Ollama Llama Model Loader
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, Model Loading |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the GGUF model file loader, which reads model metadata, creates tensors, and loads tensor data from GGUF files, including models split across several files.
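To make the split-file case concrete, here is a minimal sketch of how the paths of a multi-part model could be enumerated. It assumes the `<prefix>-00001-of-00003.gguf` naming convention used for split GGUF files; the helper names `split_path` and `all_split_paths` are hypothetical and are not the loader's own functions.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Build the path of split `split_no` (0-based) out of `split_count`,
// assuming the "<prefix>-00001-of-00003.gguf" naming convention.
inline std::string split_path(const std::string & prefix, int split_no, int split_count) {
    char suffix[48];
    std::snprintf(suffix, sizeof(suffix), "-%05d-of-%05d.gguf", split_no + 1, split_count);
    return prefix + suffix;
}

// Enumerate every split path so each file can be opened in turn.
inline std::vector<std::string> all_split_paths(const std::string & prefix, int split_count) {
    std::vector<std::string> paths;
    for (int i = 0; i < split_count; ++i) {
        paths.push_back(split_path(prefix, i, split_count));
    }
    return paths;
}
```

A loader following this scheme would open the first file, read the split count from its metadata, and then open the remaining paths.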
Description
The llama_model_loader constructor opens the GGUF file(s) with gguf_init_from_file, parses the metadata key-value pairs, records every tensor together with its source file and data offset, and sets up memory mapping. The file also implements the GGUFMeta template helpers, which provide type-safe metadata access. create_tensor creates ggml tensors that match the GGUF tensor definitions, using architecture-aware name lookup. load_all_data reads tensor data from the files, either through mmap or through explicit reads, and reports progress through a callback.
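The idea behind type-safe metadata access can be sketched with a small traits template that ties each C++ type to the type tag expected in the file. This is an illustration only: `MockKV`, `KVTraits`, and this free-standing `get_key` are hypothetical stand-ins for a real gguf_context and the GGUFMeta helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical stand-in for a parsed GGUF key-value section.
// The enum order matches the order of alternatives in KVValue.
enum class KVType { U32, F32, BOOL, STR };
using KVValue = std::variant<uint32_t, float, bool, std::string>;

struct MockKV {
    std::map<std::string, KVValue> kv;
};

// One specialization per supported metadata type maps T to its type tag.
template <typename T> struct KVTraits;
template <> struct KVTraits<uint32_t>    { static constexpr KVType tag = KVType::U32;  };
template <> struct KVTraits<float>       { static constexpr KVType tag = KVType::F32;  };
template <> struct KVTraits<bool>        { static constexpr KVType tag = KVType::BOOL; };
template <> struct KVTraits<std::string> { static constexpr KVType tag = KVType::STR;  };

// Type tag of the value actually stored under a key.
inline KVType stored_type(const KVValue & v) {
    return static_cast<KVType>(v.index());
}

// Returns false if an optional key is missing; throws if a required key is
// missing or if the stored type does not match the requested type.
template <typename T>
bool get_key(const MockKV & meta, const std::string & key, T & result, bool required = true) {
    auto it = meta.kv.find(key);
    if (it == meta.kv.end()) {
        if (required) {
            throw std::runtime_error("key not found in model: " + key);
        }
        return false;
    }
    if (stored_type(it->second) != KVTraits<T>::tag) {
        throw std::runtime_error("key " + key + " has wrong type");
    }
    result = std::get<T>(it->second);
    return true;
}
```

Because the expected tag is resolved at compile time from `T`, a caller that asks for a `uint32_t` under a key that holds a string gets an error instead of a silently reinterpreted value.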
Usage
This loader is the gateway for all model loading in llama.cpp. Every model that Ollama serves must pass through it; it handles GGUF file format parsing, tensor discovery, and efficient data loading.
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/llama-model-loader.cpp
- Lines: 1-1169
Signature
const char * llama_file_version_name(llama_fver version);
static std::string llama_model_ftype_name(llama_ftype ftype);
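namespace GGUFMeta {
    template <typename T, gguf_type gt_, T (*gfun)(const gguf_context *, const int64_t)>
    struct GKV_Base_Type { /* ... */ };
    // GKV_Base<T> is specialized per metadata type; each specialization derives from GKV_Base_Type
    template<typename T> struct GKV_Base;
    template<typename T>
    class GKV : public GKV_Base<T> { /* ... */ };
}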
// llama_model_loader methods:
struct ggml_tensor * create_tensor(struct ggml_context * ctx, const std::string & name,
                                   const std::initializer_list<int64_t> & ne, int flags = 0);
bool load_all_data(struct ggml_context * ctx, llama_buf_map & bufs,
                   llama_mlocks * lmlocks, llama_progress_callback progress_callback, void * user_data);
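The read-based path of load_all_data can be pictured as a loop over tensor byte ranges that reports cumulative progress and stops when the callback returns false. The sketch below works on any `std::istream` so it can be tried without a model file; `TensorSpan` and `load_all` are hypothetical names, and only the callback shape (a `float` progress value plus a user pointer, returning `bool`) follows llama_progress_callback.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Same shape as llama_progress_callback: return false to cancel loading.
using progress_fn = bool (*)(float progress, void * user_data);

// Hypothetical description of one tensor's byte range in the stream.
struct TensorSpan {
    std::string name;
    size_t      offs;  // byte offset of the data within the stream
    size_t      size;  // number of bytes to read
};

// Read every span into its own buffer, reporting cumulative progress.
// Returns false if the callback cancels or a read comes up short.
inline bool load_all(std::istream & in,
                     const std::vector<TensorSpan> & spans,
                     std::vector<std::vector<uint8_t>> & out,
                     progress_fn cb, void * user_data) {
    size_t total = 0;
    for (const auto & s : spans) {
        total += s.size;
    }

    size_t done = 0;
    out.clear();
    for (const auto & s : spans) {
        // Report progress before each tensor so the caller can cancel early.
        if (cb && !cb(total ? float(done) / float(total) : 1.0f, user_data)) {
            return false;
        }
        std::vector<uint8_t> buf(s.size);
        in.clear();  // drop any EOF/fail state left by a previous read
        in.seekg(static_cast<std::streamoff>(s.offs), std::ios::beg);
        in.read(reinterpret_cast<char *>(buf.data()), static_cast<std::streamsize>(s.size));
        if (static_cast<size_t>(in.gcount()) != s.size) {
            return false;  // truncated stream or offset out of range
        }
        out.push_back(std::move(buf));
        done += s.size;
    }
    // Final report at 100%.
    return !cb || cb(1.0f, user_data);
}
```

An mmap-based path would skip the copy and point each tensor at the mapped region instead; the progress accounting stays the same.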
Import
#include "llama-model-loader.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| fname | const std::string & | Yes | Path to the GGUF model file |
| use_mmap | bool | Yes | Whether to use memory mapping |
| check_tensors | bool | Yes | Validate tensor data integrity |
| no_alloc | bool | Yes | Skip tensor data allocation |
Outputs
| Name | Type | Description |
|---|---|---|
| weights_map | std::map | Map of tensor name to weight metadata |
| meta | gguf_context_ptr | Parsed GGUF metadata context |
| ftype | llama_ftype | Model file type (quantization format) |
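As an illustration of what the weight metadata map holds, the following sketch stores one record per tensor name and rejects records whose byte range does not fit inside the file they point to. `WeightInfo`, `WeightsMap`, and `add_weight` are hypothetical names chosen for this example; the real map stores loader-specific structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-tensor record: which split file, where, and how large.
struct WeightInfo {
    uint16_t file_idx;
    size_t   offs;
    size_t   nbytes;
};

using WeightsMap = std::map<std::string, WeightInfo>;

// Insert a tensor record after checking that its byte range lies inside the
// file it claims to live in and that the name is not already registered.
inline void add_weight(WeightsMap & weights,
                       const std::vector<size_t> & file_sizes,
                       const std::string & name,
                       const WeightInfo & w) {
    if (w.file_idx >= file_sizes.size()) {
        throw std::runtime_error("tensor '" + name + "' refers to an unknown file");
    }
    const size_t end = w.offs + w.nbytes;
    // `end < w.offs` catches size_t overflow in the addition.
    if (end < w.offs || end > file_sizes[w.file_idx]) {
        throw std::runtime_error("tensor '" + name + "' data is not within the file bounds");
    }
    if (!weights.emplace(name, w).second) {
        throw std::runtime_error("duplicate tensor name: " + name);
    }
}
```

Validating offsets while the map is built means a truncated or corrupted file is reported before any tensor data is read.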
Usage Examples
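// Created during llama_model_load_from_file:
llama_model_loader ml(fname, splits, use_mmap, check_tensors, no_alloc,
                      param_overrides, tensor_buft_overrides);
// Read hyperparameters (by default, a missing key is treated as an error):
uint32_t n_embd = 0;
ml.get_key(LLM_KV_EMBEDDING_LENGTH, n_embd);
// Create tensors (n_vocab is assumed to have been read from the model's vocabulary):
ggml_tensor * tok_embd = ml.create_tensor(ctx, "token_embd.weight", {n_embd, n_vocab});
// Load data; a false return means loading did not complete (for example, the callback cancelled it):
if (!ml.load_all_data(ctx, bufs, lmlocks, progress_cb, user_data)) {
    return false;
}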
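The name lookup and shape check that a create_tensor-style call performs can be sketched as follows. Everything here is hypothetical and simplified: `ShapeMap` stands in for the tensor definitions found in the file, `SKETCH_TENSOR_NOT_REQUIRED` models an "optional tensor" flag, and the `blk.<layer>.<base>.<suffix>` pattern is assumed as the per-layer naming scheme.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical flag marking a tensor as optional.
constexpr int SKETCH_TENSOR_NOT_REQUIRED = 1 << 0;

using Shape    = std::vector<int64_t>;
using ShapeMap = std::map<std::string, Shape>;  // tensor name -> dims stored in the file

// Format a per-layer tensor name such as "blk.3.attn_q.weight".
inline std::string layer_tensor_name(int layer, const std::string & base, const std::string & suffix) {
    return "blk." + std::to_string(layer) + "." + base + "." + suffix;
}

// Return a pointer to the stored shape if it matches `expected`.
// Missing tensors yield nullptr when flagged optional, otherwise throw.
inline const Shape * check_tensor(const ShapeMap & file_tensors,
                                  const std::string & name,
                                  const Shape & expected,
                                  int flags = 0) {
    auto it = file_tensors.find(name);
    if (it == file_tensors.end()) {
        if (flags & SKETCH_TENSOR_NOT_REQUIRED) {
            return nullptr;
        }
        throw std::runtime_error("missing tensor '" + name + "'");
    }
    if (it->second != expected) {
        throw std::runtime_error("tensor '" + name + "' has wrong shape");
    }
    return &it->second;
}
```

Checking the expected dimensions against the file at creation time catches a mismatch between the model's hyperparameters and its tensors before any data is loaded.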