Implementation:Ggml org Llama cpp Model Loader Header
| Knowledge Sources | |
|---|---|
| Domains | Model_Loading |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares the `llama_model_loader` struct and its supporting types for reading GGUF model files and managing tensor weight data.
Description
This header defines the core model loading interface that llama.cpp uses to deserialize GGUF files into in-memory model representations. It declares three main pieces: the `llama_tensor_weight` struct, which tracks the source file index and data offset of each tensor; the `weight_name_comparer`, which sorts tensor names in a layer-aware order; and the main `llama_model_loader` struct. The loader provides template methods for reading typed metadata (`get_key`, `get_arr`, `get_key_or_arr`), tensor lifecycle methods (`create_tensor`, `create_tensor_as_view`, `done_getting_tensors`), and data loading methods (`init_mappings`, `load_all_data`). It also supports KV overrides and tensor buffer type overrides via configurable parameters.
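To make "layer-aware sorting" concrete, here is a minimal, self-contained sketch of such a comparer. It assumes tensor names of the form `blk.<N>.<rest>` (as in the usage example on this page). The names `layer_name_comparer` and `sorted_names` are hypothetical and are not taken from the header, so the real `weight_name_comparer` may differ in detail.

```cpp
#include <cassert>
#include <cstdio>
#include <set>
#include <string>
#include <vector>

// Sketch of a layer-aware comparer (hypothetical name): names of the form
// "blk.<N>.<rest>" sort by numeric layer index first, then lexicographically.
// Names without a "blk.<N>." prefix keep layer -1 and sort before all layers.
struct layer_name_comparer {
    static int layer_of(const std::string & name) {
        int layer = -1;
        // sscanf leaves `layer` untouched when the prefix does not match
        std::sscanf(name.c_str(), "blk.%d.", &layer);
        return layer;
    }

    bool operator()(const std::string & a, const std::string & b) const {
        const int la = layer_of(a);
        const int lb = layer_of(b);
        if (la != lb) {
            return la < lb;
        }
        return a < b;
    }
};

// Helper for illustration only: return the names in comparer order.
std::vector<std::string> sorted_names(const std::vector<std::string> & names) {
    std::set<std::string, layer_name_comparer> ordered(names.begin(), names.end());
    std::vector<std::string> out(ordered.begin(), ordered.end());
    assert(out.size() <= names.size()); // a set drops duplicate names
    return out;
}
```

With a plain lexicographic order, `blk.10.attn_q.weight` would sort before `blk.2.attn_q.weight`. With the numeric layer comparison, layer 2 comes first, so tensors end up grouped by layer in ascending order.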
Usage
Use this header when implementing model loading from GGUF files, accessing model metadata, or creating tensor objects from serialized weight data. It is the primary interface through which all model weights and metadata flow into the `llama_model` object.
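The metadata accessors take a `required` flag, and `get_key_or_arr` fills a fixed-size array from either a single value or an array. The sketch below illustrates one plausible reading of those semantics against a hypothetical in-memory store (`kv_value`, `kv_store`, and `get_key_or_arr_sketch` are illustrative names, not part of llama.cpp). It assumes that a missing required key throws, a missing optional key returns `false`, and a scalar is repeated across the first `n` entries.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for GGUF metadata: a value is a scalar or an array.
struct kv_value {
    bool                  is_array;
    std::vector<uint32_t> data; // exactly one element when is_array == false
};
using kv_store = std::map<std::string, kv_value>;

// Sketch of "key or array" semantics (assumed, not the library's code):
// fill the first n entries of `result` from an array of exactly n elements,
// or by repeating a scalar value n times.
template <std::size_t N_MAX>
bool get_key_or_arr_sketch(const kv_store & kv, const std::string & key,
                           std::array<uint32_t, N_MAX> & result, uint32_t n,
                           bool required = true) {
    assert(n <= N_MAX); // the caller must not ask for more than the array holds

    const kv_store::const_iterator it = kv.find(key);
    if (it == kv.end()) {
        if (required) {
            throw std::runtime_error("key not found in model: " + key);
        }
        return false; // optional key: leave `result` untouched
    }

    const kv_value & v = it->second;
    if (v.is_array) {
        if (v.data.size() != n) {
            throw std::runtime_error("unexpected array length for key: " + key);
        }
        for (uint32_t i = 0; i < n; ++i) {
            result[i] = v.data[i];
        }
    } else {
        for (uint32_t i = 0; i < n; ++i) {
            result[i] = v.data.at(0); // repeat the scalar across the first n slots
        }
    }
    return true;
}
```

This shape is convenient for per-layer hyperparameters: a model that uses one value for every layer can store a scalar, while a model whose value varies by layer stores an array, and the caller reads both through the same call.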
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: src/llama-model-loader.h
- Lines: 1-176
Signature
struct llama_model_loader {
    struct llama_tensor_weight { ... };
    struct weight_name_comparer { ... };

    static const int TENSOR_NOT_REQUIRED = 1 << 0;
    static const int TENSOR_DUPLICATED   = 1 << 1;
    static const int TENSOR_SKIP         = 1 << 2;

    llama_model_loader(
        const std::string & fname,
        std::vector<std::string> & splits,
        bool use_mmap, bool use_direct_io,
        bool check_tensors, bool no_alloc,
        const llama_model_kv_override * param_overrides_p,
        const llama_model_tensor_buft_override * param_tensor_buft_overrides_p);

    template<typename T> bool get_key(const std::string & key, T & result, bool required = true);
    template<typename T> bool get_arr(const std::string & key, std::vector<T> & result, bool required = true);
    template<typename T, size_t N_MAX> bool get_key_or_arr(const std::string & key, std::array<T, N_MAX> & result, uint32_t n, bool required = true);

    struct ggml_tensor * create_tensor(struct ggml_context * ctx, const std::string & name, const std::initializer_list<int64_t> & ne, int flags = 0);
    struct ggml_tensor * create_tensor_as_view(struct ggml_context * ctx, struct ggml_tensor * base, const std::string & name, const std::initializer_list<int64_t> & ne, size_t offset, bool required = true);
    void done_getting_tensors() const;

    void init_mappings(bool prefetch = true, llama_mlocks * mlock_mmaps = nullptr);
    bool load_all_data(struct ggml_context * ctx, llama_buf_map & bufs, llama_mlocks * lmlocks, llama_progress_callback progress_callback, void * progress_callback_user_data);
};
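The `flags` argument of `create_tensor` is a bitmask built from the `TENSOR_*` constants listed in the signature. The following self-contained sketch shows how such flags combine and how they can be tested. The helpers `has_flag` and `missing_is_error` are hypothetical, and the meaning attached to `TENSOR_NOT_REQUIRED` here is an assumption inferred from its name.

```cpp
#include <cassert>

// Flag values as listed in the signature on this page.
static const int TENSOR_NOT_REQUIRED = 1 << 0;
static const int TENSOR_DUPLICATED   = 1 << 1;
static const int TENSOR_SKIP         = 1 << 2;

// True when every bit of `flag` is set in `flags` (hypothetical helper).
inline bool has_flag(int flags, int flag) {
    assert(flag != 0); // testing for "no bits" would always succeed
    return (flags & flag) == flag;
}

// Assumed semantics, inferred from the flag name: a missing tensor is an
// error unless the caller passed TENSOR_NOT_REQUIRED.
inline bool missing_is_error(int flags) {
    return !has_flag(flags, TENSOR_NOT_REQUIRED);
}
```

Because each constant occupies its own bit, flags can be combined with bitwise OR, for example `TENSOR_NOT_REQUIRED | TENSOR_DUPLICATED`, and each one can still be tested independently.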
Import
#include "llama-model-loader.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| fname | const std::string & | Yes | Path to the GGUF model file |
| splits | std::vector<std::string> & | Yes | Paths of the split files; the vector may be left empty unless the splits do not follow the standard naming scheme |
| use_mmap | bool | Yes | Whether to use memory-mapped I/O for loading |
| use_direct_io | bool | Yes | Whether to use direct I/O (bypass OS cache) |
| check_tensors | bool | Yes | Whether to validate tensor data integrity |
| no_alloc | bool | Yes | Whether to skip memory allocation for tensor data |
| param_overrides_p | const llama_model_kv_override * | No | Optional KV metadata overrides; pass nullptr for none |
| param_tensor_buft_overrides_p | const llama_model_tensor_buft_override * | No | Optional tensor buffer type overrides; pass nullptr for none |
Outputs
| Name | Type | Description |
|---|---|---|
| llama_model_loader (object) | llama_model_loader | Initialized loader with parsed GGUF metadata, tensor map, and file mappings |
| load_all_data (return) | bool | Returns false if loading was cancelled by the progress callback |
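The cancellation contract of `load_all_data` can be illustrated with a small stand-alone loop. The callback type below mirrors the shape of `llama_progress_callback` (a progress fraction plus a user-data pointer, returning `false` to cancel). Everything else, including `load_chunks_sketch` and `cancel_after`, is a hypothetical stand-in and not the loader's actual implementation.

```cpp
#include <cassert>
#include <cstddef>

// Same shape as a progress callback: return false to cancel loading.
typedef bool (*progress_callback_t)(float progress, void * user_data);

// Hypothetical loader loop: "loads" n_chunks pieces of work and reports
// progress after each one. Returns false if the callback cancels.
bool load_chunks_sketch(std::size_t n_chunks, progress_callback_t cb,
                        void * user_data, std::size_t * n_loaded) {
    assert(n_loaded != nullptr);
    *n_loaded = 0;
    for (std::size_t i = 0; i < n_chunks; ++i) {
        // ... a real loader would read chunk i here ...
        ++*n_loaded;
        if (cb != nullptr) {
            const float progress = (float) *n_loaded / (float) n_chunks;
            if (!cb(progress, user_data)) {
                return false; // cancelled by the caller
            }
        }
    }
    return true;
}

// Example callback: keep going while progress is at or below the limit
// stored in user_data, then request cancellation.
bool cancel_after(float progress, void * user_data) {
    const float limit = *static_cast<const float *>(user_data);
    return progress <= limit;
}
```

Callers should treat a `false` return as "loading stopped early" rather than as a hard failure, and discard any partially loaded state.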
Usage Examples
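// Construct a model loader from a GGUF file.
// `splits` is left empty here; no explicit split paths are supplied.
std::vector<std::string> splits;
llama_model_loader ml("model.gguf", splits,
    /*use_mmap=*/true, /*use_direct_io=*/false,
    /*check_tensors=*/true, /*no_alloc=*/false,
    /*param_overrides_p=*/nullptr, /*param_tensor_buft_overrides_p=*/nullptr);

// Read a metadata key (required defaults to true)
uint32_t n_embd = 0;
ml.get_key(LLM_KV_EMBEDDING_LENGTH, n_embd);

// Create a tensor from the loaded weights.
// `ctx` is assumed to be a ggml_context * set up by the caller.
auto * tensor = ml.create_tensor(ctx, "blk.0.attn_q.weight", {n_embd, n_embd});

// Finalize tensor creation and load data.
// `bufs` is assumed to be a llama_buf_map prepared by the caller.
ml.done_getting_tensors();
ml.init_mappings();
if (!ml.load_all_data(ctx, bufs, /*lmlocks=*/nullptr,
        /*progress_callback=*/nullptr, /*progress_callback_user_data=*/nullptr)) {
    // load_all_data returns false if loading was cancelled by the progress callback
}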