Implementation:Ggml org Llama cpp Model Loader Header

From Leeroopedia
Knowledge Sources
Domains Model_Loading
Last Updated 2026-02-15 00:00 GMT

Overview

Declares the `llama_model_loader` struct and its supporting types for reading GGUF model files and managing tensor weight data.

Description

This header defines the core model loading interface that llama.cpp uses to deserialize GGUF files into in-memory model representations. It includes the `llama_tensor_weight` struct, which tracks the source file index and data offset of each tensor, and the `weight_name_comparer`, which sorts tensor names in a layer-aware order. The main `llama_model_loader` struct provides template methods for reading typed metadata (`get_key`, `get_arr`, `get_key_or_arr`), tensor lifecycle methods (`create_tensor`, `create_tensor_as_view`, `done_getting_tensors`), and data loading methods (`init_mappings`, `load_all_data`). The loader also supports KV overrides and tensor buffer type overrides, both passed as constructor parameters.
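To illustrate what "layer-aware sorting" can mean in practice, here is a minimal sketch of a comparer that orders tensor names by the layer index parsed from a `blk.<N>.` prefix before falling back to plain string order. The names `layer_aware_comparer`, `layer_of`, and `sorted_names` are hypothetical, and the `blk.<N>.` naming pattern is an assumption taken from the tensor name in the usage example; the header's own `weight_name_comparer` may differ in detail.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical comparer: orders names by the layer index parsed from a
// "blk.<N>." prefix, then lexicographically. Names without that prefix
// get layer -1 and therefore sort before all layer tensors.
struct layer_aware_comparer {
    static int layer_of(const std::string & name) {
        int layer = -1;
        std::sscanf(name.c_str(), "blk.%d.", &layer); // leaves -1 if the prefix does not match
        return layer;
    }

    bool operator()(const std::string & a, const std::string & b) const {
        const int la = layer_of(a);
        const int lb = layer_of(b);
        if (la != lb) {
            return la < lb;
        }
        return a < b;
    }
};

// Hypothetical helper: returns the given names in comparer order
// by inserting them into an ordered map keyed on the tensor name.
inline std::vector<std::string> sorted_names(const std::vector<std::string> & names) {
    std::map<std::string, int, layer_aware_comparer> ordered;
    for (const auto & n : names) {
        ordered.emplace(n, 0);
    }
    std::vector<std::string> out;
    for (const auto & kv : ordered) {
        out.push_back(kv.first);
    }
    return out;
}
```

With plain lexicographic ordering, `blk.10.attn_q.weight` would sort before `blk.2.attn_q.weight`; comparing the parsed layer number first keeps the tensors of each layer together and in numeric order.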

Usage

Use this header when implementing model loading from GGUF files, accessing model metadata, or creating tensor objects from serialized weight data. It is the primary interface through which all model weights and metadata flow into the `llama_model` object.

Code Reference

Source Location

Signature

struct llama_model_loader {
    struct llama_tensor_weight { ... };
    struct weight_name_comparer { ... };

    static const int TENSOR_NOT_REQUIRED = 1 << 0;
    static const int TENSOR_DUPLICATED   = 1 << 1;
    static const int TENSOR_SKIP         = 1 << 2;

    llama_model_loader(
        const std::string & fname,
        std::vector<std::string> & splits,
        bool use_mmap, bool use_direct_io,
        bool check_tensors, bool no_alloc,
        const llama_model_kv_override * param_overrides_p,
        const llama_model_tensor_buft_override * param_tensor_buft_overrides_p);

    template<typename T> bool get_key(const std::string & key, T & result, bool required = true);
    template<typename T> bool get_arr(const std::string & key, std::vector<T> & result, bool required = true);
    template<typename T, size_t N_MAX> bool get_key_or_arr(const std::string & key, std::array<T, N_MAX> & result, uint32_t n, bool required = true);

    struct ggml_tensor * create_tensor(struct ggml_context * ctx, const std::string & name, const std::initializer_list<int64_t> & ne, int flags = 0);
    struct ggml_tensor * create_tensor_as_view(struct ggml_context * ctx, struct ggml_tensor * base, const std::string & name, const std::initializer_list<int64_t> & ne, size_t offset, bool required = true);
    void done_getting_tensors() const;

    void init_mappings(bool prefetch = true, llama_mlocks * mlock_mmaps = nullptr);
    bool load_all_data(struct ggml_context * ctx, llama_buf_map & bufs, llama_mlocks * lmlocks, llama_progress_callback progress_callback, void * progress_callback_user_data);
};
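The three `TENSOR_*` constants in the signature each occupy a distinct bit, so they can be combined with bitwise OR and passed through the `int flags` parameter of `create_tensor`. The sketch below redeclares the constants with the values shown above so that it compiles on its own; `has_flag` is a hypothetical helper, not part of the header.

```cpp
// Values copied from the signature above; redeclared here only so the
// sketch is self-contained.
constexpr int TENSOR_NOT_REQUIRED = 1 << 0;
constexpr int TENSOR_DUPLICATED   = 1 << 1;
constexpr int TENSOR_SKIP         = 1 << 2;

// Hypothetical helper: true when `flag` is set in the bitmask `flags`.
constexpr bool has_flag(int flags, int flag) {
    return (flags & flag) != 0;
}

// Example bitmask combining two of the flags.
constexpr int example_flags = TENSOR_NOT_REQUIRED | TENSOR_DUPLICATED;
```

A call such as `ml.create_tensor(ctx, name, ne, TENSOR_NOT_REQUIRED | TENSOR_DUPLICATED)` would pass both bits at once, assuming the loader tests each bit independently.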

Import

#include "llama-model-loader.h"

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| fname | const std::string & | Yes | Path to the GGUF model file |
| splits | std::vector<std::string> & | Yes | Split file paths; the vector may be empty when the split files follow the standard naming scheme |
| use_mmap | bool | Yes | Whether to use memory-mapped I/O for loading |
| use_direct_io | bool | Yes | Whether to use direct I/O (bypass OS cache) |
| check_tensors | bool | Yes | Whether to validate tensor data integrity |
| no_alloc | bool | Yes | Whether to skip memory allocation for tensor data |
| param_overrides_p | const llama_model_kv_override * | No | Optional KV metadata overrides |
| param_tensor_buft_overrides_p | const llama_model_tensor_buft_override * | No | Optional tensor buffer type overrides |

Outputs

| Name | Type | Description |
|---|---|---|
| llama_model_loader (object) | llama_model_loader | Initialized loader with parsed GGUF metadata, tensor map, and file mappings |
| load_all_data (return) | bool | Returns false if loading was cancelled by the progress callback |
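Since `load_all_data` reports cancellation through its return value, a caller that wants to abort a load needs a progress callback that can signal it. The sketch below assumes the callback type matches the `llama_progress_callback` typedef from `llama.h` (`bool (*)(float progress, void * user_data)`, where returning `false` requests cancellation). The names `progress_callback_t`, `load_state`, and `on_progress` are hypothetical.

```cpp
#include <atomic>

// Assumed to mirror the llama_progress_callback typedef from llama.h;
// redeclared here so the sketch compiles without the llama.cpp headers.
typedef bool (*progress_callback_t)(float progress, void * user_data);

// Hypothetical state shared between the loading code and the caller.
struct load_state {
    std::atomic<bool> cancel_requested{false}; // set by the caller to abort the load
    float             last_progress = 0.0f;    // most recent progress value seen
};

// Records the reported progress and returns true to continue loading,
// or false once cancellation has been requested.
inline bool on_progress(float progress, void * user_data) {
    auto * st = static_cast<load_state *>(user_data);
    st->last_progress = progress;
    return !st->cancel_requested.load();
}
```

Under these assumptions, passing `on_progress` and a pointer to a `load_state` as the last two arguments of `load_all_data` would let another thread set `cancel_requested`, after which the call is expected to return `false`.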

Usage Examples

// Construct a model loader from a GGUF file
std::vector<std::string> splits; // left empty: no explicit split file paths
llama_model_loader ml("model.gguf", splits,
    /*use_mmap=*/true, /*use_direct_io=*/false,
    /*check_tensors=*/true, /*no_alloc=*/false,
    /*param_overrides_p=*/nullptr, /*param_tensor_buft_overrides_p=*/nullptr);

// Read a metadata key
uint32_t n_embd = 0;
ml.get_key(LLM_KV_EMBEDDING_LENGTH, n_embd);

// Create a tensor from the loaded weights
// (ctx is assumed to be an existing ggml_context *)
auto * tensor = ml.create_tensor(ctx, "blk.0.attn_q.weight", {n_embd, n_embd});

// Finalize tensor creation and load data
// (bufs is assumed to be an existing llama_buf_map; no mlocks or progress callback are passed)
ml.done_getting_tensors();
ml.init_mappings();
ml.load_all_data(ctx, bufs, nullptr, nullptr, nullptr);
