Implementation:Ollama Ollama Llama Model Loader Types

From Leeroopedia
Knowledge Sources
Domains: LLM Inference, Model Loading
Last Updated: 2025-02-15 00:00 GMT

Overview

Header declaring the llama_model_loader struct for reading and parsing GGUF model files.

Description

Defines llama_tensor_weight, which records where each tensor's data lives: the index of the source file (idx), the offset of the data within that file (offs), and a pointer to the ggml_tensor. Tensor names are ordered with weight_name_comparer, a layer-aware comparator that sorts alphabetically while keeping layers in order. The header declares llama_model_loader with methods for construction from a file path, metadata key lookup (get_key), architecture detection (get_arch), tensor creation (create_tensor) with flags marking optional, duplicated, or skipped tensors, and data loading (load_all_data) with progress-callback support.
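To make "layer-aware" ordering concrete, here is a minimal, self-contained sketch of such a comparator. It is not the header's weight_name_comparer. The names layer_aware_less and sorted_names are hypothetical, and the sketch assumes tensor names carry their layer in a "blk.<N>." prefix.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a layer-aware comparator: orders tensor names by
// the layer number parsed from an assumed "blk.<N>." prefix, then by name.
struct layer_aware_less {
    static int layer_of(const std::string & name) {
        int layer = -1;  // names without a layer prefix keep -1 and sort first
        std::sscanf(name.c_str(), "blk.%d.", &layer);
        return layer;
    }

    bool operator()(const std::string & a, const std::string & b) const {
        const int la = layer_of(a);
        const int lb = layer_of(b);
        if (la != lb) {
            return la < lb;  // numeric layer order: blk.2 comes before blk.10
        }
        return a < b;        // same layer: plain lexicographic order
    }
};

// Returns the names in the order a std::map keyed with the comparator yields.
std::vector<std::string> sorted_names(const std::vector<std::string> & names) {
    std::map<std::string, int, layer_aware_less> index;
    for (const auto & n : names) {
        index.emplace(n, 0);
    }
    std::vector<std::string> out;
    for (const auto & kv : index) {
        out.push_back(kv.first);
    }
    return out;
}
```

A plain lexicographic sort would place "blk.10." before "blk.2."; comparing the parsed layer number first avoids that.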

Usage

Include this header to access the model loader interface for GGUF file parsing and tensor creation.
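The llama_fver values listed under Signature correspond to the version field in a GGUF file header. As a rough illustration only (this is not the loader's code), the sketch below assumes a little-endian file that begins with the 4-byte magic "GGUF" followed by a 32-bit version. The function read_gguf_version is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the GGUF version (1..3) found in the first
// 8 bytes of `data`, or 0 if the buffer is too short, the magic does not
// match, or the version is outside the range covered by llama_fver.
uint32_t read_gguf_version(const unsigned char * data, size_t size) {
    if (size < 8 || std::memcmp(data, "GGUF", 4) != 0) {
        return 0;
    }
    // Assemble the little-endian uint32 that follows the magic.
    const uint32_t version = (uint32_t(data[4]))
                           | (uint32_t(data[5]) << 8)
                           | (uint32_t(data[6]) << 16)
                           | (uint32_t(data[7]) << 24);
    return (version >= 1 && version <= 3) ? version : 0;
}
```

Reading the bytes one at a time keeps the sketch independent of the host's byte order.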

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/src/llama-model-loader.h
  • Lines: 1-172

Signature

enum llama_fver {
    GGUF_FILE_VERSION_V1 = 1,
    GGUF_FILE_VERSION_V2 = 2,
    GGUF_FILE_VERSION_V3 = 3,
};

struct llama_model_loader {
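    struct llama_tensor_weight {
        uint16_t      idx;     // index of the source file that holds the tensor data
        size_t        offs;    // offset of the tensor data within that file
        ggml_tensor * tensor;  // pointer to the tensor
    };

    // Bit flags accepted by create_tensor; combine with bitwise OR
    static const int TENSOR_NOT_REQUIRED = 1 << 0;  // tensor is optional
    static const int TENSOR_DUPLICATED   = 1 << 1;  // tensor is a duplicate
    static const int TENSOR_SKIP         = 1 << 2;  // tensor is skipped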

    llama_model_loader(const std::string & fname, std::vector<std::string> & splits,
        bool use_mmap, bool check_tensors, bool no_alloc,
        const llama_model_kv_override * param_overrides_p,
        const llama_model_tensor_buft_override * param_tensor_buft_overrides_p);

    template<typename T> bool get_key(enum llm_kv kid, T & result, bool required = true);
    struct ggml_tensor * create_tensor(struct ggml_context * ctx, const std::string & name,
        const std::initializer_list<int64_t> & ne, int flags = 0);
    bool load_all_data(struct ggml_context * ctx, llama_buf_map & bufs,
        llama_mlocks * lmlocks, llama_progress_callback progress_callback, void * user_data);
};
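The three TENSOR_* constants are single-bit values, so several can be passed together in the flags argument of create_tensor. The sketch below shows only the bit arithmetic. The constants are local copies of the values above, while has_flag and missing_is_error are hypothetical helpers and do not describe how the loader itself reacts to each flag.

```cpp
// Local copies of the flag values shown in the signature.
constexpr int TENSOR_NOT_REQUIRED = 1 << 0;
constexpr int TENSOR_DUPLICATED   = 1 << 1;
constexpr int TENSOR_SKIP         = 1 << 2;

// Hypothetical helper: true when `flag` is set in `flags`.
constexpr bool has_flag(int flags, int flag) {
    return (flags & flag) != 0;
}

// Hypothetical policy for illustration: a missing tensor counts as an error
// unless the caller marked it optional.
constexpr bool missing_is_error(int flags) {
    return !has_flag(flags, TENSOR_NOT_REQUIRED);
}
```

For example, TENSOR_NOT_REQUIRED | TENSOR_DUPLICATED marks a tensor as both optional and duplicated, and each bit can be tested independently.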

Import

#include "llama-model-loader.h"

I/O Contract

Inputs

Name | Type | Required | Description
fname | const std::string & | Yes | Path to the primary GGUF file
splits | std::vector<std::string> & | No | Additional split file paths; the vector must be passed but may be empty

Outputs

Name | Type | Description
n_tensors | int | Total number of tensors in the model
ftype | llama_ftype | File type / quantization format
arch_name | std::string | Architecture name string

Usage Examples

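#include <string>
#include <vector>
#include "llama-model-loader.h"
void inspect_model(const std::string & path) {  // illustrative wrapper, not part of the header
    std::vector<std::string> splits;  // left empty: no additional split paths
    // use_mmap = true, check_tensors = true, no_alloc = false, no overrides
    llama_model_loader ml(path, splits, true, true, false, nullptr, nullptr);
    auto arch = ml.get_arch();  // architecture detected from the file metadata
    ml.print_info();            // report summary information about the file
}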
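load_all_data accepts a llama_progress_callback together with a user_data pointer. The sketch below imitates that callback shape without depending on the llama headers: a function receiving a progress value between 0 and 1 plus the user pointer, and returning false to request cancellation. The names progress_callback_t, record_progress, and load_in_steps are hypothetical, and the loop is a stand-in, not the real load_all_data.

```cpp
#include <vector>

// Same shape as the callback type taken by load_all_data: progress in [0, 1]
// plus an opaque user pointer; returning false asks the loader to stop.
using progress_callback_t = bool (*)(float progress, void * user_data);

// Hypothetical callback: records every update and cancels once more than
// half of the work has been reported.
bool record_progress(float progress, void * user_data) {
    auto * seen = static_cast<std::vector<float> *>(user_data);
    seen->push_back(progress);
    return progress <= 0.5f;
}

// Hypothetical stand-in for a loading loop. Returns false if the callback
// cancelled the load, true if every step completed.
bool load_in_steps(int n_steps, progress_callback_t cb, void * user_data) {
    for (int i = 1; i <= n_steps; ++i) {
        const float progress = static_cast<float>(i) / static_cast<float>(n_steps);
        if (cb != nullptr && !cb(progress, user_data)) {
            return false;
        }
    }
    return true;
}
```

Passing state through user_data, as done here with a std::vector<float>, keeps the callback a plain function pointer, which is what a C-style callback type requires.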
