Implementation:Ollama Ollama Llama Model Loader Types
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, Model Loading |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Header declaring the llama_model_loader struct for reading and parsing GGUF model files.
Description
Defines llama_tensor_weight, which records per-tensor metadata: the index of the source file (idx), the data offset within that file (offs), and a pointer to the ggml_tensor. Tensor names are ordered with weight_name_comparer, which sorts in a layer-aware way rather than purely alphabetically. The loader itself is declared with a constructor that takes the model file path, plus methods for metadata key lookup (get_key), architecture detection (get_arch), tensor creation (create_tensor, with the flags TENSOR_NOT_REQUIRED, TENSOR_DUPLICATED, and TENSOR_SKIP for optional, duplicated, and skipped tensors), and data loading (load_all_data, with progress callback support).
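To illustrate what layer-aware ordering means, here is a minimal sketch. It assumes per-layer tensor names carry a `blk.<N>.` prefix; the names `layer_aware_less` and `sorted_names` are hypothetical and are not part of the header.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for weight_name_comparer: order by the layer index
// parsed from a "blk.<N>." prefix, then by plain string comparison.
struct layer_aware_less {
    static int layer_of(const std::string & name) {
        int layer = -1;  // names without a layer prefix keep -1 and sort first
        std::sscanf(name.c_str(), "blk.%d.", &layer);
        return layer;
    }
    bool operator()(const std::string & a, const std::string & b) const {
        const int la = layer_of(a);
        const int lb = layer_of(b);
        if (la != lb) {
            return la < lb;
        }
        return a < b;
    }
};

// Returns the names in the order a std::map keyed with layer_aware_less
// would iterate over them.
std::vector<std::string> sorted_names(const std::vector<std::string> & names) {
    std::map<std::string, int, layer_aware_less> index;
    for (const auto & n : names) {
        index.emplace(n, 0);
    }
    std::vector<std::string> out;
    for (const auto & kv : index) {
        out.push_back(kv.first);
    }
    return out;
}
```

With this ordering, `blk.2.attn_q.weight` comes before `blk.10.attn_q.weight`, whereas a plain alphabetical sort would place `blk.10` first.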
Usage
Include this header to access the model loader interface for GGUF file parsing and tensor creation.
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/llama-model-loader.h
- Lines: 1-172
Signature
enum llama_fver {
    GGUF_FILE_VERSION_V1 = 1,
    GGUF_FILE_VERSION_V2 = 2,
    GGUF_FILE_VERSION_V3 = 3,
};

struct llama_model_loader {
    struct llama_tensor_weight {
        uint16_t idx;
        size_t offs;
        ggml_tensor * tensor;
    };

    static const int TENSOR_NOT_REQUIRED = 1 << 0;
    static const int TENSOR_DUPLICATED   = 1 << 1;
    static const int TENSOR_SKIP         = 1 << 2;

    llama_model_loader(const std::string & fname, std::vector<std::string> & splits,
                       bool use_mmap, bool check_tensors, bool no_alloc,
                       const llama_model_kv_override * param_overrides_p,
                       const llama_model_tensor_buft_override * param_tensor_buft_overrides_p);

    template<typename T> bool get_key(enum llm_kv kid, T & result, bool required = true);

    struct ggml_tensor * create_tensor(struct ggml_context * ctx, const std::string & name,
                                       const std::initializer_list<int64_t> & ne, int flags = 0);

    bool load_all_data(struct ggml_context * ctx, llama_buf_map & bufs,
                       llama_mlocks * lmlocks, llama_progress_callback progress_callback, void * user_data);
};
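The three TENSOR_* constants are single-bit values, so they can be combined with bitwise OR and passed as the flags argument of create_tensor. The sketch below shows that pattern in isolation. The constants are standalone copies of the values in the signature, and `has_flag` and `describe_flags` are hypothetical helpers, not functions from the header.

```cpp
#include <string>

// Standalone copies of the values in the signature, so the sketch compiles
// without the llama.cpp headers.
constexpr int TENSOR_NOT_REQUIRED = 1 << 0;
constexpr int TENSOR_DUPLICATED   = 1 << 1;
constexpr int TENSOR_SKIP         = 1 << 2;

// True when every bit of `flag` is set in `flags`.
constexpr bool has_flag(int flags, int flag) {
    return (flags & flag) == flag;
}

// Hypothetical helper: render a flag mask as text, e.g. for logging.
std::string describe_flags(int flags) {
    std::string out;
    auto append = [&](int flag, const char * label) {
        if (has_flag(flags, flag)) {
            if (!out.empty()) {
                out += "|";
            }
            out += label;
        }
    };
    append(TENSOR_NOT_REQUIRED, "NOT_REQUIRED");
    append(TENSOR_DUPLICATED,   "DUPLICATED");
    append(TENSOR_SKIP,         "SKIP");
    return out.empty() ? "none" : out;
}
```

A caller that wants an optional tensor that may also be shared would pass `TENSOR_NOT_REQUIRED | TENSOR_DUPLICATED`; the default `flags = 0` sets none of the bits.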
Import
#include "llama-model-loader.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| fname | const std::string & | Yes | Path to the primary GGUF file |
| splits | std::vector<std::string> & | No (may be empty) | Paths of additional split files |
Outputs
| Name | Type | Description |
|---|---|---|
| n_tensors | int | Total number of tensors in the model |
| ftype | llama_ftype | File type / quantization format |
| arch_name | std::string | Architecture name string |
Usage Examples
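#include "llama-model-loader.h"
// Hypothetical model path; splits is left empty here.
std::string path = "model.gguf";
std::vector<std::string> splits;
llama_model_loader ml(path, splits, /*use_mmap=*/true, /*check_tensors=*/true, /*no_alloc=*/false, nullptr, nullptr);
auto arch = ml.get_arch();  // architecture read from the GGUF metadata
ml.print_info();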
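The progress callback accepted by load_all_data has the shape of llama_progress_callback in llama.h: it receives a progress value and a user pointer, and returning false cancels the load. The following is a minimal sketch of that contract, not the loader's actual implementation. `load_all_sketch`, `record_progress`, and the byte-weighted progress formula are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Same shape as llama_progress_callback: return false to cancel loading.
using progress_callback_t = bool (*)(float progress, void * user_data);

// Hypothetical loader loop: walks over tensor sizes and reports the fraction
// of bytes processed so far. Returns false if the callback cancels.
bool load_all_sketch(const std::vector<std::size_t> & tensor_sizes,
                     progress_callback_t progress_callback, void * user_data) {
    std::size_t size_total = 0;
    for (std::size_t s : tensor_sizes) {
        size_total += s;
    }

    std::size_t size_done = 0;
    for (std::size_t s : tensor_sizes) {
        if (progress_callback && size_total > 0) {
            if (!progress_callback(float(size_done) / float(size_total), user_data)) {
                return false;  // caller asked to stop
            }
        }
        size_done += s;  // a real loader would read or map the tensor data here
    }
    if (progress_callback) {
        return progress_callback(1.0f, user_data);  // final report
    }
    return true;
}

// Example callback: records every reported value in a std::vector<float>.
bool record_progress(float progress, void * user_data) {
    static_cast<std::vector<float> *>(user_data)->push_back(progress);
    return true;
}
```

Passing a null callback disables reporting in this sketch, and user_data lets the caller carry state (here a vector) without globals.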