Implementation:Ollama Ollama Llama Arch Header
| Knowledge Sources | |
|---|---|
| Domains | Model Architecture, GGUF |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Header declaring all model architecture enums, GGUF key-value identifiers, tensor name enums, and helper classes for architecture-aware name construction.
Description
Defines three enums. The llm_arch enum has one entry per supported architecture (LLaMA, Falcon, GPT-2, BERT, Qwen, Gemma, DeepSeek, Mamba, T5, and many more). The llm_kv enum lists the GGUF metadata keys (context length, embedding size, attention heads, RoPE parameters, tokenizer configuration, etc.). The llm_tensor enum covers all tensor types (token embeddings, attention weights, FFN layers, SSM components, etc.). The header also provides the LLM_KV helper, which formats architecture-prefixed key strings, and LLM_TN / LLM_TN_IMPL, which construct tensor names from layer (block) and expert indices.
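The following is a minimal, self-contained sketch of an LLM_KV-style helper that illustrates how architecture-prefixed keys can be produced. As a simplifying assumption, each key is stored as a printf-style template in which "%s" is replaced by the architecture name. The demo_* enums, the tables, and the DemoKV type are hypothetical stand-ins, not the declarations from llama-arch.h.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical, heavily trimmed enums (not the real llm_arch / llm_kv).
enum demo_arch { DEMO_ARCH_LLAMA, DEMO_ARCH_QWEN2 };
enum demo_kv   { DEMO_KV_GENERAL_ARCHITECTURE, DEMO_KV_CONTEXT_LENGTH, DEMO_KV_EMBEDDING_LENGTH };

// Architecture name substituted into "%s" placeholders.
static const std::map<demo_arch, const char *> DEMO_ARCH_NAMES = {
    { DEMO_ARCH_LLAMA, "llama" },
    { DEMO_ARCH_QWEN2, "qwen2" },
};

// Key templates: architecture-specific keys carry a "%s" prefix, global keys do not.
static const std::map<demo_kv, const char *> DEMO_KV_NAMES = {
    { DEMO_KV_GENERAL_ARCHITECTURE, "general.architecture" },
    { DEMO_KV_CONTEXT_LENGTH,       "%s.context_length"    },
    { DEMO_KV_EMBEDDING_LENGTH,     "%s.embedding_length"  },
};

// LLM_KV-style helper: bind an architecture once, then format keys on demand.
struct DemoKV {
    explicit DemoKV(demo_arch arch) : arch(arch) {}

    demo_arch arch;

    std::string operator()(demo_kv kv) const {
        char buf[256];
        // snprintf ignores surplus arguments, so global keys are copied unchanged.
        const int n = std::snprintf(buf, sizeof(buf), DEMO_KV_NAMES.at(kv), DEMO_ARCH_NAMES.at(arch));
        assert(n >= 0 && n < (int) sizeof(buf));
        return std::string(buf);
    }
};
```

With this layout, DemoKV(DEMO_ARCH_LLAMA)(DEMO_KV_CONTEXT_LENGTH) yields "llama.context_length". The general.architecture key comes out unchanged for every architecture, so a single lookup path serves both kinds of key.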
Usage
This is the central architecture definition header that every part of llama.cpp's model loading, saving, and graph building depends on. It is the single source of truth for what architectures exist and how their components are named.
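Because the enum is the single source of truth, the architecture string stored in a GGUF file has to be mapped back onto it. The sketch below assumes a simple table of (enum, name) pairs and uses that one table for lookups in both directions. Unrecognized names fall back to an UNKNOWN entry, mirroring the role of LLM_ARCH_UNKNOWN in the real enum. All demo_* names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of architectures; UNKNOWN is the fallback entry.
enum demo_arch { DEMO_ARCH_LLAMA, DEMO_ARCH_FALCON, DEMO_ARCH_GEMMA, DEMO_ARCH_UNKNOWN };

struct demo_arch_entry {
    demo_arch    arch;
    const char * name; // string stored under general.architecture
};

// One table drives both enum-to-string and string-to-enum lookups.
static const demo_arch_entry DEMO_ARCH_TABLE[] = {
    { DEMO_ARCH_LLAMA,  "llama"  },
    { DEMO_ARCH_FALCON, "falcon" },
    { DEMO_ARCH_GEMMA,  "gemma"  },
};

const char * demo_arch_name(demo_arch arch) {
    for (const auto & e : DEMO_ARCH_TABLE) {
        if (e.arch == arch) {
            return e.name;
        }
    }
    return "(unknown)";
}

// Unrecognized architecture strings map to DEMO_ARCH_UNKNOWN instead of failing.
demo_arch demo_arch_from_string(const std::string & name) {
    for (const auto & e : DEMO_ARCH_TABLE) {
        if (name == e.name) {
            return e.arch;
        }
    }
    return DEMO_ARCH_UNKNOWN;
}
```

Keeping both directions in one table means that adding an architecture is a single-line change and the two lookups cannot drift apart.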
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/llama-arch.h
- Lines: 1-581
Signature
```cpp
enum llm_arch {
    LLM_ARCH_LLAMA, LLM_ARCH_LLAMA4, LLM_ARCH_FALCON,
    LLM_ARCH_GPT2, LLM_ARCH_BERT, LLM_ARCH_QWEN2,
    LLM_ARCH_GEMMA, LLM_ARCH_DEEPSEEK2, LLM_ARCH_MAMBA,
    LLM_ARCH_T5, LLM_ARCH_RWKV7,
    // ... 100+ architectures
    LLM_ARCH_UNKNOWN,
};

enum llm_kv {
    LLM_KV_GENERAL_ARCHITECTURE, LLM_KV_CONTEXT_LENGTH,
    LLM_KV_EMBEDDING_LENGTH, LLM_KV_BLOCK_COUNT,
    LLM_KV_ATTENTION_HEAD_COUNT, LLM_KV_ROPE_FREQ_BASE,
    // ... 200+ keys
};

enum llm_tensor {
    LLM_TENSOR_TOKEN_EMBD, LLM_TENSOR_OUTPUT,
    LLM_TENSOR_ATTN_Q, LLM_TENSOR_ATTN_K, LLM_TENSOR_ATTN_V,
    LLM_TENSOR_FFN_GATE, LLM_TENSOR_FFN_DOWN, LLM_TENSOR_FFN_UP,
    // ... 80+ tensor types
};

struct LLM_KV {
    LLM_KV(llm_arch arch);
    std::string operator()(llm_kv kv) const;
};

// Tensor-name builder returned by LLM_TN; converts implicitly to std::string.
struct LLM_TN_IMPL {
    // ... captured arch, tensor, suffix, bid, xid
    std::string str() const;
    operator std::string() const;
};

struct LLM_TN {
    LLM_TN(llm_arch arch);
    LLM_TN_IMPL operator()(llm_tensor tensor, const char * suffix, int bid = -1, int xid = -1) const;
};
```
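The LLM_TN / LLM_TN_IMPL split can be read as a small, lazily evaluated name object. The call captures the tensor type, suffix, block index, and expert index, and the string is built only when it is converted or compared. The sketch below assumes printf-style templates with "%d" placeholders for bid and xid. DemoTN, DemoTNImpl, and the demo_tensor templates are hypothetical and only mirror the naming pattern (e.g. "blk.5.attn_q.weight").

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical tensor types and printf-style name templates.
enum demo_tensor { DEMO_TENSOR_OUTPUT, DEMO_TENSOR_ATTN_Q, DEMO_TENSOR_FFN_UP_EXP };

static const std::map<demo_tensor, const char *> DEMO_TENSOR_NAMES = {
    { DEMO_TENSOR_OUTPUT,     "output"           },
    { DEMO_TENSOR_ATTN_Q,     "blk.%d.attn_q"    },
    { DEMO_TENSOR_FFN_UP_EXP, "blk.%d.ffn_up.%d" },
};

// Captures the call arguments; the name string is built on demand.
struct DemoTNImpl {
    demo_tensor  tensor;
    const char * suffix; // nullptr means no suffix
    int          bid;    // block (layer) index, -1 if unused
    int          xid;    // expert index, -1 if unused

    std::string str() const {
        char buf[256];
        // Templates without "%d" ignore the surplus bid/xid arguments.
        const int n = std::snprintf(buf, sizeof(buf), DEMO_TENSOR_NAMES.at(tensor), bid, xid);
        assert(n >= 0 && n < (int) sizeof(buf));
        std::string name(buf);
        if (suffix != nullptr) {
            name += ".";
            name += suffix;
        }
        return name;
    }

    // Implicit conversion lets callers write: std::string s = tn(...);
    operator std::string() const { return str(); }

    friend bool operator==(const std::string & s, const DemoTNImpl & tn) { return s == tn.str(); }
};

// LLM_TN-style front end, with and without a suffix.
struct DemoTN {
    DemoTNImpl operator()(demo_tensor t, const char * suffix, int bid = -1, int xid = -1) const {
        return { t, suffix, bid, xid };
    }
    DemoTNImpl operator()(demo_tensor t, int bid = -1, int xid = -1) const {
        return { t, nullptr, bid, xid };
    }
};
```

Returning a lightweight object instead of a std::string lets call sites either store the name or compare it directly against a string, without an explicit conversion step.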
Import
```cpp
#include "llama-arch.h"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| arch | llm_arch | Yes | Architecture to construct names for |
| kv | llm_kv | Yes | Key-value identifier to look up |
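| tensor | llm_tensor | Yes | Tensor type to construct a name for |
| suffix | const char * | Yes | Suffix appended after a dot, e.g. "weight" or "bias" |
| bid | int | No | Block/layer index (default: -1 for no layer) |
| xid | int | No | Expert index for per-expert tensors (default: -1 for no expert) |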
Outputs
| Name | Type | Description |
|---|---|---|
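| key_string | std::string | Architecture-prefixed GGUF key, e.g. "llama.context_length" |
| tensor_name | std::string | Tensor name with layer and expert indices filled in, e.g. "blk.5.attn_q.weight" (LLM_TN returns an LLM_TN_IMPL that converts implicitly to std::string) |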
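The arch input affects tensor names as well as keys, because each architecture uses only a subset of the tensor types. The following is a minimal sketch that assumes a nested table mapping each architecture to its tensor types and name templates, plus a sentinel string for combinations the architecture does not define. The table contents, demo_tensor_template, and the "__missing__" sentinel are illustrative assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical architectures and tensor types.
enum demo_arch   { DEMO_ARCH_LLAMA, DEMO_ARCH_MAMBA };
enum demo_tensor { DEMO_TENSOR_TOKEN_EMBD, DEMO_TENSOR_ATTN_Q, DEMO_TENSOR_SSM_IN };

// Each architecture lists only the tensor types it actually uses.
static const std::map<demo_arch, std::map<demo_tensor, std::string>> DEMO_TENSOR_NAMES = {
    { DEMO_ARCH_LLAMA, {
        { DEMO_TENSOR_TOKEN_EMBD, "token_embd"    },
        { DEMO_TENSOR_ATTN_Q,     "blk.%d.attn_q" },
    } },
    { DEMO_ARCH_MAMBA, {
        { DEMO_TENSOR_TOKEN_EMBD, "token_embd"    },
        { DEMO_TENSOR_SSM_IN,     "blk.%d.ssm_in" },
    } },
};

// Returns the name template for (arch, tensor), or a sentinel if the
// architecture does not define that tensor type.
std::string demo_tensor_template(demo_arch arch, demo_tensor tensor) {
    const auto & names = DEMO_TENSOR_NAMES.at(arch);
    const auto it = names.find(tensor);
    return it == names.end() ? std::string("__missing__") : it->second;
}
```

Returning a sentinel rather than throwing lets a loader probe for optional tensors cheaply. A stricter design could throw or return an empty optional instead.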
Usage Examples
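```cpp
#include "llama-arch.h"

#include <string>

void describe_model(llm_arch arch) {
    // Check architecture type
    if (arch == LLM_ARCH_LLAMA) {
        // LLaMA-specific handling
    }

    // Construct key names, e.g. "llama.context_length" for LLaMA
    LLM_KV kv(arch);
    std::string ctx_key = kv(LLM_KV_CONTEXT_LENGTH);

    // Construct tensor names, e.g. "blk.5.attn_q.weight" for LLaMA;
    // the LLM_TN_IMPL result converts implicitly to std::string
    LLM_TN tn(arch);
    std::string q_weight = tn(LLM_TENSOR_ATTN_Q, "weight", /*bid=*/5);
}
```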
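During model loading, names like these are generated layer by layer and looked up among the tensors read from the file. The sketch below hardcodes the LLaMA-style "blk.<bid>.<name>.<suffix>" pattern instead of calling LLM_TN, so it compiles without llama.cpp. The helpers demo_layer_tensor and demo_missing_attn_weights are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: formats a per-layer name such as "blk.5.attn_q.weight".
std::string demo_layer_tensor(const char * base, int bid, const char * suffix) {
    char buf[128];
    const int n = std::snprintf(buf, sizeof(buf), "blk.%d.%s.%s", bid, base, suffix);
    assert(n >= 0 && n < (int) sizeof(buf));
    return std::string(buf);
}

// Returns the expected attention weight names that are absent from `present`.
std::vector<std::string> demo_missing_attn_weights(const std::set<std::string> & present, int n_layer) {
    static const char * const bases[] = { "attn_q", "attn_k", "attn_v" };
    std::vector<std::string> missing;
    for (int il = 0; il < n_layer; ++il) {
        for (const char * base : bases) {
            std::string name = demo_layer_tensor(base, il, "weight");
            if (present.count(name) == 0) {
                missing.push_back(name);
            }
        }
    }
    return missing;
}
```

Generating the expected names from the layer count, rather than scanning the file for whatever happens to be there, makes a truncated or mismatched model file fail with a precise list of missing tensors.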