Implementation:Ollama Ollama Llama Arch Header

Knowledge Sources	Ollama
Domains	Model Architecture, GGUF
Last Updated	2025-02-15 00:00 GMT

Overview

Header declaring all model architecture enums, GGUF key-value identifiers, tensor name enums, and helper classes for architecture-aware name construction.

Description

Defines the llm_arch enum, with one entry for every supported architecture (LLaMA, Falcon, GPT-2, BERT, Qwen, Gemma, DeepSeek, Mamba, T5, and many more). Defines the llm_kv enum for GGUF metadata keys (context length, embedding size, attention heads, rope parameters, tokenizer configuration, etc.) and the llm_tensor enum for all tensor types (token embeddings, attention weights, FFN layers, SSM components, etc.). Provides the LLM_KV helper, which formats architecture-prefixed key strings, and LLM_TN / LLM_TN_IMPL, which construct tensor names from a tensor type, an optional suffix, and layer and expert indices.
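The architecture-prefixed key formatting described above can be illustrated with a small self-contained sketch. It does not use the real header: the toy_* enums, tables, and the toy_kv_namer struct are hypothetical stand-ins, and the "%s" template convention is an assumption made for illustration. The resulting strings follow the GGUF convention of keys such as llama.context_length.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins for llm_arch and llm_kv; only a few entries are modelled.
enum toy_arch { TOY_ARCH_LLAMA, TOY_ARCH_FALCON };
enum toy_kv   { TOY_KV_GENERAL_ARCHITECTURE, TOY_KV_CONTEXT_LENGTH, TOY_KV_BLOCK_COUNT };

// Architecture names as they appear in the key prefix.
static const std::map<toy_arch, const char *> TOY_ARCH_NAMES = {
    { TOY_ARCH_LLAMA,  "llama"  },
    { TOY_ARCH_FALCON, "falcon" },
};

// Key templates: "%s" (an assumed convention) marks where the architecture name goes.
static const std::map<toy_kv, const char *> TOY_KV_NAMES = {
    { TOY_KV_GENERAL_ARCHITECTURE, "general.architecture" },
    { TOY_KV_CONTEXT_LENGTH,       "%s.context_length"    },
    { TOY_KV_BLOCK_COUNT,          "%s.block_count"       },
};

// Callable helper bound to one architecture, mirroring the shape of LLM_KV.
struct toy_kv_namer {
    toy_arch arch;

    explicit toy_kv_namer(toy_arch arch) : arch(arch) {}

    std::string operator()(toy_kv kv) const {
        std::string key = TOY_KV_NAMES.at(kv);
        const std::string placeholder = "%s";
        const auto pos = key.find(placeholder);
        if (pos != std::string::npos) {
            // Substitute the architecture name for the placeholder.
            key.replace(pos, placeholder.size(), TOY_ARCH_NAMES.at(arch));
        }
        return key; // keys without a placeholder are returned unchanged
    }
};
```

With these toy tables, toy_kv_namer(TOY_ARCH_LLAMA)(TOY_KV_CONTEXT_LENGTH) yields "llama.context_length", while TOY_KV_GENERAL_ARCHITECTURE yields "general.architecture" for any architecture. Binding the architecture once in the constructor keeps call sites short when many keys are read for the same model.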

Usage

This is the central architecture definition header that llama.cpp's model loading, saving, and graph building all depend on. It is the single source of truth for which architectures exist and how their components are named.
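One way to picture a "single source of truth" is a single name table that serves both directions: enum to string when writing metadata, and string to enum when reading it, with unrecognised names mapping to an unknown entry. The sketch below is a simplified illustration under that assumption. The sketch_* identifiers are hypothetical and are not the header's actual API.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for llm_arch with a trailing "unknown" entry.
enum sketch_arch { SKETCH_ARCH_LLAMA, SKETCH_ARCH_GEMMA, SKETCH_ARCH_UNKNOWN };

// The one table both lookup directions rely on.
static const std::map<sketch_arch, const char *> SKETCH_ARCH_NAMES = {
    { SKETCH_ARCH_LLAMA,   "llama"     },
    { SKETCH_ARCH_GEMMA,   "gemma"     },
    { SKETCH_ARCH_UNKNOWN, "(unknown)" },
};

// Enum -> string, e.g. for writing an architecture name into metadata.
const char * sketch_arch_name(sketch_arch arch) {
    const auto it = SKETCH_ARCH_NAMES.find(arch);
    return it == SKETCH_ARCH_NAMES.end() ? "(unknown)" : it->second;
}

// String -> enum, e.g. for interpreting an architecture name read from a file.
sketch_arch sketch_arch_from_string(const std::string & name) {
    for (const auto & entry : SKETCH_ARCH_NAMES) {
        if (entry.second == name) {
            return entry.first;
        }
    }
    return SKETCH_ARCH_UNKNOWN; // unrecognised names fall back to the unknown entry
}
```

Because both functions read the same table, adding an architecture in one place makes it visible to every consumer, which is the property the paragraph above attributes to the header.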

Code Reference

Source Location

Repository: Ollama
File: llama/llama.cpp/src/llama-arch.h
Lines: 1-581

Signature

enum llm_arch {
    LLM_ARCH_LLAMA, LLM_ARCH_LLAMA4, LLM_ARCH_FALCON,
    LLM_ARCH_GPT2, LLM_ARCH_BERT, LLM_ARCH_QWEN2,
    LLM_ARCH_GEMMA, LLM_ARCH_DEEPSEEK2, LLM_ARCH_MAMBA,
    LLM_ARCH_T5, LLM_ARCH_RWKV7,
    // ... 100+ architectures
    LLM_ARCH_UNKNOWN,
};

enum llm_kv {
    LLM_KV_GENERAL_ARCHITECTURE, LLM_KV_CONTEXT_LENGTH,
    LLM_KV_EMBEDDING_LENGTH, LLM_KV_BLOCK_COUNT,
    LLM_KV_ATTENTION_HEAD_COUNT, LLM_KV_ROPE_FREQ_BASE,
    // ... 200+ keys
};

enum llm_tensor {
    LLM_TENSOR_TOKEN_EMBD, LLM_TENSOR_OUTPUT,
    LLM_TENSOR_ATTN_Q, LLM_TENSOR_ATTN_K, LLM_TENSOR_ATTN_V,
    LLM_TENSOR_FFN_GATE, LLM_TENSOR_FFN_DOWN, LLM_TENSOR_FFN_UP,
    // ... 80+ tensor types
};

struct LLM_KV {
    LLM_KV(llm_arch arch);
    std::string operator()(llm_kv kv) const;
};

struct LLM_TN {
    LLM_TN(llm_arch arch);
    std::string operator()(llm_tensor tensor, const char * suffix, int bid = -1, int xid = -1) const;
};

Import

#include "llama-arch.h"

I/O Contract

Inputs

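Name	Type	Required	Description
arch	llm_arch	Yes	Architecture to construct names for
kv	llm_kv	Yes	Key-value identifier to look up
tensor	llm_tensor	Yes	Tensor type to construct name for
suffix	const char *	Yes	Suffix appended to the tensor name (e.g. "weight")
bid	int	No	Block/layer index (default: -1 for no layer)
xid	int	No	Expert index (default: -1 for no expert)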

Outputs

Name	Type	Description
key_string	std::string	Architecture-prefixed GGUF key string
tensor_name	std::string	Tensor name with layer and expert indices
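To make the tensor_name output concrete, here is a minimal sketch of how a tensor type, a suffix, and a block index could combine into a name such as blk.5.attn_q.weight, the per-layer naming pattern used in GGUF files. The sample_* identifiers and the "%d" template convention are assumptions for illustration only. The expert index (xid) is left out to keep the example short.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for llm_tensor; only three entries are modelled.
enum sample_tensor { SAMPLE_TENSOR_TOKEN_EMBD, SAMPLE_TENSOR_ATTN_Q, SAMPLE_TENSOR_FFN_DOWN };

// Base names: "%d" (an assumed convention) marks where the block index goes.
// Tensors without a placeholder exist once per model rather than once per layer.
static const std::map<sample_tensor, const char *> SAMPLE_TENSOR_NAMES = {
    { SAMPLE_TENSOR_TOKEN_EMBD, "token_embd"      },
    { SAMPLE_TENSOR_ATTN_Q,     "blk.%d.attn_q"   },
    { SAMPLE_TENSOR_FFN_DOWN,   "blk.%d.ffn_down" },
};

// Builds "<base>[.<suffix>]", substituting bid into per-layer base names.
// The caller is expected to pass a valid bid for per-layer tensors.
std::string sample_tensor_name(sample_tensor tensor, const char * suffix, int bid = -1) {
    std::string name = SAMPLE_TENSOR_NAMES.at(tensor);
    const std::string placeholder = "%d";
    const auto pos = name.find(placeholder);
    if (pos != std::string::npos) {
        name.replace(pos, placeholder.size(), std::to_string(bid));
    }
    if (suffix != nullptr) {
        name += ".";
        name += suffix;
    }
    return name;
}
```

For example, sample_tensor_name(SAMPLE_TENSOR_ATTN_Q, "weight", 5) returns "blk.5.attn_q.weight", and sample_tensor_name(SAMPLE_TENSOR_TOKEN_EMBD, "weight") returns "token_embd.weight" because that base name has no block placeholder.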

Usage Examples

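#include <string>

#include "llama-arch.h"

// `arch` is supplied by the caller; the wrapper function name is illustrative.
void describe_names(llm_arch arch) {
    // Check architecture type
    if (arch == LLM_ARCH_LLAMA) {
        // LLaMA-specific handling
    }

    // Construct an architecture-prefixed GGUF key name
    LLM_KV kv(arch);
    std::string ctx_key = kv(LLM_KV_CONTEXT_LENGTH);

    // Construct the name of the attention Q weight tensor for block 5
    LLM_TN tn(arch);
    std::string q_weight = tn(LLM_TENSOR_ATTN_Q, "weight", /*bid=*/5);
}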

Related Pages

Principle:Ollama_Ollama_Model_Architecture_Support
