Implementation:Ggml org Llama cpp Arch Header
| Knowledge Sources | |
|---|---|
| Domains | Model_Architecture |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares enumerations and helper types for model architectures, GGUF metadata keys, and tensor identifiers used throughout the llama.cpp codebase.
Description
Defines the `llm_arch` enum (100+ architecture variants, including LLaMA, Falcon, GPT-2, Qwen, Gemma, Mamba, and RWKV), the `llm_kv` enum (metadata keys such as context length, embedding dimension, and expert count), and the `llm_tensor` enum (tensor names for attention, FFN, SSM, and other components). It also provides the `LLM_KV` helper struct, which builds architecture-prefixed GGUF key strings (e.g., "llama.context_length"); the `LLM_TN` / `LLM_TN_IMPL` helpers, which build layer-indexed tensor name strings (e.g., "blk.3.attn_norm.weight"); and the `llm_tensor_info` struct, which records per-tensor metadata (layer class and ggml op).
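As a concrete illustration of the key-prefixing idea, the following standalone sketch builds GGUF keys from printf-style templates such as `"%s.context_length"`, filling the `%s` with the architecture's short name. All `toy_*` names, the tiny lookup tables, and the template strings are assumptions made for this example; it mirrors the behavior described above but is not llama.cpp's own code.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical stand-ins for a tiny subset of llm_arch / llm_kv.
enum toy_arch { TOY_ARCH_LLAMA, TOY_ARCH_QWEN2 };
enum toy_kv   { TOY_KV_GENERAL_ARCHITECTURE, TOY_KV_CONTEXT_LENGTH, TOY_KV_BLOCK_COUNT };

// Short architecture names, used as the key prefix.
static const std::map<toy_arch, const char *> TOY_ARCH_NAMES = {
    { TOY_ARCH_LLAMA, "llama" },
    { TOY_ARCH_QWEN2, "qwen2" },
};

// Key templates: "%s" is replaced by the architecture name;
// "general.*" keys carry no placeholder and are shared by all architectures.
static const std::map<toy_kv, const char *> TOY_KV_NAMES = {
    { TOY_KV_GENERAL_ARCHITECTURE, "general.architecture" },
    { TOY_KV_CONTEXT_LENGTH,       "%s.context_length"    },
    { TOY_KV_BLOCK_COUNT,          "%s.block_count"       },
};

struct toy_llm_kv {
    explicit toy_llm_kv(toy_arch arch) : arch(arch) {}

    // Builds the full key string for one metadata key.
    std::string operator()(toy_kv kv) const {
        char buf[256];
        // snprintf ignores the extra argument when the template has no "%s".
        const int n = std::snprintf(buf, sizeof(buf), TOY_KV_NAMES.at(kv), TOY_ARCH_NAMES.at(arch));
        assert(n >= 0 && n < static_cast<int>(sizeof(buf))); // key was not truncated
        return std::string(buf);
    }

    toy_arch arch;
};
```

With this sketch, `toy_llm_kv(TOY_ARCH_LLAMA)(TOY_KV_CONTEXT_LENGTH)` yields `"llama.context_length"`, the same key format shown in this page's usage examples.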
Usage
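This is a core internal header, included by much of the library code under `src/` (for example, the model loader and the model definitions). Use these enums and helpers to identify architectures, metadata keys, and tensors in a type-safe way, rather than hard-coding strings, when working with GGUF files and model tensors.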
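The type-safety benefit comes from converting the architecture name stored in a GGUF file into an enum once, with a sentinel value for anything unrecognized. llama-arch.h declares helpers for this (`llm_arch_name()` and `llm_arch_from_string()`); the sketch below only imitates that round trip with hypothetical `toy_*` names and a three-entry table, and is not the library's implementation.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for llm_arch, with a sentinel as the last value.
enum toy_arch { TOY_ARCH_LLAMA, TOY_ARCH_FALCON, TOY_ARCH_MAMBA, TOY_ARCH_UNKNOWN };

// One table drives both directions of the lookup.
static const std::pair<toy_arch, const char *> TOY_ARCH_NAMES[] = {
    { TOY_ARCH_LLAMA,  "llama"  },
    { TOY_ARCH_FALCON, "falcon" },
    { TOY_ARCH_MAMBA,  "mamba"  },
};

// enum -> short name; the sentinel (or any unlisted value) maps to "unknown".
const char * toy_arch_name(toy_arch arch) {
    for (const auto & entry : TOY_ARCH_NAMES) {
        if (entry.first == arch) {
            return entry.second;
        }
    }
    return "unknown";
}

// short name -> enum; unrecognized names map to the sentinel.
toy_arch toy_arch_from_string(const std::string & name) {
    for (const auto & entry : TOY_ARCH_NAMES) {
        if (name == entry.second) {
            return entry.first;
        }
    }
    return TOY_ARCH_UNKNOWN;
}
```

An unrecognized string maps to `TOY_ARCH_UNKNOWN`, which plays the same role as `LLM_ARCH_UNKNOWN` in the real enum; the rest of the code can then branch on an enum value instead of comparing strings.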
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: src/llama-arch.h
- Lines: 1-614
Signature
```cpp
// Architecture enumeration (100+ variants)
enum llm_arch {
    LLM_ARCH_CLIP,
    LLM_ARCH_LLAMA,
    LLM_ARCH_LLAMA4,
    LLM_ARCH_FALCON,
    LLM_ARCH_GPT2,
    LLM_ARCH_QWEN2,
    LLM_ARCH_GEMMA,
    LLM_ARCH_MAMBA,
    // ... 100+ more
    LLM_ARCH_UNKNOWN,
};

// Metadata key enumeration
enum llm_kv {
    LLM_KV_GENERAL_TYPE,
    LLM_KV_GENERAL_ARCHITECTURE,
    LLM_KV_GENERAL_NAME,
    LLM_KV_CONTEXT_LENGTH,
    LLM_KV_EMBEDDING_LENGTH,
    LLM_KV_BLOCK_COUNT,
    // ... many more
};

// Tensor name enumeration
enum llm_tensor {
    LLM_TENSOR_TOKEN_EMBD,
    LLM_TENSOR_OUTPUT_NORM,
    LLM_TENSOR_OUTPUT,
    LLM_TENSOR_ATTN_NORM,
    LLM_TENSOR_ATTN_Q,
    LLM_TENSOR_ATTN_K,
    LLM_TENSOR_ATTN_V,
    LLM_TENSOR_ATTN_OUT,
    LLM_TENSOR_FFN_GATE,
    LLM_TENSOR_FFN_DOWN,
    LLM_TENSOR_FFN_UP,
    // ... many more
};

// Position of a tensor in the network (used by llm_tensor_info)
enum llm_tensor_layer {
    LLM_TENSOR_LAYER_INPUT,
    LLM_TENSOR_LAYER_REPEATING,
    LLM_TENSOR_LAYER_OUTPUT,
};

// Helper for constructing architecture-prefixed GGUF keys
struct LLM_KV {
    LLM_KV(llm_arch arch);
    std::string operator()(llm_kv kv) const;
};

// Helpers for constructing layer-indexed tensor names;
// LLM_TN_IMPL converts implicitly to std::string
struct LLM_TN_IMPL { ... };

struct LLM_TN {
    LLM_TN(llm_arch arch);
    LLM_TN_IMPL operator()(llm_tensor tensor, const char * suffix = nullptr, int bid = -1, int xid = -1) const;
};

// Tensor metadata: network position and the ggml op the tensor is used in
struct llm_tensor_info {
    llm_tensor_layer layer;
    ggml_op op;
};
```
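To show how a per-tensor metadata table such as `llm_tensor_info` can be organized and queried, here is a self-contained sketch. The `toy_op` enum stands in for `ggml_op`, the input/repeating/output layer classes are modeled on `llm_tensor_layer`, and the op assigned to each tensor reflects typical usage (an embedding lookup gathers rows, projections are matrix multiplications) rather than values copied from llama.cpp. All `toy_*` names are hypothetical.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins: toy_op for ggml_op, toy_tensor for llm_tensor.
enum toy_op { TOY_OP_GET_ROWS, TOY_OP_MUL, TOY_OP_MUL_MAT };
enum toy_tensor {
    TOY_TENSOR_TOKEN_EMBD,
    TOY_TENSOR_ATTN_NORM,
    TOY_TENSOR_ATTN_Q,
    TOY_TENSOR_OUTPUT_NORM,
    TOY_TENSOR_OUTPUT,
};

// Where a tensor sits in the network (modeled on llm_tensor_layer).
enum toy_tensor_layer { TOY_LAYER_INPUT, TOY_LAYER_REPEATING, TOY_LAYER_OUTPUT };

struct toy_tensor_info {
    toy_tensor_layer layer; // input, per-block (repeating), or output
    toy_op           op;    // operation the tensor is consumed by
};

// Illustrative entries only; the ops reflect typical usage, not llama.cpp's table.
static const std::map<toy_tensor, toy_tensor_info> TOY_TENSOR_INFOS = {
    { TOY_TENSOR_TOKEN_EMBD,  { TOY_LAYER_INPUT,     TOY_OP_GET_ROWS } }, // embedding row lookup
    { TOY_TENSOR_ATTN_NORM,   { TOY_LAYER_REPEATING, TOY_OP_MUL      } }, // norm scale, elementwise
    { TOY_TENSOR_ATTN_Q,      { TOY_LAYER_REPEATING, TOY_OP_MUL_MAT  } }, // query projection
    { TOY_TENSOR_OUTPUT_NORM, { TOY_LAYER_OUTPUT,    TOY_OP_MUL      } },
    { TOY_TENSOR_OUTPUT,      { TOY_LAYER_OUTPUT,    TOY_OP_MUL_MAT  } }, // logits projection
};

// Throws std::out_of_range for tensors missing from the table.
const toy_tensor_info & toy_tensor_info_for(toy_tensor tensor) {
    return TOY_TENSOR_INFOS.at(tensor);
}

// True for tensors that occur once in every block ("blk.N.*" names).
bool toy_is_per_layer(toy_tensor tensor) {
    return toy_tensor_info_for(tensor).layer == TOY_LAYER_REPEATING;
}
```

In the sketch, `toy_is_per_layer()` answers "is this tensor repeated in every block?" from the table alone, with no string matching on tensor names.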
Import
```cpp
#pragma once
#include "ggml.h"
#include <string>
#include <set>
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| arch | llm_arch | Yes | Architecture enum value for constructing prefixed keys or tensor names |
| kv | llm_kv | Yes | Metadata key enum value (for LLM_KV helper) |
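| tensor | llm_tensor | Yes | Tensor name enum value (for LLM_TN helper) |
| suffix | const char * | No | Suffix appended after a "." (e.g., "weight" or "bias"); default: nullptr, no suffix |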
| bid | int | No | Block/layer index for layer-specific tensor names (default: -1 for non-layered) |
| xid | int | No | Expert index for MoE tensor names (default: -1 for non-expert) |
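The interaction of `suffix`, `bid`, and `xid` can be sketched as printf-style name templates in which the first `%d` receives the block index and the second the expert index, with the suffix appended after a dot. The templates (including the per-expert form `blk.%d.ffn_gate.%d`), the `toy_*` names, and the choice to return a plain `std::string` are assumptions for this example; the real `LLM_TN` returns an `LLM_TN_IMPL` object.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical stand-in for a few llm_tensor values.
enum toy_tensor {
    TOY_TENSOR_TOKEN_EMBD,
    TOY_TENSOR_OUTPUT,
    TOY_TENSOR_ATTN_NORM,
    TOY_TENSOR_FFN_GATE_EXP, // one tensor per expert (assumed naming)
};

// Name templates: the first %d receives the block index (bid),
// the second %d the expert index (xid). Global tensors have no placeholder.
static const std::map<toy_tensor, const char *> TOY_TENSOR_NAMES = {
    { TOY_TENSOR_TOKEN_EMBD,   "token_embd"         },
    { TOY_TENSOR_OUTPUT,       "output"             },
    { TOY_TENSOR_ATTN_NORM,    "blk.%d.attn_norm"   },
    { TOY_TENSOR_FFN_GATE_EXP, "blk.%d.ffn_gate.%d" },
};

std::string toy_tensor_name(toy_tensor tensor, const char * suffix = nullptr,
                            int bid = -1, int xid = -1) {
    char buf[256];
    // snprintf ignores trailing arguments the template does not consume,
    // so the same call works for global, per-block, and per-expert names.
    const int n = std::snprintf(buf, sizeof(buf), TOY_TENSOR_NAMES.at(tensor), bid, xid);
    assert(n >= 0 && n < static_cast<int>(sizeof(buf))); // name was not truncated
    std::string name(buf);
    if (suffix != nullptr) {
        name += '.';
        name += suffix; // e.g. "weight" or "bias"
    }
    return name;
}
```

For instance, `toy_tensor_name(TOY_TENSOR_ATTN_NORM, "weight", 3)` produces `"blk.3.attn_norm.weight"`, the example name used in the Description, while `toy_tensor_name(TOY_TENSOR_OUTPUT)` produces `"output"` because that template has no placeholder.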
Outputs
| Name | Type | Description |
|---|---|---|
| key_string | std::string | Architecture-prefixed GGUF metadata key string (e.g., "llama.context_length") |
| tensor_name | LLM_TN_IMPL (implicitly convertible to std::string) | Layer-indexed tensor name (e.g., "blk.3.attn_norm.weight") |
Usage Examples
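```cpp
// Construct an architecture-prefixed GGUF key
LLM_KV kv(LLM_ARCH_LLAMA);
std::string ctx_key = kv(LLM_KV_CONTEXT_LENGTH);
// Result: "llama.context_length"

// Construct a layer-indexed tensor name; LLM_TN returns an LLM_TN_IMPL,
// which converts implicitly to std::string
LLM_TN tn(LLM_ARCH_LLAMA);
std::string name = tn(LLM_TENSOR_ATTN_Q, "weight", 3);
// Result: "blk.3.attn_q.weight"

// Map an architecture name to the enum and check that it is known
// (arch_name: hypothetical std::string holding the "general.architecture" value)
llm_arch arch = llm_arch_from_string(arch_name);
if (arch != LLM_ARCH_UNKNOWN) {
    // Valid architecture found in GGUF file
}
```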