Implementation:Ggml org Llama cpp Hparams Header

From Leeroopedia
Revision as of 12:39, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Ggml_org_Llama_cpp_Hparams_Header.md)
Knowledge Sources
Domains: Model_Architecture, Configuration
Last Updated: 2026-02-15 00:00 GMT

Overview

Declares the `llama_hparams` struct containing all fixed model hyperparameters read from GGUF model files.

Description

This header defines the `llama_hparams` struct, which collects the model architecture parameters in one place. It covers basic dimensions (`n_embd`, `n_layer`, and the per-layer `n_head` and `n_ff` arrays), MoE expert configuration, normalization epsilon values, and RoPE parameters and scaling. It also holds sliding window attention (SWA) configuration with several types (standard, chunked, symmetric), SSM/Mamba state parameters, RWKV-specific parameters, MLA (Multi-head Latent Attention) dimensions, per-layer arrays that classify each layer as recurrent or attention, and architecture-specific parameters for Granite, Gemma3n, and Qwen3. The header also defines enums for expert gating functions and SWA types. Per-layer and per-expert data live in fixed-size arrays bounded by `LLAMA_MAX_LAYERS` (512) and `LLAMA_MAX_EXPERTS` (512).
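The fixed-size per-layer array pattern described above can be illustrated with a reduced stand-in. This is a minimal sketch, not the real struct: `mini_hparams`, `MINI_MAX_LAYERS`, and `make_uniform` are hypothetical names, and the bounds check in the accessors is an assumption about reasonable behavior rather than a statement about the actual header.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical constant mirroring the LLAMA_MAX_LAYERS limit quoted on this page.
constexpr uint32_t MINI_MAX_LAYERS = 512;

// Hypothetical, heavily reduced stand-in for llama_hparams.
struct mini_hparams {
    uint32_t n_layer = 0;

    // Fixed-size storage: no heap allocation, unused slots stay zero.
    std::array<uint32_t, MINI_MAX_LAYERS> n_head_arr{};
    std::array<uint32_t, MINI_MAX_LAYERS> n_ff_arr{};

    // Per-layer accessors; the assert is an assumption of this sketch.
    uint32_t n_head(uint32_t il) const {
        assert(il < n_layer);
        return n_head_arr[il];
    }
    uint32_t n_ff(uint32_t il) const {
        assert(il < n_layer);
        return n_ff_arr[il];
    }
};

// Fill the first n_layer slots with the same values, as a loader might do
// for a model whose layers all share one shape.
inline mini_hparams make_uniform(uint32_t n_layer, uint32_t n_head, uint32_t n_ff) {
    assert(n_layer <= MINI_MAX_LAYERS);
    mini_hparams hp;
    hp.n_layer = n_layer;
    for (uint32_t il = 0; il < n_layer; ++il) {
        hp.n_head_arr[il] = n_head;
        hp.n_ff_arr[il]   = n_ff;
    }
    return hp;
}
```

Storing one value per layer, even when all layers match, lets the same struct describe models whose head counts or feed-forward widths vary by layer, at the cost of a fixed upper bound on the layer count.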

Usage

Include this header whenever you need access to model hyperparameters. Nearly every component in the inference pipeline depends on these parameters for tensor allocation, graph construction, and memory management.
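As an illustration of why memory management depends on these parameters, the sketch below estimates the size of a key/value cache from a handful of fields. It is a rough sketch under stated assumptions: `kv_hparams_view` and `kv_cache_bytes` are hypothetical names, every layer is assumed to cache full K and V rows for each context position, and the element size is supplied by the caller.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical constant mirroring the LLAMA_MAX_LAYERS limit quoted on this page.
constexpr uint32_t KV_MAX_LAYERS = 512;

// Hypothetical view holding only the fields this estimate reads.
struct kv_hparams_view {
    uint32_t n_layer       = 0;
    uint32_t n_embd_head_k = 0; // width of one key head
    uint32_t n_embd_head_v = 0; // width of one value head
    std::array<uint32_t, KV_MAX_LAYERS> n_head_kv_arr{}; // KV heads per layer
};

// Bytes needed to cache K and V for n_ctx positions, assuming each cached
// value occupies bytes_per_elem bytes (for example 2 for 16-bit floats).
inline std::size_t kv_cache_bytes(const kv_hparams_view & hp,
                                  uint32_t n_ctx,
                                  std::size_t bytes_per_elem) {
    assert(hp.n_layer <= KV_MAX_LAYERS);
    std::size_t total = 0;
    for (uint32_t il = 0; il < hp.n_layer; ++il) {
        const std::size_t n_head_kv = hp.n_head_kv_arr[il];
        // One cached row holds all key heads plus all value heads of this layer.
        const std::size_t row_elems =
            n_head_kv * (static_cast<std::size_t>(hp.n_embd_head_k) + hp.n_embd_head_v);
        total += row_elems * n_ctx * bytes_per_elem;
    }
    return total;
}
```

Because the head count is read per layer, the estimate stays correct for models where only some layers use grouped-query attention.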

Code Reference

Source Location

Signature

#define LLAMA_MAX_LAYERS  512
#define LLAMA_MAX_EXPERTS 512

enum llama_expert_gating_func_type {
    LLAMA_EXPERT_GATING_FUNC_TYPE_NONE,
    LLAMA_EXPERT_GATING_FUNC_TYPE_SOFTMAX,
    LLAMA_EXPERT_GATING_FUNC_TYPE_SIGMOID,
    LLAMA_EXPERT_GATING_FUNC_TYPE_SOFTMAX_WEIGHT,
};

enum llama_swa_type {
    LLAMA_SWA_TYPE_NONE,
    LLAMA_SWA_TYPE_STANDARD,
    LLAMA_SWA_TYPE_CHUNKED,
    LLAMA_SWA_TYPE_SYMMETRIC,
};

struct llama_hparams_posnet { uint32_t n_embd; uint32_t n_layer; };
struct llama_hparams_convnext { uint32_t n_embd; uint32_t n_layer; };

struct llama_hparams {
    bool vocab_only;
    bool rope_finetuned;
    uint32_t n_ctx_train;
    uint32_t n_embd;
    uint32_t n_layer;
    uint32_t n_embd_head_k;
    uint32_t n_embd_head_v;
    uint32_t n_expert;
    uint32_t n_expert_used;
    std::array<uint32_t, LLAMA_MAX_LAYERS> n_head_arr;
    std::array<uint32_t, LLAMA_MAX_LAYERS> n_head_kv_arr;
    std::array<uint32_t, LLAMA_MAX_LAYERS> n_ff_arr;
    // ... (many more fields)
};

Import

#pragma once
#include "llama.h"
#include <array>
#include <cassert>

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| GGUF metadata | key-value pairs | Yes | Model hyperparameters read from the GGUF file during model loading |

Outputs

| Name | Type | Description |
|---|---|---|
| hparams | llama_hparams | Fully populated hyperparameters struct used throughout the inference pipeline |
| n_head(il) | uint32_t | Per-layer attention head count accessor |
| n_head_kv(il) | uint32_t | Per-layer key-value head count accessor |
| n_ff(il) | uint32_t | Per-layer feed-forward dimension accessor |
| is_swa(il) | bool | Whether a given layer uses sliding window attention |
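A per-layer query such as `is_swa(il)` can be backed by a flag array that the loader fills from a repeating pattern. The following is a minimal sketch of that idea only: `swa_layers_sketch`, its `swa_layers` member, and `set_pattern` are hypothetical names, and the "last layer of each group is dense" rule is an assumption chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical constant mirroring the LLAMA_MAX_LAYERS limit quoted on this page.
constexpr uint32_t SWA_MAX_LAYERS = 512;

// Hypothetical stand-in holding only what is_swa() needs.
struct swa_layers_sketch {
    uint32_t n_layer = 0;
    std::array<bool, SWA_MAX_LAYERS> swa_layers{}; // all false: every layer dense

    // Assumed rule: in each group of `period` layers, the last one uses full
    // attention and the others use SWA. period == 0 or 1 leaves all layers dense.
    void set_pattern(uint32_t period) {
        for (uint32_t il = 0; il < n_layer; ++il) {
            swa_layers[il] = period > 0 && (il % period) < (period - 1);
        }
    }

    // Whether layer il uses sliding window attention.
    bool is_swa(uint32_t il) const {
        assert(il < n_layer);
        return swa_layers[il];
    }
};
```

With `n_layer = 8` and `set_pattern(4)`, layers 0 to 2 and 4 to 6 report SWA while layers 3 and 7 stay dense.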

Usage Examples

// Access basic hyperparameters
const auto & hparams = model.hparams;
uint32_t n_embd  = hparams.n_embd;
uint32_t n_layer = hparams.n_layer;

// Per-layer parameters (unsigned index matches the uint32_t layer count)
for (uint32_t il = 0; il < n_layer; il++) {
    uint32_t n_head    = hparams.n_head_arr[il];
    uint32_t n_head_kv = hparams.n_head_kv_arr[il];
    bool swa_layer     = hparams.is_swa(il);
}

// Check SWA type
if (hparams.swa_type == LLAMA_SWA_TYPE_CHUNKED) {
    // handle chunked sliding window attention
}
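To make the difference between the SWA types concrete, here is one plausible reading of how each type could decide whether a key position is hidden from a query position. This is a sketch under assumptions, not the library's masking code: the enum, `is_masked_sketch`, and the exact window boundaries are hypothetical, and ordinary causal masking is assumed to be applied separately.

```cpp
#include <cstdint>

// Hypothetical enum for this sketch; deliberately not the llama_swa_type names.
enum swa_type_sketch { SWA_NONE, SWA_STANDARD, SWA_CHUNKED, SWA_SYMMETRIC };

// Returns true when key position p0 is hidden from query position p1.
// n_swa is the window (or chunk) size; 0 disables masking in this sketch.
inline bool is_masked_sketch(swa_type_sketch type, uint32_t n_swa, int32_t p0, int32_t p1) {
    if (n_swa == 0) {
        return false;
    }
    const int32_t w = static_cast<int32_t>(n_swa);
    switch (type) {
        case SWA_NONE:
            return false;
        case SWA_STANDARD:
            // Assumed: the query sees only the most recent w positions.
            return p1 - p0 >= w;
        case SWA_CHUNKED: {
            // Assumed: the query sees only keys inside its own w-sized chunk.
            const int32_t chunk_start = (p1 / w) * w;
            return p0 < chunk_start;
        }
        case SWA_SYMMETRIC: {
            // Assumed: the window extends w/2 positions to either side.
            const int32_t half = w / 2;
            const int32_t diff = p1 - p0;
            return diff < -half || diff > half;
        }
    }
    return false;
}
```

Under these assumptions, a standard window slides with the query, a chunked window resets at every multiple of `n_swa`, and a symmetric window is centered on the query.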
