Implementation:Ollama Ollama Llama Hparams Types

Knowledge Sources	Ollama
Domains	LLM Inference, Model Configuration
Last Updated	2025-02-15 00:00 GMT

Overview

Header declaring the llama_hparams struct, which holds the model hyperparameters read from GGUF metadata.

Description

Defines llama_hparams with fields for: training context size, embedding dimensions, layer count, rotary dimension, per-layer head counts and feed-forward widths (n_head_arr, n_head_kv_arr, n_ff_arr), MoE expert configuration, MLA compressed dimensions for DeepSeek2, RoPE parameters (type, frequency, YaRN), sliding window attention settings, recurrent state parameters (SSM dimensions, convolution kernel), and other architecture-specific parameters. The compile-time constants LLAMA_MAX_LAYERS (512) and LLAMA_MAX_EXPERTS (512) set upper bounds on the number of layers and experts. The per-layer arrays are sized by LLAMA_MAX_LAYERS, so only their first n_layer entries are meaningful.
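To illustrate how the per-layer arrays and the layer-count bound fit together, here is a minimal sketch using a hypothetical, stripped-down struct. The names mini_hparams, MINI_MAX_LAYERS and make_uniform are illustrative only and are not part of llama.cpp. The sketch assumes that only the first n_layer entries of each array are meaningful, and it derives the grouped-query ratio as query heads per KV head.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical cap mirroring LLAMA_MAX_LAYERS from the header.
constexpr uint32_t MINI_MAX_LAYERS = 512;

// Hypothetical stand-in for llama_hparams: only the layer count and the
// per-layer head / feed-forward arrays are modeled.
struct mini_hparams {
    uint32_t n_layer = 0;
    std::array<uint32_t, MINI_MAX_LAYERS> n_head_arr{};
    std::array<uint32_t, MINI_MAX_LAYERS> n_head_kv_arr{};
    std::array<uint32_t, MINI_MAX_LAYERS> n_ff_arr{};

    // Per-layer lookups; indices at or beyond n_layer are rejected.
    uint32_t n_head(uint32_t il) const    { assert(il < n_layer); return n_head_arr[il]; }
    uint32_t n_head_kv(uint32_t il) const { assert(il < n_layer); return n_head_kv_arr[il]; }
    uint32_t n_ff(uint32_t il) const      { assert(il < n_layer); return n_ff_arr[il]; }

    // Grouped-query attention ratio: how many query heads share one KV head.
    uint32_t n_gqa(uint32_t il) const {
        const uint32_t kv = n_head_kv(il);
        return kv == 0 ? 0 : n_head(il) / kv;
    }
};

// Build a uniform model: every used layer gets the same head and FFN sizes.
mini_hparams make_uniform(uint32_t n_layer, uint32_t n_head, uint32_t n_head_kv, uint32_t n_ff) {
    assert(n_layer <= MINI_MAX_LAYERS);
    mini_hparams hp;
    hp.n_layer = n_layer;
    for (uint32_t il = 0; il < n_layer; ++il) {
        hp.n_head_arr[il]    = n_head;
        hp.n_head_kv_arr[il] = n_head_kv;
        hp.n_ff_arr[il]      = n_ff;
    }
    return hp;
}
```

Storing the sizes per layer lets architectures whose layers differ in head count or FFN width share one struct; a uniform model simply repeats the same value in every used slot.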

Usage

Fundamental data structure referenced throughout inference to determine tensor shapes, attention patterns, and memory requirements. It is populated during model loading.
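Because the hyperparameters fix the per-layer key and value widths, a rough KV-cache size can be estimated from them before any memory is allocated. The sketch below assumes that every layer caches all n_ctx positions (no sliding-window trimming), that K and V share one element size, and that a layer's K width is n_embd_head_k times its KV head count (likewise for V). The names layer_kv_shape and kv_cache_bytes are hypothetical, and a real cache may add padding or use different types for K and V.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-layer shape; in llama_hparams the corresponding values
// come from n_embd_head_k, n_embd_head_v and n_head_kv_arr.
struct layer_kv_shape {
    uint32_t n_head_kv;
    uint32_t n_embd_head_k;
    uint32_t n_embd_head_v;
};

// Estimated K+V cache size in bytes for n_ctx cached positions.
uint64_t kv_cache_bytes(const std::vector<layer_kv_shape> & layers,
                        uint32_t n_ctx, uint32_t bytes_per_elem) {
    assert(bytes_per_elem > 0);
    uint64_t elems_per_token = 0;
    for (const layer_kv_shape & l : layers) {
        // K row width = head dim * KV heads; the same rule applies to V.
        elems_per_token += uint64_t(l.n_embd_head_k) * l.n_head_kv;
        elems_per_token += uint64_t(l.n_embd_head_v) * l.n_head_kv;
    }
    return elems_per_token * n_ctx * bytes_per_elem;
}
```

For 32 layers with 8 KV heads of width 128, 4096 cached positions and 2-byte (F16) elements, this gives 536,870,912 bytes (512 MiB).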

Code Reference

Source Location

Repository: Ollama
File: llama/llama.cpp/src/llama-hparams.h
Lines: 1-282

Signature

#include <array>
#include <cstdint>

#define LLAMA_MAX_LAYERS  512
#define LLAMA_MAX_EXPERTS 512

enum llama_expert_gating_func_type {
    LLAMA_EXPERT_GATING_FUNC_TYPE_NONE = 0,
    LLAMA_EXPERT_GATING_FUNC_TYPE_SOFTMAX = 1,
    LLAMA_EXPERT_GATING_FUNC_TYPE_SIGMOID = 2,
    LLAMA_EXPERT_GATING_FUNC_TYPE_SOFTMAX_WEIGHT = 3,
};

enum llama_swa_type {
    LLAMA_SWA_TYPE_NONE = 0,
    LLAMA_SWA_TYPE_STANDARD = 1,
    LLAMA_SWA_TYPE_CHUNKED = 2,
    LLAMA_SWA_TYPE_SYMMETRIC = 3,
};

struct llama_hparams {
    bool vocab_only;
    uint32_t n_ctx_train;
    uint32_t n_embd;
    uint32_t n_layer;
    uint32_t n_rot;
    uint32_t n_embd_head_k;
    uint32_t n_embd_head_v;
    uint32_t n_expert = 0;
    uint32_t n_expert_used = 0;
    std::array<uint32_t, LLAMA_MAX_LAYERS> n_head_arr;
    std::array<uint32_t, LLAMA_MAX_LAYERS> n_head_kv_arr;
    std::array<uint32_t, LLAMA_MAX_LAYERS> n_ff_arr;
    // ... many more fields
};
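The header only names the sliding-window modes in llama_swa_type. The sketch below shows one plausible reading of what each mode means for a query at position p1 and a key at position p0, given a window size n_swa. The enum swa_mode and the function swa_masked are hypothetical, and the exact boundary conventions (for example whether the standard window spans n_swa or n_swa + 1 positions) are assumptions rather than a description of llama.cpp's implementation. Causal masking (p0 > p1) is assumed to be applied separately.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of llama_swa_type, kept local so the sketch compiles alone.
enum class swa_mode { none, standard, chunked, symmetric };

// Returns true if the query at p1 must NOT attend to the key at p0.
bool swa_masked(swa_mode mode, int32_t n_swa, int32_t p0, int32_t p1) {
    assert(n_swa > 0 || mode == swa_mode::none);
    switch (mode) {
        case swa_mode::none:
            return false;                        // full attention, no window
        case swa_mode::standard:
            return p1 - p0 >= n_swa;             // keep only the most recent n_swa positions
        case swa_mode::chunked: {
            const int32_t chunk_start = (p1 / n_swa) * n_swa;
            return p0 < chunk_start;             // keep only keys in the query's own chunk
        }
        case swa_mode::symmetric: {
            const int32_t half = n_swa / 2;
            const int32_t d    = p1 - p0;
            return d < -half || d > half;        // window centred on the query
        }
    }
    return false;
}
```

Keeping the causal check out of this helper is why the symmetric case can admit keys that lie after the query.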

Import

#include "llama-hparams.h"

I/O Contract

Inputs

Name	Type	Required	Description
N/A	N/A	N/A	Populated during model loading from GGUF metadata

Outputs

Name	Type	Description
n_embd	uint32_t	Model embedding dimension
n_layer	uint32_t	Number of transformer layers
n_head_arr	std::array<uint32_t, LLAMA_MAX_LAYERS>	Per-layer attention head counts (first n_layer entries are used)
n_expert	uint32_t	Number of MoE experts
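n_expert works together with n_expert_used from the struct: a router scores all n_expert experts, and each token is sent to the n_expert_used highest-scoring ones. The sketch below shows that selection for the softmax and sigmoid gating functions named in llama_expert_gating_func_type (the SOFTMAX_WEIGHT variant is not covered). The gating enum, the routing struct and the route function are hypothetical, and always renormalizing the selected weights is an assumption; whether a given model does so is model-specific.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical subset of llama_expert_gating_func_type.
enum class gating { softmax, sigmoid };

// Selected expert ids and their (renormalized) mixing weights.
struct routing {
    std::vector<uint32_t> ids;
    std::vector<float>    weights;
};

routing route(const std::vector<float> & logits, uint32_t n_expert_used, gating g) {
    const uint32_t n_expert = static_cast<uint32_t>(logits.size());
    assert(n_expert_used > 0 && n_expert_used <= n_expert);

    // Score every expert with the chosen gating function.
    std::vector<float> probs(n_expert);
    if (g == gating::softmax) {
        const float mx = *std::max_element(logits.begin(), logits.end());
        float sum = 0.0f;
        for (uint32_t i = 0; i < n_expert; ++i) { probs[i] = std::exp(logits[i] - mx); sum += probs[i]; }
        for (float & p : probs) p /= sum;
    } else {
        for (uint32_t i = 0; i < n_expert; ++i) probs[i] = 1.0f / (1.0f + std::exp(-logits[i]));
    }

    // Keep the n_expert_used best experts, highest score first.
    std::vector<uint32_t> order(n_expert);
    std::iota(order.begin(), order.end(), 0u);
    std::partial_sort(order.begin(), order.begin() + n_expert_used, order.end(),
                      [&](uint32_t a, uint32_t b) { return probs[a] > probs[b]; });

    routing r;
    r.ids.assign(order.begin(), order.begin() + n_expert_used);
    float total = 0.0f;
    for (uint32_t id : r.ids) total += probs[id];
    for (uint32_t id : r.ids) r.weights.push_back(probs[id] / total);  // weights sum to 1
    return r;
}
```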

Usage Examples

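#include "llama-hparams.h"
#include "llama-model.h"  // declares llama_model, which owns a llama_hparams member

// `model` is an already-loaded llama_model
void inspect_hparams(const llama_model & model) {
    const llama_hparams & hp = model.hparams;
    uint32_t n_embd  = hp.n_embd;        // embedding dimension
    uint32_t n_layer = hp.n_layer;       // number of transformer layers
    bool has_swa = hp.is_swa_any();      // true if any layer uses sliding-window attention
    bool is_moe  = hp.n_expert > 0;      // MoE models report a non-zero expert count
}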

Related Pages

Principle:Ollama_Ollama_Model_Loading
