Implementation:Ollama Ollama Llama Hparams

Knowledge Sources	Ollama
Domains	LLM Inference, Model Configuration
Last Updated	2025-02-15 00:00 GMT

Overview

Implements accessor methods and utility functions for model hyperparameters, including per-layer parameter lookup and sliding window attention pattern configuration.

Description

Provides the per-layer accessors n_head(il), n_head_kv(il), and n_ff(il), which look up values in arrays indexed by layer. set_swa_pattern configures which layers use sliding window attention (SWA) and which use dense attention, based on a pattern interval and on whether the dense layer comes first in each interval. n_embd_k_gqa and n_embd_v_gqa compute the key and value embedding dimensions under grouped-query attention (GQA), n_embd_r and n_embd_s give the recurrent state sizes, and n_layer_kv counts the layers that have a KV cache.
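The interval-and-ordering behavior of set_swa_pattern can be sketched as follows. This is a minimal, self-contained illustration and not the file's actual code: the struct swa_sketch and its swa_layers vector are hypothetical stand-ins, and the assumed semantics are that n_pattern == 0 marks every layer as SWA, n_pattern == 1 marks every layer as dense, and otherwise one layer in each group of n_pattern is dense (the first if dense_first is true, the last if not).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the SWA-related part of llama_hparams.
struct swa_sketch {
    uint32_t n_layer = 0;
    std::vector<bool> swa_layers; // true = sliding window, false = dense

    explicit swa_sketch(uint32_t n) : n_layer(n), swa_layers(n, false) {}

    // Assumed semantics (see the lead-in): one dense layer per group of
    // n_pattern layers, placed first or last in the group.
    void set_swa_pattern(uint32_t n_pattern, bool dense_first = false) {
        assert(swa_layers.size() == n_layer);
        for (uint32_t il = 0; il < n_layer; ++il) {
            if (n_pattern == 0) {
                swa_layers[il] = true;                        // every layer uses SWA
            } else if (dense_first) {
                swa_layers[il] = (il % n_pattern) != 0;       // first in group is dense
            } else {
                swa_layers[il] = (il % n_pattern) < (n_pattern - 1); // last in group is dense
            }
        }
    }

    // True if at least one layer uses sliding window attention.
    bool is_swa_any() const {
        for (uint32_t il = 0; il < n_layer; ++il) {
            if (swa_layers[il]) {
                return true;
            }
        }
        return false;
    }
};
```

Under these assumptions, a 6-layer model with n_pattern = 3 and dense_first = false yields SWA, SWA, dense, SWA, SWA, dense, while n_pattern = 2 with dense_first = true alternates dense, SWA, dense, SWA.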

Usage

Provides the runtime interface for querying model hyperparameters that vary per layer, common in modern architectures with heterogeneous layer configurations (e.g., different attention head counts or FFN sizes per layer).
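A heterogeneous layer configuration of this kind might be modeled as below. The sketch assumes each accessor is a bounds-checked read from a per-layer array; the struct hparams_sketch and the member names n_head_arr, n_head_kv_arr, and n_ff_arr are illustrative assumptions, and the n_gqa helper assumes the GQA group size is the ratio of attention heads to KV heads, with 0 returned for layers that have no KV heads.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, reduced stand-in for llama_hparams (names are assumptions).
struct hparams_sketch {
    uint32_t n_layer = 0;
    std::vector<uint32_t> n_head_arr;    // attention heads per layer
    std::vector<uint32_t> n_head_kv_arr; // KV heads per layer
    std::vector<uint32_t> n_ff_arr;      // FFN hidden size per layer

    uint32_t n_head(uint32_t il) const {
        assert(il < n_layer); // reject out-of-range layer indices
        return n_head_arr[il];
    }

    uint32_t n_head_kv(uint32_t il) const {
        assert(il < n_layer);
        return n_head_kv_arr[il];
    }

    uint32_t n_ff(uint32_t il) const {
        assert(il < n_layer);
        return n_ff_arr[il];
    }

    // Query heads that share one KV head; 0 if the layer has no KV heads.
    uint32_t n_gqa(uint32_t il) const {
        const uint32_t kv = n_head_kv(il);
        return kv == 0 ? 0 : n_head(il) / kv;
    }
};
```

Keeping the values in per-layer arrays lets callers treat uniform and heterogeneous models the same way: a uniform model simply stores the same value in every slot.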

Code Reference

Source Location

Repository: Ollama
File: llama/llama.cpp/src/llama-hparams.cpp
Lines: 1-245

Signature

void llama_hparams::set_swa_pattern(uint32_t n_pattern, bool dense_first);
bool llama_hparams::is_swa_any() const;

uint32_t llama_hparams::n_head(uint32_t il) const;
uint32_t llama_hparams::n_head_kv(uint32_t il) const;
uint32_t llama_hparams::n_ff(uint32_t il) const;
uint32_t llama_hparams::n_gqa(uint32_t il) const;
uint32_t llama_hparams::n_embd_inp() const;
uint32_t llama_hparams::n_embd_k_gqa(uint32_t il) const;
uint32_t llama_hparams::n_embd_v_gqa(uint32_t il) const;
uint32_t llama_hparams::n_embd_k_gqa_max() const;
uint32_t llama_hparams::n_embd_v_gqa_max() const;
uint32_t llama_hparams::n_embd_r() const;
uint32_t llama_hparams::n_embd_s() const;

Import

#include "llama-hparams.h"

I/O Contract

Inputs

Name	Type	Required	Description
il	uint32_t	Yes	Layer index for per-layer queries
n_pattern	uint32_t	No	SWA pattern interval (used only by set_swa_pattern)
dense_first	bool	No	Whether the dense layer comes first in each SWA pattern interval (used only by set_swa_pattern)

Outputs

Name	Type	Description
n_head	uint32_t	Number of attention heads for a layer
n_head_kv	uint32_t	Number of KV heads for a layer
n_embd_k_gqa	uint32_t	Key embedding dimension with GQA

Usage Examples

// model is a loaded llama_model
const auto & hp = model.hparams;

// il is a layer index and must be less than hp.n_layer
const uint32_t il = 0;

// Per-layer queries:
uint32_t heads = hp.n_head(il);
uint32_t kv_heads = hp.n_head_kv(il);
uint32_t ff_size = hp.n_ff(il);

// GQA dimensions:
uint32_t k_dim = hp.n_embd_k_gqa(il);
uint32_t v_dim = hp.n_embd_v_gqa(il);
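The arithmetic behind the GQA dimension queries can be illustrated with a small worked sketch. It assumes the key (or value) dimension of a layer is the per-head dimension multiplied by that layer's KV head count, and that the _max variants take the maximum over all layers. The struct gqa_dims_sketch and the fields n_embd_head_k, n_embd_head_v, and n_head_kv_arr are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the GQA-related part of llama_hparams.
struct gqa_dims_sketch {
    uint32_t n_embd_head_k = 0;          // per-head key dimension
    uint32_t n_embd_head_v = 0;          // per-head value dimension
    std::vector<uint32_t> n_head_kv_arr; // KV heads per layer

    // Key dimension summed over all KV heads of layer il.
    uint32_t n_embd_k_gqa(uint32_t il) const {
        assert(il < n_head_kv_arr.size());
        return n_embd_head_k * n_head_kv_arr[il];
    }

    // Value dimension summed over all KV heads of layer il.
    uint32_t n_embd_v_gqa(uint32_t il) const {
        assert(il < n_head_kv_arr.size());
        return n_embd_head_v * n_head_kv_arr[il];
    }

    // Largest key dimension across all layers.
    uint32_t n_embd_k_gqa_max() const {
        uint32_t val = 0;
        for (uint32_t il = 0; il < n_head_kv_arr.size(); ++il) {
            val = std::max(val, n_embd_k_gqa(il));
        }
        return val;
    }

    // Largest value dimension across all layers.
    uint32_t n_embd_v_gqa_max() const {
        uint32_t val = 0;
        for (uint32_t il = 0; il < n_head_kv_arr.size(); ++il) {
            val = std::max(val, n_embd_v_gqa(il));
        }
        return val;
    }
};
```

For example, with a per-head key dimension of 128 and KV head counts of 8, 8, and 4 across three layers, the per-layer key dimensions are 1024, 1024, and 512, so the maximum is 1024. A caller sizing one buffer for all layers would use the maximum rather than any single layer's value.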

Related Pages

Principle:Ollama_Ollama_Model_Loading
