Implementation:Ollama Ollama Llama Hparams
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, Model Configuration |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements accessor methods and utility functions for model hyperparameters, including per-layer parameter lookup and sliding window attention pattern configuration.
Description
Provides the per-layer accessors n_head(il), n_head_kv(il), and n_ff(il), which look up values from per-layer arrays. set_swa_pattern configures which layers use sliding window attention (SWA) and which use dense attention, based on a pattern interval and an ordering flag. n_embd_k_gqa and n_embd_v_gqa compute the key and value embedding dimensions under grouped-query attention (GQA), n_embd_r and n_embd_s give the recurrent state sizes, and n_layer_kv counts the layers that use a KV cache.
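To make the pattern interval and ordering flag concrete, here is a minimal sketch of how such a pattern could be assigned. The type swa_sketch and its is_swa helper are hypothetical stand-ins, not the real llama_hparams. The rule shown (one dense layer per group of n_pattern layers, with n_pattern == 0 meaning every layer uses SWA) is an assumption about the behavior and should be checked against the source file.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the SWA-related part of the hyperparameter struct.
struct swa_sketch {
    uint32_t n_layer = 0;
    std::vector<bool> swa_layers; // true = layer uses sliding window attention

    explicit swa_sketch(uint32_t n) : n_layer(n), swa_layers(n, false) {}

    // Assumed rule: each group of n_pattern consecutive layers holds exactly
    // one dense layer. dense_first places it at the start of the group,
    // otherwise at the end. n_pattern == 0 marks every layer as SWA.
    void set_swa_pattern(uint32_t n_pattern, bool dense_first) {
        for (uint32_t il = 0; il < n_layer; ++il) {
            if (n_pattern == 0) {
                swa_layers[il] = true;
            } else if (dense_first) {
                swa_layers[il] = (il % n_pattern) != 0;
            } else {
                swa_layers[il] = (il % n_pattern) < (n_pattern - 1);
            }
        }
    }

    // Per-layer query; the index must be in range.
    bool is_swa(uint32_t il) const {
        assert(il < n_layer);
        return swa_layers[il];
    }

    // True if at least one layer uses sliding window attention.
    bool is_swa_any() const {
        for (bool uses_swa : swa_layers) {
            if (uses_swa) return true;
        }
        return false;
    }
};
```

Under this rule, set_swa_pattern(3, false) on six layers marks layers 0, 1, 3, and 4 as SWA and layers 2 and 5 as dense, while n_pattern == 1 leaves every layer dense.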
Usage
Provides the runtime interface for querying model hyperparameters that vary per layer. Such variation is common in modern architectures with heterogeneous layer configurations (e.g., different attention head counts or FFN sizes per layer).
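The per-layer lookup idea can be sketched as a struct that stores one value per layer and validates the index before reading it. Everything here is illustrative: hparams_sketch, MAX_LAYERS, the *_arr field names, and the exception on an out-of-range index are assumptions chosen for the example, not the actual implementation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <stdexcept>

constexpr uint32_t MAX_LAYERS = 8; // hypothetical upper bound on layer count

// Hypothetical stand-in for a hyperparameter struct with per-layer arrays.
struct hparams_sketch {
    uint32_t n_layer = 0;
    std::array<uint32_t, MAX_LAYERS> n_head_arr{};    // attention heads per layer
    std::array<uint32_t, MAX_LAYERS> n_head_kv_arr{}; // KV heads per layer
    std::array<uint32_t, MAX_LAYERS> n_ff_arr{};      // FFN size per layer

    // Shared bounds-checked lookup used by all per-layer accessors.
    uint32_t lookup(const std::array<uint32_t, MAX_LAYERS> & arr, uint32_t il) const {
        assert(n_layer <= MAX_LAYERS);
        if (il >= n_layer) {
            throw std::out_of_range("layer index out of range");
        }
        return arr[il];
    }

    uint32_t n_head(uint32_t il) const    { return lookup(n_head_arr, il); }
    uint32_t n_head_kv(uint32_t il) const { return lookup(n_head_kv_arr, il); }
    uint32_t n_ff(uint32_t il) const      { return lookup(n_ff_arr, il); }

    // Query heads per KV head; 0 for a layer that has no KV heads.
    uint32_t n_gqa(uint32_t il) const {
        const uint32_t kv = n_head_kv(il);
        return kv == 0 ? 0 : n_head(il) / kv;
    }
};
```

Keeping the bounds check in one helper means every accessor fails the same way on a bad index, which is easier to reason about than scattered checks.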
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/llama-hparams.cpp
- Lines: 1-245
Signature
void llama_hparams::set_swa_pattern(uint32_t n_pattern, bool dense_first);
bool llama_hparams::is_swa_any() const;
uint32_t llama_hparams::n_head(uint32_t il) const;
uint32_t llama_hparams::n_head_kv(uint32_t il) const;
uint32_t llama_hparams::n_ff(uint32_t il) const;
uint32_t llama_hparams::n_gqa(uint32_t il) const;
uint32_t llama_hparams::n_embd_inp() const;
uint32_t llama_hparams::n_embd_k_gqa(uint32_t il) const;
uint32_t llama_hparams::n_embd_v_gqa(uint32_t il) const;
uint32_t llama_hparams::n_embd_k_gqa_max() const;
uint32_t llama_hparams::n_embd_v_gqa_max() const;
uint32_t llama_hparams::n_embd_r() const;
uint32_t llama_hparams::n_embd_s() const;
Import
#include "llama-hparams.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| il | uint32_t | Yes | Layer index for per-layer queries |
| n_pattern | uint32_t | No | SWA pattern interval (used only by set_swa_pattern) |
| dense_first | bool | No | Whether dense layers come first in the SWA pattern (used only by set_swa_pattern) |
Outputs
| Name | Type | Description |
|---|---|---|
| n_head | uint32_t | Number of attention heads for a layer |
| n_head_kv | uint32_t | Number of KV heads for a layer |
| n_embd_k_gqa | uint32_t | Key embedding dimension with GQA |
Usage Examples
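// Assumes `model` is an already loaded model object exposing `hparams`.
const auto & hp = model.hparams;
// Each accessor takes a layer index, so iterate over the valid range.
for (uint32_t il = 0; il < hp.n_layer; ++il) {
    // Per-layer queries:
    uint32_t heads = hp.n_head(il);
    uint32_t kv_heads = hp.n_head_kv(il);
    uint32_t ff_size = hp.n_ff(il);
    // GQA dimensions:
    uint32_t k_dim = hp.n_embd_k_gqa(il);
    uint32_t v_dim = hp.n_embd_v_gqa(il);
}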
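For readers who want to see where the GQA dimensions come from, the sketch below assumes they are the per-head key or value width multiplied by the number of KV heads in the layer, and that the _max variants take the largest value across layers. The struct gqa_sketch and the field names n_embd_head_k, n_embd_head_v, and n_head_kv_arr are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in holding only what the GQA computations need.
struct gqa_sketch {
    uint32_t n_embd_head_k = 0;          // width of one key head (assumed field)
    uint32_t n_embd_head_v = 0;          // width of one value head (assumed field)
    std::vector<uint32_t> n_head_kv_arr; // KV heads per layer

    uint32_t n_head_kv(uint32_t il) const {
        assert(il < n_head_kv_arr.size());
        return n_head_kv_arr[il];
    }

    // Assumed formula: per-head width times the number of KV heads in the layer.
    uint32_t n_embd_k_gqa(uint32_t il) const { return n_embd_head_k * n_head_kv(il); }
    uint32_t n_embd_v_gqa(uint32_t il) const { return n_embd_head_v * n_head_kv(il); }

    // Largest key dimension over all layers; 0 when there are no layers.
    uint32_t n_embd_k_gqa_max() const {
        uint32_t best = 0;
        for (uint32_t il = 0; il < n_head_kv_arr.size(); ++il) {
            best = std::max(best, n_embd_k_gqa(il));
        }
        return best;
    }

    // Largest value dimension over all layers; 0 when there are no layers.
    uint32_t n_embd_v_gqa_max() const {
        uint32_t best = 0;
        for (uint32_t il = 0; il < n_head_kv_arr.size(); ++il) {
            best = std::max(best, n_embd_v_gqa(il));
        }
        return best;
    }
};
```

With a key head width of 128 and 8 KV heads in a layer, this gives a key dimension of 1024 for that layer. A maximum over layers is useful when a single buffer must be sized to fit any layer.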