Implementation:Ollama Ollama Llama Hparams

From Leeroopedia
Knowledge Sources
Domains: LLM Inference, Model Configuration
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements accessor methods and utility functions for model hyperparameters, including per-layer parameter lookup and sliding window attention pattern configuration.

Description

Provides the per-layer accessors n_head(il), n_head_kv(il), and n_ff(il), which look up values in the per-layer arrays. set_swa_pattern configures which layers use sliding window attention (SWA) and which use dense attention, based on a pattern interval and an ordering flag. n_embd_k_gqa and n_embd_v_gqa compute the key and value embedding dimensions under grouped-query attention (GQA), n_embd_r and n_embd_s return the recurrent state sizes, and n_layer_kv counts the KV cache layers.
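The per-layer lookup and the GQA dimension can be sketched in isolation. The following is a minimal, self-contained illustration and not the code in llama-hparams.cpp: the struct toy_hparams, its field names, the helper make_toy_hparams, and the assumption that the key width equals the per-head key size multiplied by the layer's KV head count are all hypothetical stand-ins chosen for clarity.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a hyperparameter struct; not the real llama_hparams.
struct toy_hparams {
    uint32_t n_embd_head_k = 0;           // assumed per-head key dimension
    std::vector<uint32_t> n_head_arr;     // attention heads, one entry per layer
    std::vector<uint32_t> n_head_kv_arr;  // KV heads, one entry per layer

    uint32_t n_layer() const { return static_cast<uint32_t>(n_head_arr.size()); }

    // Per-layer lookup with a bounds check on the layer index.
    uint32_t n_head(uint32_t il) const {
        assert(il < n_layer());
        return n_head_arr[il];
    }

    uint32_t n_head_kv(uint32_t il) const {
        assert(il < n_head_kv_arr.size());
        return n_head_kv_arr[il];
    }

    // Query heads per KV head; 0 for a layer that has no KV heads.
    uint32_t n_gqa(uint32_t il) const {
        const uint32_t kv = n_head_kv(il);
        return kv == 0 ? 0 : n_head(il) / kv;
    }

    // Assumed formula: key width under GQA = per-head key size * KV heads.
    uint32_t n_embd_k_gqa(uint32_t il) const {
        return n_embd_head_k * n_head_kv(il);
    }
};

// Hypothetical two-layer configuration with different KV head counts.
inline toy_hparams make_toy_hparams() {
    toy_hparams hp;
    hp.n_embd_head_k = 128;
    hp.n_head_arr    = {32, 32};
    hp.n_head_kv_arr = {8, 4};
    return hp;
}
```

With this toy configuration the two layers report different GQA ratios and key widths, which is the situation the per-layer accessors are meant to handle.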

Usage

Use these methods at runtime to query model hyperparameters that vary per layer. Such variation is common in modern architectures with heterogeneous layer configurations (e.g., different attention head counts or FFN sizes per layer).

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/src/llama-hparams.cpp
  • Lines: 1-245

Signature

void llama_hparams::set_swa_pattern(uint32_t n_pattern, bool dense_first);
bool llama_hparams::is_swa_any() const;

uint32_t llama_hparams::n_head(uint32_t il) const;
uint32_t llama_hparams::n_head_kv(uint32_t il) const;
uint32_t llama_hparams::n_ff(uint32_t il) const;
uint32_t llama_hparams::n_gqa(uint32_t il) const;
uint32_t llama_hparams::n_embd_inp() const;
uint32_t llama_hparams::n_embd_k_gqa(uint32_t il) const;
uint32_t llama_hparams::n_embd_v_gqa(uint32_t il) const;
uint32_t llama_hparams::n_embd_k_gqa_max() const;
uint32_t llama_hparams::n_embd_v_gqa_max() const;
uint32_t llama_hparams::n_embd_r() const;
uint32_t llama_hparams::n_embd_s() const;

Import

#include "llama-hparams.h"

I/O Contract

Inputs

Name | Type | Required | Description
il | uint32_t | Yes | Layer index for per-layer queries
n_pattern | uint32_t | No | SWA pattern interval for set_swa_pattern
dense_first | bool | No | Whether dense layers come first in SWA pattern
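How n_pattern and dense_first might interact can be shown with a small standalone sketch. This page does not state the exact rule used by set_swa_pattern, so the rule below is an assumption made for illustration: within every group of n_pattern layers exactly one layer is dense, dense_first selects whether it is the first or the last layer of the group, and n_pattern == 0 marks every layer as SWA. The names toy_swa_pattern and toy_is_swa_any are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Returns one flag per layer: true = sliding window attention, false = dense.
// Assumed rule (illustrative only): one dense layer per group of n_pattern
// layers, placed first or last in the group depending on dense_first.
std::vector<bool> toy_swa_pattern(uint32_t n_layer, uint32_t n_pattern, bool dense_first) {
    std::vector<bool> swa(n_layer, false);
    for (uint32_t il = 0; il < n_layer; ++il) {
        if (n_pattern == 0) {
            swa[il] = true;  // assumption: no pattern means every layer is SWA
            continue;
        }
        const uint32_t pos = il % n_pattern;  // position inside the current group
        swa[il] = dense_first ? (pos != 0) : (pos != n_pattern - 1);
    }
    return swa;
}

// True if at least one layer uses sliding window attention.
bool toy_is_swa_any(const std::vector<bool> & swa) {
    for (bool flag : swa) {
        if (flag) {
            return true;
        }
    }
    return false;
}
```

Under this assumed rule, six layers with n_pattern = 3 and dense_first = false yield SWA, SWA, dense, SWA, SWA, dense, while n_pattern = 1 makes every layer dense.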

Outputs

Name | Type | Description
n_head | uint32_t | Number of attention heads for a layer
n_head_kv | uint32_t | Number of KV heads for a layer
n_embd_k_gqa | uint32_t | Key embedding dimension with GQA

Usage Examples

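// `model` is assumed to be an already loaded llama_model.
const auto & hp = model.hparams;

// `il` is the layer index; iterate over all layers of the model.
for (uint32_t il = 0; il < hp.n_layer; ++il) {
    // Per-layer queries:
    uint32_t heads    = hp.n_head(il);
    uint32_t kv_heads = hp.n_head_kv(il);
    uint32_t ff_size  = hp.n_ff(il);

    // GQA dimensions:
    uint32_t k_dim = hp.n_embd_k_gqa(il);
    uint32_t v_dim = hp.n_embd_v_gqa(il);
}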
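One place where per-layer GQA dimensions matter is sizing a KV cache. The sketch below sums the key and value widths over all layers to get the number of elements a single token occupies. It is an illustration under stated assumptions, not code from the repository: kv_elements_per_token and stub_hparams are hypothetical names, and the sketch assumes every layer stores KV entries, which may not hold for models where only some layers use the cache.

```cpp
#include <cstdint>

// Sums key and value widths over all layers. Works with any type that exposes
// n_embd_k_gqa(il) and n_embd_v_gqa(il). Assumes every layer stores KV entries.
template <typename HParams>
uint64_t kv_elements_per_token(const HParams & hp, uint32_t n_layer) {
    uint64_t total = 0;
    for (uint32_t il = 0; il < n_layer; ++il) {
        total += hp.n_embd_k_gqa(il);
        total += hp.n_embd_v_gqa(il);
    }
    return total;
}

// Hypothetical stub used only to exercise the function above:
// layer 0 is 1024 wide for both K and V, every other layer is 512 wide.
struct stub_hparams {
    uint32_t n_embd_k_gqa(uint32_t il) const { return il == 0 ? 1024 : 512; }
    uint32_t n_embd_v_gqa(uint32_t il) const { return il == 0 ? 1024 : 512; }
};
```

Multiplying the result by the element size and the context length gives a rough upper bound on cache memory for such a model.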
