Implementation:Ollama Ollama Llama Hparams Types
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, Model Configuration |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
C++ header that declares the llama_hparams struct, which holds the model hyperparameters read from GGUF metadata.
Description
Defines llama_hparams with fields for:
- training context size, embedding dimensions, layer count, and rotary dimension
- per-layer head counts (n_head_arr, n_head_kv_arr) and per-layer feed-forward sizes (n_ff_arr)
- MoE expert configuration
- MLA compressed dimensions for DeepSeek2
- RoPE parameters (type, frequency, YaRN)
- sliding window attention settings
- recurrent state parameters (SSM dimensions, convolution kernel)
- other architecture-specific parameters

The constants LLAMA_MAX_LAYERS (512) and LLAMA_MAX_EXPERTS (512) set upper bounds on the layer and expert counts.
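The per-layer arrays are fixed-size and only the first n_layer entries are meaningful, so reads are typically guarded by a bounds check. The following is a minimal sketch of that pattern, not the real definition: hparams_sketch, MAX_LAYERS_SKETCH, make_example_hparams, and the accessor names are hypothetical stand-ins chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, reduced stand-in for llama_hparams; not the real definition.
constexpr uint32_t MAX_LAYERS_SKETCH = 512; // mirrors LLAMA_MAX_LAYERS

struct hparams_sketch {
    uint32_t n_layer = 0;
    std::array<uint32_t, MAX_LAYERS_SKETCH> n_head_arr{};    // query heads per layer
    std::array<uint32_t, MAX_LAYERS_SKETCH> n_head_kv_arr{}; // key/value heads per layer

    // Bounds-checked per-layer accessors (illustrative names).
    uint32_t n_head(uint32_t il) const {
        assert(il < n_layer && n_layer <= MAX_LAYERS_SKETCH);
        return n_head_arr[il];
    }
    uint32_t n_head_kv(uint32_t il) const {
        assert(il < n_layer && n_layer <= MAX_LAYERS_SKETCH);
        return n_head_kv_arr[il];
    }
    // Query heads sharing each KV head in layer il (grouped-query attention).
    // Returns 0 for a layer with no KV heads.
    uint32_t n_gqa(uint32_t il) const {
        const uint32_t kv = n_head_kv(il);
        return kv == 0 ? 0 : n_head(il) / kv;
    }
};

// Builds a two-layer configuration whose layers differ in KV head count.
inline hparams_sketch make_example_hparams() {
    hparams_sketch hp;
    hp.n_layer = 2;
    hp.n_head_arr[0] = 32; hp.n_head_kv_arr[0] = 8;
    hp.n_head_arr[1] = 32; hp.n_head_kv_arr[1] = 32;
    return hp;
}
```

Storing the counts per layer, rather than as one scalar, is what lets a single struct describe models whose layers do not all share the same attention shape.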
Usage
A fundamental data structure, referenced throughout inference to determine tensor shapes, attention patterns, and memory requirements. Its fields are populated during model loading.
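As one illustration of how these fields feed memory estimates, the sketch below computes a rough KV-cache size from the per-head dimensions and the per-layer KV head counts. It assumes every layer caches keys and values for the full context (no sliding window, no recurrent layers). kv_params_sketch and kv_cache_bytes are hypothetical names, not part of the header.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical subset of the hyperparameters needed for a KV-cache estimate.
struct kv_params_sketch {
    uint32_t n_embd_head_k;          // per-head key dimension
    uint32_t n_embd_head_v;          // per-head value dimension
    std::vector<uint32_t> n_head_kv; // KV heads for each layer
};

// Bytes needed to cache keys and values for n_ctx positions, assuming every
// layer stores the full context with bytes_per_elem bytes per element.
inline uint64_t kv_cache_bytes(const kv_params_sketch & hp,
                               uint32_t n_ctx,
                               uint32_t bytes_per_elem) {
    uint64_t total = 0;
    for (uint32_t n_kv : hp.n_head_kv) {
        const uint64_t k_row = uint64_t(hp.n_embd_head_k) * n_kv; // key elements per position
        const uint64_t v_row = uint64_t(hp.n_embd_head_v) * n_kv; // value elements per position
        total += (k_row + v_row) * n_ctx * bytes_per_elem;
    }
    return total;
}
```

For example, two layers with 8 KV heads of dimension 128 each, a 4096-token context, and 2-byte elements come to 32 MiB under these assumptions.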
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/llama-hparams.h
- Lines: 1-282
Signature
#define LLAMA_MAX_LAYERS 512
#define LLAMA_MAX_EXPERTS 512
enum llama_expert_gating_func_type {
    LLAMA_EXPERT_GATING_FUNC_TYPE_NONE           = 0,
    LLAMA_EXPERT_GATING_FUNC_TYPE_SOFTMAX        = 1,
    LLAMA_EXPERT_GATING_FUNC_TYPE_SIGMOID        = 2,
    LLAMA_EXPERT_GATING_FUNC_TYPE_SOFTMAX_WEIGHT = 3,
};
enum llama_swa_type {
    LLAMA_SWA_TYPE_NONE      = 0,
    LLAMA_SWA_TYPE_STANDARD  = 1,
    LLAMA_SWA_TYPE_CHUNKED   = 2,
    LLAMA_SWA_TYPE_SYMMETRIC = 3,
};
struct llama_hparams {
    bool vocab_only;
    uint32_t n_ctx_train;
    uint32_t n_embd;
    uint32_t n_layer;
    uint32_t n_rot;
    uint32_t n_embd_head_k;
    uint32_t n_embd_head_v;
    uint32_t n_expert = 0;
    uint32_t n_expert_used = 0;
    std::array<uint32_t, LLAMA_MAX_LAYERS> n_head_arr;
    std::array<uint32_t, LLAMA_MAX_LAYERS> n_head_kv_arr;
    std::array<uint32_t, LLAMA_MAX_LAYERS> n_ff_arr;
    // ... many more fields
};
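The llama_swa_type values name different sliding-window masking patterns. The sketch below shows one plausible reading of each pattern as a predicate over a key position p0 and a query position p1. The semantics are assumptions made for illustration and should be checked against the actual source. swa_type_sketch and is_masked_sketch are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical mirror of the llama_swa_type values.
enum swa_type_sketch {
    SWA_NONE      = 0,
    SWA_STANDARD  = 1,
    SWA_CHUNKED   = 2,
    SWA_SYMMETRIC = 3,
};

// Returns true when the key at position p0 is hidden from the query at p1.
// Assumed semantics, for a window of n_swa positions:
//   STANDARD:  keys n_swa or more positions behind the query are hidden
//   CHUNKED:   keys before the start of the query's n_swa-sized chunk are hidden
//   SYMMETRIC: keys more than n_swa/2 positions away on either side are hidden
inline bool is_masked_sketch(uint32_t n_swa, swa_type_sketch type,
                             int32_t p0, int32_t p1) {
    if (n_swa == 0) {
        return false; // no window configured, nothing is masked by SWA
    }
    const int32_t w = int32_t(n_swa);
    switch (type) {
        case SWA_NONE:
            return false;
        case SWA_STANDARD:
            return p1 - p0 >= w;
        case SWA_CHUNKED: {
            const int32_t chunk_start = (p1 / w) * w;
            return p0 < chunk_start;
        }
        case SWA_SYMMETRIC: {
            const int32_t half = w / 2;
            const int32_t diff = p1 - p0;
            return diff < -half || diff > half;
        }
    }
    return false;
}
```

Causal masking (hiding keys after the query) is a separate concern and is not modeled here.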
Import
#include "llama-hparams.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| N/A | N/A | N/A | Populated during model loading from GGUF metadata |
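To make "populated during model loading from GGUF metadata" concrete, here is a minimal sketch that fills a few fields from a key/value map. The map stands in for the GGUF metadata section. The key names follow the GGUF convention of prefixing with the architecture name, but the loader itself (metadata_sketch, require_key, load_hparams_sketch) is hypothetical and much simpler than the real one.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for GGUF key/value metadata (integer values only).
using metadata_sketch = std::map<std::string, uint32_t>;

// Hypothetical, reduced result type; the real struct has many more fields.
struct loaded_hparams_sketch {
    uint32_t n_ctx_train = 0;
    uint32_t n_embd      = 0;
    uint32_t n_layer     = 0;
    uint32_t n_expert    = 0; // stays 0 when the model has no experts
    std::vector<uint32_t> n_head_arr;
};

// Looks up a mandatory key and throws if it is absent.
inline uint32_t require_key(const metadata_sketch & kv, const std::string & key) {
    const auto it = kv.find(key);
    if (it == kv.end()) {
        throw std::runtime_error("missing metadata key: " + key);
    }
    return it->second;
}

inline loaded_hparams_sketch load_hparams_sketch(const metadata_sketch & kv,
                                                 const std::string & arch) {
    loaded_hparams_sketch hp;
    hp.n_ctx_train = require_key(kv, arch + ".context_length");
    hp.n_embd      = require_key(kv, arch + ".embedding_length");
    hp.n_layer     = require_key(kv, arch + ".block_count");
    if (hp.n_layer > 512) { // same upper bound as LLAMA_MAX_LAYERS
        throw std::runtime_error("layer count exceeds the supported maximum");
    }
    // Optional key: absent for dense (non-MoE) models.
    const auto it = kv.find(arch + ".expert_count");
    if (it != kv.end()) {
        hp.n_expert = it->second;
    }
    // A scalar head count is broadcast to every layer in this sketch.
    hp.n_head_arr.assign(hp.n_layer, require_key(kv, arch + ".attention.head_count"));
    return hp;
}
```

Splitting keys into required and optional mirrors the defaults visible in the struct, where n_expert and n_expert_used start at 0.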
Outputs
| Name | Type | Description |
|---|---|---|
| n_embd | uint32_t | Model embedding dimension |
| n_layer | uint32_t | Number of transformer layers |
| n_head_arr | std::array<uint32_t, LLAMA_MAX_LAYERS> | Per-layer attention head counts |
| n_expert | uint32_t | Number of MoE experts |
Usage Examples
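#include "llama-hparams.h"
// `model` is assumed to be an already-loaded model object that exposes an hparams member.
const llama_hparams & hp = model.hparams;
uint32_t n_embd = hp.n_embd;     // embedding dimension
uint32_t n_layer = hp.n_layer;   // number of layers
bool has_swa = hp.is_swa_any();  // whether any layer uses sliding window attention
bool is_moe = hp.n_expert > 0;   // n_expert defaults to 0, so a nonzero value indicates MoE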