Implementation:Ollama Ollama Llama Hparams Types

From Leeroopedia
Knowledge Sources
Domains LLM Inference, Model Configuration
Last Updated 2025-02-15 00:00 GMT

Overview

Header declaring the llama_hparams struct which holds all model hyperparameters read from GGUF metadata.

Description

Defines llama_hparams, whose fields cover the training context size, embedding dimensions, layer count, rotary dimension, per-layer attention head counts and feed-forward sizes (n_head_arr, n_head_kv_arr, n_ff_arr), MoE expert configuration, MLA compressed dimensions for DeepSeek2, RoPE parameters (type, frequency, YaRN), sliding window attention settings, recurrent state parameters (SSM dimensions, convolution kernel), and other architecture-specific parameters. The constants LLAMA_MAX_LAYERS (512) and LLAMA_MAX_EXPERTS (512) set upper bounds on the layer and expert counts; LLAMA_MAX_LAYERS is also the fixed length of the per-layer arrays.
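The per-layer arrays are sized by a compile-time bound, so only the first n_layer entries carry meaning. The following is a minimal sketch of that pattern, not the real struct: hparams_sketch, SKETCH_MAX_LAYERS, make_uniform, and the accessor names are hypothetical and introduced only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the LLAMA_MAX_LAYERS bound described above.
constexpr uint32_t SKETCH_MAX_LAYERS = 512;

// Hypothetical, trimmed-down analogue of llama_hparams (assumption: the real
// struct has many more fields and its own helper methods).
struct hparams_sketch {
    uint32_t n_layer = 0;
    std::array<uint32_t, SKETCH_MAX_LAYERS> n_head_arr{};    // query heads per layer
    std::array<uint32_t, SKETCH_MAX_LAYERS> n_head_kv_arr{}; // KV heads per layer

    uint32_t n_head(uint32_t il) const {
        assert(il < n_layer); // entries past n_layer are unused padding
        return n_head_arr[il];
    }

    uint32_t n_head_kv(uint32_t il) const {
        assert(il < n_layer);
        return n_head_kv_arr[il];
    }

    // Query heads per KV head for one layer; 0 if the layer has no KV heads.
    uint32_t n_gqa(uint32_t il) const {
        const uint32_t kv = n_head_kv(il);
        return kv == 0 ? 0 : n_head(il) / kv;
    }
};

// Fill every layer with the same head counts, as a uniform model would.
inline hparams_sketch make_uniform(uint32_t n_layer, uint32_t n_head, uint32_t n_head_kv) {
    assert(n_layer <= SKETCH_MAX_LAYERS);
    hparams_sketch hp;
    hp.n_layer = n_layer;
    for (uint32_t il = 0; il < n_layer; ++il) {
        hp.n_head_arr[il]    = n_head;
        hp.n_head_kv_arr[il] = n_head_kv;
    }
    return hp;
}
```

Fixed-size arrays keep the struct trivially copyable and avoid heap allocation, at the cost of reserving space for the maximum layer count even in small models.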

Usage

llama_hparams is a fundamental data structure, referenced throughout inference to determine tensor shapes, attention patterns, and memory requirements. Its fields are populated from GGUF metadata during model loading.
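As one illustration of how hyperparameters feed a memory estimate, the sketch below computes a rough KV-cache size from the per-head dimensions and per-layer KV head counts. It is a simplified estimate under stated assumptions: every layer caches the full context, all elements have the same width, and sliding window, recurrent, and MLA layouts are ignored. The names kv_shape_sketch and kv_cache_bytes are hypothetical and do not appear in the header.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical subset of the hyperparameters that matter for this estimate.
struct kv_shape_sketch {
    uint32_t n_embd_head_k;          // per-head key dimension
    uint32_t n_embd_head_v;          // per-head value dimension
    std::vector<uint32_t> n_head_kv; // KV head count, one entry per layer
};

// Rough KV-cache size in bytes for a context of n_ctx tokens.
// Assumption: each layer stores one K row and one V row per token.
inline uint64_t kv_cache_bytes(const kv_shape_sketch & s, uint32_t n_ctx, uint32_t bytes_per_elem) {
    assert(bytes_per_elem > 0);
    uint64_t total = 0;
    for (uint32_t n_kv : s.n_head_kv) {
        const uint64_t k_row = uint64_t(s.n_embd_head_k) * n_kv; // K elements per token
        const uint64_t v_row = uint64_t(s.n_embd_head_v) * n_kv; // V elements per token
        total += (k_row + v_row) * n_ctx * bytes_per_elem;
    }
    return total;
}
```

For example, 32 layers with 8 KV heads each, head dimension 128, a 4096-token context, and 2-byte elements come to 536870912 bytes (512 MiB) under these assumptions.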

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/src/llama-hparams.h
  • Lines: 1-282

Signature

#define LLAMA_MAX_LAYERS  512
#define LLAMA_MAX_EXPERTS 512

enum llama_expert_gating_func_type {
    LLAMA_EXPERT_GATING_FUNC_TYPE_NONE = 0,
    LLAMA_EXPERT_GATING_FUNC_TYPE_SOFTMAX = 1,
    LLAMA_EXPERT_GATING_FUNC_TYPE_SIGMOID = 2,
    LLAMA_EXPERT_GATING_FUNC_TYPE_SOFTMAX_WEIGHT = 3,
};

enum llama_swa_type {
    LLAMA_SWA_TYPE_NONE = 0,
    LLAMA_SWA_TYPE_STANDARD = 1,
    LLAMA_SWA_TYPE_CHUNKED = 2,
    LLAMA_SWA_TYPE_SYMMETRIC = 3,
};

struct llama_hparams {
    bool vocab_only;
    uint32_t n_ctx_train;
    uint32_t n_embd;
    uint32_t n_layer;
    uint32_t n_rot;
    uint32_t n_embd_head_k;
    uint32_t n_embd_head_v;
    uint32_t n_expert = 0;
    uint32_t n_expert_used = 0;
    std::array<uint32_t, LLAMA_MAX_LAYERS> n_head_arr;
    std::array<uint32_t, LLAMA_MAX_LAYERS> n_head_kv_arr;
    std::array<uint32_t, LLAMA_MAX_LAYERS> n_ff_arr;
    // ... many more fields
};
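The expert gating enum distinguishes softmax-style from sigmoid-style scoring of MoE router logits. The sketch below shows the generic math behind those two options only. How the real enum values are consumed is not shown on this page, so gating_sketch and gate_scores are hypothetical names, and the SOFTMAX_WEIGHT variant is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical enum for illustration; not the header's llama_expert_gating_func_type.
enum gating_sketch {
    GATING_SKETCH_SOFTMAX,
    GATING_SKETCH_SIGMOID,
};

// Turn raw router logits (one per expert) into gating scores.
inline std::vector<float> gate_scores(const std::vector<float> & logits, gating_sketch type) {
    assert(!logits.empty());
    std::vector<float> out(logits.size());

    if (type == GATING_SKETCH_SIGMOID) {
        // Each expert is scored independently; scores need not sum to 1.
        for (std::size_t i = 0; i < logits.size(); ++i) {
            out[i] = 1.0f / (1.0f + std::exp(-logits[i]));
        }
        return out;
    }

    // Softmax: subtract the max logit first for numerical stability.
    const float mx = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - mx);
        sum += out[i];
    }
    for (float & v : out) {
        v /= sum; // scores now sum to 1 across experts
    }
    return out;
}
```

Softmax scores compete across experts, while sigmoid scores are independent per expert, which is why a model needs to record which function its router was trained with.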

Import

#include "llama-hparams.h"

I/O Contract

Inputs

Name Type Required Description
N/A N/A N/A Populated during model loading from GGUF metadata

Outputs

Name Type Description
n_embd uint32_t Model embedding dimension
n_layer uint32_t Number of transformer layers
n_head_arr std::array<uint32_t, LLAMA_MAX_LAYERS> Per-layer attention head counts
n_expert uint32_t Number of MoE experts

Usage Examples

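#include "llama-hparams.h"

// `model` is assumed to be an already-loaded model object exposing an hparams member.
const llama_hparams & hp = model.hparams;
uint32_t n_embd  = hp.n_embd;    // model embedding dimension
uint32_t n_layer = hp.n_layer;   // number of transformer layers
bool has_swa = hp.is_swa_any();  // whether sliding window attention is in use
bool is_moe  = hp.n_expert > 0;  // n_expert defaults to 0, so nonzero means MoE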
