Implementation:Ollama Ollama Llama Model Header

Knowledge Sources	Ollama
Domains	LLM Inference, Model Loading
Last Updated	2025-02-15 00:00 GMT

Overview

This header declares the llama_model struct, the per-layer tensor structures, and the llm_type enumeration of model sizes shared by all supported LLM architectures.

Description

The header defines the llm_type enum, with one entry for every supported model size (from 14M through 671B+) plus mixture-of-experts (MoE) types such as 8x7B and 8x22B. It also declares the layer structures. llama_layer holds a tensor pointer for every possible layer component (attention Q/K/V/O, FFN gate/up/down, normalization, MoE expert weights, SSM states, etc.), and each pointer defaults to nullptr. Separate, specialized structures cover PosNet, ConvNext, ShortConv, and NextN layers. llama_model aggregates the architecture, hyperparameters, vocabulary, the vector of layers, embeddings, memory mappings, and backend devices.
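
The single-struct layer design can be illustrated with a minimal sketch. The demo_tensor and demo_layer types and the three helper functions below are hypothetical stand-ins, not declarations from llama-model.h; only the member names mirror a few of the real llama_layer fields. The sketch assumes that a component an architecture does not use is simply left as nullptr, so callers test a pointer before dereferencing it.

```cpp
#include <cassert>

// Stand-in for ggml_tensor; the real type is defined by ggml.
struct demo_tensor {
    int id = 0;
};

// Hypothetical, trimmed-down mirror of llama_layer: every member defaults to
// nullptr, and an architecture fills in only the tensors it actually uses.
struct demo_layer {
    demo_tensor * wq       = nullptr;
    demo_tensor * wk       = nullptr;
    demo_tensor * wv       = nullptr;
    demo_tensor * wo       = nullptr;
    demo_tensor * ffn_gate = nullptr;
    demo_tensor * ffn_down = nullptr;
    demo_tensor * ffn_up   = nullptr;
};

// True when all four attention projections are present.
inline bool has_full_attention(const demo_layer & layer) {
    return layer.wq != nullptr && layer.wk != nullptr &&
           layer.wv != nullptr && layer.wo != nullptr;
}

// True when the layer carries a gated FFN (gate, up, and down projections).
inline bool has_gated_ffn(const demo_layer & layer) {
    return layer.ffn_gate != nullptr && layer.ffn_up != nullptr &&
           layer.ffn_down != nullptr;
}

// Returns the query projection; the caller must have checked that it exists.
inline const demo_tensor & query_weight(const demo_layer & layer) {
    assert(layer.wq != nullptr);
    return *layer.wq;
}
```

One wide struct keeps the layers vector homogeneous across architectures, and the cost is a null check at each use site.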

Usage

Include this header when working with the model structure, accessing layer tensors, or implementing new architectures.

Code Reference

Source Location

Repository: Ollama
File: llama/llama.cpp/src/llama-model.h
Lines: 1-536

Signature

enum llm_type {
    LLM_TYPE_UNKNOWN,
    LLM_TYPE_14M, LLM_TYPE_17M, /* ... */ LLM_TYPE_671B,
    LLM_TYPE_8x7B, LLM_TYPE_8x22B, /* ... MoE types */
};

struct llama_layer {
    struct ggml_tensor * attn_norm       = nullptr;
    struct ggml_tensor * wq              = nullptr;
    struct ggml_tensor * wk              = nullptr;
    struct ggml_tensor * wv              = nullptr;
    struct ggml_tensor * wo              = nullptr;
    struct ggml_tensor * ffn_gate        = nullptr;
    struct ggml_tensor * ffn_down        = nullptr;
    struct ggml_tensor * ffn_up          = nullptr;
    // ... many more tensor pointers for all architectures
};

struct llama_model {
    llama_hparams hparams;
    llama_vocab   vocab;
    std::vector<llama_layer> layers;
    struct ggml_tensor * tok_embd  = nullptr;
    struct ggml_tensor * output    = nullptr;
    // ...
};
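
A size enumeration like llm_type is usually paired with a function that turns an entry into a printable label. The sketch below shows one way that could look. The demo_type enum, demo_type_name, and demo_type_is_moe are hypothetical names introduced for illustration, and the enum is reduced to the entries visible in the signature above; the elided entries are left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for llm_type, reduced to the entries shown above.
enum demo_type {
    DEMO_TYPE_UNKNOWN,
    DEMO_TYPE_14M,
    DEMO_TYPE_17M,
    DEMO_TYPE_671B,
    DEMO_TYPE_8x7B,
    DEMO_TYPE_8x22B,
};

// Illustrative helper: maps an enum entry to a printable label.
inline std::string demo_type_name(demo_type type) {
    switch (type) {
        case DEMO_TYPE_UNKNOWN: return "unknown";
        case DEMO_TYPE_14M:     return "14M";
        case DEMO_TYPE_17M:     return "17M";
        case DEMO_TYPE_671B:    return "671B";
        case DEMO_TYPE_8x7B:    return "8x7B";
        case DEMO_TYPE_8x22B:   return "8x22B";
    }
    // Only reachable for a value outside the enumerators listed above.
    assert(false && "unhandled demo_type");
    return "unknown";
}

// Illustrative helper: true for the mixture-of-experts entries.
inline bool demo_type_is_moe(demo_type type) {
    return type == DEMO_TYPE_8x7B || type == DEMO_TYPE_8x22B;
}
```

Leaving the switch without a default case lets the compiler warn when a new enumerator is added but not handled.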

Import

#include "llama-model.h"

I/O Contract

Inputs

Name	Type	Required	Description
N/A	N/A	N/A	Header defines types only; populated during model loading

Outputs

Name	Type	Description
llama_model	struct	Complete model with hparams, vocab, layers, and tensors
llama_layer	struct	Per-layer tensor pointers for all architecture types
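
Because the header only defines types, the structures start out empty and are filled in during model loading. A rough sketch of that step follows, under stated assumptions: all demo_ types are hypothetical stand-ins, n_layer is assumed to be the hyperparameter that holds the layer count, and populate is an illustrative function, not the loader Ollama or llama.cpp actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for ggml_tensor.
struct demo_tensor {
    int id = 0;
};

// Trimmed-down stand-in for llama_layer.
struct demo_layer {
    demo_tensor * wq = nullptr;
    demo_tensor * wk = nullptr;
};

// Hypothetical stand-in for llama_hparams; n_layer is assumed to hold the layer count.
struct demo_hparams {
    std::size_t n_layer = 0;
};

// Trimmed-down stand-in for llama_model.
struct demo_model {
    demo_hparams            hparams;
    std::vector<demo_layer> layers;
    demo_tensor *           tok_embd = nullptr;
};

// Illustrative loader step: size the layer vector from the hyperparameters,
// then point each layer at tensors owned by `storage`.
// `storage` must outlive the model and hold at least 2 * n_layer + 1 tensors.
inline void populate(demo_model & model, std::vector<demo_tensor> & storage) {
    const std::size_t n_layer = model.hparams.n_layer;
    assert(storage.size() >= 2 * n_layer + 1);

    model.tok_embd = &storage[0];
    model.layers.resize(n_layer);
    for (std::size_t il = 0; il < n_layer; ++il) {
        model.layers[il].wq = &storage[1 + 2 * il];
        model.layers[il].wk = &storage[2 + 2 * il];
    }
}
```

The model holds non-owning pointers in this sketch, so the tensor storage has to stay alive for as long as the model is used.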

Usage Examples

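#include "llama-model.h"

// Illustrative helper (not part of the header): read the data for layer il.
// Assumes `model` is already loaded and il < model.layers.size().
static void inspect_layer(const llama_model & model, size_t il) {
    const auto & hparams = model.hparams;     // model-wide hyperparameters
    const auto & layer   = model.layers[il];  // tensor pointers for layer il
    ggml_tensor * q = layer.wq;               // nullptr if this architecture lacks the tensor
    ggml_tensor * k = layer.wk;
    (void) hparams; (void) q; (void) k;       // silence unused-variable warnings
}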
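
Indexing model.layers directly does no bounds checking, and a tensor pointer may be nullptr for a given architecture. The sketch below shows one defensive way to walk the layers. The demo_ types are hypothetical stand-ins for the real structures, and layer_at and count_layers_with_qk are illustrative helpers that do not exist in llama-model.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for ggml_tensor.
struct demo_tensor {
    int id = 0;
};

// Trimmed-down stand-in for llama_layer.
struct demo_layer {
    demo_tensor * wq = nullptr;
    demo_tensor * wk = nullptr;
};

// Trimmed-down stand-in for llama_model.
struct demo_model {
    std::vector<demo_layer> layers;
};

// Bounds-checked variant of model.layers[il].
inline const demo_layer & layer_at(const demo_model & model, std::size_t il) {
    assert(il < model.layers.size());
    return model.layers[il];
}

// Counts the layers whose Q and K projections are both present.
inline std::size_t count_layers_with_qk(const demo_model & model) {
    std::size_t n = 0;
    for (const demo_layer & layer : model.layers) {
        if (layer.wq != nullptr && layer.wk != nullptr) {
            ++n;
        }
    }
    return n;
}
```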

Related Pages

Principle:Ollama_Ollama_Model_Loading
