Implementation: Ollama Llama Models Registry
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, Model Architecture |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Central header file declaring all model-architecture graph-builder structs for llama.cpp; it serves as the registry of every supported LLM architecture.
Description
Declares over 100 llm_build_* structs, each inheriting from llm_graph_context (or specialized bases like llm_graph_context_mamba, llm_build_rwkv6_base, llm_build_rwkv7_base). Each struct has a constructor taking a model and graph parameters, which builds the ggml computation graph for that specific architecture. Also defines shared base classes for Mamba SSM layers, RWKV6, and RWKV7 recurrent models with their specialized graph-building methods.
Usage
Every new model architecture added to the inference engine requires a declaration in this header, making it the core component of the model dispatch system.
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/models/models.h
- Lines: 1-549
Signature
struct llm_graph_context_mamba : public llm_graph_context {
llm_graph_context_mamba(const llm_graph_params & params);
ggml_tensor * build_mamba_layer(llm_graph_input_rs * inp, ggml_tensor * cur, const llama_model & model, const llama_ubatch & ubatch, int il);
ggml_tensor * build_mamba2_layer(llm_graph_input_rs * inp, ggml_tensor * cur, const llama_model & model, const llama_ubatch & ubatch, int il) const;
};
struct llm_build_rwkv6_base : public llm_graph_context {
ggml_tensor * build_rwkv6_channel_mix(const llama_layer * layer, ggml_tensor * cur, ggml_tensor * x_prev, llm_arch arch) const;
ggml_tensor * build_rwkv6_time_mix(llm_graph_input_rs * inp, ggml_tensor * cur, ggml_tensor * x_prev, const llama_ubatch & ubatch, int il) const;
};
// Over 100 architecture builders:
struct llm_build_llama : public llm_graph_context { /* ... */ };
struct llm_build_gemma : public llm_graph_context { /* ... */ };
struct llm_build_qwen2 : public llm_graph_context { /* ... */ };
struct llm_build_deepseek2 : public llm_graph_context { /* ... */ };
// ... etc.
Import
#include "models.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | const llama_model & | Yes | The loaded model with tensors and hyperparameters |
| params | const llm_graph_params & | Yes | Graph construction parameters |
Outputs
| Name | Type | Description |
|---|---|---|
| ggml graph | ggml_cgraph | Complete computation graph for the architecture |
Usage Examples
#include "models.h"
// Graph builders are invoked through llama_model::build_graph()
// which dispatches to the correct architecture:
auto graph_builder = llm_build_llama(model, params);
// The constructor builds the full graph