Principle:Ollama Ollama ModelArchitecture

From Leeroopedia
Knowledge Sources
Domains: Model Architecture, Registry Pattern
Last Updated: 2025-02-15 00:00 GMT

Overview

Model architecture registration and dispatch is the mechanism by which Ollama defines, registers, and instantiates architecture-specific model implementations. It lets Ollama support a wide variety of LLM architectures through a common interface, with the implementation selected at load time by architecture name.

Core Concepts

Architecture Registry

Each model architecture (Llama, Gemma, Qwen, Mistral, etc.) registers itself with a central registry that maps architecture name strings to constructor functions. When a model is loaded, its metadata specifies the architecture name, and the registry dispatches to the corresponding constructor. This pattern follows the classic factory/registry design, allowing new architectures to be added without modifying the core loading infrastructure.
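The registry pattern described above can be sketched in a few lines of Go. This is a minimal illustration, not Ollama's actual code: the Config type, the Register and New functions, the ErrUnsupported error, and the use of a "general.architecture" metadata key are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Config stands in for metadata read from a model file (hypothetical type).
type Config map[string]string

// Model is a placeholder for the common model interface.
type Model interface {
	Name() string
}

// Constructor builds a Model from its metadata.
type Constructor func(Config) (Model, error)

// registry maps architecture name strings to constructors.
var registry = map[string]Constructor{}

// ErrUnsupported is returned when no constructor is registered for a name.
var ErrUnsupported = errors.New("unsupported model architecture")

// Register adds a constructor. Registering a name twice is treated as a
// programming error in this sketch.
func Register(arch string, c Constructor) {
	if _, dup := registry[arch]; dup {
		panic("architecture already registered: " + arch)
	}
	registry[arch] = c
}

// New reads the architecture name from the metadata and dispatches to the
// matching constructor. The metadata key is an assumption for this example.
func New(cfg Config) (Model, error) {
	arch := cfg["general.architecture"]
	c, ok := registry[arch]
	if !ok {
		return nil, fmt.Errorf("%w: %q", ErrUnsupported, arch)
	}
	return c(cfg)
}

type llama struct{}

func (llama) Name() string { return "llama" }

type gemma struct{}

func (gemma) Name() string { return "gemma" }

// Each architecture registers itself; the loader never names them directly.
func init() {
	Register("llama", func(Config) (Model, error) { return llama{}, nil })
	Register("gemma", func(Config) (Model, error) { return gemma{}, nil })
}

func main() {
	m, err := New(Config{"general.architecture": "llama"})
	fmt.Println(m.Name(), err)

	_, err = New(Config{"general.architecture": "unknown"})
	fmt.Println(err)
}
```

Adding a new architecture in this sketch means adding one more Register call; New itself does not change, which is the property the registry design is meant to provide.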

Model Interface

All architecture implementations conform to a common Model interface that defines the forward pass contract. This interface requires methods for processing input tokens through the model's layers and producing output logits. Additional optional interfaces can be implemented for capabilities like multimodal input processing or architecture-specific cache configuration.
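One common way to express a required interface plus optional capabilities in Go is a type assertion at the call site. The sketch below assumes a simplified Forward signature and a hypothetical MultimodalProcessor interface; the real Ollama interfaces operate on tensors and differ in detail.

```go
package main

import "fmt"

// Model is a simplified stand-in for the common interface:
// token IDs in, one logit per vocabulary entry out.
type Model interface {
	Forward(tokens []int32) ([]float32, error)
}

// MultimodalProcessor is an optional capability (hypothetical name).
type MultimodalProcessor interface {
	EncodeImage(data []byte) ([]float32, error)
}

// textOnly implements only the required interface.
type textOnly struct{ vocab int }

func (m textOnly) Forward(tokens []int32) ([]float32, error) {
	if len(tokens) == 0 {
		return nil, fmt.Errorf("empty input")
	}
	last := int(tokens[len(tokens)-1])
	if last < 0 || last >= m.vocab {
		return nil, fmt.Errorf("token %d out of range", last)
	}
	// Toy forward pass: put all the weight on the last input token.
	logits := make([]float32, m.vocab)
	logits[last] = 1
	return logits, nil
}

// vision reuses the text path and adds the optional capability.
type vision struct{ textOnly }

func (vision) EncodeImage(data []byte) ([]float32, error) {
	// Toy embedding of fixed size; a real model would run a vision encoder.
	return make([]float32, 4), nil
}

// SupportsImages reports whether m implements the optional interface.
func SupportsImages(m Model) bool {
	_, ok := m.(MultimodalProcessor)
	return ok
}

func main() {
	models := []Model{textOnly{vocab: 8}, vision{textOnly{vocab: 8}}}
	for _, m := range models {
		logits, err := m.Forward([]int32{1, 2, 3})
		fmt.Println(len(logits), err, SupportsImages(m))
	}
}
```

The caller depends only on Model and probes for extras, so text-only architectures do not need stub methods for capabilities they lack.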

Architecture-Specific Layers

While all architectures share the same interface, each defines its own layer structure internally. For instance, a Llama model uses RMSNorm with rotary position embeddings and grouped-query attention, while a BERT model uses LayerNorm with absolute position embeddings and bidirectional attention. The architecture registration system allows each to define its layer stack, attention mechanism, and normalization strategy independently.
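To make the normalization difference concrete, the following sketch implements both strategies behind a shared, hypothetical Norm interface. Learned scale and bias parameters are omitted for brevity, and the normFor helper and its epsilon values are illustrative assumptions rather than values taken from any particular model.

```go
package main

import (
	"fmt"
	"math"
)

// Norm is a hypothetical interface each architecture can satisfy differently.
type Norm interface {
	Apply(x []float64) []float64
}

// RMSNorm divides by the root mean square; the mean is not subtracted.
type RMSNorm struct{ Eps float64 }

func (n RMSNorm) Apply(x []float64) []float64 {
	var sumSq float64
	for _, v := range x {
		sumSq += v * v
	}
	scale := 1 / math.Sqrt(sumSq/float64(len(x))+n.Eps)
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v * scale
	}
	return out
}

// LayerNorm subtracts the mean, then divides by the standard deviation.
type LayerNorm struct{ Eps float64 }

func (n LayerNorm) Apply(x []float64) []float64 {
	var mean float64
	for _, v := range x {
		mean += v
	}
	mean /= float64(len(x))

	var variance float64
	for _, v := range x {
		d := v - mean
		variance += d * d
	}
	variance /= float64(len(x))

	scale := 1 / math.Sqrt(variance+n.Eps)
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = (v - mean) * scale
	}
	return out
}

// normFor picks a normalization per architecture (illustrative mapping).
func normFor(arch string) Norm {
	switch arch {
	case "bert":
		return LayerNorm{Eps: 1e-12}
	default:
		return RMSNorm{Eps: 1e-6}
	}
}

func main() {
	x := []float64{1, 3}
	fmt.Println(normFor("llama").Apply(x))
	fmt.Println(normFor("bert").Apply(x))
}
```

For the same input, LayerNorm output is centered on zero while RMSNorm output keeps the sign and relative offset of the input, which is why the two cannot be swapped without retraining.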

Converter Dispatch

The architecture name also drives model conversion. When importing weights from Hugging Face checkpoints in the Safetensors format, the converter must know the architecture in order to map tensor names correctly and apply architecture-specific transformations. Each architecture therefore registers a converter alongside its runtime implementation.
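Tensor-name mapping per architecture could look like the sketch below. The Converter interface, the converters map, the RenameTensor function, and the specific source and target tensor names are illustrative assumptions; only strings.NewReplacer is a real standard-library call.

```go
package main

import (
	"fmt"
	"strings"
)

// Converter supplies old/new name pairs for one architecture (hypothetical).
type Converter interface {
	Replacements() []string
}

type llamaConverter struct{}

// Replacements returns alternating source and target name fragments.
// The pairs here are examples, not a complete mapping.
func (llamaConverter) Replacements() []string {
	return []string{
		"model.embed_tokens", "token_embd",
		"model.layers", "blk",
		"self_attn.q_proj", "attn_q",
		"self_attn.k_proj", "attn_k",
		"self_attn.v_proj", "attn_v",
		"self_attn.o_proj", "attn_output",
		"model.norm", "output_norm",
	}
}

// converters maps the architecture name found in the checkpoint's
// configuration to the converter that understands its tensor layout.
var converters = map[string]Converter{
	"LlamaForCausalLM": llamaConverter{},
}

// RenameTensor dispatches on architecture and rewrites one tensor name.
func RenameTensor(arch, name string) (string, error) {
	c, ok := converters[arch]
	if !ok {
		return "", fmt.Errorf("unsupported architecture %q", arch)
	}
	return strings.NewReplacer(c.Replacements()...).Replace(name), nil
}

func main() {
	names := []string{
		"model.layers.0.self_attn.q_proj.weight",
		"model.norm.weight",
	}
	for _, n := range names {
		renamed, err := RenameTensor("LlamaForCausalLM", n)
		fmt.Println(renamed, err)
	}
}
```

Keeping the name pairs inside each converter means the dispatch code stays generic, mirroring how the runtime registry keeps the loader independent of individual architectures.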

Implementation Notes

Architecture implementations live under model/models/ with each architecture in its own subdirectory (e.g., model/models/llama/, model/models/gemma3/, model/models/qwen2/). The registry in model/models/models.go maps architecture strings to constructors. Converter implementations under convert/ follow a parallel registration pattern with files like convert_llama.go, convert_gemma.go, etc.
