Principle:Ollama Ollama ModelArchitecture
| Knowledge Sources | |
|---|---|
| Domains | Model Architecture, Registry Pattern |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Model Architecture Registration and Dispatch is the mechanism by which Ollama defines, registers, and instantiates architecture-specific model implementations. It lets Ollama support a wide variety of LLM architectures behind a common interface, with the concrete implementation selected dynamically at load time.
Core Concepts
Architecture Registry
Each model architecture (Llama, Gemma, Qwen, Mistral, etc.) registers itself with a central registry that maps architecture name strings to constructor functions. When a model is loaded, its metadata specifies the architecture name, and the registry dispatches to the corresponding constructor. This pattern follows the classic factory/registry design, allowing new architectures to be added without modifying the core loading infrastructure.
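The registry pattern described above can be sketched in a few lines of Go. This is a minimal illustration, not Ollama's actual code: the names Config, Constructor, Register, and New are assumptions made for the example, and the metadata is modeled as a plain string map keyed by "general.architecture".

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the metadata read from a model file.
type Config map[string]string

// Model is a placeholder for the common model interface.
type Model interface {
	Name() string
}

// Constructor builds a Model from its metadata.
type Constructor func(Config) (Model, error)

// registry maps architecture name strings to constructors.
var registry = map[string]Constructor{}

// Register associates an architecture name with a constructor.
// It panics on duplicates so that naming conflicts surface at startup.
func Register(arch string, c Constructor) {
	if _, dup := registry[arch]; dup {
		panic("architecture already registered: " + arch)
	}
	registry[arch] = c
}

// New reads the architecture name from the metadata and dispatches
// to the constructor registered under that name.
func New(cfg Config) (Model, error) {
	arch := cfg["general.architecture"] // assumed metadata key for this sketch
	c, ok := registry[arch]
	if !ok {
		return nil, fmt.Errorf("unsupported architecture %q", arch)
	}
	return c(cfg)
}

// llama is a toy architecture used only to exercise the registry.
type llama struct{}

func (llama) Name() string { return "llama" }

// Each architecture registers itself from its own init function,
// so the loader never needs to name it explicitly.
func init() {
	Register("llama", func(Config) (Model, error) { return llama{}, nil })
}

func main() {
	m, err := New(Config{"general.architecture": "llama"})
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Println("loaded:", m.Name())

	_, err = New(Config{"general.architecture": "unknown"})
	fmt.Println("error:", err)
}
```

Because registration happens in init, adding an architecture in this sketch means adding a new file or package; the New function itself never changes.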
Model Interface
All architecture implementations conform to a common Model interface that defines the forward pass contract. This interface requires methods for processing input tokens through the model's layers and producing output logits. Additional optional interfaces can be implemented for capabilities like multimodal input processing or architecture-specific cache configuration.
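One way to express a required contract plus optional capabilities in Go is a small core interface combined with type assertions. The sketch below is hypothetical: the method names Forward and EncodeImage, the MultimodalProcessor interface, and the flat slice types are assumptions chosen for illustration and do not reflect Ollama's real signatures.

```go
package main

import "fmt"

// Model is a hypothetical forward-pass contract: token IDs in, logits out.
type Model interface {
	Forward(tokens []int32) ([]float32, error)
}

// MultimodalProcessor is a hypothetical optional capability that only
// some architectures implement.
type MultimodalProcessor interface {
	EncodeImage(data []byte) ([]float32, error)
}

// textOnly implements just the required interface.
type textOnly struct{ vocab int }

func (m textOnly) Forward(tokens []int32) ([]float32, error) {
	if len(tokens) == 0 {
		return nil, fmt.Errorf("empty input")
	}
	// Dummy logits: one zero value per vocabulary entry.
	return make([]float32, m.vocab), nil
}

// vision embeds textOnly (inheriting Forward) and adds image support.
type vision struct{ textOnly }

func (vision) EncodeImage(data []byte) ([]float32, error) {
	// Dummy embedding that just records the input size.
	return []float32{float32(len(data))}, nil
}

// SupportsImages reports whether m implements the optional interface.
func SupportsImages(m Model) bool {
	_, ok := m.(MultimodalProcessor)
	return ok
}

func main() {
	models := []Model{textOnly{vocab: 8}, vision{textOnly{vocab: 8}}}
	for _, m := range models {
		logits, err := m.Forward([]int32{1, 2, 3})
		if err != nil {
			fmt.Println("forward failed:", err)
			continue
		}
		fmt.Println(len(logits), SupportsImages(m))
	}
}
```

The caller works against Model everywhere and probes for extra capabilities only where they matter, so text-only architectures carry no stub methods.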
Architecture-Specific Layers
While all architectures share the same interface, each defines its own layer structure internally. For instance, a Llama model uses RMSNorm with rotary position embeddings and grouped-query attention, while a BERT model uses LayerNorm with absolute position embeddings and bidirectional attention. The architecture registration system allows each to define its layer stack, attention mechanism, and normalization strategy independently.
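To make the normalization difference concrete, the following sketch implements the standard RMSNorm and LayerNorm formulas on plain float64 slices. It is a reference-style illustration only; the function names and slice-based signatures are assumptions, and a real implementation would operate on backend tensors instead.

```go
package main

import (
	"fmt"
	"math"
)

// RMSNorm scales x by the reciprocal of its root mean square and applies
// a learned per-element weight. There is no mean subtraction and no bias.
func RMSNorm(x, weight []float64, eps float64) []float64 {
	var sumSq float64
	for _, v := range x {
		sumSq += v * v
	}
	inv := 1 / math.Sqrt(sumSq/float64(len(x))+eps)
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v * inv * weight[i]
	}
	return out
}

// LayerNorm centers x on its mean, divides by its standard deviation,
// then applies a learned weight and bias.
func LayerNorm(x, weight, bias []float64, eps float64) []float64 {
	n := float64(len(x))
	var mean float64
	for _, v := range x {
		mean += v
	}
	mean /= n
	var variance float64
	for _, v := range x {
		d := v - mean
		variance += d * d
	}
	variance /= n
	inv := 1 / math.Sqrt(variance+eps)
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = (v-mean)*inv*weight[i] + bias[i]
	}
	return out
}

func main() {
	x := []float64{1, 2, 3, 4}
	ones := []float64{1, 1, 1, 1}
	zeros := []float64{0, 0, 0, 0}
	fmt.Println("RMSNorm:  ", RMSNorm(x, ones, 1e-6))
	fmt.Println("LayerNorm:", LayerNorm(x, ones, zeros, 1e-6))
}
```

With unit weights and zero bias, the LayerNorm output has zero mean, while the RMSNorm output keeps the sign and relative offset of the input and only rescales it.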
Converter Dispatch
The architecture registry also plays a role during model conversion. When importing weights from the Hugging Face safetensors format, the converter must know the architecture in order to map tensor names correctly and apply architecture-specific transformations. Each architecture registers a converter alongside its runtime implementation.
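The tensor-name mapping step can be sketched as a per-architecture table of string replacements selected by architecture name. Everything here is illustrative: the Converter interface, the converters map, the RenameTensor helper, and the specific name pairs are assumptions for the example and are not taken from Ollama's converter code.

```go
package main

import (
	"fmt"
	"strings"
)

// Converter is a hypothetical per-architecture conversion hook.
type Converter interface {
	// Replacements returns old/new pairs, in the form expected by
	// strings.NewReplacer.
	Replacements() []string
}

type llamaConverter struct{}

func (llamaConverter) Replacements() []string {
	return []string{
		"model.embed_tokens", "token_embd",
		"model.layers", "blk",
		"self_attn.q_proj", "attn_q",
		"self_attn.k_proj", "attn_k",
		"self_attn.v_proj", "attn_v",
		"self_attn.o_proj", "attn_output",
	}
}

// converters maps the architecture string found in the source checkpoint's
// configuration to its converter (illustrative entry only).
var converters = map[string]Converter{
	"LlamaForCausalLM": llamaConverter{},
}

// RenameTensor dispatches on the architecture and rewrites one tensor name.
func RenameTensor(arch, name string) (string, error) {
	c, ok := converters[arch]
	if !ok {
		return "", fmt.Errorf("unsupported architecture %q", arch)
	}
	return strings.NewReplacer(c.Replacements()...).Replace(name), nil
}

func main() {
	name, err := RenameTensor("LlamaForCausalLM", "model.layers.0.self_attn.q_proj.weight")
	if err != nil {
		fmt.Println("rename failed:", err)
		return
	}
	fmt.Println(name)
}
```

Keeping the replacement table on the converter means each architecture owns its naming quirks, and the dispatch function stays unchanged as architectures are added.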
Implementation Notes
Architecture implementations live under model/models/ with each architecture in its own subdirectory (e.g., model/models/llama/, model/models/gemma3/, model/models/qwen2/). The registry in model/models/models.go maps architecture strings to constructors. Converter implementations under convert/ follow a parallel registration pattern with files like convert_llama.go, convert_gemma.go, etc.