Principle:Ollama Ollama Model Architecture Support
| Knowledge Sources | |
|---|---|
| Domains | Model Architecture, Multi-Architecture |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Model Architecture Support covers Ollama's implementations of more than a dozen distinct transformer and non-transformer architectures, each with its own attention mechanism, normalization strategy, position encoding, and feed-forward network configuration.
Core Concepts
Architecture Diversity
Modern LLMs vary significantly in architecture even when they share a transformer backbone. Key differences include attention type (multi-head, grouped-query, multi-query, sliding-window), normalization (pre-norm vs. post-norm placement, RMSNorm vs. LayerNorm), position encoding (rotary, absolute, ALiBi), and feed-forward structure (standard MLP, gated MLP, mixture-of-experts). To support this diversity, each architecture defines its own forward pass while sharing common infrastructure.
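As an illustration of how the attention variants relate, the sketch below treats multi-head, grouped-query, and multi-query attention as one scheme that differs only in the number of key/value heads. The `AttentionConfig` type and its methods are hypothetical names chosen for this example and are not taken from the Ollama codebase.

```go
package main

import "fmt"

// AttentionConfig is a hypothetical type for illustration only.
type AttentionConfig struct {
	NumHeads   int // number of query heads
	NumKVHeads int // number of key/value heads
}

// Kind classifies the attention variant implied by the head counts.
func (c AttentionConfig) Kind() string {
	switch {
	case c.NumKVHeads == c.NumHeads:
		return "multi-head"
	case c.NumKVHeads == 1:
		return "multi-query"
	default:
		return "grouped-query"
	}
}

// KVHeadFor returns the key/value head that query head q reads from.
// It assumes NumHeads is a positive multiple of NumKVHeads.
func (c AttentionConfig) KVHeadFor(q int) int {
	groupSize := c.NumHeads / c.NumKVHeads
	return q / groupSize
}

func main() {
	for _, c := range []AttentionConfig{{8, 8}, {8, 2}, {8, 1}} {
		fmt.Printf("%s: query head 5 -> kv head %d\n", c.Kind(), c.KVHeadFor(5))
	}
}
```

With fewer key/value heads, several query heads share one cached key/value head, which is why grouped-query and multi-query attention reduce KV cache size.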
Supported Architecture Families
Ollama supports architectures including, but not limited to, the following:
- Llama family (Llama, Llama 4, Code Llama) - RMSNorm, rotary embeddings, grouped-query attention
- Gemma family (Gemma 2, Gemma 3, Gemma 3n) - RMSNorm with unique scaling, logit soft-capping
- Qwen family (Qwen 2, Qwen 2.5 VL, Qwen 3, Qwen 3 Next, Qwen 3 VL) - varying attention configurations, vision-language variants
- Mistral family (Mistral 3) - sliding window attention
- DeepSeek family (DeepSeek 2, DeepSeek OCR) - mixture-of-experts with shared experts
- BERT family (BERT, NomicBERT) - bidirectional attention, encoder-only
- Others (Olmo 3, GLM 4 MoE Lite, GLM OCR, GPT-OSS, LFM 2) - specialized configurations
Despite architectural diversity, many components are shared across architectures. The ml/nn package provides reusable layer primitives such as linear projections, embedding lookups, and normalization operations. Architectures compose these primitives into their specific layer configurations, reducing code duplication while preserving architectural fidelity.
Vision-Language Extensions
Several architectures extend the base text transformer with vision encoders (CLIP, SigLIP) or audio encoders to handle multimodal inputs. These extensions add preprocessing pipelines for non-text modalities and cross-attention or projection layers that inject visual/audio features into the language model's representation space.
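The projection approach can be illustrated with a minimal sketch: vision features are mapped into the text embedding dimension and spliced into the token sequence at a placeholder position. The function names, the single-placeholder convention, and the dimensions are assumptions made for this example. Real models differ in how they mark image positions, and cross-attention variants work differently.

```go
package main

import "fmt"

// project maps one vision feature vector into the text embedding space using
// a weight matrix with one row per output dimension.
func project(weight [][]float64, feature []float64) []float64 {
	out := make([]float64, len(weight))
	for i, row := range weight {
		for j, w := range row {
			out[i] += w * feature[j]
		}
	}
	return out
}

// InjectImage replaces the placeholder at index pos in the text embedding
// sequence with the projected vision features, one row per image patch.
func InjectImage(text [][]float64, pos int, patches, weight [][]float64) [][]float64 {
	out := make([][]float64, 0, len(text)-1+len(patches))
	out = append(out, text[:pos]...)
	for _, p := range patches {
		out = append(out, project(weight, p))
	}
	return append(out, text[pos+1:]...)
}

func main() {
	// Index 1 of the text sequence is a hypothetical image placeholder.
	text := [][]float64{{1, 1}, {0, 0}, {2, 2}}
	// Two image patches with vision dimension 3.
	patches := [][]float64{{1, 2, 3}, {4, 5, 6}}
	// Hypothetical projection from vision dimension 3 to text dimension 2.
	weight := [][]float64{{1, 0, 0}, {0, 0, 1}}
	fmt.Println(InjectImage(text, 1, patches, weight))
}
```

After injection the language model sees one sequence of same-width embeddings and does not need to know which rows originated from the image.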
Implementation Notes
Each architecture is implemented in its own subdirectory under model/models/ (e.g., model/models/llama/, model/models/gemma3/, model/models/deepseek2/). The central registry in model/models/models.go maps GGUF architecture metadata strings to the corresponding constructor functions. Converter implementations in convert/ provide architecture-specific weight mapping for each supported model family.