Principle:Ollama Ollama Model Architecture Support

From Leeroopedia
Knowledge Sources
Domains Model Architecture, Multi-Architecture
Last Updated 2025-02-15 00:00 GMT

Overview

Model Architecture Support covers how Ollama implements more than a dozen distinct model architectures, transformer and non-transformer alike. Each architecture has its own attention mechanism, normalization strategy, position encoding, and feed-forward network configuration.

Core Concepts

Architecture Diversity

Modern LLMs vary significantly in architecture even when they share the transformer backbone. Key differences include attention type (multi-head, grouped-query, multi-query, sliding window), normalization (pre-norm vs. post-norm placement, RMSNorm vs. LayerNorm), position encoding (rotary, absolute, ALiBi), and feed-forward structure (standard MLP, gated MLP, mixture-of-experts). Supporting this diversity requires each architecture to define its own forward pass while sharing common infrastructure.
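
The axes of variation listed above can be pictured as fields of a per-architecture configuration. The following Go sketch is illustrative only: ArchConfig, Kind, and KVHeadFor are hypothetical names, not types from the Ollama codebase. It shows that multi-head, grouped-query, and multi-query attention differ only in how many key/value heads the query heads share.

```go
package main

import "fmt"

// ArchConfig is a hypothetical summary of the axes along which
// architectures differ. It is not an Ollama type.
type ArchConfig struct {
	Name       string
	NumHeads   int    // number of query heads
	NumKVHeads int    // number of key/value heads (must divide NumHeads)
	Norm       string // e.g. "rmsnorm" or "layernorm"
	Position   string // e.g. "rotary", "absolute", "alibi"
	FFN        string // e.g. "mlp", "gated-mlp", "moe"
}

// Kind classifies the attention variant from the two head counts.
func (c ArchConfig) Kind() string {
	switch {
	case c.NumKVHeads == c.NumHeads:
		return "multi-head"
	case c.NumKVHeads == 1:
		return "multi-query"
	default:
		return "grouped-query"
	}
}

// KVHeadFor returns the index of the key/value head that query head q
// reads from. Consecutive query heads share one key/value head.
func (c ArchConfig) KVHeadFor(q int) int {
	groupSize := c.NumHeads / c.NumKVHeads
	return q / groupSize
}

func main() {
	cfg := ArchConfig{
		Name: "example", NumHeads: 8, NumKVHeads: 2,
		Norm: "rmsnorm", Position: "rotary", FFN: "gated-mlp",
	}
	fmt.Println(cfg.Name, "uses", cfg.Kind(), "attention")
	for q := 0; q < cfg.NumHeads; q++ {
		fmt.Printf("query head %d -> kv head %d\n", q, cfg.KVHeadFor(q))
	}
}
```

With 8 query heads and 2 key/value heads, each key/value head serves a group of 4 query heads; setting NumKVHeads to 8 or 1 yields the multi-head and multi-query cases from the same formula.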

Supported Architecture Families

Ollama supports architectures including but not limited to:

  • Llama family (Llama, Llama 4, Code Llama) - RMSNorm, rotary embeddings, grouped-query attention
  • Gemma family (Gemma 2, Gemma 3, Gemma 3n) - RMSNorm with unique scaling, logit soft-capping
  • Qwen family (Qwen 2, Qwen 2.5 VL, Qwen 3, Qwen 3 Next, Qwen 3 VL) - varying attention configurations, vision-language variants
  • Mistral family (Mistral 3) - sliding window attention
  • DeepSeek family (DeepSeek 2, DeepSeek OCR) - mixture-of-experts with shared experts
  • BERT family (BERT, NomicBERT) - bidirectional attention, encoder-only
  • Others (Olmo 3, GLM 4 MoE Lite, GLM OCR, GPT-OSS, LFM 2) - specialized configurations
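
Two of the techniques named in the list, RMSNorm and logit soft-capping, have compact textbook definitions. The sketch below implements those generic forms on plain float64 slices; the function names are hypothetical, and it does not reproduce Ollama's tensor code or the Gemma-specific scaling mentioned above.

```go
package main

import (
	"fmt"
	"math"
)

// rmsNorm divides x by its root mean square and then applies a learned
// per-element weight. eps guards against division by zero.
// Assumes len(x) == len(weight) and len(x) > 0.
func rmsNorm(x, weight []float64, eps float64) []float64 {
	var sumSq float64
	for _, v := range x {
		sumSq += v * v
	}
	inv := 1.0 / math.Sqrt(sumSq/float64(len(x))+eps)
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v * inv * weight[i]
	}
	return out
}

// softCap squashes a logit smoothly toward the range [-limit, limit]
// using tanh. Small logits pass through almost unchanged.
func softCap(logit, limit float64) float64 {
	return limit * math.Tanh(logit/limit)
}

func main() {
	x := []float64{3, 4}
	w := []float64{1, 1}
	fmt.Println("rmsNorm:", rmsNorm(x, w, 1e-6))
	fmt.Println("softCap(1, 30):   ", softCap(1, 30))
	fmt.Println("softCap(1000, 30):", softCap(1000, 30))
}
```

Unlike LayerNorm, RMSNorm does not subtract the mean, which is why it needs only a scale weight and no bias.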

Shared Layer Primitives

Despite architectural diversity, many components are shared across architectures. The ml/nn package provides reusable layer primitives such as linear projections, embedding lookups, and normalization operations. Architectures compose these primitives into their specific layer configurations, reducing code duplication while preserving architectural fidelity.
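
As a rough illustration of this composition, the sketch below builds two different feed-forward blocks from one shared projection primitive. The Linear, GatedMLP, and PlainMLP types are hypothetical stand-ins operating on float64 slices; they are not the actual ml/nn API, whose types and signatures are not shown in this page.

```go
package main

import (
	"fmt"
	"math"
)

// Linear is a hypothetical dense projection y = W x without bias.
type Linear struct {
	W [][]float64 // shape [out][in]
}

// Forward multiplies the weight matrix by the input vector.
func (l Linear) Forward(x []float64) []float64 {
	y := make([]float64, len(l.W))
	for i, row := range l.W {
		for j, w := range row {
			y[i] += w * x[j]
		}
	}
	return y
}

// silu is an activation commonly used in gated MLPs.
func silu(v float64) float64 { return v / (1 + math.Exp(-v)) }

// GatedMLP composes three Linear primitives: down(silu(gate(x)) * up(x)).
type GatedMLP struct {
	Gate, Up, Down Linear
}

func (m GatedMLP) Forward(x []float64) []float64 {
	g, u := m.Gate.Forward(x), m.Up.Forward(x)
	h := make([]float64, len(g))
	for i := range g {
		h[i] = silu(g[i]) * u[i]
	}
	return m.Down.Forward(h)
}

// PlainMLP reuses the same primitive in a simpler arrangement:
// down(silu(up(x))).
type PlainMLP struct {
	Up, Down Linear
}

func (m PlainMLP) Forward(x []float64) []float64 {
	h := m.Up.Forward(x)
	for i := range h {
		h[i] = silu(h[i])
	}
	return m.Down.Forward(h)
}

func main() {
	id := Linear{W: [][]float64{{1, 0}, {0, 1}}} // identity weights for the demo
	x := []float64{0, 2}
	fmt.Println("gated:", GatedMLP{Gate: id, Up: id, Down: id}.Forward(x))
	fmt.Println("plain:", PlainMLP{Up: id, Down: id}.Forward(x))
}
```

Both blocks share the projection code; only the wiring differs, which is the kind of reuse the paragraph above describes.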

Vision-Language Extensions

Several architectures extend the base text transformer with vision encoders (CLIP, SigLIP) or audio encoders to handle multimodal inputs. These extensions add preprocessing pipelines for non-text modalities and cross-attention or projection layers that inject visual/audio features into the language model's representation space.
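
The projection route can be sketched as two steps: map each vision feature vector into the text embedding dimension, then place the result in the token embedding sequence. The code below is a minimal sketch under that assumption; project and splice are hypothetical helpers, and real implementations operate on backend tensors and differ per architecture.

```go
package main

import "fmt"

// project maps each vision feature vector (dimension dv) into the text
// embedding space (dimension dt) using weights of shape [dt][dv].
func project(features [][]float64, w [][]float64) [][]float64 {
	out := make([][]float64, len(features))
	for n, f := range features {
		out[n] = make([]float64, len(w))
		for i, row := range w {
			for j, wij := range row {
				out[n][i] += wij * f[j]
			}
		}
	}
	return out
}

// splice inserts the projected image embeddings into the token embedding
// sequence at position at and returns a new sequence.
func splice(text, image [][]float64, at int) [][]float64 {
	seq := make([][]float64, 0, len(text)+len(image))
	seq = append(seq, text[:at]...)
	seq = append(seq, image...)
	seq = append(seq, text[at:]...)
	return seq
}

func main() {
	// Two image patches with 3 features each, projected to a 2-dim text space.
	patches := [][]float64{{1, 2, 3}, {4, 5, 6}}
	w := [][]float64{{1, 0, 0}, {0, 1, 1}}
	imageEmb := project(patches, w)

	// Two text token embeddings; the image goes between them.
	textEmb := [][]float64{{0, 0}, {9, 9}}
	fmt.Println(splice(textEmb, imageEmb, 1))
}
```

After splicing, the language model sees one sequence of same-width vectors and needs no further knowledge of which positions came from an image.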

Implementation Notes

Each architecture is implemented in its own subdirectory under model/models/ (e.g., model/models/llama/, model/models/gemma3/, model/models/deepseek2/). The central registry in model/models/models.go maps GGUF architecture metadata strings to the corresponding constructor functions. Converter implementations in convert/ provide architecture-specific weight mapping for each supported model family.
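
A registry of this kind is commonly built as a map from architecture string to constructor, populated from each package's init function. The sketch below shows that general pattern only; Model, Register, New, and ErrUnsupported are hypothetical names chosen for illustration, and the actual signatures in Ollama may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Model is a hypothetical minimal interface for a loaded architecture.
type Model interface {
	Name() string
}

// constructor builds a Model; real constructors would take model metadata.
type constructor func() (Model, error)

// registry maps an architecture string (as read from GGUF metadata)
// to the constructor for that architecture.
var registry = map[string]constructor{}

// ErrUnsupported is returned when no constructor is registered.
var ErrUnsupported = errors.New("unsupported architecture")

// Register adds a constructor and panics on duplicates so that
// conflicting registrations are caught at startup.
func Register(arch string, fn constructor) {
	if _, dup := registry[arch]; dup {
		panic("architecture already registered: " + arch)
	}
	registry[arch] = fn
}

// New looks up the architecture string and invokes its constructor.
func New(arch string) (Model, error) {
	fn, ok := registry[arch]
	if !ok {
		return nil, fmt.Errorf("%w: %q", ErrUnsupported, arch)
	}
	return fn()
}

// llama is a placeholder architecture used only for this demo.
type llama struct{}

func (llama) Name() string { return "llama" }

// Each architecture package would register itself in its own init.
func init() {
	Register("llama", func() (Model, error) { return llama{}, nil })
}

func main() {
	if m, err := New("llama"); err == nil {
		fmt.Println("loaded:", m.Name())
	}
	if _, err := New("not-a-real-arch"); errors.Is(err, ErrUnsupported) {
		fmt.Println("error:", err)
	}
}
```

Registering from init keeps the lookup code unaware of individual architectures: adding a model family means adding a package and importing it, not editing a central switch statement.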
