Implementation:Ollama Ollama Llama Model Arctic
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, Model Architecture |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the ggml computation graph builder for the Snowflake Arctic Mixture-of-Experts architecture.
Description
The llm_build_arctic constructor builds the forward graph for a decoder-only transformer. It uses RoPE positional encoding, RMS-normalized self-attention with standard Q/K/V projections, and a Mixture-of-Experts feed-forward path built with build_moe_ffn in every transformer block. Arctic is a dense-MoE hybrid: each layer contains both a dense FFN and a sparse MoE FFN, and their outputs are combined through residual addition. The MoE branch does not replace the dense FFN.
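To make the "dense FFN plus sparse MoE FFN, combined by residual addition" wiring concrete, here is a toy numeric sketch in plain C++. It is not ggml code and not the contents of arctic.cpp. Every name (`rms_norm`, `route_top_k`, `dense_moe_stage`) is hypothetical, each FFN is reduced to a single gain, and two choices are assumptions made only for this sketch: the router uses softmax followed by top-k with renormalized weights, and the MoE branch normalizes the layer input `x`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Toy sketch of one dense-MoE hybrid FFN stage (not ggml, not arctic.cpp).
// Every "FFN" is reduced to a per-branch gain so only the wiring is visible.
using Vec = std::vector<float>;

// RMSNorm without a learned scale: y = x / sqrt(mean(x^2) + eps).
Vec rms_norm(const Vec& x, float eps = 1e-5f) {
    assert(!x.empty());
    float ss = 0.0f;
    for (float v : x) ss += v * v;
    const float inv = 1.0f / std::sqrt(ss / static_cast<float>(x.size()) + eps);
    Vec y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] * inv;
    return y;
}

// Softmax over router logits, keep the top-k experts, renormalize their
// weights to sum to 1 (an assumed gating choice for this sketch).
std::vector<std::pair<std::size_t, float>> route_top_k(const Vec& logits, std::size_t k) {
    assert(k > 0 && k <= logits.size());
    const float mx = *std::max_element(logits.begin(), logits.end());
    Vec p(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < p.size(); ++i) {
        p[i] = std::exp(logits[i] - mx);
        sum += p[i];
    }
    for (float& v : p) v /= sum;

    std::vector<std::size_t> idx(p.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k), idx.end(),
                      [&p](std::size_t a, std::size_t b) { return p[a] > p[b]; });

    float kept = 0.0f;
    for (std::size_t j = 0; j < k; ++j) kept += p[idx[j]];
    std::vector<std::pair<std::size_t, float>> out;
    for (std::size_t j = 0; j < k; ++j) out.emplace_back(idx[j], p[idx[j]] / kept);
    return out;
}

// x: residual stream entering the layer; attn_out: attention output.
// ASSUMPTION: the MoE branch normalizes the layer input x, running beside
// attention + dense FFN; check arctic.cpp for the tensor it really uses.
Vec dense_moe_stage(const Vec& x, const Vec& attn_out, float dense_gain,
                    const std::vector<float>& expert_gain,
                    const std::vector<Vec>& router_w, std::size_t k) {
    const std::size_t n = x.size();
    assert(attn_out.size() == n && router_w.size() == expert_gain.size());

    Vec ffn_inp(n);  // attention residual
    for (std::size_t i = 0; i < n; ++i) ffn_inp[i] = x[i] + attn_out[i];

    const Vec xd = rms_norm(ffn_inp);  // dense FFN branch plus its residual
    Vec out(n);
    for (std::size_t i = 0; i < n; ++i) out[i] = ffn_inp[i] + dense_gain * xd[i];

    const Vec xm = rms_norm(x);  // sparse MoE branch (see ASSUMPTION)
    Vec logits(expert_gain.size());
    for (std::size_t e = 0; e < logits.size(); ++e) {
        assert(router_w[e].size() == n);
        logits[e] = std::inner_product(router_w[e].begin(), router_w[e].end(), xm.begin(), 0.0f);
    }
    // Residual addition: weighted expert outputs are summed onto the dense output.
    for (const auto& ew : route_top_k(logits, k))
        for (std::size_t i = 0; i < n; ++i) out[i] += ew.second * expert_gain[ew.first] * xm[i];
    return out;
}
```

The dense branch always runs, while only the `k` selected experts contribute per token. With `k` much smaller than the number of experts, that selection is where the sparsity comes from.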
Usage
Enables Ollama to run Snowflake Arctic MoE models through the llama.cpp inference engine.
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/models/arctic.cpp
- Lines: 1-138
Signature
```cpp
llm_build_arctic::llm_build_arctic(
        const llama_model & model,
        const llm_graph_params & params) : llm_graph_context(params) {
    // graph construction happens in the constructor body
}
```
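The member initializer `: llm_graph_context(params)` hands the parameters to the base class, and the constructor body then performs the whole graph construction. The "output" is therefore a graph left inside the object, not a return value. The toy sketch below illustrates that constructor-as-builder pattern. `toy_graph_params`, `toy_graph_context` and `toy_arctic_builder` are hypothetical stand-ins. The graph is modeled as an ordered list of node labels instead of a ggml_cgraph, and the labels are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for llm_graph_params / llm_graph_context; the
// "graph" is an ordered list of node labels instead of a ggml_cgraph.
struct toy_graph_params {
    int n_layer;
};

struct toy_graph_context {
    explicit toy_graph_context(const toy_graph_params& p) : params(p) {
        assert(params.n_layer >= 0);
    }
    virtual ~toy_graph_context() = default;

    toy_graph_params params;
    std::vector<std::string> graph;  // filled in by the derived constructor

protected:
    void add_node(const std::string& name) { graph.push_back(name); }
};

// The derived constructor forwards params to the base (mirroring
// ": llm_graph_context(params)") and builds the whole graph in its body.
struct toy_arctic_builder : toy_graph_context {
    explicit toy_arctic_builder(const toy_graph_params& p) : toy_graph_context(p) {
        add_node("inp_embd");
        for (int il = 0; il < params.n_layer; ++il) {
            const std::string s = std::to_string(il);
            add_node("attn_norm-" + s);
            add_node("attn_out-" + s);
            add_node("ffn_out-" + s);      // dense FFN + residual
            add_node("ffn_moe_out-" + s);  // sparse MoE branch
            add_node("l_out-" + s);        // residual sum of both branches
        }
        add_node("result_norm");
        add_node("result_output");
    }
};
```

Once `toy_arctic_builder b(toy_graph_params{2});` returns, `b.graph` already holds the complete node sequence. No separate "build" call is needed.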
Import
```cpp
#include "models.h"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | const llama_model & | Yes | Loaded model with Arctic MoE weights |
| params | const llm_graph_params & | Yes | Graph construction parameters |
Outputs
| Name | Type | Description |
|---|---|---|
| ggml graph | ggml_cgraph | Complete Arctic MoE forward graph, built into the graph owned by the llm_graph_context base (the constructor itself has no return value) |
Usage Examples
```cpp
// Inside llama_model::build_graph(), where llm is a std::unique_ptr<llm_graph_context>:
case LLM_ARCH_ARCTIC:
    llm = std::make_unique<llm_build_arctic>(*this, params);
    break;
```
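The concrete builder is selected by architecture at run time. The following minimal sketch shows that dispatch. It assumes hypothetical `toy_arch`, `toy_params`, `toy_builder_base` and `toy_build_*` types standing in for llama.cpp's real architecture enum, llm_graph_context and llm_build_* classes. The factory switches on the tag, constructs the matching builder with `std::make_unique`, and returns it through a base-class pointer.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the architecture tag, the polymorphic
// graph-builder base, and two concrete builders.
enum class toy_arch { llama, arctic };

struct toy_params {
    int n_layer;
};

struct toy_builder_base {
    virtual ~toy_builder_base() = default;
    virtual std::string name() const = 0;
};

struct toy_build_llama : toy_builder_base {
    explicit toy_build_llama(const toy_params&) {}  // would build the graph here
    std::string name() const override { return "llama"; }
};

struct toy_build_arctic : toy_builder_base {
    explicit toy_build_arctic(const toy_params&) {}  // would build the graph here
    std::string name() const override { return "arctic"; }
};

// Factory: pick the concrete builder from the architecture tag. Constructing
// it is what builds the graph, so the caller keeps only the base pointer.
std::unique_ptr<toy_builder_base> toy_build_graph(toy_arch arch, const toy_params& p) {
    std::unique_ptr<toy_builder_base> llm;
    switch (arch) {
        case toy_arch::llama:  llm = std::make_unique<toy_build_llama>(p);  break;
        case toy_arch::arctic: llm = std::make_unique<toy_build_arctic>(p); break;
    }
    assert(llm != nullptr);  // every toy_arch value is handled above
    return llm;
}
```

Returning a base-class `std::unique_ptr` keeps the caller independent of the selected architecture. Supporting another model means adding one `case`, not changing the caller.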