Implementation:Ollama Ollama Convert Qwen3

Knowledge Sources	Ollama
Domains	Model Conversion, GGUF Format
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the Qwen3 architecture, supporting both dense and MoE variants with dynamic architecture naming, fused expert tensor splitting, and multiple RoPE scaling modes.

Description

The qwen3Model struct implements ModelConverter and chooses its architecture name dynamically: qwen3 for dense models and qwen3moe for MoE models, selected when NumExperts > 0. KV metadata is written with unprefixed keys, which the accessor prefixes with the architecture name automatically. The converter supports QK normalization, head dimension configuration, and the MoE parameters (expert count, experts per token, and norm top-k probability).

The Tensors method handles each fused gate_up_exps expert tensor by splitting it into separate gate and up tensors using splitDim with transposition, and it transposes the dimensions of down_exps. The yarn and mrope RoPE scaling modes are supported. qwen3Model also serves as the base struct for qwen3VLModel.
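
The naming rule and key prefixing described above can be illustrated with a minimal, standalone sketch. It does not reproduce the Ollama source: archName and prefixKeys are hypothetical helpers, the "general." and "tokenizer." exemptions are an assumption about how the accessor treats namespaced keys, and the metadata values are invented for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// archName applies the rule from the description: a positive expert
// count selects the MoE architecture name. (Hypothetical helper.)
func archName(numExperts uint32) string {
	if numExperts > 0 {
		return "qwen3moe"
	}
	return "qwen3"
}

// prefixKeys is a hypothetical stand-in for the accessor's auto-prefixing.
// Assumption: keys already under "general.", "tokenizer.", or the
// architecture namespace are left alone; all others gain "<arch>.".
func prefixKeys(arch string, kv map[string]any) map[string]any {
	out := make(map[string]any, len(kv))
	for k, v := range kv {
		if strings.HasPrefix(k, "general.") ||
			strings.HasPrefix(k, "tokenizer.") ||
			strings.HasPrefix(k, arch+".") {
			out[k] = v
			continue
		}
		out[arch+"."+k] = v
	}
	return out
}

func main() {
	arch := archName(128) // illustrative expert count, not from a real model
	kv := prefixKeys(arch, map[string]any{
		"general.architecture": arch,
		"block_count":          uint32(48),  // illustrative value
		"expert_count":         uint32(128), // illustrative value
	})

	// Print keys in sorted order so the output is deterministic.
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, "=", kv[k])
	}
}
```

Writing unprefixed keys keeps the KV method identical for both variants; only the architecture string changes between dense and MoE output.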

Usage

The converter is invoked automatically when the model's architecture matches Qwen3ForCausalLM or Qwen3MoeForCausalLM.

Code Reference

Source Location

Repository: Ollama
File: convert/convert_qwen3.go
Lines: 1-157

Signature

type qwen3Model struct {
    ModelParameters
    MaxPositionEmbeddings uint32  `json:"max_position_embeddings"`
    HiddenSize            uint32  `json:"hidden_size"`
    HiddenLayers          uint32  `json:"num_hidden_layers"`
    HeadDim               uint32  `json:"head_dim"`
    NumExperts            uint32  `json:"num_experts"`
    NumExpertsPerToken    uint32  `json:"num_experts_per_tok"`
    RopeTheta             float32 `json:"rope_theta"`
    RMSNormEPS            float32 `json:"rms_norm_eps"`
}

func (q *qwen3Model) KV(t *Tokenizer) KV
func (q *qwen3Model) Tensors(ts []Tensor) []*ggml.Tensor
func (q *qwen3Model) Replacements() []string

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name	Type	Required	Description
t	*Tokenizer	Yes	Tokenizer data for GGUF metadata
ts	[]Tensor	Yes	Source tensors including fused gate-up expert tensors

Outputs

Name	Type	Description
KV	KV	GGUF metadata with qwen3/qwen3moe architecture and unprefixed keys
Tensors	[]*ggml.Tensor	Converted tensors with split gate/up experts and transposed down experts

Usage Examples

// Converter registered for Qwen3 (dense and MoE)
// Architecture is "qwen3" for dense, "qwen3moe" for MoE
// ffn_gate_up_exps is split into ffn_gate_exps + ffn_up_exps with transpose
// ffn_down_exps dimensions are transposed [E, I, H] -> [E, H, I]
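
The split and transpose steps noted in the comments above can be sketched on flat, row-major slices. This is an illustration under stated assumptions, not the converter's code: splitLast and transposeLast2 are hypothetical helpers (the real converter uses splitDim), and the fused layout [E, H, 2I], with gate in the first I columns and up in the last I, is assumed here because the page does not state it.

```go
package main

import "fmt"

// transposeLast2 swaps the last two axes of a row-major [e, r, c] tensor
// and returns a row-major [e, c, r] tensor. (Hypothetical helper.)
func transposeLast2(src []float32, e, r, c int) []float32 {
	dst := make([]float32, len(src))
	for x := 0; x < e; x++ {
		for i := 0; i < r; i++ {
			for j := 0; j < c; j++ {
				dst[x*r*c+j*r+i] = src[x*r*c+i*c+j]
			}
		}
	}
	return dst
}

// splitLast splits a row-major [e, r, 2*half] tensor along its last axis
// into two [e, r, half] tensors. (Hypothetical helper.)
func splitLast(src []float32, e, r, half int) (first, second []float32) {
	first = make([]float32, 0, e*r*half)
	second = make([]float32, 0, e*r*half)
	for row := 0; row < e*r; row++ {
		base := row * 2 * half
		first = append(first, src[base:base+half]...)
		second = append(second, src[base+half:base+2*half]...)
	}
	return first, second
}

func main() {
	// Tiny illustrative sizes: E experts, hidden size H, intermediate size I.
	const E, H, I = 1, 2, 3

	// Assumed fused layout: [E, H, 2I], gate columns first, then up columns.
	fused := make([]float32, E*H*2*I)
	for i := range fused {
		fused[i] = float32(i)
	}

	gate, up := splitLast(fused, E, H, I)    // each [E, H, I]
	gateT := transposeLast2(gate, E, H, I)   // [E, I, H]
	upT := transposeLast2(up, E, H, I)       // [E, I, H]
	fmt.Println("gate:", gateT, "up:", upT)

	// down_exps: [E, I, H] -> [E, H, I], as in the comment above.
	down := []float32{0, 1, 2, 3, 4, 5} // [E=1, I=3, H=2]
	fmt.Println("down:", transposeLast2(down, E, I, H))
}
```

Splitting before transposing keeps each half contiguous along the axis being cut, so the two operations stay simple index loops.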

Related Pages

Principle:Ollama_Ollama_GGUF_Model_Conversion_Qwen3
