Implementation:Ollama Ollama Convert Qwen3

From Leeroopedia
Knowledge Sources
Domains Model Conversion, GGUF Format
Last Updated 2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the Qwen3 architecture, supporting both dense and MoE variants with dynamic architecture naming, fused expert tensor splitting, and multiple RoPE scaling modes.

Description

The qwen3Model struct implements ModelConverter. Its architecture name is chosen dynamically: qwen3 for dense models and qwen3moe for MoE models, where a model is treated as MoE when NumExperts > 0. KV metadata is written with unprefixed keys, which the accessor prefixes automatically. The converter supports QK normalization, head dimension configuration, and MoE parameters (expert count, experts per token, and norm top-k probability). The Tensors method splits fused gate_up_exps expert tensors into separate gate and up tensors using splitDim with transposition, and transposes the dimensions of down_exps. The yarn and mrope RoPE scaling modes are supported. The struct also serves as the base struct for qwen3VLModel.
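The naming rule and the key prefixing described above can be sketched in a few lines of Go. This is an illustration only, not the converter's code: the config type, the archName and resolveKey helpers, the example key names, and the rule that general. and tokenizer. keys are left unprefixed are all assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// config is a hypothetical stand-in that keeps only the field driving the name.
type config struct {
	NumExperts uint32
}

// archName applies the rule from the description: MoE when NumExperts > 0.
func archName(c config) string {
	if c.NumExperts > 0 {
		return "qwen3moe"
	}
	return "qwen3"
}

// resolveKey sketches accessor-side prefixing (assumed behavior): keys that
// already live in a global namespace are returned as-is, everything else is
// prefixed with the architecture name.
func resolveKey(arch, key string) string {
	if strings.HasPrefix(key, "general.") || strings.HasPrefix(key, "tokenizer.") {
		return key
	}
	return arch + "." + key
}

func main() {
	moe := config{NumExperts: 128}
	// "expert_count" and "block_count" are illustrative key names.
	fmt.Println(resolveKey(archName(moe), "expert_count"))     // qwen3moe.expert_count
	fmt.Println(resolveKey(archName(config{}), "block_count")) // qwen3.block_count
}
```

Keeping the keys unprefixed in the converter means the same KV-building code can serve both the dense and MoE variants, since only the architecture string differs.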

Usage

The converter is invoked automatically when the model's architecture matches Qwen3ForCausalLM or Qwen3MoeForCausalLM.
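A minimal sketch of this kind of architecture-based dispatch follows. The converterFor function is hypothetical and returns the converter's name as a string so the example stays self-contained; the actual selection logic lives elsewhere in the convert package and is not shown on this page.

```go
package main

import "fmt"

// converterFor is a hypothetical dispatcher: it maps the architecture string
// from a model's config to the name of the converter that would handle it.
func converterFor(architecture string) (string, error) {
	switch architecture {
	case "Qwen3ForCausalLM", "Qwen3MoeForCausalLM":
		// Dense and MoE variants share one converter struct.
		return "qwen3Model", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", architecture)
	}
}

func main() {
	for _, arch := range []string{"Qwen3ForCausalLM", "Qwen3MoeForCausalLM", "SomethingElse"} {
		name, err := converterFor(arch)
		fmt.Println(arch, name, err)
	}
}
```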

Code Reference

Source Location

  • Repository: Ollama
  • File: convert/convert_qwen3.go
  • Lines: 1-157

Signature

type qwen3Model struct {
    ModelParameters
    MaxPositionEmbeddings uint32  `json:"max_position_embeddings"`
    HiddenSize            uint32  `json:"hidden_size"`
    HiddenLayers          uint32  `json:"num_hidden_layers"`
    HeadDim               uint32  `json:"head_dim"`
    NumExperts            uint32  `json:"num_experts"`
    NumExpertsPerToken    uint32  `json:"num_experts_per_tok"`
    RopeTheta             float32 `json:"rope_theta"`
    RMSNormEPS            float32 `json:"rms_norm_eps"`
}

func (q *qwen3Model) KV(t *Tokenizer) KV
func (q *qwen3Model) Tensors(ts []Tensor) []*ggml.Tensor
func (q *qwen3Model) Replacements() []string

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name | Type       | Required | Description
t    | *Tokenizer | Yes      | Tokenizer data for GGUF metadata
ts   | []Tensor   | Yes      | Source tensors including fused gate-up expert tensors

Outputs

Name           | Type  | Description
KV             | KV    | GGUF metadata with qwen3/qwen3moe architecture and unprefixed keys
[]*ggml.Tensor | slice | Converted tensors with split gate/up experts and transposed down experts

Usage Examples

// Converter registered for Qwen3 (dense and MoE)
// Architecture is "qwen3" for dense, "qwen3moe" for MoE
// ffn_gate_up_exps is split into ffn_gate_exps + ffn_up_exps with transpose
// ffn_down_exps dimensions are transposed [E, I, H] -> [E, H, I]
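The split and transpose noted in the comments above can be illustrated on a tiny tensor. This sketch does not use the converter's splitDim helper or its tensor types: tensor3, splitLast, and swapLastTwo are hypothetical, and the layout (row-major, fused shape [E, H, 2*I] with the gate half first) and the split-then-transpose order are assumptions made for the example.

```go
package main

import "fmt"

// tensor3 is a hypothetical dense, row-major 3-D tensor used only here.
type tensor3 struct {
	shape [3]int
	data  []float32
}

// at returns the element at index (i, j, k).
func (t tensor3) at(i, j, k int) float32 {
	return t.data[(i*t.shape[1]+j)*t.shape[2]+k]
}

// splitLast cuts t into two equal halves along its last axis.
func splitLast(t tensor3) (first, second tensor3) {
	e, h, n := t.shape[0], t.shape[1], t.shape[2]
	if n%2 != 0 {
		panic("last axis must have even length")
	}
	half := n / 2
	first = tensor3{shape: [3]int{e, h, half}}
	second = tensor3{shape: [3]int{e, h, half}}
	for r := 0; r < e*h; r++ {
		row := t.data[r*n : (r+1)*n]
		first.data = append(first.data, row[:half]...)
		second.data = append(second.data, row[half:]...)
	}
	return first, second
}

// swapLastTwo returns a copy of t with its last two axes exchanged,
// for example [E, I, H] -> [E, H, I].
func swapLastTwo(t tensor3) tensor3 {
	e, r, c := t.shape[0], t.shape[1], t.shape[2]
	out := tensor3{shape: [3]int{e, c, r}, data: make([]float32, len(t.data))}
	for i := 0; i < e; i++ {
		for j := 0; j < r; j++ {
			for k := 0; k < c; k++ {
				out.data[(i*c+k)*r+j] = t.at(i, j, k)
			}
		}
	}
	return out
}

func main() {
	// One expert, H = 2, I = 2; each row holds [gate values..., up values...].
	fused := tensor3{shape: [3]int{1, 2, 4}, data: []float32{1, 2, 10, 20, 3, 4, 30, 40}}
	gate, up := splitLast(fused)
	fmt.Println(swapLastTwo(gate).data, swapLastTwo(up).data) // [1 3 2 4] [10 30 20 40]
}
```

The same swapLastTwo helper covers the down-expert case, where only the axis exchange is needed and no split takes place.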
