Implementation:Ollama Ollama Convert Qwen3Next

Knowledge Sources	Ollama
Domains	Model Conversion, GGUF Format
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the Qwen3Next hybrid architecture, handling MoE expert tensor merging, linear attention (Gated Delta Net) parameters, and RoPE NeoX reordering.

Description

The qwen3NextModel struct implements ModelConverter for converting Qwen3Next models to GGUF format. The architecture is a hybrid that combines full attention layers with linear attention (Gated Delta Net) layers, with the spacing between full attention layers set by full_attention_interval. The KV method emits metadata for MoE parameters (expert count, shared experts, intermediate sizes), linear attention parameters (conv kernel dim, key/value head dims, number of heads), the partial rotary factor, and YaRN RoPE scaling. The Tensors method splits fused gate-up expert tensors, transposes down expert tensors, squeezes short convolution weights, reorders Q/K weights into NeoX layout via the normalToNeoXRepacker, and merges per-expert tensors into consolidated expert weight matrices.
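The expert tensor handling described above can be pictured with a small, self-contained sketch. It is not the code from convert_qwen3next.go: the helper names (splitGateUp, transpose, stackExperts) are hypothetical, the data is assumed to be row-major float32, and the fused layout is assumed to place the gate rows before the up rows.

```go
package main

import "fmt"

// splitGateUp splits a fused row-major [2*inter, hidden] matrix into
// gate and up halves of shape [inter, hidden] each.
// Assumption: gate rows come first, followed by up rows.
func splitGateUp(fused []float32, inter, hidden int) (gate, up []float32) {
	n := inter * hidden
	// Cap the first slice so later appends cannot overwrite the second half.
	return fused[:n:n], fused[n : 2*n]
}

// transpose returns the [cols, rows] transpose of a row-major [rows, cols] matrix.
func transpose(m []float32, rows, cols int) []float32 {
	out := make([]float32, len(m))
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			out[c*rows+r] = m[r*cols+c]
		}
	}
	return out
}

// stackExperts concatenates equally sized per-expert matrices into one
// contiguous [numExperts, rows, cols] tensor.
func stackExperts(experts [][]float32) []float32 {
	var out []float32
	for _, e := range experts {
		out = append(out, e...)
	}
	return out
}

func main() {
	fused := []float32{1, 2, 3, 4, 5, 6, 7, 8} // inter=2, hidden=2
	gate, up := splitGateUp(fused, 2, 2)
	fmt.Println("gate:", gate, "up:", up)

	down := []float32{1, 2, 3, 4, 5, 6} // one expert, shape [2, 3]
	fmt.Println("down^T:", transpose(down, 2, 3))

	// Two toy experts merged into a single [2, 1, 2] tensor.
	fmt.Println("merged:", stackExperts([][]float32{{1, 2}, {3, 4}}))
}
```

Stacking along a leading expert dimension lets a runtime index one consolidated tensor per projection instead of looking up a separate tensor per expert.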

Usage

Invoked automatically by the conversion pipeline when the model's architectures field matches Qwen3NextForCausalLM.
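As a rough illustration of this dispatch, the sketch below picks a converter from the first entry of a config's architectures field. The converter interface, the registry map, and the qwen3NextStub type are hypothetical stand-ins rather than Ollama's actual types; only the architecture string Qwen3NextForCausalLM comes from this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// converter is a hypothetical stand-in for the pipeline's converter interface.
type converter interface{ Name() string }

// qwen3NextStub is a placeholder for the real converter type.
type qwen3NextStub struct{}

func (qwen3NextStub) Name() string { return "qwen3next" }

// registry maps an architecture string to a constructor (illustrative only).
var registry = map[string]func() converter{
	"Qwen3NextForCausalLM": func() converter { return qwen3NextStub{} },
}

// pickConverter reads the "architectures" field of a config.json payload
// and returns the matching converter, or an error if none is registered.
func pickConverter(config []byte) (converter, error) {
	var p struct {
		Architectures []string `json:"architectures"`
	}
	if err := json.Unmarshal(config, &p); err != nil {
		return nil, err
	}
	if len(p.Architectures) == 0 {
		return nil, fmt.Errorf("config has no architectures")
	}
	newConv, ok := registry[p.Architectures[0]]
	if !ok {
		return nil, fmt.Errorf("unsupported architecture %q", p.Architectures[0])
	}
	return newConv(), nil
}

func main() {
	c, err := pickConverter([]byte(`{"architectures":["Qwen3NextForCausalLM"]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Name())
}
```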

Code Reference

Source Location

Repository: Ollama
File: convert/convert_qwen3next.go
Lines: 1-512

Signature

type qwen3NextModel struct {
    ModelParameters
    MaxPositionEmbeddings uint32  `json:"max_position_embeddings"`
    HiddenSize            uint32  `json:"hidden_size"`
    NumHiddenLayers       uint32  `json:"num_hidden_layers"`
    // MoE config
    NumExperts             uint32 `json:"num_experts"`
    NumExpertsPerToken     uint32 `json:"num_experts_per_tok"`
    SharedExpertIntermSize uint32 `json:"shared_expert_intermediate_size"`
    // Linear attention config
    FullAttentionInterval uint32 `json:"full_attention_interval"`
    LinearConvKernelDim   uint32 `json:"linear_conv_kernel_dim"`
    LinearKeyHeadDim      uint32 `json:"linear_key_head_dim"`
    PartialRotaryFactor   float32 `json:"partial_rotary_factor"`
    // ...
}

func (m *qwen3NextModel) KV(t *Tokenizer) KV
func (m *qwen3NextModel) Tensors(ts []Tensor) []*ggml.Tensor
func (m *qwen3NextModel) Replacements() []string

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name	Type	Required	Description
t	*Tokenizer	Yes	Tokenizer data for GGUF metadata
ts	[]Tensor	Yes	Source model tensors to convert

Outputs

Name	Type	Description
KV	KV	GGUF key-value metadata for hybrid attention, MoE, and linear attention params
Tensors	[]*ggml.Tensor	Converted tensors with merged experts, split gate-up, and NeoX-reordered Q/K
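
The NeoX reordering named in the outputs table can be sketched as a per-head permutation. This is an illustration, not the normalToNeoXRepacker itself: it assumes the "normal" layout stores rotary pairs interleaved (x0, y0, x1, y1, ...) and that the NeoX layout stores the first elements of all pairs followed by the second elements (x0, x1, ..., y0, y1, ...). The function name toNeoX is hypothetical.

```go
package main

import "fmt"

// toNeoX permutes each head of v from interleaved pairs to split halves.
// v holds numHeads*headDim values; headDim must be even and divide len(v).
func toNeoX(v []float32, headDim int) []float32 {
	out := make([]float32, len(v))
	half := headDim / 2
	for base := 0; base+headDim <= len(v); base += headDim {
		for i := 0; i < half; i++ {
			out[base+i] = v[base+2*i]        // first element of pair i
			out[base+half+i] = v[base+2*i+1] // second element of pair i
		}
	}
	return out
}

func main() {
	// Two heads of dimension 4, with values equal to their original index.
	v := []float32{0, 1, 2, 3, 4, 5, 6, 7}
	fmt.Println(toNeoX(v, 4))
}
```

For a weight matrix, the same permutation would be applied to the rows belonging to each head rather than to single values.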

Usage Examples

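// In-package sketch: qwen3NextModel is unexported, so this only compiles inside package convert.
// The conversion pipeline normally performs these steps automatically for Qwen3Next.
var m qwen3NextModel
if err := json.Unmarshal(configData, &m); err != nil {
    return err
}
kv := m.KV(tokenizer)               // GGUF key-value metadata
tensors := m.Tensors(sourceTensors) // converted tensors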

Related Pages

Principle:Ollama_Ollama_GGUF_Model_Conversion_Qwen3Next
