Implementation:Ollama Ollama Convert Qwen3Next
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, GGUF Format |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the GGUF model converter for the Qwen3Next hybrid architecture, handling MoE expert tensor merging, linear attention (Gated Delta Net) parameters, and RoPE NeoX reordering.
Description
The qwen3NextModel struct implements the ModelConverter interface for converting Qwen3Next models to GGUF format. The architecture is a hybrid that interleaves full attention layers with linear attention (Gated Delta Net) layers at a configurable interval (full_attention_interval). The KV method emits metadata for MoE parameters (expert count, shared experts, intermediate sizes), linear attention parameters (convolution kernel dimension, key and value head dimensions, number of heads), the partial rotary factor, and YaRN RoPE scaling. The Tensors method splits fused gate-up expert tensors, transposes down expert tensors, squeezes short convolution weights, reorders Q/K weights into NeoX layout via the normalToNeoXRepacker, and merges per-expert tensors into consolidated expert weight matrices.
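The expert tensor handling described above can be pictured with a small, self-contained sketch. It is not the converter's code: the helper names splitGateUp and mergeExperts are hypothetical, tensors are modeled as flat row-major float32 slices, and the fused layout (gate rows first, up rows second) is an assumption made for illustration.

```go
package main

import "fmt"

// splitGateUp splits a row-major matrix with rows x cols values into its first
// rows/2 rows (treated here as the gate projection) and its last rows/2 rows
// (treated as the up projection). The half-and-half layout is an assumption.
func splitGateUp(fused []float32, rows, cols int) (gate, up []float32, err error) {
	if rows%2 != 0 || len(fused) != rows*cols {
		return nil, nil, fmt.Errorf("unexpected shape: %d values for %dx%d", len(fused), rows, cols)
	}
	half := rows / 2 * cols
	gate = append([]float32(nil), fused[:half]...)
	up = append([]float32(nil), fused[half:]...)
	return gate, up, nil
}

// mergeExperts stacks per-expert matrices (each rows*cols values, row-major)
// into one contiguous buffer ordered by expert index, which corresponds to a
// tensor of shape [len(experts), rows, cols].
func mergeExperts(experts [][]float32, rows, cols int) ([]float32, error) {
	merged := make([]float32, 0, len(experts)*rows*cols)
	for i, e := range experts {
		if len(e) != rows*cols {
			return nil, fmt.Errorf("expert %d: got %d values, want %d", i, len(e), rows*cols)
		}
		merged = append(merged, e...)
	}
	return merged, nil
}

func main() {
	// Hypothetical 4x2 fused matrix.
	gate, up, err := splitGateUp([]float32{1, 2, 3, 4, 5, 6, 7, 8}, 4, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(gate, up) // prints [1 2 3 4] [5 6 7 8]

	// Two hypothetical experts, each a 1x2 matrix.
	merged, err := mergeExperts([][]float32{{1, 2}, {3, 4}}, 1, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(merged) // prints [1 2 3 4]
}
```

Copying into fresh slices keeps the outputs independent of the source buffer, which is convenient when the source tensor is released after conversion.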
Usage
Invoked automatically by the conversion pipeline when the model's architectures field matches Qwen3NextForCausalLM.
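The architecture-based selection can be sketched as follows. This is an illustrative stand-in, not the pipeline's code: the converter interface, the qwen3NextSketch type, the selectConverter function, and the "qwen3next" label are hypothetical; only the architectures field and the Qwen3NextForCausalLM value come from the text above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// converter is a hypothetical stand-in for the package's ModelConverter interface.
type converter interface {
	Name() string
}

// qwen3NextSketch is a hypothetical placeholder for qwen3NextModel.
type qwen3NextSketch struct{}

func (qwen3NextSketch) Name() string { return "qwen3next" }

// selectConverter reads the "architectures" list from config JSON and picks a
// converter based on its first entry.
func selectConverter(config []byte) (converter, error) {
	var p struct {
		Architectures []string `json:"architectures"`
	}
	if err := json.Unmarshal(config, &p); err != nil {
		return nil, err
	}
	if len(p.Architectures) == 0 {
		return nil, fmt.Errorf("config lists no architectures")
	}
	switch p.Architectures[0] {
	case "Qwen3NextForCausalLM":
		return qwen3NextSketch{}, nil
	default:
		return nil, fmt.Errorf("unsupported architecture %q", p.Architectures[0])
	}
}

func main() {
	c, err := selectConverter([]byte(`{"architectures": ["Qwen3NextForCausalLM"]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Name())
}
```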
Code Reference
Source Location
- Repository: Ollama
- File: convert/convert_qwen3next.go
- Lines: 1-512
Signature
type qwen3NextModel struct {
	ModelParameters
	MaxPositionEmbeddings uint32 `json:"max_position_embeddings"`
	HiddenSize            uint32 `json:"hidden_size"`
	NumHiddenLayers       uint32 `json:"num_hidden_layers"`
	// MoE config
	NumExperts             uint32 `json:"num_experts"`
	NumExpertsPerToken     uint32 `json:"num_experts_per_tok"`
	SharedExpertIntermSize uint32 `json:"shared_expert_intermediate_size"`
	// Linear attention config
	FullAttentionInterval uint32  `json:"full_attention_interval"`
	LinearConvKernelDim   uint32  `json:"linear_conv_kernel_dim"`
	LinearKeyHeadDim      uint32  `json:"linear_key_head_dim"`
	PartialRotaryFactor   float32 `json:"partial_rotary_factor"`
	// ...
}
func (m *qwen3NextModel) KV(t *Tokenizer) KV
func (m *qwen3NextModel) Tensors(ts []Tensor) []*ggml.Tensor
func (m *qwen3NextModel) Replacements() []string
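To show how the JSON tags in the signature map configuration values onto struct fields, here is a minimal sketch using a trimmed copy of the struct. The configSketch type, the parseConfig and isFullAttention helpers, and all numeric values are hypothetical. The layer rule (every interval-th layer uses full attention) is an assumption about how full_attention_interval is interpreted, not something confirmed by this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// configSketch mirrors a few fields and JSON tags from the signature above.
type configSketch struct {
	NumHiddenLayers       uint32  `json:"num_hidden_layers"`
	NumExperts            uint32  `json:"num_experts"`
	FullAttentionInterval uint32  `json:"full_attention_interval"`
	PartialRotaryFactor   float32 `json:"partial_rotary_factor"`
}

// parseConfig decodes config JSON into the trimmed struct.
func parseConfig(raw []byte) (configSketch, error) {
	var c configSketch
	err := json.Unmarshal(raw, &c)
	return c, err
}

// isFullAttention reports whether 0-based layer i uses full attention, under
// the assumption that every interval-th layer is a full attention layer.
func isFullAttention(i, interval uint32) bool {
	return interval > 0 && (i+1)%interval == 0
}

func main() {
	// Made-up values, chosen only to keep the output short.
	raw := []byte(`{
		"num_hidden_layers": 8,
		"num_experts": 4,
		"full_attention_interval": 4,
		"partial_rotary_factor": 0.25
	}`)
	c, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	for i := uint32(0); i < c.NumHiddenLayers; i++ {
		kind := "linear"
		if isFullAttention(i, c.FullAttentionInterval) {
			kind = "full"
		}
		fmt.Printf("layer %d: %s attention\n", i, kind)
	}
}
```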
Import
import "github.com/ollama/ollama/convert"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| t | *Tokenizer | Yes | Tokenizer data for GGUF metadata |
| ts | []Tensor | Yes | Source model tensors to convert |
Outputs
| Name | Type | Description |
|---|---|---|
| KV | KV | GGUF key-value metadata for hybrid attention, MoE, and linear attention params |
| Tensors | []*ggml.Tensor | Converted tensors with merged experts, split gate-up, and NeoX-reordered Q/K |
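The NeoX reordering of Q/K weights listed in the outputs can be illustrated with a row permutation. The sketch below is not the normalToNeoXRepacker itself, whose exact behavior is not shown on this page. It assumes that, within each head, the rotated rows are stored as interleaved pairs (0,1), (2,3), ... and are moved to split-half order (0, 2, 4, ..., 1, 3, 5, ...), and that rows beyond the rotary dimension stay in place. The names neoxRowOrder and reorderRows are hypothetical.

```go
package main

import "fmt"

// neoxRowOrder returns, for one head, the source row index for each
// destination row. The first rotaryDim rows go from pair-interleaved order to
// split-half order; the remaining rows keep their position.
func neoxRowOrder(headDim, rotaryDim int) ([]int, error) {
	if rotaryDim < 0 || rotaryDim > headDim || rotaryDim%2 != 0 {
		return nil, fmt.Errorf("invalid rotaryDim %d for headDim %d", rotaryDim, headDim)
	}
	order := make([]int, headDim)
	half := rotaryDim / 2
	for i := 0; i < half; i++ {
		order[i] = 2 * i        // even source rows fill the first half
		order[half+i] = 2*i + 1 // odd source rows fill the second half
	}
	for i := rotaryDim; i < headDim; i++ {
		order[i] = i
	}
	return order, nil
}

// reorderRows applies order to every head of a row-major matrix with
// heads*headDim rows and cols columns, returning a new buffer.
func reorderRows(w []float32, heads, headDim, cols int, order []int) ([]float32, error) {
	if len(w) != heads*headDim*cols || len(order) != headDim {
		return nil, fmt.Errorf("shape mismatch")
	}
	out := make([]float32, len(w))
	for h := 0; h < heads; h++ {
		for dst, src := range order {
			d := (h*headDim + dst) * cols
			s := (h*headDim + src) * cols
			copy(out[d:d+cols], w[s:s+cols])
		}
	}
	return out, nil
}

func main() {
	// Hypothetical head of 8 rows where only the first 4 are rotated.
	order, err := neoxRowOrder(8, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(order) // prints [0 2 1 3 4 5 6 7]
}
```

Computing the permutation once per head shape and reusing it across heads keeps the per-tensor work to a sequence of row copies.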
Usage Examples
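// Invoked automatically during model conversion for Qwen3Next; shown for orientation only.
// qwen3NextModel is unexported, so this fragment is valid only inside package convert.
// configData, tokenizer, and sourceTensors are placeholders supplied by the pipeline.
var m qwen3NextModel
if err := json.Unmarshal(configData, &m); err != nil {
	return err
}
kv := m.KV(tokenizer)               // GGUF key-value metadata
tensors := m.Tensors(sourceTensors) // converted tensors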