Implementation:Ollama Ollama Convert Qwen3Next

From Leeroopedia
Knowledge Sources
Domains Model Conversion, GGUF Format
Last Updated 2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the Qwen3Next hybrid architecture, handling MoE expert tensor merging, linear attention (Gated Delta Net) parameters, and RoPE NeoX reordering.

Description

The qwen3NextModel struct implements ModelConverter for converting Qwen3Next models to GGUF format. This architecture features hybrid attention combining full attention layers with linear (Gated Delta Net) layers at configurable intervals. The converter emits KV metadata for MoE parameters (expert count, shared experts, intermediate sizes), linear attention parameters (conv kernel dim, key/value head dims, number of heads), partial rotary factor, and YaRN RoPE scaling. The Tensors method handles fused gate-up expert tensor splitting, down expert transposition, short convolution weight squeezing, and Q/K weight NeoX reordering via the normalToNeoXRepacker. It also merges per-expert tensors into consolidated expert weight matrices.
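The hybrid layout described above can be pictured with a small sketch. It assumes that, for an interval n, every n-th layer (counting from 1) is a full attention layer and all others are linear (Gated Delta Net) layers. That convention, and the helper name layerKinds, are assumptions made for illustration and are not taken from the converter source.

```go
package main

import "fmt"

// layerKinds labels each layer as "full" or "linear".
// Assumption of this sketch: with interval n, layers n, 2n, 3n, ...
// (1-based) use full attention and every other layer uses linear attention.
// An interval of 0 is treated as "no full attention layers" so that the
// modulo below never divides by zero.
func layerKinds(numLayers, interval uint32) []string {
	kinds := make([]string, numLayers)
	for i := range kinds {
		if interval > 0 && (uint32(i)+1)%interval == 0 {
			kinds[i] = "full"
		} else {
			kinds[i] = "linear"
		}
	}
	return kinds
}

func main() {
	// Prints: [linear linear linear full linear linear linear full]
	fmt.Println(layerKinds(8, 4))
}
```

In this sketch the interval plays the role of the full_attention_interval config field shown in the signature below.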

Usage

Invoked automatically by the conversion pipeline when the model's architectures field matches Qwen3NextForCausalLM.
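As a rough illustration of this dispatch, the sketch below reads the architectures list from a config.json payload and picks a converter name. The helper converterFor and the archProbe type are hypothetical and are not Ollama's actual API; only the architecture string Qwen3NextForCausalLM and the type name qwen3NextModel come from this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// archProbe reads only the "architectures" list from a config.json payload.
type archProbe struct {
	Architectures []string `json:"architectures"`
}

// converterFor is a hypothetical helper: it returns the name of the
// converter type that would handle the first listed architecture.
func converterFor(configJSON []byte) (string, error) {
	var p archProbe
	if err := json.Unmarshal(configJSON, &p); err != nil {
		return "", fmt.Errorf("parse config: %w", err)
	}
	if len(p.Architectures) == 0 {
		return "", fmt.Errorf("config has no architectures entry")
	}
	switch p.Architectures[0] {
	case "Qwen3NextForCausalLM":
		return "qwen3NextModel", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", p.Architectures[0])
	}
}

func main() {
	name, err := converterFor([]byte(`{"architectures": ["Qwen3NextForCausalLM"]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```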

Code Reference

Source Location

  • Repository: Ollama
  • File: convert/convert_qwen3next.go
  • Lines: 1-512

Signature

type qwen3NextModel struct {
    ModelParameters
    MaxPositionEmbeddings uint32  `json:"max_position_embeddings"`
    HiddenSize            uint32  `json:"hidden_size"`
    NumHiddenLayers       uint32  `json:"num_hidden_layers"`
    // MoE config
    NumExperts             uint32 `json:"num_experts"`
    NumExpertsPerToken     uint32 `json:"num_experts_per_tok"`
    SharedExpertIntermSize uint32 `json:"shared_expert_intermediate_size"`
    // Linear attention config
    FullAttentionInterval uint32 `json:"full_attention_interval"`
    LinearConvKernelDim   uint32 `json:"linear_conv_kernel_dim"`
    LinearKeyHeadDim      uint32 `json:"linear_key_head_dim"`
    PartialRotaryFactor   float32 `json:"partial_rotary_factor"`
    // ...
}

func (m *qwen3NextModel) KV(t *Tokenizer) KV
func (m *qwen3NextModel) Tensors(ts []Tensor) []*ggml.Tensor
func (m *qwen3NextModel) Replacements() []string

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name  Type        Required  Description
t     *Tokenizer  Yes       Tokenizer data for GGUF metadata
ts    []Tensor    Yes       Source model tensors to convert

Outputs

Name            Type   Description
KV              KV     GGUF key-value metadata for hybrid attention, MoE, and linear attention params
[]*ggml.Tensor  slice  Converted tensors with merged experts, split gate-up, and NeoX-reordered Q/K
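
To make the "split gate-up" and "merged experts" outputs more concrete, here is a minimal sketch on plain row-major float32 buffers. The matrix type, the helper names, the gate-first row ordering of the fused tensor, and the [expert][row][col] layout of the merged buffer are all assumptions for illustration; the real converter works on its own tensor types and may lay the data out differently.

```go
package main

import "fmt"

// matrix is a hypothetical row-major 2-D tensor used only in this sketch.
type matrix struct {
	rows, cols int
	data       []float32
}

// splitGateUp splits a fused matrix with 2*n rows into a gate half (first n
// rows) and an up half (last n rows). The gate-first order is an assumption.
// The returned halves share memory with the fused input.
func splitGateUp(fused matrix) (gate, up matrix, err error) {
	if fused.rows%2 != 0 || len(fused.data) != fused.rows*fused.cols {
		return matrix{}, matrix{}, fmt.Errorf("unexpected fused shape %dx%d", fused.rows, fused.cols)
	}
	n := fused.rows / 2
	mid := n * fused.cols
	gate = matrix{rows: n, cols: fused.cols, data: fused.data[:mid]}
	up = matrix{rows: n, cols: fused.cols, data: fused.data[mid:]}
	return gate, up, nil
}

// mergeExperts concatenates per-expert matrices of identical shape into one
// new buffer laid out as [expert][row][col].
func mergeExperts(experts []matrix) ([]float32, error) {
	if len(experts) == 0 {
		return nil, fmt.Errorf("no experts to merge")
	}
	rows, cols := experts[0].rows, experts[0].cols
	merged := make([]float32, 0, len(experts)*rows*cols)
	for i, e := range experts {
		if e.rows != rows || e.cols != cols {
			return nil, fmt.Errorf("expert %d has shape %dx%d, want %dx%d", i, e.rows, e.cols, rows, cols)
		}
		merged = append(merged, e.data...)
	}
	return merged, nil
}

func main() {
	fused := matrix{rows: 4, cols: 2, data: []float32{1, 2, 3, 4, 5, 6, 7, 8}}
	gate, up, err := splitGateUp(fused)
	if err != nil {
		panic(err)
	}
	merged, err := mergeExperts([]matrix{gate, up})
	if err != nil {
		panic(err)
	}
	fmt.Println(gate.data, up.data, len(merged))
}
```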

Usage Examples

// Inside package convert (qwen3NextModel is unexported); the pipeline runs
// these steps automatically for Qwen3Next checkpoints.
m := &qwen3NextModel{}
if err := json.Unmarshal(configData, m); err != nil {
    return err
}
kv := m.KV(tokenizer)
tensors := m.Tensors(sourceTensors)
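
Because the converter type is unexported, code outside the convert package cannot construct it directly. The standalone sketch below mirrors a few of the JSON tags from the signature in a local, hypothetical struct (qwen3NextConfig) to show how such a config decodes. The numeric values are made up for illustration and do not come from a real checkpoint.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// qwen3NextConfig is a local, hypothetical mirror of some fields shown in
// the signature; it is not the converter's own type.
type qwen3NextConfig struct {
	HiddenSize            uint32  `json:"hidden_size"`
	NumHiddenLayers       uint32  `json:"num_hidden_layers"`
	NumExperts            uint32  `json:"num_experts"`
	NumExpertsPerToken    uint32  `json:"num_experts_per_tok"`
	FullAttentionInterval uint32  `json:"full_attention_interval"`
	LinearConvKernelDim   uint32  `json:"linear_conv_kernel_dim"`
	PartialRotaryFactor   float32 `json:"partial_rotary_factor"`
}

// parseConfig decodes a config.json payload into the mirror struct.
func parseConfig(data []byte) (*qwen3NextConfig, error) {
	var c qwen3NextConfig
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, fmt.Errorf("parse config: %w", err)
	}
	return &c, nil
}

func main() {
	// Made-up values for illustration only.
	raw := []byte(`{
		"hidden_size": 2048,
		"num_hidden_layers": 8,
		"num_experts": 16,
		"num_experts_per_tok": 2,
		"full_attention_interval": 4,
		"linear_conv_kernel_dim": 4,
		"partial_rotary_factor": 0.25
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *cfg)
}
```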
