Implementation:Ollama Ollama Convert Glm4MoeLite

Knowledge Sources	Ollama
Domains	Model Conversion, GGUF Format
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the GLM-4 MoE Lite architecture. It handles Multi-head Latent Attention (MLA) by splitting the combined KV-B tensor for weight absorption, and it merges per-expert MoE tensors.

Description

The glm4MoeLiteModel struct implements ModelConverter for the GLM-4 MoE Lite model, which uses a DeepSeek-style MLA mechanism. The KV method emits GGUF metadata for three groups of parameters: MLA parameters (QK nope and RoPE head dimensions, KV LoRA rank, Q LoRA rank, V head dimension, and the MLA key/value lengths), MoE parameters (expert count, shared expert count, and the gating function, which is set to sigmoid), and the RoPE configuration.

The Tensors method performs two key operations. First, it merges per-expert tensors into consolidated weight matrices. Second, it uses the repackKVB method to split the combined attn_kv_b tensor into separate attn_k_b and attn_v_b tensors for MLA absorption; repackKVB handles both KV-first and non-KV-first tensor layouts.

Usage

Invoked automatically during conversion when the architecture declared in the model's config.json is Glm4MoeLiteForCausalLM.
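Dispatch by architecture name can be sketched as follows, assuming (as for Hugging Face checkpoints) that the architecture string comes from the architectures list in config.json. converterFor and hfConfig are hypothetical helpers; only the Glm4MoeLiteForCausalLM string and the glm4MoeLiteModel name come from this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hfConfig (hypothetical) holds the single config.json field used for dispatch.
type hfConfig struct {
	Architectures []string `json:"architectures"`
}

// converterFor (hypothetical) returns the name of the converter that would
// handle the first architecture listed in config.json.
func converterFor(configJSON []byte) (string, error) {
	var cfg hfConfig
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return "", err
	}
	if len(cfg.Architectures) == 0 {
		return "", fmt.Errorf("config.json lists no architectures")
	}
	switch cfg.Architectures[0] {
	case "Glm4MoeLiteForCausalLM":
		return "glm4MoeLiteModel", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", cfg.Architectures[0])
	}
}

func main() {
	name, err := converterFor([]byte(`{"architectures": ["Glm4MoeLiteForCausalLM"]}`))
	fmt.Println(name, err) // glm4MoeLiteModel <nil>
}
```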

Code Reference

Source Location

Repository: Ollama
File: convert/convert_glm4moelite.go
Lines: 1-264

Signature

type glm4MoeLiteModel struct {
    ModelParameters
    QKNopeHeadDim uint32 `json:"qk_nope_head_dim"`
    QKRopeHeadDim uint32 `json:"qk_rope_head_dim"`
    KVLoraRank    uint32 `json:"kv_lora_rank"`
    VHeadDim      uint32 `json:"v_head_dim"`
    ExpertCount   uint32 `json:"n_routed_experts"`
    // ...
}

func (p *glm4MoeLiteModel) KV(t *Tokenizer) KV
func (p *glm4MoeLiteModel) Replacements() []string
func (p *glm4MoeLiteModel) repackKVB(extractK bool, kvFirst bool, numHeads int) Repacker
func (p *glm4MoeLiteModel) Tensors(s []Tensor) (out []*ggml.Tensor)

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name	Type	Required	Description
t	*Tokenizer	Yes	Tokenizer data for GGUF metadata
s	[]Tensor	Yes	Source tensors including combined KV-B and per-expert weights

Outputs

Name	Type	Description
KV return value	KV	GGUF metadata with glm4moelite.* keys for MLA and MoE parameters
out	[]*ggml.Tensor	Converted tensors with split K/V-B and merged experts
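A minimal sketch of how the MLA and MoE metadata values might be derived from the config fields. The key suffixes, the QLoraRank and SharedExperts fields, and the sigmoid encoding (2, as in llama.cpp's expert-gating enum) are assumptions. The length derivations follow the convention llama.cpp's converter uses for DeepSeek-V2-style MLA, where absorbed attention behaves like MQA over the kv_lora_rank latent plus the RoPE slice. The sample dimensions are illustrative, not GLM-4 MoE Lite's actual configuration.

```go
package main

import "fmt"

// mlaParams holds the config fields the metadata depends on. The first four
// and ExpertCount mirror the struct in the Signature section; QLoraRank and
// SharedExperts are assumed field names.
type mlaParams struct {
	QKNopeHeadDim, QKRopeHeadDim, KVLoraRank, QLoraRank, VHeadDim uint32
	ExpertCount, SharedExperts                                    uint32
}

// gatingSigmoid assumes the llama.cpp encoding (1 = softmax, 2 = sigmoid).
const gatingSigmoid uint32 = 2

// mlaKV (hypothetical) derives MLA/MoE metadata; key suffixes are assumptions.
func mlaKV(p mlaParams) map[string]uint32 {
	const prefix = "glm4moelite."
	return map[string]uint32{
		// Absorbed MLA: keys are the compressed latent plus the RoPE slice.
		prefix + "attention.key_length":   p.KVLoraRank + p.QKRopeHeadDim,
		prefix + "attention.value_length": p.KVLoraRank,
		// Per-head dimensions of the decompressed (MHA) view.
		prefix + "attention.key_length_mla":   p.QKNopeHeadDim + p.QKRopeHeadDim,
		prefix + "attention.value_length_mla": p.VHeadDim,
		prefix + "attention.kv_lora_rank":     p.KVLoraRank,
		prefix + "attention.q_lora_rank":      p.QLoraRank,
		prefix + "rope.dimension_count":       p.QKRopeHeadDim,
		prefix + "expert_count":               p.ExpertCount,
		prefix + "expert_shared_count":        p.SharedExperts,
		prefix + "expert_gating_func":         gatingSigmoid,
	}
}

func main() {
	kv := mlaKV(mlaParams{QKNopeHeadDim: 128, QKRopeHeadDim: 64, KVLoraRank: 512, VHeadDim: 128})
	fmt.Println(kv["glm4moelite.attention.key_length"], kv["glm4moelite.attention.key_length_mla"]) // 576 192
}
```

Under this convention the two key lengths differ because absorbed attention caches one kv_lora_rank + qk_rope vector per token, while the _mla values describe the per-head shapes before absorption.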

Usage Examples

// The converter is selected automatically for the GLM-4 MoE Lite architecture.
// The attn_kv_b tensor is split into attn_k_b and attn_v_b for MLA absorption:
// K output: [n_head, kv_lora_rank, qk_nope_head_dim]
// V output: [n_head, v_head_dim, kv_lora_rank]

Related Pages

Principle:Ollama_Ollama_GGUF_Model_Conversion_Glm4MoeLite
