Implementation:Ollama Ollama Convert Glm4MoeLite

From Leeroopedia
Knowledge Sources
Domains: Model Conversion, GGUF Format
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the GLM-4 MoE Lite architecture, handling Multi-head Latent Attention (MLA) with KV-B tensor absorption splitting and MoE expert tensor merging.

Description

The glm4MoeLiteModel struct implements ModelConverter for the GLM-4 MoE Lite model, which uses a DeepSeek-style MLA mechanism. The KV method emits metadata for three groups of parameters: MLA (QK nope and rope head dimensions, KV LoRA rank, Q LoRA rank, V head dimension, and the MLA key/value lengths), MoE (expert count, shared experts, and a gating function set to sigmoid), and RoPE configuration. The Tensors method performs two operations. First, it merges per-expert tensors into consolidated weight matrices. Second, it splits the combined attn_kv_b tensor into separate attn_k_b and attn_v_b tensors for MLA absorption. The split is carried out by the repackKVB method, which handles both KV-first and non-KV-first tensor layouts.
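The expert-merging step can be pictured with a small sketch. This is not the Ollama implementation: it assumes each expert's weight is a flat row-major []float32 of identical shape and simply stacks the experts along a new leading dimension. The names expertWeight and mergeExperts are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// expertWeight is a hypothetical stand-in for one expert's 2D weight matrix,
// stored row-major. It is not a type from the Ollama convert package.
type expertWeight struct {
	rows, cols int
	data       []float32
}

// mergeExperts stacks per-expert matrices into one buffer with logical shape
// [n_expert, rows, cols]. All experts must share the same shape.
func mergeExperts(experts []expertWeight) ([]float32, [3]int, error) {
	if len(experts) == 0 {
		return nil, [3]int{}, errors.New("no experts to merge")
	}
	rows, cols := experts[0].rows, experts[0].cols
	merged := make([]float32, 0, len(experts)*rows*cols)
	for i, e := range experts {
		if e.rows != rows || e.cols != cols || len(e.data) != rows*cols {
			return nil, [3]int{}, fmt.Errorf("expert %d has mismatched shape", i)
		}
		merged = append(merged, e.data...)
	}
	return merged, [3]int{len(experts), rows, cols}, nil
}

func main() {
	experts := []expertWeight{
		{rows: 2, cols: 2, data: []float32{1, 2, 3, 4}},
		{rows: 2, cols: 2, data: []float32{5, 6, 7, 8}},
	}
	merged, shape, err := mergeExperts(experts)
	if err != nil {
		panic(err)
	}
	fmt.Println(shape, merged)
}
```

Stacking into one tensor per projection means a single merged tensor replaces one tensor per expert; the shape check guards against silently merging experts of different sizes.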

Usage

The converter is invoked automatically when the model's architecture matches Glm4MoeLiteForCausalLM.
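Architecture-based selection of this kind can be sketched as follows. This is an illustration only: it assumes the architecture name is read from an "architectures" array in the model's config.json (the Hugging Face convention), and the function pickConverter and the strings it returns are hypothetical, not Ollama's API.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// modelConfig holds only the field needed for dispatch. Reading it from an
// "architectures" array is an assumption based on the Hugging Face convention.
type modelConfig struct {
	Architectures []string `json:"architectures"`
}

// pickConverter is a hypothetical dispatcher: it returns the name of the
// converter that would handle the given config.json contents.
func pickConverter(configJSON []byte) (string, error) {
	var cfg modelConfig
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return "", err
	}
	if len(cfg.Architectures) == 0 {
		return "", errors.New("config has no architectures entry")
	}
	switch cfg.Architectures[0] {
	case "Glm4MoeLiteForCausalLM":
		return "glm4MoeLiteModel", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", cfg.Architectures[0])
	}
}

func main() {
	name, err := pickConverter([]byte(`{"architectures": ["Glm4MoeLiteForCausalLM"]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```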

Code Reference

Source Location

  • Repository: Ollama
  • File: convert/convert_glm4moelite.go
  • Lines: 1-264

Signature

type glm4MoeLiteModel struct {
    ModelParameters
    QKNopeHeadDim uint32 `json:"qk_nope_head_dim"`
    QKRopeHeadDim uint32 `json:"qk_rope_head_dim"`
    KVLoraRank    uint32 `json:"kv_lora_rank"`
    VHeadDim      uint32 `json:"v_head_dim"`
    ExpertCount   uint32 `json:"n_routed_experts"`
    // ...
}

func (p *glm4MoeLiteModel) KV(t *Tokenizer) KV
func (p *glm4MoeLiteModel) Replacements() []string
func (p *glm4MoeLiteModel) repackKVB(extractK bool, kvFirst bool, numHeads int) Repacker
func (p *glm4MoeLiteModel) Tensors(s []Tensor) (out []*ggml.Tensor)
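
The struct tags in the signature map fields to keys in the model's JSON configuration. As a rough illustration, the sketch below decodes a config fragment into a struct carrying the same five tagged fields. The type mlaConfig, the helper parseMLAConfig, and all numeric values are made up for the example; the embedded ModelParameters and the elided fields are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mlaConfig mirrors only the tagged fields shown in the signature.
// It is a hypothetical stand-in, not the real glm4MoeLiteModel type.
type mlaConfig struct {
	QKNopeHeadDim uint32 `json:"qk_nope_head_dim"`
	QKRopeHeadDim uint32 `json:"qk_rope_head_dim"`
	KVLoraRank    uint32 `json:"kv_lora_rank"`
	VHeadDim      uint32 `json:"v_head_dim"`
	ExpertCount   uint32 `json:"n_routed_experts"`
}

// parseMLAConfig decodes the tagged keys from a JSON config fragment.
func parseMLAConfig(raw []byte) (mlaConfig, error) {
	var cfg mlaConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	// Illustrative values only; they are not taken from a real checkpoint.
	raw := []byte(`{
		"qk_nope_head_dim": 128,
		"qk_rope_head_dim": 64,
		"kv_lora_rank": 512,
		"v_head_dim": 128,
		"n_routed_experts": 64
	}`)
	cfg, err := parseMLAConfig(raw)
	if err != nil {
		panic(err)
	}
	// In MLA the full query/key head width is the nope part plus the rope part.
	fmt.Println("qk head dim:", cfg.QKNopeHeadDim+cfg.QKRopeHeadDim)
}
```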

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name | Type | Required | Description
t | *Tokenizer | Yes | Tokenizer data for GGUF metadata
s | []Tensor | Yes | Source tensors including combined KV-B and per-expert weights

Outputs

Name | Type | Description
KV | KV | GGUF metadata with glm4moelite.* keys for MLA and MoE parameters
[]*ggml.Tensor | slice | Converted tensors with split K/V-B and merged experts

Usage Examples

// Converter registered for GLM-4 MoE Lite architecture
// The attn_kv_b tensor is split into attn_k_b and attn_v_b for MLA absorption:
// K output: [n_head, kv_lora_rank, qk_nope]
// V output: [n_head, v_head, kv_lora_rank]
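
To make the K and V output shapes concrete, here is a minimal sketch of the split on a flat row-major buffer. It assumes the combined tensor is laid out as [n_head, qk_nope + v_head, kv_lora_rank], with each head's K rows stored before its V rows; the real repackKVB also handles a second layout and may differ in detail. splitKVB is a hypothetical helper, not part of Ollama.

```go
package main

import "fmt"

// splitKVB splits a combined KV-B buffer into K and V parts.
// Assumed input layout (row-major): [nHead, qkNope+vHead, rank], with the
// qkNope K rows of each head stored before its vHead V rows.
// K is returned transposed per head as [nHead, rank, qkNope];
// V keeps its orientation as [nHead, vHead, rank].
func splitKVB(kvb []float32, nHead, qkNope, vHead, rank int) (k, v []float32, err error) {
	perHead := (qkNope + vHead) * rank
	if len(kvb) != nHead*perHead {
		return nil, nil, fmt.Errorf("got %d elements, want %d", len(kvb), nHead*perHead)
	}
	k = make([]float32, nHead*rank*qkNope)
	v = make([]float32, 0, nHead*vHead*rank)
	for h := 0; h < nHead; h++ {
		head := kvb[h*perHead : (h+1)*perHead]
		// Transpose the K block from [qkNope, rank] to [rank, qkNope].
		for i := 0; i < qkNope; i++ {
			for r := 0; r < rank; r++ {
				k[h*rank*qkNope+r*qkNope+i] = head[i*rank+r]
			}
		}
		// The V block is copied as-is.
		v = append(v, head[qkNope*rank:]...)
	}
	return k, v, nil
}

func main() {
	// One head, qkNope=2, vHead=1, rank=2: K rows {1,2},{3,4}; V row {5,6}.
	k, v, err := splitKVB([]float32{1, 2, 3, 4, 5, 6}, 1, 2, 1, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(k, v)
}
```

Only the K part is transposed in this sketch, which is what produces the [n_head, kv_lora_rank, qk_nope] shape listed for K while V stays at [n_head, v_head, kv_lora_rank].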
