Implementation:Ollama Ollama Convert Glm4MoeLite
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, GGUF Format |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the GGUF model converter for the GLM-4 MoE Lite architecture. It handles Multi-head Latent Attention (MLA) by splitting the combined KV-B tensor for absorption, and it merges per-expert MoE tensors.
Description
The glm4MoeLiteModel struct implements ModelConverter for the GLM-4 MoE Lite model, which uses a DeepSeek-style MLA mechanism.
The KV method emits three groups of metadata: MLA parameters (QK nope and rope head dimensions, KV LoRA rank, Q LoRA rank, V head dimension, and the MLA key and value lengths), MoE parameters (expert count, shared experts, and a gating function set to sigmoid), and the RoPE configuration.
The Tensors method performs two key operations. It merges per-expert tensors into consolidated weight matrices, and it splits the combined attn_kv_b tensor into separate attn_k_b and attn_v_b tensors for MLA absorption. The split is carried out by the repackKVB method, which handles both KV-first and non-KV-first tensor layouts.
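The expert-merging step can be illustrated with a minimal, standalone sketch. It is not the converter's actual code: the expertTensor type and the mergeExperts function are hypothetical names, and the sketch assumes every expert matrix is row-major, has the same shape, and is simply stacked along a new leading expert dimension.

```go
package main

import "fmt"

// expertTensor is a hypothetical stand-in for one expert's weight matrix,
// stored row-major with shape [rows, cols]. It is not an Ollama type.
type expertTensor struct {
	rows, cols int
	data       []float32
}

// mergeExperts stacks per-expert matrices of identical shape into a single
// buffer whose logical shape is [n_expert, rows, cols].
func mergeExperts(experts []expertTensor) (shape [3]int, merged []float32, err error) {
	if len(experts) == 0 {
		return shape, nil, fmt.Errorf("no experts to merge")
	}
	rows, cols := experts[0].rows, experts[0].cols
	merged = make([]float32, 0, len(experts)*rows*cols)
	for i, e := range experts {
		// Every expert must match the first one, otherwise stacking is undefined.
		if e.rows != rows || e.cols != cols || len(e.data) != rows*cols {
			return shape, nil, fmt.Errorf("expert %d: shape mismatch", i)
		}
		merged = append(merged, e.data...)
	}
	return [3]int{len(experts), rows, cols}, merged, nil
}

func main() {
	experts := []expertTensor{
		{rows: 2, cols: 2, data: []float32{1, 2, 3, 4}},
		{rows: 2, cols: 2, data: []float32{5, 6, 7, 8}},
	}
	shape, merged, err := mergeExperts(experts)
	if err != nil {
		panic(err)
	}
	fmt.Println(shape, merged) // [2 2 2] [1 2 3 4 5 6 7 8]
}
```

Stacking along a leading dimension keeps each expert's weights contiguous, so a single consolidated tensor can be indexed by expert ID. Whether the real converter orders or names the merged tensors this way is not stated here.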
Usage
Invoked automatically when the model's architecture matches Glm4MoeLiteForCausalLM.
Code Reference
Source Location
- Repository: Ollama
- File: convert/convert_glm4moelite.go
- Lines: 1-264
Signature
type glm4MoeLiteModel struct {
	ModelParameters
	QKNopeHeadDim uint32 `json:"qk_nope_head_dim"`
	QKRopeHeadDim uint32 `json:"qk_rope_head_dim"`
	KVLoraRank    uint32 `json:"kv_lora_rank"`
	VHeadDim      uint32 `json:"v_head_dim"`
	ExpertCount   uint32 `json:"n_routed_experts"`
	// ...
}
func (p *glm4MoeLiteModel) KV(t *Tokenizer) KV
func (p *glm4MoeLiteModel) Replacements() []string
func (p *glm4MoeLiteModel) repackKVB(extractK bool, kvFirst bool, numHeads int) Repacker
func (p *glm4MoeLiteModel) Tensors(s []Tensor) (out []*ggml.Tensor)
Import
import "github.com/ollama/ollama/convert"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| t | *Tokenizer | Yes | Tokenizer data for GGUF metadata |
| s | []Tensor | Yes | Source tensors including combined KV-B and per-expert weights |
Outputs
| Name | Type | Description |
|---|---|---|
| KV | KV | GGUF metadata with glm4moelite.* keys for MLA and MoE parameters |
| out | []*ggml.Tensor | Converted tensors with split K/V-B and merged experts |
Usage Examples
// Converter registered for the GLM-4 MoE Lite architecture.
// The attn_kv_b tensor is split into attn_k_b and attn_v_b for MLA absorption:
// K output: [n_head, kv_lora_rank, qk_nope_head_dim]
// V output: [n_head, v_head_dim, kv_lora_rank]
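To make these output shapes concrete, here is a minimal sketch of one way such a split could work. It is not the repackKVB implementation. The splitKVB function is a hypothetical name, and the sketch assumes a single input layout: a row-major matrix of shape [n_head * (qk_nope + v_head), kv_lora_rank] in which each head's block holds its K rows followed by its V rows. It also reads the shapes above as row-major with the last dimension varying fastest. The other layout handled by the real converter is not covered.

```go
package main

import "fmt"

// splitKVB splits an assumed row-major kv_b matrix of shape
// [nHead*(qkNope+vHead), rank] into:
//   k with shape [nHead, rank, qkNope] (each head's K block transposed)
//   v with shape [nHead, vHead, rank]  (each head's V block copied as-is)
func splitKVB(kvb []float32, nHead, qkNope, vHead, rank int) (k, v []float32, err error) {
	perHead := qkNope + vHead
	if len(kvb) != nHead*perHead*rank {
		return nil, nil, fmt.Errorf("kv_b has %d elements, want %d", len(kvb), nHead*perHead*rank)
	}
	k = make([]float32, nHead*rank*qkNope)
	v = make([]float32, 0, nHead*vHead*rank)
	for h := 0; h < nHead; h++ {
		base := h * perHead * rank
		// Transpose this head's K block from [qkNope, rank] to [rank, qkNope].
		for i := 0; i < qkNope; i++ {
			for r := 0; r < rank; r++ {
				k[(h*rank+r)*qkNope+i] = kvb[base+i*rank+r]
			}
		}
		// The V rows follow the K rows inside the head's block.
		vStart := base + qkNope*rank
		v = append(v, kvb[vStart:vStart+vHead*rank]...)
	}
	return k, v, nil
}

func main() {
	// One head, qk_nope = 2, v_head = 1, kv_lora_rank = 3.
	kvb := []float32{
		1, 2, 3, // K row 0
		4, 5, 6, // K row 1
		7, 8, 9, // V row 0
	}
	k, v, err := splitKVB(kvb, 1, 2, 1, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(k) // [1 4 2 5 3 6]
	fmt.Println(v) // [7 8 9]
}
```

In this sketch only the K block is transposed, which is what produces a [n_head, kv_lora_rank, qk_nope] result from rows that originally ran along kv_lora_rank; the V block already has the [v_head, kv_lora_rank] orientation.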