

Implementation:Ollama Ollama Convert Llama4

From Leeroopedia
Revision as of 13:24, 16 February 2026 by Admin (auto-imported from implementations/Ollama_Ollama_Convert_Llama4.md)
Knowledge Sources
Domains: Model Conversion, GGUF Format
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the Meta Llama 4 multimodal mixture-of-experts (MoE) architecture, handling fused gate-up expert tensor splitting, dimension transposition, and vision encoder configuration.

Description

The llama4Model struct embeds llamaModel in its TextModel field and reuses its KV generation, remapping llama.* keys to llama4.*. It adds MoE parameters (expert count, experts used per token, and the MoE layer interleave step), QK normalization, chunked attention, and the vision encoder configuration (including the pixel shuffle ratio). The Tensors method handles three groups of tensors:

  • Fused ffn_gate_up_exps tensors are split into separate ffn_gate_exps and ffn_up_exps tensors by halving the last dimension (2 × intermediate size) and transposing the last two dimensions.
  • ffn_down_exps tensors have their last two dimensions swapped.
  • Vision (v.*) and multimodal projector (mm.*) tensors are passed through unchanged.

All remaining text tensors are delegated to the embedded llamaModel.Tensors with repacking disabled.
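The key remapping step can be sketched as a prefix rewrite over a metadata map. This is a minimal sketch, assuming the embedded model's KV output behaves like a plain map from GGUF key to value; remapLlamaKeys is a hypothetical helper, and passing non-llama.* keys through unchanged is an assumption of this sketch, not confirmed behavior of the real converter.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// remapLlamaKeys is a hypothetical helper: it renames every "llama."-prefixed
// key to the "llama4." prefix. Keys without that prefix are copied unchanged
// (an assumption of this sketch).
func remapLlamaKeys(base map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		if strings.HasPrefix(k, "llama.") {
			out["llama4."+strings.TrimPrefix(k, "llama.")] = v
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	// Illustrative values, not taken from a real checkpoint.
	base := map[string]any{
		"llama.block_count":    uint32(48),
		"llama.context_length": uint32(8192),
		"general.name":         "example",
	}
	kv := remapLlamaKeys(base)

	// Print in sorted order so the output is deterministic.
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, "=", kv[k])
	}
}
```

Reusing the base model's KV output this way keeps the shared Llama metadata in one place; only the architecture prefix changes, and the Llama 4 specific keys are added on top.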

Usage

Invoked automatically when the model's architecture matches Llama4ForConditionalGeneration.
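Architecture-based selection can be pictured as a lookup on the "architectures" array of a Hugging Face config.json. This is a hedged sketch: the converter interface, llama4Stub, and pickConverter are hypothetical stand-ins, not Ollama's actual dispatch code; only the architecture string Llama4ForConditionalGeneration comes from this page.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// converter is a hypothetical stand-in for a model converter interface.
type converter interface{ Name() string }

// llama4Stub is a hypothetical placeholder for llama4Model.
type llama4Stub struct{}

func (llama4Stub) Name() string { return "llama4" }

// pickConverter reads the "architectures" array from config.json contents
// and returns a converter for the first entry.
func pickConverter(configJSON []byte) (converter, error) {
	var cfg struct {
		Architectures []string `json:"architectures"`
	}
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return nil, err
	}
	if len(cfg.Architectures) == 0 {
		return nil, errors.New("config.json lists no architectures")
	}
	switch cfg.Architectures[0] {
	case "Llama4ForConditionalGeneration":
		return llama4Stub{}, nil
	default:
		return nil, fmt.Errorf("unsupported architecture %q", cfg.Architectures[0])
	}
}

func main() {
	c, err := pickConverter([]byte(`{"architectures":["Llama4ForConditionalGeneration"]}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected converter:", c.Name())
}
```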

Code Reference

Source Location

  • Repository: Ollama
  • File: convert/convert_llama4.go
  • Lines: 1-169

Signature

type llama4Model struct {
    ModelParameters
    TextModel struct {
        llamaModel
        NumExpertsPerToken     uint32 `json:"num_experts_per_tok"`
        NumLocalExperts        uint32 `json:"num_local_experts"`
        InterleaveMOELayerStep uint32 `json:"interleave_moe_layer_step"`
        UseQKNorm              bool   `json:"use_qk_norm"`
        AttentionChunkSize     uint32 `json:"attention_chunk_size"`
    } `json:"text_config"`
    VisionModel struct { ... } `json:"vision_config"`
}

func (p *llama4Model) KV(t *Tokenizer) KV
func (p *llama4Model) Replacements() []string
func (p *llama4Model) Tensors(ts []Tensor) []*ggml.Tensor

Import

import "github.com/ollama/ollama/convert" // llama4Model is unexported and is used only inside package convert

I/O Contract

Inputs

Name | Type | Required | Description
t | *Tokenizer | Yes | Tokenizer data for GGUF metadata (used by KV)
ts | []Tensor | Yes | Source tensors, including fused gate-up experts and vision tensors (used by Tensors)

Outputs

Name | Type | Description
KV return value | KV | GGUF metadata with llama4.* keys for MoE, chunked attention, and vision
Tensors return value | []*ggml.Tensor | Converted tensors with split gate/up experts and transposed down experts

Usage Examples

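// Converter registered for the Llama 4 architecture (Llama4ForConditionalGeneration)
// E = number of experts, H = hidden size, I = expert intermediate size
// ffn_gate_up_exps [E, H, I*2] -> ffn_gate_exps [E, I, H] + ffn_up_exps [E, I, H]
// ffn_down_exps [E, I, H] -> [E, H, I]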
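The shape changes above can be reproduced on a small row-major buffer. This is a minimal sketch with explicit index arithmetic; splitGateUp and transposeLast2 are hypothetical helpers, and the real converter performs the equivalent slice-and-transpose through its own tensor repacking code rather than these loops. It assumes the gate half occupies the first I entries of the last dimension and the up half the second I entries, matching the ffn_gate_exps then ffn_up_exps order.

```go
package main

import "fmt"

// splitGateUp takes a row-major fused tensor of shape [E, H, 2*I], slices the
// last dimension into [0, I) (gate) and [I, 2*I) (up), and transposes the last
// two dimensions of each half, returning two tensors of shape [E, I, H].
func splitGateUp(fused []float32, E, H, I int) (gate, up []float32) {
	gate = make([]float32, E*I*H)
	up = make([]float32, E*I*H)
	for e := 0; e < E; e++ {
		for h := 0; h < H; h++ {
			row := e*H*2*I + h*2*I // offset of row (e, h) in the fused input
			for i := 0; i < I; i++ {
				dst := e*I*H + i*H + h // offset of element (e, i, h) in the output
				gate[dst] = fused[row+i]
				up[dst] = fused[row+I+i]
			}
		}
	}
	return gate, up
}

// transposeLast2 swaps the last two dimensions of a row-major [E, A, B]
// tensor, returning shape [E, B, A]; for ffn_down_exps, A = I and B = H.
func transposeLast2(x []float32, E, A, B int) []float32 {
	out := make([]float32, len(x))
	for e := 0; e < E; e++ {
		for a := 0; a < A; a++ {
			for b := 0; b < B; b++ {
				out[e*B*A+b*A+a] = x[e*A*B+a*B+b]
			}
		}
	}
	return out
}

func main() {
	// One expert (E=1), H=2, I=3: each row is [gate0 gate1 gate2 | up0 up1 up2].
	fused := []float32{
		1, 2, 3, 10, 20, 30, // h = 0
		4, 5, 6, 40, 50, 60, // h = 1
	}
	gate, up := splitGateUp(fused, 1, 2, 3)
	fmt.Println(gate) // [1 4 2 5 3 6]
	fmt.Println(up)   // [10 40 20 50 30 60]

	// Swapping the last two dims again restores the original gate half.
	fmt.Println(transposeLast2(gate, 1, 3, 2)) // [1 2 3 4 5 6]
}
```

Each output is a contiguous [E, I, H] buffer, which is why the GGUF shape recorded for the split tensors swaps dims 1 and 2 and halves the former last dimension.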
