Implementation:Ollama Ollama Convert Llama4

Knowledge Sources	Ollama
Domains	Model Conversion, GGUF Format
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the Meta Llama 4 multimodal MoE architecture, handling fused gate-up expert tensor splitting, dimension transposition, and vision encoder configuration.

Description

The llama4Model struct embeds llamaModel inside its TextModel field and reuses that model's KV generation, renaming the resulting llama.* keys to llama4.*. On top of the inherited metadata it adds MoE parameters (expert count, experts used per token, interleave step), the QK normalization flag, the chunked attention size, and the vision encoder configuration (including the pixel shuffle ratio).

The Tensors method handles four cases. Fused ffn_gate_up_exps tensors of shape [E, H, I*2] are split into separate ffn_gate_exps and ffn_up_exps tensors by halving the last (doubled intermediate) dimension and transposing each half to [E, I, H]. ffn_down_exps tensors have their last two dimensions swapped, from [E, I, H] to [E, H, I]. Vision and multimodal projector tensors pass through unchanged. All remaining text tensors are delegated to the embedded llamaModel.Tensors with repacking disabled.
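The gate/up split can be illustrated on a flat, row-major buffer. The following is a minimal sketch rather than the converter's actual code: splitGateUp is a hypothetical helper, the data is float32 for simplicity, and it assumes the gate weights occupy the first half of the last axis and the up weights the second half.

```go
package main

import "fmt"

// splitGateUp is a hypothetical helper. It reads a fused tensor stored
// row-major with shape [experts, hidden, 2*inter] and returns two tensors,
// each row-major with shape [experts, inter, hidden].
// Assumption: gate is the first half of the last axis, up is the second half.
func splitGateUp(fused []float32, experts, hidden, inter int) (gate, up []float32) {
	gate = make([]float32, experts*inter*hidden)
	up = make([]float32, experts*inter*hidden)
	for e := 0; e < experts; e++ {
		for h := 0; h < hidden; h++ {
			for i := 0; i < inter; i++ {
				src := (e*hidden+h)*2*inter + i // index into [E, H, 2I]
				dst := (e*inter+i)*hidden + h   // index into [E, I, H]
				gate[dst] = fused[src]
				up[dst] = fused[src+inter]
			}
		}
	}
	return gate, up
}

func main() {
	// One expert, hidden = 2, inter = 2.
	// Each row holds [gate0 gate1 up0 up1] for one hidden index.
	fused := []float32{
		1, 2, 10, 20, // h = 0
		3, 4, 30, 40, // h = 1
	}
	gate, up := splitGateUp(fused, 1, 2, 2)
	fmt.Println(gate) // [1 3 2 4]
	fmt.Println(up)   // [10 30 20 40]
}
```

Each output row now holds one intermediate index across all hidden positions, which is the [E, I, H] layout described above.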

Usage

Invoked automatically when the model's architecture matches Llama4ForConditionalGeneration.

Code Reference

Source Location

Repository: Ollama
File: convert/convert_llama4.go
Lines: 1-169

Signature

type llama4Model struct {
    ModelParameters
    TextModel struct {
        llamaModel
        NumExpertsPerToken     uint32 `json:"num_experts_per_tok"`
        NumLocalExperts        uint32 `json:"num_local_experts"`
        InterleaveMOELayerStep uint32 `json:"interleave_moe_layer_step"`
        UseQKNorm              bool   `json:"use_qk_norm"`
        AttentionChunkSize     uint32 `json:"attention_chunk_size"`
    } `json:"text_config"`
    VisionModel struct { ... } `json:"vision_config"`
}

func (p *llama4Model) KV(t *Tokenizer) KV
func (p *llama4Model) Replacements() []string
func (p *llama4Model) Tensors(ts []Tensor) []*ggml.Tensor

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name	Type	Required	Description
t	*Tokenizer	Yes	Tokenizer data for GGUF metadata
ts	[]Tensor	Yes	Source tensors including fused gate-up experts and vision tensors

Outputs

Name	Type	Description
KV	KV	GGUF metadata with llama4.* keys for MoE, chunked attention, and vision
Tensors	[]*ggml.Tensor	Converted tensors with split gate/up experts and transposed down experts
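The llama.* to llama4.* key remapping behind the KV output can be pictured as a prefix rewrite over a metadata map. This is a minimal sketch, not the code in convert_llama4.go: remapPrefix is a hypothetical helper, a plain map[string]any stands in for the KV type, and the key names and values are placeholders chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// remapPrefix is a hypothetical helper. It returns a copy of kv in which
// every key starting with oldPrefix has that prefix replaced by newPrefix.
// Keys without the prefix are copied unchanged.
func remapPrefix(kv map[string]any, oldPrefix, newPrefix string) map[string]any {
	out := make(map[string]any, len(kv))
	for k, v := range kv {
		if rest, ok := strings.CutPrefix(k, oldPrefix); ok {
			k = newPrefix + rest
		}
		out[k] = v
	}
	return out
}

func main() {
	// Placeholder metadata standing in for what the embedded model produces.
	base := map[string]any{
		"llama.block_count": 2,
		"general.name":      "example",
	}
	kv := remapPrefix(base, "llama.", "llama4.")
	fmt.Println(kv["llama4.block_count"]) // 2
	fmt.Println(kv["general.name"])       // example
	_, stillThere := kv["llama.block_count"]
	fmt.Println(stillThere) // false
}
```

Matching on the prefix "llama." (with the trailing dot) leaves keys that already start with "llama4." untouched.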

Usage Examples

// Converter registered for the Llama 4 architecture.
// Shape legend: E = experts, H = hidden size, I = intermediate size.
// ffn_gate_up_exps [E, H, I*2] -> ffn_gate_exps [E, I, H] + ffn_up_exps [E, I, H]
// ffn_down_exps [E, I, H] -> [E, H, I]
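The shape mappings above reduce to a little arithmetic on three-element shape slices. The sketch below is illustrative only: gateUpShape and downShape are hypothetical names, the sizes are made up, and the validation is an assumption about what a careful caller might check rather than something the converter is known to do.

```go
package main

import "fmt"

// gateUpShape is a hypothetical helper mapping a fused shape [E, H, 2I]
// to the shape shared by the split gate and up tensors, [E, I, H].
func gateUpShape(shape []uint64) ([]uint64, error) {
	if len(shape) != 3 {
		return nil, fmt.Errorf("expected 3 dimensions, got %d", len(shape))
	}
	if shape[2]%2 != 0 {
		return nil, fmt.Errorf("last dimension %d is not even", shape[2])
	}
	return []uint64{shape[0], shape[2] / 2, shape[1]}, nil
}

// downShape is a hypothetical helper mapping [E, I, H] to [E, H, I]
// by swapping the last two dimensions.
func downShape(shape []uint64) ([]uint64, error) {
	if len(shape) != 3 {
		return nil, fmt.Errorf("expected 3 dimensions, got %d", len(shape))
	}
	return []uint64{shape[0], shape[2], shape[1]}, nil
}

func main() {
	// Made-up sizes: E = 4, H = 8, I = 6.
	gu, err := gateUpShape([]uint64{4, 8, 12})
	if err != nil {
		panic(err)
	}
	fmt.Println(gu) // [4 6 8]

	down, err := downShape([]uint64{4, 6, 8})
	if err != nil {
		panic(err)
	}
	fmt.Println(down) // [4 8 6]
}
```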

Related Pages

Principle:Ollama_Ollama_GGUF_Model_Conversion_Llama4
