Implementation:Ollama Ollama Convert Llama4
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, GGUF Format |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the GGUF model converter for the Meta Llama 4 multimodal MoE architecture, handling fused gate-up expert tensor splitting, dimension transposition, and vision encoder configuration.
Description
The llama4Model struct embeds llamaModel in its TextModel field and reuses that model's KV generation, remapping llama.* keys to llama4.*. It adds MoE parameters (experts per token, local expert count, interleave step), a QK normalization flag, the chunked attention size, and vision encoder configuration (including the pixel shuffle ratio). The Tensors method splits each fused ffn_gate_up_exps tensor into separate gate and up expert tensors by slicing the fused last dimension (twice the intermediate size) in half and transposing each half, transposes ffn_down_exps to swap its last two dimensions, and passes vision and multimodal projector tensors through unchanged. Non-expert text tensors are delegated to the embedded llamaModel.Tensors with repacking disabled.
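The split described above can be sketched on a flat, row-major buffer. This is a minimal illustration, not the converter's actual code: the helper name splitGateUp is hypothetical, and it assumes the gate weights occupy the first half of the fused last dimension and the up weights the second half.

```go
package main

import "fmt"

// splitGateUp is a hypothetical helper. It takes a fused row-major tensor of
// shape [E, H, 2*I] and returns gate and up tensors of shape [E, I, H] each.
// Assumption: columns [0, I) hold gate weights and [I, 2*I) hold up weights.
func splitGateUp(fused []float32, e, h, i int) (gate, up []float32) {
	gate = make([]float32, e*i*h)
	up = make([]float32, e*i*h)
	for x := 0; x < e; x++ {
		for r := 0; r < h; r++ {
			for c := 0; c < i; c++ {
				src := (x*h+r)*2*i + c // position in [E, H, 2*I]
				dst := (x*i+c)*h + r   // transposed position in [E, I, H]
				gate[dst] = fused[src]
				up[dst] = fused[src+i]
			}
		}
	}
	return gate, up
}

func main() {
	// One expert with H=2 and I=2; each row is [g0 g1 u0 u1].
	fused := []float32{
		1, 2, 10, 20,
		3, 4, 30, 40,
	}
	gate, up := splitGateUp(fused, 1, 2, 2)
	fmt.Println(gate, up) // prints [1 3 2 4] [10 30 20 40]
}
```

Copying element by element keeps the index arithmetic visible; a real converter would more likely express the same slice and transpose through a tensor library.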
Usage
Invoked automatically when the model's architecture matches Llama4ForConditionalGeneration.
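The automatic selection can be pictured as a lookup on the architecture name. The sketch below is an assumption about the general shape of such a dispatch, not a copy of the converter's code: converterFor is a hypothetical function, and reading the name from an `architectures` array in the model's config JSON is assumed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// converterFor is a hypothetical helper that maps the first entry of the
// "architectures" array in a config JSON document to a converter name.
func converterFor(configJSON []byte) (string, error) {
	var cfg struct {
		Architectures []string `json:"architectures"`
	}
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return "", err
	}
	if len(cfg.Architectures) == 0 {
		return "", fmt.Errorf("config has no architectures entry")
	}
	switch arch := cfg.Architectures[0]; arch {
	case "Llama4ForConditionalGeneration":
		return "llama4Model", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", arch)
	}
}

func main() {
	name, err := converterFor([]byte(`{"architectures":["Llama4ForConditionalGeneration"]}`))
	fmt.Println(name, err)
}
```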
Code Reference
Source Location
- Repository: Ollama
- File: convert/convert_llama4.go
- Lines: 1-169
Signature
type llama4Model struct {
	ModelParameters
	TextModel struct {
		llamaModel
		NumExpertsPerToken     uint32 `json:"num_experts_per_tok"`
		NumLocalExperts        uint32 `json:"num_local_experts"`
		InterleaveMOELayerStep uint32 `json:"interleave_moe_layer_step"`
		UseQKNorm              bool   `json:"use_qk_norm"`
		AttentionChunkSize     uint32 `json:"attention_chunk_size"`
	} `json:"text_config"`
	VisionModel struct { ... } `json:"vision_config"`
}

func (p *llama4Model) KV(t *Tokenizer) KV
func (p *llama4Model) Replacements() []string
func (p *llama4Model) Tensors(ts []Tensor) []*ggml.Tensor
Import
import "github.com/ollama/ollama/convert"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| t | *Tokenizer | Yes | Tokenizer data for GGUF metadata |
| ts | []Tensor | Yes | Source tensors including fused gate-up experts and vision tensors |
Outputs
| Name | Type | Description |
|---|---|---|
| KV | KV | GGUF metadata with llama4.* keys for MoE, chunked attention, and vision |
| Tensors | []*ggml.Tensor | Converted tensors with split gate/up experts and transposed down experts |
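The llama.* to llama4.* key remapping behind the KV output can be sketched as a prefix rewrite over a metadata map. This is an illustration under stated assumptions: remapLlamaKeys is a hypothetical helper, the map type stands in for the converter's KV type, the sample keys are illustrative, and copying other keys through unchanged is a simplification.

```go
package main

import (
	"fmt"
	"strings"
)

// remapLlamaKeys is a hypothetical helper. It returns a copy of src in which
// every key starting with "llama." is renamed to start with "llama4.".
// Assumption: keys without that prefix are copied through unchanged.
func remapLlamaKeys(src map[string]any) map[string]any {
	const oldPrefix, newPrefix = "llama.", "llama4."
	out := make(map[string]any, len(src))
	for k, v := range src {
		if strings.HasPrefix(k, oldPrefix) {
			k = newPrefix + k[len(oldPrefix):]
		}
		out[k] = v
	}
	return out
}

func main() {
	kv := remapLlamaKeys(map[string]any{
		"llama.block_count": 48, // illustrative value
		"general.name":      "example",
	})
	fmt.Println(kv["llama4.block_count"], kv["general.name"])
}
```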
Usage Examples
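// The converter is registered for the Llama 4 architecture; it is not called directly.
// Shape legend: E = expert count, H = hidden size, I = intermediate size.
// ffn_gate_up_exps [E, H, I*2] -> ffn_gate_exps [E, I, H] + ffn_up_exps [E, I, H]
// ffn_down_exps    [E, I, H]   -> [E, H, I]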
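The ffn_down_exps reshaping in the last comment is a per-expert transpose of the final two dimensions. A minimal sketch on a flat row-major buffer follows; transposeLast2 is a hypothetical helper, not a function from the converter.

```go
package main

import "fmt"

// transposeLast2 is a hypothetical helper. It treats src as a row-major
// tensor of shape [e, rows, cols] and returns a new buffer laid out as
// [e, cols, rows], i.e. each expert's matrix is transposed independently.
func transposeLast2(src []float32, e, rows, cols int) []float32 {
	dst := make([]float32, len(src))
	for x := 0; x < e; x++ {
		base := x * rows * cols
		for r := 0; r < rows; r++ {
			for c := 0; c < cols; c++ {
				dst[base+c*rows+r] = src[base+r*cols+c]
			}
		}
	}
	return dst
}

func main() {
	// One expert, I=2 rows and H=3 columns: [E, I, H] -> [E, H, I].
	down := []float32{
		1, 2, 3,
		4, 5, 6,
	}
	fmt.Println(transposeLast2(down, 1, 2, 3)) // prints [1 4 2 5 3 6]
}
```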