Implementation:Ollama Ollama Imagegen Llama

Knowledge Sources	Ollama
Domains	Image Generation, LLM Inference
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the Llama model architecture for MLX inference, with grouped-query attention (GQA), rotary position embeddings (RoPE), and a SiLU-gated MLP.

Description

The llama.go file implements the Llama architecture for the imagegen MLX engine. The Model struct holds the token embeddings, a stack of decoder layers, a final RMSNorm, and an output projection. Each decoder layer combines grouped-query attention (GQA), which uses separate Q/K/V projections and AsStrided for efficient head reshaping, with a SiLU-gated MLP (gate_proj, up_proj, down_proj) and RMSNorm layers. RoPE is applied via mlx.RoPE with configurable theta and head dimension. The Forward pass runs the input through every layer with KV cache support, then applies the final RMSNorm and the output linear projection. Weights are loaded through struct tags with safetensors.LoadModule; if lm_head weights are absent, the output projection is tied to the embeddings (lm_head = embed_tokens).
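
To make the SiLU-gated MLP and RMSNorm steps concrete, here is a minimal sketch on plain Go slices rather than MLX arrays. It is not the code in llama.go: the helper names (silu, matVec, gatedMLP, rmsNorm) and the tiny weights are assumptions for illustration, and the real implementation works on batched mlx.Array values.

```go
package main

import (
	"fmt"
	"math"
)

// silu computes x * sigmoid(x), written as x / (1 + exp(-x)).
func silu(x float32) float32 {
	return x / (1 + float32(math.Exp(float64(-x))))
}

// matVec multiplies a row-major matrix (one slice per output row) by x.
func matVec(w [][]float32, x []float32) []float32 {
	out := make([]float32, len(w))
	for i, row := range w {
		var sum float32
		for j, v := range row {
			sum += v * x[j]
		}
		out[i] = sum
	}
	return out
}

// gatedMLP computes down(silu(gate(x)) * up(x)) for a single token vector.
func gatedMLP(gate, up, down [][]float32, x []float32) []float32 {
	g := matVec(gate, x)
	u := matVec(up, x)
	h := make([]float32, len(g))
	for i := range g {
		h[i] = silu(g[i]) * u[i]
	}
	return matVec(down, h)
}

// rmsNorm scales x by 1/sqrt(mean(x^2) + eps), then by a learned weight.
func rmsNorm(x, weight []float32, eps float32) []float32 {
	var sumSq float32
	for _, v := range x {
		sumSq += v * v
	}
	inv := 1 / float32(math.Sqrt(float64(sumSq/float32(len(x))+eps)))
	out := make([]float32, len(x))
	for i, v := range x {
		out[i] = v * inv * weight[i]
	}
	return out
}

func main() {
	// Toy weights: hidden size 2, intermediate size 2 (illustrative only).
	identity := [][]float32{{1, 0}, {0, 1}}
	x := rmsNorm([]float32{3, 4}, []float32{1, 1}, 1e-5)
	fmt.Println(gatedMLP(identity, identity, identity, x))
}
```

In a decoder layer the normalized input feeds the MLP, and the MLP output is added back to the residual stream; the sketch omits the residual connection and batching for brevity.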

Usage

Used for text generation with Llama-family models in the standalone MLX engine.

Code Reference

Source Location

Repository: Ollama
File: x/imagegen/models/llama/llama.go
Lines: 1-152

Signature

type Config struct {
	HiddenSize            int32   `json:"hidden_size"`
	NumHiddenLayers       int32   `json:"num_hidden_layers"`
	IntermediateSize      int32   `json:"intermediate_size"`
	NumAttentionHeads     int32   `json:"num_attention_heads"`
	NumKeyValueHeads      int32   `json:"num_key_value_heads"`
	VocabSize             int32   `json:"vocab_size"`
	RMSNormEps            float32 `json:"rms_norm_eps"`
	RopeTheta             float32 `json:"rope_theta"`
}

type Model struct {
	EmbedTokens *nn.Embedding `weight:"model.embed_tokens"`
	Layers      []*Layer      `weight:"model.layers"`
	Norm        *nn.RMSNorm   `weight:"model.norm"`
	Output      *nn.Linear    `weight:"lm_head,optional"`
}

func Load(modelPath string) (*Model, error)
func (m *Model) Forward(tokens *mlx.Array, caches []cache.Cache) *mlx.Array
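
The Config fields determine the attention geometry. The sketch below parses a config.json payload with the same JSON tags and derives the per-head dimension and the GQA group size. It assumes the common Llama convention that the head dimension equals hidden_size / num_attention_heads; llama.go may derive or read it differently. The config type, parseConfig, and the helper methods are hypothetical names, and the numeric values are illustrative.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// config mirrors the JSON tags of the Config struct shown above.
type config struct {
	HiddenSize        int32   `json:"hidden_size"`
	NumHiddenLayers   int32   `json:"num_hidden_layers"`
	IntermediateSize  int32   `json:"intermediate_size"`
	NumAttentionHeads int32   `json:"num_attention_heads"`
	NumKeyValueHeads  int32   `json:"num_key_value_heads"`
	VocabSize         int32   `json:"vocab_size"`
	RMSNormEps        float32 `json:"rms_norm_eps"`
	RopeTheta         float32 `json:"rope_theta"`
}

// parseConfig decodes JSON and checks that the head counts divide evenly.
func parseConfig(data []byte) (*config, error) {
	var c config
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	if c.NumAttentionHeads <= 0 || c.NumKeyValueHeads <= 0 {
		return nil, errors.New("head counts must be positive")
	}
	if c.HiddenSize%c.NumAttentionHeads != 0 {
		return nil, errors.New("hidden_size must be divisible by num_attention_heads")
	}
	if c.NumAttentionHeads%c.NumKeyValueHeads != 0 {
		return nil, errors.New("num_attention_heads must be divisible by num_key_value_heads")
	}
	return &c, nil
}

// headDim assumes head dimension = hidden_size / num_attention_heads.
func (c *config) headDim() int32 { return c.HiddenSize / c.NumAttentionHeads }

// queryHeadsPerKV is the number of query heads sharing one K/V head in GQA.
func (c *config) queryHeadsPerKV() int32 { return c.NumAttentionHeads / c.NumKeyValueHeads }

// kvHeadFor maps a query head index to the K/V head it attends with,
// assuming consecutive query heads share a K/V head.
func (c *config) kvHeadFor(queryHead int32) int32 { return queryHead / c.queryHeadsPerKV() }

func main() {
	raw := []byte(`{"hidden_size":4096,"num_attention_heads":32,"num_key_value_heads":8}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(cfg.headDim(), cfg.queryHeadsPerKV(), cfg.kvHeadFor(5))
}
```

When num_key_value_heads equals num_attention_heads, the group size is 1 and GQA reduces to standard multi-head attention.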

Import

import "github.com/ollama/ollama/x/imagegen/models/llama"

I/O Contract

Inputs

Name	Type	Required	Description
modelPath	string	Yes	Directory containing model weights and config
tokens	*mlx.Array	Yes	Input token IDs [B, L]
caches	[]cache.Cache	Yes	KV caches for each layer
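
The caches argument carries one cache per decoder layer. As a rough mental model only, a per-layer KV cache appends the keys and values of newly processed positions and hands back the full history for attention. The kvCache type below is hypothetical and is not the cache.Cache interface from the Ollama repository, whose methods are not shown on this page.

```go
package main

import "fmt"

// kvCache is a simplified, hypothetical per-layer cache.
// Each entry in keys/values is the vector for one cached position.
type kvCache struct {
	keys   [][]float32
	values [][]float32
}

// update appends the new positions and returns the full key/value history.
func (c *kvCache) update(k, v [][]float32) ([][]float32, [][]float32) {
	c.keys = append(c.keys, k...)
	c.values = append(c.values, v...)
	return c.keys, c.values
}

// offset reports how many positions are cached. Implementations typically
// use this as the starting position for RoPE on the next tokens.
func (c *kvCache) offset() int { return len(c.keys) }

// newCaches allocates one empty cache per decoder layer.
func newCaches(numLayers int) []*kvCache {
	caches := make([]*kvCache, numLayers)
	for i := range caches {
		caches[i] = &kvCache{}
	}
	return caches
}

func main() {
	caches := newCaches(2)
	// Prompt step: three positions at once, then one decode step.
	prompt := [][]float32{{1, 0}, {0, 1}, {1, 1}}
	caches[0].update(prompt, prompt)
	keys, _ := caches[0].update([][]float32{{2, 2}}, [][]float32{{2, 2}})
	fmt.Println(len(keys), caches[0].offset(), caches[1].offset())
}
```

The point of the cache is that each decode step passes only the new token through Forward while attention still sees every earlier position.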

Outputs

Name	Type	Description
logits	*mlx.Array	Logits [B, L, vocab_size]

Usage Examples

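// Load weights and config from the model directory.
model, err := llama.Load("/path/to/llama-model")
if err != nil {
	return err
}

// tokens is an *mlx.Array of token IDs with shape [B, L], prepared by the caller.
caches := model.NewCache(0)
logits := model.Forward(tokens, caches) // [B, L, vocab_size]
nextToken := sample(logits)             // sample is a caller-supplied sampling function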
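
The usage example leaves sample undefined. One simple choice is greedy decoding: take the highest logit at the last sequence position of each batch row. The sketch below assumes the logits have already been copied from the mlx.Array into a flattened, row-major [B, L, vocab_size] float32 slice; greedyNext is a hypothetical helper, not part of the llama package.

```go
package main

import (
	"errors"
	"fmt"
)

// greedyNext returns, for each batch row, the index of the largest logit
// at the last sequence position. logits is a flattened row-major [B, L, V] buffer.
func greedyNext(logits []float32, b, l, v int) ([]int, error) {
	if b <= 0 || l <= 0 || v <= 0 || len(logits) != b*l*v {
		return nil, errors.New("logits length does not match [B, L, V]")
	}
	next := make([]int, b)
	for i := 0; i < b; i++ {
		// Offset of the last position's vocabulary row for batch row i.
		start := (i*l + l - 1) * v
		row := logits[start : start+v]
		best := 0
		for j := 1; j < v; j++ {
			if row[j] > row[best] {
				best = j
			}
		}
		next[i] = best
	}
	return next, nil
}

func main() {
	// B=1, L=2, V=3: only the second position's row is used.
	logits := []float32{0.1, 0.9, 0.0, 2.0, -1.0, 0.5}
	next, err := greedyNext(logits, 1, 2, 3)
	if err != nil {
		fmt.Println("sampling error:", err)
		return
	}
	fmt.Println(next)
}
```

Greedy decoding is deterministic; temperature or top-p sampling would replace the argmax loop with a draw from the softmax distribution over the same row.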

Related Pages

Principle:Ollama_Ollama_ImageGeneration
