Implementation:Ollama Ollama Imagegen Llama

From Leeroopedia
Knowledge Sources
Domains: Image Generation, LLM Inference
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements the Llama model architecture for MLX inference, using grouped-query attention (GQA), rotary position embeddings (RoPE), and a SiLU-gated MLP.
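As a rough illustration of the SiLU-gated MLP named above, the sketch below computes down(silu(gate(x)) * up(x)) for one token vector on plain Go slices. It does not use the mlx package; the helper names (silu, matVec, gatedMLP) and the toy weights are assumptions made for this example only.

```go
package main

import (
	"fmt"
	"math"
)

// silu computes x * sigmoid(x), the activation used in the gated MLP.
func silu(x float64) float64 {
	return x / (1 + math.Exp(-x))
}

// matVec multiplies a row-major matrix w (rows x len(x)) by the vector x.
func matVec(w [][]float64, x []float64) []float64 {
	out := make([]float64, len(w))
	for i, row := range w {
		for j, v := range row {
			out[i] += v * x[j]
		}
	}
	return out
}

// gatedMLP computes down(silu(gate(x)) * up(x)) for a single token vector.
// gate and up map hidden -> intermediate; down maps intermediate -> hidden.
func gatedMLP(gate, up, down [][]float64, x []float64) []float64 {
	g := matVec(gate, x)
	u := matVec(up, x)
	h := make([]float64, len(g))
	for i := range g {
		h[i] = silu(g[i]) * u[i]
	}
	return matVec(down, h)
}

func main() {
	// Toy weights: hidden size 2, intermediate size 3 (illustrative values only).
	gate := [][]float64{{1, 0}, {0, 1}, {1, 1}}
	up := [][]float64{{1, 0}, {0, 1}, {0, 0}}
	down := [][]float64{{1, 0, 0}, {0, 1, 0}}
	fmt.Println(gatedMLP(gate, up, down, []float64{0, 2}))
}
```

The gate branch decides how much of each up-projected feature passes through before the down projection returns to the hidden size.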

Description

The llama.go file implements the Llama model for the imagegen MLX engine. The Model struct holds the token embeddings, a stack of decoder layers, a final RMSNorm, and the output projection. Each decoder layer combines an Attention block, a SiLU-gated MLP (gate_proj, up_proj, down_proj), and RMSNorm layers. Attention uses GQA with separate Q/K/V projections and relies on AsStrided for efficient head reshaping; RoPE is applied via mlx.RoPE with configurable theta and head dimension. The Forward pass runs the tokens through every layer with KV cache support, then applies the final RMSNorm and the output linear projection. Weights are loaded through struct tags with safetensors.LoadModule. If lm_head weights are absent, the output projection is tied to the embeddings (lm_head = embed_tokens).
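To make the GQA head sharing concrete, here is a minimal sketch of how query heads can be mapped onto a smaller set of key/value heads. It assumes the common contiguous grouping, in which consecutive query heads share one KV head; the function name kvHeadFor is hypothetical and is not taken from llama.go.

```go
package main

import "fmt"

// kvHeadFor returns the key/value head shared by query head q when
// numHeads query heads are split evenly across numKVHeads KV heads.
// Assumption: contiguous grouping (heads 0..g-1 use KV head 0, and so on).
func kvHeadFor(q, numHeads, numKVHeads int32) (int32, error) {
	if numKVHeads <= 0 || numHeads%numKVHeads != 0 {
		return 0, fmt.Errorf("num_attention_heads (%d) must be a multiple of num_key_value_heads (%d)", numHeads, numKVHeads)
	}
	if q < 0 || q >= numHeads {
		return 0, fmt.Errorf("query head %d out of range [0, %d)", q, numHeads)
	}
	groupSize := numHeads / numKVHeads
	return q / groupSize, nil
}

func main() {
	// Illustrative sizes: 8 query heads sharing 2 KV heads (groups of 4).
	for q := int32(0); q < 8; q++ {
		kv, err := kvHeadFor(q, 8, 2)
		if err != nil {
			panic(err)
		}
		fmt.Printf("query head %d -> kv head %d\n", q, kv)
	}
}
```

With NumKeyValueHeads equal to NumAttentionHeads the mapping is the identity, which corresponds to ordinary multi-head attention.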

Usage

Used for text generation with Llama-family models in the standalone MLX engine.

Code Reference

Source Location

  • Repository: Ollama
  • File: x/imagegen/models/llama/llama.go
  • Lines: 1-152

Signature

type Config struct {
	HiddenSize            int32   `json:"hidden_size"`
	NumHiddenLayers       int32   `json:"num_hidden_layers"`
	IntermediateSize      int32   `json:"intermediate_size"`
	NumAttentionHeads     int32   `json:"num_attention_heads"`
	NumKeyValueHeads      int32   `json:"num_key_value_heads"`
	VocabSize             int32   `json:"vocab_size"`
	RMSNormEps            float32 `json:"rms_norm_eps"`
	RopeTheta             float32 `json:"rope_theta"`
}

type Model struct {
	EmbedTokens *nn.Embedding `weight:"model.embed_tokens"`
	Layers      []*Layer      `weight:"model.layers"`
	Norm        *nn.RMSNorm   `weight:"model.norm"`
	Output      *nn.Linear    `weight:"lm_head,optional"`
}

func Load(modelPath string) (*Model, error)
func (m *Model) Forward(tokens *mlx.Array, caches []cache.Cache) *mlx.Array
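The json tags on Config suggest it is decoded from a JSON configuration in the model directory. The sketch below shows one way such a payload could be parsed with encoding/json. The Config type is copied from the signature above, while parseConfig and the sample values are assumptions for illustration; how Load actually reads the file is not shown on this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors the fields listed in the signature above.
type Config struct {
	HiddenSize        int32   `json:"hidden_size"`
	NumHiddenLayers   int32   `json:"num_hidden_layers"`
	IntermediateSize  int32   `json:"intermediate_size"`
	NumAttentionHeads int32   `json:"num_attention_heads"`
	NumKeyValueHeads  int32   `json:"num_key_value_heads"`
	VocabSize         int32   `json:"vocab_size"`
	RMSNormEps        float32 `json:"rms_norm_eps"`
	RopeTheta         float32 `json:"rope_theta"`
}

// parseConfig decodes a JSON payload into Config (hypothetical helper).
// Keys that have no matching field are ignored by encoding/json.
func parseConfig(data []byte) (*Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parse config: %w", err)
	}
	return &cfg, nil
}

func main() {
	// Illustrative values, not taken from any particular checkpoint.
	raw := []byte(`{
		"hidden_size": 64,
		"num_hidden_layers": 2,
		"intermediate_size": 128,
		"num_attention_heads": 8,
		"num_key_value_heads": 2,
		"vocab_size": 1000,
		"rms_norm_eps": 1e-5,
		"rope_theta": 10000.0
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *cfg)
}
```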

Import

import "github.com/ollama/ollama/x/imagegen/models/llama"

I/O Contract

Inputs

Name | Type | Required | Description
modelPath | string | Yes | Directory containing model weights and config
tokens | *mlx.Array | Yes | Input token IDs [B, L]
caches | []cache.Cache | Yes | KV caches for each layer

Outputs

Name | Type | Description
(return value) | *mlx.Array | Logits [B, L, vocab_size]

Usage Examples

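// Load the weights and config from the model directory.
model, err := llama.Load("/path/to/llama-model")
if err != nil {
	return err
}

// tokens is an *mlx.Array of token IDs with shape [B, L], prepared by the caller.
caches := model.NewCache(0)
logits := model.Forward(tokens, caches) // logits has shape [B, L, vocab_size]

// sample stands for a caller-supplied sampling step; it is not part of the signature above.
nextToken := sample(logits)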
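The sample call above is left undefined. One simple choice is greedy decoding, sketched below: pick the highest logit at the last sequence position of each batch row. The sketch assumes the logits have already been copied into a flat, row-major Go slice of shape [B, L, vocab_size]; how to read values out of an *mlx.Array is not covered here, and greedyNext is a hypothetical name.

```go
package main

import "fmt"

// greedyNext returns, for each batch row, the index of the largest logit at
// the final sequence position. logits is a row-major [batch, seqLen, vocab]
// buffer (an assumption about how the caller exported the array).
func greedyNext(logits []float32, batch, seqLen, vocab int) ([]int, error) {
	if batch <= 0 || seqLen <= 0 || vocab <= 0 || len(logits) != batch*seqLen*vocab {
		return nil, fmt.Errorf("logits length %d does not match shape [%d, %d, %d]", len(logits), batch, seqLen, vocab)
	}
	next := make([]int, batch)
	for b := 0; b < batch; b++ {
		// Offset of the last position in batch row b.
		start := (b*seqLen + seqLen - 1) * vocab
		row := logits[start : start+vocab]
		best := 0
		for i, v := range row {
			if v > row[best] {
				best = i
			}
		}
		next[b] = best
	}
	return next, nil
}

func main() {
	// One sequence, two positions, vocabulary of three tokens.
	logits := []float32{
		0.1, 0.9, 0.0, // position 0
		0.2, 0.1, 0.7, // position 1 (last)
	}
	next, err := greedyNext(logits, 1, 2, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(next) // prints [2]
}
```

Only the last position matters for choosing the next token, so the earlier rows of the logits are skipped. Temperature or top-p sampling would replace the argmax loop.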
