Implementation:Ollama Ollama Imagegen Llama
| Knowledge Sources | |
|---|---|
| Domains | Image Generation, LLM Inference |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the Llama model architecture for MLX inference, with grouped-query attention (GQA), rotary position embeddings (RoPE), and a SiLU-gated MLP.
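To make the GQA term concrete, here is a minimal sketch in plain Go (no MLX) of how query heads can share key/value heads. The helper name `kvHeadFor` and the head counts are hypothetical and do not come from llama.go. The sketch assumes contiguous grouping, where consecutive query heads read from the same KV head. That is a common convention, but this page does not confirm it for this implementation.

```go
package main

import "fmt"

// kvHeadFor returns the index of the key/value head that a query head reads
// from under grouped-query attention with contiguous grouping.
// numHeads is assumed to be a positive multiple of numKVHeads.
func kvHeadFor(queryHead, numHeads, numKVHeads int) int {
	groupSize := numHeads / numKVHeads // number of query heads sharing one KV head
	return queryHead / groupSize
}

func main() {
	// Hypothetical sizes: 8 query heads sharing 2 KV heads.
	for q := 0; q < 8; q++ {
		fmt.Printf("query head %d -> kv head %d\n", q, kvHeadFor(q, 8, 2))
	}
}
```

With `numKVHeads` equal to `numHeads` the mapping reduces to ordinary multi-head attention, and with `numKVHeads` equal to 1 every query head shares a single KV head.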
Description
The llama.go file implements the Llama architecture for the imagegen MLX engine. The Model struct holds the token embeddings, a stack of decoder layers, a final RMSNorm, and an output projection. Each decoder layer combines an Attention block, a SiLU-gated MLP (gate_proj, up_proj, down_proj), and RMSNorm layers. Attention uses GQA with separate Q/K/V projections, reshapes heads efficiently with AsStrided, and applies RoPE through mlx.RoPE with a configurable theta and head dimension. The Forward pass runs the input through all layers with KV cache support, then applies the final RMSNorm and the output linear projection. Weights are loaded through struct tags with safetensors.LoadModule. If lm_head weights are absent, the output projection is tied to the token embeddings (lm_head = embed_tokens).
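The two per-layer building blocks named above, RMSNorm and the SiLU-gated MLP, can be sketched on plain Go slices. This is an illustration of the math only, not the code in llama.go: the function names (`rmsNorm`, `silu`, `matVec`, `gatedMLP`) and the nested-slice weight layout are assumptions made for this example, and the real implementation operates on MLX arrays.

```go
package main

import (
	"fmt"
	"math"
)

// rmsNorm divides x by its root mean square (stabilised by eps) and then
// multiplies element-wise by a learned weight vector.
func rmsNorm(x, weight []float32, eps float32) []float32 {
	var sumSq float64
	for _, v := range x {
		sumSq += float64(v) * float64(v)
	}
	inv := 1 / math.Sqrt(sumSq/float64(len(x))+float64(eps))
	out := make([]float32, len(x))
	for i, v := range x {
		out[i] = float32(float64(v)*inv) * weight[i]
	}
	return out
}

// silu computes x * sigmoid(x).
func silu(x float32) float32 {
	return x / (1 + float32(math.Exp(float64(-x))))
}

// matVec multiplies a row-major [out][in] matrix by a vector of length in.
func matVec(w [][]float32, x []float32) []float32 {
	out := make([]float32, len(w))
	for i, row := range w {
		for j, v := range row {
			out[i] += v * x[j]
		}
	}
	return out
}

// gatedMLP computes down(silu(gate(x)) * up(x)), the gate_proj / up_proj /
// down_proj pattern described above.
func gatedMLP(gate, up, down [][]float32, x []float32) []float32 {
	g := matVec(gate, x)
	u := matVec(up, x)
	for i := range g {
		g[i] = silu(g[i]) * u[i]
	}
	return matVec(down, g)
}

func main() {
	// Toy 2-dimensional input with identity projections.
	x := rmsNorm([]float32{3, 4}, []float32{1, 1}, 0)
	identity := [][]float32{{1, 0}, {0, 1}}
	fmt.Println(gatedMLP(identity, identity, identity, x))
}
```

The gate branch decides how much of each up-projected feature passes through: where the gate pre-activation is zero, the corresponding hidden feature is zeroed before the down projection.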
Usage
Used for text generation with Llama-family models in the standalone MLX engine.
Code Reference
Source Location
- Repository: Ollama
- File: x/imagegen/models/llama/llama.go
- Lines: 1-152
Signature
type Config struct {
	HiddenSize        int32   `json:"hidden_size"`
	NumHiddenLayers   int32   `json:"num_hidden_layers"`
	IntermediateSize  int32   `json:"intermediate_size"`
	NumAttentionHeads int32   `json:"num_attention_heads"`
	NumKeyValueHeads  int32   `json:"num_key_value_heads"`
	VocabSize         int32   `json:"vocab_size"`
	RMSNormEps        float32 `json:"rms_norm_eps"`
	RopeTheta         float32 `json:"rope_theta"`
}

type Model struct {
	EmbedTokens *nn.Embedding `weight:"model.embed_tokens"`
	Layers      []*Layer      `weight:"model.layers"`
	Norm        *nn.RMSNorm   `weight:"model.norm"`
	Output      *nn.Linear    `weight:"lm_head,optional"`
}

func Load(modelPath string) (*Model, error)

func (m *Model) Forward(tokens *mlx.Array, caches []cache.Cache) *mlx.Array
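The json struct tags on Config suggest that the configuration is decoded with encoding/json. A minimal sketch of that decoding step follows, using a local copy of the struct so the example is self-contained. The helpers `parseConfig` and `headDim` are hypothetical, the JSON values are illustrative rather than taken from a specific checkpoint, and deriving the head dimension as hidden_size / num_attention_heads is an assumption; the page does not say how llama.go computes it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a local copy of the fields listed in the signature.
type Config struct {
	HiddenSize        int32   `json:"hidden_size"`
	NumHiddenLayers   int32   `json:"num_hidden_layers"`
	IntermediateSize  int32   `json:"intermediate_size"`
	NumAttentionHeads int32   `json:"num_attention_heads"`
	NumKeyValueHeads  int32   `json:"num_key_value_heads"`
	VocabSize         int32   `json:"vocab_size"`
	RMSNormEps        float32 `json:"rms_norm_eps"`
	RopeTheta         float32 `json:"rope_theta"`
}

// parseConfig decodes raw JSON bytes into a Config (hypothetical helper).
func parseConfig(data []byte) (*Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parse config: %w", err)
	}
	return &cfg, nil
}

// headDim assumes the per-head width is hidden_size / num_attention_heads.
// It returns 0 when the head count is missing, to avoid dividing by zero.
func headDim(cfg *Config) int32 {
	if cfg.NumAttentionHeads == 0 {
		return 0
	}
	return cfg.HiddenSize / cfg.NumAttentionHeads
}

func main() {
	// Illustrative values only.
	raw := []byte(`{
		"hidden_size": 4096,
		"num_hidden_layers": 32,
		"intermediate_size": 14336,
		"num_attention_heads": 32,
		"num_key_value_heads": 8,
		"vocab_size": 128256,
		"rms_norm_eps": 1e-05,
		"rope_theta": 500000.0
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("head dim:", headDim(cfg), "kv heads:", cfg.NumKeyValueHeads)
}
```

Fields that are absent from the JSON keep their zero values, so a caller may want to validate required fields after decoding.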
Import
import "github.com/ollama/ollama/x/imagegen/models/llama"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| modelPath | string | Yes | Directory containing model weights and config |
| tokens | *mlx.Array | Yes | Input token IDs [B, L] |
| caches | []cache.Cache | Yes | KV caches for each layer |
Outputs
| Name | Type | Description |
|---|---|---|
| (return value of Forward) | *mlx.Array | Logits [B, L, vocab_size] |
Usage Examples
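// Load weights and config from the model directory.
model, err := llama.Load("/path/to/llama-model")
if err != nil {
	return err
}

// tokens is an *mlx.Array of token IDs with shape [B, L], prepared by the caller.
caches := model.NewCache(0)
logits := model.Forward(tokens, caches) // [B, L, vocab_size]

// sample is a caller-supplied sampling function; it is not defined on this page.
nextToken := sample(logits)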
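The `sample` call in the usage example is not defined on this page. One simple choice is greedy decoding: take the logits at the last sequence position and pick the highest-scoring token. The sketch below shows that on a plain Go slice. It assumes the logits have already been copied out of the MLX array into a row-major [B, L, vocab_size] buffer (that copy step is not shown, since the page does not document it), and the names `lastPosition` and `argmax` are hypothetical.

```go
package main

import "fmt"

// lastPosition returns the logits of the final sequence position for batch
// element b, given a row-major [B, L, V] buffer flattened into one slice.
func lastPosition(flat []float32, seqLen, vocab, b int) []float32 {
	start := (b*seqLen + seqLen - 1) * vocab
	return flat[start : start+vocab]
}

// argmax returns the index of the largest value, preferring the lowest index
// on ties. It returns -1 for an empty slice.
func argmax(logits []float32) int {
	if len(logits) == 0 {
		return -1
	}
	best := 0
	for i, v := range logits[1:] {
		if v > logits[best] {
			best = i + 1
		}
	}
	return best
}

func main() {
	// Toy logits with B=1, L=2, V=3.
	flat := []float32{
		0.1, 0.2, 0.3, // position 0
		2.0, -1.0, 0.5, // position 1 (last)
	}
	last := lastPosition(flat, 2, 3, 0)
	fmt.Println("next token id:", argmax(last))
}
```

Greedy decoding is deterministic; temperature or top-p sampling would replace `argmax` with a draw from the softmax distribution over the same last-position logits.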