Implementation:Ollama Ollama Imagegen Engine Generate
| Knowledge Sources | |
|---|---|
| Domains | Image Generation, LLM Inference |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements autoregressive text generation with MLX for the standalone engine binary, supporting text-only and multimodal models.
Description
The generate.go file in cmd/engine provides the core text generation loop, split into a prefill phase and a decode phase. It defines three interfaces (Model, ChatModel, and MultimodalModel), a utf8Streamer that buffers partial multi-byte UTF-8 sequences until they are complete, and a Decoder struct. The Decoder manages KV caches, samples tokens using temperature, top-k, and top-p, and handles memory management via MLX stream switching. The file covers both standard text generation and vision-language model inference with image inputs.
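The buffering of partial multi-byte sequences can be illustrated with a small standard-library sketch. The type `utf8Buffer` and its methods `Write` and `Flush` are hypothetical names chosen for illustration; they are not taken from generate.go, and the real utf8Streamer may work differently.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf8Buffer is a hypothetical stand-in for the utf8Streamer described above.
// It holds back trailing bytes that do not yet form a complete UTF-8 sequence.
type utf8Buffer struct {
	pending []byte
}

// Write appends a decoded token fragment and returns the longest prefix that
// is safe to print; an incomplete trailing sequence stays buffered.
func (b *utf8Buffer) Write(fragment []byte) string {
	b.pending = append(b.pending, fragment...)
	end := len(b.pending)
	// Look at most UTFMax-1 bytes back for the start of an unfinished rune.
	for i := 1; i < utf8.UTFMax && i <= len(b.pending); i++ {
		start := len(b.pending) - i
		if utf8.RuneStart(b.pending[start]) {
			if !utf8.FullRune(b.pending[start:]) {
				end = start
			}
			break
		}
	}
	out := string(b.pending[:end])
	b.pending = append([]byte(nil), b.pending[end:]...)
	return out
}

// Flush returns whatever is still buffered, complete or not.
func (b *utf8Buffer) Flush() string {
	out := string(b.pending)
	b.pending = nil
	return out
}

func main() {
	b := &utf8Buffer{}
	// "é" is 0xC3 0xA9; here it arrives split across two fragments.
	fmt.Printf("%q\n", b.Write([]byte{'h', 0xC3}))
	fmt.Printf("%q\n", b.Write([]byte{0xA9, '!'}))
	fmt.Printf("%q\n", b.Flush())
}
```

Holding back at most three bytes is enough because a UTF-8 sequence is never longer than four bytes.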
Usage
Used by the standalone engine binary (cmd/engine/main.go) to perform autoregressive token generation from loaded MLX models.
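The prefill and decode phases can be sketched without MLX. Everything in the block below (`stepModel`, `kvCache`, `countingModel`, `generate`) is a hypothetical simplification: it uses plain slices instead of `*mlx.Array`, greedy selection instead of sampling, and a cache that only counts positions. It is meant to show the shape of the loop, not the actual code in generate.go.

```go
package main

import "fmt"

// kvCache is a toy cache that only records how many positions were processed.
type kvCache struct{ seqLen int }

// stepModel is a hypothetical, MLX-free stand-in for the Model interface: it
// consumes new tokens, extends the cache, and returns next-token logits.
type stepModel interface {
	Forward(tokens []int32, cache *kvCache) []float32
}

// countingModel always predicts (last token + 1) mod vocab.
type countingModel struct{ vocab int32 }

func (m countingModel) Forward(tokens []int32, cache *kvCache) []float32 {
	cache.seqLen += len(tokens)
	logits := make([]float32, m.vocab)
	logits[(tokens[len(tokens)-1]+1)%m.vocab] = 1
	return logits
}

// argmax returns the index of the largest logit (greedy decoding).
func argmax(logits []float32) int32 {
	best := 0
	for i, v := range logits {
		if v > logits[best] {
			best = i
		}
	}
	return int32(best)
}

// generate runs prefill once over the whole prompt, then decodes one token at
// a time. It returns the generated tokens and the final cache length.
func generate(m stepModel, prompt []int32, maxTokens int, eos int32) ([]int32, int) {
	if len(prompt) == 0 {
		return nil, 0
	}
	cache := &kvCache{}
	// Prefill: a single forward pass over every prompt token fills the cache.
	logits := m.Forward(prompt, cache)
	var out []int32
	for len(out) < maxTokens {
		next := argmax(logits)
		if next == eos {
			break
		}
		out = append(out, next)
		// Decode: feed only the new token; earlier positions come from the cache.
		logits = m.Forward([]int32{next}, cache)
	}
	return out, cache.seqLen
}

func main() {
	tokens, seqLen := generate(countingModel{vocab: 10}, []int32{1, 2, 3}, 4, 9)
	fmt.Println(tokens, seqLen)
}
```

The point of the split is cost: prefill processes the prompt in one pass, while each decode step handles a single new token and relies on the cache for everything before it.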
Code Reference
Source Location
- Repository: Ollama
- File: x/imagegen/cmd/engine/generate.go
- Lines: 1-359
Signature
type Model interface {
    Tokenizer() *tokenizer.Tokenizer
    VocabSize() int32
    NewCache(maxSeqLen int32) []cache.Cache
    Forward(input *mlx.Array, caches []cache.Cache) *mlx.Array
}

type MultimodalModel interface {
    Model
    FormatPromptWithImage(prompt string) string
    ExpandImageTokens(tokens []int32) []int32
    ForwardWithImage(tokens *mlx.Array, image *mlx.Array, caches []cache.Cache) *mlx.Array
    ImageSize() int32
}

type Decoder struct { ... }

func NewDecoder(m Model, temp float32, topK int, topP float32) *Decoder
func (d *Decoder) SetImage(img *mlx.Array)
Import
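// If cmd/engine is a package main (typical for a standalone binary), Go cannot
// import it and these identifiers are only usable from inside the package.
import "github.com/ollama/ollama/x/imagegen/cmd/engine"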
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
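| m | Model | Yes | Model implementing the generation interface |
| temp | float32 | Yes | Sampling temperature; lower values make the output more deterministic |
| topK | int | Yes | Top-k sampling: number of highest-probability tokens kept as candidates |
| topP | float32 | Yes | Top-p (nucleus) sampling: cumulative probability threshold for candidates |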
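To make the three sampling parameters concrete, here is a CPU-only sketch of how temperature, top-k, and top-p are commonly combined. The function `sampleToken`, its argument order, and the conventions that `temp <= 0` means greedy decoding and `topK <= 0` disables top-k are assumptions for illustration; the Decoder operates on MLX arrays and may order or implement these steps differently.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
)

// sampleToken is a hypothetical helper, not part of generate.go. It picks a
// token ID from raw logits; next must return a uniform value in [0, 1).
func sampleToken(logits []float32, temp float32, topK int, topP float32, next func() float64) int {
	if len(logits) == 0 {
		return -1
	}
	// Order candidate token IDs from highest to lowest logit.
	ids := make([]int, len(logits))
	for i := range ids {
		ids[i] = i
	}
	sort.SliceStable(ids, func(a, b int) bool { return logits[ids[a]] > logits[ids[b]] })

	// Assumption: temp <= 0 means greedy decoding (argmax).
	if temp <= 0 {
		return ids[0]
	}
	// Top-k: keep only the k most likely candidates (topK <= 0 disables it).
	if topK > 0 && topK < len(ids) {
		ids = ids[:topK]
	}
	// Temperature-scaled softmax weights over the surviving candidates.
	probs := make([]float64, len(ids))
	maxLogit := float64(logits[ids[0]])
	var sum float64
	for i, id := range ids {
		probs[i] = math.Exp((float64(logits[id]) - maxLogit) / float64(temp))
		sum += probs[i]
	}
	// Top-p: keep the smallest prefix whose cumulative probability reaches topP.
	keep := len(ids)
	if topP > 0 && topP < 1 {
		var cum float64
		for i := range probs {
			cum += probs[i] / sum
			if cum >= float64(topP) {
				keep = i + 1
				break
			}
		}
	}
	// Draw from the truncated distribution; scaling by its mass renormalizes it.
	var kept float64
	for _, p := range probs[:keep] {
		kept += p
	}
	r := next() * kept
	for i, p := range probs[:keep] {
		r -= p
		if r < 0 {
			return ids[i]
		}
	}
	return ids[keep-1]
}

func main() {
	logits := []float32{0.1, 2.5, -1.0, 2.0}
	fmt.Println(sampleToken(logits, 0.7, 40, 0.9, rand.Float64))
}
```

Passing the random source as a function keeps the sketch deterministic under test; a real implementation would more likely hold its own generator.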
Outputs
| Name | Type | Description |
|---|---|---|
| (return value) | *Decoder | Decoder wrapping the model and its KV caches for generation |
Usage Examples
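// model is any value that implements the Model interface.
decoder := NewDecoder(model, 0.7, 40, 0.9) // temp=0.7, topK=40, topP=0.9
decoder.SetImage(imageArray)               // optional; only for multimodal models
// The prefill and decode loop is managed internally by the Decoder.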
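Because MultimodalModel embeds Model, a caller holding a Model can check for image support with a type assertion. The sketch below uses simplified, hypothetical interfaces (`textModel`, `visionModel`) and stub types (`llm`, `vlm`) with made-up sizes; whether generate.go selects its multimodal path this way is an assumption, not something the source confirms.

```go
package main

import "fmt"

// textModel and visionModel are simplified, hypothetical stand-ins for the
// Model and MultimodalModel interfaces, reduced to one method each.
type textModel interface {
	VocabSize() int32
}

type visionModel interface {
	textModel
	ImageSize() int32
}

// llm is a stub text-only model; the vocabulary size is an arbitrary example.
type llm struct{}

func (llm) VocabSize() int32 { return 32000 }

// vlm is a stub vision-language model; the image size is an arbitrary example.
type vlm struct{ llm }

func (vlm) ImageSize() int32 { return 896 }

// describe reports which generation path a model would take in this sketch.
func describe(m textModel, hasImage bool) string {
	// The assertion succeeds only if m also implements the image methods.
	if vm, ok := m.(visionModel); ok && hasImage {
		return fmt.Sprintf("multimodal path, image size %d", vm.ImageSize())
	}
	return "text-only path"
}

func main() {
	fmt.Println(describe(llm{}, true))
	fmt.Println(describe(vlm{}, true))
	fmt.Println(describe(vlm{}, false))
}
```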