Implementation:Ollama Ollama Imagegen Engine Generate

Knowledge Sources	Ollama
Domains	Image Generation, LLM Inference
Last Updated	2025-02-15 00:00 GMT

Overview

Implements autoregressive text generation with MLX for the standalone engine binary, supporting text-only and multimodal models.

Description

The generate.go file in cmd/engine provides the core text generation loop including prefill and decode phases. It defines the Model, ChatModel, and MultimodalModel interfaces, a utf8Streamer for buffering partial multi-byte sequences, and a Decoder struct that manages KV caches, token sampling with temperature/top-k/top-p, and memory management via MLX stream switching. The file handles both standard text generation and vision-language model inference with image inputs.
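The buffering of partial multi-byte sequences mentioned above can be illustrated with a minimal sketch. The type name utf8Buffer and its Write and Flush methods are hypothetical stand-ins chosen for this example; the actual utf8Streamer in generate.go may be structured differently. The idea is that a token's decoded bytes may end in the middle of a UTF-8 rune, so the incomplete tail is held back until the next chunk completes it.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf8Buffer is a hypothetical stand-in for the file's utf8Streamer.
// It holds back trailing bytes that do not yet form a complete rune.
type utf8Buffer struct {
	pending []byte
}

// Write appends chunk and returns the longest prefix that does not end
// in the middle of a multi-byte UTF-8 sequence.
func (b *utf8Buffer) Write(chunk string) string {
	b.pending = append(b.pending, chunk...)
	end := len(b.pending)
	// An incomplete rune can only start within the last UTFMax-1 bytes.
	for i := 1; i < utf8.UTFMax && i <= len(b.pending); i++ {
		pos := len(b.pending) - i
		if utf8.RuneStart(b.pending[pos]) {
			if !utf8.FullRune(b.pending[pos:]) {
				end = pos // hold back the incomplete tail
			}
			break
		}
	}
	out := string(b.pending[:end])
	b.pending = append(b.pending[:0], b.pending[end:]...)
	return out
}

// Flush returns whatever is still buffered, complete or not.
func (b *utf8Buffer) Flush() string {
	out := string(b.pending)
	b.pending = b.pending[:0]
	return out
}

func main() {
	var buf utf8Buffer
	// "é" is the two bytes 0xC3 0xA9, split here across two chunks.
	for _, chunk := range []string{"caf\xc3", "\xa9 au lait"} {
		fmt.Printf("%q\n", buf.Write(chunk))
	}
	fmt.Printf("%q\n", buf.Flush())
}
```

Holding back at most three bytes keeps streaming latency low while ensuring the consumer never receives a torn character.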

Usage

Used by the standalone engine binary (cmd/engine/main.go) to perform autoregressive token generation from loaded MLX models.

Code Reference

Source Location

Repository: Ollama
File: x/imagegen/cmd/engine/generate.go
Lines: 1-359

Signature

type Model interface {
	Tokenizer() *tokenizer.Tokenizer
	VocabSize() int32
	NewCache(maxSeqLen int32) []cache.Cache
	Forward(input *mlx.Array, caches []cache.Cache) *mlx.Array
}

type MultimodalModel interface {
	Model
	FormatPromptWithImage(prompt string) string
	ExpandImageTokens(tokens []int32) []int32
	ForwardWithImage(tokens *mlx.Array, image *mlx.Array, caches []cache.Cache) *mlx.Array
	ImageSize() int32
}

type Decoder struct { ... }

func NewDecoder(m Model, temp float32, topK int, topP float32) *Decoder
func (d *Decoder) SetImage(img *mlx.Array)

Import

import "github.com/ollama/ollama/x/imagegen/cmd/engine"

I/O Contract

Inputs

Name	Type	Required	Description
m	Model	Yes	Model implementing the generation interface
temp	float32	Yes	Sampling temperature
topK	int	Yes	Top-k sampling parameter
topP	float32	Yes	Top-p (nucleus) sampling parameter
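How the three sampling parameters interact can be sketched on a plain slice of logits. This is an illustration only: the helper name sampleToken, the treatment of a non-positive temperature as greedy selection, the order in which the filters are applied, and the use of Go slices instead of MLX arrays are all assumptions made for this example, not a description of the Decoder's internals.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
)

// sampleToken is a hypothetical helper. u is a uniform random number in
// [0, 1) supplied by the caller so the function itself is deterministic.
func sampleToken(logits []float32, temp float32, topK int, topP float32, u float64) int {
	if len(logits) == 0 {
		return -1
	}
	// Sort token ids by descending logit.
	idx := make([]int, len(logits))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return logits[idx[a]] > logits[idx[b]] })
	if temp <= 0 {
		return idx[0] // assumption: treat temp <= 0 as greedy decoding
	}
	// Top-k: keep only the k highest-scoring candidates.
	if topK > 0 && topK < len(idx) {
		idx = idx[:topK]
	}
	// Temperature-scaled softmax weights (unnormalized, shifted for stability).
	maxLogit := float64(logits[idx[0]])
	weights := make([]float64, len(idx))
	var sum float64
	for i, id := range idx {
		weights[i] = math.Exp((float64(logits[id]) - maxLogit) / float64(temp))
		sum += weights[i]
	}
	// Top-p: keep the smallest prefix whose cumulative probability reaches topP.
	keep := len(idx)
	if topP > 0 && topP < 1 {
		var cum float64
		for i, w := range weights {
			cum += w / sum
			if cum >= float64(topP) {
				keep = i + 1
				break
			}
		}
	}
	// Draw from the surviving candidates in proportion to their weights.
	var keptSum float64
	for _, w := range weights[:keep] {
		keptSum += w
	}
	r := u * keptSum
	for i, w := range weights[:keep] {
		r -= w
		if r < 0 {
			return idx[i]
		}
	}
	return idx[keep-1]
}

func main() {
	logits := []float32{1, 5, 2}
	fmt.Println(sampleToken(logits, 0.7, 40, 0.9, rand.Float64()))
}
```

Lower temperatures sharpen the distribution toward the highest logit, while top-k and top-p both shrink the candidate set before the draw.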

Outputs

Name	Type	Description
(return value)	*Decoder	Decoder wrapping the model and its KV caches for generation

Usage Examples

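// model must satisfy the Model interface.
// Arguments: temperature 0.7, top-k 40, top-p 0.9.
decoder := NewDecoder(model, 0.7, 40, 0.9)

// Optional: attach an image (*mlx.Array) for multimodal models.
decoder.SetImage(imageArray)

// The Decoder manages the prefill and decode loop internally.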
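The shape of a prefill and decode loop can be sketched with toy types. Everything in this example is hypothetical: toyModel, toyCache, nextIntModel, and generate are simplified stand-ins that use Go slices rather than MLX arrays, and greedy selection replaces the Decoder's sampling. The sketch only shows the two phases: one forward pass over the whole prompt, followed by single-token passes that reuse the cache.

```go
package main

import "fmt"

// toyCache stands in for a KV cache; it only counts processed tokens.
type toyCache struct{ seen int }

// toyModel is a simplified, hypothetical analogue of the Model interface.
type toyModel interface {
	Forward(tokens []int32, c *toyCache) []float32
}

// nextIntModel always predicts (last token + 1) mod vocab.
type nextIntModel struct{ vocab int32 }

func (m nextIntModel) Forward(tokens []int32, c *toyCache) []float32 {
	c.seen += len(tokens)
	logits := make([]float32, m.vocab)
	last := tokens[len(tokens)-1]
	logits[(last+1)%m.vocab] = 1
	return logits
}

// argmax returns the index of the largest logit (greedy selection).
func argmax(logits []float32) int32 {
	best := 0
	for i, l := range logits {
		if l > logits[best] {
			best = i
		}
	}
	return int32(best)
}

// generate runs prefill once, then decodes one token per step until
// maxTokens is reached or eos is produced. It also returns how many
// tokens passed through the cache.
func generate(m toyModel, prompt []int32, maxTokens int, eos int32) ([]int32, int) {
	if len(prompt) == 0 || maxTokens <= 0 {
		return nil, 0
	}
	c := &toyCache{}
	// Prefill: process the entire prompt in a single forward pass.
	next := argmax(m.Forward(prompt, c))
	var out []int32
	for {
		if next == eos {
			break
		}
		out = append(out, next)
		if len(out) == maxTokens {
			break
		}
		// Decode: feed only the newest token; the cache holds the rest.
		next = argmax(m.Forward([]int32{next}, c))
	}
	return out, c.seen
}

func main() {
	out, seen := generate(nextIntModel{vocab: 10}, []int32{1, 2, 3}, 4, 9)
	fmt.Println(out, seen)
}
```

Separating the phases matters because prefill cost scales with prompt length, whereas each decode step processes a single token against the cached state.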

Related Pages

Principle:Ollama_Ollama_ImageGeneration
