Implementation:Ollama Ollama Imagegen Cache

Knowledge Sources	Ollama
Domains	Image Generation, Caching
Last Updated	2025-02-15 00:00 GMT

Overview

Provides KV cache implementations for transformer attention, including standard and rotating (sliding window) variants.

Description

The cache.go file defines the Cache interface and two implementations: KVCache, a standard attention cache that grows as tokens are added, and RotatingKVCache, a sliding window attention cache with bounded memory. KVCache grows in steps of 256: it allocates zero-filled buffers and appends new keys and values with in-place slice updates. RotatingKVCache wraps around once it reaches maxSize; it uses in-place updates for single tokens and concatenation with trimming during prefill. Both implementations support Reset(), which allows reuse across generation sessions, and State(), which returns the live arrays to the memory manager.
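
The step-wise growth described above can be illustrated with plain integers. This is a minimal sketch, not the code in cache.go: the helper name grownCapacity and the round-up-to-whole-steps rule are assumptions made for illustration, and no mlx arrays are involved.

```go
package main

import "fmt"

// step mirrors the growth increment of 256 mentioned in the description.
const step = 256

// grownCapacity is a hypothetical helper. It returns the capacity a
// step-wise buffer needs so that offset+seqLen entries fit, growing by a
// whole number of steps. If there is already room, the capacity is unchanged.
func grownCapacity(capacity, offset, seqLen int) int {
	need := offset + seqLen
	if need <= capacity {
		return capacity
	}
	// Round the shortfall up to a multiple of step.
	steps := (need - capacity + step - 1) / step
	return capacity + steps*step
}

func main() {
	capacity, offset := 0, 0
	// A 300-token prefill followed by a few decode steps.
	for _, n := range []int{300, 1, 1, 211, 1} {
		capacity = grownCapacity(capacity, offset, n)
		offset += n
		fmt.Printf("appended %d: offset=%d capacity=%d\n", n, offset, capacity)
	}
}
```

Growing in fixed steps means most single-token updates only write into space that is already allocated, so a reallocation is needed at most once every 256 tokens in this sketch.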

Usage

Used by all language models in the imagegen subsystem (Llama, Gemma3, GLM4, GPT-OSS) for KV caching during autoregressive generation.

Code Reference

Source Location

Repository: Ollama
File: x/imagegen/cache/cache.go
Lines: 1-172

Signature

type Cache interface {
	Update(k, v *mlx.Array, seqLen int) (*mlx.Array, *mlx.Array)
	Offset() int
	Len() int
	State() []*mlx.Array
	Reset()
}

type KVCache struct {
	keys, values *mlx.Array
	offset       int
	step         int
}

type RotatingKVCache struct {
	keys, values *mlx.Array
	offset       int
	maxSize      int
	step         int
	idx          int
}

func NewKVCache() *KVCache
func NewRotatingKVCache(maxSize int) *RotatingKVCache

Import

import "github.com/ollama/ollama/x/imagegen/cache"

I/O Contract

Inputs

Name	Type	Required	Description
k	*mlx.Array	Yes	Key tensor [B, H, L, D]
v	*mlx.Array	Yes	Value tensor [B, H, L, D]
seqLen	int	Yes	Number of new tokens being cached

Outputs

Name	Type	Description
keys	*mlx.Array	Full key cache up to current offset [B, H, total_len, D]
values	*mlx.Array	Full value cache up to current offset [B, H, total_len, D]
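
The shape relationship between inputs and outputs can be checked with simple arithmetic. The sketch below assumes the standard (non-rotating) cache, where only the sequence axis grows; updatedShape is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// updatedShape is a hypothetical helper. Given an input of shape
// [B, H, L, D] and a growing cache that already holds offset positions,
// it returns the shape [B, H, offset+L, D] of the full cached tensor.
func updatedShape(in [4]int, offset int) [4]int {
	out := in
	out[2] = offset + in[2] // only the sequence axis changes
	return out
}

func main() {
	// Prefill with 12 tokens into an empty cache.
	prefill := updatedShape([4]int{1, 8, 12, 64}, 0)
	fmt.Println(prefill)

	// One decode step on top of the prefilled cache.
	decode := updatedShape([4]int{1, 8, 1, 64}, prefill[2])
	fmt.Println(decode)
}
```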

Usage Examples

// Standard KV cache for autoregressive generation
c := cache.NewKVCache()

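// newK and newV are the key and value tensors for the current step, shaped [B, H, L, D].
// The third argument is the number of new tokens (1 for a single decode step).
// Update returns the full cached keys and values.
fullK, fullV := c.Update(newK, newV, 1)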

// Sliding window cache for long context
rc := cache.NewRotatingKVCache(4096)
fullK, fullV = rc.Update(newK, newV, 1)
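
To show what wrapping around at maxSize means for single-token updates, here is a small ring-buffer sketch. It stores integers instead of key/value tensors, and the ring type, its fields, and its methods are hypothetical. The storage order and the values reported by Offset() and Len() in the real RotatingKVCache are not confirmed by this page.

```go
package main

import "fmt"

// ring is a hypothetical stand-in for a bounded cache. It holds token IDs
// rather than tensors so the wrap-around logic is easy to follow.
type ring struct {
	buf     []int
	maxSize int
	offset  int // total number of tokens ever written
	idx     int // next write position
}

func newRing(maxSize int) *ring {
	return &ring{buf: make([]int, 0, maxSize), maxSize: maxSize}
}

// put writes one token. Until the window is full it appends; afterwards it
// overwrites the oldest entry in place and advances the write position.
func (r *ring) put(tok int) {
	if len(r.buf) < r.maxSize {
		r.buf = append(r.buf, tok)
	} else {
		r.buf[r.idx] = tok
	}
	r.idx = (r.idx + 1) % r.maxSize
	r.offset++
}

// length reports how many tokens are currently held (never more than maxSize).
func (r *ring) length() int { return len(r.buf) }

func main() {
	r := newRing(4)
	for tok := 0; tok < 6; tok++ {
		r.put(tok)
	}
	fmt.Println(r.buf, r.offset, r.length())
}
```

After six writes into a window of four, the two oldest tokens have been overwritten in place, while the count of tokens seen keeps increasing. Memory stays bounded because the buffer never grows past maxSize.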

Related Pages

Principle:Ollama_Ollama_ImageGeneration
