Implementation:Ollama Ollama Imagegen Cache
| Knowledge Sources | |
|---|---|
| Domains | Image Generation, Caching |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Provides KV cache implementations for transformer attention, including standard and rotating (sliding window) variants.
Description
The cache.go file defines the Cache interface and two implementations: KVCache, a standard attention cache that grows with the sequence, and RotatingKVCache, a sliding-window attention cache with bounded memory. KVCache grows in steps of 256 by allocating zero-filled buffers, and appends new entries with in-place slice updates. RotatingKVCache wraps around at maxSize; it uses in-place updates for single-token steps and concatenation followed by trimming for prefill. Both implementations provide Reset() for reuse across generation sessions and State() to return the live arrays to the memory manager.
Usage
Used by all language models in the imagegen subsystem (Llama, Gemma3, GLM4, GPT-OSS) for KV caching during autoregressive generation.
Code Reference
Source Location
- Repository: Ollama
- File: x/imagegen/cache/cache.go
- Lines: 1-172
Signature
type Cache interface {
	Update(k, v *mlx.Array, seqLen int) (*mlx.Array, *mlx.Array)
	Offset() int
	Len() int
	State() []*mlx.Array
	Reset()
}

type KVCache struct {
	keys, values *mlx.Array
	offset       int
	step         int
}

type RotatingKVCache struct {
	keys, values *mlx.Array
	offset       int
	maxSize      int
	step         int
	idx          int
}

func NewKVCache() *KVCache
func NewRotatingKVCache(maxSize int) *RotatingKVCache
Import
import "github.com/ollama/ollama/x/imagegen/cache"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| k | *mlx.Array | Yes | Key tensor [B, H, L, D] |
| v | *mlx.Array | Yes | Value tensor [B, H, L, D] |
| seqLen | int | Yes | Number of new tokens being cached |
Outputs
| Name | Type | Description |
|---|---|---|
| keys | *mlx.Array | Full key cache up to current offset [B, H, total_len, D] |
| values | *mlx.Array | Full value cache up to current offset [B, H, total_len, D] |
Usage Examples
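// newK and newV are assumed to be *mlx.Array tensors of shape [B, H, L, D]
// produced by the attention layer for the current step.

// Standard KV cache for autoregressive generation
c := cache.NewKVCache()
// Update caches one new token (seqLen = 1) and returns the full cached keys and values
fullK, fullV := c.Update(newK, newV, 1)

// Sliding-window cache for long context, created with maxSize = 4096
rc := cache.NewRotatingKVCache(4096)
fullK, fullV = rc.Update(newK, newV, 1)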