Implementation:Ollama Ollama Imagegen Cache

From Leeroopedia
Knowledge Sources
Domains Image Generation, Caching
Last Updated 2025-02-15 00:00 GMT

Overview

Provides KV cache implementations for transformer attention, including standard and rotating (sliding window) variants.

Description

The cache.go file defines the Cache interface and two implementations: KVCache, a standard attention cache that grows with the sequence, and RotatingKVCache, a sliding-window attention cache with bounded memory. KVCache grows its buffers in steps of 256 positions: it allocates zero-filled buffers and writes new keys and values into them with in-place slice updates, so an append does not need a fresh allocation each time. RotatingKVCache wraps around once it reaches maxSize; it uses in-place updates when a single token is added and concatenation followed by trimming during prefill. Both implementations provide Reset(), which allows a cache to be reused across generation sessions, and State(), which returns the live arrays to the memory manager.
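
The step-wise growth described above can be illustrated with plain integers. This is a minimal sketch, assuming the buffer capacity is rounded up to the next multiple of the step; the helper name requiredCapacity is hypothetical and the actual allocation logic in cache.go may differ in detail.

```go
package main

import "fmt"

// step is the growth increment named in the description (256 positions).
const step = 256

// requiredCapacity returns the smallest multiple of step that can hold
// offset+seqLen tokens. Hypothetical helper, not taken from cache.go.
func requiredCapacity(offset, seqLen int) int {
	need := offset + seqLen
	return (need + step - 1) / step * step
}

func main() {
	capacity, offset := 0, 0
	// One prefill of 300 tokens followed by a few smaller updates.
	for _, seqLen := range []int{300, 1, 1, 211, 1} {
		if need := requiredCapacity(offset, seqLen); need > capacity {
			fmt.Printf("grow %d -> %d\n", capacity, need)
			capacity = need
		}
		offset += seqLen
	}
	fmt.Println("offset:", offset, "capacity:", capacity)
	// Prints:
	// grow 0 -> 512
	// grow 512 -> 768
	// offset: 514 capacity: 768
}
```

Under this assumption most single-token updates land in already-allocated space, and a new allocation is only needed when an update crosses a 256-position boundary.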

Usage

Used by all language models in the imagegen subsystem (Llama, Gemma3, GLM4, GPT-OSS) for KV caching during autoregressive generation.

Code Reference

Source Location

  • Repository: Ollama
  • File: x/imagegen/cache/cache.go
  • Lines: 1-172

Signature

type Cache interface {
	Update(k, v *mlx.Array, seqLen int) (*mlx.Array, *mlx.Array)
	Offset() int
	Len() int
	State() []*mlx.Array
	Reset()
}

type KVCache struct {
	keys, values *mlx.Array
	offset       int
	step         int
}

type RotatingKVCache struct {
	keys, values *mlx.Array
	offset       int
	maxSize      int
	step         int
	idx          int
}

func NewKVCache() *KVCache
func NewRotatingKVCache(maxSize int) *RotatingKVCache
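
The maxSize and idx fields of RotatingKVCache suggest a ring buffer. The following toy sketch shows the wrap-around idea for single-token updates only, storing integer token IDs instead of key/value tensors. The type ring and its methods are hypothetical stand-ins, not the code in cache.go, and the prefill path (concatenation with trimming) is not modelled.

```go
package main

import "fmt"

// ring is a toy stand-in for a rotating cache. It stores token IDs
// rather than tensors. All names here are hypothetical.
type ring struct {
	buf     []int
	offset  int // total number of tokens seen so far
	idx     int // next write position inside buf
	maxSize int // upper bound on len(buf)
}

func newRing(maxSize int) *ring { return &ring{maxSize: maxSize} }

// push stores one token. Once the buffer is full, the write position
// wraps around and the oldest entry is overwritten in place.
func (r *ring) push(tok int) {
	if len(r.buf) < r.maxSize {
		r.buf = append(r.buf, tok)
	} else {
		r.buf[r.idx] = tok
	}
	r.idx = (r.idx + 1) % r.maxSize
	r.offset++
}

func main() {
	r := newRing(4)
	for tok := 0; tok < 6; tok++ {
		r.push(tok)
	}
	fmt.Println(r.buf, r.offset, r.idx) // [4 5 2 3] 6 2
}
```

In this sketch offset keeps counting every token while the buffer length stays capped at maxSize, which is what bounds memory for long contexts.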

Import

import "github.com/ollama/ollama/x/imagegen/cache"

I/O Contract

Inputs

Name   | Type       | Required | Description
k      | *mlx.Array | Yes      | Key tensor [B, H, L, D]
v      | *mlx.Array | Yes      | Value tensor [B, H, L, D]
seqLen | int        | Yes      | Number of new tokens being cached

Outputs

Name   | Type       | Description
keys   | *mlx.Array | Full key cache up to current offset [B, H, total_len, D]
values | *mlx.Array | Full value cache up to current offset [B, H, total_len, D]
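
The relationship between the input and output shapes can be sketched with plain arrays. This assumes the growing (non-rotating) cache, where total_len is the number of previously cached tokens plus the incoming length L; outputShape is a hypothetical helper written only for illustration.

```go
package main

import "fmt"

// outputShape returns the shape of the cache returned by an update,
// given the incoming [B, H, L, D] shape and the number of tokens cached
// before the call. Hypothetical helper; assumes an unbounded cache.
func outputShape(in [4]int, offsetBefore int) [4]int {
	out := in
	out[2] = offsetBefore + in[2] // only the sequence axis grows
	return out
}

func main() {
	prefill := [4]int{1, 8, 12, 64} // 12 prompt tokens
	fmt.Println(outputShape(prefill, 0)) // [1 8 12 64]

	decode := [4]int{1, 8, 1, 64} // one new token
	fmt.Println(outputShape(decode, 12)) // [1 8 13 64]
}
```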

Usage Examples

// Standard KV cache for autoregressive generation
c := cache.NewKVCache()

// newK and newV are the key/value tensors for the new token, shaped [B, H, 1, D].
// Update appends them and returns the full cached keys and values.
fullK, fullV := c.Update(newK, newV, 1)

// Sliding window cache for long context, bounded to 4096 positions
rc := cache.NewRotatingKVCache(4096)
fullK, fullV = rc.Update(newK, newV, 1)
