Implementation:Ollama Ollama KVCache Wrapper

From Leeroopedia
Knowledge Sources
Domains: Inference Runtime, KV Cache Management
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements a composite KV cache container that wraps multiple cache implementations, dispatching operations to the appropriate cache based on the active cache type set by the model.

Description

The WrapperCache struct holds a slice of Cache implementations and a curType index that selects which of them receives SetLayer, Get, and Put calls. Lifecycle operations (Init, SetConfig, Close, StartForward, Remove) are forwarded to every wrapped cache. The SetCacheType method lets a model switch between caches (e.g., encoder vs. decoder) during a forward pass. If StartForward returns an error for one of the wrapped caches, the wrapper unwinds by removing entries from the caches that had already started, then returns the error.
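The split between routed and broadcast calls can be illustrated with a minimal, self-contained sketch. It is not the Ollama source: the Cache interface here is a deliberately reduced stand-in that stores strings instead of ml.Tensor values, and memCache, Wrapper, and NewWrapper are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// Cache is a reduced stand-in for the kvcache.Cache interface (assumption:
// the real interface works on ml.Tensor values and has more methods).
type Cache interface {
	SetLayer(layer int)
	Put(key, value string)
	Get() (key, value string)
	Close()
}

// memCache is a hypothetical in-memory cache holding one key/value pair per layer.
type memCache struct {
	layer  int
	keys   map[int]string
	values map[int]string
	closed bool
}

func newMemCache() *memCache {
	return &memCache{keys: map[int]string{}, values: map[int]string{}}
}

func (m *memCache) SetLayer(layer int) { m.layer = layer }

func (m *memCache) Put(key, value string) {
	m.keys[m.layer] = key
	m.values[m.layer] = value
}

func (m *memCache) Get() (string, string) { return m.keys[m.layer], m.values[m.layer] }

func (m *memCache) Close() { m.closed = true }

// Wrapper routes per-layer calls to caches[curType] and broadcasts Close to all caches.
type Wrapper struct {
	caches  []Cache
	curType int
}

func NewWrapper(caches ...Cache) *Wrapper { return &Wrapper{caches: caches} }

// Routed calls: only the currently selected cache is touched.
func (w *Wrapper) SetCacheType(t int)    { w.curType = t }
func (w *Wrapper) SetLayer(layer int)    { w.caches[w.curType].SetLayer(layer) }
func (w *Wrapper) Put(k, v string)       { w.caches[w.curType].Put(k, v) }
func (w *Wrapper) Get() (string, string) { return w.caches[w.curType].Get() }

// Broadcast call: every wrapped cache is closed.
func (w *Wrapper) Close() {
	for _, c := range w.caches {
		c.Close()
	}
}

func main() {
	enc, dec := newMemCache(), newMemCache()
	w := NewWrapper(enc, dec)

	w.SetCacheType(0) // encoder cache
	w.SetLayer(0)
	w.Put("encK", "encV")

	w.SetCacheType(1) // decoder cache
	w.SetLayer(0)
	w.Put("decK", "decV")

	k, v := w.Get()
	fmt.Println(k, v) // decK decV

	w.Close()
	fmt.Println(enc.closed, dec.closed) // true true
}
```

The sketch omits bounds checking on curType; whether the real wrapper validates the index is not stated on this page.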

Usage

Used by multimodal and encoder-decoder models that need both an encoder cache and a causal decoder cache. The model sets the cache type before each layer to route operations to the correct underlying cache.

Code Reference

Source Location

  • Repository: Ollama
  • File: kvcache/wrapper.go
  • Lines: 1-110

Signature

type WrapperCache struct {
    caches  []Cache // wrapped cache implementations
    curType int     // index into caches used by SetLayer, Get, and Put
}

func NewWrapperCache(caches ...Cache) *WrapperCache
func (c *WrapperCache) SetCacheType(cacheType int)
func (c *WrapperCache) Init(backend ml.Backend, dtype ml.DType, maxSequences, capacity, maxBatch int)
func (c *WrapperCache) StartForward(ctx ml.Context, batch input.Batch, reserve bool) error
func (c *WrapperCache) SetLayer(layer int)
func (c *WrapperCache) Get(ctx ml.Context) (ml.Tensor, ml.Tensor, ml.Tensor)
func (c *WrapperCache) Put(ctx ml.Context, key, value ml.Tensor)
func (c *WrapperCache) Close()
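The unwind that StartForward performs on error can be sketched as follows. This is a simplified model under stated assumptions, not the code in kvcache/wrapper.go: the startable interface, fakeCache, and startAll are hypothetical, and the batch is reduced to a plain list of positions.

```go
package main

import (
	"errors"
	"fmt"
)

// startable is a hypothetical slice of the cache interface, just enough
// to show the unwind.
type startable interface {
	StartForward(positions []int) error
	Remove(pos int)
}

// fakeCache records which positions it holds and can be told to fail.
type fakeCache struct {
	name    string
	fail    bool
	entries map[int]bool
}

func (f *fakeCache) StartForward(positions []int) error {
	if f.fail {
		return errors.New(f.name + ": no space")
	}
	for _, p := range positions {
		f.entries[p] = true
	}
	return nil
}

func (f *fakeCache) Remove(pos int) { delete(f.entries, pos) }

// startAll starts every cache in order. If cache i fails, the caches
// 0..i-1 that already started have the batch positions removed again,
// so no cache is left holding entries for a batch that never ran.
func startAll(caches []startable, positions []int) error {
	for i, c := range caches {
		if err := c.StartForward(positions); err != nil {
			for j := i - 1; j >= 0; j-- {
				for _, p := range positions {
					caches[j].Remove(p)
				}
			}
			return err
		}
	}
	return nil
}

func main() {
	enc := &fakeCache{name: "encoder", entries: map[int]bool{}}
	dec := &fakeCache{name: "decoder", fail: true, entries: map[int]bool{}}

	err := startAll([]startable{enc, dec}, []int{0, 1, 2})
	fmt.Println(err)              // decoder: no space
	fmt.Println(len(enc.entries)) // 0
}
```

Without the unwind, the encoder cache in this example would keep three entries for a forward pass that was abandoned.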

Import

import "github.com/ollama/ollama/kvcache"

I/O Contract

Inputs

Name | Type | Required | Description
caches | ...Cache | Yes | One or more Cache implementations to wrap (e.g., EncoderCache + Causal)
cacheType | int | Yes | Index of the cache to activate for subsequent SetLayer/Get/Put calls

Outputs

Name | Type | Description
key | ml.Tensor | Key tensor from the currently active cache
value | ml.Tensor | Value tensor from the currently active cache
mask | ml.Tensor | Mask from the currently active cache

Usage Examples

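// Create a wrapper for an encoder-decoder model.
// backend, ctx, batch, encoderKey, and encoderValue are assumed to be
// supplied by the surrounding model code.
encCache := kvcache.NewEncoderCache()
decCache := kvcache.NewCausal()
wrapper := kvcache.NewWrapperCache(encCache, decCache)

// Init arguments: backend, dtype, maxSequences=4, capacity=2048, maxBatch=512
wrapper.Init(backend, ml.DTypeF16, 4, 2048, 512)

// Begin the forward pass; StartForward is forwarded to every wrapped cache
if err := wrapper.StartForward(ctx, batch, false); err != nil {
    return err
}

// During the forward pass, use the encoder cache (index 0)
wrapper.SetCacheType(0)
wrapper.SetLayer(0)
wrapper.Put(ctx, encoderKey, encoderValue)

// Switch to the decoder cache (index 1) and read back key, value, and mask
wrapper.SetCacheType(1)
wrapper.SetLayer(0)
key, value, mask := wrapper.Get(ctx)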
