Implementation:Ollama Ollama KVCache Wrapper

Knowledge Sources	Ollama
Domains	Inference Runtime, KV Cache Management
Last Updated	2025-02-15 00:00 GMT

Overview

Implements a composite KV cache container that wraps multiple cache implementations, dispatching operations to the appropriate cache based on the active cache type set by the model.

Description

The WrapperCache struct holds a slice of Cache implementations and a curType index that selects which cache receives SetLayer, Get, and Put calls. Lifecycle operations (Init, SetConfig, Close, StartForward, Remove) are forwarded to every wrapped cache. The SetCacheType method lets a model switch between caches (e.g., encoder vs. decoder) during a forward pass. If StartForward fails on one cache, the wrapper unwinds by removing the batch's entries from the caches that had already started, then returns the error.
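The dispatch rules described here can be sketched with a toy model. This is a minimal, self-contained illustration, not the Ollama source: toyCache, recorder, and wrapper are hypothetical stand-ins, the batch is reduced to a slice of positions, and Remove takes a single position rather than the real method's arguments. It shows a routed call (Put) reaching only the active cache, and a broadcast call (StartForward) unwinding earlier caches when a later one fails.

```go
package main

import (
	"errors"
	"fmt"
)

// toyCache is a hypothetical, reduced stand-in for the kvcache.Cache
// interface, keeping only the calls needed to show the pattern.
type toyCache interface {
	StartForward(batch []int) error
	Put(key string)
	Remove(pos int)
}

// recorder logs every call it receives so the routing is visible.
type recorder struct {
	name   string
	failOn bool // when true, StartForward returns an error
	log    []string
}

func (r *recorder) StartForward(batch []int) error {
	if r.failOn {
		return errors.New(r.name + ": no space")
	}
	r.log = append(r.log, "start")
	return nil
}

func (r *recorder) Put(key string) { r.log = append(r.log, "put "+key) }
func (r *recorder) Remove(pos int) { r.log = append(r.log, fmt.Sprintf("remove %d", pos)) }

// wrapper holds a slice of caches plus the index of the active one.
type wrapper struct {
	caches  []toyCache
	curType int
}

// SetCacheType selects which cache receives routed calls.
func (w *wrapper) SetCacheType(t int) { w.curType = t }

// Put is routed to the active cache only.
func (w *wrapper) Put(key string) { w.caches[w.curType].Put(key) }

// StartForward is broadcast to every cache in order. If one fails, the
// caches that already started are unwound by removing the batch positions.
func (w *wrapper) StartForward(batch []int) error {
	for i, c := range w.caches {
		if err := c.StartForward(batch); err != nil {
			for j := i - 1; j >= 0; j-- {
				for _, pos := range batch {
					w.caches[j].Remove(pos)
				}
			}
			return err
		}
	}
	return nil
}

func main() {
	enc, dec := &recorder{name: "enc"}, &recorder{name: "dec"}
	w := &wrapper{caches: []toyCache{enc, dec}}

	// Broadcast, then route a Put to the decoder cache (index 1).
	_ = w.StartForward([]int{0, 1})
	w.SetCacheType(1)
	w.Put("k0")
	fmt.Println(enc.log, dec.log)

	// Make the second cache fail: the first one is unwound.
	dec.failOn = true
	err := w.StartForward([]int{2})
	fmt.Println(err, enc.log)
}
```

In this sketch the unwind walks the caches in reverse order, so only those that accepted the batch are asked to drop it; the failing cache and any caches after it are left untouched.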

Usage

Used by multimodal and encoder-decoder models that need both an encoder cache and a causal decoder cache. The model sets the cache type before each layer to route operations to the correct underlying cache.
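As a rough illustration of choosing the cache type per layer, the sketch below plans which cache index each layer would use. It assumes a model in which some layers cross-attend over encoder output while the rest are causal self-attention layers; the constants encoderCache and decoderCache, the helpers cacheTypeFor and planForward, and the choice of cross-attention layers are all hypothetical and only mirror the order in which caches would be passed to the wrapper.

```go
package main

import "fmt"

// Hypothetical cache-type indices. They match the order in which the
// caches would be handed to the wrapper: encoder first, decoder second.
const (
	encoderCache = 0
	decoderCache = 1
)

// cacheTypeFor returns the cache index a layer should use, given the set
// of layers that cross-attend over encoder output.
func cacheTypeFor(layer int, crossAttn map[int]bool) int {
	if crossAttn[layer] {
		return encoderCache
	}
	return decoderCache
}

// planForward returns the cache index chosen for each of n layers.
func planForward(n int, crossAttn map[int]bool) []int {
	plan := make([]int, n)
	for layer := range plan {
		plan[layer] = cacheTypeFor(layer, crossAttn)
	}
	return plan
}

func main() {
	// Assumption for this example: layers 1 and 3 are cross-attention layers.
	plan := planForward(4, map[int]bool{1: true, 3: true})
	for layer, t := range plan {
		// In model code, this is the point where the wrapper's cache type
		// and layer would be set before calling Get or Put.
		fmt.Printf("layer %d -> cache %d\n", layer, t)
	}
}
```

Keeping the selection in one small function makes the routing rule easy to test separately from the tensor code that consumes the cache.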

Code Reference

Source Location

Repository: Ollama
File: kvcache/wrapper.go
Lines: 1-110

Signature

type WrapperCache struct {
    caches  []Cache
    curType int
}

func NewWrapperCache(caches ...Cache) *WrapperCache
func (c *WrapperCache) SetCacheType(cacheType int)
func (c *WrapperCache) Init(backend ml.Backend, dtype ml.DType, maxSequences, capacity, maxBatch int)
func (c *WrapperCache) StartForward(ctx ml.Context, batch input.Batch, reserve bool) error
func (c *WrapperCache) SetLayer(layer int)
func (c *WrapperCache) Get(ctx ml.Context) (ml.Tensor, ml.Tensor, ml.Tensor)
func (c *WrapperCache) Put(ctx ml.Context, key, value ml.Tensor)
func (c *WrapperCache) Close()

Import

import "github.com/ollama/ollama/kvcache"

I/O Contract

Inputs

Name	Type	Required	Description
caches	...Cache	Yes	One or more Cache implementations to wrap (e.g., EncoderCache + Causal)
cacheType	int	Yes	Index of the cache to activate for subsequent SetLayer/Get/Put calls

Outputs

Name	Type	Description
key	ml.Tensor	Key tensor from the currently active cache
value	ml.Tensor	Value tensor from the currently active cache
mask	ml.Tensor	Mask from the currently active cache

Usage Examples

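// Create a wrapper for an encoder-decoder model.
// backend, ctx, batch, shift, encoderKey, and encoderValue are placeholders
// supplied by the surrounding model code.
encCache := kvcache.NewEncoderCache()
decCache := kvcache.NewCausalCache(shift) // shift: the model's position-shift callback
wrapper := kvcache.NewWrapperCache(encCache, decCache)

// Arguments after the dtype: maxSequences=4, capacity=2048, maxBatch=512.
wrapper.Init(backend, ml.DTypeF16, 4, 2048, 512)

// Begin the forward pass on every wrapped cache before any Get/Put.
if err := wrapper.StartForward(ctx, batch, false); err != nil {
    return err
}

// During the forward pass: use the encoder cache (index 0).
wrapper.SetCacheType(0)
wrapper.SetLayer(0)
wrapper.Put(ctx, encoderKey, encoderValue)

// Switch to the decoder cache (index 1) and read back its tensors.
wrapper.SetCacheType(1)
wrapper.SetLayer(0)
key, value, mask := wrapper.Get(ctx)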

Related Pages

Principle:Ollama_Ollama_KVCache_Composite_Caching
