Implementation:Ollama Ollama KVCache Wrapper
| Knowledge Sources | |
|---|---|
| Domains | Inference Runtime, KV Cache Management |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements a composite KV cache container that wraps multiple cache implementations, dispatching operations to the appropriate cache based on the active cache type set by the model.
Description
The WrapperCache struct holds a slice of Cache implementations and a curType index that determines which cache receives SetLayer, Get, and Put calls. Lifecycle operations (Init, SetConfig, Close, StartForward, Remove) are forwarded to every wrapped cache. The SetCacheType method lets a model switch between caches (e.g., encoder vs. decoder) during a forward pass. If any wrapped cache's StartForward returns an error, the wrapper unwinds: it calls Remove on each cache that had already started, dropping that batch's entries, and then returns the error.
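The unwind step can be sketched with a small in-memory model. This is a minimal sketch under stated assumptions: Cache is a hypothetical, trimmed-down interface with only StartForward and Remove, plain slices stand in for input.Batch, memCache is invented for illustration, and removing from each batch position up to math.MaxInt32 is an assumption of the sketch rather than a quote of the Ollama source.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// Cache is a hypothetical, trimmed-down stand-in for kvcache.Cache that keeps
// only the two calls needed to show the unwind.
type Cache interface {
	StartForward(seqs []int, positions []int32) error
	Remove(seq int, begin, end int32) error
}

// memCache records the positions it holds for each sequence.
type memCache struct {
	name    string
	fail    bool // when true, StartForward reports an error
	entries map[int][]int32
}

func newMemCache(name string, fail bool) *memCache {
	return &memCache{name: name, fail: fail, entries: map[int][]int32{}}
}

func (m *memCache) StartForward(seqs []int, positions []int32) error {
	if m.fail {
		return errors.New(m.name + ": no free cache slots")
	}
	for i, s := range seqs {
		m.entries[s] = append(m.entries[s], positions[i])
	}
	return nil
}

// Remove drops the positions of seq that fall in [begin, end).
func (m *memCache) Remove(seq int, begin, end int32) error {
	kept := m.entries[seq][:0] // in-place filter over the same backing array
	for _, p := range m.entries[seq] {
		if p < begin || p >= end {
			kept = append(kept, p)
		}
	}
	m.entries[seq] = kept
	return nil
}

// WrapperCache holds the wrapped caches (curType is omitted in this unwind-only sketch).
type WrapperCache struct {
	caches []Cache
}

// StartForward starts every wrapped cache in order. If cache i fails, caches
// 0..i-1 already accepted the batch, so their entries from each batch
// position onward are removed before the error is returned.
func (c *WrapperCache) StartForward(seqs []int, positions []int32) error {
	for i, cache := range c.caches {
		if err := cache.StartForward(seqs, positions); err != nil {
			for j := i - 1; j >= 0; j-- {
				for k := range positions {
					// Open-ended range so everything this batch added is dropped.
					_ = c.caches[j].Remove(seqs[k], positions[k], math.MaxInt32)
				}
			}
			return err
		}
	}
	return nil
}

func main() {
	enc := newMemCache("encoder", false)
	dec := newMemCache("decoder", true) // simulate the second cache failing
	w := &WrapperCache{caches: []Cache{enc, dec}}

	err := w.StartForward([]int{0, 0}, []int32{5, 6})
	fmt.Println("error:", err)                                // error: decoder: no free cache slots
	fmt.Println("encoder entries left:", len(enc.entries[0])) // encoder entries left: 0
}
```

Walking the started caches in reverse order keeps the unwind symmetric with the forward loop. The error from the failing cache is still what the caller sees.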
Usage
Used by multimodal and encoder-decoder models that need both an encoder cache and a causal decoder cache. The model sets the cache type before each layer to route operations to the correct underlying cache.
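The per-layer routing can be illustrated with a hypothetical layer loop. The constants encoderCacheType and decoderCacheType, the routingCache recorder, and the choice of layers 2 and 5 as cross-attention layers are all invented for illustration. They are not taken from any Ollama model. The only assumption carried over from this page is that the cache type is set before SetLayer for each layer.

```go
package main

import "fmt"

// Hypothetical cache-type indices, matching the order the caches were
// passed to the wrapper: encoder first, causal decoder second.
const (
	encoderCacheType = 0
	decoderCacheType = 1
)

// routingCache is a stand-in for the wrapper that only records which
// cache type was active when each layer ran.
type routingCache struct {
	curType int
	trace   []int
}

func (c *routingCache) SetCacheType(t int) { c.curType = t }

func (c *routingCache) SetLayer(layer int) { c.trace = append(c.trace, c.curType) }

// forward walks the decoder layers; layers marked in crossAttn attend over
// the encoder output and therefore route to the encoder cache.
func forward(cache *routingCache, numLayers int, crossAttn map[int]bool) {
	for layer := 0; layer < numLayers; layer++ {
		if crossAttn[layer] {
			cache.SetCacheType(encoderCacheType)
		} else {
			cache.SetCacheType(decoderCacheType)
		}
		cache.SetLayer(layer)
		// Attention for this layer would call Get/Put on the cache here.
	}
}

func main() {
	c := &routingCache{}
	forward(c, 6, map[int]bool{2: true, 5: true})
	fmt.Println(c.trace) // [1 1 0 1 1 0]
}
```

Selecting the cache type inside the layer loop keeps the model code in control of the routing. The wrapper itself stays unaware of which layers are cross-attention layers.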
Code Reference
Source Location
- Repository: Ollama
- File: kvcache/wrapper.go
- Lines: 1-110
Signature
```go
type WrapperCache struct {
	caches  []Cache
	curType int
}

func NewWrapperCache(caches ...Cache) *WrapperCache
func (c *WrapperCache) SetCacheType(cacheType int)
func (c *WrapperCache) Init(backend ml.Backend, dtype ml.DType, maxSequences, capacity, maxBatch int)
func (c *WrapperCache) StartForward(ctx ml.Context, batch input.Batch, reserve bool) error
func (c *WrapperCache) SetLayer(layer int)
func (c *WrapperCache) Get(ctx ml.Context) (ml.Tensor, ml.Tensor, ml.Tensor)
func (c *WrapperCache) Put(ctx ml.Context, key, value ml.Tensor)
func (c *WrapperCache) Close()
```
Import
```go
import "github.com/ollama/ollama/kvcache"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| caches | ...Cache | Yes | One or more Cache implementations to wrap (e.g., EncoderCache + Causal) |
| cacheType | int | Yes | Zero-based index, in the order the caches were passed to NewWrapperCache, of the cache that receives subsequent SetLayer/Get/Put calls |
Outputs
| Name | Type | Description |
|---|---|---|
| key | ml.Tensor | Key tensor from the currently active cache |
| value | ml.Tensor | Value tensor from the currently active cache |
| mask | ml.Tensor | Mask from the currently active cache |
Usage Examples
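```go
// Create a wrapper for an encoder-decoder model: index 0 is the encoder
// cache, index 1 is the causal decoder cache.
// backend, ctx, batch, encoderKey, encoderValue and m.Shift (the model's
// position-shift function, assumed here) are assumed to be in scope.
encCache := kvcache.NewEncoderCache()
decCache := kvcache.NewCausalCache(m.Shift)
wrapper := kvcache.NewWrapperCache(encCache, decCache)
wrapper.Init(backend, ml.DTypeF16, 4, 2048, 512) // maxSequences, capacity, maxBatch
defer wrapper.Close()

// Prepare every wrapped cache for this batch before touching any layer.
if err := wrapper.StartForward(ctx, batch, false); err != nil {
	return err
}

// Route to the encoder cache (index 0) and store this layer's keys/values.
wrapper.SetCacheType(0)
wrapper.SetLayer(0)
wrapper.Put(ctx, encoderKey, encoderValue)

// Switch to the decoder cache (index 1) and read back key, value and mask.
wrapper.SetCacheType(1)
wrapper.SetLayer(0)
key, value, mask := wrapper.Get(ctx)
```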
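The dispatch the example relies on can be exercised without Ollama using a mock. This is a minimal sketch, not the library's implementation. Tensor, the reduced Cache interface, and layerCache are hypothetical stand-ins with no ml.Context, backend, or real masking. SetLayer is routed only to the active cache, as the Description states.

```go
package main

import "fmt"

// Tensor is a hypothetical placeholder for ml.Tensor.
type Tensor string

// Cache is a hypothetical, reduced form of kvcache.Cache, just enough to
// show per-type dispatch.
type Cache interface {
	SetLayer(layer int)
	Put(key, value Tensor)
	Get() (key, value, mask Tensor)
}

// layerCache stores one key/value pair per layer index.
type layerCache struct {
	layer int
	kv    map[int][2]Tensor
}

func newLayerCache() *layerCache { return &layerCache{kv: map[int][2]Tensor{}} }

func (c *layerCache) SetLayer(layer int) { c.layer = layer }

func (c *layerCache) Put(key, value Tensor) { c.kv[c.layer] = [2]Tensor{key, value} }

func (c *layerCache) Get() (Tensor, Tensor, Tensor) {
	kv := c.kv[c.layer]
	return kv[0], kv[1], "mask" // placeholder mask
}

// WrapperCache forwards SetLayer/Put/Get to caches[curType] only.
type WrapperCache struct {
	caches  []Cache
	curType int
}

func NewWrapperCache(caches ...Cache) *WrapperCache { return &WrapperCache{caches: caches} }

func (w *WrapperCache) SetCacheType(t int) { w.curType = t }

func (w *WrapperCache) SetLayer(layer int) { w.caches[w.curType].SetLayer(layer) }

func (w *WrapperCache) Put(key, value Tensor) { w.caches[w.curType].Put(key, value) }

func (w *WrapperCache) Get() (Tensor, Tensor, Tensor) { return w.caches[w.curType].Get() }

func main() {
	enc, dec := newLayerCache(), newLayerCache()
	w := NewWrapperCache(enc, dec)

	w.SetCacheType(0)
	w.SetLayer(0)
	w.Put("encK", "encV")

	w.SetCacheType(1)
	w.SetLayer(0)
	w.Put("decK", "decV")

	k, _, _ := w.Get()
	fmt.Println(k) // decK

	w.SetCacheType(0)
	k, _, _ = w.Get()
	fmt.Println(k) // encK
}
```

Only the active cache sees Put. As a result, the decoder's write for layer 0 leaves the encoder's layer-0 entry intact, and switching back retrieves it unchanged.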