Implementation:Ollama Ollama KVCache Interface
| Knowledge Sources | |
|---|---|
| Domains | Inference Runtime, KV Cache Management |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Defines the Cache interface that all KV cache implementations must satisfy, providing a unified contract for storing and retrieving key-value tensors during model inference.
Description
The cache.go file declares the Cache interface with methods grouped into two categories: model-facing operations (SetLayer, Get, Put, SetConfig) used during forward passes, and cache management operations (Init, StartForward, Remove, Close) used by the runtime scheduler. It also defines sentinel error variables ErrKvCacheFull and ErrNotSupported used across all cache implementations.
Usage
Cache is used polymorphically throughout the model and runtime code. Model implementations call SetLayer, Get, and Put during inference. The runtime scheduler calls Init, StartForward, Remove, and Close for lifecycle management.
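This division of labor implies a call order within each batch. The sketch below makes that order visible with a recording fake. It uses a hypothetical, tensor-free interface, lifecycleCache: a token count stands in for input.Batch and []float32 for ml.Tensor. The argument values are arbitrary, and when the real runtime actually calls Remove depends on its own bookkeeping.

```go
package main

import "fmt"

// lifecycleCache is a hypothetical, tensor-free reduction of kvcache.Cache.
type lifecycleCache interface {
	Init(maxSequences, capacity, maxBatch int)
	StartForward(batchLen int) error
	SetLayer(layer int)
	Put(key, value []float32)
	Get() (key, value []float32)
	Remove(seq int, begin, end int32) error
	Close()
}

// recorder implements lifecycleCache by logging every call.
type recorder struct{ calls []string }

func (r *recorder) Init(maxSeq, capacity, maxBatch int) {
	r.calls = append(r.calls, fmt.Sprintf("Init(%d,%d,%d)", maxSeq, capacity, maxBatch))
}
func (r *recorder) StartForward(n int) error {
	r.calls = append(r.calls, fmt.Sprintf("StartForward(%d)", n))
	return nil
}
func (r *recorder) SetLayer(l int)     { r.calls = append(r.calls, fmt.Sprintf("SetLayer(%d)", l)) }
func (r *recorder) Put(k, v []float32) { r.calls = append(r.calls, "Put") }
func (r *recorder) Get() ([]float32, []float32) {
	r.calls = append(r.calls, "Get")
	return nil, nil
}
func (r *recorder) Remove(seq int, b, e int32) error {
	r.calls = append(r.calls, fmt.Sprintf("Remove(%d,%d,%d)", seq, b, e))
	return nil
}
func (r *recorder) Close() { r.calls = append(r.calls, "Close") }

// runSequence drives the cache for one batch. The runtime calls Init once
// and StartForward per batch, the model calls SetLayer/Put/Get per layer,
// and the runtime later removes entries and closes the cache.
func runSequence(c lifecycleCache, layers, batchLen int) error {
	c.Init(1, 16, batchLen)
	if err := c.StartForward(batchLen); err != nil {
		return err
	}
	for l := 0; l < layers; l++ {
		c.SetLayer(l)
		c.Put(make([]float32, batchLen), make([]float32, batchLen))
		c.Get()
	}
	if err := c.Remove(0, 0, int32(batchLen)); err != nil {
		return err
	}
	c.Close()
	return nil
}
```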
Code Reference
Source Location
- Repository: Ollama
- File: kvcache/cache.go
- Lines: 1-84
Signature
```go
var (
	ErrKvCacheFull  = errors.New("could not find a kv cache slot")
	ErrNotSupported = errors.New("model does not support operation")
)

type Cache interface {
	// Model-facing operations, used during forward passes
	SetLayer(layer int)
	Get(ctx ml.Context) (ml.Tensor, ml.Tensor, ml.Tensor)
	Put(ctx ml.Context, key, value ml.Tensor)
	SetConfig(ml.CacheConfig)

	// Cache management operations, used by the runtime
	Init(backend ml.Backend, dtype ml.DType, maxSequences, capacity, maxBatch int)
	StartForward(ctx ml.Context, batch input.Batch, reserve bool) error
	Remove(seqId int, beginIndex, endIndex int32) error
	Close()
}
```
Import
```go
import "github.com/ollama/ollama/kvcache"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| layer | int | Yes | Layer index that subsequent Get and Put calls operate on (SetLayer) |
| ctx | ml.Context | Yes | ML context for tensor operations (Get, Put, StartForward) |
| key | ml.Tensor | Yes | Key tensor to store (Put) |
| value | ml.Tensor | Yes | Value tensor to store (Put) |
| batch | input.Batch | Yes | Current input batch with per-token positions and sequence IDs (StartForward) |
Outputs
| Name | Type | Description |
|---|---|---|
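| key | ml.Tensor | Cached key tensor history (first result of Get) |
| value | ml.Tensor | Cached value tensor history (second result of Get) |
| mask | ml.Tensor | Attention mask for the cached history (third result of Get) |
| error | error | Returned by StartForward and Remove: ErrKvCacheFull if no cache slot is available, ErrNotSupported if the implementation does not support the requested operation |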
Usage Examples
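```go
import (
	"github.com/ollama/ollama/kvcache"
	"github.com/ollama/ollama/ml"
)

// Use the Cache interface polymorphically. StartForward must already have
// been called for the current batch.
func runLayer(cache kvcache.Cache, ctx ml.Context, layer int, k, v ml.Tensor) (ml.Tensor, ml.Tensor, ml.Tensor) {
	cache.SetLayer(layer) // select the layer Put/Get operate on
	cache.Put(ctx, k, v)  // store this batch's keys and values
	return cache.Get(ctx) // key history, value history, mask
}
```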