Implementation:Ollama Ollama KVCache Encoder
| Knowledge Sources | |
|---|---|
| Domains | Inference Runtime, KV Cache Management |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the encoder KV cache that stores position-independent key and value tensors for encoder-decoder and cross-attention models, returning cached tensors as-is without masking.
Description
The EncoderCache struct implements the Cache interface for encoder-style models, where K/V tensors are position-independent: they are computed once over the full input and then reused. Unlike the causal cache, it stores tensors per layer in maps keyed by layer index, and Get always returns a nil mask. The cache tracks whether encoded data has been stored and at what position, which avoids redundant recomputation. It supports memory reservation passes, during which this metadata is not updated, and Close cleans up the per-layer ML contexts.
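The bookkeeping described above can be sketched without any backend. This is a simplified illustration, not the code in kvcache/encoder.go: `Tensor`, `toyEncoderCache`, and the reduced method signatures are stand-ins, and whether the real Put still copies tensors during a reservation pass is an assumption here.

```go
package main

import "fmt"

// Tensor is a stand-in for ml.Tensor (assumption: the real type is a backend handle).
type Tensor []float32

// toyEncoderCache mirrors the metadata fields of EncoderCache, minus backend and contexts.
type toyEncoderCache struct {
	curLayer      int
	curPos        int32
	curReserve    bool
	encoderCached bool
	encoderPos    int32
	keys, values  map[int]Tensor
}

func newToyEncoderCache() *toyEncoderCache {
	return &toyEncoderCache{keys: map[int]Tensor{}, values: map[int]Tensor{}}
}

// StartForward records the position a Put in this pass would be stored under,
// and whether the pass is only a memory reservation.
func (c *toyEncoderCache) StartForward(pos int32, reserve bool) {
	c.curPos = pos
	c.curReserve = reserve
}

// SetLayer selects the layer that the next Get or Put applies to.
func (c *toyEncoderCache) SetLayer(layer int) { c.curLayer = layer }

// Put stores K/V for the current layer. Metadata is updated only on real passes.
func (c *toyEncoderCache) Put(key, value Tensor) {
	if !c.curReserve {
		c.encoderPos = c.curPos
		c.encoderCached = true
	}
	c.keys[c.curLayer] = key
	c.values[c.curLayer] = value
}

// Get returns the stored tensors as-is; the mask is always nil.
func (c *toyEncoderCache) Get() (key, value, mask Tensor) {
	return c.keys[c.curLayer], c.values[c.curLayer], nil
}

func main() {
	c := newToyEncoderCache()

	c.StartForward(0, true) // reservation pass: metadata must stay untouched
	c.SetLayer(3)
	c.Put(Tensor{1}, Tensor{2})
	fmt.Println("cached after reservation:", c.encoderCached)

	c.StartForward(7, false) // real pass
	c.SetLayer(3)
	c.Put(Tensor{1, 2}, Tensor{3, 4})
	k, v, mask := c.Get()
	fmt.Println(len(k), len(v), mask == nil, c.encoderCached, c.encoderPos)
}
```

Keying the maps by layer index keeps each layer's encoder output independent, so a layer that never calls Put simply reads back nil tensors.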
Usage
Used for models with encoder components (e.g., vision encoders in multimodal models, or encoder-decoder architectures). Typically wrapped together with a Causal cache inside a WrapperCache for models that have both encoder and decoder components.
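As a rough picture of the wrapping arrangement, the sketch below forwards calls to one of two wrapped caches. How the real WrapperCache selects a cache is not described on this page; the index-based `setLayerType` selection, the `cache` interface, and all type names here are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// cache is a pared-down, hypothetical stand-in for the Cache interface.
type cache interface {
	Name() string
	HasMask() bool
}

// encoderCache models the encoder side: tensors come back without a mask.
type encoderCache struct{}

func (encoderCache) Name() string  { return "encoder" }
func (encoderCache) HasMask() bool { return false }

// causalCache models the decoder side: attention is masked.
type causalCache struct{}

func (causalCache) Name() string  { return "causal" }
func (causalCache) HasMask() bool { return true }

// Layer types index into the wrapper in constructor-argument order (assumption).
const (
	layerEncoder = iota
	layerCausal
)

// wrapper forwards every call to whichever wrapped cache is currently selected.
type wrapper struct {
	caches []cache
	active int
}

func newWrapper(caches ...cache) *wrapper { return &wrapper{caches: caches} }

// setLayerType picks the cache for the layer that is about to run.
func (w *wrapper) setLayerType(t int) { w.active = t }

func (w *wrapper) Name() string  { return w.caches[w.active].Name() }
func (w *wrapper) HasMask() bool { return w.caches[w.active].HasMask() }

func main() {
	w := newWrapper(encoderCache{}, causalCache{})

	w.setLayerType(layerEncoder) // cross-attention layer
	fmt.Println(w.Name(), w.HasMask())

	w.setLayerType(layerCausal) // self-attention layer
	fmt.Println(w.Name(), w.HasMask())
}
```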
Code Reference
Source Location
- Repository: Ollama
- File: kvcache/encoder.go
- Lines: 1-156
Signature
type EncoderCache struct {
	config        *ml.CacheConfig
	curLayer      int
	curPos        int32
	curReserve    bool
	encoderCached bool
	encoderPos    int32
	backend       ml.Backend
	ctxs          map[int]ml.Context
	keys, values  map[int]ml.Tensor
}
func NewEncoderCache() *EncoderCache
func (c *EncoderCache) Init(backend ml.Backend, dtype ml.DType, maxSequences, capacity, maxBatch int)
func (c *EncoderCache) StartForward(ctx ml.Context, batch input.Batch, reserve bool) error
func (c *EncoderCache) SetLayer(layer int)
func (c *EncoderCache) Get(ctx ml.Context) (ml.Tensor, ml.Tensor, ml.Tensor)
func (c *EncoderCache) Put(ctx ml.Context, key, value ml.Tensor)
func (c *EncoderCache) Close()
Import
import "github.com/ollama/ollama/kvcache"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| backend | ml.Backend | Yes | ML backend for context creation |
| key | ml.Tensor | Yes | Encoder key tensor to cache |
| value | ml.Tensor | Yes | Encoder value tensor to cache |
| batch | input.Batch | Yes | Input batch (position used for cache invalidation) |
Outputs
| Name | Type | Description |
|---|---|---|
| key | ml.Tensor | Cached encoder key tensor (or nil if not yet cached) |
| value | ml.Tensor | Cached encoder value tensor (or nil if not yet cached) |
| mask | ml.Tensor | Always nil for encoder cache |
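One way a cross-attention layer might consume this contract is to call Put only on the step that carries fresh encoder output, and call Get on every step. The sketch below assumes that pattern. `layerCache`, `crossAttentionKV`, and the `project` callback are hypothetical names, and `Tensor` stands in for ml.Tensor.

```go
package main

import "fmt"

// Tensor is a stand-in for ml.Tensor.
type Tensor []float32

// layerCache is a hypothetical single-layer cache with the same Get/Put shape.
type layerCache struct{ key, value Tensor }

func (c *layerCache) Put(key, value Tensor) { c.key, c.value = key, value }

// Get returns key, value, and a mask that is always nil.
func (c *layerCache) Get() (Tensor, Tensor, Tensor) { return c.key, c.value, nil }

// crossAttentionKV returns the K/V a cross-attention layer should attend over.
// encoderOut is non-nil only on the step that carries new encoder input;
// project is a placeholder for the layer's key/value projections.
func crossAttentionKV(c *layerCache, encoderOut Tensor, project func(Tensor) (Tensor, Tensor)) (Tensor, Tensor, bool) {
	if encoderOut != nil {
		c.Put(project(encoderOut)) // compute once, then reuse
	}
	key, value, _ := c.Get() // mask is ignored because it is always nil
	return key, value, key != nil
}

func main() {
	calls := 0
	project := func(enc Tensor) (Tensor, Tensor) { calls++; return enc, enc }
	c := &layerCache{}

	_, _, ok := crossAttentionKV(c, nil, project) // nothing cached yet
	fmt.Println("cached before encoding:", ok)

	crossAttentionKV(c, Tensor{1, 2, 3}, project) // encode step
	for i := 0; i < 4; i++ {
		crossAttentionKV(c, nil, project) // decode steps reuse the cached K/V
	}
	fmt.Println("projection calls:", calls)
}
```

Because the tensors are position-independent, decode steps never need to touch the encoder output again; a nil key is the only signal that encoding has not happened yet.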
Usage Examples
// Create an encoder cache for a vision encoder.
encCache := kvcache.NewEncoderCache()
encCache.Init(backend, ml.DTypeF16, 1, 1024, 1)
// Wrap it with a causal cache for an encoder-decoder model.
// NewCausalCache takes the model's shift function; shiftFn is a placeholder here.
cache := kvcache.NewWrapperCache(encCache, kvcache.NewCausalCache(shiftFn))