Implementation:Ollama Ollama KVCache Encoder

Knowledge Sources	Ollama
Domains	Inference Runtime, KV Cache Management
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the encoder KV cache that stores position-independent key and value tensors for encoder-decoder and cross-attention models, returning cached tensors as-is without masking.

Description

The EncoderCache struct implements the Cache interface for encoder-style models whose K/V tensors are position-independent: they are computed once over the full input and then reused. Unlike the causal cache, it stores one key and one value tensor per layer in maps keyed by layer index, returns them exactly as stored, and always returns a nil mask. It records whether encoder output has been cached and at which position, so the model can skip re-running the encoder while that output is still valid. During memory reservation passes this metadata is not updated, and Close releases the per-layer ML contexts that back the stored tensors.
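The metadata rules in the paragraph above can be modeled in a few lines. This is a minimal sketch, not Ollama's code: `encoderState`, `startForward`, `put`, and `remove` are hypothetical names, and invalidating the entry when a half-open `[begin, end)` position range containing the cached position is removed is an assumption made for illustration.

```go
package main

import "fmt"

// encoderState is a hypothetical reduction of EncoderCache's metadata,
// not the Ollama type.
type encoderState struct {
	curPos        int32 // position recorded for the current forward pass
	curReserve    bool  // true when the pass only reserves memory
	encoderCached bool  // whether real encoder output is stored
	encoderPos    int32 // position of the stored encoder output
}

// startForward records this pass's position and whether it is reserve-only.
func (s *encoderState) startForward(pos int32, reserve bool) {
	s.curPos = pos
	s.curReserve = reserve
}

// put updates the metadata only for real (non-reservation) passes.
func (s *encoderState) put() {
	if !s.curReserve {
		s.encoderCached = true
		s.encoderPos = s.curPos
	}
}

// remove is a hypothetical invalidation hook: dropping a position range
// [begin, end) that contains the cached position forces a re-encode.
func (s *encoderState) remove(begin, end int32) {
	if s.encoderPos >= begin && s.encoderPos < end {
		s.encoderCached = false
	}
}

func main() {
	var s encoderState

	s.startForward(0, true) // memory reservation pass
	s.put()
	fmt.Println(s.encoderCached) // false

	s.startForward(5, false) // real pass; encoder input at position 5
	s.put()
	fmt.Println(s.encoderCached, s.encoderPos) // true 5

	s.remove(0, 3)               // range does not contain position 5
	fmt.Println(s.encoderCached) // true

	s.remove(4, 10)              // range contains position 5
	fmt.Println(s.encoderCached) // false
}
```

The reservation pass leaves `encoderCached` false, so a model consulting this flag would still run its encoder on the first real pass.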

Usage

Used for models with encoder components (e.g., vision encoders in multimodal models, or encoder-decoder architectures). Typically wrapped together with a Causal cache inside a WrapperCache for models that have both encoder and decoder components.
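A toy sketch of how a wrapper might route decoder layers between the two caches. `subCache`, `wrapper`, `setLayerType`, `route`, and the layer layout are hypothetical; the index order simply follows the NewWrapperCache(encCache, ...) argument order in the usage example below, and the real WrapperCache API may differ.

```go
package main

import "fmt"

// subCache is a hypothetical, reduced stand-in for a kvcache.Cache.
type subCache struct {
	name     string
	curLayer int
}

func (c *subCache) SetLayer(layer int) { c.curLayer = layer }

// wrapper fans SetLayer out to every wrapped cache; the model then picks
// which sub-cache serves the current layer's Get/Put.
type wrapper struct {
	caches []*subCache
	active int
}

func (w *wrapper) SetLayer(layer int) {
	for _, c := range w.caches {
		c.SetLayer(layer)
	}
}

// setLayerType selects the active sub-cache by index (hypothetical name).
func (w *wrapper) setLayerType(t int) { w.active = t }

func (w *wrapper) activeName() string { return w.caches[w.active].name }

// Indices follow the wrapping order: encoder cache first, causal second.
const (
	crossAttentionLayer = 0 // encoder cache
	selfAttentionLayer  = 1 // causal cache
)

// route simulates a decoder whose listed layers cross-attend to encoder
// output while all other layers use causal self-attention.
func route(w *wrapper, numLayers int, crossLayers map[int]bool) []string {
	var used []string
	for layer := 0; layer < numLayers; layer++ {
		w.SetLayer(layer)
		if crossLayers[layer] {
			w.setLayerType(crossAttentionLayer)
		} else {
			w.setLayerType(selfAttentionLayer)
		}
		used = append(used, w.activeName())
	}
	return used
}

func main() {
	w := &wrapper{caches: []*subCache{{name: "encoder"}, {name: "causal"}}}
	fmt.Println(route(w, 4, map[int]bool{1: true, 3: true}))
	// [causal encoder causal encoder]
}
```

Both sub-caches see every SetLayer call, so each stays aligned with the current layer index even when it is not the one being read or written.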

Code Reference

Source Location

Repository: Ollama
File: kvcache/encoder.go
Lines: 1-156

Signature

type EncoderCache struct {
    config        *ml.CacheConfig
    curLayer      int
    curPos        int32
    curReserve    bool
    encoderCached bool
    encoderPos    int32
    backend       ml.Backend
    ctxs          map[int]ml.Context
    keys, values  map[int]ml.Tensor
}

func NewEncoderCache() *EncoderCache
func (c *EncoderCache) Init(backend ml.Backend, dtype ml.DType, maxSequences, capacity, maxBatch int)
func (c *EncoderCache) StartForward(ctx ml.Context, batch input.Batch, reserve bool) error
func (c *EncoderCache) SetLayer(layer int)
func (c *EncoderCache) Get(ctx ml.Context) (ml.Tensor, ml.Tensor, ml.Tensor)
func (c *EncoderCache) Put(ctx ml.Context, key, value ml.Tensor)
func (c *EncoderCache) Close()
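The SetLayer/Put/Get contract in these signatures can be illustrated with plain Go slices. In this hypothetical sketch, `Tensor` stands in for ml.Tensor and `layerCache` for the map-based storage; the real cache keeps backend tensors in per-layer ML contexts instead of Go slices.

```go
package main

import "fmt"

// Tensor is a hypothetical stand-in for ml.Tensor.
type Tensor []float32

// layerCache models map-based, per-layer K/V storage; not Ollama's type.
type layerCache struct {
	curLayer     int
	keys, values map[int]Tensor
}

func newLayerCache() *layerCache {
	return &layerCache{keys: map[int]Tensor{}, values: map[int]Tensor{}}
}

// SetLayer selects the layer that subsequent Get/Put calls use.
func (c *layerCache) SetLayer(layer int) { c.curLayer = layer }

// Put stores copies of key and value for the current layer,
// replacing whatever that layer held before.
func (c *layerCache) Put(key, value Tensor) {
	c.keys[c.curLayer] = append(Tensor(nil), key...)
	c.values[c.curLayer] = append(Tensor(nil), value...)
}

// Get returns the stored tensors unchanged plus a nil mask.
// A layer that was never written yields nil key and value.
func (c *layerCache) Get() (key, value, mask Tensor) {
	return c.keys[c.curLayer], c.values[c.curLayer], nil
}

func main() {
	c := newLayerCache()
	for layer := 0; layer < 2; layer++ {
		c.SetLayer(layer)
		c.Put(Tensor{float32(layer), 1}, Tensor{float32(layer), 2})
	}

	c.SetLayer(1)
	k, v, mask := c.Get()
	fmt.Println(k, v, mask == nil) // [1 1] [1 2] true

	c.SetLayer(7) // never written
	k, _, _ = c.Get()
	fmt.Println(k == nil) // true
}
```

Reading a map entry that does not exist returns the zero value, which is why an unwritten layer naturally yields nil tensors.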

Import

import (
    "github.com/ollama/ollama/kvcache"
    "github.com/ollama/ollama/ml"
)

I/O Contract

Inputs

Name	Type	Required	Description
backend	ml.Backend	Yes	ML backend for context creation
key	ml.Tensor	Yes	Encoder key tensor to cache
value	ml.Tensor	Yes	Encoder value tensor to cache
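batch	input.Batch	Yes	Input batch; supplies the position recorded for the cached encoder data, which is later used for cache invalidation
reserve	bool	Yes	If true, the forward pass only reserves memory and cache metadata is not updated
layer	int	Yes	Layer index passed to SetLayer; selects which stored key/value pair Get and Put use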

Outputs

Name	Type	Description
key	ml.Tensor	Cached encoder key tensor (or nil if not yet cached)
value	ml.Tensor	Cached encoder value tensor (or nil if not yet cached)
mask	ml.Tensor	Always nil for encoder cache
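A nil mask means a cross-attention layer attends to every cached encoder position. The sketch below shows this for a single query using plain float64 slices; `crossAttend` is a hypothetical helper, not part of kvcache, and real models perform this with backend tensor operations. It treats nil keys (nothing cached yet) as a signal that the encoder must run first.

```go
package main

import (
	"fmt"
	"math"
)

// crossAttend is a hypothetical helper: single-head attention of one decoder
// query over cached encoder keys/values. With a nil mask, every encoder
// position takes part in the softmax; nothing is masked to -Inf.
func crossAttend(q []float64, keys, values [][]float64) []float64 {
	if len(keys) == 0 {
		return nil // nothing cached yet: the encoder must run first
	}
	scale := 1 / math.Sqrt(float64(len(q)))
	scores := make([]float64, len(keys))
	maxScore := math.Inf(-1)
	for i, k := range keys {
		var dot float64
		for d := range q {
			dot += q[d] * k[d]
		}
		scores[i] = dot * scale
		maxScore = math.Max(maxScore, scores[i])
	}
	// Softmax over all positions, shifted by the max for numerical stability.
	var sum float64
	for i := range scores {
		scores[i] = math.Exp(scores[i] - maxScore)
		sum += scores[i]
	}
	out := make([]float64, len(values[0]))
	for i, v := range values {
		w := scores[i] / sum
		for d := range out {
			out[d] += w * v[d]
		}
	}
	return out
}

func main() {
	keys := [][]float64{{1, 0}, {0, 1}} // cached keys for 2 encoder positions
	values := [][]float64{{10}, {20}}   // cached values for the same positions
	fmt.Println(crossAttend([]float64{1, 0}, keys, values))
}
```

Because no position is masked, the result is a weighted mix of both cached values, tilted toward the key that best matches the query.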

Usage Examples

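// Create an encoder cache for a vision encoder
// (Init arguments: backend, dtype, maxSequences, capacity, maxBatch)
encCache := kvcache.NewEncoderCache()
encCache.Init(backend, ml.DTypeF16, 1, 1024, 1)

// Wrap it with a causal cache for the decoder of an encoder-decoder model;
// shift is the model's position-shift function required by NewCausalCache
cache := kvcache.NewWrapperCache(encCache, kvcache.NewCausalCache(shift))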

Related Pages

Principle:Ollama_Ollama_KVCache_Encoder_Caching
