Implementation:Ollama Ollama KVCache Encoder

From Leeroopedia
Knowledge Sources
Domains Inference Runtime, KV Cache Management
Last Updated 2025-02-15 00:00 GMT

Overview

Implements the encoder KV cache that stores position-independent key and value tensors for encoder-decoder and cross-attention models, returning cached tensors as-is without masking.

Description

The EncoderCache struct implements the Cache interface for encoder-style models where K/V tensors are position-independent (computed once over the full input, then reused). Unlike the causal cache, it stores tensors per-layer in maps and always returns a nil mask. The cache tracks whether encoded data has been cached and at what position, avoiding redundant recomputation. It supports memory reservation passes (where metadata is not updated) and cleanup of per-layer ML contexts.
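
The behaviour described above can be illustrated with a small, self-contained sketch. It is not the real EncoderCache: `Tensor` is a plain slice standing in for ml.Tensor, `toyEncoderCache` and `newToyEncoderCache` are hypothetical names, and the method signatures are simplified (no ml.Context, no backend). It only shows the per-layer maps, the nil mask, and the rule that a reservation pass leaves the metadata alone.

```go
package main

import "fmt"

// Tensor is a stand-in for ml.Tensor so the sketch runs without a backend.
type Tensor []float32

// toyEncoderCache is a hypothetical, simplified model of the behaviour
// described above. It is not the real EncoderCache.
type toyEncoderCache struct {
	curLayer      int
	curPos        int32
	curReserve    bool
	encoderCached bool
	encoderPos    int32
	keys, values  map[int]Tensor
}

func newToyEncoderCache() *toyEncoderCache {
	return &toyEncoderCache{keys: map[int]Tensor{}, values: map[int]Tensor{}}
}

// StartForward notes the batch position and whether this is a reservation pass.
func (c *toyEncoderCache) StartForward(pos int32, reserve bool) {
	c.curPos, c.curReserve = pos, reserve
}

// SetLayer selects which layer the next Put or Get refers to.
func (c *toyEncoderCache) SetLayer(layer int) { c.curLayer = layer }

// Put stores the pair for the current layer. Metadata is left alone on a
// reservation pass, so a dry run never marks the cache as filled.
func (c *toyEncoderCache) Put(key, value Tensor) {
	if !c.curReserve {
		c.encoderCached = true
		c.encoderPos = c.curPos
	}
	c.keys[c.curLayer], c.values[c.curLayer] = key, value
}

// Get returns the stored pair unchanged; the third result (the mask) is always nil.
func (c *toyEncoderCache) Get() (Tensor, Tensor, Tensor) {
	return c.keys[c.curLayer], c.values[c.curLayer], nil
}

func main() {
	c := newToyEncoderCache()
	c.SetLayer(0)

	c.StartForward(0, true) // reservation pass: metadata stays untouched
	c.Put(Tensor{0}, Tensor{0})
	fmt.Println("after reserve:", c.encoderCached)

	c.StartForward(7, false) // real pass: metadata records position 7
	c.Put(Tensor{1, 2}, Tensor{3, 4})
	k, v, mask := c.Get()
	fmt.Println("after forward:", c.encoderCached, c.encoderPos, k, v, mask == nil)
}
```

Keeping the tensors in maps keyed by layer index means a layer that never calls Put simply has no entry, and Get for that layer yields nil tensors.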

Usage

Used for models with encoder components (e.g., vision encoders in multimodal models, or encoder-decoder architectures). Typically wrapped together with a Causal cache inside a WrapperCache for models that have both encoder and decoder components.
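
A rough sketch of the wrapping idea follows. Everything in it is an illustrative assumption rather than the Ollama API: the cut-down `Cache` interface, the `encoderSide` and `causalSide` types, the `wrapper` type, and its `Select` method are hypothetical, and the "mask" returned by the causal side is only a placeholder with one entry per cached token. The point is that a wrapper forwards Put and Get to whichever sub-cache is active, so cross-attention layers see a nil mask while self-attention layers do not.

```go
package main

import "fmt"

// Tensor is a stand-in for ml.Tensor so the sketch runs without a backend.
type Tensor []float32

// Cache is a cut-down, hypothetical version of the cache interface.
type Cache interface {
	Put(key, value Tensor)
	Get() (key, value, mask Tensor)
}

// encoderSide overwrites its single entry and never builds a mask.
type encoderSide struct{ k, v Tensor }

func (e *encoderSide) Put(k, v Tensor)               { e.k, e.v = k, v }
func (e *encoderSide) Get() (Tensor, Tensor, Tensor) { return e.k, e.v, nil }

// causalSide appends per-token entries and returns a placeholder mask
// (not a real causal mask) with one element per cached token.
type causalSide struct{ k, v Tensor }

func (c *causalSide) Put(k, v Tensor) {
	c.k = append(c.k, k...)
	c.v = append(c.v, v...)
}

func (c *causalSide) Get() (Tensor, Tensor, Tensor) {
	return c.k, c.v, make(Tensor, len(c.k))
}

// wrapper forwards every call to the currently selected sub-cache.
type wrapper struct {
	caches []Cache
	active int
}

func (w *wrapper) Select(i int)                  { w.active = i }
func (w *wrapper) Put(k, v Tensor)               { w.caches[w.active].Put(k, v) }
func (w *wrapper) Get() (Tensor, Tensor, Tensor) { return w.caches[w.active].Get() }

func main() {
	w := &wrapper{caches: []Cache{&encoderSide{}, &causalSide{}}}

	w.Select(0) // cross-attention layer: encoder side
	w.Put(Tensor{1, 2, 3}, Tensor{4, 5, 6})
	_, _, encMask := w.Get()

	w.Select(1) // self-attention layer: causal side
	w.Put(Tensor{9}, Tensor{8})
	_, _, decMask := w.Get()

	fmt.Println("encoder mask nil:", encMask == nil, "causal mask nil:", decMask == nil)
}
```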

Code Reference

Source Location

  • Repository: Ollama
  • File: kvcache/encoder.go
  • Lines: 1-156

Signature

type EncoderCache struct {
    config        *ml.CacheConfig
    curLayer      int
    curPos        int32
    curReserve    bool
    encoderCached bool
    encoderPos    int32
    backend       ml.Backend
    ctxs          map[int]ml.Context
    keys, values  map[int]ml.Tensor
}

func NewEncoderCache() *EncoderCache
func (c *EncoderCache) Init(backend ml.Backend, dtype ml.DType, maxSequences, capacity, maxBatch int)
func (c *EncoderCache) StartForward(ctx ml.Context, batch input.Batch, reserve bool) error
func (c *EncoderCache) SetLayer(layer int)
func (c *EncoderCache) Get(ctx ml.Context) (ml.Tensor, ml.Tensor, ml.Tensor)
func (c *EncoderCache) Put(ctx ml.Context, key, value ml.Tensor)
func (c *EncoderCache) Close()

Import

import "github.com/ollama/ollama/kvcache"

I/O Contract

Inputs

Name     Type         Required  Description
backend  ml.Backend   Yes       ML backend for context creation
key      ml.Tensor    Yes       Encoder key tensor to cache
value    ml.Tensor    Yes       Encoder value tensor to cache
batch    input.Batch  Yes       Input batch (position used for cache invalidation)

Outputs

Name   Type       Description
key    ml.Tensor  Cached encoder key tensor (or nil if not yet cached)
value  ml.Tensor  Cached encoder value tensor (or nil if not yet cached)
mask   ml.Tensor  Always nil for encoder cache
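
One way a caller might rely on this contract is to project the encoder output only when the lookup comes back nil. The sketch below assumes that pattern and uses hypothetical names throughout (`layerStore`, `encoderKV`, `project`), with `Tensor` as a plain slice in place of ml.Tensor. It is meant to show how the nil results let a caller skip redundant work, not how any particular model in the repository does it.

```go
package main

import "fmt"

// Tensor is a stand-in for ml.Tensor so the sketch runs without a backend.
type Tensor []float32

// layerStore is a hypothetical stand-in for the encoder cache.
type layerStore struct {
	keys, values map[int]Tensor
}

// Get returns nil tensors for a layer that has not been filled; the mask is always nil.
func (s *layerStore) Get(layer int) (Tensor, Tensor, Tensor) {
	return s.keys[layer], s.values[layer], nil
}

func (s *layerStore) Put(layer int, k, v Tensor) {
	s.keys[layer], s.values[layer] = k, v
}

// encoderKV returns the cached pair for a layer and calls project only on a miss.
func encoderKV(s *layerStore, layer int, project func() (Tensor, Tensor)) (Tensor, Tensor) {
	if k, v, _ := s.Get(layer); k != nil && v != nil {
		return k, v
	}
	k, v := project()
	s.Put(layer, k, v)
	return k, v
}

func main() {
	s := &layerStore{keys: map[int]Tensor{}, values: map[int]Tensor{}}
	calls := 0
	project := func() (Tensor, Tensor) {
		calls++ // count how often the expensive projection runs
		return Tensor{0.5}, Tensor{0.25}
	}

	// Three decode steps reuse the same encoder output.
	for step := 0; step < 3; step++ {
		encoderKV(s, 0, project)
	}
	fmt.Println("projections computed:", calls) // prints "projections computed: 1"
}
```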

Usage Examples

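// Create the encoder cache for a vision encoder.
// Init arguments: backend, dtype, maxSequences, capacity, maxBatch.
encCache := kvcache.NewEncoderCache()
encCache.Init(backend, ml.DTypeF16, 1, 1024, 1)

// Wrap it with a causal cache for an encoder-decoder model.
// The causal constructor is NewCausalCache, which takes the model's shift
// function; m.Shift is assumed here to be that function on the model m.
cache := kvcache.NewWrapperCache(encCache, kvcache.NewCausalCache(m.Shift))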
