

Implementation:Ollama Ollama KVCache Interface

From Leeroopedia
Knowledge Sources
Domains: Inference Runtime, KV Cache Management
Last Updated: 2025-02-15 00:00 GMT

Overview

Defines the Cache interface that all KV cache implementations must satisfy, providing a unified contract for storing and retrieving key-value tensors during model inference.

Description

The cache.go file declares the Cache interface, whose methods fall into two groups: model-facing operations (SetLayer, Get, Put, SetConfig) called during forward passes, and cache-management operations (Init, StartForward, Remove, Close) called by the runtime. It also defines two sentinel errors, ErrKvCacheFull and ErrNotSupported, which cache implementations return to report a full cache and an unsupported operation.

Usage

The interface is used polymorphically throughout the model and runtime code. Model implementations call SetLayer, Put, and Get during inference, while the runtime calls Init, StartForward, Remove, and Close to manage the cache's lifecycle.
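The split between the two groups of callers can be illustrated with a call-order sketch. It uses a hypothetical, simplified subset of the interface (toy Context, Tensor, and Batch types, and an Init without the backend and dtype parameters) together with a recorder that logs every call so the ordering is visible. The ordering is illustrative, not a trace of Ollama's runner.

```go
package main

import "math"

// Hypothetical toy stand-ins for ml.Context, ml.Tensor and input.Batch.
type Context struct{}
type Tensor []float32
type Batch struct {
	Positions []int32
	Sequences []int
}

// lifecycleCache is a hypothetical, simplified subset of kvcache.Cache.
type lifecycleCache interface {
	Init(maxSequences, capacity, maxBatch int)
	StartForward(ctx Context, batch Batch, reserve bool) error
	SetLayer(layer int)
	Put(ctx Context, key, value Tensor)
	Get(ctx Context) (Tensor, Tensor, Tensor)
	Remove(seq int, beginIndex, endIndex int32) error
	Close()
}

// recorder satisfies lifecycleCache by logging each method name it receives.
type recorder struct{ calls []string }

func (r *recorder) log(name string)                         { r.calls = append(r.calls, name) }
func (r *recorder) Init(maxSequences, capacity, maxBatch int) { r.log("Init") }
func (r *recorder) StartForward(Context, Batch, bool) error   { r.log("StartForward"); return nil }
func (r *recorder) SetLayer(int)                              { r.log("SetLayer") }
func (r *recorder) Put(Context, Tensor, Tensor)               { r.log("Put") }
func (r *recorder) Get(Context) (Tensor, Tensor, Tensor)      { r.log("Get"); return nil, nil, nil }
func (r *recorder) Remove(int, int32, int32) error            { r.log("Remove"); return nil }
func (r *recorder) Close()                                    { r.log("Close") }

// generate runs one forward pass over numLayers layers, then evicts
// sequence 0 and closes the cache (Close runs last via defer).
func generate(c lifecycleCache, numLayers int) error {
	c.Init(1, 2048, 512)
	defer c.Close()

	var ctx Context
	batch := Batch{Positions: []int32{0, 1, 2}, Sequences: []int{0, 0, 0}}

	// Runtime side: prepare cache slots for the whole batch.
	if err := c.StartForward(ctx, batch, false); err != nil {
		return err
	}
	// Model side: each layer stores its new keys/values, then reads the history.
	for layer := 0; layer < numLayers; layer++ {
		c.SetLayer(layer)
		c.Put(ctx, Tensor{1}, Tensor{1})
		c.Get(ctx)
	}
	// Runtime side: drop everything stored for sequence 0.
	return c.Remove(0, 0, math.MaxInt32)
}
```

StartForward runs once per batch, while SetLayer, Put, and Get repeat once per layer inside the same forward pass.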

Code Reference

Source Location

  • Repository: Ollama
  • File: kvcache/cache.go
  • Lines: 1-84

Signature

var (
    ErrKvCacheFull  = errors.New("could not find a kv cache slot")
    ErrNotSupported = errors.New("model does not support operation")
)

type Cache interface {
    SetLayer(layer int)
    Get(ctx ml.Context) (ml.Tensor, ml.Tensor, ml.Tensor)
    Put(ctx ml.Context, key, value ml.Tensor)
    SetConfig(ml.CacheConfig)
    Init(backend ml.Backend, dtype ml.DType, maxSequences, capacity, maxBatch int)
    StartForward(ctx ml.Context, batch input.Batch, reserve bool) error
    Remove(seqId int, beginIndex, endIndex int32) error
    Close()
}

Import

import "github.com/ollama/ollama/kvcache"

I/O Contract

Inputs

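| Name | Type | Required | Description |
| --- | --- | --- | --- |
| layer | int | Yes | Layer index to make active for subsequent Get/Put calls (SetLayer) |
| ctx | ml.Context | Yes | ML context for tensor operations (Get, Put, StartForward) |
| key | ml.Tensor | Yes | Key tensor to store (Put) |
| value | ml.Tensor | Yes | Value tensor to store (Put) |
| config | ml.CacheConfig | Yes | Mostly backend-specific settings that adjust the cache's output for particular kernels (SetConfig) |
| backend | ml.Backend | Yes | Backend used to allocate cache storage (Init) |
| dtype | ml.DType | Yes | Data type of the stored cache entries (Init) |
| maxSequences | int | Yes | Maximum number of sequences held in the cache (Init) |
| capacity | int | Yes | Number of cache entries to store per sequence (Init) |
| maxBatch | int | Yes | Maximum number of tokens in a single batch (Init) |
| batch | input.Batch | Yes | Current input batch with positions and sequence IDs (StartForward) |
| reserve | bool | Yes | If true, preallocate memory without storing data (StartForward) |
| seqId | int | Yes | Sequence whose entries are removed (Remove) |
| beginIndex, endIndex | int32 | Yes | Half-open position range [beginIndex, endIndex) to remove (Remove) |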
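Remove takes a sequence ID and an int32 position range, as the signature shows. The sketch below assumes that range is half-open, [beginIndex, endIndex), and that passing math.MaxInt32 as endIndex removes everything from beginIndex onward; both reflect our reading of the doc comment in kvcache/cache.go and should be checked against the version in use. The cell type, removeRange, and removeToEnd are hypothetical. Real implementations may also need to renumber positions that follow a removed middle range, which this sketch does not attempt.

```go
package main

import "math"

// removeToEnd is a hypothetical name for the "remove through the end" value.
const removeToEnd int32 = math.MaxInt32

// cell is a hypothetical cache slot: owning sequence and token position.
type cell struct {
	seq int
	pos int32
}

// removeRange returns the cells that survive removing positions
// [beginIndex, endIndex) from seq. The input slice is not modified.
func removeRange(cells []cell, seq int, beginIndex, endIndex int32) []cell {
	kept := make([]cell, 0, len(cells))
	for _, c := range cells {
		if c.seq == seq && c.pos >= beginIndex && c.pos < endIndex {
			continue // inside the removed range for this sequence
		}
		kept = append(kept, c)
	}
	return kept
}
```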

Outputs

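| Name | Type | Description |
| --- | --- | --- |
| key | ml.Tensor | Cached key history for the active layer (first return value of Get) |
| value | ml.Tensor | Cached value history for the active layer (second return value of Get) |
| mask | ml.Tensor | Attention mask over the cached history (third return value of Get) |
| error | error | Returned by StartForward and Remove: ErrKvCacheFull when no cache slot is available, ErrNotSupported when the model does not support the requested operation |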

Usage Examples

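import (
    "github.com/ollama/ollama/kvcache"
    "github.com/ollama/ollama/ml"
)

// runLayer uses the Cache interface polymorphically: it selects the layer, stores this
// batch's keys and values, then returns the layer's key/value history plus the mask.
func runLayer(cache kvcache.Cache, ctx ml.Context, layer int, k, v ml.Tensor) (ml.Tensor, ml.Tensor, ml.Tensor) {
    cache.SetLayer(layer)
    cache.Put(ctx, k, v) // Put before Get so the history includes the current batch
    return cache.Get(ctx)
}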
