Implementation:Ollama Ollama Llama Memory Recurrent

From Leeroopedia
Knowledge Sources
Domains LLM Inference, Memory Management
Last Updated 2025-02-15 00:00 GMT

Overview

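Implements the state memory for recurrent architectures such as Mamba (a state space model, SSM) and RWKV. It manages two per-layer state tensors. The rolling state holds the short-convolution window in Mamba and the token-shift state in RWKV. The recurrent state holds the SSM state in Mamba and the WKV state in RWKV.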

Description

The constructor allocates the per-layer r (rolling/convolution) and s (recurrent) state tensors. Each layer's buffer type is chosen according to its device and the offload setting. find_slot places the sequences of a ubatch into memory cells by sequence ID, with slot eviction when memory is full. The class also tracks per-cell metadata (sequence IDs, positions, and source cells), which is used to route state during inference. A companion class, llama_memory_recurrent_context, manages batch-level state and prepares the graph inputs.

Usage

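Used for recurrent and SSM architectures (Mamba, RWKV, Griffin) that carry a running hidden state instead of a KV cache. The per-sequence state has a fixed size, so memory stays O(1) in the number of processed tokens. An attention KV cache, by contrast, grows as O(n) with context length. This makes these models well suited to very long contexts.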
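The O(1) versus O(n) contrast can be checked with a little arithmetic. The sketch below is hypothetical. It uses Mamba-1 style state sizes (d_inner = 1536, d_conv = 4, d_state = 16, 24 layers, f32) and compares them against an attention model with 768-wide K and V vectors and 24 layers. All of these are assumed example dimensions, not values read from any model file. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Assumed Mamba-1 style layout: the rolling state keeps the last (d_conv - 1)
// inputs of every inner channel; the SSM state keeps d_state values per channel.
uint64_t mamba_rolling_elems(uint64_t d_conv, uint64_t d_inner) {
    assert(d_conv >= 1);
    return (d_conv - 1) * d_inner;
}

uint64_t mamba_ssm_elems(uint64_t d_state, uint64_t d_inner) {
    return d_state * d_inner;
}

// Recurrent memory for ONE sequence: fixed per layer, independent of context length.
uint64_t recurrent_state_bytes(uint64_t n_layer, uint64_t rolling_elems,
                               uint64_t ssm_elems, uint64_t elem_bytes) {
    return n_layer * (rolling_elems + ssm_elems) * elem_bytes;
}

// Attention KV cache for ONE sequence: one K and one V vector per layer per token.
uint64_t kv_cache_bytes(uint64_t n_layer, uint64_t n_ctx, uint64_t n_embd_kv,
                        uint64_t elem_bytes) {
    return 2 * n_layer * n_ctx * n_embd_kv * elem_bytes;
}
```

With these assumed dimensions, recurrent_state_bytes(24, mamba_rolling_elems(4, 1536), mamba_ssm_elems(16, 1536), 4) is 2,801,664 bytes (about 2.7 MiB) at any context length. By contrast, kv_cache_bytes(24, 1024, 768, 4) is already 150,994,944 bytes (144 MiB) at 1,024 tokens and doubles each time n_ctx doubles.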

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/src/llama-memory-recurrent.cpp
  • Lines: 1-1167

Signature

llama_memory_recurrent::llama_memory_recurrent(
        const llama_model & model,
                ggml_type   type_r,
                ggml_type   type_s,
                     bool   offload,
                 uint32_t   mem_size,
                 uint32_t   n_seq_max,
    const layer_filter_cb & filter);

void llama_memory_recurrent::clear(bool data);
bool llama_memory_recurrent::seq_rm(llama_seq_id seq_id, llama_pos p0, llama_pos p1);
bool llama_memory_recurrent::find_slot(const llama_ubatch & ubatch);
bool llama_memory_recurrent::prepare(const std::vector<llama_ubatch> & ubatches);

Import

#include "llama-memory-recurrent.h"

I/O Contract

Inputs

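Name      | Type                    | Required | Description
model     | const llama_model &     | Yes      | Model providing layer configuration and device info
type_r    | ggml_type               | Yes      | Data type of the rolling (convolution / token-shift) state tensors
type_s    | ggml_type               | Yes      | Data type of the recurrent (SSM / WKV) state tensors
offload   | bool                    | Yes      | Whether each layer's state is placed in its device's buffer rather than in host memory
mem_size  | uint32_t                | Yes      | Total number of memory cells
n_seq_max | uint32_t                | Yes      | Maximum number of concurrent sequences
filter    | const layer_filter_cb & | Yes      | Callback selecting which layers receive state tensors (may be empty)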

Outputs

Name | Type                       | Description
r_l  | std::vector<ggml_tensor *> | Per-layer rolling (convolution / token-shift) state tensors
s_l  | std::vector<ggml_tensor *> | Per-layer recurrent (SSM / WKV) state tensors
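The per-layer outputs are easier to read with the allocation loop spelled out. This sketch is hypothetical. toy_tensor stands in for ggml_tensor and only records an element count. The sketch also assumes, without quoting llama.cpp, two behaviors. Each kept layer gets one flat tensor of n_embd × mem_size elements per state kind. Layers rejected by filter keep an empty (null) entry.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for a ggml tensor: only the element count matters here.
struct toy_tensor {
    uint64_t n_elements;
};

// Same role as the `filter` parameter: return true to give layer il a state.
using toy_layer_filter = std::function<bool(int32_t il)>;

struct toy_state_tensors {
    std::vector<std::unique_ptr<toy_tensor>> r_l; // rolling state per layer, null if skipped
    std::vector<std::unique_ptr<toy_tensor>> s_l; // recurrent state per layer, null if skipped
};

toy_state_tensors allocate_states(int32_t n_layer, uint64_t n_embd_r, uint64_t n_embd_s,
                                  uint32_t mem_size, const toy_layer_filter & filter) {
    assert(mem_size > 0);
    toy_state_tensors st;
    st.r_l.resize(n_layer);
    st.s_l.resize(n_layer);
    for (int32_t il = 0; il < n_layer; ++il) {
        if (filter && !filter(il)) {
            continue; // e.g. a layer that uses attention instead of recurrence
        }
        // one flat buffer per layer: mem_size cells, each n_embd_* elements wide
        st.r_l[il] = std::make_unique<toy_tensor>(toy_tensor{n_embd_r * mem_size});
        st.s_l[il] = std::make_unique<toy_tensor>(toy_tensor{n_embd_s * mem_size});
    }
    return st;
}

// Element offset of cell `cell` inside a layer's flat state buffer.
uint64_t cell_offset(uint64_t n_embd, uint32_t cell) {
    return n_embd * cell;
}
```

Keeping one slot per layer, including null slots for skipped layers, lets graph code index state tensors directly by layer number. An empty filter keeps every layer.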

Usage Examples

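// Created internally for recurrent models such as Mamba and RWKV
auto mem = std::make_unique<llama_memory_recurrent>(
    model, type_r, type_s, offload, mem_size, n_seq_max, filter);

// Check that every ubatch can be assigned memory cells
const bool ok = mem->prepare(ubatches);

// While building the graph, read the per-layer state tensors through the
// batch-level context (mctx: const llama_memory_recurrent_context *)
ggml_tensor * r = mctx->get_r_l(il); // rolling (convolution / token-shift) state of layer il
ggml_tensor * s = mctx->get_s_l(il); // recurrent (SSM / WKV) state of layer il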
