Implementation:Ggml org Llama cpp Memory Recurrent

From Leeroopedia
Revision as of 12:41, 16 February 2026 by Admin (Auto-imported from implementations/Ggml_org_Llama_cpp_Memory_Recurrent.md)
Knowledge Sources
Domains Memory, Recurrent
Last Updated 2026-02-15 00:00 GMT

Overview

Implements recurrent state memory management for recurrent and state-space (SSM) architectures such as Mamba and RWKV.

Description

This file allocates two tensors per layer, a rolling state (`r_l`) and a recurrent state (`s_l`), and manages them through a cell-based system. Each cell tracks a position, the sequence IDs that reference it, and a source index used for state copying. The `find_slot` method locates contiguous free cells for new sequences and handles both initial placement and continuation of an existing sequence. The sequence operations (remove, copy, keep, add, divide) manipulate cell metadata. The `llama_memory_recurrent_context` class orchestrates batch processing: it splits batches into ubatches and tracks which cells need state transfers between processing steps. State serialization supports both full and per-sequence save and load.
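The cell bookkeeping described above can be illustrated with a small standalone sketch. It is not the llama.cpp implementation: `toy_cell` and `find_free_run` are hypothetical names, the cell holds only the three fields named in the description, and the search is a plain first-fit scan for a run of free cells.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified cell; the real structure may carry more fields.
struct toy_cell {
    int32_t pos = -1;           // last position stored in this cell, -1 if unused
    int32_t src = -1;           // index of the cell to copy state from, -1 if none
    std::set<int32_t> seq_ids;  // sequences that currently reference this cell

    bool is_free() const { return seq_ids.empty(); }
};

// Return the index of the first run of n consecutive free cells,
// or -1 if no such run exists (first-fit scan, assumed for illustration).
inline int32_t find_free_run(const std::vector<toy_cell> & cells, uint32_t n) {
    assert(n > 0);
    uint32_t run = 0;
    for (uint32_t i = 0; i < cells.size(); ++i) {
        run = cells[i].is_free() ? run + 1 : 0;
        if (run == n) {
            return static_cast<int32_t>(i + 1 - n);
        }
    }
    return -1;
}
```

With five cells where cells 0 and 2 are occupied, a request for two cells would land at index 3, and a request for three cells would fail.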

Usage

Use this module as the memory backend for recurrent architectures (Mamba, RWKV) and for the recurrent layers of hybrid models such as Jamba. Without it, recurrent and SSM models cannot carry their hidden state from one inference step to the next.
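To see why the state has to persist between steps, consider a toy scalar recurrence h_t = a * h_{t-1} + b * x_t. The sketch below is purely illustrative: `toy_recurrence` and its coefficients are made up and do not correspond to any model or to llama.cpp code. It shows that feeding the inputs in two chunks gives the same result as feeding them at once only because `h` is kept between calls.

```cpp
#include <cassert>
#include <vector>

// Toy scalar recurrence h_t = a * h_{t-1} + b * x_t.
// Coefficients are illustrative, not taken from any model.
struct toy_recurrence {
    float a = 0.5f;
    float b = 1.0f;
    float h = 0.0f;  // hidden state carried between calls

    // Consume a chunk of inputs, updating and returning the carried state.
    float step(const std::vector<float> & xs) {
        assert(a > -1.0f && a < 1.0f);  // keep the toy recurrence stable
        for (float x : xs) {
            h = a * h + b * x;
        }
        return h;
    }
};
```

If `h` were reset between chunks, the second chunk would start from zero and the result would no longer match the single-pass result.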

Code Reference

Source Location

Signature

llama_memory_recurrent::llama_memory_recurrent(
    const llama_model & model,
    ggml_type type_r, ggml_type type_s,
    bool offload, uint32_t mem_size, uint32_t n_seq_max,
    const layer_filter_cb & filter);

// Sequence operations
void llama_memory_recurrent::clear(bool data);
bool llama_memory_recurrent::seq_rm(llama_seq_id seq_id, llama_pos p0, llama_pos p1);
void llama_memory_recurrent::seq_cp(llama_seq_id seq_id_src, llama_seq_id seq_id_dst, llama_pos p0, llama_pos p1);
void llama_memory_recurrent::seq_keep(llama_seq_id seq_id);
void llama_memory_recurrent::seq_add(llama_seq_id seq_id, llama_pos p0, llama_pos p1, llama_pos delta);
void llama_memory_recurrent::seq_div(llama_seq_id seq_id, llama_pos p0, llama_pos p1, int d);

// State persistence
void llama_memory_recurrent::state_write(/* ... */);
void llama_memory_recurrent::state_read(/* ... */);
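As a rough mental model of how the remove, copy, and keep operations act on cell metadata, the following standalone sketch applies them to a vector of simplified cells. All names (`meta_cell`, `toy_seq_rm`, `toy_seq_cp`, `toy_seq_keep`) are hypothetical, position ranges are ignored, and copying is modeled as sharing a cell between two sequence IDs, which is an assumption of this sketch rather than a statement about the real code.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified cell metadata.
struct meta_cell {
    int32_t pos = -1;           // -1 marks an unused cell
    std::set<int32_t> seq_ids;  // sequences referencing this cell
};

// Remove seq_id everywhere; cells left without sequences are reset.
inline void toy_seq_rm(std::vector<meta_cell> & cells, int32_t seq_id) {
    for (auto & c : cells) {
        c.seq_ids.erase(seq_id);
        if (c.seq_ids.empty()) {
            c.pos = -1;
        }
    }
}

// Let seq_dst reference every cell that seq_src references (metadata only).
inline void toy_seq_cp(std::vector<meta_cell> & cells, int32_t seq_src, int32_t seq_dst) {
    assert(seq_src != seq_dst);
    for (auto & c : cells) {
        if (c.seq_ids.count(seq_src) > 0) {
            c.seq_ids.insert(seq_dst);
        }
    }
}

// Drop every sequence except seq_id; cells that do not hold it are reset.
inline void toy_seq_keep(std::vector<meta_cell> & cells, int32_t seq_id) {
    for (auto & c : cells) {
        const bool has = c.seq_ids.count(seq_id) > 0;
        c.seq_ids.clear();
        if (has) {
            c.seq_ids.insert(seq_id);
        } else {
            c.pos = -1;
        }
    }
}
```

In this model a cell is only freed once no sequence references it, so removing the source sequence after a copy leaves the shared cell intact.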

Import

#include "llama-memory-recurrent.h"
#include "llama-impl.h"
#include "llama-io.h"
#include "llama-batch.h"
#include "llama-model.h"

I/O Contract

Inputs

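Name | Type | Required | Description
model | const llama_model & | Yes | Model defining layer structure and device assignments
type_r | ggml_type | Yes | Data type for rolling state tensors
type_s | ggml_type | Yes | Data type for recurrent state tensors
offload | bool | Yes | Whether state tensors are offloaded (listed in the signature; see the source for exact semantics)
mem_size | uint32_t | Yes | Number of memory cells to allocate
n_seq_max | uint32_t | Yes | Maximum number of concurrent sequences
filter | const layer_filter_cb & | Yes (may be empty) | Callback to filter which layers are included; the usage example passes nullptr to apply no filter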
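The role of the layer filter can be sketched in isolation. The callback type below, `toy_layer_filter`, is an assumption made for illustration (a function from layer index to bool); check the header for the actual definition of `layer_filter_cb`. The helper `select_layers` is likewise hypothetical and only shows the intended behavior: an empty filter includes every layer.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Assumed callback shape: layer index in, "include this layer" out.
using toy_layer_filter = std::function<bool(int32_t il)>;

// Collect the layer indices that pass the filter.
// An empty (null) filter excludes nothing.
inline std::vector<int32_t> select_layers(int32_t n_layer, const toy_layer_filter & filter) {
    assert(n_layer >= 0);
    std::vector<int32_t> out;
    for (int32_t il = 0; il < n_layer; ++il) {
        if (filter && !filter(il)) {
            continue;  // filter present and rejected this layer
        }
        out.push_back(il);
    }
    return out;
}
```

A hybrid model could use such a filter to allocate recurrent state only for its recurrent layers, leaving attention layers to a different memory backend.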

Outputs

Name | Type | Description
r_l | std::vector<ggml_tensor *> | Per-layer rolling state tensors
s_l | std::vector<ggml_tensor *> | Per-layer recurrent state tensors
cells | std::vector<cell> | Cell metadata tracking positions, sequences, and state sources

Usage Examples

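// Create recurrent memory for a Mamba model.
// `model`, `mem_size`, `n_seq_max`, and `seq_id` are assumed to be defined by the caller;
// std::make_unique requires <memory>.
auto mem = std::make_unique<llama_memory_recurrent>(
    model, GGML_TYPE_F32, GGML_TYPE_F32,
    /*offload=*/true, mem_size, n_seq_max,
    /*filter=*/nullptr);       // empty filter: no layers are excluded

// Sequence management
mem->seq_rm(seq_id, 0, -1);    // remove the whole sequence (returns bool)
mem->seq_cp(0, 1, 0, -1);      // copy the state of sequence 0 to sequence 1
mem->seq_keep(seq_id);         // keep only this sequence, drop all others
mem->clear(true);              // clear all cells; `true` also clears tensor data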
