Implementation:Ggml org Llama cpp Memory Recurrent

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Memory, Recurrent
Last Updated	2026-02-15 00:00 GMT

Overview

Implements recurrent state memory management for recurrent and state-space (SSM) architectures such as Mamba and RWKV.

Description

This file allocates per-layer rolling state (r_l) and recurrent state (s_l) tensors and manages them through a cell-based system: each cell tracks a position, a set of sequence IDs, and a source index used for state copying. The `find_slot` method locates contiguous free cells for new sequences and handles both initial placement and continuation of an existing sequence. The sequence operations (remove, copy, keep, add, divide) work by manipulating cell metadata. The `llama_memory_recurrent_context` class orchestrates batch processing: it splits batches into ubatches and tracks which cells need state transfers between processing steps. State serialization supports both full and per-sequence save/load.
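The cell bookkeeping and the search for contiguous free cells can be illustrated with a small standalone sketch. It is not the actual `find_slot` logic from llama.cpp; `slot_cell` and `toy_find_contiguous` are hypothetical names, and the cell is reduced to the three fields mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified cell. The real struct in llama.cpp differs.
struct slot_cell {
    int32_t pos = -1;            // last position stored in this cell, -1 if unused
    int32_t src = -1;            // index of the cell to copy state from, -1 if none
    std::set<int32_t> seq_ids;   // sequences that reference this cell

    bool is_empty() const { return seq_ids.empty(); }
};

// Return the start index of a run of `n` consecutive empty cells.
// Candidate starts are tried in ring order beginning at `head`, but a run
// is never allowed to wrap around the end of the vector.
// Returns -1 if no such run exists.
int32_t toy_find_contiguous(const std::vector<slot_cell> & cells, uint32_t head, uint32_t n) {
    const uint32_t size = (uint32_t) cells.size();
    assert(n > 0);
    if (size == 0 || n > size) {
        return -1;
    }
    for (uint32_t k = 0; k < size; ++k) {
        const uint32_t start = (head + k) % size;
        if (start + n > size) {
            continue; // the run would run past the last cell
        }
        bool all_free = true;
        for (uint32_t i = 0; i < n; ++i) {
            if (!cells[start + i].is_empty()) {
                all_free = false;
                break;
            }
        }
        if (all_free) {
            return (int32_t) start;
        }
    }
    return -1;
}
```

The quadratic scan keeps the sketch easy to verify; a production implementation would typically skip past the first occupied cell instead of retrying every start index.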

Usage

Use this module as the memory backend for recurrent and SSM architectures (Mamba, RWKV) and for the recurrent layers of hybrid models such as Jamba. Without it, these models cannot carry their hidden state from one inference step to the next.

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: src/llama-memory-recurrent.cpp
Lines: 1-1165

Signature

llama_memory_recurrent::llama_memory_recurrent(
    const llama_model & model,
    ggml_type type_r, ggml_type type_s,
    bool offload, uint32_t mem_size, uint32_t n_seq_max,
    const layer_filter_cb & filter);

// Sequence operations
void llama_memory_recurrent::clear(bool data);
bool llama_memory_recurrent::seq_rm(llama_seq_id seq_id, llama_pos p0, llama_pos p1);
void llama_memory_recurrent::seq_cp(llama_seq_id seq_id_src, llama_seq_id seq_id_dst, llama_pos p0, llama_pos p1);
void llama_memory_recurrent::seq_keep(llama_seq_id seq_id);
void llama_memory_recurrent::seq_add(llama_seq_id seq_id, llama_pos p0, llama_pos p1, llama_pos delta);
void llama_memory_recurrent::seq_div(llama_seq_id seq_id, llama_pos p0, llama_pos p1, int d);

// State persistence
void llama_memory_recurrent::state_write(/* ... */);
void llama_memory_recurrent::state_read(/* ... */);

Import

#include "llama-memory-recurrent.h"
#include "llama-impl.h"
#include "llama-io.h"
#include "llama-batch.h"
#include "llama-model.h"

I/O Contract

Inputs

Name	Type	Required	Description
model	const llama_model &	Yes	Model defining layer structure and device assignments
type_r	ggml_type	Yes	Data type for rolling state tensors
type_s	ggml_type	Yes	Data type for recurrent state tensors
offload	bool	Yes	Whether to offload the state tensors to the devices assigned to their layers
mem_size	uint32_t	Yes	Number of memory cells to allocate
n_seq_max	uint32_t	Yes	Maximum number of concurrent sequences
filter	const layer_filter_cb &	No	Optional callback to filter which layers are included; may be empty

Outputs

Name	Type	Description
r_l	std::vector<ggml_tensor *>	Per-layer rolling state tensors
s_l	std::vector<ggml_tensor *>	Per-layer recurrent state tensors
cells	std::vector<cell>	Cell metadata tracking positions, sequences, and state sources

Usage Examples

// Create recurrent memory for a Mamba model.
// `model`, `mem_size`, `n_seq_max`, and `seq_id` are assumed to be defined by the caller.
auto mem = std::make_unique<llama_memory_recurrent>(
    model, GGML_TYPE_F32, GGML_TYPE_F32,
    /*offload=*/true, mem_size, n_seq_max,
    /*filter=*/nullptr);       // no layer filter

// Sequence management
mem->seq_rm(seq_id, 0, -1);    // remove the whole sequence (returns bool)
mem->seq_cp(0, 1, 0, -1);      // copy the state of sequence 0 to sequence 1
mem->seq_keep(seq_id);         // keep only this sequence
mem->clear(true);              // clear all cells and their data
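
To make the effect of these calls on cell metadata concrete, the following is a toy model of the remove, copy, and keep operations. It is a sketch under simplifying assumptions, not the llama.cpp implementation: `toy_cell`, `toy_memory`, `assign`, and `n_used` are hypothetical, position ranges are ignored, and no tensor data is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical cell holding only a position and the owning sequence IDs.
struct toy_cell {
    int32_t pos = -1;
    std::set<int32_t> seq_ids;

    bool has_seq(int32_t id) const { return seq_ids.count(id) > 0; }
};

// Hypothetical container that mimics the metadata side of the sequence operations.
struct toy_memory {
    std::vector<toy_cell> cells;

    explicit toy_memory(size_t n) : cells(n) {}

    // Mark cell `i` as holding the state of `seq_id` at position `pos`.
    void assign(size_t i, int32_t seq_id, int32_t pos) {
        assert(i < cells.size());
        cells[i].pos = pos;
        cells[i].seq_ids.insert(seq_id);
    }

    // Drop `seq_id` from every cell; a cell with no owners left becomes free.
    void seq_rm(int32_t seq_id) {
        for (auto & c : cells) {
            c.seq_ids.erase(seq_id);
            if (c.seq_ids.empty()) {
                c.pos = -1;
            }
        }
    }

    // Let `dst` share every cell that currently belongs to `src`.
    void seq_cp(int32_t src, int32_t dst) {
        for (auto & c : cells) {
            if (c.has_seq(src)) {
                c.seq_ids.insert(dst);
            }
        }
    }

    // Free every cell not owned by `seq_id`; strip other owners from the rest.
    void seq_keep(int32_t seq_id) {
        for (auto & c : cells) {
            if (c.has_seq(seq_id)) {
                c.seq_ids = {seq_id};
            } else {
                c.seq_ids.clear();
                c.pos = -1;
            }
        }
    }

    // Number of cells that still have at least one owner.
    size_t n_used() const {
        size_t n = 0;
        for (const auto & c : cells) {
            n += c.seq_ids.empty() ? 0 : 1;
        }
        return n;
    }
};
```

In this model a copy is cheap because the destination sequence merely becomes a second owner of the same cell, and a cell is released only when its last owner is removed.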

Related Pages

Principle:Ggml_org_Llama_cpp_RecurrentMemory
