Implementation:Ggml org Llama cpp Memory Recurrent
| Knowledge Sources | |
|---|---|
| Domains | Memory, Recurrent |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Implements recurrent state memory management for recurrent and state-space model (SSM) architectures such as Mamba and RWKV.
Description
This file allocates per-layer rolling state (`r_l`) and recurrent state (`s_l`) tensors and manages them through a cell-based system: each cell tracks a position, a set of sequence IDs, and a source index that identifies which cell its state should be copied from. The `find_slot` method locates contiguous free cells for incoming sequences, covering both a sequence's first placement and its continuation in later batches. The sequence operations (remove, copy, keep, add, divide) work by manipulating cell metadata. The `llama_memory_recurrent_context` class orchestrates batch processing by splitting each batch into ubatches and tracking which cells need state transfers between processing steps. State serialization supports both full and per-sequence save/load.
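The cell bookkeeping described above can be illustrated with a small, self-contained toy model. This is a sketch under stated assumptions, not the code in `llama-memory-recurrent.cpp`: `toy_cell`, `toy_find_free_run`, and `toy_place_seq` are hypothetical names, and the real `find_slot` handles many more cases (batches with several sequences, shared cells, tail tracking).

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for the per-cell metadata.
struct toy_cell {
    int32_t pos = -1;           // last position stored in this cell, -1 if unused
    int32_t src = -1;           // cell to copy state from before the next step, -1 if none
    std::set<int32_t> seq_ids;  // sequences that share this cell's state

    bool is_empty() const { return seq_ids.empty(); }
    bool has_seq_id(int32_t id) const { return seq_ids.count(id) > 0; }
};

// Return the start index of the first run of n_needed contiguous empty cells, or -1.
int32_t toy_find_free_run(const std::vector<toy_cell> & cells, uint32_t n_needed) {
    assert(n_needed > 0);
    uint32_t run = 0;
    for (uint32_t i = 0; i < cells.size(); ++i) {
        run = cells[i].is_empty() ? run + 1 : 0;
        if (run == n_needed) {
            return (int32_t) (i + 1 - n_needed);
        }
    }
    return -1;
}

// Place a sequence: reuse its existing cell (continuation) or claim a free one
// (initial placement). Returns the cell index, or -1 if no cell is available.
int32_t toy_place_seq(std::vector<toy_cell> & cells, int32_t seq_id, int32_t pos) {
    for (uint32_t i = 0; i < cells.size(); ++i) {
        if (cells[i].has_seq_id(seq_id)) {
            cells[i].pos = pos;
            cells[i].src = (int32_t) i; // state is already in place
            return (int32_t) i;
        }
    }
    const int32_t idx = toy_find_free_run(cells, 1);
    if (idx >= 0) {
        cells[idx].seq_ids.insert(seq_id);
        cells[idx].pos = pos;
        cells[idx].src = idx;
    }
    return idx;
}
```

In this toy model a continuing sequence keeps its cell and only advances `pos`, while a new sequence takes the first free cell; a failed placement is reported as `-1` rather than by evicting another sequence.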
Usage
Use this module as the memory backend for non-transformer architectures (Mamba, RWKV) and for the recurrent layers of hybrid models such as Jamba. Without this component, recurrent/SSM models cannot maintain their hidden states across inference steps.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: src/llama-memory-recurrent.cpp
- Lines: 1-1165
Signature
// Constructor
llama_memory_recurrent::llama_memory_recurrent(
        const llama_model & model,
        ggml_type type_r, ggml_type type_s,
        bool offload, uint32_t mem_size, uint32_t n_seq_max,
        const layer_filter_cb & filter);
// Sequence operations
void llama_memory_recurrent::clear(bool data);
bool llama_memory_recurrent::seq_rm(llama_seq_id seq_id, llama_pos p0, llama_pos p1);
void llama_memory_recurrent::seq_cp(llama_seq_id seq_id_src, llama_seq_id seq_id_dst, llama_pos p0, llama_pos p1);
void llama_memory_recurrent::seq_keep(llama_seq_id seq_id);
void llama_memory_recurrent::seq_add(llama_seq_id seq_id, llama_pos p0, llama_pos p1, llama_pos delta);
void llama_memory_recurrent::seq_div(llama_seq_id seq_id, llama_pos p0, llama_pos p1, int d);
// State persistence
void llama_memory_recurrent::state_write(/* ... */);
void llama_memory_recurrent::state_read(/* ... */);
Import
#include "llama-memory-recurrent.h"
#include "llama-impl.h"
#include "llama-io.h"
#include "llama-batch.h"
#include "llama-model.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | const llama_model & | Yes | Model defining layer structure and device assignments |
| type_r | ggml_type | Yes | Data type for rolling state tensors |
| type_s | ggml_type | Yes | Data type for recurrent state tensors |
| offload | bool | Yes | Whether to allocate state tensors in the buffer type of each layer's assigned device instead of CPU memory |
| mem_size | uint32_t | Yes | Number of memory cells to allocate |
| n_seq_max | uint32_t | Yes | Maximum number of concurrent sequences |
| filter | const layer_filter_cb & | No | Optional callback to filter which layers are included |
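As an illustration of the optional `filter` input, the sketch below shows how a per-layer callback could select which layers receive state tensors. It assumes the callback has the shape `bool(int32_t il)` and that an empty callback means "include every layer"; `toy_layer_filter_cb` and `toy_select_layers` are hypothetical names used only for this example, not part of llama.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Assumed shape of the callback; the real alias is declared in the llama.cpp headers.
using toy_layer_filter_cb = std::function<bool(int32_t il)>;

// Collect the indices of the layers that pass the filter.
// An empty (null) filter accepts every layer.
std::vector<int32_t> toy_select_layers(int32_t n_layer, const toy_layer_filter_cb & filter) {
    assert(n_layer >= 0);
    std::vector<int32_t> selected;
    for (int32_t il = 0; il < n_layer; ++il) {
        if (filter && !filter(il)) {
            continue; // layer rejected by the callback
        }
        selected.push_back(il);
    }
    return selected;
}
```

Passing `nullptr` yields an empty `std::function`, which is why the callback can be treated as optional; a hybrid model could instead pass a lambda that returns true only for its recurrent layers.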
Outputs
| Name | Type | Description |
|---|---|---|
| r_l | std::vector<ggml_tensor *> | Per-layer rolling state tensors |
| s_l | std::vector<ggml_tensor *> | Per-layer recurrent state tensors |
| cells | std::vector<cell> | Cell metadata tracking positions, sequences, and state sources |
Usage Examples
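// Create recurrent memory for a Mamba model
// (model, mem_size, n_seq_max, and seq_id are assumed to be defined by the caller)
auto mem = std::make_unique<llama_memory_recurrent>(
        model, GGML_TYPE_F32, GGML_TYPE_F32,
        /*offload=*/true, mem_size, n_seq_max,
        /*filter=*/nullptr);

// Sequence management
mem->seq_rm(seq_id, 0, -1);   // remove the whole sequence (p1 = -1: no upper bound)
mem->seq_cp(0, 1, 0, -1);     // copy the state of sequence 0 to sequence 1
mem->seq_keep(seq_id);        // keep only this sequence, drop all others
mem->clear(true);             // clear all cells and their state data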
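The calls above pass `-1` as `p1` to mean "up to the end of the sequence". The following sketch shows one way such bounds can be normalized into a half-open range. It is an assumption-laden illustration, not the library's code: `toy_pos`, `toy_range`, `toy_normalize`, and `toy_covers` are hypothetical names, and the rule that a range must cover the cell's position before its state is removed is a simplification chosen for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using toy_pos = int32_t; // stand-in for llama_pos

struct toy_range {
    toy_pos p0; // inclusive lower bound
    toy_pos p1; // exclusive upper bound
};

// Map negative bounds to an open-ended range: p0 < 0 becomes 0,
// p1 < 0 becomes the largest representable position.
toy_range toy_normalize(toy_pos p0, toy_pos p1) {
    if (p0 < 0) {
        p0 = 0;
    }
    if (p1 < 0) {
        p1 = std::numeric_limits<toy_pos>::max();
    }
    return {p0, p1};
}

// A recurrent cell stores a single state at its latest position, so this toy
// model treats a range as affecting the cell only if it contains that position.
bool toy_covers(toy_range r, toy_pos cell_pos) {
    return r.p0 <= cell_pos && cell_pos < r.p1;
}
```

Under these assumptions, `toy_normalize(0, -1)` covers any cell position, while a bounded range such as `[10, 20)` leaves a cell at position 41 untouched.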