Implementation:Ggml org Llama cpp Memory Recurrent Header
| Knowledge Sources | |
|---|---|
| Domains | Memory, Recurrent |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares the recurrent memory class and its processing context for managing SSM/RNN hidden states during inference.
Description
`llama_memory_recurrent` implements `llama_memory_i` using cell-based storage. Each `mem_cell` records a position (`pos`), source indices for state copying (`src`, `src0`), a tail pointer (`tail`), and the set of sequence IDs that occupy the cell. The class owns the per-layer rolling state tensors (`r_l`) and recurrent state tensors (`s_l`), together with a `head` index that marks where a batch is placed in the cell array and a `used` count of occupied cells. `llama_memory_recurrent_context` implements the batch processing protocol of `llama_memory_context_i` (`next()`, `apply()`, `get_ubatch()`) and exposes accessors for the state tensors and the copy source indices needed during graph construction.
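To make the cell bookkeeping concrete, here is a minimal, standalone sketch of a `mem_cell`-like struct and a slot search. It does not reproduce the real implementation: the default field values, the `find_empty_cell` helper, and the circular scan from `head` are assumptions made for illustration only. The two typedefs match the `int32_t` definitions in `llama.h`.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Same underlying types as the llama.h typedefs.
using llama_pos    = std::int32_t;
using llama_seq_id = std::int32_t;

// Mirrors the fields listed in the signature; the default values are assumptions.
struct mem_cell {
    llama_pos    pos  = -1;
    std::int32_t src  = -1; // source index for state copying
    std::int32_t src0 = -1; // second source index for state copying
    std::int32_t tail = -1; // tail pointer
    std::set<llama_seq_id> seq_id;

    bool has_seq_id(const llama_seq_id & id) const { return seq_id.count(id) > 0; }
    bool is_empty() const { return seq_id.empty(); }
};

// Hypothetical helper (not part of llama.cpp): scan circularly from `head`
// for the first cell that holds no sequence. Returns -1 if all cells are in use.
inline std::int32_t find_empty_cell(const std::vector<mem_cell> & cells, std::uint32_t head) {
    assert(cells.empty() || head < cells.size());
    const std::uint32_t size = static_cast<std::uint32_t>(cells.size());
    for (std::uint32_t i = 0; i < size; ++i) {
        const std::uint32_t idx = (head + i) % size;
        if (cells[idx].is_empty()) {
            return static_cast<std::int32_t>(idx);
        }
    }
    return -1;
}
```

A cell counts as occupied as soon as its `seq_id` set is non-empty, which is why `is_empty()` only inspects that set in this sketch.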
Usage
Include this header when working with recurrent model support. It defines the interface that graph builders use to access hidden states for Mamba, RWKV, and hybrid architectures.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: src/llama-memory-recurrent.h
- Lines: 1-182
Signature
class llama_memory_recurrent : public llama_memory_i {
public:
    llama_memory_recurrent(
        const llama_model & model, ggml_type type_r, ggml_type type_s,
        bool offload, uint32_t mem_size, uint32_t n_seq_max,
        const layer_filter_cb & filter);

    llama_memory_context_ptr init_batch(llama_batch_allocr & balloc, uint32_t n_ubatch, bool embd_all) override;
    llama_memory_context_ptr init_full() override;
    llama_memory_context_ptr init_update(llama_context * lctx, bool optimize) override;

    bool prepare(const std::vector<llama_ubatch> & ubatches);
    bool find_slot(const llama_ubatch & ubatch);

    struct mem_cell {
        llama_pos pos;
        int32_t src, src0, tail;
        std::set<llama_seq_id> seq_id;
        bool has_seq_id(const llama_seq_id & id) const;
        bool is_empty() const;
    };

    uint32_t head, size, used, n;
    std::vector<mem_cell> cells;
    std::vector<ggml_tensor *> r_l; // per-layer rolling state
    std::vector<ggml_tensor *> s_l; // per-layer recurrent state
};

class llama_memory_recurrent_context : public llama_memory_context_i {
public:
    bool next() override;
    bool apply() override;
    llama_memory_status get_status() const override;
    const llama_ubatch & get_ubatch() const override;

    uint32_t get_n_rs() const;
    uint32_t get_head() const;
    int32_t get_rs_z() const;

    ggml_tensor * get_r_l(int32_t il) const;
    ggml_tensor * get_s_l(int32_t il) const;
    int32_t s_copy(int i) const;
};
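The `next()`, `apply()`, and `get_ubatch()` methods of the context form an iteration protocol over micro-batches. The sketch below shows one way a caller might drive such a context. It uses mock types so that it compiles on its own: `mock_ubatch`, `mock_recurrent_context`, and `process_all` are hypothetical names, and the loop order (apply, read the current micro-batch, advance) is an assumption rather than a documented guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for llama_ubatch; only the token count is modelled.
struct mock_ubatch {
    std::uint32_t n_tokens;
};

// Hypothetical stand-in for llama_memory_recurrent_context.
class mock_recurrent_context {
public:
    explicit mock_recurrent_context(std::vector<mock_ubatch> ubatches)
        : ubatches_(std::move(ubatches)) {}

    // Advance to the next micro-batch; returns false when none remain.
    bool next() {
        if (i_next_ + 1 >= ubatches_.size()) {
            return false;
        }
        ++i_next_;
        return true;
    }

    // Commit memory state for the current micro-batch (here: only counted).
    bool apply() {
        assert(i_next_ < ubatches_.size());
        ++n_applied_;
        return true;
    }

    const mock_ubatch & get_ubatch() const {
        assert(i_next_ < ubatches_.size());
        return ubatches_[i_next_];
    }

    std::size_t n_applied() const { return n_applied_; }

private:
    std::vector<mock_ubatch> ubatches_;
    std::size_t i_next_    = 0;
    std::size_t n_applied_ = 0;
};

// Drive the context over all micro-batches and return the total token count.
// Precondition: the context holds at least one micro-batch.
inline std::uint32_t process_all(mock_recurrent_context & mctx) {
    std::uint32_t n_tokens = 0;
    do {
        if (!mctx.apply()) {
            break;
        }
        n_tokens += mctx.get_ubatch().n_tokens;
    } while (mctx.next());
    return n_tokens;
}
```

A `do`/`while` loop fits here because the context starts positioned on its first micro-batch, so `next()` is only needed to move past it.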
Import
#include "llama-memory-recurrent.h"
// Dependencies:
#include "llama-batch.h"
#include "llama-graph.h"
#include "llama-memory.h"
#include <map>
#include <set>
#include <vector>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | const llama_model & | Yes | Model reference for layer configuration |
| type_r | ggml_type | Yes | Data type for rolling state tensors (r_l) |
| type_s | ggml_type | Yes | Data type for recurrent state tensors (s_l) |
| offload | bool | Yes | Whether to offload state to GPU |
| mem_size | uint32_t | Yes | Total number of memory cells |
| n_seq_max | uint32_t | Yes | Maximum number of concurrent sequences |
| filter | const layer_filter_cb & | Yes | Callback to determine which layers use this memory |
| ubatches | const std::vector<llama_ubatch> & | Yes | Micro-batches for prepare() |
Outputs
| Name | Type | Description |
|---|---|---|
| r_l | std::vector<ggml_tensor *> | Per-layer rolling state tensors |
| s_l | std::vector<ggml_tensor *> | Per-layer recurrent state tensors |
| get_head() | uint32_t | Current head position in the cell array |
| s_copy(i) | int32_t | Copy source index for state transfer during graph construction |
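The copy source indices returned by `s_copy(i)` tell the graph which stored state each active slot should start from. As a rough plain-C++ analogue, this can be pictured as a row gather. The `state_rows` type and the `gather_states` function are hypothetical and operate on nested vectors, whereas the real graph works on `ggml_tensor` objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one layer's state tensor: one row of floats per cell.
using state_rows = std::vector<std::vector<float>>;

// Gather rows by source index: out[i] = states[copy_src[i]].
// `copy_src` plays the role of the values returned by s_copy(i).
inline state_rows gather_states(const state_rows & states,
                                const std::vector<std::int32_t> & copy_src) {
    state_rows out;
    out.reserve(copy_src.size());
    for (const std::int32_t src : copy_src) {
        // Every source index must refer to an existing cell.
        assert(src >= 0 && static_cast<std::size_t>(src) < states.size());
        out.push_back(states[static_cast<std::size_t>(src)]);
    }
    return out;
}
```

The same source index may appear more than once, which is how several slots can start from a copy of the same stored state in this sketch.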
Usage Examples
#include "llama-memory-recurrent.h"
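// During graph construction, access the recurrent state.
// `ctx` is assumed to be a `const llama_memory_context_i *`;
// dynamic_cast yields nullptr if it is not a recurrent context.
const auto * recr_ctx = dynamic_cast<const llama_memory_recurrent_context *>(ctx);
GGML_ASSERT(recr_ctx != nullptr);
ggml_tensor * r = recr_ctx->get_r_l(layer_id);  // rolling state for this layer
ggml_tensor * s = recr_ctx->get_s_l(layer_id);  // recurrent state for this layer
int32_t src = recr_ctx->s_copy(cell_index);     // copy source index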