Implementation:Ggml org Llama cpp Memory Recurrent Header

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Memory, Recurrent
Last Updated	2026-02-15 00:00 GMT

Overview

Declares the recurrent memory class and its processing context for managing SSM/RNN hidden states during inference.

Description

`llama_memory_recurrent` implements `llama_memory_i` with a cell-based storage system. Each `mem_cell` tracks a position, source indices for state copying (`src`, `src0`), a tail pointer, and the set of sequence IDs associated with the cell. The class holds per-layer rolling state tensors (`r_l`) and recurrent state tensors (`s_l`), along with a `head` index used when placing a batch and `size`/`used` counters for tracking cell occupancy.

`llama_memory_recurrent_context` implements the batch processing protocol through `next()`, `apply()`, and `get_ubatch()`. It also exposes accessors for the recurrent state tensors and the copy source indices that graph construction needs.
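To make the cell bookkeeping concrete, here is a minimal standalone sketch of a cell array. It is not the llama.cpp implementation: `toy_mem_cell`, `assign_seq`, and `count_used` are hypothetical names introduced for illustration, and the field semantics in the comments follow the description above rather than the upstream source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Stand-ins for the llama.cpp typedefs (both are int32_t in llama.h).
using llama_pos    = int32_t;
using llama_seq_id = int32_t;

// Simplified cell mirroring the fields listed in the header signature.
struct toy_mem_cell {
    llama_pos pos  = -1;            // position tracked by this cell
    int32_t   src  = -1;            // source index for state copying
    int32_t   tail = -1;            // tail pointer
    std::set<llama_seq_id> seq_id;  // sequences associated with this cell

    bool has_seq_id(llama_seq_id id) const { return seq_id.count(id) > 0; }
    bool is_empty() const { return seq_id.empty(); }
};

// Hypothetical helper: attach a sequence to a cell and record its position.
inline void assign_seq(std::vector<toy_mem_cell> & cells, size_t idx,
                       llama_seq_id id, llama_pos pos) {
    assert(idx < cells.size());
    cells[idx].seq_id.insert(id);
    cells[idx].pos = pos;
}

// Hypothetical helper: count cells that hold at least one sequence,
// which is one way a `used` counter could be derived.
inline uint32_t count_used(const std::vector<toy_mem_cell> & cells) {
    uint32_t used = 0;
    for (const auto & c : cells) {
        if (!c.is_empty()) {
            used++;
        }
    }
    return used;
}
```

In this sketch a cell is considered free exactly when its sequence set is empty, so occupancy can be recomputed from the cells at any time.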

Usage

Include this header when working with recurrent model support. It defines the interface that graph builders use to access hidden states for Mamba, RWKV, and hybrid architectures.

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: src/llama-memory-recurrent.h
Lines: 1-182

Signature

class llama_memory_recurrent : public llama_memory_i {
public:
    llama_memory_recurrent(
        const llama_model & model, ggml_type type_r, ggml_type type_s,
        bool offload, uint32_t mem_size, uint32_t n_seq_max,
        const layer_filter_cb & filter);

    llama_memory_context_ptr init_batch(llama_batch_allocr & balloc, uint32_t n_ubatch, bool embd_all) override;
    llama_memory_context_ptr init_full() override;
    llama_memory_context_ptr init_update(llama_context * lctx, bool optimize) override;

    bool prepare(const std::vector<llama_ubatch> & ubatches);
    bool find_slot(const llama_ubatch & ubatch);

    struct mem_cell {
        llama_pos pos;
        int32_t src, src0, tail;
        std::set<llama_seq_id> seq_id;
        bool has_seq_id(const llama_seq_id & id) const;
        bool is_empty() const;
    };

    uint32_t head, size, used, n;
    std::vector<mem_cell> cells;
    std::vector<ggml_tensor *> r_l;  // per-layer rolling state
    std::vector<ggml_tensor *> s_l;  // per-layer recurrent state
};

class llama_memory_recurrent_context : public llama_memory_context_i {
public:
    bool next() override;
    bool apply() override;
    llama_memory_status get_status() const override;
    const llama_ubatch & get_ubatch() const override;

    uint32_t get_n_rs() const;
    uint32_t get_head() const;
    int32_t get_rs_z() const;
    ggml_tensor * get_r_l(int32_t il) const;
    ggml_tensor * get_s_l(int32_t il) const;
    int32_t s_copy(int i) const;
};
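The `next()`/`apply()`/`get_ubatch()` trio suggests a loop in which the caller applies the memory state for the current micro-batch, processes it, and then advances. The sketch below mocks that loop with hypothetical types (`toy_ubatch`, `toy_context`, `process_all`); the exact semantics of each method are an assumption here, so treat it as an illustration of the calling pattern rather than the upstream behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for llama_ubatch: only a token count.
struct toy_ubatch {
    uint32_t n_tokens;
};

// Hypothetical context that walks a list of micro-batches.
class toy_context {
public:
    explicit toy_context(std::vector<toy_ubatch> ubatches)
        : ubatches_(std::move(ubatches)) {}

    // Assumed semantics: prepare memory for the current micro-batch.
    bool apply() {
        if (i_ >= ubatches_.size()) {
            return false;
        }
        applied_ += ubatches_[i_].n_tokens;
        return true;
    }

    // Assumed semantics: advance; returns false once all micro-batches are consumed.
    bool next() { return ++i_ < ubatches_.size(); }

    const toy_ubatch & get_ubatch() const {
        assert(i_ < ubatches_.size());
        return ubatches_[i_];
    }

    uint32_t applied() const { return applied_; }

private:
    std::vector<toy_ubatch> ubatches_;
    size_t   i_       = 0;
    uint32_t applied_ = 0;
};

// Drive the context to completion and return the total tokens processed.
inline uint32_t process_all(toy_context & ctx) {
    uint32_t total = 0;
    do {
        if (!ctx.apply()) {
            break;  // nothing to process, or memory could not be prepared
        }
        total += ctx.get_ubatch().n_tokens;
    } while (ctx.next());
    return total;
}
```

The `do { ... } while (ctx.next())` shape keeps the advance step in one place, so the body only ever sees a valid current micro-batch.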

Import

#include "llama-memory-recurrent.h"
// Dependencies:
#include "llama-batch.h"
#include "llama-graph.h"
#include "llama-memory.h"
#include <map>
#include <set>
#include <vector>

I/O Contract

Inputs

Name	Type	Required	Description
model	const llama_model &	Yes	Model reference for layer configuration
type_r	ggml_type	Yes	Data type for rolling state tensors (r_l)
type_s	ggml_type	Yes	Data type for recurrent state tensors (s_l)
offload	bool	Yes	Whether to offload state to GPU
mem_size	uint32_t	Yes	Total number of memory cells
n_seq_max	uint32_t	Yes	Maximum number of concurrent sequences
filter	const layer_filter_cb &	Yes	Callback to determine which layers use this memory
ubatches	const std::vector<llama_ubatch> &	Yes	Micro-batches passed to prepare() (not a constructor argument)

Outputs

Name	Type	Description
r_l	std::vector<ggml_tensor *>	Per-layer rolling state tensors
s_l	std::vector<ggml_tensor *>	Per-layer recurrent state tensors
get_head()	uint32_t	Current head position in the cell array
s_copy(i)	int32_t	Copy source index for state transfer during graph construction
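One plausible way to read the copy source index is as a gather: for each output slot `i`, the state is taken from cell `s_copy(i)`. The sketch below shows that idea on flat `float` buffers. `gather_states` is a hypothetical helper, not a llama.cpp or ggml function, and the real graph operates on `ggml_tensor` objects rather than `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical gather: out slot i receives the state stored in cell copy_src[i].
// `states` holds n_cells contiguous states of `state_size` floats each.
inline std::vector<float> gather_states(
        const std::vector<float> & states, size_t state_size,
        const std::vector<int32_t> & copy_src) {
    assert(state_size > 0 && states.size() % state_size == 0);
    const size_t n_cells = states.size() / state_size;

    std::vector<float> out;
    out.reserve(copy_src.size() * state_size);
    for (int32_t src : copy_src) {
        assert(src >= 0 && static_cast<size_t>(src) < n_cells);
        const auto first = states.begin()
            + static_cast<std::ptrdiff_t>(static_cast<size_t>(src) * state_size);
        out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(state_size));
    }
    return out;
}
```

Under this reading, a sequence that continues from its own cell maps to itself, while a sequence forked from another one points at the cell it was copied from.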

Usage Examples

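#include "llama-memory-recurrent.h"

// During graph construction, access the recurrent state.
// `ctx` is assumed to point to the memory context for the current batch;
// `layer_id` and `cell_index` are supplied by the caller.
const auto * recr_ctx = dynamic_cast<const llama_memory_recurrent_context *>(ctx);
if (recr_ctx != nullptr) {  // the cast yields nullptr if ctx is not a recurrent context
    ggml_tensor * r = recr_ctx->get_r_l(layer_id);   // rolling state for this layer
    ggml_tensor * s = recr_ctx->get_s_l(layer_id);   // recurrent state for this layer
    int32_t src     = recr_ctx->s_copy(cell_index);  // copy source index for this cell
}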

Related Pages

Principle:Ggml_org_Llama_cpp_RecurrentMemory
