Implementation:Ggml org Llama cpp Memory Recurrent Header

From Leeroopedia
Revision as of 12:41, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Ggml_org_Llama_cpp_Memory_Recurrent_Header.md)
Knowledge Sources
Domains: Memory, Recurrent
Last Updated: 2026-02-15 00:00 GMT

Overview

Declares the recurrent memory class and its processing context for managing SSM/RNN hidden states during inference.

Description

`llama_memory_recurrent` implements `llama_memory_i` using cell-based storage. Each `mem_cell` records a position (`pos`), source indices for state copying (`src`, `src0`), a tail pointer (`tail`), and the set of sequence IDs associated with the cell (`seq_id`). The class holds the per-layer rolling state tensors (`r_l`) and recurrent state tensors (`s_l`), along with the bookkeeping fields `head`, `size`, `used`, and `n`: `head` is the pointer used for placement, and `used` supports usage tracking.

`llama_memory_recurrent_context` implements the batch processing protocol of `llama_memory_context_i` through `next()`, `apply()`, and `get_ubatch()`. It also exposes the accessors needed during graph construction: `get_r_l()` and `get_s_l()` for the per-layer state tensors, and `s_copy()` for the copy source indices.
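To make the cell bookkeeping concrete, here is a minimal standalone sketch of a cell type with the same fields as `mem_cell`. The name `mem_cell_sketch`, the helper `count_used`, the `-1` defaults, the stand-in typedefs, and the bodies of `has_seq_id()` and `is_empty()` are assumptions made for illustration; the header only declares these members, so the real definitions may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Stand-in typedefs so the sketch compiles without llama.cpp headers (assumption).
using llama_pos    = int32_t;
using llama_seq_id = int32_t;

// Hypothetical mirror of llama_memory_recurrent::mem_cell.
struct mem_cell_sketch {
    llama_pos pos  = -1;           // position tracked by the cell (default is an assumption)
    int32_t   src  = -1;           // source index for state copying
    int32_t   src0 = -1;           // second source index for state copying
    int32_t   tail = -1;           // tail pointer
    std::set<llama_seq_id> seq_id; // sequence IDs associated with this cell

    // Assumed semantics: membership test on the set of sequence IDs.
    bool has_seq_id(const llama_seq_id & id) const {
        return seq_id.find(id) != seq_id.end();
    }

    // Assumed semantics: a cell with no associated sequence is free.
    bool is_empty() const {
        return seq_id.empty();
    }
};

// Hypothetical helper: count the cells that belong to at least one sequence.
inline uint32_t count_used(const std::vector<mem_cell_sketch> & cells) {
    uint32_t used = 0;
    for (const auto & cell : cells) {
        if (!cell.is_empty()) {
            ++used;
        }
    }
    return used;
}
```

Under these assumptions, a counter such as `used` could be kept in sync by incrementing it when a cell gains its first sequence ID and decrementing it when the cell loses its last one.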

Usage

Include this header when working with recurrent model support. It defines the interface that graph builders use to access hidden states for Mamba, RWKV, and hybrid architectures.

Code Reference

Source Location

Signature

class llama_memory_recurrent : public llama_memory_i {
public:
    llama_memory_recurrent(
        const llama_model & model, ggml_type type_r, ggml_type type_s,
        bool offload, uint32_t mem_size, uint32_t n_seq_max,
        const layer_filter_cb & filter);

    llama_memory_context_ptr init_batch(llama_batch_allocr & balloc, uint32_t n_ubatch, bool embd_all) override;
    llama_memory_context_ptr init_full() override;
    llama_memory_context_ptr init_update(llama_context * lctx, bool optimize) override;

    bool prepare(const std::vector<llama_ubatch> & ubatches);
    bool find_slot(const llama_ubatch & ubatch);

    struct mem_cell {
        llama_pos pos;
        int32_t src, src0, tail;
        std::set<llama_seq_id> seq_id;
        bool has_seq_id(const llama_seq_id & id) const;
        bool is_empty() const;
    };

    uint32_t head, size, used, n;
    std::vector<mem_cell> cells;
    std::vector<ggml_tensor *> r_l;  // per-layer rolling state
    std::vector<ggml_tensor *> s_l;  // per-layer recurrent state
};

class llama_memory_recurrent_context : public llama_memory_context_i {
public:
    bool next() override;
    bool apply() override;
    llama_memory_status get_status() const override;
    const llama_ubatch & get_ubatch() const override;

    uint32_t get_n_rs() const;
    uint32_t get_head() const;
    int32_t get_rs_z() const;
    ggml_tensor * get_r_l(int32_t il) const;
    ggml_tensor * get_s_l(int32_t il) const;
    int32_t s_copy(int i) const;
};
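
The `next()`/`apply()`/`get_ubatch()` methods declared above suggest a simple driver loop over micro-batches. The following sketch uses a mock context to show one plausible way such a loop could look. `ubatch_sketch`, `context_sketch`, `n_applied()`, and `process_all` are hypothetical names, and the stubbed method bodies are assumptions rather than the behavior of the real class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for llama_ubatch: only a token count.
struct ubatch_sketch {
    uint32_t n_tokens;
};

// Hypothetical mock of the next() / apply() / get_ubatch() protocol.
class context_sketch {
public:
    explicit context_sketch(std::vector<ubatch_sketch> ubatches)
        : ubatches_(std::move(ubatches)) {
        assert(!ubatches_.empty());
    }

    // Advance to the next micro-batch; returns false when none remain.
    bool next() {
        ++i_cur_;
        return i_cur_ < ubatches_.size();
    }

    // Stub: a real context would update memory state for the current
    // micro-batch here. This mock only counts the calls.
    bool apply() {
        assert(i_cur_ < ubatches_.size());
        ++n_applied_;
        return true;
    }

    // The micro-batch currently being processed.
    const ubatch_sketch & get_ubatch() const {
        assert(i_cur_ < ubatches_.size());
        return ubatches_[i_cur_];
    }

    uint32_t n_applied() const { return n_applied_; }

private:
    std::vector<ubatch_sketch> ubatches_;
    std::size_t i_cur_     = 0;
    uint32_t    n_applied_ = 0;
};

// Drive the protocol: apply, read the micro-batch, then advance.
// Returns the total number of tokens seen across all micro-batches.
inline uint32_t process_all(context_sketch & ctx) {
    uint32_t n_tokens = 0;
    do {
        if (!ctx.apply()) {
            break; // the memory update failed, so stop processing
        }
        n_tokens += ctx.get_ubatch().n_tokens;
    } while (ctx.next());
    return n_tokens;
}
```

The `do { ... } while (ctx.next())` shape reflects the assumption that a freshly created context already points at its first micro-batch, so `next()` is only called after that one has been handled.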

Import

#include "llama-memory-recurrent.h"
// Dependencies:
#include "llama-batch.h"
#include "llama-graph.h"
#include "llama-memory.h"
#include <map>
#include <set>
#include <vector>

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| model | const llama_model & | Yes | Model reference for layer configuration |
| type_r | ggml_type | Yes | Data type for rolling state tensors (r_l) |
| type_s | ggml_type | Yes | Data type for recurrent state tensors (s_l) |
| offload | bool | Yes | Whether to offload state to GPU |
| mem_size | uint32_t | Yes | Total number of memory cells |
| n_seq_max | uint32_t | Yes | Maximum number of concurrent sequences |
| filter | const layer_filter_cb & | Yes | Callback to determine which layers use this memory |
| ubatches | const std::vector<llama_ubatch> & | Yes | Micro-batches for prepare() |

Outputs

| Name | Type | Description |
|------|------|-------------|
| r_l | std::vector<ggml_tensor *> | Per-layer rolling state tensors |
| s_l | std::vector<ggml_tensor *> | Per-layer recurrent state tensors |
| get_head() | uint32_t | Current head position in the cell array |
| s_copy(i) | int32_t | Copy source index for state transfer during graph construction |
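
One way to picture the copy source index returned by `s_copy(i)` is as a gather: output row `i` takes its state from the row named by the index. The sketch below shows that idea with plain `std::vector` storage. `gather_states` is a hypothetical helper, and the row-major layout is an assumption. It stands in for whatever tensor operation the graph builder actually uses and is not taken from the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: gather per-cell state rows by copy source index.
//   states : n_cells rows of n_embd floats, stored row-major (assumed layout)
//   src    : src[i] names the row copied into output row i, which is the
//            role s_copy(i) plays in this sketch
inline std::vector<float> gather_states(
        const std::vector<float>   & states,
        std::size_t                  n_embd,
        const std::vector<int32_t> & src) {
    assert(n_embd > 0 && states.size() % n_embd == 0);
    const std::size_t n_cells = states.size() / n_embd;

    std::vector<float> out(src.size() * n_embd);
    for (std::size_t i = 0; i < src.size(); ++i) {
        // Every source index must refer to an existing cell.
        assert(src[i] >= 0 && static_cast<std::size_t>(src[i]) < n_cells);
        const std::size_t row = static_cast<std::size_t>(src[i]);
        for (std::size_t j = 0; j < n_embd; ++j) {
            out[i * n_embd + j] = states[row * n_embd + j];
        }
    }
    return out;
}
```

Expressing the transfer as a gather lets several destination rows share one source row, which is how a state could be duplicated when a sequence is copied.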

Usage Examples

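#include "llama-memory-recurrent.h"

// During graph construction, access the recurrent state.
// `ctx`, `layer_id`, and `cell_index` come from the surrounding graph-building code.
const auto * recr_ctx = dynamic_cast<const llama_memory_recurrent_context *>(ctx);
GGML_ASSERT(recr_ctx != nullptr); // dynamic_cast yields nullptr if ctx is not a recurrent context

ggml_tensor * r = recr_ctx->get_r_l(layer_id); // rolling state tensor for this layer
ggml_tensor * s = recr_ctx->get_s_l(layer_id); // recurrent state tensor for this layer
int32_t src = recr_ctx->s_copy(cell_index);    // copy source index for this cell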
