
Implementation: ggml-org/llama.cpp Memory Hybrid iSWA Header

From Leeroopedia
Knowledge Sources
Domains: Memory, Hybrid
Last Updated: 2026-02-15 00:00 GMT

Overview

Declares the hybrid memory class combining iSWA (interleaved Sliding Window Attention) cache with recurrent state memory.

Description

`llama_memory_hybrid_iswa` implements `llama_memory_i` by composing `llama_kv_cache_iswa` (for attention layers, with SWA support) and `llama_memory_recurrent` (for recurrent layers). The context class `llama_memory_hybrid_iswa_context` tracks separate slot-info vectors for the base and SWA attention layers alongside the recurrent context, and exposes `get_attn()` and `get_recr()` accessors so graph builders can retrieve the appropriate memory for each layer.

Usage

Include this header when working with hybrid architectures that interleave sliding window attention, full attention, and recurrent layers within a single model. It handles the most complex hybrid memory configurations in llama.cpp: some attention layers use sliding windows, others use full attention, and the remaining layers carry recurrent state.

Code Reference

Source Location

Signature

class llama_memory_hybrid_iswa : public llama_memory_i {
public:
    llama_memory_hybrid_iswa(
        const llama_model & model,
        ggml_type type_k, ggml_type type_v, bool v_trans, bool swa_full,
        uint32_t kv_size, uint32_t n_ubatch, uint32_t n_pad,
        ggml_type type_r, ggml_type type_s, uint32_t rs_size,
        uint32_t n_seq_max, bool offload, bool unified,
        const layer_filter_cb & filter_attn = nullptr,
        const layer_filter_cb & filter_recr = nullptr);

    llama_memory_context_ptr init_batch(llama_batch_allocr & balloc, uint32_t n_ubatch, bool embd_all) override;
    llama_memory_context_ptr init_full() override;
    llama_memory_context_ptr init_update(llama_context * lctx, bool optimize) override;

    llama_kv_cache_iswa * get_mem_attn() const;
    llama_memory_recurrent * get_mem_recr() const;
};

class llama_memory_hybrid_iswa_context : public llama_memory_context_i {
public:
    bool next() override;
    bool apply() override;
    llama_memory_status get_status() const override;
    const llama_ubatch & get_ubatch() const override;

    const llama_kv_cache_iswa_context * get_attn() const;
    const llama_memory_recurrent_context * get_recr() const;
};

Import

#include "llama-memory-hybrid-iswa.h"
// Dependencies:
#include "llama-batch.h"
#include "llama-graph.h"
#include "llama-kv-cache-iswa.h"
#include "llama-memory.h"
#include "llama-memory-recurrent.h"
#include <memory>
#include <vector>

I/O Contract

Inputs

Name | Type | Required | Description
model | const llama_model & | Yes | Model with hparams describing layer types (attention vs recurrent)
type_k / type_v | ggml_type | Yes | Key/value cache data types for attention layers
type_r / type_s | ggml_type | Yes | Recurrent state data types
kv_size | uint32_t | Yes | Size of the iSWA KV cache
rs_size | uint32_t | Yes | Size of the recurrent state memory
filter_attn | const layer_filter_cb & | No | Layer filter for attention (default: !is_recurrent)
filter_recr | const layer_filter_cb & | No | Layer filter for recurrent (default: is_recurrent)

Outputs

Name | Type | Description
get_mem_attn() | llama_kv_cache_iswa * | Pointer to the composed iSWA attention cache
get_mem_recr() | llama_memory_recurrent * | Pointer to the composed recurrent memory
get_attn() | const llama_kv_cache_iswa_context * | Attention context for graph building
get_recr() | const llama_memory_recurrent_context * | Recurrent context for graph building

Usage Examples

#include "llama-memory-hybrid-iswa.h"

// Typically created internally by the model initialization code
llama_memory_hybrid_iswa mem(model, type_k, type_v, v_trans, swa_full,
    kv_size, n_ubatch, n_pad, type_r, type_s, rs_size,
    n_seq_max, offload, unified);

// Access sub-memories for inspection
auto * attn_cache = mem.get_mem_attn();
auto * recr_mem   = mem.get_mem_recr();
