Implementation: ggml-org/llama.cpp Memory Hybrid iSWA Header
| Knowledge Sources | |
|---|---|
| Domains | Memory, Hybrid |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares the hybrid memory class combining iSWA (interleaved Sliding Window Attention) cache with recurrent state memory.
Description
`llama_memory_hybrid_iswa` implements `llama_memory_i` by composing `llama_kv_cache_iswa` (for attention layers with SWA support) and `llama_memory_recurrent` (for recurrent layers). The context class `llama_memory_hybrid_iswa_context` tracks separate slot info vectors for base and SWA attention layers alongside the recurrent context, providing `get_attn()` and `get_recr()` accessors for graph builders to access the appropriate memory for each layer.
Usage
Include this header when working with hybrid architectures that combine sliding window attention, full attention, and recurrent mechanisms across different layers. It supports the most complex hybrid memory configurations in llama.cpp.
Code Reference
Source Location
- Repository: ggml-org/llama.cpp
- File: src/llama-memory-hybrid-iswa.h
- Lines: 1-140
Signature
```cpp
class llama_memory_hybrid_iswa : public llama_memory_i {
public:
    llama_memory_hybrid_iswa(
            const llama_model & model,
            ggml_type type_k, ggml_type type_v, bool v_trans, bool swa_full,
            uint32_t kv_size, uint32_t n_ubatch, uint32_t n_pad,
            ggml_type type_r, ggml_type type_s, uint32_t rs_size,
            uint32_t n_seq_max, bool offload, bool unified,
            const layer_filter_cb & filter_attn = nullptr,
            const layer_filter_cb & filter_recr = nullptr);

    llama_memory_context_ptr init_batch(llama_batch_allocr & balloc, uint32_t n_ubatch, bool embd_all) override;
    llama_memory_context_ptr init_full() override;
    llama_memory_context_ptr init_update(llama_context * lctx, bool optimize) override;

    llama_kv_cache_iswa *    get_mem_attn() const;
    llama_memory_recurrent * get_mem_recr() const;
};

class llama_memory_hybrid_iswa_context : public llama_memory_context_i {
public:
    bool next()  override;
    bool apply() override;

    llama_memory_status  get_status() const override;
    const llama_ubatch & get_ubatch() const override;

    const llama_kv_cache_iswa_context *    get_attn() const;
    const llama_memory_recurrent_context * get_recr() const;
};
```
Import
```cpp
#include "llama-memory-hybrid-iswa.h"

// Dependencies:
#include "llama-batch.h"
#include "llama-graph.h"
#include "llama-kv-cache-iswa.h"
#include "llama-memory.h"
#include "llama-memory-recurrent.h"

#include <memory>
#include <vector>
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | const llama_model & | Yes | Model with hparams describing layer types (attention vs recurrent) |
| type_k / type_v | ggml_type | Yes | Key/value cache data types for attention layers |
| type_r / type_s | ggml_type | Yes | Recurrent state data types |
| kv_size | uint32_t | Yes | Size of the iSWA KV cache |
| rs_size | uint32_t | Yes | Size of the recurrent state memory |
| filter_attn | const layer_filter_cb & | No | Layer filter for attention (default: !is_recurrent) |
| filter_recr | const layer_filter_cb & | No | Layer filter for recurrent (default: is_recurrent) |
Outputs
| Name | Type | Description |
|---|---|---|
| get_mem_attn() | llama_kv_cache_iswa * | Pointer to the composed iSWA attention cache |
| get_mem_recr() | llama_memory_recurrent * | Pointer to the composed recurrent memory |
| get_attn() | const llama_kv_cache_iswa_context * | Attention context for graph building |
| get_recr() | const llama_memory_recurrent_context * | Recurrent context for graph building |
Usage Examples
```cpp
#include "llama-memory-hybrid-iswa.h"

// Typically created internally by the model initialization code
llama_memory_hybrid_iswa mem(model, type_k, type_v, v_trans, swa_full,
                             kv_size, n_ubatch, n_pad, type_r, type_s, rs_size,
                             n_seq_max, offload, unified);

// Access sub-memories for inspection
auto * attn_cache = mem.get_mem_attn();
auto * recr_mem   = mem.get_mem_recr();
```