Implementation:Ollama Ollama Llama Memory Hybrid Types

Knowledge Sources	Ollama
Domains	LLM Inference, Memory Management
Last Updated	2025-02-15 00:00 GMT

Overview

Header declaring llama_memory_hybrid, a memory class that combines attention-based (KV cache) and recurrent (SSM) memory for model architectures that mix both layer types.

Description

Declares llama_memory_hybrid, which inherits from llama_memory_i. Its constructor takes two groups of settings:

- The attention configuration: K/V tensor types, KV cache size and sliding-window attention (SWA) settings.
- The recurrent configuration: R/S state types and recurrent state size.

The class owns one memory of each kind through the unique pointers mem_attn and mem_recr. The companion class llama_memory_hybrid_context holds a separate context for each memory (ctx_attn and ctx_recr). It coordinates batch iteration so that both memories step through the same batches together.
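The ownership pattern described above can be modeled in a few lines. The following is a minimal, self-contained sketch, not the llama.cpp implementation:

- AttnMemSketch, RecrMemSketch, HybridMemSketch, advance() and truncate() are hypothetical names.
- Each memory tracks a single sequence as a plain position count.

It shows a composite that owns both memories through const std::unique_ptr members, hands out non-owning pointers, and forwards each operation to both sides. Checking the recurrent side first is a design choice of this sketch. A recurrent state cannot be rewound to an arbitrary position, so partial removal is the operation most likely to fail. Refusing early leaves both memories untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-ins for the two sub-memories (not the llama.cpp classes).
// Each one tracks how many positions of a single sequence it currently holds.
struct AttnMemSketch {
    int32_t n_pos = 0;
    void clear() { n_pos = 0; }
    // KV entries are stored per position, so any suffix [p0, n_pos) can be dropped.
    bool truncate(int32_t p0) {
        if (p0 < n_pos) n_pos = p0;
        return true;
    }
};

struct RecrMemSketch {
    int32_t n_pos = 0;
    void clear() { n_pos = 0; }
    // A recurrent state summarizes every earlier token, so it can only be
    // dropped entirely (p0 == 0) or kept entirely (p0 >= n_pos).
    bool truncate(int32_t p0) {
        if (p0 >= n_pos) return true;
        if (p0 == 0) { n_pos = 0; return true; }
        return false;
    }
};

// Composite that owns both memories, mirroring the mem_attn / mem_recr members.
class HybridMemSketch {
public:
    HybridMemSketch()
        : mem_attn(std::make_unique<AttnMemSketch>()),
          mem_recr(std::make_unique<RecrMemSketch>()) {}

    // Non-owning accessors, analogous to get_mem_attn() / get_mem_recr().
    AttnMemSketch * get_mem_attn() const { return mem_attn.get(); }
    RecrMemSketch * get_mem_recr() const { return mem_recr.get(); }

    // Record that n tokens were processed by both memories.
    void advance(int32_t n) {
        assert(n >= 0);  // negative counts would be a caller bug
        mem_attn->n_pos += n;
        mem_recr->n_pos += n;
    }

    void clear() { mem_attn->clear(); mem_recr->clear(); }

    // Try the recurrent side first: if it refuses, nothing has been modified.
    bool truncate(int32_t p0) {
        if (!mem_recr->truncate(p0)) return false;
        return mem_attn->truncate(p0);
    }

private:
    const std::unique_ptr<AttnMemSketch> mem_attn;
    const std::unique_ptr<RecrMemSketch> mem_recr;
};
```

For example, after advance(10), truncate(5) returns false and leaves both position counts at 10. truncate(0) succeeds and resets both to 0.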

Usage

Include this header when implementing or extending support for models that mix attention and recurrent layers.

Code Reference

Source Location

Repository: Ollama
File: llama/llama.cpp/src/llama-memory-hybrid.h
Lines: 1-139

Signature

class llama_memory_hybrid : public llama_memory_i {
public:
    llama_memory_hybrid(
        const llama_model & model,
        // attention (KV cache)
        ggml_type      type_k,
        ggml_type      type_v,
        bool           v_trans,
        uint32_t       kv_size,
        uint32_t       n_pad,
        uint32_t       n_swa,
        llama_swa_type swa_type,
        // recurrent (R/S states)
        ggml_type      type_r,
        ggml_type      type_s,
        uint32_t       rs_size,
        // common
        uint32_t       n_seq_max,
        bool           offload,
        bool           unified,
        // optional per-layer filters
        const layer_filter_cb & filter_attn = nullptr,
        const layer_filter_cb & filter_recr = nullptr);

    // (llama_memory_i overrides omitted)

    llama_kv_cache         * get_mem_attn() const;
    llama_memory_recurrent * get_mem_recr() const;
};

class llama_memory_hybrid_context : public llama_memory_context_i {
public:
    const llama_kv_cache_context * get_attn() const;
    const llama_memory_recurrent_context * get_recr() const;
};
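Coordinated batch iteration can be illustrated with a simplified model of the joint context. This is a hedged sketch, not the llama.cpp code. MemStatus, combine(), SubCtxSketch and HybridCtxSketch are hypothetical names, and real contexts carry actual micro-batches rather than counters. It assumes both sub-contexts were planned from the same micro-batch split, so they must advance in lockstep. The sketch also reports an overall status in which a failure on either side wins.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical status codes for this sketch.
enum class MemStatus { Success, NoUpdate, Failed };

// Hypothetical per-memory context: walks a fixed number of micro-batches.
struct SubCtxSketch {
    MemStatus   status    = MemStatus::Success;
    std::size_t n_ubatch  = 0;  // number of micro-batches planned
    std::size_t i         = 0;  // index of the current micro-batch
    std::size_t n_applied = 0;  // how many times apply() ran

    bool next()  { return ++i < n_ubatch; }
    bool apply() { ++n_applied; return true; }
};

// Failure on either side wins; otherwise report an update if either side has one.
MemStatus combine(MemStatus a, MemStatus b) {
    if (a == MemStatus::Failed) return a;
    if (b == MemStatus::Failed) return b;
    if (a == MemStatus::Success || b == MemStatus::Success) return MemStatus::Success;
    return MemStatus::NoUpdate;
}

// Joint context holding one sub-context per memory.
class HybridCtxSketch {
public:
    HybridCtxSketch(SubCtxSketch attn, SubCtxSketch recr)
        : ctx_attn(attn), ctx_recr(recr) {
        assert(ctx_attn.n_ubatch == ctx_recr.n_ubatch);  // same split on both sides
    }

    MemStatus get_status() const { return combine(ctx_attn.status, ctx_recr.status); }

    // Advance both sub-contexts together; false once the last micro-batch is consumed.
    bool next() {
        const bool more_attn = ctx_attn.next();
        const bool more_recr = ctx_recr.next();
        assert(more_attn == more_recr);
        return more_attn && more_recr;
    }

    // Apply the current micro-batch to both memories; '&' (not '&&') so both always run.
    bool apply() { return ctx_attn.apply() & ctx_recr.apply(); }

    const SubCtxSketch & get_attn() const { return ctx_attn; }
    const SubCtxSketch & get_recr() const { return ctx_recr; }

private:
    SubCtxSketch ctx_attn;
    SubCtxSketch ctx_recr;
};
```

A caller would apply the current micro-batch and then call next() until it returns false. Every micro-batch then reaches both memories exactly once.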

Import

#include "llama-memory-hybrid.h"

I/O Contract

Inputs

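Name	Type	Required	Description
model	const llama_model &	Yes	Loaded model with a hybrid (attention + recurrent) architecture
type_k, type_v	ggml_type	Yes	Tensor types of the attention K and V caches
kv_size	uint32_t	Yes	Number of cells in the attention KV cache
type_r, type_s	ggml_type	Yes	Tensor types of the recurrent R and S states
rs_size	uint32_t	Yes	Number of recurrent state cells
n_seq_max	uint32_t	Yes	Maximum number of parallel sequences
filter_attn, filter_recr	const layer_filter_cb &	No	Optional per-layer predicates selecting the layers for each memory (default nullptr)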

Outputs

Name	Type	Description
get_mem_attn()	llama_kv_cache*	KV cache for attention layers
get_mem_recr()	llama_memory_recurrent*	Recurrent memory for SSM layers

Usage Examples

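#include "llama-memory-hybrid.h"
#include <memory>

// Assumes `model` is a loaded llama_model with a hybrid architecture;
// the sizes and types below are illustrative values only.
const ggml_type type_k = GGML_TYPE_F16, type_v = GGML_TYPE_F16; // attention K/V
const ggml_type type_r = GGML_TYPE_F32, type_s = GGML_TYPE_F32; // recurrent R/S states
const uint32_t kv_size = 4096, n_pad = 256, n_seq_max = 1, rs_size = n_seq_max;

// With no layer filters, the hybrid memory routes layers automatically:
// attention layers -> KV cache, recurrent layers -> recurrent memory.
auto hybrid = std::make_unique<llama_memory_hybrid>(
    model,
    type_k, type_v, /*v_trans=*/true, kv_size, n_pad, /*n_swa=*/0, LLAMA_SWA_TYPE_NONE,
    type_r, type_s, rs_size,
    n_seq_max, /*offload=*/true, /*unified=*/false);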
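The automatic routing mentioned in the example can be sketched as a per-layer partition. This is a minimal illustration, not the llama.cpp code. LayerFilter, LayerAssignment, route_layers() and the is_recurrent vector are hypothetical. The sketch also assumes that a null filter falls back to whether the model marks each layer as recurrent. Passing a custom predicate overrides the default for that memory only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical per-layer predicate type, in the spirit of layer_filter_cb.
using LayerFilter = std::function<bool(int32_t il)>;

struct LayerAssignment {
    std::vector<int32_t> attn;  // layers served by the KV cache
    std::vector<int32_t> recr;  // layers served by the recurrent memory
};

// Route each layer to a memory. A null filter falls back to the per-layer
// "is recurrent" flag, which this sketch models as a plain vector<bool>.
LayerAssignment route_layers(const std::vector<bool> & is_recurrent,
                             LayerFilter filter_attn = nullptr,
                             LayerFilter filter_recr = nullptr) {
    if (!filter_attn) {
        filter_attn = [&](int32_t il) { return !is_recurrent[static_cast<std::size_t>(il)]; };
    }
    if (!filter_recr) {
        filter_recr = [&](int32_t il) { return is_recurrent[static_cast<std::size_t>(il)]; };
    }

    LayerAssignment out;
    const int32_t n_layer = static_cast<int32_t>(is_recurrent.size());
    for (int32_t il = 0; il < n_layer; ++il) {
        if (filter_attn(il)) out.attn.push_back(il);
        if (filter_recr(il)) out.recr.push_back(il);
    }
    return out;
}
```

With the defaults, every layer lands in exactly one memory. A custom filter can break that property, for example by placing a layer in neither memory. This sketch does not check for that case.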

Related Pages

Principle:Ollama_Ollama_LLM_Memory_Architecture
