Implementation:Ollama Ollama Llama Memory Hybrid Types
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, Memory Management |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Header declaring the hybrid memory class that combines attention-based (KV cache) and recurrent (SSM) memory for architectures with mixed layer types.
Description
Declares llama_memory_hybrid, which inherits from llama_memory_i. Its constructor takes both the attention configuration (key/value tensor types, KV size, padding, and sliding-window attention settings) and the recurrent configuration (R/S state types, RS size). The class owns the two underlying memories through the unique pointers mem_attn and mem_recr. The companion class llama_memory_hybrid_context holds a separate context for each memory (ctx_attn and ctx_recr) and coordinates batch iteration across both.
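The ownership and layer-routing idea described above can be illustrated with a minimal, self-contained sketch. Everything in it (toy_memory, toy_hybrid_memory, layer_filter, make_toy_memory) is a hypothetical stand-in invented for illustration and is not part of llama.cpp. It only assumes that each layer belongs to exactly one of the two sub-memories and that a per-layer predicate decides which one.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-ins; none of these are the real llama.cpp types.
using layer_filter = std::function<bool(int32_t il)>;

struct toy_memory {
    std::vector<int32_t> layers; // layer indices owned by this memory
};

// Builds one sub-memory holding every layer accepted by `filter`.
inline std::unique_ptr<toy_memory> make_toy_memory(int32_t n_layer, const layer_filter & filter) {
    auto mem = std::make_unique<toy_memory>();
    for (int32_t il = 0; il < n_layer; ++il) {
        if (filter(il)) {
            mem->layers.push_back(il);
        }
    }
    return mem;
}

class toy_hybrid_memory {
public:
    // is_recurrent[il] marks layer il as recurrent; all other layers are attention.
    explicit toy_hybrid_memory(const std::vector<bool> & is_recurrent)
        : mem_attn(make_toy_memory((int32_t) is_recurrent.size(),
              [&](int32_t il) { return !is_recurrent[il]; })),
          mem_recr(make_toy_memory((int32_t) is_recurrent.size(),
              [&](int32_t il) { return is_recurrent[il]; })) {
        // Every layer must land in exactly one of the two sub-memories.
        assert(mem_attn->layers.size() + mem_recr->layers.size() == is_recurrent.size());
    }

    // Non-owning accessors, mirroring the getter style of the declared class.
    toy_memory * get_mem_attn() const { return mem_attn.get(); }
    toy_memory * get_mem_recr() const { return mem_recr.get(); }

private:
    const std::unique_ptr<toy_memory> mem_attn;
    const std::unique_ptr<toy_memory> mem_recr;
};
```

In this sketch the hybrid object owns both sub-memories and hands out raw, non-owning pointers, so callers can inspect either side without taking over its lifetime.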
Usage
Include this header when implementing or extending support for hybrid attention/recurrent models.
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/llama-memory-hybrid.h
- Lines: 1-139
Signature
class llama_memory_hybrid : public llama_memory_i {
public:
    llama_memory_hybrid(
        const llama_model & model,
        ggml_type type_k, ggml_type type_v, bool v_trans,
        uint32_t kv_size, uint32_t n_pad, uint32_t n_swa,
        llama_swa_type swa_type,
        ggml_type type_r, ggml_type type_s, uint32_t rs_size,
        uint32_t n_seq_max, bool offload, bool unified,
        const layer_filter_cb & filter_attn = nullptr,
        const layer_filter_cb & filter_recr = nullptr);

    llama_kv_cache * get_mem_attn() const;
    llama_memory_recurrent * get_mem_recr() const;
};

class llama_memory_hybrid_context : public llama_memory_context_i {
public:
    const llama_kv_cache_context * get_attn() const;
    const llama_memory_recurrent_context * get_recr() const;
};
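How a joint context might keep two sub-contexts in step while iterating over micro-batches can be sketched as follows. This is an illustrative toy under stated assumptions, not the implementation in the header: toy_sub_context, toy_hybrid_context, and run_all are hypothetical names, and the sketch assumes a simple "apply the current micro-batch, then advance" loop.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical sub-context: counts how many micro-batches it has applied.
struct toy_sub_context {
    size_t n_applied = 0;
    void apply() { ++n_applied; }
};

class toy_hybrid_context {
public:
    explicit toy_hybrid_context(size_t n_ubatch)
        : n_ubatch(n_ubatch),
          ctx_attn(std::make_unique<toy_sub_context>()),
          ctx_recr(std::make_unique<toy_sub_context>()) {}

    // Applies the current micro-batch to both sub-contexts together.
    void apply() {
        assert(i_next < n_ubatch);
        ctx_attn->apply();
        ctx_recr->apply();
    }

    // Advances to the next micro-batch; returns false once all are consumed.
    bool next() { return ++i_next < n_ubatch; }

    const toy_sub_context * get_attn() const { return ctx_attn.get(); }
    const toy_sub_context * get_recr() const { return ctx_recr.get(); }

private:
    size_t n_ubatch;
    size_t i_next = 0;
    std::unique_ptr<toy_sub_context> ctx_attn;
    std::unique_ptr<toy_sub_context> ctx_recr;
};

// Drives the context over every micro-batch and returns how many were processed.
inline size_t run_all(toy_hybrid_context & ctx) {
    size_t n = 0;
    do {
        ctx.apply();
        ++n;
    } while (ctx.next());
    return n;
}
```

The point of the sketch is that a single cursor drives both sides, so the attention and recurrent sub-contexts never disagree about which micro-batch is current.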
Import
#include "llama-memory-hybrid.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | const llama_model & | Yes | Model with hybrid architecture |
| type_k, type_v | ggml_type | Yes | Attention KV types |
| type_r, type_s | ggml_type | Yes | Recurrent state types |
Outputs
| Name | Type | Description |
|---|---|---|
| get_mem_attn() | llama_kv_cache* | KV cache for attention layers |
| get_mem_recr() | llama_memory_recurrent* | Recurrent memory for SSM layers |
Usage Examples
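#include "llama-memory-hybrid.h"

#include <memory>

// Assumes model, type_k, type_v, v_trans, kv_size, n_pad, type_r, type_s,
// rs_size, n_seq_max, offload, and unified are already defined by the caller.
// The hybrid memory routes layers automatically:
// - attention layers -> KV cache
// - recurrent layers -> recurrent memory
auto hybrid = std::make_unique<llama_memory_hybrid>(
    model, type_k, type_v, v_trans, kv_size, n_pad,
    /* n_swa    */ 0,
    /* swa_type */ LLAMA_SWA_TYPE_NONE,
    type_r, type_s, rs_size,
    n_seq_max, offload, unified);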