Implementation:Ollama Ollama Llama Memory Hybrid Types

From Leeroopedia
Knowledge Sources
Domains: LLM Inference, Memory Management
Last Updated: 2025-02-15 00:00 GMT

Overview

Header declaring the hybrid memory class that combines attention-based (KV cache) and recurrent (SSM) memory for architectures with mixed layer types.

Description

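Declares llama_memory_hybrid, which inherits from llama_memory_i. Its constructor takes the attention configuration (key/value types type_k and type_v, v_trans, kv_size, n_pad, and the sliding-window attention settings n_swa and swa_type), the recurrent configuration (R/S state types type_r and type_s, and rs_size), shared settings (n_seq_max, offload, unified), and two optional per-layer filters, filter_attn and filter_recr. The class owns the two underlying memories through the unique pointers mem_attn and mem_recr. The companion class llama_memory_hybrid_context manages the joint context: it holds separate ctx_attn and ctx_recr contexts and coordinates batch iteration across both memory systems.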
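The ownership and routing idea described above can be illustrated with a small standalone sketch. It does not use the real llama.cpp types: toy_layer_filter, toy_sub_memory, and toy_memory_hybrid are hypothetical stand-ins, and the choice to split layers by an is_recurrent flag when no filter is supplied is an assumption made for illustration, not a statement about the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-ins, NOT the real llama.cpp types.
using toy_layer_filter = std::function<bool(int32_t il)>;

// A sub-memory that only tracks the layers accepted by its filter.
struct toy_sub_memory {
    std::vector<int32_t> layers;

    toy_sub_memory(int32_t n_layer, const toy_layer_filter & filter) {
        for (int32_t il = 0; il < n_layer; ++il) {
            if (filter(il)) {
                layers.push_back(il);
            }
        }
    }
};

// Toy hybrid memory: owns one attention-side and one recurrent-side memory.
class toy_memory_hybrid {
public:
    // is_recurrent[il] marks recurrent layers (assumed model metadata).
    toy_memory_hybrid(const std::vector<bool> & is_recurrent,
                      toy_layer_filter filter_attn = nullptr,
                      toy_layer_filter filter_recr = nullptr) {
        const int32_t n_layer = static_cast<int32_t>(is_recurrent.size());

        // Assumed defaults: split layers by type when no filter is given.
        if (!filter_attn) {
            filter_attn = [&is_recurrent](int32_t il) {
                return !is_recurrent[static_cast<std::size_t>(il)];
            };
        }
        if (!filter_recr) {
            filter_recr = [&is_recurrent](int32_t il) {
                return is_recurrent[static_cast<std::size_t>(il)];
            };
        }

        mem_attn = std::make_unique<toy_sub_memory>(n_layer, filter_attn);
        mem_recr = std::make_unique<toy_sub_memory>(n_layer, filter_recr);
    }

    // Non-owning accessors, mirroring the shape of get_mem_attn()/get_mem_recr().
    toy_sub_memory * get_mem_attn() const { return mem_attn.get(); }
    toy_sub_memory * get_mem_recr() const { return mem_recr.get(); }

private:
    std::unique_ptr<toy_sub_memory> mem_attn;
    std::unique_ptr<toy_sub_memory> mem_recr;
};
```

With a four-layer toy model whose middle two layers are recurrent, the attention side would end up tracking layers 0 and 3 and the recurrent side layers 1 and 2. Passing an explicit filter overrides the default split for that side only.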

Usage

Include this header when implementing or extending support for hybrid attention/recurrent models.

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/src/llama-memory-hybrid.h
  • Lines: 1-139

Signature

class llama_memory_hybrid : public llama_memory_i {
public:
    llama_memory_hybrid(
        const llama_model & model,
                ggml_type   type_k, ggml_type type_v, bool v_trans,
                 uint32_t   kv_size, uint32_t n_pad, uint32_t n_swa,
           llama_swa_type   swa_type,
                ggml_type   type_r, ggml_type type_s, uint32_t rs_size,
                 uint32_t   n_seq_max, bool offload, bool unified,
    const layer_filter_cb & filter_attn = nullptr,
    const layer_filter_cb & filter_recr = nullptr);

    llama_kv_cache * get_mem_attn() const;
    llama_memory_recurrent * get_mem_recr() const;
};

class llama_memory_hybrid_context : public llama_memory_context_i {
public:
    const llama_kv_cache_context * get_attn() const;
    const llama_memory_recurrent_context * get_recr() const;
};
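To make the idea of a joint context more concrete, the following is a minimal standalone sketch of two sub-contexts advanced in lockstep. All names here (toy_status, toy_sub_context, toy_hybrid_context, cursor()) are hypothetical and exist only for illustration; the rule that the combined status fails when either side fails is likewise an assumption, not a description of the real llama_memory_hybrid_context.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins, NOT the real llama.cpp types.
enum class toy_status { success, failed };

// A sub-context that steps through a fixed number of micro-batches.
class toy_sub_context {
public:
    toy_sub_context(std::size_t n_ubatch, toy_status status)
        : n_ubatch(n_ubatch), status(status) {}

    // Advance to the next micro-batch; returns false once exhausted.
    bool next() {
        if (i_cur + 1 >= n_ubatch) {
            return false;
        }
        ++i_cur;
        return true;
    }

    std::size_t cursor()     const { return i_cur; }
    toy_status  get_status() const { return status; }

private:
    std::size_t n_ubatch;
    std::size_t i_cur = 0;
    toy_status  status;
};

// Toy joint context: holds both sub-contexts and keeps them in step.
class toy_hybrid_context {
public:
    toy_hybrid_context(std::size_t n_ubatch,
                       toy_status st_attn = toy_status::success,
                       toy_status st_recr = toy_status::success)
        : ctx_attn(n_ubatch, st_attn), ctx_recr(n_ubatch, st_recr) {}

    // Advance both sides together so they always see the same micro-batch.
    bool next() {
        const bool more_attn = ctx_attn.next();
        const bool more_recr = ctx_recr.next();
        return more_attn && more_recr;
    }

    // Assumed combination rule: the joint status fails if either side failed.
    toy_status get_status() const {
        const bool ok = ctx_attn.get_status() == toy_status::success &&
                        ctx_recr.get_status() == toy_status::success;
        return ok ? toy_status::success : toy_status::failed;
    }

    // Read-only accessors, mirroring the shape of get_attn()/get_recr().
    const toy_sub_context * get_attn() const { return &ctx_attn; }
    const toy_sub_context * get_recr() const { return &ctx_recr; }

private:
    toy_sub_context ctx_attn;
    toy_sub_context ctx_recr;
};
```

A caller would typically process the current micro-batch, then loop on next() until it returns false, reading per-side state through get_attn() and get_recr() at each step.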

Import

#include "llama-memory-hybrid.h"

I/O Contract

Inputs

Name            Type                 Required  Description
model           const llama_model &  Yes       Model with hybrid architecture
type_k, type_v  ggml_type            Yes       Attention KV types
type_r, type_s  ggml_type            Yes       Recurrent state types

Outputs

Name            Type                      Description
get_mem_attn()  llama_kv_cache *          KV cache for attention layers
get_mem_recr()  llama_memory_recurrent *  Recurrent memory for SSM layers

Usage Examples

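#include <memory>

#include "llama-memory-hybrid.h"

// Assumes model, type_k, type_v, v_trans, kv_size, n_pad, type_r, type_s,
// rs_size, n_seq_max, offload, and unified are already defined by the caller.
//
// The hybrid memory routes layers automatically:
// - attention layers -> KV cache
// - recurrent layers -> recurrent memory
auto hybrid = std::make_unique<llama_memory_hybrid>(
    model, type_k, type_v, v_trans, kv_size, n_pad, /* n_swa */ 0,
    LLAMA_SWA_TYPE_NONE, type_r, type_s, rs_size,
    n_seq_max, offload, unified);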
