Implementation:Ollama Ollama Llama Model Granite Hybrid

From Leeroopedia
Knowledge Sources
Domains: LLM Inference, Model Architecture
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements the ggml computation graph builder for the IBM Granite Hybrid architecture, which combines attention and Mamba2 state-space model layers.

Description

llm_build_granite_hybrid extends llm_graph_context_mamba, and its constructor builds a graph that combines transformer attention layers with Mamba2 recurrent layers. For each layer il, it calls hparams.is_recurrent(il) to choose between SSM processing via build_mamba2_layer and self-attention (with optional RoPE). It also applies Granite-specific embedding and logit scaling. A helper method, build_attention_layer, builds the attention sub-graph, and hybrid memory manages both the KV cache and the recurrent state.
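The per-layer choice and the two scaling steps can be pictured with a small standalone sketch. It does not use ggml or the real llama.cpp types: toy_hparams, plan_layers, scale_embedding and scale_logit are hypothetical names made up for illustration, and the direction of each scaling (embeddings multiplied, logits divided) is an assumption rather than something this page states.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the model hyperparameters. Only is_recurrent(il)
// is named in the description; every other member is illustrative.
struct toy_hparams {
    std::vector<bool> recurrent;          // one flag per layer
    float embedding_scale = 1.0f;         // assumed: multiplies input embeddings
    float logit_scale     = 1.0f;         // assumed: divides output logits

    int n_layer() const { return static_cast<int>(recurrent.size()); }

    bool is_recurrent(int il) const {
        assert(il >= 0 && il < n_layer());
        return recurrent[il];
    }
};

// For each layer, report which sub-graph a hybrid builder would pick:
// recurrent layers take the Mamba2 path, all others take attention.
std::vector<std::string> plan_layers(const toy_hparams & hp) {
    std::vector<std::string> plan;
    for (int il = 0; il < hp.n_layer(); ++il) {
        plan.push_back(hp.is_recurrent(il) ? "mamba2" : "attention");
    }
    return plan;
}

// Scaling applied once before the first layer (assumed direction).
float scale_embedding(const toy_hparams & hp, float x) {
    return x * hp.embedding_scale;
}

// Scaling applied once after the output projection (assumed direction).
float scale_logit(const toy_hparams & hp, float x) {
    return x / hp.logit_scale;
}
```

Calling plan_layers on a toy_hparams whose recurrent flags are {true, true, false, true} yields one attention layer at index 2 surrounded by Mamba2 layers, which mirrors the dispatch described above.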

Usage

Enables Ollama to run IBM Granite Hybrid models that mix attention with state-space model layers for efficient long-context inference.
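A rough way to see why the mix helps with long contexts, assuming an attention layer caches one key/value entry per token while a recurrent layer keeps a single fixed-size state per sequence. The names toy_memory_cells and count_cells are hypothetical, and the counts are abstract units rather than bytes or real Granite figures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of the per-sequence state held by a hybrid model.
struct toy_memory_cells {
    std::size_t kv_entries;        // grows with the number of cached tokens
    std::size_t recurrent_states;  // fixed: one state per recurrent layer
};

// Count state for a layer layout, where recurrent[il] marks a Mamba2-style layer.
toy_memory_cells count_cells(const std::vector<bool> & recurrent,
                             std::size_t n_tokens) {
    toy_memory_cells cells{0, 0};
    for (bool is_rec : recurrent) {
        if (is_rec) {
            cells.recurrent_states += 1;   // independent of context length
        } else {
            cells.kv_entries += n_tokens;  // one entry per cached token
        }
    }
    return cells;
}
```

Doubling n_tokens doubles kv_entries but leaves recurrent_states unchanged, so the more layers take the recurrent path, the more slowly total state grows with context length.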

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/src/models/granite-hybrid.cpp
  • Lines: 1-196

Signature

llm_build_granite_hybrid::llm_build_granite_hybrid(
    const llama_model & model,
    const llm_graph_params & params) : llm_graph_context_mamba(params);

// Private helper:
ggml_tensor * build_attention_layer(
    ggml_tensor * cur, ggml_tensor * inp_pos,
    llm_graph_input_attn * inp_attn,
    const llama_model & model, int64_t n_embd_head, int il);

Import

#include "models.h"

I/O Contract

Inputs

Name | Type | Required | Description
model | const llama_model & | Yes | Loaded model with Granite Hybrid weights
params | const llm_graph_params & | Yes | Graph construction parameters with hybrid memory context

Outputs

Name | Type | Description
ggml graph | ggml_cgraph | Hybrid computation graph mixing attention and SSM layers

Usage Examples

auto builder = llm_build_granite_hybrid(model, params);
