Implementation:Ggml org Llama cpp Graph

From Leeroopedia
Knowledge Sources
Domains Computation_Graph
Last Updated 2026-02-15 00:00 GMT

Overview

Implements the graph input classes and the `llm_graph_context` that provides shared building blocks for constructing ggml compute graphs across all model architectures.

Description

This file implements `set_input` and `can_reuse` methods for all graph input types: embeddings, positions (including M-RoPE 4D conversion), attention temperature scaling, position buckets, output ID selection, pooling masks, KV cache masks, cross-attention, recurrent state, and sampling inputs. The `llm_graph_context` class provides reusable graph construction primitives including normalization (RMS, LayerNorm, GroupNorm), FFN building (with gating variants like SwiGLU, GeGLU), attention (with and without KV cache), MoE expert routing, and output logit computation. It also handles LoRA adapter application, control vector injection, and backend sampler graph integration.
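To make two of the primitives named above concrete, here is a minimal sketch of RMS normalization and SwiGLU gating on plain `std::vector<float>` data. It does not use ggml, and the function names `rms_norm`, `silu`, and `swiglu` are hypothetical helpers written for this illustration; the real builders operate on `ggml_tensor` nodes and only record operations in a graph. The formulas are the standard textbook definitions and may differ in detail from what the module builds.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// RMS normalization (standard definition):
//   y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i]
std::vector<float> rms_norm(const std::vector<float> & x,
                            const std::vector<float> & weight,
                            float eps) {
    assert(!x.empty() && x.size() == weight.size());
    float sum_sq = 0.0f;
    for (float v : x) {
        sum_sq += v * v;
    }
    const float scale = 1.0f / std::sqrt(sum_sq / static_cast<float>(x.size()) + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale * weight[i];
    }
    return y;
}

// SiLU activation: silu(v) = v * sigmoid(v)
float silu(float v) {
    return v / (1.0f + std::exp(-v));
}

// SwiGLU gating: out[i] = silu(gate[i]) * up[i]
// `gate` and `up` stand for the outputs of the two parallel FFN projections.
std::vector<float> swiglu(const std::vector<float> & gate,
                          const std::vector<float> & up) {
    assert(gate.size() == up.size());
    std::vector<float> out(gate.size());
    for (std::size_t i = 0; i < gate.size(); ++i) {
        out[i] = silu(gate[i]) * up[i];
    }
    return out;
}
```

In a gated FFN of this kind, the result of `swiglu` would then be multiplied by a down-projection matrix; that step is omitted here to keep the sketch short.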

Usage

This module is the foundational graph-building infrastructure that every architecture-specific model implementation in `src/models/` relies on to construct its compute graph. End users do not call it directly; it is the core engine behind inference graph construction.

Code Reference

Source Location

Signature

// Graph input set_input methods
void llm_graph_input_embd::set_input(const llama_ubatch * ubatch);
void llm_graph_input_pos::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_temp::set_input(const llama_ubatch * ubatch);
void llm_graph_input_pos_bucket::set_input(const llama_ubatch * ubatch);
void llm_graph_input_out_ids::set_input(const llama_ubatch * ubatch);
void llm_graph_input_mean::set_input(const llama_ubatch * ubatch);
void llm_graph_input_cls::set_input(const llama_ubatch * ubatch);
void llm_graph_input_rs::set_input(const llama_ubatch * ubatch);
void llm_graph_input_cross_embd::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_no_cache::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_kv::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_kv_iswa::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_cross::set_input(const llama_ubatch * ubatch);
void llm_graph_input_mem_hybrid::set_input(const llama_ubatch * ubatch);
void llm_graph_input_sampling::set_input(const llama_ubatch * ubatch);

// Graph input reuse checks
bool llm_graph_input_embd::can_reuse(const llm_graph_params & params);
bool llm_graph_input_pos::can_reuse(const llm_graph_params & params);
bool llm_graph_input_attn_kv::can_reuse(const llm_graph_params & params);

Import

#include "llama-graph.h"
#include "llama-impl.h"
#include "llama-batch.h"
#include "llama-cparams.h"
#include "llama-kv-cache.h"
#include "llama-kv-cache-iswa.h"
#include "llama-memory-hybrid.h"
#include "llama-memory-hybrid-iswa.h"
#include "llama-memory-recurrent.h"

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| ubatch | `const llama_ubatch *` | Yes | Micro-batch containing tokens, positions, embeddings, and sequence IDs |
| params | `const llm_graph_params &` | Yes | Graph parameters including architecture, hyperparameters, and memory context |

Outputs

| Name | Type | Description |
|------|------|-------------|
| tensors | `ggml_tensor *` | Input tensors populated via `ggml_backend_tensor_set` for compute graph evaluation |
| can_reuse | `bool` | Whether existing graph inputs can be reused with new parameters |
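The contract above can be sketched as a small interface: each input fills its own data from the micro-batch and separately reports whether it is still valid for new parameters. Everything in this sketch (`toy_ubatch`, `toy_params`, `toy_input_i`, `toy_input_tokens`, `toy_graph`) is a hypothetical stand-in, not the library's types, and the rule that a graph is reusable only if every input agrees is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-ins; the real llama_ubatch and llm_graph_params carry far more state.
struct toy_ubatch {
    std::vector<int32_t> tokens;
};

struct toy_params {
    uint32_t n_tokens;
};

// Interface mirroring the two methods documented on this page.
struct toy_input_i {
    virtual ~toy_input_i() = default;
    virtual void set_input(const toy_ubatch * ubatch) = 0;
    // Conservative default (assumption): an input that does not opt in forces a rebuild.
    virtual bool can_reuse(const toy_params & /*params*/) { return false; }
};

struct toy_input_tokens : toy_input_i {
    explicit toy_input_tokens(uint32_t n) : data(n) {}

    void set_input(const toy_ubatch * ubatch) override {
        assert(ubatch->tokens.size() == data.size());
        data = ubatch->tokens; // real code would copy into a backend tensor instead
    }

    // Reusable only if the new batch has the shape this input was built for.
    bool can_reuse(const toy_params & params) override {
        return params.n_tokens == data.size();
    }

    std::vector<int32_t> data;
};

struct toy_graph {
    std::vector<std::unique_ptr<toy_input_i>> inputs;

    // The graph is reusable only if every input accepts the new parameters.
    bool can_reuse(const toy_params & params) const {
        for (const auto & in : inputs) {
            if (!in->can_reuse(params)) {
                return false;
            }
        }
        return true;
    }

    void set_inputs(const toy_ubatch * ubatch) {
        for (auto & in : inputs) {
            in->set_input(ubatch);
        }
    }
};
```

Splitting the check from the data upload lets a caller skip graph construction when shapes are unchanged and only refresh input contents.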

Usage Examples

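// Illustrative only: `ubatch` (llama_ubatch) and `new_params` (llm_graph_params)
// are assumed to exist, and any constructor arguments the input classes may
// require are omitted.

// Set embedding inputs for a micro-batch
llm_graph_input_embd input;
input.set_input(&ubatch);

// Check whether this input can be reused with new graph parameters
bool reusable = input.can_reuse(new_params);

// Position input; set_input performs the 4D conversion when M-RoPE is in use
llm_graph_input_pos pos_input;
pos_input.set_input(&ubatch);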
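The M-RoPE 4D conversion mentioned in the last example can be sketched as follows. This is a hedged illustration, not the library's code: `expand_text_pos_4d` is a hypothetical helper, and the layout it uses (four planar sections of `n_tokens` entries, with the first three repeating the 1D position and the fourth set to zero) is an assumption about how text-only positions are expanded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of llama.cpp): expand 1D text positions into
// a planar buffer of 4 sections, each holding n_tokens entries.
// Assumed layout: sections 0..2 repeat the 1D position, section 3 is zero.
std::vector<int32_t> expand_text_pos_4d(const std::vector<int32_t> & pos) {
    const std::size_t n_tokens       = pos.size();
    const std::size_t n_pos_per_embd = 4;

    std::vector<int32_t> out(n_tokens * n_pos_per_embd);
    for (std::size_t i = 0; i < n_tokens; ++i) {
        out[0 * n_tokens + i] = pos[i];
        out[1 * n_tokens + i] = pos[i];
        out[2 * n_tokens + i] = pos[i];
        out[3 * n_tokens + i] = 0; // fourth section is unused for text tokens
    }
    return out;
}
```

In the real module the expanded buffer would be uploaded to the position tensor rather than returned to the caller.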
