Implementation:Ggml org Llama cpp Graph

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Computation_Graph
Last Updated	2026-02-15 00:00 GMT

Overview

Implements the graph input classes and the `llm_graph_context` class, which provides the shared building blocks used to construct ggml compute graphs across all model architectures.

Description

This file implements `set_input` and `can_reuse` methods for all graph input types: embeddings, positions (including M-RoPE 4D conversion), attention temperature scaling, position buckets, output ID selection, pooling masks, KV cache masks, cross-attention, recurrent state, and sampling inputs. The `llm_graph_context` class provides reusable graph construction primitives including normalization (RMS, LayerNorm, GroupNorm), FFN building (with gating variants like SwiGLU, GeGLU), attention (with and without KV cache), MoE expert routing, and output logit computation. It also handles LoRA adapter application, control vector injection, and backend sampler graph integration.
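To make two of the primitives named above concrete, here is a minimal reference sketch of the math behind RMS normalization and a SwiGLU-gated activation, written over plain `std::vector<float>` rather than ggml tensors. The helper names (`rms_norm_ref`, `silu_ref`, `swiglu_ref`) are hypothetical and do not exist in llama.cpp; the real code builds graph nodes instead of computing values eagerly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of llama.cpp).
// RMS norm over one row: y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i]
std::vector<float> rms_norm_ref(const std::vector<float> & x,
                                const std::vector<float> & w,
                                float eps) {
    assert(!x.empty() && x.size() == w.size());
    double sum_sq = 0.0;
    for (float v : x) {
        sum_sq += (double) v * v;
    }
    const float mean_sq = (float) (sum_sq / x.size());
    const float scale   = 1.0f / std::sqrt(mean_sq + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale * w[i];
    }
    return y;
}

// SiLU activation: silu(v) = v * sigmoid(v) = v / (1 + exp(-v))
float silu_ref(float v) {
    return v / (1.0f + std::exp(-v));
}

// SwiGLU gating: out[i] = silu(gate[i]) * up[i], where `gate` and `up`
// stand for the outputs of the two parallel FFN input projections.
std::vector<float> swiglu_ref(const std::vector<float> & gate,
                              const std::vector<float> & up) {
    assert(gate.size() == up.size());
    std::vector<float> out(gate.size());
    for (std::size_t i = 0; i < gate.size(); ++i) {
        out[i] = silu_ref(gate[i]) * up[i];
    }
    return out;
}
```

In a full FFN block, the gated result would then pass through a down projection; that step is omitted here to keep the sketch short.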

Usage

This module is the shared graph-building layer that the architecture-specific model implementations in `src/models/` call to construct their compute graphs. End users do not call it directly; it runs underneath inference graph construction.

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: src/llama-graph.cpp
Lines: 1-2626

Signature

// Graph input set_input methods
void llm_graph_input_embd::set_input(const llama_ubatch * ubatch);
void llm_graph_input_pos::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_temp::set_input(const llama_ubatch * ubatch);
void llm_graph_input_pos_bucket::set_input(const llama_ubatch * ubatch);
void llm_graph_input_out_ids::set_input(const llama_ubatch * ubatch);
void llm_graph_input_mean::set_input(const llama_ubatch * ubatch);
void llm_graph_input_cls::set_input(const llama_ubatch * ubatch);
void llm_graph_input_rs::set_input(const llama_ubatch * ubatch);
void llm_graph_input_cross_embd::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_no_cache::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_kv::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_kv_iswa::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_cross::set_input(const llama_ubatch * ubatch);
void llm_graph_input_mem_hybrid::set_input(const llama_ubatch * ubatch);
void llm_graph_input_sampling::set_input(const llama_ubatch * ubatch);

// Graph input reuse checks
bool llm_graph_input_embd::can_reuse(const llm_graph_params & params);
bool llm_graph_input_pos::can_reuse(const llm_graph_params & params);
bool llm_graph_input_attn_kv::can_reuse(const llm_graph_params & params);

Import

#include "llama-graph.h"
#include "llama-impl.h"
#include "llama-batch.h"
#include "llama-cparams.h"
#include "llama-kv-cache.h"
#include "llama-kv-cache-iswa.h"
#include "llama-memory-hybrid.h"
#include "llama-memory-hybrid-iswa.h"
#include "llama-memory-recurrent.h"

I/O Contract

Inputs

Name	Type	Required	Description
ubatch	const llama_ubatch *	Yes	Micro-batch containing tokens, positions, embeddings, and sequence IDs
params	const llm_graph_params &	Yes	Graph parameters including architecture, hyperparameters, and memory context

Outputs

Name	Type	Description
tensors	ggml_tensor *	Input tensors populated via `ggml_backend_tensor_set` before compute graph evaluation
can_reuse	bool	Whether existing graph inputs can be reused with new parameters
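The reuse check can be illustrated with a small stand-alone sketch. It assumes that an input is reusable when the tensors it already holds have shapes that match the incoming micro-batch; the types `toy_tensor`, `toy_ubatch`, and `toy_input_embd` are hypothetical stand-ins, and the exact conditions in `llama-graph.cpp` may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for ggml_tensor and llama_ubatch (shape info only).
struct toy_tensor {
    int64_t ne[2]; // ne[0] = innermost dimension, ne[1] = next dimension
};

struct toy_ubatch {
    int64_t n_tokens;  // number of tokens in the micro-batch
    bool    has_token; // micro-batch carries token IDs
    bool    has_embd;  // micro-batch carries raw embeddings
};

// Hypothetical embedding input: holds either a token tensor [n_tokens]
// or an embedding tensor [n_embd, n_tokens], created at graph build time.
struct toy_input_embd {
    const toy_tensor * tokens = nullptr;
    const toy_tensor * embd   = nullptr;

    // Reusable only if each tensor is either absent on both sides
    // or present with a token count that matches the new micro-batch.
    bool can_reuse(const toy_ubatch & ub) const {
        bool res = true;
        res &= (!tokens && !ub.has_token) || (tokens && tokens->ne[0] == ub.n_tokens);
        res &= (!embd   && !ub.has_embd)  || (embd   && embd->ne[1]   == ub.n_tokens);
        return res;
    }
};
```

When the check fails, the caller is expected to rebuild the graph instead of refilling the existing input tensors.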

Usage Examples

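// Illustrative only: assumes `ubatch` (llama_ubatch) and `new_params` (llm_graph_params)
// already exist and that the input tensors were created during graph construction.
// Any constructor arguments are omitted here.

// Set embedding inputs for a micro-batch
llm_graph_input_embd input;
input.set_input(&ubatch);

// Check whether the existing graph inputs can be reused with the new parameters
bool reusable = input.can_reuse(new_params);

// Position input; set_input handles the 4D position conversion used by M-RoPE
llm_graph_input_pos pos_input;
pos_input.set_input(&ubatch);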
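The M-RoPE position conversion mentioned above can be sketched in isolation. This version assumes that each text token arrives with a single position, that the 4D layout stores four contiguous sections of `n_tokens` entries, and that the first three sections repeat the 1D position while the fourth is zero. The function name `expand_pos_1d_to_4d` and the `toy_pos` alias are hypothetical, and the layout should be checked against `llm_graph_input_pos::set_input` in the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using toy_pos = int32_t; // hypothetical stand-in for llama_pos

// Expand n 1D positions into a 4 * n buffer laid out section by section:
// [ pos | pos | pos | zeros ]  (layout is an assumption, see lead-in)
std::vector<toy_pos> expand_pos_1d_to_4d(const std::vector<toy_pos> & pos) {
    const std::size_t n = pos.size();
    std::vector<toy_pos> out(4 * n, 0); // fourth section stays zero
    for (std::size_t i = 0; i < n; ++i) {
        out[0 * n + i] = pos[i];
        out[1 * n + i] = pos[i];
        out[2 * n + i] = pos[i];
    }
    return out;
}
```

The expanded buffer is what would then be copied into the position tensor in one call, instead of copying the micro-batch positions directly.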

Related Pages

Principle:Ggml_org_Llama_cpp_ComputeGraph
