Implementation:Ggml org Llama cpp Graph
| Knowledge Sources | |
|---|---|
| Domains | Computation_Graph |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Implements the graph input classes and the `llm_graph_context` class, which provides shared building blocks for constructing ggml compute graphs across all model architectures.
Description
This file implements the `set_input` and `can_reuse` methods for the graph input types: embeddings, positions (including the M-RoPE 4D conversion), attention temperature scaling, position buckets, output ID selection, pooling masks, KV cache masks, cross-attention, recurrent state, and sampling inputs. The `llm_graph_context` class provides reusable graph construction primitives: normalization (RMSNorm, LayerNorm, GroupNorm), FFN construction (including gated variants such as SwiGLU and GeGLU), attention (with and without a KV cache), MoE expert routing, and output logit computation. It also handles LoRA adapter application, control vector injection, and integration of backend samplers into the graph.
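To make two of the primitives named above concrete, the following is a minimal numeric sketch of RMS normalization and SwiGLU gating in plain C++. It is not the ggml graph code from this file: the helper names `rms_norm_ref`, `silu_ref`, and `swiglu_ref` are hypothetical, and the sketch works on `std::vector<float>` rather than on `ggml_tensor` nodes. It only shows the arithmetic that the corresponding graph ops are expected to compute.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helpers (not part of llama.cpp).

// RMS normalization: y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i]
std::vector<float> rms_norm_ref(const std::vector<float> & x,
                                const std::vector<float> & weight,
                                float eps) {
    assert(!x.empty() && x.size() == weight.size());
    double sum_sq = 0.0;
    for (float v : x) {
        sum_sq += double(v) * double(v);
    }
    const double mean_sq = sum_sq / double(x.size());
    const float  scale   = float(1.0 / std::sqrt(mean_sq + double(eps)));
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale * weight[i];
    }
    return y;
}

// SiLU activation: silu(v) = v * sigmoid(v)
float silu_ref(float v) {
    return v / (1.0f + std::exp(-v));
}

// SwiGLU gating: out[i] = silu(gate[i]) * up[i]
// `gate` and `up` stand for the outputs of the two parallel FFN projections.
std::vector<float> swiglu_ref(const std::vector<float> & gate,
                              const std::vector<float> & up) {
    assert(gate.size() == up.size());
    std::vector<float> out(gate.size());
    for (std::size_t i = 0; i < gate.size(); ++i) {
        out[i] = silu_ref(gate[i]) * up[i];
    }
    return out;
}
```

A GeGLU variant would differ only in the activation applied to `gate` (GELU instead of SiLU); the elementwise product with `up` stays the same.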
Usage
This module is the shared graph-building infrastructure that the architecture-specific model implementations in `src/models/` rely on to construct their compute graphs. End users do not call it directly; it sits behind inference graph construction.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: src/llama-graph.cpp
- Lines: 1-2626
Signature
// Graph input set_input methods
void llm_graph_input_embd::set_input(const llama_ubatch * ubatch);
void llm_graph_input_pos::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_temp::set_input(const llama_ubatch * ubatch);
void llm_graph_input_pos_bucket::set_input(const llama_ubatch * ubatch);
void llm_graph_input_out_ids::set_input(const llama_ubatch * ubatch);
void llm_graph_input_mean::set_input(const llama_ubatch * ubatch);
void llm_graph_input_cls::set_input(const llama_ubatch * ubatch);
void llm_graph_input_rs::set_input(const llama_ubatch * ubatch);
void llm_graph_input_cross_embd::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_no_cache::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_kv::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_kv_iswa::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_cross::set_input(const llama_ubatch * ubatch);
void llm_graph_input_mem_hybrid::set_input(const llama_ubatch * ubatch);
void llm_graph_input_sampling::set_input(const llama_ubatch * ubatch);
// Graph input reuse checks
bool llm_graph_input_embd::can_reuse(const llm_graph_params & params);
bool llm_graph_input_pos::can_reuse(const llm_graph_params & params);
bool llm_graph_input_attn_kv::can_reuse(const llm_graph_params & params);
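The pairing of `set_input` and `can_reuse` in the signatures above can be sketched with mock types. Every name below (`mock_ubatch`, `mock_params`, `mock_input_i`, `mock_input_tokens`, `all_can_reuse`) is a hypothetical stand-in rather than a llama.cpp type, and the reuse rule shown (same token count) is an assumption chosen for illustration; the real checks depend on each input type. The conservative `false` default in the base class is likewise an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the micro-batch and graph parameters.
struct mock_ubatch {
    std::vector<int32_t> tokens;
};

struct mock_params {
    uint32_t n_tokens;
};

// Hypothetical input interface: fill data for a batch, and report whether
// the already-built input still fits a new set of parameters.
struct mock_input_i {
    virtual ~mock_input_i() = default;
    virtual void set_input(const mock_ubatch * ubatch) = 0;
    // Assumed default: an input that does not implement the check blocks reuse.
    virtual bool can_reuse(const mock_params & /*params*/) { return false; }
};

struct mock_input_tokens : mock_input_i {
    explicit mock_input_tokens(uint32_t n_tokens) : buf(n_tokens, 0) {}

    // Copy the batch data into the preallocated buffer.
    void set_input(const mock_ubatch * ubatch) override {
        assert(ubatch != nullptr && ubatch->tokens.size() == buf.size());
        buf = ubatch->tokens;
    }

    // Reusable only if the buffer shape matches the new parameters.
    bool can_reuse(const mock_params & params) override {
        return params.n_tokens == buf.size();
    }

    std::vector<int32_t> buf;
};

// The graph as a whole is reusable only if every input agrees.
bool all_can_reuse(const std::vector<mock_input_i *> & inputs, const mock_params & params) {
    bool res = true;
    for (mock_input_i * in : inputs) {
        res = res && in->can_reuse(params);
    }
    return res;
}
```

The point of the split is that `set_input` runs for every micro-batch, while `can_reuse` lets the caller skip rebuilding the graph when only the data, not the shape, has changed.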
Import
#include "llama-graph.h"
#include "llama-impl.h"
#include "llama-batch.h"
#include "llama-cparams.h"
#include "llama-kv-cache.h"
#include "llama-kv-cache-iswa.h"
#include "llama-memory-hybrid.h"
#include "llama-memory-hybrid-iswa.h"
#include "llama-memory-recurrent.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ubatch | const llama_ubatch * | Yes | Micro-batch containing tokens, positions, embeddings, and sequence IDs |
| params | const llm_graph_params & | Yes | Graph parameters including architecture, hyperparameters, and memory context |
Outputs
| Name | Type | Description |
|---|---|---|
| tensors | ggml_tensor * | Input tensors populated via `ggml_backend_tensor_set`, ready for compute graph evaluation |
| can_reuse | bool | Whether existing graph inputs can be reused with new parameters |
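As an illustration of what a populated input tensor can contain, here is a minimal sketch of an additive causal attention mask computed on the host. The helper `build_causal_mask_ref` is hypothetical, and the rules it encodes are simplifying assumptions: one sequence ID per token, a key visible only to queries of the same sequence at an equal or later position, and no sliding window or padding. In the real file, the resulting buffer would then be copied into the mask tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not part of llama.cpp).
// Returns a row-major [n_tokens x n_kv] mask: 0.0f where query i may attend
// to key j, -infinity otherwise.
std::vector<float> build_causal_mask_ref(const std::vector<int32_t> & q_pos,
                                         const std::vector<int32_t> & q_seq,
                                         const std::vector<int32_t> & kv_pos,
                                         const std::vector<int32_t> & kv_seq) {
    assert(q_pos.size() == q_seq.size());
    assert(kv_pos.size() == kv_seq.size());

    const std::size_t n_tokens = q_pos.size();
    const std::size_t n_kv     = kv_pos.size();
    const float       neg_inf  = -std::numeric_limits<float>::infinity();

    // Start fully masked, then open the allowed cells.
    std::vector<float> mask(n_tokens * n_kv, neg_inf);
    for (std::size_t i = 0; i < n_tokens; ++i) {
        for (std::size_t j = 0; j < n_kv; ++j) {
            const bool same_seq   = kv_seq[j] == q_seq[i];
            const bool not_future = kv_pos[j] <= q_pos[i];
            if (same_seq && not_future) {
                mask[i * n_kv + j] = 0.0f;
            }
        }
    }
    return mask;
}
```

Because the mask is added to the attention scores before the softmax, a value of negative infinity removes a key from consideration while `0.0f` leaves its score unchanged.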
Usage Examples
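// Illustrative only: in the library, input objects are created by the
// llm_graph_context builders, and their tensors must be allocated before
// set_input runs.
// Set embedding inputs for a micro-batch
llm_graph_input_embd input;
input.set_input(&ubatch);
// Check whether the existing graph input can be reused with new parameters
bool reusable = input.can_reuse(new_params);
// Position input; for M-RoPE, 1D text positions are expanded to 4D inside set_input
llm_graph_input_pos pos_input;
pos_input.set_input(&ubatch);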
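The M-RoPE 4D conversion mentioned for the position input can be sketched as a standalone function. This is a sketch under stated assumptions, not the code from `llama-graph.cpp`: the helper name `expand_pos_mrope_ref` is hypothetical, and the layout it assumes for text tokens is four sections stored one after another, with the first three repeating the 1D position and the fourth set to zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of llama.cpp).
// Expands n_tokens 1D positions into 4 * n_tokens values, section-major:
//   [ pos..., pos..., pos..., 0... ]
std::vector<int32_t> expand_pos_mrope_ref(const int32_t * pos, std::size_t n_tokens) {
    assert(pos != nullptr || n_tokens == 0);

    std::vector<int32_t> out(4 * n_tokens);
    for (std::size_t i = 0; i < n_tokens; ++i) {
        out[0 * n_tokens + i] = pos[i]; // section 0
        out[1 * n_tokens + i] = pos[i]; // section 1
        out[2 * n_tokens + i] = pos[i]; // section 2
        out[3 * n_tokens + i] = 0;      // section 3 is zero for text tokens (assumed)
    }
    return out;
}
```

For example, the positions `{5, 6}` expand to `{5, 6, 5, 6, 5, 6, 0, 0}` under this layout. Inputs that already carry multi-dimensional positions, such as image embeddings, would not need this expansion.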