Implementation:Ggml org Llama cpp Graph Header
| Knowledge Sources | |
|---|---|
| Domains | Computation_Graph |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares graph types, input classes, graph parameters, result containers, and the `llm_graph_context` builder class for constructing ggml compute graphs.
Description
This header defines the graph construction framework that every model architecture builds on. It provides enums for graph types (default, encoder, decoder), FFN operations (SiLU, GELU, ReLU, SwiGLU, GEGLU, and others), FFN gate types (sequential, parallel), and normalization types (layer, RMS, group). It declares more than 15 `llm_graph_input_*` classes, each a subclass of the polymorphic `llm_graph_input_i` interface, covering inputs such as embeddings, positions, KV masks, cross-attention, recurrent state, and sampling. The `llm_graph_params` struct bundles the parameters needed for graph construction. `llm_graph_result` holds the output graph together with its input objects, which is what makes reuse detection possible. `llm_graph_context` provides the builder methods for normalization, FFN, attention, MoE, pooling, and logit computation.
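The input-class pattern described above can be illustrated with a minimal, self-contained sketch. It does not use the real llama.cpp types: `mock_ubatch`, `mock_params`, `mock_input_i`, `mock_input_pos`, and `mock_result` are hypothetical stand-ins, and the "return false by default" behavior of `can_reuse` is an assumption about how an unimplemented check would conservatively behave.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Stand-in for llama_ubatch: only the field this sketch needs (hypothetical layout).
struct mock_ubatch {
    std::vector<int32_t> pos; // one position per token
};

// Stand-in for llm_graph_params (hypothetical: the real struct carries far more).
struct mock_params {
    uint32_t n_tokens = 0;
};

// Mirrors the shape of llm_graph_input_i: fill input data from a micro-batch and
// report whether the stored data would still be valid for new parameters.
class mock_input_i {
public:
    virtual ~mock_input_i() = default;
    virtual void set_input(const mock_ubatch * ubatch) = 0;
    // Assumed conservative default: an input without a reuse check blocks reuse.
    virtual bool can_reuse(const mock_params & /*params*/) { return false; }
};

// Hypothetical position input: copies positions into a buffer that stands in
// for a ggml tensor.
class mock_input_pos : public mock_input_i {
public:
    explicit mock_input_pos(uint32_t n_tokens) : buf(n_tokens, 0) {}

    void set_input(const mock_ubatch * ubatch) override {
        assert(ubatch != nullptr && ubatch->pos.size() == buf.size());
        buf = ubatch->pos;
    }

    bool can_reuse(const mock_params & params) override {
        // In this sketch the buffer shape depends only on the token count.
        return params.n_tokens == buf.size();
    }

    std::vector<int32_t> buf;
};

// A result object keeps its inputs so it can refresh them or test for reuse.
struct mock_result {
    std::vector<std::unique_ptr<mock_input_i>> inputs;

    void set_inputs(const mock_ubatch * ubatch) {
        for (auto & in : inputs) {
            in->set_input(ubatch);
        }
    }

    bool can_reuse(const mock_params & params) const {
        for (const auto & in : inputs) {
            if (!in->can_reuse(params)) {
                return false;
            }
        }
        return true;
    }
};
```

The point of the design is that the result object never needs to know the concrete input types: it iterates over the base interface, and a single input that cannot confirm reuse forces a rebuild.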
Usage
Include this header when implementing a new model architecture or when working with the inference compute pipeline. It is the central abstraction that all model implementations build against.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: src/llama-graph.h
- Lines: 1-1021
Signature
// Graph type enums
enum llm_graph_type { LLM_GRAPH_TYPE_DEFAULT, LLM_GRAPH_TYPE_ENCODER, LLM_GRAPH_TYPE_DECODER };
enum llm_ffn_op_type { LLM_FFN_SILU, LLM_FFN_GELU, LLM_FFN_RELU, LLM_FFN_SWIGLU, LLM_FFN_GEGLU, ... };
enum llm_ffn_gate_type { LLM_FFN_SEQ, LLM_FFN_PAR };
enum llm_norm_type { LLM_NORM, LLM_NORM_RMS, LLM_NORM_GROUP };

// Base input interface
class llm_graph_input_i {
public:
    virtual ~llm_graph_input_i() = default;

    // Fill the input tensors from the micro-batch
    virtual void set_input(const llama_ubatch * ubatch) = 0;

    // Report whether the stored input tensors remain valid for the given parameters
    virtual bool can_reuse(const llm_graph_params & params);
};

// Input classes (15+ subclasses; a selection is shown, bodies elided)
class llm_graph_input_embd         : public llm_graph_input_i { /* ... */ };
class llm_graph_input_pos          : public llm_graph_input_i { /* ... */ };
class llm_graph_input_attn_temp    : public llm_graph_input_i { /* ... */ };
class llm_graph_input_attn_kv      : public llm_graph_input_i { /* ... */ };
class llm_graph_input_attn_kv_iswa : public llm_graph_input_i { /* ... */ };
class llm_graph_input_rs           : public llm_graph_input_i { /* ... */ };
class llm_graph_input_sampling     : public llm_graph_input_i { /* ... */ };

// Graph parameters and result
struct llm_graph_params;
class llm_graph_result;

// Cross-attention data
struct llama_cross;
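The plain activation members of `llm_ffn_op_type` act as selectors for the nonlinearity applied inside a feed-forward block. The sketch below shows that dispatch on single floats, as an illustration only: the real builder operates on ggml tensors, the enum here is a local hypothetical copy (`sketch_ffn_op`), the gated variants (SwiGLU, GEGLU) are deliberately left out, and the GELU formula is the common tanh approximation rather than a claim about which variant ggml uses.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical local copy of three op types; not the enum from llama-graph.h.
enum sketch_ffn_op { SKETCH_FFN_SILU, SKETCH_FFN_GELU, SKETCH_FFN_RELU };

// Apply the selected activation to one value.
inline float sketch_apply_act(sketch_ffn_op op, float x) {
    switch (op) {
        case SKETCH_FFN_SILU:
            // SiLU: x * sigmoid(x)
            return x / (1.0f + std::exp(-x));
        case SKETCH_FFN_GELU: {
            // GELU, tanh approximation: 0.5 x (1 + tanh(sqrt(2/pi) (x + 0.044715 x^3)))
            const float k = 0.7978845608f; // sqrt(2/pi)
            return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
        }
        case SKETCH_FFN_RELU:
            return x > 0.0f ? x : 0.0f;
    }
    assert(false && "unhandled op type");
    return x;
}
```

Keeping the choice in an enum lets one builder routine serve many architectures: a model passes the op type it needs instead of providing its own feed-forward code.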
Import
#pragma once
#include "llama-arch.h"
#include "llama-batch.h"
#include "llama-hparams.h"
#include "llama-adapter.h"
#include <cstdint>
#include <vector>
#include <memory>
#include <set>
#include <functional>
#include <map>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ubatch | const llama_ubatch * | Yes | Micro-batch data for setting graph input tensors |
| params | llm_graph_params | Yes | Complete parameter set including arch, hparams, cparams, memory context, and callbacks |
| cb | llm_graph_cb | Yes | Callback function for applying custom logic (e.g., offloading) to each tensor |
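To make the role of the `cb` input more concrete, here is a minimal sketch of a per-tensor callback. Everything in it is a stand-in: `mock_ubatch`, `mock_tensor`, and `mock_graph_cb` are hypothetical, the parameter list (micro-batch, tensor, name, layer index) is an assumption about the general shape of `llm_graph_cb`, and the naming scheme is illustrative. Real callbacks may also make offloading decisions, which this sketch omits.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-ins for llama_ubatch and ggml_tensor.
struct mock_ubatch {
    unsigned n_tokens = 0;
};

struct mock_tensor {
    std::string name;
};

// Assumed callback shape: invoked once per tensor as the graph is built.
using mock_graph_cb =
    std::function<void(const mock_ubatch & ubatch, mock_tensor * cur, const char * name, int il)>;

// Build a callback that labels each tensor, appending the layer index when
// the tensor belongs to a layer (il >= 0).
inline mock_graph_cb make_naming_cb() {
    return [](const mock_ubatch & /*ubatch*/, mock_tensor * cur, const char * name, int il) {
        if (il >= 0) {
            cur->name = std::string(name) + "-" + std::to_string(il);
        } else {
            cur->name = name;
        }
    };
}
```

Passing the callback in through the parameters keeps policy (naming, placement) out of the builder methods, so the same graph code can run under different backend configurations.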
Outputs
| Name | Type | Description |
|---|---|---|
| result | llm_graph_result | Contains the built ggml compute graph, output tensors (logits, embeddings), and input references |
| gf | ggml_cgraph * | The compute graph ready for evaluation by the ggml backend |
Usage Examples
// Create graph parameters.
// `model`, `ctx`, `ubatch`, and `old_params` are assumed to be provided by the caller.
llm_graph_params params;
params.arch = LLM_ARCH_LLAMA;
params.hparams = model.hparams;
params.cparams = ctx.cparams;
params.ubatch = ubatch;
params.gtype = LLM_GRAPH_TYPE_DEFAULT;
// Check if existing graph can be reused
if (params.allow_reuse(old_params)) {
    // reuse existing graph
}
// Build graph with context (done inside model implementations)
// llm_graph_context gctx(params);
// gctx.build_norm(...);
// gctx.build_ffn(...);
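The reuse check in the example above boils down to comparing the parameters that determine graph topology. The sketch below shows one way such a comparison could look. It is not the real `allow_reuse`: `mock_graph_params` and its four fields are a hypothetical subset chosen for illustration, and the actual struct compares considerably more state.

```cpp
#include <cstdint>

// Hypothetical local copy of the graph type enum.
enum mock_graph_type {
    MOCK_GRAPH_TYPE_DEFAULT,
    MOCK_GRAPH_TYPE_ENCODER,
    MOCK_GRAPH_TYPE_DECODER,
};

// Hypothetical subset of the fields that shape a graph.
struct mock_graph_params {
    int             arch      = 0; // stands in for llm_arch
    mock_graph_type gtype     = MOCK_GRAPH_TYPE_DEFAULT;
    uint32_t        n_tokens  = 0; // micro-batch size
    uint32_t        n_outputs = 0;

    // A previously built graph is reusable only if every shape-affecting
    // field matches; any mismatch means the graph must be rebuilt.
    bool allow_reuse(const mock_graph_params & other) const {
        return arch      == other.arch     &&
               gtype     == other.gtype    &&
               n_tokens  == other.n_tokens &&
               n_outputs == other.n_outputs;
    }
};
```

During steady-state decoding, consecutive micro-batches often have the same shape, so a cheap comparison like this can avoid rebuilding the graph on every step.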