Implementation:Ggml org Llama cpp Graph Header

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Computation_Graph
Last Updated	2026-02-15 00:00 GMT

Overview

Declares graph types, input classes, graph parameters, result containers, and the `llm_graph_context` builder class for constructing ggml compute graphs.

Description

This header defines the central graph construction framework used by every model architecture. It provides enums for graph types (default, encoder, decoder), FFN operations (SiLU, GELU, ReLU, SwiGLU, GEGLU, and others), FFN gate types (sequential, parallel), and normalization types. It declares 15+ `llm_graph_input_*` classes, each a polymorphic subclass of `llm_graph_input_i`, covering input data such as embeddings, positions, KV masks, cross-attention, recurrent state, and sampling. Three further types complete the framework: the `llm_graph_params` struct bundles all parameters needed for graph construction; `llm_graph_result` holds the output graph together with the input objects used for reuse detection; and `llm_graph_context` provides the builder methods for normalization, FFN, attention, MoE, pooling, and logit computation.
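
As a rough numeric picture of what the FFN op and gate enums select between, the sketch below applies an activation chosen by an enum to one scalar "lane" of a feed-forward block. Everything here is a hypothetical stand-in: `ffn_op`, `ffn_gate`, `ffn_activate`, and `ffn_lane` are not part of the header, the real builder works on ggml tensors rather than floats, and the interpretation of the sequential and parallel gate types is an assumption based on their names.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical stand-ins for llm_ffn_op_type and llm_ffn_gate_type.
enum class ffn_op   { silu, gelu, relu };
enum class ffn_gate { seq, par };

// Element-wise activation selected by the op enum.
inline float ffn_activate(ffn_op op, float x) {
    switch (op) {
        case ffn_op::silu: return x / (1.0f + std::exp(-x));  // x * sigmoid(x)
        case ffn_op::gelu: return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
        case ffn_op::relu: return x > 0.0f ? x : 0.0f;
    }
    throw std::invalid_argument("unknown ffn_op");
}

// One scalar lane of a gated FFN, with scalar "weights" for up, gate, and down.
// Assumed semantics:
//   seq: the gate projection is applied to the output of the up projection
//   par: the gate projection is applied to the input, alongside the up
//        projection, and the activated gate multiplies the up branch
inline float ffn_lane(ffn_op op, ffn_gate gate_type,
                      float x, float w_up, float w_gate, float w_down) {
    const float up = w_up * x;
    float cur = 0.0f;
    switch (gate_type) {
        case ffn_gate::seq: cur = ffn_activate(op, w_gate * up);    break;
        case ffn_gate::par: cur = ffn_activate(op, w_gate * x) * up; break;
    }
    return w_down * cur;
}
```

With ReLU, `x = 2`, `w_up = 3`, and unit gate and down weights, the parallel form yields `relu(2) * 6` and the sequential form yields `relu(6)`, which makes the structural difference between the two gate types easy to see.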

Usage

Include this header when implementing a new model architecture or when working with the inference compute pipeline. It is the central abstraction that all model implementations build against.

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: src/llama-graph.h
Lines: 1-1021

Signature

// Graph type enums
enum llm_graph_type { LLM_GRAPH_TYPE_DEFAULT, LLM_GRAPH_TYPE_ENCODER, LLM_GRAPH_TYPE_DECODER };
enum llm_ffn_op_type { LLM_FFN_SILU, LLM_FFN_GELU, LLM_FFN_RELU, LLM_FFN_SWIGLU, LLM_FFN_GEGLU, ... };
enum llm_ffn_gate_type { LLM_FFN_SEQ, LLM_FFN_PAR };
enum llm_norm_type { LLM_NORM, LLM_NORM_RMS, LLM_NORM_GROUP };

// Base input interface
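class llm_graph_input_i {
public:
    // Copy the data for this micro-batch into the input tensors
    virtual void set_input(const llama_ubatch * ubatch) = 0;
    // Report whether the existing input tensors are still valid for the new parameters
    virtual bool can_reuse(const llm_graph_params & params);
};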

// Input classes (a selection of the 15+ subclasses; bodies elided)
class llm_graph_input_embd         : public llm_graph_input_i { /* ... */ };
class llm_graph_input_pos          : public llm_graph_input_i { /* ... */ };
class llm_graph_input_attn_temp    : public llm_graph_input_i { /* ... */ };
class llm_graph_input_attn_kv      : public llm_graph_input_i { /* ... */ };
class llm_graph_input_attn_kv_iswa : public llm_graph_input_i { /* ... */ };
class llm_graph_input_rs           : public llm_graph_input_i { /* ... */ };
class llm_graph_input_sampling     : public llm_graph_input_i { /* ... */ };

// Graph parameters and result
struct llm_graph_params;
class llm_graph_result;

// Cross-attention data
struct llama_cross;
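
The interplay between the input interface and the result container can be sketched with mock types. This is a minimal illustration under stated assumptions, not the header's implementation: `mock_ubatch`, `mock_params`, `graph_input_i`, `graph_input_pos`, `graph_input_opaque`, and `graph_result` are hypothetical names, and the conservative default for `can_reuse` (refuse unless a subclass opts in) is an assumption of the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for llama_ubatch and llm_graph_params.
struct mock_ubatch { std::vector<int32_t> pos; };
struct mock_params { uint32_t n_tokens; };

class graph_input_i {
public:
    virtual ~graph_input_i() = default;
    virtual void set_input(const mock_ubatch * ubatch) = 0;
    // Assumed default: refuse reuse unless a subclass implements the check.
    virtual bool can_reuse(const mock_params & params) { (void) params; return false; }
};

// Input that owns a fixed-size buffer of token positions.
class graph_input_pos : public graph_input_i {
public:
    explicit graph_input_pos(uint32_t n_tokens) : buf(n_tokens) {}
    void set_input(const mock_ubatch * ubatch) override {
        // Copy as many positions as fit in the buffer.
        for (std::size_t i = 0; i < buf.size() && i < ubatch->pos.size(); ++i) {
            buf[i] = ubatch->pos[i];
        }
    }
    // Reusable only if the new batch has the shape the buffer was sized for.
    bool can_reuse(const mock_params & params) override {
        return params.n_tokens == buf.size();
    }
    std::vector<int32_t> buf;
};

// Input that does not implement a reuse check and inherits the default.
class graph_input_opaque : public graph_input_i {
public:
    void set_input(const mock_ubatch *) override {}
};

// Result container: owns the inputs and allows reuse only if all of them agree.
class graph_result {
public:
    graph_input_i * add_input(std::unique_ptr<graph_input_i> in) {
        inputs.push_back(std::move(in));
        return inputs.back().get();
    }
    void set_inputs(const mock_ubatch * ubatch) {
        for (auto & in : inputs) { in->set_input(ubatch); }
    }
    bool can_reuse(const mock_params & params) const {
        for (const auto & in : inputs) {
            if (!in->can_reuse(params)) { return false; }
        }
        return true;
    }
private:
    std::vector<std::unique_ptr<graph_input_i>> inputs;
};
```

The design choice illustrated here is that a single input without a reuse check is enough to force a rebuild, so adding a new input type stays safe even before its check is written.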

Import

#pragma once
#include "llama-arch.h"
#include "llama-batch.h"
#include "llama-hparams.h"
#include "llama-adapter.h"
#include <cstdint>
#include <vector>
#include <memory>
#include <set>
#include <functional>
#include <map>

I/O Contract

Inputs

Name	Type	Required	Description
ubatch	const llama_ubatch *	Yes	Micro-batch data for setting graph input tensors
params	llm_graph_params	Yes	Complete parameter set including arch, hparams, cparams, memory context, and callbacks
cb	llm_graph_cb	Yes	Callback function for applying custom logic (e.g., offloading) to each tensor
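
The role of the `cb` callback can be pictured as a hook that the builder invokes for every tensor it creates. The sketch below is illustrative only: `mock_tensor`, `graph_cb`, `mock_builder`, and `make_naming_cb` are hypothetical, the callback signature shown is an assumption rather than the actual `llm_graph_cb` type, and the naming policy stands in for whatever custom logic (such as offloading) a real callback applies.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a ggml tensor.
struct mock_tensor { std::string name; };

// Assumed callback shape: (tensor, base name, layer index; -1 means "no layer").
using graph_cb = std::function<void(mock_tensor & cur, const char * name, int il)>;

// Toy builder that runs the callback on each node before storing it.
class mock_builder {
public:
    explicit mock_builder(graph_cb cb) : cb_(std::move(cb)) {}
    void add(const char * name, int il) {
        mock_tensor t;
        if (cb_) { cb_(t, name, il); }
        nodes.push_back(std::move(t));
    }
    std::vector<mock_tensor> nodes;
private:
    graph_cb cb_;
};

// Example callback: label each tensor, appending the layer index when present.
inline graph_cb make_naming_cb() {
    return [](mock_tensor & cur, const char * name, int il) {
        cur.name = il >= 0 ? std::string(name) + "-" + std::to_string(il)
                           : std::string(name);
    };
}
```

Constructing `mock_builder b(make_naming_cb())` and calling `b.add("ffn_out", 2)` stores a node labeled with both the base name and the layer index, while a call with layer index `-1` keeps the base name unchanged.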

Outputs

Name	Type	Description
result	llm_graph_result	Contains the built ggml compute graph, output tensors (logits, embeddings), and input references
gf	ggml_cgraph *	The compute graph ready for evaluation by the ggml backend

Usage Examples

// Create graph parameters (model, ctx, and ubatch are assumed to be in scope)
llm_graph_params params;
params.arch = LLM_ARCH_LLAMA;
params.hparams = model.hparams;
params.cparams = ctx.cparams;
params.ubatch = ubatch;
params.gtype = LLM_GRAPH_TYPE_DEFAULT;

// Check if existing graph can be reused
// (old_params: the parameters the existing graph was built with)
if (params.allow_reuse(old_params)) {
    // reuse existing graph
}

// Build graph with context (done inside model implementations)
// llm_graph_context gctx(params);
// gctx.build_norm(...);
// gctx.build_ffn(...);
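
The builder calls above are left abstract. As a numeric picture of what a normalization step selected by a norm-type enum computes, the following sketch normalizes a plain vector of floats. It is an assumption-laden illustration: `norm_type` and `apply_norm` are hypothetical names, the standard layer-norm and RMS-norm formulas are assumed to be what the corresponding enum values stand for, and group norm as well as the learned weight and bias are omitted.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llm_norm_type (group norm omitted).
enum class norm_type { layer, rms };

// Normalize a non-empty vector.
//   layer: y = (x - mean(x)) / sqrt(var(x) + eps)
//   rms:   y = x / sqrt(mean(x^2) + eps)
inline std::vector<float> apply_norm(const std::vector<float> & x,
                                     norm_type type, float eps) {
    const float n = static_cast<float>(x.size());

    // RMS norm skips centering, so the mean stays at zero.
    float mean = 0.0f;
    if (type == norm_type::layer) {
        for (float v : x) { mean += v; }
        mean /= n;
    }

    // Mean of squared (optionally centered) values.
    float var = 0.0f;
    for (float v : x) { var += (v - mean) * (v - mean); }
    var /= n;

    const float scale = 1.0f / std::sqrt(var + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = (x[i] - mean) * scale;
    }
    return y;
}
```

For the input `{3, 4}` with `eps = 0`, layer norm centers the values around 3.5 and scales them to unit variance, whereas RMS norm only divides by the root mean square and so keeps both outputs positive.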

Related Pages

Principle:Ggml_org_Llama_cpp_ComputeGraph
