Implementation:Ollama Ollama Llama Graph

Knowledge Sources	Ollama
Domains	Compute Graph, Inference
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the compute graph construction for LLM inference, building ggml computation graphs from model parameters, input data, and memory state.

Description

Implements the llm_graph_context class and the llm_graph_input_* classes, which set input tensors from batch data. It provides reusable graph building blocks: build_inp_* for input tensors (embeddings, positions, attention masks), build_norm for normalization (LayerNorm, RMSNorm, GroupNorm), build_ffn for feed-forward networks (SiLU, GELU, ReLU, SwiGLU, and GeGLU variants), build_attn for attention with a KV cache, build_moe_ffn for mixture-of-experts layers, and build_pooling for embedding pooling. It also handles cross-attention for encoder-decoder models and RoPE positional encoding. To avoid rebuilding on every batch, a previously built graph can be reused when its can_reuse check accepts the new graph parameters.
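
To make two of the building blocks named above concrete, the following is a minimal reference sketch of the math behind RMSNorm and SwiGLU gating, written in plain C++ on std::vector. It is not the code in llama-graph.cpp: the real build_norm and build_ffn emit ggml graph nodes rather than computing values directly, and the function names rms_norm_ref, silu_ref, and swiglu_ref are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference RMSNorm over one row: y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i].
// Hypothetical helper for illustration; not part of llama-graph.h.
std::vector<float> rms_norm_ref(const std::vector<float> & x,
                                const std::vector<float> & w,
                                float eps) {
    assert(!x.empty() && x.size() == w.size());
    float sum_sq = 0.0f;
    for (float v : x) {
        sum_sq += v * v;
    }
    const float mean_sq = sum_sq / static_cast<float>(x.size());
    const float scale   = 1.0f / std::sqrt(mean_sq + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale * w[i];
    }
    return y;
}

// SiLU activation: x * sigmoid(x).
float silu_ref(float x) {
    return x / (1.0f + std::exp(-x));
}

// SwiGLU gating: out[i] = silu(gate[i]) * up[i].
// In a full FFN, gate and up are two linear projections of the same input,
// and the result is passed through a third (down) projection.
std::vector<float> swiglu_ref(const std::vector<float> & gate,
                              const std::vector<float> & up) {
    assert(gate.size() == up.size());
    std::vector<float> out(gate.size());
    for (std::size_t i = 0; i < gate.size(); ++i) {
        out[i] = silu_ref(gate[i]) * up[i];
    }
    return out;
}
```

For example, rms_norm_ref({3, 4}, {1, 1}, 0) scales the row by 1 / sqrt(12.5), so the output has a root-mean-square of 1 before the weights are applied.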

Usage

This is the central graph construction system that all model architectures use to build their computation graphs. It bridges the abstract model definition and the concrete tensor operations executed on GPU/CPU backends.

Code Reference

Source Location

Repository: Ollama
File: llama/llama.cpp/src/llama-graph.cpp
Lines: 1-2114

Signature

void llm_graph_input_embd::set_input(const llama_ubatch * ubatch);
bool llm_graph_input_embd::can_reuse(const llm_graph_params & params);

void llm_graph_input_pos::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_temp::set_input(const llama_ubatch * ubatch);
void llm_graph_input_out_ids::set_input(const llama_ubatch * ubatch);

void llm_graph_input_attn_kv::set_input(const llama_ubatch * ubatch);
bool llm_graph_input_attn_kv::can_reuse(const llm_graph_params & params);

// Graph result management
llm_graph_result::llm_graph_result(int64_t max_nodes);
void llm_graph_result::set_inputs(const llama_ubatch * ubatch);
bool llm_graph_result::can_reuse(const llm_graph_params & params);

Import

#include "llama-graph.h"

I/O Contract

Inputs

Name	Type	Required	Description
ubatch	const llama_ubatch *	Yes	Micro-batch with tokens, positions, and sequence info
params	llm_graph_params	Yes	Graph parameters including hparams, cparams, memory context
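
As an illustration of how positions and sequence info from a micro-batch can drive an input tensor, the sketch below fills a causal attention mask: a batch token may attend to a cached entry only if both belong to the same sequence and the cached position is not in the future. This is a simplified assumption about the general technique, not a copy of llm_graph_input_attn_kv::set_input. The types and names (toy_cell, build_causal_mask_ref) are hypothetical, and each token is assumed to carry exactly one sequence ID.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for the per-token data a micro-batch carries.
struct toy_cell {
    int32_t pos;     // position of the token within its sequence
    int32_t seq_id;  // sequence the token belongs to (one per token here)
};

// Builds a row-major [n_tokens x n_kv] additive mask:
//   0.0f  -> the token may attend to the KV entry
//   -inf  -> the entry is masked out (other sequence, or a future position)
std::vector<float> build_causal_mask_ref(const std::vector<toy_cell> & tokens,
                                         const std::vector<toy_cell> & kv_cells) {
    const float neg_inf = -std::numeric_limits<float>::infinity();
    const std::size_t n_kv = kv_cells.size();
    std::vector<float> mask(tokens.size() * n_kv, neg_inf);
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        assert(tokens[i].pos >= 0);  // toy precondition: positions are non-negative
        for (std::size_t j = 0; j < n_kv; ++j) {
            const bool same_seq   = kv_cells[j].seq_id == tokens[i].seq_id;
            const bool not_future = kv_cells[j].pos <= tokens[i].pos;
            if (same_seq && not_future) {
                mask[i * n_kv + j] = 0.0f;
            }
        }
    }
    return mask;
}
```

The mask is additive: it is meant to be added to the attention scores before softmax, so masked entries contribute zero weight.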

Outputs

Name	Type	Description
gf	ggml_cgraph *	Built computation graph ready for execution
t_logits	ggml_tensor *	Output logits tensor
t_embd	ggml_tensor *	Output embeddings tensor

Usage Examples

#include "llama-graph.h"

// Fragment: assumes `ubatch`, `new_params`, and `result` (an llm_graph_result) already exist
// Input nodes set their data from ubatch
llm_graph_input_embd inp_embd;
inp_embd.set_input(&ubatch);

// Check if graph can be reused
if (result->can_reuse(new_params)) {
    // Reuse existing graph - just update inputs
    result->set_inputs(&ubatch);
} else {
    // Build new graph
    // ... model-specific graph construction
}

// Access results
ggml_tensor * logits = result->get_logits();
ggml_cgraph * graph = result->get_gf();
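
The reuse check in the example above can be sketched as a self-contained toy model. Every type and function here (toy_params, toy_input, toy_input_tokens, toy_result, toy_prepare) is hypothetical and only mirrors the shape of the set_input / can_reuse / set_inputs signatures listed on this page. It assumes that a result is reusable only when every one of its inputs accepts the new parameters, and that inputs refuse reuse by default. It does not reproduce the actual checks in llama-graph.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, reduced stand-in for the graph parameters.
struct toy_params {
    int64_t n_tokens;
};

// Hypothetical input-node interface: each input fills its own buffer and
// decides whether it is still valid for a new set of parameters.
struct toy_input {
    virtual ~toy_input() = default;
    virtual void set_input(const std::vector<int32_t> & tokens) = 0;
    // Conservative default (an assumption of this sketch): never reuse.
    virtual bool can_reuse(const toy_params & params) { (void) params; return false; }
};

// Token input whose buffer shape is fixed when the graph is built.
struct toy_input_tokens : toy_input {
    explicit toy_input_tokens(int64_t n) : n_tokens(n) {}
    void set_input(const std::vector<int32_t> & tokens) override {
        assert(static_cast<int64_t>(tokens.size()) == n_tokens);
        data = tokens;
    }
    bool can_reuse(const toy_params & params) override {
        return params.n_tokens == n_tokens;  // same shape -> same graph
    }
    int64_t n_tokens;
    std::vector<int32_t> data;
};

// Toy result object: owns the inputs of one built graph.
struct toy_result {
    std::vector<std::unique_ptr<toy_input>> inputs;

    bool can_reuse(const toy_params & params) const {
        if (inputs.empty()) {
            return false;  // nothing has been built yet
        }
        for (const auto & in : inputs) {
            if (!in->can_reuse(params)) {
                return false;
            }
        }
        return true;
    }

    void set_inputs(const std::vector<int32_t> & tokens) {
        for (auto & in : inputs) {
            in->set_input(tokens);
        }
    }
};

// Reuses the graph when possible, rebuilds it otherwise, then updates inputs.
// Returns true when the existing graph was reused.
bool toy_prepare(toy_result & res, const toy_params & params,
                 const std::vector<int32_t> & tokens) {
    const bool reused = res.can_reuse(params);
    if (!reused) {
        res.inputs.clear();  // stands in for model-specific graph construction
        res.inputs.push_back(std::unique_ptr<toy_input>(new toy_input_tokens(params.n_tokens)));
    }
    res.set_inputs(tokens);
    return reused;
}
```

In this sketch the first call always builds, a second call with the same n_tokens only refreshes input data, and a call with a different n_tokens forces a rebuild. Keeping the decision inside each input keeps the reuse condition next to the tensor shape it protects.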

Related Pages

Principle:Ollama_Ollama_Compute_Graph_System
