Implementation:Ollama Ollama Llama Graph
| Knowledge Sources | |
|---|---|
| Domains | Compute Graph, Inference |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements compute graph construction for LLM inference: it builds ggml computation graphs from model parameters, input data, and memory state.
Description
Implements the llm_graph_context class and the llm_graph_input_* classes, which set input tensors from batch data. It provides reusable graph building blocks:
- build_inp_* for input tensors (embeddings, positions, attention masks)
- build_norm for normalization (LayerNorm, RMSNorm, GroupNorm)
- build_ffn for feed-forward networks (SiLU, GELU, ReLU, SwiGLU, and GeGLU variants)
- build_attn for attention with KV cache
- build_moe_ffn for mixture-of-experts
- build_pooling for embedding pooling
It also handles cross-attention for encoder-decoder models and RoPE positional encoding, and supports graph reuse as a performance optimization.
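To make two of the named building blocks more concrete, here is a minimal sketch of the arithmetic behind RMSNorm and SwiGLU gating, written on plain std::vector<float> values. It is not the ggml-based code in llama-graph.cpp: the function names rms_norm_ref, silu_ref, and swiglu_ref are hypothetical, the eps default is an assumption, and the matrix projections of a full feed-forward layer are omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative reference math only; names are hypothetical and not part of
// llama-graph.cpp, which expresses these steps as ggml graph nodes instead.

// RMSNorm: y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i]
std::vector<float> rms_norm_ref(const std::vector<float> & x,
                                const std::vector<float> & w,
                                float eps = 1e-6f) {
    assert(!x.empty() && x.size() == w.size());
    float sum_sq = 0.0f;
    for (float v : x) {
        sum_sq += v * v;
    }
    const float mean_sq = sum_sq / static_cast<float>(x.size());
    const float scale   = 1.0f / std::sqrt(mean_sq + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale * w[i];
    }
    return y;
}

// SiLU activation: x * sigmoid(x)
float silu_ref(float x) {
    return x / (1.0f + std::exp(-x));
}

// SwiGLU gating: out[i] = silu(gate[i]) * up[i]
// (the gate/up/down projections around this step are left out)
std::vector<float> swiglu_ref(const std::vector<float> & gate,
                              const std::vector<float> & up) {
    assert(gate.size() == up.size());
    std::vector<float> out(gate.size());
    for (std::size_t i = 0; i < gate.size(); ++i) {
        out[i] = silu_ref(gate[i]) * up[i];
    }
    return out;
}
```

In a graph-based implementation the same steps would be recorded as tensor operations and evaluated later by a backend rather than computed eagerly as shown here.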
Usage
This is the central graph construction system that all model architectures use to build their computation graphs. It bridges the abstract model definition and the concrete tensor operations executed on GPU/CPU backends.
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/llama-graph.cpp
- Lines: 1-2114
Signature
void llm_graph_input_embd::set_input(const llama_ubatch * ubatch);
bool llm_graph_input_embd::can_reuse(const llm_graph_params & params);
void llm_graph_input_pos::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_temp::set_input(const llama_ubatch * ubatch);
void llm_graph_input_out_ids::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_kv::set_input(const llama_ubatch * ubatch);
bool llm_graph_input_attn_kv::can_reuse(const llm_graph_params & params);
// Graph result management
llm_graph_result::llm_graph_result(int64_t max_nodes);
void llm_graph_result::set_inputs(const llama_ubatch * ubatch);
bool llm_graph_result::can_reuse(const llm_graph_params & params);
Import
#include "llama-graph.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ubatch | const llama_ubatch * | Yes | Micro-batch with tokens, positions, and sequence info |
| params | const llm_graph_params & | Yes | Graph parameters including hparams, cparams, memory context |
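As an illustration of how an input such as an attention mask can be derived from the positions and sequence info in a micro-batch, the following sketch fills a causal mask over the tokens of one batch. Everything here is a simplified assumption: toy_ubatch and build_causal_mask are hypothetical names, each token carries exactly one sequence id, and no KV cache is involved, whereas the real llama_ubatch and mask inputs are more general.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical, simplified stand-in for the per-token data in a micro-batch.
struct toy_ubatch {
    std::vector<int32_t> pos;    // position of each token within its sequence
    std::vector<int32_t> seq_id; // one sequence id per token (simplification)
};

// Returns a row-major [n_tokens x n_tokens] mask:
//   0.0f  where query token i may attend to key token j
//   -inf  where attention is blocked
// Token i may attend to token j only if both belong to the same sequence
// and j is not at a later position than i (causal masking).
std::vector<float> build_causal_mask(const toy_ubatch & ub) {
    assert(ub.pos.size() == ub.seq_id.size());
    const std::size_t n = ub.pos.size();
    const float neg_inf = -std::numeric_limits<float>::infinity();

    std::vector<float> mask(n * n, neg_inf);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            const bool same_seq = ub.seq_id[i] == ub.seq_id[j];
            const bool not_future = ub.pos[j] <= ub.pos[i];
            if (same_seq && not_future) {
                mask[i * n + j] = 0.0f;
            }
        }
    }
    return mask;
}
```

The mask is additive: adding it to the attention scores before softmax drives blocked entries to zero probability.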
Outputs
| Name | Type | Description |
|---|---|---|
| gf | ggml_cgraph * | Built computation graph ready for execution |
| t_logits | ggml_tensor * | Output logits tensor |
| t_embd | ggml_tensor * | Output embeddings tensor |
Usage Examples
#include "llama-graph.h"
// Assumes ubatch, result, and new_params come from the surrounding decode context.
// Input nodes set their data from ubatch
llm_graph_input_embd inp_embd;
inp_embd.set_input(&ubatch);
// Check if graph can be reused
if (result->can_reuse(new_params)) {
    // Reuse existing graph - just update inputs
    result->set_inputs(&ubatch);
} else {
    // Build new graph
    // ... model-specific graph construction
}
// Access results
ggml_tensor * logits = result->get_logits();
ggml_cgraph * graph = result->get_gf();
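The reuse-or-rebuild decision in the example above can be sketched in isolation with mock types. This is a toy model under stated assumptions, not the actual llm_graph_result: toy_params, toy_graph_result, and prepare are hypothetical names, and the assumption that reuse depends only on matching token and output counts is a simplification for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical parameters that determine the shape of the graph.
struct toy_params {
    int64_t n_tokens;
    int64_t n_outputs;
};

// Hypothetical stand-in for a graph result that tracks what it was built for.
struct toy_graph_result {
    toy_params           built_for{0, 0};
    std::vector<int32_t> inp_tokens;   // stands in for an input tensor
    int                  n_builds = 0; // counts full (re)builds

    // Assumption: a graph is reusable when the new parameters imply
    // the same tensor shapes as the ones it was built with.
    bool can_reuse(const toy_params & p) const {
        return n_builds > 0 &&
               p.n_tokens  == built_for.n_tokens &&
               p.n_outputs == built_for.n_outputs;
    }

    // The expensive path: allocate inputs for the new shapes.
    void build(const toy_params & p) {
        built_for = p;
        inp_tokens.assign(static_cast<std::size_t>(p.n_tokens), 0);
        ++n_builds;
    }

    // The cheap path: only refresh input data.
    void set_inputs(const std::vector<int32_t> & tokens) {
        inp_tokens = tokens;
    }
};

// Rebuild only when the shapes changed, then refresh the inputs.
void prepare(toy_graph_result & res, const toy_params & p,
             const std::vector<int32_t> & tokens) {
    if (!res.can_reuse(p)) {
        res.build(p);
    }
    res.set_inputs(tokens);
}
```

Calling prepare twice with the same shapes triggers a single build, while a change in n_tokens forces a rebuild. That is the saving graph reuse aims for when consecutive micro-batches have identical shapes.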