Implementation:Ollama Ollama Llama Graph

From Leeroopedia
Knowledge Sources
Domains Compute Graph, Inference
Last Updated 2025-02-15 00:00 GMT

Overview

Implements the compute graph construction for LLM inference, building ggml computation graphs from model parameters, input data, and memory state.

Description

Implements the llm_graph_context class and the llm_graph_input_* classes, which set input tensors from batch data. It provides reusable graph building blocks:

  • build_inp_*: input tensors (embeddings, positions, attention masks)
  • build_norm: normalization (LayerNorm, RMSNorm, GroupNorm)
  • build_ffn: feed-forward networks (SiLU, GELU, ReLU, SwiGLU, GeGLU variants)
  • build_attn: attention with KV cache
  • build_moe_ffn: mixture-of-experts
  • build_pooling: embedding pooling

It also handles cross-attention for encoder-decoder models and RoPE positional encoding, and supports graph reuse as a performance optimization.
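To make the normalization and feed-forward blocks more concrete, the sketch below shows the arithmetic behind RMSNorm and a SwiGLU gate on plain std::vector<float> rows. It is an illustration only: rms_norm, silu, and swiglu are hypothetical helper names, not functions from llama-graph.cpp, and the real code expresses these steps as ggml graph nodes rather than eager loops.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// RMSNorm over one row: y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i].
// Assumes x is non-empty and weight has the same length as x.
std::vector<float> rms_norm(const std::vector<float> & x,
                            const std::vector<float> & weight,
                            float eps) {
    float sum_sq = 0.0f;
    for (float v : x) {
        sum_sq += v * v;
    }
    const float scale = 1.0f / std::sqrt(sum_sq / static_cast<float>(x.size()) + eps);

    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale * weight[i];
    }
    return y;
}

// SiLU activation: x * sigmoid(x).
float silu(float x) {
    return x / (1.0f + std::exp(-x));
}

// SwiGLU gating step: out[i] = silu(gate[i]) * up[i].
// A full feed-forward block would apply a down projection afterwards.
// Assumes gate and up have the same length.
std::vector<float> swiglu(const std::vector<float> & gate,
                          const std::vector<float> & up) {
    std::vector<float> out(gate.size());
    for (std::size_t i = 0; i < gate.size(); ++i) {
        out[i] = silu(gate[i]) * up[i];
    }
    return out;
}
```

In a graph-based implementation the same operations are recorded as tensor nodes and executed later by the backend, which is what lets the builder share them across model architectures.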

Usage

This is the central graph construction system that all model architectures use to build their computation graphs. It bridges the abstract model definition and the concrete tensor operations executed on GPU/CPU backends.

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/src/llama-graph.cpp
  • Lines: 1-2114

Signature

void llm_graph_input_embd::set_input(const llama_ubatch * ubatch);
bool llm_graph_input_embd::can_reuse(const llm_graph_params & params);

void llm_graph_input_pos::set_input(const llama_ubatch * ubatch);
void llm_graph_input_attn_temp::set_input(const llama_ubatch * ubatch);
void llm_graph_input_out_ids::set_input(const llama_ubatch * ubatch);

void llm_graph_input_attn_kv::set_input(const llama_ubatch * ubatch);
bool llm_graph_input_attn_kv::can_reuse(const llm_graph_params & params);

// Graph result management
llm_graph_result::llm_graph_result(int64_t max_nodes);
void llm_graph_result::set_inputs(const llama_ubatch * ubatch);
bool llm_graph_result::can_reuse(const llm_graph_params & params);
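
The can_reuse / set_inputs pairing in these signatures suggests a simple pattern: keep the previously built graph when the new parameters are compatible, and only refresh the input data. The sketch below models that pattern with toy types. Every name in it (toy_graph_params, toy_graph_input, toy_graph_result, process) is hypothetical, and the compatibility check is reduced to a single token count; the real classes compare more state.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the shape-related part of the graph parameters.
struct toy_graph_params {
    int64_t n_tokens;
};

// Hypothetical input node: remembers the shape it was built for.
struct toy_graph_input {
    int64_t              n_tokens; // shape the input was created with
    std::vector<int32_t> data;     // stand-in for the tensor buffer

    bool can_reuse(const toy_graph_params & p) const { return p.n_tokens == n_tokens; }
    void set_input(const std::vector<int32_t> & tokens) { data = tokens; }
};

// Hypothetical graph result: reusable only if every input agrees.
struct toy_graph_result {
    std::vector<toy_graph_input> inputs;
    int n_builds = 0; // counts how often the "graph" was rebuilt

    bool can_reuse(const toy_graph_params & p) const {
        if (inputs.empty()) {
            return false; // nothing has been built yet
        }
        for (const auto & in : inputs) {
            if (!in.can_reuse(p)) {
                return false;
            }
        }
        return true;
    }

    void build(const toy_graph_params & p) {
        inputs.assign(1, toy_graph_input{p.n_tokens, {}});
        ++n_builds;
    }

    void set_inputs(const std::vector<int32_t> & tokens) {
        for (auto & in : inputs) {
            in.set_input(tokens);
        }
    }
};

// Processes one batch; returns true when the previous graph was reused.
bool process(toy_graph_result & res, const std::vector<int32_t> & tokens) {
    const toy_graph_params p{static_cast<int64_t>(tokens.size())};
    const bool reused = res.can_reuse(p);
    if (!reused) {
        res.build(p);
    }
    res.set_inputs(tokens); // inputs are refreshed in both cases
    return reused;
}
```

The point of the design is that rebuilding is skipped whenever consecutive batches have the same shape, which is the common case during token-by-token generation.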

Import

#include "llama-graph.h"

I/O Contract

Inputs

Name | Type | Required | Description
ubatch | const llama_ubatch * | Yes | Micro-batch with tokens, positions, and sequence info
params | llm_graph_params | Yes | Graph parameters including hparams, cparams, memory context
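
As an illustration of how positions and sequence info from a micro-batch can become an input tensor, the sketch below fills a causal attention mask. It is a simplified assumption, not the logic of llm_graph_input_attn_kv::set_input: toy_ubatch and build_causal_mask are hypothetical, each token carries exactly one sequence id, and there is no KV cache. It uses the common convention of 0 for an allowed pair and negative infinity for a masked pair.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified micro-batch: one position and one sequence id per token.
struct toy_ubatch {
    std::vector<int32_t> pos;
    std::vector<int32_t> seq_id;
};

// Returns a row-major n x n mask where mask[i * n + j] is
//   0.0f      if query token i may attend to key token j,
//   -INFINITY otherwise.
// Token i may attend to token j when both belong to the same sequence
// and j is not at a later position than i.
std::vector<float> build_causal_mask(const toy_ubatch & ub) {
    const std::size_t n = ub.pos.size();
    std::vector<float> mask(n * n, -INFINITY);

    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (ub.seq_id[i] == ub.seq_id[j] && ub.pos[j] <= ub.pos[i]) {
                mask[i * n + j] = 0.0f;
            }
        }
    }
    return mask;
}
```

Adding such a mask to the attention scores before the softmax drives the masked entries to zero weight.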

Outputs

Name | Type | Description
gf | ggml_cgraph * | Built computation graph ready for execution
t_logits | ggml_tensor * | Output logits tensor
t_embd | ggml_tensor * | Output embeddings tensor
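
When the embeddings output is pooled into a single vector per sequence, one common choice is mean pooling. The sketch below shows that arithmetic on a row-major buffer. It is a hedged illustration: mean_pool is a hypothetical helper, it assumes a single sequence, and it does not reproduce how build_pooling selects or implements its pooling modes.

```cpp
#include <cstddef>
#include <vector>

// Mean-pools token embeddings stored row-major as n_tokens x n_embd.
// Assumes n_embd > 0 and embd.size() is a non-zero multiple of n_embd.
std::vector<float> mean_pool(const std::vector<float> & embd, std::size_t n_embd) {
    const std::size_t n_tokens = embd.size() / n_embd;
    std::vector<float> pooled(n_embd, 0.0f);

    // Sum each embedding dimension over all tokens.
    for (std::size_t t = 0; t < n_tokens; ++t) {
        for (std::size_t d = 0; d < n_embd; ++d) {
            pooled[d] += embd[t * n_embd + d];
        }
    }
    // Divide by the token count to get the average.
    for (float & v : pooled) {
        v /= static_cast<float>(n_tokens);
    }
    return pooled;
}
```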

Usage Examples

#include "llama-graph.h"

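// Input nodes copy their data from the micro-batch.
// `ubatch` is assumed to be an already-populated llama_ubatch.
llm_graph_input_embd inp_embd;
inp_embd.set_input(&ubatch);

// `result` is assumed to point to an llm_graph_result from a previous step,
// and `new_params` to hold the llm_graph_params for the current micro-batch.
if (result->can_reuse(new_params)) {
    // Reuse the existing graph: only the input tensors need refreshing
    result->set_inputs(&ubatch);
} else {
    // Build a new graph
    // ... model-specific graph construction
}

// Access results
ggml_tensor * logits = result->get_logits();
ggml_cgraph * graph = result->get_gf();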
