Implementation:Ggml org Llama cpp Eval Callback Example

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Debugging, Callbacks
Last Updated	2026-02-15 00:00 GMT

Overview

Demonstrates using a callback function during model inference to print all GGML operations and tensor data to the console.

Description

The example sets up a `base_callback_data` object and registers `common_debug_cb_eval` as the eval callback via `params.cb_eval`, passing the object to the callback through `params.cb_eval_user_data`. The backend scheduler invokes this callback for each graph node during computation. The example then tokenizes the prompt and runs `llama_decode`; as a side effect of that call, the callback prints operation details and tensor values. The result is a minimal template for hooking into the inference pipeline.
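The per-node invocation can be pictured with a small standalone model. The sketch below does not use llama.cpp or ggml: `mock_tensor`, `observer_state`, `observe_cb`, and `mock_compute` are hypothetical names invented for illustration. It assumes the callback has the `(tensor, ask, user_data) -> bool` shape of ggml's scheduler eval callback, where the scheduler first asks whether a node should be observed and then hands over the computed node.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for ggml_tensor: just enough fields for the sketch.
struct mock_tensor {
    std::string        name;
    std::string        op;
    std::vector<float> data;
};

// Assumed callback shape: (tensor, ask, user_data) -> bool.
using eval_callback = bool (*)(mock_tensor * t, bool ask, void * user_data);

// Hypothetical user-data struct; records what the callback observed.
struct observer_state {
    std::vector<std::string> seen;
};

// ask == true : the scheduler asks whether we want this node's data.
// ask == false: the node has been computed and its data can be read.
bool observe_cb(mock_tensor * t, bool ask, void * user_data) {
    if (ask) {
        return true; // observe every node
    }
    auto * state = static_cast<observer_state *>(user_data);
    state->seen.push_back(t->op + ":" + t->name);
    std::printf("%s = %s, %zu values\n", t->name.c_str(), t->op.c_str(), t->data.size());
    return true; // returning false asks the scheduler to stop
}

// Hypothetical mini-scheduler: walks the nodes and applies the two-phase protocol.
bool mock_compute(std::vector<mock_tensor> & graph, eval_callback cb, void * user_data) {
    for (auto & node : graph) {
        const bool wanted = cb(&node, /*ask=*/true, user_data);
        // ... a real scheduler would compute the node here ...
        if (wanted && !cb(&node, /*ask=*/false, user_data)) {
            return false; // the callback requested an abort
        }
    }
    return true;
}
```

Calling `mock_compute(graph, observe_cb, &state)` on a vector of `mock_tensor` values prints one line per node and fills `state.seen` in graph order. The opaque `void *` parameter plays the same role here that `params.cb_eval_user_data` plays in the example.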

Usage

Use this example as a starting point for building custom inference introspection tools. It demonstrates how to register an eval callback that intercepts every GGML operation during a forward pass, useful for debugging tensor shapes, values, and computation graphs during model development and verification.
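A custom introspection tool usually needs to condense a tensor's values before printing them. The following sketch shows one possible approach on a plain `float` buffer: a NaN-aware summary and a truncated preview. `value_summary`, `summarize`, and `preview` are hypothetical helpers written for this page; they are not part of llama.cpp and are not the code used by `common_debug_cb_eval`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <string>

// Hypothetical summary of a float buffer; NaNs are counted separately.
struct value_summary {
    float  min   = std::numeric_limits<float>::infinity();
    float  max   = -std::numeric_limits<float>::infinity();
    double sum   = 0.0;
    size_t n_nan = 0;
};

// Computes min, max, and sum over the non-NaN values and counts the NaNs.
value_summary summarize(const float * data, size_t n) {
    value_summary s;
    for (size_t i = 0; i < n; ++i) {
        if (std::isnan(data[i])) {
            s.n_nan++;
            continue;
        }
        if (data[i] < s.min) s.min = data[i];
        if (data[i] > s.max) s.max = data[i];
        s.sum += data[i];
    }
    return s;
}

// Formats the first and last `edge` values, eliding the middle of long buffers.
std::string preview(const float * data, size_t n, size_t edge) {
    std::string out = "[";
    for (size_t i = 0; i < n; ++i) {
        if (n > 2 * edge && i >= edge && i < n - edge) {
            if (i == edge) out += "..., "; // emit the ellipsis once
            continue;
        }
        char buf[32];
        std::snprintf(buf, sizeof(buf), "%.4f", data[i]);
        out += buf;
        if (i + 1 < n) out += ", ";
    }
    return out + "]";
}
```

In a real callback, these helpers would be applied to the tensor's data after it has been copied to host memory and converted to `float`; that step depends on the tensor type and backend and is left out of the sketch.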

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: examples/eval-callback/eval-callback.cpp
Lines: 1-80

Signature

static bool run(llama_context * ctx, const common_params & params);
int main(int argc, char ** argv);

Import

#include "arg.h"
#include "common.h"
#include "debug.h"
#include "log.h"
#include "llama.h"
#include "llama-cpp.h"

I/O Contract

Inputs

Name	Type	Required	Description
-m	string (CLI)	Yes	Path to the GGUF model file
-p	string (CLI)	Yes	Prompt text to evaluate (tokenized and decoded once)
standard CLI params	various	No	Common llama.cpp parameters (context size, GPU layers, etc.)

Outputs

Name	Type	Description
stdout/stderr	text	Printed GGML operation names, tensor shapes, and sampled values for each graph node
performance stats	text	Context performance metrics printed after inference

Usage Examples

# Run eval callback to inspect all tensor operations
./llama-eval-callback -m model.gguf -p "Hello my name is"

// Key pattern: registering an eval callback
base_callback_data cb_data;

common_params params;
params.cb_eval = common_debug_cb_eval<false>;
params.cb_eval_user_data = &cb_data;
params.warmup = false;

// The callback fires for each GGML graph node during llama_decode
auto llama_init = common_init_from_params(params);
auto * ctx = llama_init->context();

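// Add a BOS token only if the model's vocabulary expects one
const bool add_bos = llama_vocab_get_add_bos(llama_model_get_vocab(llama_get_model(ctx)));
std::vector<llama_token> tokens = common_tokenize(ctx, params.prompt, add_bos);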
llama_decode(ctx, llama_batch_get_one(tokens.data(), tokens.size()));
// Callback prints tensor info for each operation during decode

Related Pages

Principle:Ggml_org_Llama_cpp_Debugging
