Implementation:Ggml org Llama cpp Eval Callback Example
| Knowledge Sources | |
|---|---|
| Domains | Debugging, Callbacks |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Demonstrates using a callback function during model inference to print all GGML operations and tensor data to the console.
Description
The example sets up a `base_callback_data` object and registers `common_debug_cb_eval` as the eval callback via `params.cb_eval`, passing a pointer to the data object in `params.cb_eval_user_data`. The backend scheduler invokes this callback for each graph node during computation. The program then tokenizes the prompt and runs a single `llama_decode`; as a side effect, the callback prints each operation's details and tensor values. The example serves as a minimal template for hooking into the inference pipeline.
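The eval callback follows ggml's two-phase scheduler protocol. For each node, the scheduler first calls it with `ask == true` to learn whether the user wants to observe that node (this lets the scheduler batch together the nodes nobody is watching). After computing an observed node, it calls the callback again with `ask == false` to hand over the result; returning `false` at that point cancels the rest of the graph compute. The sketch below is a simplified, self-contained simulation of that protocol, not llama.cpp code: `mock_node`, `mock_compute`, and `record_all` are hypothetical names, and the real callback receives a `struct ggml_tensor *` instead of a mock node.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a ggml graph node; the real callback receives a
// struct ggml_tensor *, which carries far more state than this.
struct mock_node {
    std::string name;
    std::string op;
};

// Same shape as ggml's scheduler eval callback:
// bool (*)(struct ggml_tensor * t, bool ask, void * user_data)
using eval_callback = bool (*)(const mock_node * t, bool ask, void * user_data);

// Hypothetical mini "scheduler": for each node it first asks whether the
// callback wants to observe it (ask == true). If so, after "computing" the
// node it hands the node over for inspection (ask == false). A false return
// from the observation call stops the remaining computation.
// Returns the number of nodes that were computed.
inline std::size_t mock_compute(const std::vector<mock_node> & graph,
                                eval_callback cb, void * user_data) {
    assert(cb != nullptr);
    std::size_t computed = 0;
    for (const mock_node & node : graph) {
        const bool wants = cb(&node, /*ask=*/true, user_data);
        ++computed; // the node is evaluated whether or not it is observed
        if (wants && !cb(&node, /*ask=*/false, user_data)) {
            break; // the callback requested cancellation
        }
    }
    return computed;
}

// Example callback: observe every node and record "op:name" strings.
inline bool record_all(const mock_node * t, bool ask, void * user_data) {
    if (ask) {
        return true; // yes, show me this node once it has been computed
    }
    auto * log = static_cast<std::vector<std::string> *>(user_data);
    log->push_back(t->op + ":" + t->name);
    return true; // keep computing
}
```

Calling `mock_compute(graph, record_all, &log)` on a three-node graph fills `log` with one `"op:name"` entry per node, which mirrors how the real example prints one record per graph node.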
Usage
Use this example as a starting point for building custom inference introspection tools. It shows how to register an eval callback that intercepts every GGML operation during a forward pass. Such a callback is useful for checking tensor shapes and values and for inspecting the computation graph during model development and verification.
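As an illustration of a custom introspection tool, the sketch below uses the `ask` phase to opt in only to tensors whose name starts with a chosen prefix, then computes min, max, and sum for each observed tensor. It is a hypothetical, self-contained simplification: `fake_tensor`, `stats_collector`, and `collect_stats` are illustrative names, and a real callback would read a `ggml_tensor`, whose data may first need to be copied from device memory to host memory before it can be inspected.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified tensor: a name plus host-resident float data.
struct fake_tensor {
    std::string name;
    std::vector<float> data;
};

struct tensor_stats {
    std::string name;
    float min;
    float max;
    double sum;
};

// Hypothetical user data: which tensors to watch, and where to put results.
struct stats_collector {
    std::string name_prefix;           // e.g. "attn" to watch only attention tensors
    std::vector<tensor_stats> results;
};

// Two-phase callback: in the ask phase it opts in only to tensors whose name
// starts with the configured prefix; in the observe phase it computes
// min/max/sum over the tensor's values and stores them.
inline bool collect_stats(const fake_tensor * t, bool ask, void * user_data) {
    auto * c = static_cast<stats_collector *>(user_data);
    if (ask) {
        return t->name.rfind(c->name_prefix, 0) == 0; // prefix match
    }
    assert(!t->data.empty());
    tensor_stats s{t->name, t->data[0], t->data[0], 0.0};
    for (float v : t->data) {
        s.min = std::min(s.min, v);
        s.max = std::max(s.max, v);
        s.sum += v;
    }
    c->results.push_back(s);
    return true; // never cancel the graph
}
```

Filtering in the `ask` phase, rather than discarding results in the observe phase, is the design choice worth copying: unobserved nodes can then be computed in larger batches by the scheduler.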
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: examples/eval-callback/eval-callback.cpp
- Lines: 1-80
Signature
static bool run(llama_context * ctx, const common_params & params);
int main(int argc, char ** argv);
Import
#include "arg.h"
#include "common.h"
#include "debug.h"
#include "log.h"
#include "llama.h"
#include "llama-cpp.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| -m | string (CLI) | Yes | Path to the GGUF model file |
| -p | string (CLI) | Yes | Prompt text to evaluate (tokenized and decoded once) |
| standard CLI params | various | No | Common llama.cpp parameters (context size, GPU layers, etc.) |
Outputs
| Name | Type | Description |
|---|---|---|
| stdout/stderr | text | Printed GGML operation names, tensor shapes, and a truncated selection of tensor values for each graph node |
| performance stats | text | Context performance metrics printed after inference |
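Printing every element of a large tensor would flood the console, so one way to keep debug output readable is to show only a few values from each end and elide the middle. The helper below is a hypothetical sketch of that idea for a 1-D float array; `format_truncated` is an illustrative name, and the exact format printed by `common_debug_cb_eval` may differ.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: render at most 2*n values of a 1-D float array,
// eliding the middle with "..." so large tensors stay readable.
inline std::string format_truncated(const std::vector<float> & v, std::size_t n) {
    std::string out = "[";
    char buf[64];
    // Append element i, preceded by a separator unless it is the first item.
    auto append = [&](std::size_t i) {
        std::snprintf(buf, sizeof(buf), "%.4f", v[i]);
        if (out.size() > 1) {
            out += ", ";
        }
        out += buf;
    };
    if (v.size() <= 2 * n) {
        for (std::size_t i = 0; i < v.size(); ++i) append(i);       // small: print all
    } else {
        for (std::size_t i = 0; i < n; ++i) append(i);              // head
        out += (n > 0) ? ", ..." : "...";                            // elided middle
        for (std::size_t i = v.size() - n; i < v.size(); ++i) append(i); // tail
    }
    out += "]";
    return out;
}
```

For example, `format_truncated({1.0f, 2.0f, 3.0f, 4.0f, 5.0f}, 1)` yields `"[1.0000, ..., 5.0000]"`.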
Usage Examples
# Run eval callback to inspect all tensor operations
./llama-eval-callback -m model.gguf -p "Hello my name is"
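// Key pattern: registering an eval callback (excerpt from main)
common_params params;
if (!common_params_parse(argc, argv, params, LLAMA_EXAMPLE_COMMON)) {
    return 1;
}
base_callback_data cb_data;
params.cb_eval = common_debug_cb_eval<false>;
params.cb_eval_user_data = &cb_data; // must stay alive until decoding finishes
params.warmup = false;               // avoid also tracing a warmup decode
auto llama_init = common_init_from_params(params);
auto * ctx = llama_init->context();
if (ctx == nullptr) {
    return 1;
}
const llama_vocab * vocab = llama_model_get_vocab(llama_get_model(ctx));
const bool add_bos = llama_vocab_get_add_bos(vocab);
std::vector<llama_token> tokens = common_tokenize(ctx, params.prompt, add_bos);
// The callback fires for each GGML graph node during llama_decode and
// prints tensor info for each operation as a side effect
if (llama_decode(ctx, llama_batch_get_one(tokens.data(), (int32_t) tokens.size())) != 0) {
    return 1; // decode failed
}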
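Because `params.cb_eval_user_data` is an untyped `void *`, a callback recovers its state by casting back to the concrete type, in the way `base_callback_data` is used above. The sketch below applies that pattern to a hypothetical op-histogram tool that also enforces a node budget by returning `false` in the observe phase, which asks the scheduler to stop the graph compute. `op_histogram` and `count_op` are illustrative names, and the callback takes an op-name string instead of a `ggml_tensor *` to stay self-contained; a real callback might obtain that name from the tensor, for example with `ggml_op_desc(t)`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical user data for an op-histogram tool (the role that
// base_callback_data plays in the real example).
struct op_histogram {
    std::map<std::string, std::size_t> counts; // op name -> nodes seen
    std::size_t max_nodes = 0;                 // 0 = no limit
    std::size_t seen = 0;
};

// Observe every node and count it by op name. Once the node budget is
// reached, return false from the observe phase to request cancellation.
inline bool count_op(const std::string & op_name, bool ask, void * user_data) {
    if (ask) {
        return true; // observe every node
    }
    auto * h = static_cast<op_histogram *>(user_data); // recover typed state
    assert(h != nullptr);
    ++h->counts[op_name];
    ++h->seen;
    return h->max_nodes == 0 || h->seen < h->max_nodes;
}
```

With `max_nodes = 3`, the third observed node returns `false`, so a long forward pass can be cut short after inspecting only its first few operations.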