

Implementation:Ggml org Llama cpp Eval Callback Example

From Leeroopedia
Knowledge Sources
Domains: Debugging, Callbacks
Last Updated: 2026-02-15 00:00 GMT

Overview

Demonstrates using a callback function during model inference to print all GGML operations and tensor data to the console.

Description

Sets up a `base_callback_data` object and registers `common_debug_cb_eval` as the eval callback through `params.cb_eval`, passing the data object's address in `params.cb_eval_user_data`. The backend scheduler invokes this callback for each graph node during computation. The example then tokenizes the prompt and runs `llama_decode` once, and the callback prints operation details and tensor values as a side effect of that decode. Warmup is disabled (`params.warmup = false`) so that the printed trace covers only the prompt. The example serves as a minimal template for hooking into the inference pipeline.
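The registration plugs into the backend scheduler's eval-callback hook (`ggml_backend_sched_eval_callback` in ggml-backend.h), whose contract has two phases. The scheduler first calls the callback with `ask == true` to learn whether the user wants to observe a node. Once the node has been computed, it calls the callback again with `ask == false` so the user can read the result, and returning `false` from that second call cancels the graph compute. The sketch below models this contract in plain C++ so it runs without llama.cpp. `FakeNode`, `CallbackState`, `record_all` and `run_graph` are hypothetical stand-ins for `ggml_tensor`, `base_callback_data`, the debug callback and the scheduler. The loop is also simplified: the real scheduler batches the nodes it was asked about and computes them together rather than one at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph node (the real code receives a ggml_tensor *).
struct FakeNode {
    std::string name;
    std::string op;
    float value; // stand-in for the node's computed data
};

// Same shape as the ggml scheduler hook: (tensor, ask, user_data) -> bool.
using EvalCallback = bool (*)(FakeNode * t, bool ask, void * user_data);

// Hypothetical user state, playing the role of base_callback_data.
struct CallbackState {
    std::vector<std::string> seen; // names of nodes observed after compute
};

// Opt in to every node, then print and record each one once it is computed.
static bool record_all(FakeNode * t, bool ask, void * user_data) {
    if (ask) {
        return true; // "yes, pass this node back to me after it is computed"
    }
    auto * state = static_cast<CallbackState *>(user_data);
    assert(state != nullptr);
    state->seen.push_back(t->name);
    std::printf("%s = %s -> %.2f\n", t->name.c_str(), t->op.c_str(), t->value);
    return true; // returning false here would cancel the rest of the compute
}

// Simplified scheduler loop: ask, compute, then hand the result back.
// Returns false if the callback cancelled the compute.
static bool run_graph(std::vector<FakeNode> & nodes, EvalCallback cb, void * user_data) {
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        FakeNode & node = nodes[i];
        const bool wanted = cb != nullptr && cb(&node, /*ask=*/true, user_data);
        node.value = static_cast<float>(i) * 0.5f; // pretend to compute the node
        if (wanted && !cb(&node, /*ask=*/false, user_data)) {
            return false;
        }
    }
    return true;
}
```

The opaque `void * user_data` is what lets one callback signature carry arbitrary state. The example uses it the same way by storing `&cb_data` in `params.cb_eval_user_data`.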

Usage

Use this example as a starting point for building custom inference introspection tools. It shows how to register an eval callback that intercepts every GGML operation during a forward pass. This is useful for debugging tensor shapes, values, and computation graphs during model development and verification.
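As a starting point for such a tool, the following sketch replaces the print-everything behavior with a statistics collector. During the ask phase it opts in only to tensors whose name starts with a chosen prefix. During the observe phase it records min, max, mean and the number of non-finite values, and it can optionally cancel the compute at the first NaN or Inf. The sketch is written against a hypothetical `HostTensor` that already holds F32 data in host memory, and `StatsCollector`, `compute_stats` and `stats_cb` are also illustrative names. A real callback receives a `ggml_tensor`, may need to copy the data off the device first, and has to handle other tensor types.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical stand-in for an F32 tensor whose data is already in host memory.
struct HostTensor {
    std::string name;
    std::vector<float> data;
};

struct TensorStats {
    std::string name;
    float min = 0.0f;
    float max = 0.0f;
    double mean = 0.0;
    std::size_t non_finite = 0;
};

// Hypothetical user_data for the tool (the example uses base_callback_data instead).
struct StatsCollector {
    std::string prefix;      // observe only tensors whose name starts with this
    bool stop_on_non_finite; // cancel the compute at the first NaN/Inf
    std::vector<TensorStats> out;
};

// Min/max/mean over finite values; NaN and Inf are counted separately.
static TensorStats compute_stats(const HostTensor & t) {
    TensorStats s;
    s.name = t.name;
    float lo = std::numeric_limits<float>::infinity();
    float hi = -std::numeric_limits<float>::infinity();
    double sum = 0.0;
    std::size_t n_finite = 0;
    for (float v : t.data) {
        if (!std::isfinite(v)) {
            ++s.non_finite;
            continue;
        }
        lo = std::min(lo, v);
        hi = std::max(hi, v);
        sum += v;
        ++n_finite;
    }
    if (n_finite > 0) {
        s.min = lo;
        s.max = hi;
        s.mean = sum / static_cast<double>(n_finite);
    }
    return s;
}

// Same (tensor, ask, user_data) -> bool shape as the scheduler's eval callback.
static bool stats_cb(HostTensor * t, bool ask, void * user_data) {
    auto * c = static_cast<StatsCollector *>(user_data);
    assert(c != nullptr);
    if (ask) {
        return t->name.rfind(c->prefix, 0) == 0; // name starts with prefix
    }
    c->out.push_back(compute_stats(*t));
    // Returning false cancels the compute; do so only when asked to stop on NaN/Inf.
    return !(c->stop_on_non_finite && c->out.back().non_finite > 0);
}
```

Filtering in the ask phase keeps the scheduler from handing back nodes the tool does not care about. Returning `false` in the observe phase is what turns the NaN check into an early stop for the whole forward pass.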

Code Reference

Source Location

Signature

static bool run(llama_context * ctx, const common_params & params);
int main(int argc, char ** argv);

Import

#include "arg.h"
#include "common.h"
#include "debug.h"
#include "log.h"
#include "llama.h"
#include "llama-cpp.h"

I/O Contract

Inputs

Name | Type | Required | Description
-m | string (CLI) | Yes | Path to the GGUF model file
-p | string (CLI) | Yes | Prompt text to evaluate (tokenized and decoded once)
Other standard CLI params | various | No | Common llama.cpp parameters (context size, GPU layers, etc.)

Outputs

Name | Type | Description
stdout/stderr | text | GGML operation names, tensor shapes, and sampled values for each graph node
Performance stats | text | Context performance metrics printed after inference
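Because the trace reports sampled values rather than full tensors, large tensors are summarized instead of dumped. The helpers below are a hypothetical sketch of one way to produce that kind of summary. `format_shape` prints a shape (ggml keeps up to four extents in the `int64_t ne[]` array), and `summarize_row` prints only the first and last `n` values of a row with `...` in between. They are not the formatting code used by `common_debug_cb_eval`, whose exact output layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Format a shape such as {4096, 7, 1, 1}.
static std::string format_shape(const std::vector<int64_t> & ne) {
    std::ostringstream os;
    os << '{';
    for (std::size_t i = 0; i < ne.size(); ++i) {
        if (i > 0) {
            os << ", ";
        }
        os << ne[i];
    }
    os << '}';
    return os.str();
}

// Fixed four-decimal formatting so the summary is stable across runs.
static std::string format_value(float v) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%.4f", v);
    return buf;
}

// Print the first n and last n values of a row, eliding the middle with "...".
static std::string summarize_row(const std::vector<float> & row, std::size_t n) {
    assert(n > 0);
    const bool elide = row.size() > 2 * n;
    std::string out = "[";
    for (std::size_t i = 0; i < row.size(); ++i) {
        if (elide && i == n) {
            out += "..., ";
            i = row.size() - n - 1; // the loop's ++i moves to the first tail element
            continue;
        }
        out += format_value(row[i]);
        if (i + 1 < row.size()) {
            out += ", ";
        }
    }
    return out + "]";
}
```

For example, `summarize_row({1, 2, 3, 4, 5, 6, 7, 8}, 2)` yields `[1.0000, 2.0000, ..., 7.0000, 8.0000]`, while rows with at most `2 * n` values are printed in full.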

Usage Examples

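# Run eval callback to inspect all tensor operations
./llama-eval-callback -m model.gguf -p "Hello my name is"

// Key pattern: registering an eval callback (excerpt from main)
base_callback_data cb_data;

common_params params;
if (!common_params_parse(argc, argv, params, LLAMA_EXAMPLE_COMMON)) {
    return 1; // fills params.model, params.prompt, etc. from -m / -p
}
params.cb_eval           = common_debug_cb_eval<false>;
params.cb_eval_user_data = &cb_data;
params.warmup            = false; // skip warmup so only the prompt's decode is traced

// The callback fires for each GGML graph node during llama_decode
auto llama_init = common_init_from_params(params);
auto * ctx = llama_init->context();

// Decide whether to prepend BOS based on the model's vocabulary
const llama_vocab * vocab = llama_model_get_vocab(llama_get_model(ctx));
const bool add_bos = llama_vocab_get_add_bos(vocab);

std::vector<llama_token> tokens = common_tokenize(ctx, params.prompt, add_bos);
if (llama_decode(ctx, llama_batch_get_one(tokens.data(), (int32_t) tokens.size())) != 0) {
    LOG_ERR("%s: llama_decode failed\n", __func__);
    return 1;
}
// Callback prints tensor info for each operation during decode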
