Implementation:Ggml org Llama cpp Debug Example
| Knowledge Sources | |
|---|---|
| Domains | Debugging, Example |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Debug utility that logs GGML operations and tensor data, and optionally saves logits or embeddings for model verification.
Description
Registers a `base_callback_data` callback with configurable tensor name filters, runs a single decode pass on the provided prompt, and then extracts either logits (for generative models) or embeddings (with optional pooling and normalization). Outputs are saved in four formats: binary float data, text float data, prompt text with token IDs, and binary token IDs. The `output_data` struct handles both logit and embedding extraction, selected by the `--embedding` flag.
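To illustrate the tensor name filtering mentioned above, here is a minimal sketch. It assumes each filter is a regular expression searched against the tensor name and that an empty filter list matches everything; the actual matching rule inside `base_callback_data` may differ. The helper name `matches_filter` is hypothetical and not part of llama.cpp.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of llama.cpp): decide whether a tensor
// should be printed. Assumption: an empty filter list matches every tensor.
bool matches_filter(const std::string & tensor_name,
                    const std::vector<std::regex> & filters) {
    if (filters.empty()) {
        return true;
    }
    for (const auto & re : filters) {
        // regex_search succeeds if the pattern occurs anywhere in the name
        if (std::regex_search(tensor_name, re)) {
            return true;
        }
    }
    return false;
}
```

With a single filter built from `"attn"`, a name such as `blk.0.attn_q` would pass while `blk.0.ffn_up` would be skipped. Those tensor names are illustrative only.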
Usage
Use this tool to verify model conversion accuracy by comparing llama.cpp outputs against a reference implementation, tensor by tensor and logit by logit. Run with `--save-logits` to save logit or embedding data, or with `--verbose` and `--tensor-filter` to print specific tensor operations during inference.
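The logit-by-logit comparison can be sketched as follows. This is not code from the tool itself: it assumes the saved `.bin` file holds raw 32-bit floats in native byte order with no header, and that the reference implementation wrote its values in the same layout. The helper names `read_floats` and `max_abs_diff` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Read a whole file as raw 32-bit floats (assumed layout: no header,
// native byte order). Returns an empty vector if the file cannot be read.
std::vector<float> read_floats(const std::string & path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return {};
    }
    const std::streamsize bytes = in.tellg();
    if (bytes <= 0) {
        return {};
    }
    in.seekg(0, std::ios::beg);
    std::vector<float> data(static_cast<std::size_t>(bytes) / sizeof(float));
    in.read(reinterpret_cast<char *>(data.data()),
            static_cast<std::streamsize>(data.size() * sizeof(float)));
    return data;
}

// Largest element-wise absolute difference between two equally sized vectors.
float max_abs_diff(const std::vector<float> & a, const std::vector<float> & b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

A caller would load both files with `read_floats`, check that the sizes agree, and compare `max_abs_diff` against a tolerance suited to the model's precision.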
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: examples/debug/debug.cpp
- Lines: 1-253
Signature
static void print_usage(int argc, char ** argv);
static bool has_pooling(llama_context * ctx);
struct output_data {
    float * data_ptr;
    int data_size;
    std::string type_suffix;
    std::vector<float> embd_norm;
    std::string prompt;
    std::vector<llama_token> tokens;
    output_data(llama_context * ctx, const llama_model * model, const common_params & params);
};
static void save_output_data(const output_data & output, const std::string & model_name, const std::string & output_dir);
static void print_tokenized_prompt(llama_context * ctx, const std::vector<llama_token> & tokens, const std::string & prompt);
static bool run(llama_context * ctx, const common_params & params);
int main(int argc, char ** argv);
Import
#include "debug.h"
#include "arg.h"
#include "common.h"
#include "log.h"
#include "llama.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| -m | string (CLI) | Yes | Path to the GGUF model file |
| -p | string (CLI) | Yes | Prompt text to evaluate |
| --save-logits | flag (CLI) | No | Enable saving logits/embeddings to output files |
| --embedding | flag (CLI) | No | Extract embeddings instead of logits |
| --tensor-filter | string (CLI) | No | Filter tensor names for debug printing |
| --verbose | flag (CLI) | No | Enable verbose tensor printing during evaluation |
| --embd-normalize | int (CLI) | No | Normalization mode for embeddings (>= 0 to enable) |
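As an illustration of what embedding normalization does, the sketch below scales a vector to unit Euclidean length. It assumes that mode `2` selects Euclidean (L2) normalization, which this page does not state explicitly. The helper `l2_normalize` is hypothetical and stands in for whatever routine the tool actually calls.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of llama.cpp): scale an embedding to unit
// Euclidean length. A zero vector is returned unchanged to avoid dividing by zero.
std::vector<float> l2_normalize(const std::vector<float> & embd) {
    double sum_sq = 0.0;
    for (float v : embd) {
        sum_sq += static_cast<double>(v) * v;
    }
    const double norm = std::sqrt(sum_sq);

    std::vector<float> out(embd.size());
    for (std::size_t i = 0; i < embd.size(); ++i) {
        out[i] = norm > 0.0 ? static_cast<float>(embd[i] / norm) : embd[i];
    }
    return out;
}
```

Normalizing before comparison makes embeddings from two implementations comparable by direction alone, so a difference in overall scale does not mask or exaggerate a mismatch.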
Outputs
| Name | Type | Description |
|---|---|---|
| .bin file | binary | Raw float data (logits or embeddings) in binary format |
| .txt file | text | Float data with index labels in text format |
| -prompt.txt file | text | Prompt text and token IDs in human-readable format |
| -tokens.bin file | binary | Token IDs in binary format |
| stdout | text | Tensor operation details (when verbose) and performance metrics |
Usage Examples
# Print all tensor operations during inference
./llama-debug -m model.gguf -p "Hello my name is" --verbose
# Print only specific tensors
./llama-debug -m model.gguf -p "Hello my name is" --verbose --tensor-filter "attn"
# Save logits for comparison
./llama-debug -m model.gguf -p "Hello my name is" --save-logits
# Save embeddings with normalization
./llama-debug -m model.gguf -p "Hello my name is" --save-logits --embedding --embd-normalize 2