Implementation:Ggml org Llama cpp Debug Example

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Debugging, Example
Last Updated	2026-02-15 00:00 GMT

Overview

Debug utility that logs GGML operations/tensor data and optionally saves logits or embeddings for model verification.

Description

Registers a `base_callback_data` callback with configurable tensor name filters. Runs a single decode pass on the provided prompt, then extracts either logits (for generative models) or embeddings (with pooling support and normalization). Saves outputs in multiple formats: binary float data, text data, prompt text with token IDs, and binary token IDs. The `output_data` struct handles both logit and embedding extraction based on the `--embedding` flag.
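The tensor-filtering step follows the two-phase protocol of ggml's scheduler eval callback (`ggml_backend_sched_eval_callback`, signature `bool (struct ggml_tensor * t, bool ask, void * user_data)`). In the first phase (`ask == true`), the callback returns whether it wants to observe a tensor. In the second phase (`ask == false`), the tensor's data is ready, and returning `false` cancels the graph compute. In llama.cpp, such a callback is typically installed through `common_params::cb_eval` and `cb_eval_user_data`.

The sketch below is a self-contained mock, not the actual `base_callback_data` code. `mock_tensor` stands in for `ggml_tensor`, and `name_filter`, `debug_cb` and `run_graph` are hypothetical names. It assumes plain substring matching on tensor names; the real `--tensor-filter` may use a different matching rule.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for ggml_tensor: only the fields this sketch needs.
struct mock_tensor {
    std::string        name;
    std::vector<float> data;
};

// Hypothetical filter state, playing the role of the callback's user_data.
struct name_filter {
    std::vector<std::string> patterns;  // empty => observe every tensor
    std::vector<std::string> observed;  // names of tensors actually inspected

    bool matches(const std::string & name) const {
        if (patterns.empty()) {
            return true;
        }
        for (const auto & p : patterns) {
            if (name.find(p) != std::string::npos) {  // assumed: substring match
                return true;
            }
        }
        return false;
    }
};

// Mirrors the two-phase eval-callback protocol:
//   ask == true  -> return whether this tensor should be observed;
//   ask == false -> tensor data is ready; return true to keep computing.
static bool debug_cb(mock_tensor * t, bool ask, void * user_data) {
    auto * f = static_cast<name_filter *>(user_data);
    if (ask) {
        return f->matches(t->name);
    }
    f->observed.push_back(t->name);  // a real tool would print t->data here
    return true;
}

// Simulates a scheduler visiting graph nodes in order.
static void run_graph(std::vector<mock_tensor> & nodes, name_filter & f) {
    for (auto & node : nodes) {
        if (debug_cb(&node, /*ask=*/true, &f)) {
            debug_cb(&node, /*ask=*/false, &f);
        }
    }
}
```

Asking first lets the scheduler skip the data-observation call for unmatched tensors. With a filter such as `attn`, only attention-related nodes would reach the printing stage.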

Usage

Use this tool to verify model-conversion accuracy by comparing llama.cpp outputs against a reference implementation, tensor by tensor and logit by logit. Run with `--save-logits` to save logits or embeddings, or combine `--verbose` with `--tensor-filter` to print selected tensor operations during inference.
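A logit-by-logit comparison reduces to a few summary statistics over two equally sized float arrays. The sketch below is illustrative only and is not part of `debug.cpp`. `compare_logits` and `logit_diff` are hypothetical names, and the sketch assumes both arrays are already in memory (for example, read back from the saved `.bin` output and from a reference run).

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Summary of how far two logit (or embedding) vectors diverge.
struct logit_diff {
    float       max_abs_err;   // largest |a[i] - b[i]|
    double      mean_abs_err;  // average |a[i] - b[i]|
    std::size_t argmax_a;      // index of the largest value in a
    std::size_t argmax_b;      // index of the largest value in b
    bool        argmax_match;  // do both pick the same top token?
};

static std::size_t argmax(const std::vector<float> & v) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i] > v[best]) {
            best = i;
        }
    }
    return best;
}

// Compares llama.cpp output `a` against a reference `b` element by element.
static logit_diff compare_logits(const std::vector<float> & a, const std::vector<float> & b) {
    if (a.size() != b.size() || a.empty()) {
        throw std::invalid_argument("vectors must be non-empty and of equal length");
    }
    logit_diff d{0.0f, 0.0, 0, 0, false};
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float err = std::fabs(a[i] - b[i]);
        d.max_abs_err = std::fmax(d.max_abs_err, err);
        sum += err;
    }
    d.mean_abs_err = sum / static_cast<double>(a.size());
    d.argmax_a     = argmax(a);
    d.argmax_b     = argmax(b);
    d.argmax_match = d.argmax_a == d.argmax_b;
    return d;
}
```

The maximum error catches isolated outliers, while the mean error and the argmax agreement give a quick sense of whether a conversion changes the model's top prediction.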

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: examples/debug/debug.cpp
Lines: 1-253

Signature

static void print_usage(int argc, char ** argv);
static bool has_pooling(llama_context * ctx);

struct output_data {
    float *                  data_ptr;
    int                      data_size;
    std::string              type_suffix;
    std::vector<float>       embd_norm;
    std::string              prompt;
    std::vector<llama_token> tokens;

    output_data(llama_context * ctx, const llama_model * model, const common_params & params);
};

static void save_output_data(const output_data & output, const std::string & model_name, const std::string & output_dir);
static void print_tokenized_prompt(llama_context * ctx, const std::vector<llama_token> & tokens, const std::string & prompt);
static bool run(llama_context * ctx, const common_params & params);
int main(int argc, char ** argv);
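The choice `output_data` makes between logits and embeddings can be pictured as a pure function of the mode flags. This is a hypothetical reconstruction, not the code in `debug.cpp`: `output_kind`, `output_shape` and `select_output` are invented for illustration.

The sizes assume common llama.cpp conventions:

- logits for the last prompt token occupy `n_vocab` floats;
- with pooling active, there is one pooled vector of `n_embd` floats per sequence;
- without pooling, there are `n_embd` floats per token.

```cpp
#include <cstddef>

// Which buffer the tool would read after the decode pass.
enum class output_kind {
    last_token_logits,  // generative model: logits of the final prompt token
    pooled_embedding,   // embedding mode with pooling: one vector per sequence
    token_embeddings,   // embedding mode without pooling: one vector per token
};

struct output_shape {
    output_kind kind;
    std::size_t n_floats;  // number of floats that would be saved
};

// Hypothetical helper mirroring the --embedding / has_pooling decision.
static output_shape select_output(bool embedding_mode, bool has_pooling,
                                  std::size_t n_tokens, std::size_t n_embd,
                                  std::size_t n_vocab) {
    if (!embedding_mode) {
        return {output_kind::last_token_logits, n_vocab};
    }
    if (has_pooling) {
        return {output_kind::pooled_embedding, n_embd};
    }
    return {output_kind::token_embeddings, n_tokens * n_embd};
}
```

In a real llama.cpp program, these three cases would plausibly map to `llama_get_logits_ith`, `llama_get_embeddings_seq` and `llama_get_embeddings`, with `llama_pooling_type` deciding whether pooling applies. Check the actual source for the exact calls.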

Import

#include "debug.h"
#include "arg.h"
#include "common.h"
#include "log.h"
#include "llama.h"

I/O Contract

Inputs

Name	Type	Required	Description
-m	string (CLI)	Yes	Path to the GGUF model file
-p	string (CLI)	Yes	Prompt text to evaluate
--save-logits	flag (CLI)	No	Enable saving logits/embeddings to output files
--embedding	flag (CLI)	No	Extract embeddings instead of logits
--tensor-filter	string (CLI)	No	Filter tensor names for debug printing
--verbose	flag (CLI)	No	Enable verbose tensor printing during evaluation
--embd-normalize	int (CLI)	No	Normalization mode for embeddings (>= 0 to enable)
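The `--embd-normalize` value selects a normalization mode. The sketch below follows the convention of llama.cpp's `common_embd_normalize` helper as we understand it:

- `-1`: no normalization;
- `0`: scale by the maximum absolute value into an int16-like range;
- `2`: Euclidean (L2);
- any other positive `p`: p-norm.

Treat this mapping as an assumption and check it against the `common` sources for your revision. `normalize_embedding` is a hypothetical name, and treating every negative mode as "off" is a simplification made here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Normalizes `in` according to an integer mode (assumed convention):
//   < 0 : no normalization
//     0 : divide by max |x| and scale so the max maps to 32760 (int16 range)
//     2 : Euclidean (L2) norm
//     p : any other positive p uses the p-norm (1 = taxicab)
static std::vector<float> normalize_embedding(const std::vector<float> & in, int mode) {
    double sum = 0.0;
    if (mode < 0) {
        sum = 1.0;
    } else if (mode == 0) {
        for (float x : in) {
            const double a = std::fabs(static_cast<double>(x));
            if (a > sum) {
                sum = a;
            }
        }
        sum /= 32760.0;
    } else if (mode == 2) {
        for (float x : in) {
            sum += static_cast<double>(x) * x;
        }
        sum = std::sqrt(sum);
    } else {
        for (float x : in) {
            sum += std::pow(std::fabs(static_cast<double>(x)), mode);
        }
        sum = std::pow(sum, 1.0 / mode);
    }
    // Guard against an all-zero vector: emit zeros instead of dividing by 0.
    const double scale = sum > 0.0 ? 1.0 / sum : 0.0;
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = static_cast<float>(in[i] * scale);
    }
    return out;
}
```

For example, the vector `{3, 4}` becomes `{0.6, 0.8}` under mode 2 and `{3/7, 4/7}` under mode 1.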

Outputs

Name	Type	Description
.bin file	binary	Raw float data (logits or embeddings) in binary format
.txt file	text	Float data with index labels in text format
-prompt.txt file	text	Prompt text and token IDs in human-readable format
-tokens.bin file	binary	Token IDs in binary format
stdout	text	Tensor operation details (when verbose) and performance metrics
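The four output files can be produced with plain `<fstream>` I/O. This sketch assumes several details the page does not specify:

- binary files hold raw host-order `float` or 32-bit token values with no header;
- text lines take the form `index: value`;
- the prompt file uses a simple layout of its own.

The function names are hypothetical, the file-name pattern `<base>.bin`, `<base>.txt`, `<base>-prompt.txt`, `<base>-tokens.bin` simply mirrors the table above, and `token_id` stands in for `llama_token` (a 32-bit integer in `llama.h`).

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

using token_id = std::int32_t;  // assumed to match llama_token

// Writes raw floats with no header, in host byte order.
static bool write_float_bin(const std::string & path, const std::vector<float> & v) {
    std::ofstream f(path, std::ios::binary);
    f.write(reinterpret_cast<const char *>(v.data()),
            static_cast<std::streamsize>(v.size() * sizeof(float)));
    return static_cast<bool>(f);
}

// Writes one "index: value" line per float (assumed text layout).
static bool write_float_txt(const std::string & path, const std::vector<float> & v) {
    std::ofstream f(path);
    for (std::size_t i = 0; i < v.size(); ++i) {
        f << i << ": " << v[i] << '\n';
    }
    return static_cast<bool>(f);
}

// Writes the prompt and its token IDs in a human-readable (assumed) layout.
static bool write_prompt_txt(const std::string & path, const std::string & prompt,
                             const std::vector<token_id> & tokens) {
    std::ofstream f(path);
    f << "prompt: " << prompt << '\n' << "token ids:";
    for (token_id t : tokens) {
        f << ' ' << t;
    }
    f << '\n';
    return static_cast<bool>(f);
}

// Writes raw 32-bit token IDs with no header.
static bool write_tokens_bin(const std::string & path, const std::vector<token_id> & tokens) {
    std::ofstream f(path, std::ios::binary);
    f.write(reinterpret_cast<const char *>(tokens.data()),
            static_cast<std::streamsize>(tokens.size() * sizeof(token_id)));
    return static_cast<bool>(f);
}

// Reads a headerless float file back, e.g. to compare against a reference.
static std::vector<float> read_float_bin(const std::string & path) {
    std::ifstream f(path, std::ios::binary | std::ios::ate);
    if (!f) {
        return {};
    }
    const std::streamsize bytes = f.tellg();
    f.seekg(0);
    std::vector<float> v(static_cast<std::size_t>(bytes) / sizeof(float));
    f.read(reinterpret_cast<char *>(v.data()),
           static_cast<std::streamsize>(v.size() * sizeof(float)));
    return v;
}

// Produces all four files for one run.
static bool save_all(const std::string & base, const std::vector<float> & data,
                     const std::string & prompt, const std::vector<token_id> & tokens) {
    return write_float_bin(base + ".bin", data) &&
           write_float_txt(base + ".txt", data) &&
           write_prompt_txt(base + "-prompt.txt", prompt, tokens) &&
           write_tokens_bin(base + "-tokens.bin", tokens);
}
```

Headerless binary files keep comparison scripts trivial: a reference tool only needs the element type and count to load them.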

Usage Examples

# Print all tensor operations during inference
./llama-debug -m model.gguf -p "Hello my name is" --verbose

# Print only specific tensors
./llama-debug -m model.gguf -p "Hello my name is" --verbose --tensor-filter "attn"

# Save logits for comparison
./llama-debug -m model.gguf -p "Hello my name is" --save-logits

# Save embeddings with normalization
./llama-debug -m model.gguf -p "Hello my name is" --save-logits --embedding --embd-normalize 2

Related Pages

Principle:Ggml_org_Llama_cpp_Debugging
