Implementation:Ggml org Llama cpp Debug Example

From Leeroopedia
Knowledge Sources
Domains Debugging, Example
Last Updated 2026-02-15 00:00 GMT

Overview

A debug utility that logs GGML operations and tensor data during inference and can optionally save logits or embeddings for model verification.

Description

Registers a `base_callback_data` callback with configurable tensor name filters. The tool runs a single decode pass over the provided prompt, then extracts either logits (for generative models) or embeddings (with pooling support and optional normalization). Outputs are saved in four forms: raw float data in binary, the same float data as text, the prompt text with its token IDs, and the token IDs in binary. The `output_data` struct handles both cases and selects logit or embedding extraction based on the `--embedding` flag.
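The normalization step mentioned above can be sketched as follows. This is a minimal illustration, assuming that normalization mode 2 means scaling the embedding to unit Euclidean (L2) length; `l2_normalize` is a hypothetical helper written for this page, not a function from llama.cpp, and the tool's own implementation may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of llama.cpp): returns a copy of `embd`
// scaled to unit Euclidean length. A zero vector is returned unchanged
// to avoid dividing by zero.
std::vector<float> l2_normalize(const std::vector<float> & embd) {
    double sum_sq = 0.0;
    for (float v : embd) {
        sum_sq += static_cast<double>(v) * v;  // accumulate in double for stability
    }
    const double norm = std::sqrt(sum_sq);

    std::vector<float> out(embd.size());
    for (std::size_t i = 0; i < embd.size(); ++i) {
        out[i] = norm > 0.0 ? static_cast<float>(embd[i] / norm) : embd[i];
    }
    return out;
}
```

For example, `l2_normalize({3.0f, 4.0f})` yields a vector close to `{0.6f, 0.8f}`, since the input has length 5.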

Usage

Use this tool to verify model conversion accuracy by comparing llama.cpp outputs against a reference implementation, tensor by tensor and logit by logit. Run with `--save-logits` to save logit or embedding data, or with `--verbose` and `--tensor-filter` to print specific tensor operations during inference.
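A logit-by-logit comparison of two saved files might look like the sketch below. It assumes the binary output is a flat array of 32-bit floats with no header, which is an assumption about the file layout rather than something stated here; `read_floats` and `max_abs_diff` are hypothetical helper names, and any file paths passed to them are up to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads a file as a flat array of float32 values (assumed layout of the
// saved binary output; trailing bytes that do not fill a float are ignored).
std::vector<float> read_floats(const std::string & path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamsize n_bytes = in.tellg();
    in.seekg(0, std::ios::beg);

    std::vector<float> data(static_cast<std::size_t>(n_bytes) / sizeof(float));
    in.read(reinterpret_cast<char *>(data.data()),
            static_cast<std::streamsize>(data.size() * sizeof(float)));
    return data;
}

// Returns the largest element-wise absolute difference between two vectors.
// Throws if the lengths differ, since that already indicates a mismatch.
float max_abs_diff(const std::vector<float> & a, const std::vector<float> & b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("size mismatch");
    }
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

A caller would load the llama.cpp output and the reference output with `read_floats` and compare `max_abs_diff` against a tolerance chosen for the model's precision; the right tolerance depends on quantization and is not specified here.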

Code Reference

Source Location

Signature

static void print_usage(int argc, char ** argv);
static bool has_pooling(llama_context * ctx);

struct output_data {
    float *                  data_ptr;
    int                      data_size;
    std::string              type_suffix;
    std::vector<float>       embd_norm;
    std::string              prompt;
    std::vector<llama_token> tokens;

    output_data(llama_context * ctx, const llama_model * model, const common_params & params);
};

static void save_output_data(const output_data & output, const std::string & model_name, const std::string & output_dir);
static void print_tokenized_prompt(llama_context * ctx, const std::vector<llama_token> & tokens, const std::string & prompt);
static bool run(llama_context * ctx, const common_params & params);
int main(int argc, char ** argv);

Import

#include "debug.h"
#include "arg.h"
#include "common.h"
#include "log.h"
#include "llama.h"

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| -m | string (CLI) | Yes | Path to the GGUF model file |
| -p | string (CLI) | Yes | Prompt text to evaluate |
| --save-logits | flag (CLI) | No | Enable saving logits/embeddings to output files |
| --embedding | flag (CLI) | No | Extract embeddings instead of logits |
| --tensor-filter | string (CLI) | No | Filter tensor names for debug printing |
| --verbose | flag (CLI) | No | Enable verbose tensor printing during evaluation |
| --embd-normalize | int (CLI) | No | Normalization mode for embeddings (>= 0 to enable) |
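To illustrate how a tensor name filter can narrow debug output, the sketch below treats each filter as a regular expression and prints a tensor only if some filter matches anywhere in its name. Both the regex semantics and the helper name `matches_filter` are assumptions made for illustration; the tool's actual matching rules may differ.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of llama.cpp): returns true when no filters
// are configured, or when at least one filter matches somewhere in `name`.
bool matches_filter(const std::string & name, const std::vector<std::regex> & filters) {
    if (filters.empty()) {
        return true;  // no filter means every tensor is printed
    }
    for (const auto & re : filters) {
        if (std::regex_search(name, re)) {
            return true;
        }
    }
    return false;
}
```

With a single filter built from `"attn"`, an illustrative name such as `"blk.0.attn_q"` would match while `"blk.0.ffn_up"` would not; these tensor names are examples only.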

Outputs

| Name | Type | Description |
|------|------|-------------|
| .bin file | binary | Raw float data (logits or embeddings) in binary format |
| .txt file | text | Float data with index labels in text format |
| -prompt.txt file | text | Prompt text and token IDs in human-readable format |
| -tokens.bin file | binary | Token IDs in binary format |
| stdout | text | Tensor operation details (when verbose) and performance metrics |

Usage Examples

# Print all tensor operations during inference
./llama-debug -m model.gguf -p "Hello my name is" --verbose

# Print only specific tensors
./llama-debug -m model.gguf -p "Hello my name is" --verbose --tensor-filter "attn"

# Save logits for comparison
./llama-debug -m model.gguf -p "Hello my name is" --save-logits

# Save embeddings with normalization
./llama-debug -m model.gguf -p "Hello my name is" --save-logits --embedding --embd-normalize 2
