Implementation:Ggml org Llama cpp Lookup Stats
| Knowledge Sources | |
|---|---|
| Domains | Speculative_Decoding, Analysis |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Evaluates the effectiveness of prompt lookup decoding by measuring how many drafted tokens match the tokens that actually follow in the input text.
Description
Loads the optional static and dynamic n-gram caches, tokenizes the input, then replays the tokens in context-sized chunks as if they were being generated sequentially. At each step, it drafts candidate tokens by n-gram lookup and compares them with the tokens that actually follow in the input, counting how many tokens were drafted and how many were accepted. It then reports the overall acceptance rate and the time spent drafting.
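The draft-and-compare loop described above can be sketched in a few lines. This is a simplified illustration, not the code in lookup-stats.cpp: `Token`, `ToyCache`, `toy_draft`, `simulate`, and `LookupStats` are hypothetical names, and a single-token "what followed this token last time" map stands in for the real multi-level n-gram caches. The sketch also assumes that a trailing partial chunk is skipped.

```cpp
#include <cstddef>
#include <map>
#include <vector>

using Token = int;  // hypothetical stand-in for a model token id

struct LookupStats {
    int n_drafted = 0;  // tokens proposed by the lookup
    int n_accept  = 0;  // proposed tokens that matched the known text
};

// Hypothetical toy cache: maps a token to the token that most recently followed it.
// The real tool uses n-gram caches with longer keys; this is an assumption for brevity.
using ToyCache = std::map<Token, Token>;

// Propose up to n_draft tokens by repeatedly following the cache from `last`.
std::vector<Token> toy_draft(const ToyCache & cache, Token last, int n_draft) {
    std::vector<Token> draft;
    Token cur = last;
    while ((int) draft.size() < n_draft) {
        const auto it = cache.find(cur);
        if (it == cache.end()) {
            break;  // nothing known about this token yet
        }
        draft.push_back(it->second);
        cur = it->second;
    }
    return draft;
}

// Replay `text` in chunks of n_ctx tokens, drafting at each step and
// checking every drafted token against the token that really comes next.
LookupStats simulate(const std::vector<Token> & text, int n_ctx, int n_draft) {
    LookupStats stats;
    const int n_input = (int) text.size();

    for (int start = 0; start + n_ctx <= n_input; start += n_ctx) {
        ToyCache cache;                       // rebuilt for every chunk in this sketch
        std::vector<Token> out{text[start]};  // tokens "generated" so far

        while ((int) out.size() < n_ctx) {
            const std::vector<Token> draft = toy_draft(cache, out.back(), n_draft);
            stats.n_drafted += (int) draft.size();

            // Accept drafted tokens until the first mismatch with the known text.
            for (const Token t : draft) {
                if ((int) out.size() >= n_ctx) {
                    break;
                }
                const Token truth = text[start + out.size()];
                if (t != truth) {
                    break;
                }
                ++stats.n_accept;
                cache[out.back()] = truth;
                out.push_back(truth);
            }

            // After each drafted batch, simulate sampling one token normally.
            if ((int) out.size() < n_ctx) {
                const Token truth = text[start + out.size()];
                cache[out.back()] = truth;
                out.push_back(truth);
            }
        }
    }
    return stats;
}
```

Calling `simulate` on a repetitive sequence such as `{1, 2, 3, 1, 2, 3, 1, 2, 3}` with `n_ctx = 9` and `n_draft = 2` shows the expected behavior: nothing can be drafted during the first pass through the pattern, and every draft is accepted once the pattern repeats.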
Usage
Use this diagnostic tool to evaluate how well prompt lookup decoding would perform for a specific text and model combination, guiding parameter tuning and cache construction strategies.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: examples/lookup/lookup-stats.cpp
- Lines: 1-157
Signature
int main(int argc, char ** argv);
Import
#include "arg.h"
#include "common.h"
#include "log.h"
#include "ngram-cache.h"
#include "llama.h"
#include "ggml.h"
#include <cstdint>
#include <cstdio>
#include <cinttypes>
#include <fstream>
#include <string>
#include <vector>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| -m | string | Yes | Path to the GGUF model file |
| -p | string | Yes | Prompt text to evaluate lookup decoding against |
| --lookup-cache-static | string | No | Path to a static n-gram cache file |
| --lookup-cache-dynamic | string | No | Path to a dynamic n-gram cache file |
| -n | int | No | Maximum number of draft tokens per step |
Outputs
| Name | Type | Description |
|---|---|---|
| stdout | text | Acceptance rate statistics: number of drafted tokens, accepted tokens, and timing |
| return | int | Exit code: 0 on success, 1 on failure |
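As a rough illustration of how the reported acceptance rate relates to the two counters, the sketch below derives a percentage from a drafted count and an accepted count. The function names and the summary format are hypothetical and do not reproduce the tool's actual log lines. The zero-draft guard is an assumption added here to keep the example well defined.

```cpp
#include <cstdio>
#include <string>

// Percentage of drafted tokens that were accepted.
// Returns 0 when nothing was drafted (a guard added for this sketch).
double acceptance_percent(int n_accept, int n_drafted) {
    if (n_drafted <= 0) {
        return 0.0;
    }
    return 100.0 * n_accept / n_drafted;
}

// Hypothetical one-line summary; the labels are illustrative only.
std::string format_summary(int n_drafted, int n_accept) {
    char buf[128];
    std::snprintf(buf, sizeof(buf), "n_drafted = %d, n_accept = %d, accept = %.3f%%",
                  n_drafted, n_accept, acceptance_percent(n_accept, n_drafted));
    return std::string(buf);
}
```

A low percentage suggests that the cache contents or the draft length are a poor fit for the text, which is the kind of signal this tool is meant to surface.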
Usage Examples
# Evaluate lookup stats with static cache
./build/bin/llama-lookup-stats \
-m model.gguf \
-p "Your input text here..." \
--lookup-cache-static static_cache.bin