Implementation:Ggml org Llama cpp Llama Bench

From Leeroopedia
Domains: Benchmarking, Performance
Last Updated: 2026-02-15 00:00 GMT

Overview

Full-featured performance benchmarking tool for llama.cpp that measures prompt processing and text generation speeds across configurable parameter combinations.

Description

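llama-bench parses command-line parameters into a `cmd_params` struct in which most options accept multiple values (comma-separated lists or ranges). It then expands them into `cmd_params_instance` objects, one per combination of values. Prompt sizes (`-p`) and generation lengths (`-n`) are not crossed with each other. Each prompt size becomes its own prompt-processing test and each generation length its own text-generation test, and that set of tests is repeated for every combination of the remaining options. For each instance, the tool loads the model (reusing the previously loaded model when the model-related parameters are unchanged), creates a context, and performs a warm-up run. It then executes the prompt-processing and/or text-generation test for the configured number of repetitions and records the timing data in a `test` object. Results are written through a polymorphic `printer` hierarchy that supports Markdown, CSV, JSON, JSONL, and SQL output.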

Usage

Use this tool for performance regression testing, hardware comparisons, optimization validation, and benchmarking inference throughput across different model configurations and backend devices.

Code Reference

Source Location

Signature

// Main entry point
int main(int argc, char ** argv);

// Core structures
struct cmd_params { /* multi-value CLI parameter sets */ };
struct cmd_params_instance { /* single parameter combination */ };
struct test { /* benchmark result with timing data */ };

// Output printers
struct printer { /* base polymorphic printer */ };
struct csv_printer : public printer { /* CSV output */ };
struct json_printer : public printer { /* JSON output */ };
struct jsonl_printer : public printer { /* JSON Lines output */ };
struct markdown_printer : public printer { /* Markdown table output */ };
struct sql_printer : public printer { /* SQL INSERT output */ };

Import

#include "common.h"
#include "ggml.h"
#include "llama.h"

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| `-m, --model` | string | Yes | Path to the GGUF model file to benchmark |
| `-p, --n-prompt` | int list | No | Number of prompt tokens (comma-separated list or range; default: 512) |
| `-n, --n-gen` | int list | No | Number of tokens to generate (comma-separated list or range; default: 128) |
| `-b, --batch-size` | int list | No | Batch sizes to test |
| `-t, --threads` | int list | No | Number of threads to use |
| `-ngl, --n-gpu-layers` | int list | No | Number of layers to offload to the GPU |
| `-r, --repetitions` | int | No | Number of test repetitions (default: 5) |
| `-o, --output` | string | No | Output format: `md`, `csv`, `json`, `jsonl`, or `sql` (default: `md`) |

Outputs

| Name | Type | Description |
|---|---|---|
| benchmark results | stdout | Formatted benchmark data, including tokens/second for prompt processing and generation |
| return code | int | 0 on success, non-zero on failure |

Usage Examples

# Basic benchmark with default settings
./llama-bench -m model.gguf

# Test multiple prompt sizes and generation lengths
./llama-bench -m model.gguf -p 128,256,512 -n 64,128 -o csv

# Benchmark with GPU offloading and multiple thread counts
./llama-bench -m model.gguf -ngl 99 -t 4,8,16 -r 3
