Implementation:Ggml org Llama cpp Vdot Benchmark

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Quantization, Benchmarking
Last Updated	2026-02-15 00:00 GMT

Overview

Benchmarks quantized dot product accuracy and performance by comparing scalar float-to-quantized dot products against ggml's fully quantized vec_dot routines.

Description

Generates random Gaussian float vectors x and y, quantizes x to Q4_0 or Q4_1, then computes the dot product two ways: a scalar implementation that multiplies quantized x directly against float y, and ggml's vec_dot, which first quantizes y to Q8_0 on the fly and then operates on two quantized vectors. Both results are compared against the exact float dot product, and each approach is timed over multiple iterations. The file defines local copies of the Q4_0, Q4_1, and Q8_0 block structures for the scalar reference implementation.
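The two paths described above can be sketched as follows. This is a simplified illustration, not the code from vdot.cpp: the names BlockQ4, BlockQ8, quantizeQ4, quantizeQ8, dotQ4Float, dotQ4Q8 and dotExact are hypothetical, and the rounding rule and nibble layout are assumptions chosen for readability, so they may differ from ggml's actual Q4_0 and Q8_0 formats.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int kBlock = 32;  // elements per block (stand-in for QK4_0 / QK8_0)

struct BlockQ4 { float d; uint8_t qs[kBlock / 2]; };  // 4-bit values, two per byte
struct BlockQ8 { float d; int8_t  qs[kBlock]; };      // 8-bit values

// Largest absolute value in one block of kBlock floats.
static float blockAbsMax(const float* v) {
    float amax = 0.0f;
    for (int i = 0; i < kBlock; ++i) amax = std::max(amax, std::fabs(v[i]));
    return amax;
}

// Round v * id to the nearest integer and clamp it to [-limit, limit].
static int quantizeValue(float v, float id, int limit) {
    const int q = (int)std::lround(v * id);
    return std::min(limit, std::max(-limit, q));
}

// Simplified symmetric 4-bit quantization: levels -7..7 stored with an offset of 8.
// Assumed layout: element 2j in the low nibble of qs[j], element 2j+1 in the high nibble.
// The input length is assumed to be a multiple of kBlock.
std::vector<BlockQ4> quantizeQ4(const std::vector<float>& x) {
    std::vector<BlockQ4> out(x.size() / kBlock);
    for (size_t b = 0; b < out.size(); ++b) {
        const float* src = x.data() + b * kBlock;
        const float d  = blockAbsMax(src) / 7.0f;
        const float id = d > 0.0f ? 1.0f / d : 0.0f;
        out[b].d = d;
        for (int j = 0; j < kBlock / 2; ++j) {
            const int lo = quantizeValue(src[2 * j],     id, 7) + 8;
            const int hi = quantizeValue(src[2 * j + 1], id, 7) + 8;
            out[b].qs[j] = (uint8_t)(lo | (hi << 4));
        }
    }
    return out;
}

// Simplified symmetric 8-bit quantization: levels -127..127.
std::vector<BlockQ8> quantizeQ8(const std::vector<float>& y) {
    std::vector<BlockQ8> out(y.size() / kBlock);
    for (size_t b = 0; b < out.size(); ++b) {
        const float* src = y.data() + b * kBlock;
        const float d  = blockAbsMax(src) / 127.0f;
        const float id = d > 0.0f ? 1.0f / d : 0.0f;
        out[b].d = d;
        for (int i = 0; i < kBlock; ++i) out[b].qs[i] = (int8_t)quantizeValue(src[i], id, 127);
    }
    return out;
}

// Path 1: quantized x against float y, accumulated in float per block.
double dotQ4Float(const std::vector<BlockQ4>& x, const std::vector<float>& y) {
    double sum = 0.0;
    for (size_t b = 0; b < x.size(); ++b) {
        const float* yb = y.data() + b * kBlock;
        float s = 0.0f;
        for (int j = 0; j < kBlock / 2; ++j) {
            s += yb[2 * j]     * (float)((x[b].qs[j] & 0x0F) - 8);
            s += yb[2 * j + 1] * (float)((x[b].qs[j] >> 4) - 8);
        }
        sum += (double)s * x[b].d;
    }
    return sum;
}

// Path 2: quantized x against quantized y, accumulated in integers per block.
double dotQ4Q8(const std::vector<BlockQ4>& x, const std::vector<BlockQ8>& y) {
    double sum = 0.0;
    for (size_t b = 0; b < x.size(); ++b) {
        int acc = 0;
        for (int j = 0; j < kBlock / 2; ++j) {
            acc += ((x[b].qs[j] & 0x0F) - 8) * y[b].qs[2 * j];
            acc += ((x[b].qs[j] >> 4) - 8)   * y[b].qs[2 * j + 1];
        }
        sum += (double)acc * x[b].d * y[b].d;
    }
    return sum;
}

// Ground truth: plain float inputs accumulated in double precision.
double dotExact(const std::vector<float>& x, const std::vector<float>& y) {
    double sum = 0.0;
    for (size_t i = 0; i < x.size(); ++i) sum += (double)x[i] * y[i];
    return sum;
}
```

Calling quantizeQ4 on x once and then comparing dotQ4Float(q4, y) and dotQ4Q8(q4, quantizeQ8(y)) against dotExact(x, y) mirrors the comparison the benchmark makes. The second path trades a little extra quantization error in y for an inner loop that uses only integer multiplies.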

Usage

Use this proof of concept to evaluate the accuracy-performance tradeoff of different quantized dot product strategies, or to check that ggml's optimized kernels produce results close to the scalar reference implementation.

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: pocs/vdot/vdot.cpp
Lines: 1-311

Signature

constexpr int kVecSize = 1 << 18;

// Quantization block structures (local copies; QK4_0, QK4_1 and QK8_0 are each 32 elements per block)
struct block_q4_0 { float d; uint8_t qs[QK4_0 / 2]; };
struct block_q4_1 { float d; float m; uint8_t qs[QK4_1 / 2]; };
struct block_q8_0 { float d; int8_t qs[QK8_0]; };

// Helper functions
static float drawFromGaussianPdf(std::mt19937& rndm);
static void fillRandomGaussianFloats(std::vector<float>& values, std::mt19937& rndm, float mean = 0);

// Scalar dot product implementations
inline double dot(int n, const block_q4_0* x, const float* y);
inline double dot41(int n, const block_q4_1* x, const float* y);

int main(int argc, char ** argv);
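
The bodies of the helper functions are not reproduced in the listing above. As a rough stand-in for the random input generation, the following sketch fills a vector with normally distributed floats using the standard library. The name fillGaussian is hypothetical, and unit variance is an assumption; the original drawFromGaussianPdf may generate its samples differently.

```cpp
#include <random>
#include <vector>

// Hypothetical helper with the same shape as fillRandomGaussianFloats:
// fills `values` with samples drawn from a normal distribution N(mean, 1).
// Unit variance is an assumption made for this sketch.
void fillGaussian(std::vector<float>& values, std::mt19937& rng, float mean = 0.0f) {
    std::normal_distribution<float> dist(mean, 1.0f);
    for (auto& v : values) v = dist(rng);
}
```

Seeding the std::mt19937 generator with a fixed value keeps the generated vectors, and therefore the reported dot products, reproducible from run to run on the same platform.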

Import

#include <cstdio>
#include <vector>
#include <random>
#include <chrono>
#include <cstdlib>
#include <cmath>
#include <cassert>
#include <cstring>
#include <array>
#include <ggml.h>
#include <ggml-cpu.h>

I/O Contract

Inputs

Name	Type	Required	Description
kVecSize	int	No	Vector length used by the benchmark; a compile-time constant (1 << 18 = 262144 elements), not a runtime argument
nIter	int	No	Number of iterations for timing measurements
type	int	No	Quantization type to test (0 for Q4_0, 1 for Q4_1)

Outputs

Name	Type	Description
dot_reference	double	Exact float dot product value (ground truth)
dot_scalar	double	Scalar quantized-float dot product result with timing
dot_ggml	double	ggml vec_dot quantized-quantized dot product result with timing
error_metrics	stdout	Sum of differences and relative error between approaches
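
The timing and error outputs listed above could be accumulated along the following lines. This is a minimal sketch under stated assumptions: TimingStats, timeUs and relativeError are hypothetical names, and the exact statistics that vdot.cpp prints may differ.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>

// Running statistics over per-iteration timings, in microseconds.
struct TimingStats {
    int    n = 0;
    double sum = 0.0, sumSq = 0.0, maxUs = 0.0;

    void add(double us) {
        ++n;
        sum += us;
        sumSq += us * us;
        maxUs = std::max(maxUs, us);
    }
    double mean() const { return n > 0 ? sum / n : 0.0; }
    // Population standard deviation; clamped at 0 to absorb rounding error.
    double stddev() const {
        if (n == 0) return 0.0;
        const double m = mean();
        return std::sqrt(std::max(0.0, sumSq / n - m * m));
    }
};

// Time one call of `fn` and return the elapsed time in microseconds.
template <typename Fn>
double timeUs(Fn&& fn) {
    const auto t1 = std::chrono::steady_clock::now();
    fn();
    const auto t2 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t2 - t1).count();
}

// Relative error of `approx` with respect to a non-zero `exact` value.
double relativeError(double approx, double exact) {
    return std::fabs(approx - exact) / std::fabs(exact);
}
```

In a benchmark loop, each iteration would call stats.add(timeUs(...)) around the dot product under test and accumulate the returned values separately, so that quantizing x stays outside the timed region.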

Usage Examples

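# Build and run the vdot benchmark from the repository root.
# Assumption: a CMake-based checkout in which this proof of concept is built
# as the target llama-vdot; the target name and output path may differ between versions.
cmake -B build
cmake --build build --target llama-vdot
./build/bin/llama-vdot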

# Output shows:
# - Dot product values from reference float, scalar quantized, and ggml vec_dot
# - Timing comparisons between scalar and ggml approaches
# - Accuracy metrics (sum of differences vs ground truth)

Related Pages

Principle:Ggml_org_Llama_cpp_Quantization
