Implementation:Ggml org Llama cpp Vdot Benchmark
| Knowledge Sources | |
|---|---|
| Domains | Quantization, Benchmarking |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Benchmarks the accuracy and speed of quantized dot products by comparing a scalar dot product between a quantized vector and a float vector against ggml's fully quantized vec_dot routines.
Description
Generates random Gaussian float vectors x and y, quantizes x to Q4_0 or Q4_1, then computes the dot product two ways: a scalar implementation that multiplies the quantized x directly against the float y, and ggml's vec_dot, which first quantizes y to Q8_0 on the fly. Both results are compared against the exact float dot product, and timing and accuracy are recorded for each approach over multiple iterations. The file defines local copies of the Q4_0, Q4_1, and Q8_0 block structures for use by the scalar reference implementation.
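The scalar path described above can be sketched without ggml. This is a minimal illustration, not the code in vdot.cpp: the `BlockQ4` struct, the `amax / 7` scale, the rounding, and the nibble layout (low nibble for element j, high nibble for element j + 16) are all assumptions chosen for clarity, and ggml's own Q4_0 quantizer may differ in each of these details.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int kBlock = 32;  // assumed block size, mirroring QK4_0

// Hypothetical stand-in for block_q4_0: one float scale plus 32 packed 4-bit quants.
struct BlockQ4 {
    float   d;
    uint8_t qs[kBlock / 2];
};

// Quantize n floats (n must be a multiple of kBlock) into 4-bit blocks.
// Each value is stored as a nibble q in [0, 15] that represents (q - 8) * d.
std::vector<BlockQ4> quantize_q4(const float* x, int n) {
    assert(n % kBlock == 0);
    std::vector<BlockQ4> out(n / kBlock);
    for (int b = 0; b < n / kBlock; ++b) {
        const float* xb = x + b * kBlock;
        float amax = 0.0f;
        for (int j = 0; j < kBlock; ++j) amax = std::max(amax, std::fabs(xb[j]));
        const float d  = amax / 7.0f;                 // assumed scale: maps [-amax, amax] to [-7, 7]
        const float id = d > 0.0f ? 1.0f / d : 0.0f;  // avoid dividing by zero for all-zero blocks
        out[b].d = d;
        for (int j = 0; j < kBlock / 2; ++j) {
            const int r0 = static_cast<int>(std::lround(xb[j] * id)) + 8;
            const int r1 = static_cast<int>(std::lround(xb[j + kBlock / 2] * id)) + 8;
            const int q0 = std::min(15, std::max(0, r0));
            const int q1 = std::min(15, std::max(0, r1));
            out[b].qs[j] = static_cast<uint8_t>(q0 | (q1 << 4));
        }
    }
    return out;
}

// Scalar dot product of quantized x (nblocks blocks) with float y.
double dot_q4_f32(int nblocks, const BlockQ4* x, const float* y) {
    double sum = 0.0;
    for (int b = 0; b < nblocks; ++b) {
        const float* yb = y + b * kBlock;
        double s = 0.0;
        for (int j = 0; j < kBlock / 2; ++j) {
            const int q0 = (x[b].qs[j] & 0x0F) - 8;  // element j
            const int q1 = (x[b].qs[j] >> 4) - 8;    // element j + 16
            s += q0 * yb[j] + q1 * yb[j + kBlock / 2];
        }
        sum += x[b].d * s;  // apply the block scale once per block
    }
    return sum;
}

// Exact float dot product accumulated in double, used as ground truth.
double dot_f32(int n, const float* x, const float* y) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += static_cast<double>(x[i]) * y[i];
    return sum;
}
```

Calling `quantize_q4` on x, then comparing `dot_q4_f32` with `dot_f32` on the same inputs, gives the quantization error of the scalar path in isolation, since y stays in full precision.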
Usage
Use this proof of concept to evaluate the accuracy-performance tradeoff between quantized dot product strategies, or to check that ggml's optimized kernels produce results close to the scalar reference implementation.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: pocs/vdot/vdot.cpp
- Lines: 1-311
Signature
constexpr int kVecSize = 1 << 18;
// Quantization block structures
struct block_q4_0 { float d; uint8_t qs[QK4_0 / 2]; };
struct block_q4_1 { float d; float m; uint8_t qs[QK4_1 / 2]; };
struct block_q8_0 { float d; int8_t qs[QK8_0]; };
// Helper functions
static float drawFromGaussianPdf(std::mt19937& rndm);
static void fillRandomGaussianFloats(std::vector<float>& values, std::mt19937& rndm, float mean = 0);
// Scalar dot product implementations
inline double dot(int n, const block_q4_0* x, const float* y);
inline double dot41(int n, const block_q4_1* x, const float* y);
int main(int argc, char ** argv);
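The block layouts in the signature suggest how a fully quantized dot product can be computed: y is quantized to 8-bit blocks, each pair of blocks is multiplied with integer arithmetic, and the two scales are applied once per block. The sketch below illustrates that idea under stated assumptions. The helper names `quantize_q8` and `dot_q4_q8_sketch`, the shared block size `kQK`, the `amax / 127` scale, and the nibble order are hypothetical and are not taken from ggml's optimized kernels.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int kQK = 32;  // assumed shared block size for both formats

// Local layouts following the signature: float scale plus packed quants.
struct block_q4_0 { float d; uint8_t qs[kQK / 2]; };
struct block_q8_0 { float d; int8_t  qs[kQK]; };
static_assert(sizeof(block_q4_0) == sizeof(float) + kQK / 2, "unexpected q4_0 padding");
static_assert(sizeof(block_q8_0) == sizeof(float) + kQK,     "unexpected q8_0 padding");

// Quantize n floats to 8-bit blocks with q = round(y / d) and d = amax / 127.
std::vector<block_q8_0> quantize_q8(const float* y, int n) {
    assert(n % kQK == 0);
    std::vector<block_q8_0> out(n / kQK);
    for (int b = 0; b < n / kQK; ++b) {
        const float* yb = y + b * kQK;
        float amax = 0.0f;
        for (int j = 0; j < kQK; ++j) amax = std::max(amax, std::fabs(yb[j]));
        const float d  = amax / 127.0f;
        const float id = d > 0.0f ? 1.0f / d : 0.0f;
        out[b].d = d;
        for (int j = 0; j < kQK; ++j) {
            int q = static_cast<int>(std::lround(yb[j] * id));
            q = std::min(127, std::max(-127, q));  // guard against rounding overshoot
            out[b].qs[j] = static_cast<int8_t>(q);
        }
    }
    return out;
}

// Integer dot product of 4-bit x blocks with 8-bit y blocks.
// Assumed layout: low nibble is element j, high nibble is element j + 16.
float dot_q4_q8_sketch(int nblocks, const block_q4_0* x, const block_q8_0* y) {
    float sum = 0.0f;
    for (int b = 0; b < nblocks; ++b) {
        int acc = 0;  // integer accumulator for one block
        for (int j = 0; j < kQK / 2; ++j) {
            const int q0 = (x[b].qs[j] & 0x0F) - 8;
            const int q1 = (x[b].qs[j] >> 4) - 8;
            acc += q0 * y[b].qs[j] + q1 * y[b].qs[j + kQK / 2];
        }
        sum += x[b].d * y[b].d * static_cast<float>(acc);
    }
    return sum;
}
```

Keeping the inner loop in integers is what makes this style of kernel amenable to SIMD; the price is a second source of rounding error from quantizing y.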
Import
#include <cstdio>
#include <vector>
#include <random>
#include <chrono>
#include <cstdlib>
#include <cmath>
#include <cassert>
#include <cstring>
#include <array>
#include <ggml.h>
#include <ggml-cpu.h>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| kVecSize | constexpr int | No | Vector length used by the benchmark; a compile-time constant fixed at 1 << 18 = 262144 elements |
| nIter | int | No | Number of iterations for timing measurements |
| type | int | No | Quantization type to test (0 for Q4_0, 1 for Q4_1) |
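The workload these inputs define is a pair of random Gaussian vectors of length kVecSize. A minimal sketch of generating such a vector follows. It uses the standard library's `std::normal_distribution` instead of the file's own `drawFromGaussianPdf`, and the names `fill_gaussian`, `sample_mean`, and `sample_stddev` are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

constexpr int kVecSize = 1 << 18;  // 262144 elements, as in the signature

// Fill `values` with unit-variance Gaussian samples centred on `mean`.
void fill_gaussian(std::vector<float>& values, std::mt19937& rng, float mean = 0.0f) {
    std::normal_distribution<float> dist(mean, 1.0f);
    for (auto& v : values) v = dist(rng);
}

// Sample mean, accumulated in double to limit rounding error.
double sample_mean(const std::vector<float>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (float v : values) sum += v;
    return sum / static_cast<double>(values.size());
}

// Sample standard deviation around the sample mean.
double sample_stddev(const std::vector<float>& values) {
    const double m = sample_mean(values);
    double sum2 = 0.0;
    for (float v : values) sum2 += (v - m) * (v - m);
    return std::sqrt(sum2 / static_cast<double>(values.size()));
}
```

Seeding the generator with a fixed value makes runs repeatable on a given standard library, which helps when comparing timings across builds.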
Outputs
| Name | Type | Description |
|---|---|---|
| dot_reference | double | Exact float dot product value (ground truth) |
| dot_scalar | double | Scalar quantized-float dot product result with timing |
| dot_ggml | double | ggml vec_dot quantized-quantized dot product result with timing |
| error_metrics | stdout | Sum of differences and relative error between approaches |
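One way to collect outputs like these is to time each dot product call and accumulate running statistics across iterations. The sketch below is an assumption about how such bookkeeping could look, not a copy of the benchmark's code: `TimingStats`, `time_us`, and `relative_error` are hypothetical names, and `std::chrono::steady_clock` is chosen here for monotonic timing.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>

// Running statistics over per-iteration timings, in microseconds.
struct TimingStats {
    double sum = 0.0, sum2 = 0.0, max = 0.0;
    int    n = 0;

    void add(double t_us) {
        sum  += t_us;
        sum2 += t_us * t_us;
        max   = std::max(max, t_us);
        ++n;
    }
    double mean() const { return n > 0 ? sum / n : 0.0; }
    double stddev() const {
        if (n == 0) return 0.0;
        const double m   = mean();
        const double var = sum2 / n - m * m;  // population variance
        return var > 0.0 ? std::sqrt(var) : 0.0;
    }
};

// Run f once and return the elapsed wall-clock time in microseconds.
template <typename F>
double time_us(F&& f) {
    const auto t1 = std::chrono::steady_clock::now();
    f();
    const auto t2 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t2 - t1).count();
}

// Relative error of an approximate result against the exact one.
// Falls back to the absolute error when the exact value is zero.
double relative_error(double approx, double exact) {
    return exact != 0.0 ? std::fabs(approx - exact) / std::fabs(exact)
                        : std::fabs(approx);
}
```

A typical loop would call `stats.add(time_us([&] { result = some_dot(...); }))` once per iteration, where `some_dot` is a placeholder for the routine under test, and then report `mean()`, `stddev()`, and `max` alongside `relative_error(result, exact)`.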
Usage Examples
# Compile and run the vdot benchmark
cd pocs/vdot
make
./vdot
# Output shows:
# - Dot product values from reference float, scalar quantized, and ggml vec_dot
# - Timing comparisons between scalar and ggml approaches
# - Accuracy metrics (sum of differences vs ground truth)