Implementation:Ggml org Llama cpp Vdot Benchmark

From Leeroopedia
Revision as of 12:42, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Ggml_org_Llama_cpp_Vdot_Benchmark.md)
Knowledge Sources
Domains: Quantization, Benchmarking
Last Updated: 2026-02-15 00:00 GMT

Overview

Benchmarks the accuracy and performance of quantized dot products by comparing a scalar dot product of quantized x against float y with ggml's fully quantized vec_dot routines.

Description

Generates random Gaussian float vectors x and y, quantizes x to Q4_0 or Q4_1 format, then computes the dot product in two ways: a scalar implementation that multiplies the quantized x against the float y, and ggml's vec_dot, for which y is quantized to Q8_0 on the fly. Both results are compared against the exact float dot product, and timing and accuracy are measured for each approach over multiple iterations. The file defines local copies of the Q4_0, Q4_1, and Q8_0 block structures for the scalar reference implementation.
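
The quantized-x-against-float-y path described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's or ggml's actual code: the names BlockQ4Sketch, quantizeSketch, and dotQuantFloat are hypothetical, the block size is assumed to be 32, values are rounded symmetrically to [-7, 7], and neighbouring elements are packed into one byte. ggml's real Q4_0 rounding and nibble layout may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int kBlockSketch = 32;  // assumed block size (plays the role of QK4_0)

// Hypothetical stand-in for block_q4_0: one float scale plus packed 4-bit values.
struct BlockQ4Sketch {
    float   d;                     // per-block scale
    uint8_t qs[kBlockSketch / 2];  // two 4-bit values per byte
};

// Quantize x (length must be a multiple of the block size) to 4-bit blocks.
std::vector<BlockQ4Sketch> quantizeSketch(const std::vector<float>& x) {
    assert(x.size() % kBlockSketch == 0);
    std::vector<BlockQ4Sketch> out(x.size() / kBlockSketch);
    for (size_t b = 0; b < out.size(); ++b) {
        const float* v = x.data() + b * kBlockSketch;
        float amax = 0.f;
        for (int j = 0; j < kBlockSketch; ++j) amax = std::max(amax, std::fabs(v[j]));
        const float d  = amax / 7.f;              // map [-amax, amax] onto [-7, 7]
        const float id = d > 0.f ? 1.f / d : 0.f;
        out[b].d = d;
        for (int j = 0; j < kBlockSketch / 2; ++j) {
            // Store q + 8 so that every nibble is an unsigned value in [1, 15].
            const int q0 = int(std::lround(v[2 * j]     * id)) + 8;
            const int q1 = int(std::lround(v[2 * j + 1] * id)) + 8;
            out[b].qs[j] = uint8_t(q0 | (q1 << 4));
        }
    }
    return out;
}

// Scalar dot product of quantized x against float y (y.size() == 32 * x.size()).
double dotQuantFloat(const std::vector<BlockQ4Sketch>& x, const std::vector<float>& y) {
    double sum = 0;
    for (size_t b = 0; b < x.size(); ++b) {
        const float* yb = y.data() + b * kBlockSketch;
        double s = 0;
        for (int j = 0; j < kBlockSketch / 2; ++j) {
            const int q0 = (x[b].qs[j] & 0x0F) - 8;  // low nibble
            const int q1 = (x[b].qs[j] >> 4) - 8;    // high nibble
            s += q0 * yb[2 * j] + q1 * yb[2 * j + 1];
        }
        sum += x[b].d * s;  // apply the block scale once per block
    }
    return sum;
}
```

Applying the scale once per block rather than once per element keeps the inner loop to integer-times-float products, which is the property a scalar reference of this kind relies on.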

Usage

Use this proof-of-concept to evaluate the accuracy-performance tradeoff of different quantized dot product strategies, or to validate that ggml's optimized kernels produce results close to the scalar reference implementation.
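
Part of the accuracy gap being evaluated comes from quantizing y to 8 bits. The sketch below illustrates that step in isolation, assuming a block size of 32 and symmetric rounding to [-127, 127]; Q8Sketch, quantizeQ8Sketch, and dequantizeQ8Sketch are hypothetical names, and ggml's own Q8_0 routine may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int kQ8BlockSketch = 32;  // assumed block size (plays the role of QK8_0)

// Hypothetical stand-in for block_q8_0: one float scale plus 8-bit values.
struct Q8Sketch {
    float  d;
    int8_t qs[kQ8BlockSketch];
};

// Quantize y (length must be a multiple of the block size) to 8-bit blocks.
std::vector<Q8Sketch> quantizeQ8Sketch(const std::vector<float>& y) {
    assert(y.size() % kQ8BlockSketch == 0);
    std::vector<Q8Sketch> out(y.size() / kQ8BlockSketch);
    for (size_t b = 0; b < out.size(); ++b) {
        const float* v = y.data() + b * kQ8BlockSketch;
        float amax = 0.f;
        for (int j = 0; j < kQ8BlockSketch; ++j) amax = std::max(amax, std::fabs(v[j]));
        const float d  = amax / 127.f;            // map [-amax, amax] onto [-127, 127]
        const float id = d > 0.f ? 1.f / d : 0.f;
        out[b].d = d;
        for (int j = 0; j < kQ8BlockSketch; ++j) {
            out[b].qs[j] = int8_t(std::lround(v[j] * id));
        }
    }
    return out;
}

// Reconstruct floats from the blocks; each value is off by at most about d / 2.
std::vector<float> dequantizeQ8Sketch(const std::vector<Q8Sketch>& q) {
    std::vector<float> out;
    out.reserve(q.size() * kQ8BlockSketch);
    for (const auto& blk : q) {
        for (int j = 0; j < kQ8BlockSketch; ++j) out.push_back(blk.d * blk.qs[j]);
    }
    return out;
}
```

Because the rounding error per element is bounded by roughly half the block scale, the 8-bit side usually contributes far less error than a 4-bit side quantized with the same scheme.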

Code Reference

Source Location

Signature

constexpr int kVecSize = 1 << 18;

// Quantization block structures
struct block_q4_0 { float d; uint8_t qs[QK4_0 / 2]; };
struct block_q4_1 { float d; float m; uint8_t qs[QK4_1 / 2]; };
struct block_q8_0 { float d; int8_t qs[QK8_0]; };

// Helper functions
static float drawFromGaussianPdf(std::mt19937& rndm);
static void fillRandomGaussianFloats(std::vector<float>& values, std::mt19937& rndm, float mean = 0);

// Scalar dot product implementations
inline double dot(int n, const block_q4_0* x, const float* y);
inline double dot41(int n, const block_q4_1* x, const float* y);

int main(int argc, char ** argv);
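
The bodies of the two Gaussian helpers are not shown on this page. One plausible way to write helpers with these shapes is the Box-Muller transform, sketched below; the names drawGaussianSketch and fillGaussianSketch are hypothetical, and the original functions may be implemented differently (for example, by caching the second Box-Muller sample).

```cpp
#include <cmath>
#include <random>
#include <vector>

// Hypothetical stand-in for drawFromGaussianPdf: one standard-normal sample
// obtained with the Box-Muller transform.
static float drawGaussianSketch(std::mt19937& rng) {
    constexpr double kScale = 1.0 / (1.0 + double(std::mt19937::max()));
    constexpr double kTwoPi = 6.283185307179586;
    const double u1 = 1.0 - kScale * rng();  // in (0, 1], so log(u1) is finite
    const double u2 = kScale * rng();        // in [0, 1)
    return float(std::sqrt(-2.0 * std::log(u1)) * std::cos(kTwoPi * u2));
}

// Hypothetical stand-in for fillRandomGaussianFloats: unit-variance samples
// shifted by `mean`.
static void fillGaussianSketch(std::vector<float>& values, std::mt19937& rng, float mean = 0) {
    for (auto& v : values) v = mean + drawGaussianSketch(rng);
}
```

Seeding the std::mt19937 with a fixed value makes the generated vectors, and therefore the benchmark's accuracy figures, reproducible between runs.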

Import

#include <cstdio>
#include <vector>
#include <random>
#include <chrono>
#include <cstdlib>
#include <cmath>
#include <cassert>
#include <cstring>
#include <array>
#include <ggml.h>
#include <ggml-cpu.h>

I/O Contract

Inputs

Name | Type | Required | Description
kVecSize | int | No | Vector size for the benchmark; a compile-time constant (1 << 18 = 262144 elements)
nIter | int | No | Number of iterations for timing measurements
type | int | No | Quantization type to test (0 for Q4_0, 1 for Q4_1)

Outputs

Name | Type | Description
dot_reference | double | Exact float dot product value (ground truth)
dot_scalar | double | Scalar quantized-float dot product result with timing
dot_ggml | double | ggml vec_dot quantized-quantized dot product result with timing
error_metrics | stdout | Sum of differences and relative error between approaches
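
The timing and relative-error outputs listed above could be gathered with helpers along these lines. This is a sketch, not the benchmark's own code: relativeError and timedCall are hypothetical names, and the benchmark may define its error metric differently.

```cpp
#include <chrono>
#include <cmath>
#include <utility>

// Relative error of an approximate dot product against the exact float result.
// Falls back to the absolute error when the exact value is zero.
inline double relativeError(double approx, double exact) {
    return exact != 0.0 ? std::fabs(approx - exact) / std::fabs(exact)
                        : std::fabs(approx);
}

// Run f once and return {result, elapsed microseconds}.
// f is assumed to be callable with no arguments and to return a double.
template <typename F>
std::pair<double, double> timedCall(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    const double result = f();
    const auto t1 = std::chrono::steady_clock::now();
    const double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
    return std::make_pair(result, us);
}
```

Using steady_clock avoids jumps from system clock adjustments; averaging the elapsed time over many iterations gives a more stable figure than a single call.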

Usage Examples

# Compile and run the vdot benchmark
cd pocs/vdot
make
./vdot

# Output shows:
# - Dot product values from reference float, scalar quantized, and ggml vec_dot
# - Timing comparisons between scalar and ggml approaches
# - Accuracy metrics (sum of differences vs ground truth)
