Implementation:Ggml org Ggml Cpu vec api

Metadata

Field	Value
Page Type	Implementation (Vector API Header)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, CPU_Backend, SIMD
Last Updated	2025-05-15 12:00 GMT

Overview

vec.h declares the CPU backend's fundamental vector operations (dot products, element-wise arithmetic, GELU, softmax) and provides inline, SIMD-optimized implementations of the most common ones.

Description

vec.h is the central vectorized math library for the CPU backend, providing the building blocks used by ops.cpp, binary-ops.cpp, and the matrix multiplication routines. It includes:

Extern function declarations: ggml_vec_dot_f32, ggml_vec_dot_bf16, ggml_vec_dot_f16, ggml_vec_silu_f32, ggml_vec_cvar_f32, ggml_vec_soft_max_f32, ggml_vec_log_soft_max_f32 (implemented in vec.cpp).
Inline element-wise operations: A large set of inline functions for common math operations across f32, f16, and bf16 types:
- Arithmetic: ggml_vec_add_f32 (AVX2-optimized), ggml_vec_sub_f32, ggml_vec_mul_f32, ggml_vec_div_f32, and their f16 variants.
- Assignment and negation: ggml_vec_set_f32, ggml_vec_cpy_f32, ggml_vec_neg_f32.
- Scaling/MAD: ggml_vec_scale_f32, ggml_vec_scale_f16, ggml_vec_mad_f32 (multiply-accumulate with SIMD unrolling).
- Norms: ggml_vec_norm_f32 (L2 norm), ggml_vec_norm_inv_f32.
- Reductions: ggml_vec_sum_f32, ggml_vec_sum_f32_ggf, ggml_vec_sum_f16_ggf, ggml_vec_max_f32, ggml_vec_argmax_f32.
GELU functions: ggml_vec_gelu_f32 and ggml_vec_gelu_quick_f32 using precomputed f16 lookup tables for fast evaluation.
Unroll constants: GGML_SOFT_MAX_UNROLL (4), GGML_VEC_DOT_UNROLL (2), GGML_VEC_MAD_UNROLL (32).
Apple Accelerate integration: When GGML_USE_ACCELERATE is defined, several functions are redirected to Apple's vDSP routines.
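
The SIMD-optimized inline functions listed above generally follow a block-plus-tail pattern: a main loop handles a fixed number of elements per iteration, and a scalar loop finishes the remainder. The following is a minimal portable sketch of that pattern, not the code in vec.h. The names sketch_vec_add_f32 and SKETCH_BLOCK are hypothetical, and the block width of 8 is only an assumption standing in for the lane count of a 256-bit register.

```c
#include <stddef.h>

// Hypothetical block width; stands in for the number of float lanes in a SIMD register.
#define SKETCH_BLOCK 8

// z[i] = x[i] + y[i], processed in fixed-width blocks followed by a scalar tail.
static void sketch_vec_add_f32(const int n, float * z, const float * x, const float * y) {
    int i = 0;

    // Main loop: each block of SKETCH_BLOCK elements could map onto one vector add.
    for (; i + SKETCH_BLOCK <= n; i += SKETCH_BLOCK) {
        for (int j = 0; j < SKETCH_BLOCK; ++j) {
            z[i + j] = x[i + j] + y[i + j];
        }
    }

    // Tail loop: handles the remaining n % SKETCH_BLOCK elements.
    for (; i < n; ++i) {
        z[i] = x[i] + y[i];
    }
}
```

The tail loop is what lets callers pass any length n, not only multiples of the block width.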

Usage

Include vec.h in any CPU backend source file that needs vectorized math operations. Most functions are declared inline static, so the compiler can expand them directly at the call site.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/vec.h (1585 lines).

Signature

// Dot products (extern, implemented in vec.cpp)
void ggml_vec_dot_f32(int n, float * s, size_t bs, const float * x, size_t bx,
    const float * y, size_t by, int nrc);

// Inline element-wise arithmetic
inline static void ggml_vec_add_f32(const int n, float * z, const float * x, const float * y);
inline static void ggml_vec_sub_f32(const int n, float * z, const float * x, const float * y);
inline static void ggml_vec_mul_f32(const int n, float * z, const float * x, const float * y);
inline static void ggml_vec_div_f32(const int n, float * z, const float * x, const float * y);
inline static void ggml_vec_scale_f32(const int n, float * y, const float v);
inline static void ggml_vec_mad_f32(const int n, float * y, const float * x, const float v);

// GELU activation via lookup table
inline static void ggml_vec_gelu_f32(const int n, float * y, const float * x);
inline static void ggml_vec_gelu_quick_f32(const int n, float * y, const float * x);

// Reductions
inline static void ggml_vec_sum_f32(const int n, float * s, const float * x);
inline static void ggml_vec_max_f32(const int n, float * s, const float * x);
// argmax writes the index of the largest element through s
inline static void ggml_vec_argmax_f32(const int n, int * s, const float * x);

Import

#include "vec.h"

I/O Contract

Inputs

Parameter	Type	Required	Description
`n`	`int`	Yes	Number of elements in the input/output vectors.
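`x`, `y`	`const float *`	Yes	Input vectors. In the in-place operations (scale, mad), `y` is a non-const `float *` that is both read and written.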
`v`	`float`	Conditional	Scalar value for scale/mad/set operations.

Outputs

Output	Type	Description
`z` or `y`	`float *`	Element-wise result vector.
`s`	`float *`	Pointer to a single float that receives the reduction result (sum, max, dot product).
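
To make the reduction contract concrete, the sketch below shows scalar reference versions that write a single result through the s pointer. These are illustrative stand-ins with hypothetical names (sketch_vec_sum_f32, sketch_vec_max_f32, sketch_vec_dot_f32), not the vec.h implementations. Accumulating in double is an assumption made here for accuracy, and the dot-product sketch omits the stride and nrc parameters of the real signature.

```c
#include <math.h>

// Sum of n floats, accumulated in double and written to *s.
static void sketch_vec_sum_f32(const int n, float * s, const float * x) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += (double) x[i];
    }
    *s = (float) sum;
}

// Maximum of n floats written to *s; yields -INFINITY when n == 0.
static void sketch_vec_max_f32(const int n, float * s, const float * x) {
    float max = -INFINITY;
    for (int i = 0; i < n; ++i) {
        if (x[i] > max) {
            max = x[i];
        }
    }
    *s = max;
}

// Dot product of x and y written to *s (simplified: no strides, single result).
static void sketch_vec_dot_f32(const int n, float * s, const float * x, const float * y) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += (double) x[i] * (double) y[i];
    }
    *s = (float) sum;
}
```

In each case the caller owns the storage that s points to, so a local float passed by address is sufficient.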

Usage Examples

Element-wise Vector Operations

#include "vec.h"

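float a[256], b[256], c[256];

// Initialize the inputs before use
for (int i = 0; i < 256; ++i) {
    a[i] = (float) i;
    b[i] = 1.0f;
}

// c = a + b
ggml_vec_add_f32(256, c, a, b);

// c = a * b (overwrites the previous contents of c)
ggml_vec_mul_f32(256, c, a, b);

// a *= 0.5 (in place)
ggml_vec_scale_f32(256, a, 0.5f);

// b += a * 2.0 (multiply-accumulate, using the already scaled a)
ggml_vec_mad_f32(256, b, a, 2.0f);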
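
The in-place calls above are order-sensitive: the scale modifies a before the multiply-accumulate reads it. The scalar sketch below spells out the semantics assumed for those two operations. The names sketch_vec_scale_f32 and sketch_vec_mad_f32 are hypothetical reference versions, useful for checking results against the optimized functions rather than as a description of how vec.h implements them.

```c
// y[i] *= v, in place.
static void sketch_vec_scale_f32(const int n, float * y, const float v) {
    for (int i = 0; i < n; ++i) {
        y[i] *= v;
    }
}

// y[i] += x[i] * v, in place (multiply-accumulate).
static void sketch_vec_mad_f32(const int n, float * y, const float * x, const float v) {
    for (int i = 0; i < n; ++i) {
        y[i] += x[i] * v;
    }
}
```

For example, with a = {2, 4} and b = {1, 1}, scaling a by 0.5 gives {1, 2}, and the subsequent multiply-accumulate with v = 2 leaves b = {3, 5}.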

GELU Activation

#include "vec.h"

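float input[1024], output[1024];

// Fill the input with sample values in [-4, 4)
for (int i = 0; i < 1024; ++i) {
    input[i] = (float) (i - 512) / 128.0f;
}

// Apply GELU using precomputed f16 table
ggml_vec_gelu_f32(1024, output, input);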

Related Pages

Ggml_org_Ggml_Cpu_vec_ops -- Implementation file for the extern functions declared here.
Ggml_org_Ggml_Cpu_simd_mappings -- SIMD macros that these inline functions depend on.
Ggml_org_Ggml_Cpu_tensor_ops -- Tensor operations that use these vector primitives.
Ggml_org_Ggml_Cpu_unary_ops -- Unary operations that complement these vector functions.
