Implementation:Ggml org Ggml Cpu vec api

From Leeroopedia


Metadata

Field | Value
Page Type | Implementation (Vector API Header)
Knowledge Sources | GGML
Domains | ML_Infrastructure, Tensor_Computing, CPU_Backend, SIMD
Last Updated | 2025-05-15 12:00 GMT

Overview

Header that declares the CPU backend's fundamental vector operations (dot products, element-wise arithmetic, GELU, softmax) and provides inline, SIMD-optimized implementations of the most common ones.

Description

vec.h is the central vectorized math library for the CPU backend, providing the building blocks used by ops.cpp, binary-ops.cpp, and the matrix multiplication routines. It includes:

  1. Extern function declarations: ggml_vec_dot_f32, ggml_vec_dot_bf16, ggml_vec_dot_f16, ggml_vec_silu_f32, ggml_vec_cvar_f32, ggml_vec_soft_max_f32, ggml_vec_log_soft_max_f32 (implemented in vec.cpp).
  2. Inline element-wise operations: A large set of inline functions for common math operations across f32, f16, and bf16 types:
    • Arithmetic: ggml_vec_add_f32 (AVX2-optimized), ggml_vec_sub_f32, ggml_vec_mul_f32, ggml_vec_div_f32, and their f16 variants.
    • Assignment and negation: ggml_vec_set_f32, ggml_vec_cpy_f32, ggml_vec_neg_f32.
    • Scaling/MAD: ggml_vec_scale_f32, ggml_vec_scale_f16, ggml_vec_mad_f32 (multiply-accumulate with SIMD unrolling).
    • Norms: ggml_vec_norm_f32 (L2 norm), ggml_vec_norm_inv_f32.
    • Reductions: ggml_vec_sum_f32, ggml_vec_sum_f32_ggf, ggml_vec_sum_f16_ggf, ggml_vec_max_f32, ggml_vec_argmax_f32.
  3. GELU functions: ggml_vec_gelu_f32 and ggml_vec_gelu_quick_f32 using precomputed f16 lookup tables for fast evaluation.
  4. Unroll constants: GGML_SOFT_MAX_UNROLL (4), GGML_VEC_DOT_UNROLL (2), GGML_VEC_MAD_UNROLL (32).
  5. Apple Accelerate integration: When GGML_USE_ACCELERATE is defined, several functions are redirected to Apple's vDSP routines.
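The layering described in the list above (platform library first, then a SIMD main loop, then a scalar tail) can be sketched for a single operation. This is a minimal illustration and not the vec.h source: sketch_vec_scale_f32 and the SKETCH_USE_ACCELERATE switch are hypothetical names standing in for ggml_vec_scale_f32 and GGML_USE_ACCELERATE, and the 8-wide AVX2 loop is only one plausible choice of width.

```c
#include <assert.h>

#if defined(SKETCH_USE_ACCELERATE)   /* hypothetical switch, stands in for GGML_USE_ACCELERATE */
#include <Accelerate/Accelerate.h>
#elif defined(__AVX2__)
#include <immintrin.h>
#endif

/* y[i] *= v for i in [0, n). Hypothetical stand-in, not the vec.h implementation. */
static inline void sketch_vec_scale_f32(const int n, float * y, const float v) {
    assert(n >= 0);
#if defined(SKETCH_USE_ACCELERATE)
    /* Platform library path: vDSP multiplies the whole vector by a scalar. */
    vDSP_vsmul(y, 1, &v, y, 1, (vDSP_Length) n);
#else
    int i = 0;
#if defined(__AVX2__)
    /* SIMD main loop: 8 floats per iteration, unaligned loads and stores. */
    const __m256 vv = _mm256_set1_ps(v);
    for (; i + 7 < n; i += 8) {
        _mm256_storeu_ps(y + i, _mm256_mul_ps(_mm256_loadu_ps(y + i), vv));
    }
#endif
    /* Scalar tail: the leftover elements, or everything when AVX2 is absent. */
    for (; i < n; ++i) {
        y[i] *= v;
    }
#endif
}
```

Because the path is selected at compile time, the caller sees one function with one contract regardless of which branch was built.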

Usage

Include vec.h in any CPU backend source file that needs access to vectorized math operations. Most functions are inline and will be compiled directly into the call site.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/vec.h (1585 lines).

Signature

// Dot products (extern, implemented in vec.cpp)
void ggml_vec_dot_f32(int n, float * s, size_t bs, const float * x, size_t bx,
    const float * y, size_t by, int nrc);

// Inline element-wise arithmetic
inline static void ggml_vec_add_f32(const int n, float * z, const float * x, const float * y);
inline static void ggml_vec_sub_f32(const int n, float * z, const float * x, const float * y);
inline static void ggml_vec_mul_f32(const int n, float * z, const float * x, const float * y);
inline static void ggml_vec_div_f32(const int n, float * z, const float * x, const float * y);
inline static void ggml_vec_scale_f32(const int n, float * y, const float v);
inline static void ggml_vec_mad_f32(const int n, float * y, const float * x, const float v);

// GELU activation via lookup table
inline static void ggml_vec_gelu_f32(const int n, float * y, const float * x);
inline static void ggml_vec_gelu_quick_f32(const int n, float * y, const float * x);

// Reductions
inline static void ggml_vec_sum_f32(const int n, float * s, const float * x);
inline static void ggml_vec_max_f32(const int n, float * s, const float * x);
inline static void ggml_vec_argmax_f32(const int n, int * s, const float * x);
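The dot-product signature carries stride arguments (bs, bx, by) and a result count (nrc) in addition to the vectors. The scalar stand-in below mirrors that parameter list to show how a call site looks. It is a sketch under a stated assumption: that the f32 variant produces a single result per call and therefore ignores the strides. ref_vec_dot_f32 is a hypothetical name, and the double accumulator is an illustrative choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar stand-in with the same parameter list as ggml_vec_dot_f32. */
static void ref_vec_dot_f32(int n, float * s, size_t bs, const float * x, size_t bx,
                            const float * y, size_t by, int nrc) {
    /* Assumption: one result per call, so the stride arguments are ignored. */
    assert(nrc == 1);
    (void) bs; (void) bx; (void) by;

    /* Accumulate in double to limit rounding error, then narrow once. */
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += (double) x[i] * (double) y[i];
    }
    *s = (float) sum;
}
```

A call under that assumption would look like ref_vec_dot_f32(n, &result, 0, x, 0, y, 0, 1), with result receiving the scalar product.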

Import

#include "vec.h"

I/O Contract

Inputs

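Parameter | Type | Required | Description
n | int | Yes | Number of elements in the input/output vectors.
x, y | const float * | Yes | Input vectors. In ggml_vec_scale_f32 and ggml_vec_mad_f32, y is a non-const float * that is updated in place.
v | float | Conditional | Scalar value; required only by the scale, mad, and set operations.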

Outputs

Output | Type | Description
z or y | float * | Element-wise result vector.
s | float * | Scalar reduction result (sum, max, dot product).
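The reductions return nothing and instead write their scalar result through the s pointer. The scalar references below sketch that contract for sum and max. They are hypothetical stand-ins (the ref_ names are not part of vec.h), and the double accumulator and the -INFINITY result for an empty vector are assumptions of this sketch.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical scalar reference for a sum reduction: *s = x[0] + ... + x[n-1]. */
static void ref_vec_sum_f32(const int n, float * s, const float * x) {
    assert(n >= 0);
    double sum = 0.0;            /* wider accumulator, narrowed at the end */
    for (int i = 0; i < n; ++i) {
        sum += (double) x[i];
    }
    *s = (float) sum;
}

/* Hypothetical scalar reference for a max reduction: *s = largest element of x. */
static void ref_vec_max_f32(const int n, float * s, const float * x) {
    assert(n >= 0);
    float max = -INFINITY;       /* value reported for an empty vector */
    for (int i = 0; i < n; ++i) {
        if (x[i] > max) max = x[i];
    }
    *s = max;
}
```

The caller owns the storage for the result, for example float total; ref_vec_sum_f32(n, &total, x);.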

Usage Examples

Element-wise Vector Operations

#include "vec.h"

float a[256], b[256], c[256];
for (int i = 0; i < 256; ++i) { a[i] = (float) i; b[i] = 1.0f; }  // fill the inputs before use
// c = a + b
ggml_vec_add_f32(256, c, a, b);

// c = a * b
ggml_vec_mul_f32(256, c, a, b);

// a *= 0.5
ggml_vec_scale_f32(256, a, 0.5f);

// b += a * 2.0 (multiply-accumulate)
ggml_vec_mad_f32(256, b, a, 2.0f);

GELU Activation

#include "vec.h"

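float input[1024], output[1024];
for (int i = 0; i < 1024; ++i) { input[i] = 0.01f * (float) (i - 512); }  // sample inputs around zero

// Apply GELU using the precomputed f16 table
// (assumes the CPU backend has already filled the table, e.g. via ggml_cpu_init())
ggml_vec_gelu_f32(1024, output, input);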
