Implementation:Ggml org Ggml Cpu vec ops
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (Vectorized Operations) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, CPU_Backend, SIMD |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Implements SIMD-optimized vector dot products, SiLU activation, softmax, variance, and precomputed GELU lookup tables for the CPU backend.
Description
vec.cpp provides the performance-critical vector operations used heavily in matrix multiplication, attention computation, and activation functions. Key components include:
- Precomputed lookup tables: ggml_table_gelu_f16 (128 KB) and ggml_table_gelu_quick_f16 (128 KB) for fast GELU/quick-GELU evaluation via f16 table lookup.
- Vector dot products: Architecture-specific SIMD implementations of:
  - ggml_vec_dot_f32 -- f32 dot product with ARM SVE (8-way unrolled FMA with predicated tail), RISC-V vector intrinsics (variable-length vectorization), and generic SIMD via GGML_F32_VEC macros.
  - ggml_vec_dot_bf16 -- bf16 dot product with AVX-512 BF16 (_mm512_dpbf16_ps), AVX2 with bf16-to-f32 shift, NEON, and scalar fallbacks.
  - ggml_vec_dot_f16 -- f16 dot product with architecture-specific paths.
- Activation functions: ggml_vec_silu_f32 computes SiLU (Sigmoid Linear Unit) element-wise.
- Statistical operations: ggml_vec_cvar_f32 computes centered variance (also centers the output vector), ggml_vec_soft_max_f32 computes numerically stable softmax, and ggml_vec_log_soft_max_f32 computes log-softmax.
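The lookup-table idea in the first bullet can be illustrated with a small sketch: because an f16 value has only 65536 possible bit patterns, an activation can be evaluated once per pattern and later fetched by index. The code below is an illustration under stated assumptions, not the code in vec.cpp: `f16_bits_to_f32`, `gelu_ref`, and `build_gelu_table` are hypothetical names, the tanh-based GELU formula is an assumed variant, and the table stores f32 results for brevity (so it is 256 KB rather than the 128 KB of an f16 table).

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: decode raw IEEE-754 half-precision bits into a float.
static float f16_bits_to_f32(uint16_t h) {
    const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    const uint32_t exp  = (h >> 10) & 0x1Fu;
    const uint32_t man  = h & 0x3FFu;
    if (exp == 0) {
        // Zero or subnormal: the value is man * 2^-24.
        const float v = std::ldexp(static_cast<float>(man), -24);
        return sign ? -v : v;
    }
    // Inf/NaN keep an all-ones exponent; normal values are re-biased from 15 to 127.
    const uint32_t bits = (exp == 31)
        ? (sign | 0x7F800000u | (man << 13))
        : (sign | ((exp + 112u) << 23) | (man << 13));
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

// tanh-based GELU approximation (an assumption about which variant is tabulated).
static float gelu_ref(float x) {
    const float k = 0.7978845608f; // sqrt(2/pi)
    return 0.5f * x * (1.0f + std::tanh(k * x * (1.0f + 0.044715f * x * x)));
}

// Hypothetical builder: one entry per possible f16 bit pattern (65536 entries).
static std::vector<float> build_gelu_table() {
    std::vector<float> table(1u << 16);
    for (uint32_t i = 0; i < table.size(); ++i) {
        table[i] = gelu_ref(f16_bits_to_f32(static_cast<uint16_t>(i)));
    }
    return table;
}
```

With such a table, evaluating the activation for an f16 input reduces to indexing the table with the input's raw 16 bits, which trades a one-time build cost and some cache footprint for avoiding a tanh per element.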
Usage
These functions are called by ops.cpp tensor operations and by matrix multiplication routines. They are not typically called directly by user code.
Code Reference
Source Location
GGML repo, file: src/ggml-cpu/vec.cpp (630 lines).
Signature
void ggml_vec_dot_f32(int n, float * GGML_RESTRICT s, size_t bs,
const float * GGML_RESTRICT x, size_t bx,
const float * GGML_RESTRICT y, size_t by, int nrc);
void ggml_vec_dot_bf16(int n, float * GGML_RESTRICT s, size_t bs,
ggml_bf16_t * GGML_RESTRICT x, size_t bx,
ggml_bf16_t * GGML_RESTRICT y, size_t by, int nrc);
void ggml_vec_dot_f16(int n, float * GGML_RESTRICT s, size_t bs,
ggml_fp16_t * GGML_RESTRICT x, size_t bx,
ggml_fp16_t * GGML_RESTRICT y, size_t by, int nrc);
void ggml_vec_silu_f32(const int n, float * y, const float * x);
ggml_float ggml_vec_cvar_f32(const int n, float * y, const float * x, const float mean);
ggml_float ggml_vec_soft_max_f32(const int n, float * y, const float * x, float max);
ggml_float ggml_vec_log_soft_max_f32(const int n, float * y, const float * x, float max);
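For readers unfamiliar with the bf16 inputs in the signatures above, a bf16 value is the upper 16 bits of an IEEE-754 f32, so widening it is a 16-bit left shift. The following scalar sketch shows that shift and a plain dot product built on it. It is only an illustration: `bf16_bits_to_f32` and `dot_bf16_scalar` are hypothetical names, and a raw `uint16_t` stands in for `ggml_bf16_t`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen raw bf16 bits to f32 by placing them in the upper half.
static float bf16_bits_to_f32(uint16_t bits) {
    const uint32_t wide = static_cast<uint32_t>(bits) << 16;
    float out;
    std::memcpy(&out, &wide, sizeof out);
    return out;
}

// Hypothetical scalar reference: widen each pair, multiply, and accumulate in double
// to limit rounding error before narrowing the final result to float.
static float dot_bf16_scalar(int n, const uint16_t * x, const uint16_t * y) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += static_cast<double>(bf16_bits_to_f32(x[i])) *
               static_cast<double>(bf16_bits_to_f32(y[i]));
    }
    return static_cast<float>(sum);
}
```

A SIMD path would apply the same shift to a whole register of lanes at once; the scalar form is mainly useful as a correctness reference.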
Import
#include "vec.h"
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| n | int | Yes | Number of elements in the input vectors. |
| x, y | const float * (or f16/bf16) | Yes | Input vectors for the dot product. For element-wise operations, x is the input and y receives the output. |
| nrc | int | Yes (dot) | Number of rows to compute (typically 1). |
| max | float | Yes (softmax) | Maximum value for numerical stability in softmax computation. |
Outputs
| Output | Type | Description |
|---|---|---|
| s | float * | Scalar dot product result. |
| y | float * | Element-wise output vector (SiLU, softmax, variance). |
| Return value | ggml_float | Sum for softmax, variance for cvar. |
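To make the output contract concrete, here is a scalar sketch of what the SiLU and centered-variance routines are described as producing: `y` receives the element-wise result, and the variance routine additionally returns a scalar. The names `silu_ref` and `cvar_ref` are hypothetical, `double` stands in for `ggml_float`, and dividing by `n` (population variance) is an assumption rather than something this page states.

```cpp
#include <cmath>

// Hypothetical scalar reference for SiLU: y[i] = x[i] * sigmoid(x[i]).
static void silu_ref(int n, float * y, const float * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = x[i] / (1.0f + std::exp(-x[i]));
    }
}

// Hypothetical scalar reference for centered variance:
// writes x[i] - mean into y and returns the mean of the squared deviations.
static double cvar_ref(int n, float * y, const float * x, float mean) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        const float d = x[i] - mean;
        y[i] = d;                                   // centered output
        sum += static_cast<double>(d) * static_cast<double>(d);
    }
    return n > 0 ? sum / n : 0.0;                   // assumed normalization by n
}
```

For example, calling `cvar_ref` on `{1, 2, 3, 4}` with `mean = 2.5f` leaves the centered values `{-1.5, -0.5, 0.5, 1.5}` in `y` and returns their mean squared deviation.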
Usage Examples
Computing a Float Dot Product
#include "vec.h"
float x[256] = { /* ... */ };
float y[256] = { /* ... */ };
float result;
ggml_vec_dot_f32(256, &result, 0, x, 0, y, 0, 1);
// result now contains the dot product of x and y
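When checking a SIMD result like the one above, one option is to compare it against a plain scalar loop with a relative tolerance, since vectorized accumulation sums in a different order and can differ in the last few bits. The sketch below is self-contained and does not call into ggml; `dot_f32_ref` and `nearly_equal` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical scalar reference: accumulate in double, then narrow to float.
static float dot_f32_ref(int n, const float * x, const float * y) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += static_cast<double>(x[i]) * static_cast<double>(y[i]);
    }
    return static_cast<float>(sum);
}

// Hypothetical comparison: relative tolerance, with a floor of 1 on the scale
// so that values near zero are compared with an absolute tolerance.
static bool nearly_equal(float a, float b, float rel_tol) {
    const float scale = std::max(1.0f, std::max(std::fabs(a), std::fabs(b)));
    return std::fabs(a - b) <= rel_tol * scale;
}
```

A test might then assert `nearly_equal(result, dot_f32_ref(256, x, y), 1e-5f)`, with the tolerance chosen to suit the vector length and value range.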
Computing Softmax
#include "vec.h"
float logits[1024] = { /* ... */ };
float probs[1024];
float max_val = /* compute max of logits */;
ggml_float sum = ggml_vec_soft_max_f32(1024, probs, logits, max_val);
// probs[i] now holds the unnormalized exp(logits[i] - max_val) and sum is their total;
// divide each probs[i] by sum to obtain probabilities
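The full sequence around this call (find the maximum, exponentiate and sum, then normalize) can be sketched in plain C++ without ggml. This is a scalar stand-in under stated assumptions: `exp_sum_ref` mimics the described behaviour of writing exp(x[i] - max) and returning the sum, `softmax_ref` is a hypothetical wrapper, and `double` stands in for `ggml_float`.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical stand-in for the exp-and-sum step:
// writes exp(x[i] - max) into y and returns the sum of those values.
static double exp_sum_ref(int n, float * y, const float * x, float max) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        const float e = std::exp(x[i] - max);
        y[i] = e;
        sum += static_cast<double>(e);
    }
    return sum;
}

// Hypothetical wrapper: subtracting the maximum keeps every exponent <= 0,
// so exp() cannot overflow even for large logits.
static std::vector<float> softmax_ref(const std::vector<float> & logits) {
    std::vector<float> probs(logits.size());
    if (logits.empty()) {
        return probs;
    }
    const float max_val = *std::max_element(logits.begin(), logits.end());
    const double sum = exp_sum_ref(static_cast<int>(logits.size()),
                                   probs.data(), logits.data(), max_val);
    const float inv = static_cast<float>(1.0 / sum);
    for (float & p : probs) {
        p *= inv; // normalize so the outputs sum to 1
    }
    return probs;
}
```

Splitting the exponentiation from the normalization lets a caller reuse the returned sum, for instance to scale the vector in a separate pass.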
Related Pages
- Ggml_org_Ggml_Cpu_vec_api -- Header declaring these functions plus inline element-wise operations.
- Ggml_org_Ggml_Cpu_simd_mappings -- SIMD abstraction macros used by these implementations.
- Ggml_org_Ggml_Cpu_tensor_ops -- Tensor operations that call these vector primitives.