Implementation:Ggml org Ggml Cpu impl
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (Internal Header) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, CPU_Backend |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Internal header defining CPU compute parameters, MSVC SIMD compatibility macros, and ARM NEON polyfill functions for 32-bit targets.
Description
ggml-cpu-impl.h is the foundational internal header included by nearly all CPU backend source files. It provides:
- Compute parameters struct: Defines ggml_compute_params, containing the thread index (ith), thread count (nth), work buffer pointer and size, threadpool reference, and a use_ref flag for selecting reference implementations.
- MSVC SIMD feature compatibility: On MSVC, feature-test macros like __FMA__, __F16C__, __SSE3__, and __SSSE3__ are not defined even when the corresponding instructions are available via AVX/AVX2/AVX512. This header defines them so downstream code can use uniform #ifdef checks.
- ARM NEON 32-bit polyfills: Provides inline implementations of AArch64-only intrinsics for 32-bit ARM targets: vaddlvq_s16, vpaddq_s16, vpaddq_s32, vaddvq_s32, vaddvq_f32, vmaxvq_f32, vcvtnq_s32_f32, vzip1_u8, and vzip2_u8.
- Platform includes: Conditionally includes SVE headers with prctl support on Linux, s390x VXE/VXE2 defines, and the m512bh/m512i cast macros for AVX-512 BF16 support.
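The NEON polyfill idea can be sketched as follows. This is a minimal illustration, not the header's actual code: the names demo_vaddvq_f32 and demo_sum4 are hypothetical, chosen so the sketch cannot clash with the real definitions, and a scalar fallback is added only so the block also compiles on non-ARM hosts.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#if !defined(__aarch64__)
// 32-bit ARM: vaddvq_f32 is an AArch64-only intrinsic, so emulate the
// horizontal add lane by lane (hypothetical stand-in for the real polyfill).
static inline float demo_vaddvq_f32(float32x4_t v) {
    return vgetq_lane_f32(v, 0) + vgetq_lane_f32(v, 1) +
           vgetq_lane_f32(v, 2) + vgetq_lane_f32(v, 3);
}
#else
// AArch64: the native intrinsic exists, so forward to it.
#define demo_vaddvq_f32 vaddvq_f32
#endif
#endif

// Sum four floats: NEON horizontal add where available, scalar otherwise.
static float demo_sum4(const float * x) {
    assert(x != NULL);
#if defined(__ARM_NEON)
    return demo_vaddvq_f32(vld1q_f32(x));
#else
    return x[0] + x[1] + x[2] + x[3];
#endif
}
```

Callers use demo_sum4 the same way on every target; only the preprocessor decides whether the native intrinsic, the emulation, or the scalar path is compiled.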
Usage
Include this header in any CPU backend source file that needs access to the ggml_compute_params struct or requires cross-platform SIMD feature detection. Most CPU backend files already include it.
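As a rough sketch of the feature-detection pattern, the block below derives __FMA__ and __F16C__ on MSVC from the AVX-family macros that MSVC does define, then lets downstream code test a single macro. The exact grouping of conditions in the real header may differ, and DEMO_USE_FMA and demo_has_fma are hypothetical names used only for illustration.

```c
#include <stdbool.h>

// On MSVC, derive the missing feature-test macros from the AVX-family
// macros (assumed grouping; the real header may use different conditions).
#if defined(_MSC_VER) && (defined(__AVX2__) || defined(__AVX512F__))
#ifndef __FMA__
#define __FMA__
#endif
#ifndef __F16C__
#define __F16C__
#endif
#endif

// Downstream code can now use the same check on every compiler.
#if defined(__FMA__)
#define DEMO_USE_FMA 1
#else
#define DEMO_USE_FMA 0
#endif

// Hypothetical helper: reports whether the FMA code path was compiled in.
static bool demo_has_fma(void) {
    return DEMO_USE_FMA != 0;
}
```

On GCC and Clang the first block is skipped entirely, because those compilers define __FMA__ themselves when the target supports it.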
Code Reference
Source Location
GGML repo, file: src/ggml-cpu/ggml-cpu-impl.h (529 lines).
Signature
struct ggml_compute_params {
    int ith, nth; // thread index, number of threads
    size_t wsize;
    void * wdata;
    struct ggml_threadpool * threadpool;
    bool use_ref;
};
Import
#include "ggml-cpu-impl.h"
I/O Contract
Inputs
| Field | Type | Description |
|---|---|---|
| ith | int | Index of the current thread (0-based). |
| nth | int | Total number of threads participating in the computation. |
| wsize | size_t | Size of the shared work buffer in bytes. |
| wdata | void * | Pointer to the shared work buffer for all threads. |
| threadpool | struct ggml_threadpool * | Pointer to the thread pool managing execution. |
| use_ref | bool | If true, use reference (non-optimized) implementation for correctness testing. |
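Because wdata is shared by all threads, an operation that needs private scratch space has to carve out a region per thread. One possible way to do that is sketched below, under the assumption that each thread needs the same number of floats. The struct demo_params is a hypothetical stand-in for the relevant fields of ggml_compute_params, and demo_thread_scratch is not a GGML function.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for the fields of ggml_compute_params used here.
struct demo_params {
    int    ith, nth; // thread index, number of threads
    size_t wsize;    // size of the shared work buffer in bytes
    void * wdata;    // shared work buffer
};

// Return this thread's scratch region of per_thread floats,
// or NULL if the shared buffer is too small for all threads.
static float * demo_thread_scratch(const struct demo_params * p, size_t per_thread) {
    assert(p->nth > 0 && p->ith >= 0 && p->ith < p->nth);
    if ((size_t) p->nth * per_thread * sizeof(float) > p->wsize) {
        return NULL;
    }
    return (float *) p->wdata + (size_t) p->ith * per_thread;
}
```

Indexing by ith keeps the regions disjoint, so threads can write to their scratch space without synchronization.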
Outputs
| Output | Type | Description |
|---|---|---|
| N/A | N/A | This is a data structure header; outputs are indirect through its usage by compute functions. |
Usage Examples
Accessing Compute Parameters in a Forward Function
#include "ggml-cpu-impl.h"

void ggml_compute_forward_my_op(
        const struct ggml_compute_params * params,
        struct ggml_tensor * dst) {
    const int ith = params->ith; // current thread index
    const int nth = params->nth; // total threads

    // Partition the elements into contiguous chunks, one per thread.
    // ggml_nelements() returns int64_t, so keep the arithmetic in 64 bits.
    const int64_t n     = ggml_nelements(dst);
    const int64_t chunk = (n + nth - 1) / nth;
    const int64_t start = ith * chunk;
    const int64_t end   = MIN(start + chunk, n);

    for (int64_t i = start; i < end; i++) {
        // perform computation on element i
    }
}
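The chunking arithmetic above can be checked in isolation. The following sketch reproduces it without any GGML dependency; demo_range and demo_partition are hypothetical names, and the start index is additionally clamped to n so that threads with no work receive an empty range rather than one whose start exceeds its end.

```c
#include <assert.h>
#include <stdint.h>

// Half-open range [start, end) of elements assigned to one thread.
struct demo_range {
    int64_t start;
    int64_t end;
};

static struct demo_range demo_partition(int64_t n, int ith, int nth) {
    assert(nth > 0 && ith >= 0 && ith < nth);
    const int64_t chunk = (n + nth - 1) / nth; // ceil(n / nth)
    struct demo_range r;
    // Clamp both ends to n so trailing threads get an empty range.
    r.start = ith * chunk < n ? ith * chunk : n;
    r.end   = r.start + chunk < n ? r.start + chunk : n;
    return r;
}
```

For example, 10 elements over 4 threads give a chunk size of 3 and the ranges [0, 3), [3, 6), [6, 9), and [9, 10), so every element is visited exactly once.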
Related Pages
- Ggml_org_Ggml_Cpu_compute_engine -- The main compute engine that populates and dispatches ggml_compute_params.
- Ggml_org_Ggml_Cpu_simd_mappings -- SIMD abstraction macros that depend on the feature flags defined here.
- Ggml_org_Ggml_Cpu_tensor_ops -- Tensor operations that consume ggml_compute_params.