Implementation:Ggml org Ggml Cpu impl

Metadata

Field	Value
Page Type	Implementation (Internal Header)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, CPU_Backend
Last Updated	2025-05-15 12:00 GMT

Overview

Internal header defining CPU compute parameters, MSVC SIMD compatibility macros, and ARM NEON polyfill functions for 32-bit targets.

Description

ggml-cpu-impl.h is the foundational internal header included by nearly all CPU backend source files. It provides:

Compute parameters struct: Defines ggml_compute_params containing the thread index (ith), thread count (nth), work buffer pointer and size, threadpool reference, and a use_ref flag for selecting reference implementations.
MSVC SIMD feature compatibility: MSVC does not define feature-test macros such as __FMA__, __F16C__, __SSE3__, and __SSSE3__, even when the selected AVX, AVX2, or AVX-512 target implies those instructions. This header defines them in that case so downstream code can use the same #ifdef checks on every compiler.
ARM NEON 32-bit polyfills: Provides inline implementations of AArch64-only intrinsics for 32-bit ARM targets: vaddlvq_s16, vpaddq_s16, vpaddq_s32, vaddvq_s32, vaddvq_f32, vmaxvq_f32, vcvtnq_s32_f32, vzip1_u8, and vzip2_u8.
Platform includes: Conditionally includes SVE headers with prctl support on Linux, s390x VXE/VXE2 defines, and m512bh/m512i cast macros for AVX-512 BF16 support.
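To make the polyfill idea concrete, the following is a portable scalar model of what three of the listed intrinsics compute. It is not the header's code: the `model_*` types and functions are hypothetical stand-ins that use plain arrays instead of NEON registers, so the sketch compiles on any target. On a real 32-bit ARM build, a polyfill would presumably express the same reductions with lane-extraction or pairwise-add intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-ins for NEON vector types, used only in this model.
typedef struct { int16_t lane[8]; } model_s16x8;
typedef struct { int32_t lane[4]; } model_s32x4;
typedef struct { float   lane[4]; } model_f32x4;

// Models vaddlvq_s16: widen each 16-bit lane to 32 bits, then sum all 8 lanes.
static inline int32_t model_vaddlvq_s16(model_s16x8 v) {
    int32_t sum = 0;
    for (size_t i = 0; i < 8; i++) {
        sum += (int32_t) v.lane[i];
    }
    return sum;
}

// Models vaddvq_s32: sum of the four 32-bit lanes.
static inline int32_t model_vaddvq_s32(model_s32x4 v) {
    return v.lane[0] + v.lane[1] + v.lane[2] + v.lane[3];
}

// Models vmaxvq_f32: maximum across the four float lanes.
static inline float model_vmaxvq_f32(model_f32x4 v) {
    float m = v.lane[0];
    for (size_t i = 1; i < 4; i++) {
        if (v.lane[i] > m) {
            m = v.lane[i];
        }
    }
    return m;
}

// Usage: small checks with hand-computed expected values.
static void model_demo(void) {
    model_s32x4 a = {{ 1, 2, 3, 4 }};
    model_f32x4 b = {{ -1.5f, 2.0f, 0.25f, -8.0f }};
    assert(model_vaddvq_s32(a) == 10);
    assert(model_vmaxvq_f32(b) == 2.0f);
}
```

The widening step in the vaddlvq_s16 model matters: eight 16-bit lanes can sum to a value that no longer fits in 16 bits, which is why the result type is int32_t.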

Usage

Include this header in any CPU backend source file that needs the ggml_compute_params struct or uniform, cross-platform SIMD feature-test macros. Most CPU backend source files already include it.
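The feature-macro normalization described above can be sketched as follows. The exact guard conditions are an assumption (FMA and F16C tied to AVX2 or AVX-512, SSE3 and SSSE3 tied to any AVX level), and `sketch_simd_mask` with its `SKETCH_*` flags is a hypothetical helper added only to show a downstream check. The header itself may differ in detail.

```c
#include <assert.h>

// Assumed guards: MSVC exposes __AVX__/__AVX2__/__AVX512F__ but not the
// finer-grained macros, so define the implied ones when they are missing.
#if defined(_MSC_VER) && (defined(__AVX2__) || defined(__AVX512F__))
#  ifndef __FMA__
#    define __FMA__ 1
#  endif
#  ifndef __F16C__
#    define __F16C__ 1
#  endif
#endif

#if defined(_MSC_VER) && (defined(__AVX__) || defined(__AVX2__) || defined(__AVX512F__))
#  ifndef __SSE3__
#    define __SSE3__ 1
#  endif
#  ifndef __SSSE3__
#    define __SSSE3__ 1
#  endif
#endif

// Hypothetical flags for reporting which macros ended up defined.
enum {
    SKETCH_FMA   = 1 << 0,
    SKETCH_F16C  = 1 << 1,
    SKETCH_SSE3  = 1 << 2,
    SKETCH_SSSE3 = 1 << 3
};

// Downstream code uses the same #if checks on every compiler.
static inline int sketch_simd_mask(void) {
    int mask = 0;
#if defined(__FMA__)
    mask |= SKETCH_FMA;
#endif
#if defined(__F16C__)
    mask |= SKETCH_F16C;
#endif
#if defined(__SSE3__)
    mask |= SKETCH_SSE3;
#endif
#if defined(__SSSE3__)
    mask |= SKETCH_SSSE3;
#endif
    // Sanity check: SSSE3 should never be reported without SSE3.
    assert(!(mask & SKETCH_SSSE3) || (mask & SKETCH_SSE3));
    return mask;
}
```

Because the shim only adds macros that are missing, it is a no-op on GCC and Clang, where the compiler already defines them from the target flags.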

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/ggml-cpu-impl.h (529 lines).

Signature

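struct ggml_compute_params {
    int ith, nth;           // thread index, number of threads
    size_t wsize;           // size of the shared work buffer, in bytes
    void * wdata;           // shared work buffer for all threads
    struct ggml_threadpool * threadpool;  // thread pool managing execution
    bool use_ref;           // select reference (non-optimized) implementations
};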

Import

#include "ggml-cpu-impl.h"

I/O Contract

Inputs

Field	Type	Description
`ith`	`int`	Index of the current thread (0-based).
`nth`	`int`	Total number of threads participating in the computation.
`wsize`	`size_t`	Size of the shared work buffer in bytes.
`wdata`	`void *`	Pointer to the shared work buffer for all threads.
`threadpool`	`struct ggml_threadpool *`	Pointer to the thread pool managing execution.
`use_ref`	`bool`	If true, use reference (non-optimized) implementation for correctness testing.
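Since wdata is shared by all threads, a kernel that needs private scratch space has to carve out a per-thread region. The sketch below shows one way to do that, indexing by ith. It uses a local mirror of the struct (`sketch_compute_params`) so that it compiles on its own, and `sketch_thread_scratch` is a hypothetical helper, not a function from the header. Real kernels may also pad each slice, for example to a cache-line boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ggml_threadpool; // opaque in this sketch

// Local mirror of the documented struct, so the example is self-contained.
struct sketch_compute_params {
    int ith, nth;
    size_t wsize;
    void * wdata;
    struct ggml_threadpool * threadpool;
    bool use_ref;
};

// Hypothetical helper: return this thread's slice of the shared work buffer,
// or NULL if the buffer cannot hold nth slices of per_thread bytes each.
static inline void * sketch_thread_scratch(
        const struct sketch_compute_params * params, size_t per_thread) {
    assert(params->nth > 0 && params->ith >= 0 && params->ith < params->nth);
    // Division instead of multiplication avoids overflow in the size check.
    if (per_thread != 0 && params->wsize / per_thread < (size_t) params->nth) {
        return NULL;
    }
    return (char *) params->wdata + per_thread * (size_t) params->ith;
}
```

With a 64-byte buffer and four threads, thread 2 asking for 16 bytes would get the region starting at byte offset 32, while a request for 17 bytes per thread would fail because 4 * 17 exceeds the buffer size.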

Outputs

Output	Type	Description
N/A	N/A	This is a data structure header; outputs are indirect through its usage by compute functions.

Usage Examples

Accessing Compute Parameters in a Forward Function

#include "ggml-cpu-impl.h"

void ggml_compute_forward_my_op(
        const struct ggml_compute_params * params,
        struct ggml_tensor * dst) {
    const int ith = params->ith;  // current thread index
    const int nth = params->nth;  // total threads

    // Partition the elements into contiguous chunks, one per thread.
    // ggml_nelements() returns int64_t, so keep the indices 64-bit.
    const int64_t n     = ggml_nelements(dst);
    const int64_t chunk = (n + nth - 1) / nth;  // ceiling division
    const int64_t start = ith * chunk;
    const int64_t end   = MIN(start + chunk, n);

    for (int64_t i = start; i < end; i++) {
        // perform computation on element i
    }
}
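The ceiling-division partitioning in the example can be checked in isolation. The standalone sketch below factors it into a hypothetical helper, `sketch_thread_range`, which returns the half-open range owned by one thread. It also clamps the start index, so threads beyond the end of the data get an empty range rather than one with start greater than end.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical type: half-open element range [start, end) for one thread.
struct sketch_range {
    int64_t start;
    int64_t end;
};

// Split n elements into nth contiguous chunks and return the chunk for ith.
static inline struct sketch_range sketch_thread_range(int64_t n, int ith, int nth) {
    assert(n >= 0 && nth > 0 && ith >= 0 && ith < nth);
    const int64_t chunk = (n + nth - 1) / nth;  // ceiling division
    int64_t start = (int64_t) ith * chunk;
    if (start > n) {
        start = n;  // more threads than chunks: this thread has no work
    }
    int64_t end = start + chunk;
    if (end > n) {
        end = n;    // the last chunk may be shorter than the others
    }
    struct sketch_range r = { start, end };
    return r;
}
```

For n = 10 and nth = 4 the chunk size is 3, so the four threads get [0, 3), [3, 6), [6, 9), and [9, 10). Every element is covered exactly once, and the last thread simply does less work.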

Related Pages

Ggml_org_Ggml_Cpu_compute_engine -- The main compute engine that populates and dispatches ggml_compute_params.
Ggml_org_Ggml_Cpu_simd_mappings -- SIMD abstraction macros that depend on the feature flags defined here.
Ggml_org_Ggml_Cpu_tensor_ops -- Tensor operations that consume ggml_compute_params.
