Implementation:Ggml org Ggml Cpu impl

From Leeroopedia


Metadata

Field | Value
Page Type | Implementation (Internal Header)
Knowledge Sources | GGML
Domains | ML_Infrastructure, Tensor_Computing, CPU_Backend
Last Updated | 2025-05-15 12:00 GMT

Overview

Internal header defining CPU compute parameters, MSVC SIMD compatibility macros, and ARM NEON polyfill functions for 32-bit targets.

Description

ggml-cpu-impl.h is the foundational internal header included by nearly all CPU backend source files. It provides:

  1. Compute parameters struct: Defines ggml_compute_params containing the thread index (ith), thread count (nth), work buffer pointer and size, threadpool reference, and a use_ref flag for selecting reference implementations.
  2. MSVC SIMD feature compatibility: On MSVC, feature-test macros like __FMA__, __F16C__, __SSE3__, and __SSSE3__ are not defined even when the corresponding instructions are available via AVX/AVX2/AVX512. This header defines them so downstream code can use uniform #ifdef checks.
  3. ARM NEON 32-bit polyfills: Provides inline implementations of AArch64-only intrinsics for 32-bit ARM targets: vaddlvq_s16, vpaddq_s16, vpaddq_s32, vaddvq_s32, vaddvq_f32, vmaxvq_f32, vcvtnq_s32_f32, vzip1_u8, and vzip2_u8.
  4. Platform includes: Conditionally includes SVE headers with prctl support on Linux, s390x VXE/VXE2 defines, and m512bh/m512i cast macros for AVX-512 BF16 support.
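To make the third item concrete, here is a portable scalar model of what three of the listed intrinsics compute. It is a sketch for illustration only, not the header's code: the real polyfills operate on NEON vector types from arm_neon.h, whereas the ref_ functions and ref_example below are hypothetical names that work on plain arrays so the example runs on any platform.

```c
#include <assert.h>
#include <stdint.h>

// Scalar model of vaddvq_s32: horizontal sum of the four 32-bit lanes.
// (hypothetical helper; the real intrinsic takes an int32x4_t)
static int32_t ref_vaddvq_s32(const int32_t v[4]) {
    return v[0] + v[1] + v[2] + v[3];
}

// Scalar model of vpaddq_s32: add adjacent lane pairs; the pairs of `a`
// fill the low half of the result and the pairs of `b` fill the high half.
static void ref_vpaddq_s32(const int32_t a[4], const int32_t b[4], int32_t out[4]) {
    out[0] = a[0] + a[1];
    out[1] = a[2] + a[3];
    out[2] = b[0] + b[1];
    out[3] = b[2] + b[3];
}

// Scalar model of vzip1_u8: interleave the low four lanes of `a` and `b`.
static void ref_vzip1_u8(const uint8_t a[8], const uint8_t b[8], uint8_t out[8]) {
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}

// Usage: check the lane layout on small inputs.
static void ref_example(void) {
    const int32_t a[4] = {1, 2, 3, 4};
    const int32_t b[4] = {10, 20, 30, 40};
    int32_t sums[4];

    assert(ref_vaddvq_s32(a) == 10);
    ref_vpaddq_s32(a, b, sums);
    assert(sums[0] == 3 && sums[1] == 7 && sums[2] == 30 && sums[3] == 70);
}
```

On a 32-bit ARM build the same results would be produced from narrower NEON operations that do exist on that target; the scalar form above only documents the lane arithmetic.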

Usage

Include this header in any CPU backend source file that needs the ggml_compute_params struct or uniform SIMD feature macros across compilers. Most CPU backend files already include it.
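The MSVC compatibility idea from the Description can be sketched as follows. This is a minimal illustration that assumes the macros are keyed on _MSC_VER together with the AVX family macros; the exact conditions used in the header may differ. FEAT_* and simd_feature_mask are hypothetical names introduced only for this example.

```c
#include <assert.h>

// Assumption: MSVC sets __AVX__/__AVX2__/__AVX512F__ from its /arch flag but
// never sets the macros below, so define them when the wider ISA is enabled.
#if defined(_MSC_VER) && (defined(__AVX2__) || defined(__AVX512F__))
#  ifndef __FMA__
#    define __FMA__ 1
#  endif
#  ifndef __F16C__
#    define __F16C__ 1
#  endif
#endif

#if defined(_MSC_VER) && (defined(__AVX__) || defined(__AVX2__) || defined(__AVX512F__))
#  ifndef __SSE3__
#    define __SSE3__ 1
#  endif
#  ifndef __SSSE3__
#    define __SSSE3__ 1
#  endif
#endif

// Hypothetical bit flags, used only to report what was detected.
enum { FEAT_FMA = 1, FEAT_F16C = 2, FEAT_SSE3 = 4, FEAT_SSSE3 = 8 };

// Downstream code can now use the same #ifdef checks on every compiler.
static int simd_feature_mask(void) {
    int mask = 0;
#ifdef __FMA__
    mask |= FEAT_FMA;
#endif
#ifdef __F16C__
    mask |= FEAT_F16C;
#endif
#ifdef __SSE3__
    mask |= FEAT_SSE3;
#endif
#ifdef __SSSE3__
    mask |= FEAT_SSSE3;
#endif
    // SSSE3 is a superset of SSE3, so the two flags should agree.
    assert(!(mask & FEAT_SSSE3) || (mask & FEAT_SSE3));
    return mask;
}
```

On GCC and Clang the first two preprocessor blocks are inactive and the mask simply reflects the flags the compiler was invoked with.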

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/ggml-cpu-impl.h (529 lines).

Signature

struct ggml_compute_params {
    int ith, nth;           // thread index, number of threads
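    size_t wsize;           // size of the shared work buffer, in bytes
    void * wdata;           // shared work buffer for all threads
    struct ggml_threadpool * threadpool;  // thread pool managing execution
    bool use_ref;           // select the reference (non-optimized) implementation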
};

Import

#include "ggml-cpu-impl.h"

I/O Contract

Inputs

Field | Type | Description
ith | int | Index of the current thread (0-based).
nth | int | Total number of threads participating in the computation.
wsize | size_t | Size of the shared work buffer in bytes.
wdata | void * | Pointer to the shared work buffer for all threads.
threadpool | struct ggml_threadpool * | Pointer to the thread pool managing execution.
use_ref | bool | If true, use the reference (non-optimized) implementation for correctness testing.
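Because wdata is shared by all threads, an operation that needs per-thread scratch space has to carve the buffer up itself. The sketch below shows one possible way to do that using ith, nth, and wsize. It is an assumption about usage rather than something this page documents; demo_params is a hypothetical stand-in carrying only the fields the example needs, and demo_thread_scratch is an illustrative helper, not a GGML function.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in with the fields this sketch needs (not the real struct).
struct demo_params {
    int    ith, nth;
    size_t wsize;
    void * wdata;
};

// Return this thread's private slice of the shared work buffer, or NULL if
// the buffer cannot hold nth slices of per_thread bytes each.
static void * demo_thread_scratch(const struct demo_params * p, size_t per_thread) {
    assert(p->nth > 0 && p->ith >= 0 && p->ith < p->nth);
    // Division avoids overflow in the check per_thread * nth <= wsize.
    if (per_thread > p->wsize / (size_t) p->nth) {
        return NULL;
    }
    return (char *) p->wdata + (size_t) p->ith * per_thread;
}
```

With a 64-byte buffer and four threads, thread 1 asking for 16 bytes would receive the region starting at byte offset 16, while a request for 17 bytes per thread would be rejected.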

Outputs

Output | Type | Description
N/A | N/A | This is a data structure header; outputs are indirect through its usage by compute functions.

Usage Examples

Accessing Compute Parameters in a Forward Function

#include "ggml-cpu-impl.h"

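void ggml_compute_forward_my_op(
        const struct ggml_compute_params * params,
        struct ggml_tensor * dst) {
    const int ith = params->ith;  // current thread index (0-based)
    const int nth = params->nth;  // total number of threads

    // Partition the elements into contiguous chunks, one per thread.
    // ggml_nelements returns int64_t, so keep the indices 64-bit.
    const int64_t n     = ggml_nelements(dst);
    const int64_t chunk = (n + nth - 1) / nth;    // ceil(n / nth)
    const int64_t start = ith * chunk;
    const int64_t end   = MIN(start + chunk, n);  // trailing threads may get an empty range

    for (int64_t i = start; i < end; i++) {
        // perform computation on element i
    }
}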
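The partitioning arithmetic in the forward function above can be checked in isolation. The following sketch reproduces it without any GGML dependency and sums the range lengths to confirm that the per-thread ranges tile [0, n) exactly once. demo_thread_range and demo_total_covered are hypothetical helper names introduced for this example.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: compute the half-open range [*start, *end) that
// thread `ith` of `nth` owns when n elements are split into ceil(n/nth) chunks.
static void demo_thread_range(int ith, int nth, int64_t n,
                              int64_t * start, int64_t * end) {
    assert(nth > 0 && ith >= 0 && ith < nth && n >= 0);
    const int64_t chunk = (n + nth - 1) / nth;  // ceil(n / nth)
    int64_t s = ith * chunk;
    int64_t e = s + chunk;
    if (s > n) s = n;  // clamp so trailing threads get an empty range
    if (e > n) e = n;
    *start = s;
    *end   = e;
}

// Sum of the range lengths over all threads; equals n when the ranges
// cover every element exactly once.
static int64_t demo_total_covered(int nth, int64_t n) {
    int64_t total = 0;
    for (int ith = 0; ith < nth; ith++) {
        int64_t s, e;
        demo_thread_range(ith, nth, n, &s, &e);
        total += e - s;
    }
    return total;
}
```

For 10 elements and 4 threads the chunk size is 3, so the last thread handles only element 9. When there are more threads than elements, the surplus threads receive empty ranges and do no work.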
