Implementation:Ggml org Ggml Cpu compute engine

Metadata

Field	Value
Page Type	Implementation (Core Engine)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, CPU_Backend
Last Updated	2025-05-15 12:00 GMT

Overview

Main C implementation of the CPU compute backend containing the thread pool, graph computation engine, operation dispatch, and NUMA-aware scheduling.

Description

ggml-cpu.c is the central execution engine of the CPU backend. All tensor computations on CPU flow through this file's graph compute infrastructure. It provides:

Lookup table initialization: Precomputes ggml_table_f32_f16 (a 256 KB table that maps each of the 65,536 possible f16 bit patterns to its f32 value) and ggml_table_f32_e8m0_half (a 1 KB e8m0 table) so that type conversion at runtime is a single array lookup.
Portable threading layer: Implements atomic operations for MSVC (using InterlockedExchange/InterlockedCompareExchange/InterlockedExchangeAdd) alongside standard <stdatomic.h> for other compilers. Supports both POSIX pthreads and Windows threading.
Thread pool (ggml_threadpool): A reusable pool with barrier synchronization, work polling, thread affinity control, and priority management. Idle worker threads first spin, polling atomic state for new work, and then sleep until they are signalled.
NUMA topology: Detects NUMA node layout on Linux by parsing /sys/devices/system/node/, enabling thread-to-node affinity for memory locality.
Graph computation: The ggml_graph_plan() function calculates required work buffer sizes per operation and thread count. The ggml_graph_compute() function executes a computation graph by dispatching each node to its ggml_compute_forward implementation across the thread pool.
Operation dispatch: Routes each tensor operation to specialized implementations in ops.cpp, unary-ops.cpp, binary-ops.cpp, or accelerator-specific code (llamafile SGEMM, AMX, KleidiAI).
OpenMP integration: When GGML_USE_OPENMP is defined, the thread pool is replaced with OpenMP parallel regions.
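
The lookup-table idea in the first item can be illustrated with a small, self-contained sketch. It is not the code from ggml-cpu.c: the names f16_to_f32_table, f16_bits_to_f32, init_f16_table, and lookup_f16 are hypothetical, and the bit-level decoder is a portable stand-in for whatever conversion routine the real initializer uses. What it shows is why the table is 256 KB: 65,536 possible 16-bit patterns, each stored as a 4-byte float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table: one float per 16-bit pattern, 65536 * 4 bytes = 256 KB. */
static float f16_to_f32_table[1u << 16];

/* Decode one IEEE 754 binary16 bit pattern into a float (portable, no intrinsics). */
float f16_bits_to_f32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                                   /* signed zero */
        } else {
            /* Subnormal half: shift the mantissa up until the implicit bit appears. */
            int shift = 0;
            while ((man & 0x400u) == 0) { man <<= 1; shift++; }
            man &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 - shift + 1) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);           /* infinity or NaN */
    } else {
        bits = sign | ((exp + (127u - 15u)) << 23) | (man << 13);  /* normal number */
    }

    float f;
    memcpy(&f, &bits, sizeof f);                           /* reinterpret bits as float */
    return f;
}

/* Fill the table once at startup. */
void init_f16_table(void) {
    for (uint32_t i = 0; i < (1u << 16); i++) {
        f16_to_f32_table[i] = f16_bits_to_f32((uint16_t)i);
    }
}

/* Runtime conversion is a single indexed load. */
float lookup_f16(uint16_t h) {
    return f16_to_f32_table[h];
}
```

After init_f16_table() has run once, every later conversion costs one memory read, which is the trade the description refers to: a fixed 256 KB of memory in exchange for no per-element decoding.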

Usage

This file is compiled as part of the CPU backend library. Its public functions (ggml_graph_plan, ggml_graph_compute, ggml_cpu_init) are called by the backend interface layer or directly by applications that bypass the backend abstraction.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/ggml-cpu.c (3726 lines).

Signature

// Plan computation: determine work buffer size and thread requirements
struct ggml_cplan ggml_graph_plan(
    const struct ggml_cgraph * cgraph,
    int n_threads,
    struct ggml_threadpool * threadpool);

// Execute a computation graph according to a plan
enum ggml_status ggml_graph_compute(
    struct ggml_cgraph * cgraph,
    struct ggml_cplan  * cplan);

// Convenience wrapper: plan + compute with a context
enum ggml_status ggml_graph_compute_with_ctx(
    struct ggml_context * ctx,
    struct ggml_cgraph  * cgraph,
    int n_threads);

Import

#include "ggml-cpu.h"
#include "ggml.h"

I/O Contract

Inputs

Parameter	Type	Required	Description
`cgraph`	`struct ggml_cgraph *`	Yes	The computation graph containing tensor operations to execute.
`n_threads`	`int`	Yes	Number of CPU threads to use for parallel execution.
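`threadpool`	`struct ggml_threadpool *`	No	Optional pre-created thread pool. If `NULL`, `ggml_graph_compute` creates a temporary pool for the duration of the call.
`cplan.work_data`	`uint8_t *`	Conditional	Work buffer allocated by the caller with at least `work_size` bytes, as reported by `ggml_graph_plan`. Required when `work_size` is greater than 0.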

Outputs

Output	Type	Description
Return status	`enum ggml_status`	`GGML_STATUS_SUCCESS` on success, or an error code on failure/abort.
`cplan`	`struct ggml_cplan`	Populated by `ggml_graph_plan` with the required `work_size`, the `n_threads` to use, and the `threadpool` pointer. The caller fills in `work_data` before computing.
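
The plan step can be pictured with a simplified model. The sketch below is not ggml code: sketch_node, sketch_plan, sketch_graph_plan, and the 64-byte SKETCH_CACHE_LINE constant are all hypothetical, and it assumes that a single work buffer is sized for the most demanding node and then padded per thread. Because nodes execute one after another, one buffer of that size can be reused by every node.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_LINE 64   /* assumed padding unit, not read from ggml */

/* Hypothetical stand-in for a graph node: only its scratch requirement. */
struct sketch_node {
    size_t scratch_bytes;      /* temporary space the op needs while it runs */
};

/* Hypothetical stand-in for the plan returned to the caller. */
struct sketch_plan {
    size_t work_size;          /* bytes the caller must allocate */
    int    n_threads;
};

struct sketch_plan sketch_graph_plan(const struct sketch_node *nodes,
                                     int n_nodes, int n_threads) {
    assert(n_nodes >= 0 && n_threads > 0);

    /* Take the maximum over all nodes, not the sum: the buffer is shared. */
    size_t work_size = 0;
    for (int i = 0; i < n_nodes; i++) {
        if (nodes[i].scratch_bytes > work_size) {
            work_size = nodes[i].scratch_bytes;
        }
    }

    /* Assumed padding so each thread's slice can start on its own cache line. */
    if (work_size > 0) {
        work_size += (size_t)SKETCH_CACHE_LINE * (size_t)n_threads;
    }

    struct sketch_plan plan = { work_size, n_threads };
    return plan;
}
```

Under this model a graph whose nodes need 0, 4096, and 1024 bytes, planned for 4 threads, asks for 4096 + 4 * 64 bytes, and a graph with no scratch needs reports a work_size of 0, in which case no buffer has to be allocated.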

Usage Examples

Basic Graph Computation

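#include <stdlib.h>

#include "ggml.h"
#include "ggml-cpu.h"

// After building a computation graph (ctx and result come from earlier setup)...
struct ggml_cgraph * graph = ggml_new_graph(ctx);
ggml_build_forward_expand(graph, result);

// Plan the computation for 4 threads; NULL means no pre-created thread pool
struct ggml_cplan plan = ggml_graph_plan(graph, 4, NULL);

// Allocate the work buffer; work_size can be 0 when no node needs scratch space
if (plan.work_size > 0) {
    plan.work_data = malloc(plan.work_size);
    if (plan.work_data == NULL) { /* handle allocation failure */ }
}

// Execute and check the result
enum ggml_status status = ggml_graph_compute(graph, &plan);
if (status != GGML_STATUS_SUCCESS) { /* handle error or abort */ }

free(plan.work_data);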
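
How the four threads in this example share the work of a single node is decided inside the forward functions, which live outside this file. As a rough illustration only, one common scheme gives each thread a contiguous block of rows based on its index. The names row_range and rows_for_thread are hypothetical, and the real kernels may partition differently (for example by chunks or tiles).

```c
#include <assert.h>
#include <stdint.h>

/* Half-open range of rows [first, last) assigned to one thread. */
struct row_range {
    int64_t first;
    int64_t last;
};

/* Split n_rows into nth contiguous blocks and return the block for thread ith. */
struct row_range rows_for_thread(int64_t n_rows, int ith, int nth) {
    assert(nth > 0 && ith >= 0 && ith < nth && n_rows >= 0);

    /* Rows per thread, rounded up so that every row is covered. */
    int64_t per_thread = (n_rows + nth - 1) / nth;

    struct row_range r;
    r.first = per_thread * ith;
    r.last  = r.first + per_thread;

    /* Clamp: trailing threads may receive a short or empty block. */
    if (r.first > n_rows) r.first = n_rows;
    if (r.last  > n_rows) r.last  = n_rows;
    return r;
}
```

With 10 rows and 4 threads the blocks are [0, 3), [3, 6), [6, 9), and [9, 10). With fewer rows than threads, the extra threads get empty ranges and simply have nothing to do for that node.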

Using a Thread Pool

#include <stdlib.h>

#include "ggml.h"
#include "ggml-cpu.h"

// Create a reusable thread pool with 8 threads
struct ggml_threadpool_params tpp = ggml_threadpool_params_default(8);
struct ggml_threadpool * pool = ggml_threadpool_new(&tpp);

// Plan with the thread pool (graph is built as in the previous example)
struct ggml_cplan plan = ggml_graph_plan(graph, 8, pool);
if (plan.work_size > 0) {
    plan.work_data = malloc(plan.work_size);
}

enum ggml_status status = ggml_graph_compute(graph, &plan);

// The pool can serve further ggml_graph_compute calls before it is freed
free(plan.work_data);
ggml_threadpool_free(pool);
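
The barrier synchronization that the thread pool relies on between phases of work can be sketched with C11 atomics. This is a minimal illustration, not the pool's actual implementation: spin_barrier, spin_barrier_wait, and run_barrier_demo are hypothetical names, the default sequentially consistent memory order is used throughout for simplicity, and POSIX threads are assumed for the demo driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define SKETCH_MAX_THREADS 16

/* Hypothetical reusable spin barrier. */
struct spin_barrier {
    int        n_threads;
    atomic_int n_waiting;   /* threads that have arrived in the current round */
    atomic_int n_passed;    /* number of completed rounds */
};

void spin_barrier_wait(struct spin_barrier *b) {
    if (b->n_threads == 1) return;
    int round = atomic_load(&b->n_passed);
    if (atomic_fetch_add(&b->n_waiting, 1) == b->n_threads - 1) {
        /* Last arrival: reset the counter first, then release the others. */
        atomic_store(&b->n_waiting, 0);
        atomic_fetch_add(&b->n_passed, 1);
        return;
    }
    while (atomic_load(&b->n_passed) == round) {
        /* busy-wait until the last thread bumps the round counter */
    }
}

struct demo_ctx {
    struct spin_barrier *barrier;
    atomic_int          *arrived;
    int                  n_threads;
    int                  saw_all;   /* 1 if all increments were visible after the barrier */
};

static void *demo_worker(void *arg) {
    struct demo_ctx *ctx = arg;
    atomic_fetch_add(ctx->arrived, 1);          /* phase 1: do some work */
    spin_barrier_wait(ctx->barrier);            /* wait for every thread */
    ctx->saw_all = (atomic_load(ctx->arrived) == ctx->n_threads);
    return NULL;
}

/* Runs n_threads workers and returns how many saw the complete phase-1 result. */
int run_barrier_demo(int n_threads) {
    assert(n_threads >= 1 && n_threads <= SKETCH_MAX_THREADS);
    struct spin_barrier barrier;
    barrier.n_threads = n_threads;
    atomic_init(&barrier.n_waiting, 0);
    atomic_init(&barrier.n_passed, 0);
    atomic_int arrived;
    atomic_init(&arrived, 0);

    pthread_t threads[SKETCH_MAX_THREADS];
    struct demo_ctx ctx[SKETCH_MAX_THREADS];
    for (int i = 0; i < n_threads; i++) {
        ctx[i] = (struct demo_ctx){ &barrier, &arrived, n_threads, 0 };
        if (pthread_create(&threads[i], NULL, demo_worker, &ctx[i]) != 0) abort();
    }
    int ok = 0;
    for (int i = 0; i < n_threads; i++) {
        pthread_join(threads[i], NULL);
        ok += ctx[i].saw_all;
    }
    return ok;
}
```

Calling run_barrier_demo(4) should return 4, because no thread can leave the barrier before all four have finished their phase-1 increment. On some toolchains the program needs to be built with -pthread. A production pool would typically add a CPU relax hint inside the wait loop and fall back to sleeping when idle for long periods.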

Related Pages

Ggml_org_Ggml_Cpu_backend_interface -- Backend API wrapper that calls ggml_graph_plan/ggml_graph_compute.
Ggml_org_Ggml_Cpu_tensor_ops -- Operation implementations dispatched by the compute engine.
Ggml_org_Ggml_Cpu_impl -- Internal header defining ggml_compute_params used by all forward functions.
Ggml_org_Ggml_Cpu_sgemm -- llamafile SGEMM kernels invoked during matrix multiply dispatch.
