Implementation:Ggml org Ggml Cpu compute engine
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (Core Engine) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, CPU_Backend |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Main C implementation of the CPU compute backend containing the thread pool, graph computation engine, operation dispatch, and NUMA-aware scheduling.
Description
ggml-cpu.c is the central execution engine of the CPU backend. All tensor computations on CPU flow through this file's graph compute infrastructure. It provides:
- Lookup table initialization: Precomputes ggml_table_f32_f16 (256 KB f16-to-f32 conversion table) and ggml_table_f32_e8m0_half (1 KB e8m0 table) for fast type conversion at runtime.
- Portable threading layer: Implements atomic operations for MSVC (using InterlockedExchange / InterlockedCompareExchange / InterlockedExchangeAdd) alongside standard <stdatomic.h> for other compilers. Supports both POSIX pthreads and Windows threading.
- Thread pool (ggml_threadpool): A reusable pool with barrier synchronization, work polling, thread affinity control, and priority management. Threads spin or sleep waiting for work via atomic flags.
- NUMA topology: Detects NUMA node layout on Linux by parsing /sys/devices/system/node/, enabling thread-to-node affinity for memory locality.
- Graph computation: The ggml_graph_plan() function calculates required work buffer sizes per operation and thread count. The ggml_graph_compute() function executes a computation graph by dispatching each node to its ggml_compute_forward implementation across the thread pool.
- Operation dispatch: Routes each tensor operation to specialized implementations in ops.cpp, unary-ops.cpp, binary-ops.cpp, or accelerator-specific code (llamafile SGEMM, AMX, KleidiAI).
- OpenMP integration: When GGML_USE_OPENMP is defined, the thread pool is replaced with OpenMP parallel regions.
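The lookup-table idea from the first item can be illustrated with a minimal, standalone sketch. It assumes IEEE 754 binary16 inputs and uses hypothetical names (f16_bits_to_f32, f16_table_init, f16_lookup); it is not the code in ggml-cpu.c, only an example of why a table with one float per 16-bit pattern occupies 65536 x 4 bytes = 256 KB and turns each conversion into a single indexed load.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical names throughout: an illustrative sketch, not ggml's code.
#define F16_TABLE_SIZE (1u << 16)           // one entry per 16-bit pattern

static float f16_to_f32_table[F16_TABLE_SIZE];  // 65536 * 4 bytes = 256 KB
static int   f16_table_ready = 0;

// Slow path: bit-level IEEE 754 binary16 -> binary32 conversion.
static float f16_bits_to_f32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                          // signed zero
        } else {
            // Subnormal half: shift until the implicit bit appears.
            uint32_t shift = 0;
            while ((man & 0x400u) == 0) { man <<= 1; shift++; }
            man &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);  // infinity or NaN
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);  // rebias 15 -> 127
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

// Fill the table once at startup.
static void f16_table_init(void) {
    for (uint32_t i = 0; i < F16_TABLE_SIZE; i++) {
        f16_to_f32_table[i] = f16_bits_to_f32((uint16_t)i);
    }
    f16_table_ready = 1;
}

// Hot path: a single indexed load replaces the bit manipulation above.
static inline float f16_lookup(uint16_t h) {
    assert(f16_table_ready);
    return f16_to_f32_table[h];
}
```

After f16_table_init() has run once, f16_lookup(0x3C00) yields 1.0f and f16_lookup(0xC000) yields -2.0f. The trade-off is memory for speed: the table is paid for once and shared by every conversion afterwards.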
Usage
This file is compiled as part of the CPU backend library. Its public functions (ggml_graph_plan, ggml_graph_compute, ggml_cpu_init) are called by the backend interface layer or directly by applications that bypass the backend abstraction.
Code Reference
Source Location
GGML repo, file: src/ggml-cpu/ggml-cpu.c (3726 lines).
Signature
```c
// Plan computation: determine work buffer size and thread requirements
struct ggml_cplan ggml_graph_plan(
    const struct ggml_cgraph * cgraph,
    int n_threads,
    struct ggml_threadpool * threadpool);

// Execute a computation graph according to a plan
enum ggml_status ggml_graph_compute(
    struct ggml_cgraph * cgraph,
    struct ggml_cplan * cplan);

// Convenience wrapper: plan + compute with a context
enum ggml_status ggml_graph_compute_with_ctx(
    struct ggml_context * ctx,
    struct ggml_cgraph * cgraph,
    int n_threads);
```
Import
```c
#include "ggml-cpu.h"
#include "ggml.h"
```
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| cgraph | struct ggml_cgraph * | Yes | The computation graph containing tensor operations to execute. |
| n_threads | int | Yes | Number of CPU threads to use for parallel execution. |
| threadpool | struct ggml_threadpool * | No | Optional pre-created thread pool. If NULL, a default pool is created. |
| cplan.work_data | uint8_t * | Conditional | Work buffer allocated by the caller based on ggml_graph_plan output. |
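The plan-then-allocate contract behind the Conditional entry can be sketched in isolation. The example below is a simplified model built on assumptions: the types sketch_node and sketch_plan and both functions are hypothetical, and the sizing rule (largest per-node requirement, multiplied by the thread count) is a stand-in for whatever per-operation rules the real planner applies. What it illustrates is the general pattern: nodes execute one after another, so one buffer sized for the most demanding node can be reused, and no buffer is needed when the reported size is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical types for illustration only; they are not ggml's structs.
struct sketch_node {
    size_t scratch_per_thread;  // bytes of scratch each thread needs for this op
};

struct sketch_plan {
    size_t work_size;   // bytes the caller must provide (0 = no buffer needed)
    void  *work_data;   // caller-owned buffer; stays NULL when work_size == 0
    int    n_threads;
};

// Planning pass: take the maximum requirement over all nodes, because a
// single buffer is reused as the nodes run in sequence.
static struct sketch_plan sketch_graph_plan(const struct sketch_node *nodes,
                                            size_t n_nodes, int n_threads) {
    assert(n_threads > 0);
    struct sketch_plan plan = { 0, NULL, n_threads };
    for (size_t i = 0; i < n_nodes; i++) {
        size_t need = nodes[i].scratch_per_thread * (size_t)n_threads;
        if (need > plan.work_size) {
            plan.work_size = need;
        }
    }
    return plan;
}

// Allocate the buffer only when the plan asks for one.
// Returns 0 on success, -1 if the allocation fails.
static int sketch_plan_alloc(struct sketch_plan *plan) {
    if (plan->work_size == 0) {
        return 0;
    }
    plan->work_data = malloc(plan->work_size);
    return plan->work_data ? 0 : -1;
}
```

Separating planning from allocation leaves memory ownership with the caller, who can reuse one buffer across many graph evaluations instead of allocating on every call.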
Outputs
| Output | Type | Description |
|---|---|---|
| Return status | enum ggml_status | GGML_STATUS_SUCCESS on success, or an error code on failure/abort. |
| cplan | struct ggml_cplan | Populated by ggml_graph_plan with required work_size, n_threads, and operation metadata. |
Usage Examples
Basic Graph Computation
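```c
#include <stdlib.h>

#include "ggml.h"
#include "ggml-cpu.h"

// After building a computation graph...
struct ggml_cgraph * graph = ggml_new_graph(ctx);
ggml_build_forward_expand(graph, result);

// Plan the computation with 4 threads and no pre-created thread pool
struct ggml_cplan plan = ggml_graph_plan(graph, 4, NULL);

// Allocate the work buffer only if the plan requires one
if (plan.work_size > 0) {
    plan.work_data = malloc(plan.work_size);  // check for NULL in real code
}

// Execute and check the result
enum ggml_status status = ggml_graph_compute(graph, &plan);
if (status != GGML_STATUS_SUCCESS) {
    // handle failure or abort
}
free(plan.work_data);
```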
Using a Thread Pool
```c
#include <stdlib.h>

#include "ggml.h"
#include "ggml-cpu.h"

// Create a reusable thread pool with 8 threads
struct ggml_threadpool_params tpp = ggml_threadpool_params_default(8);
struct ggml_threadpool * pool = ggml_threadpool_new(&tpp);

// Plan with the thread pool
struct ggml_cplan plan = ggml_graph_plan(graph, 8, pool);
if (plan.work_size > 0) {
    plan.work_data = malloc(plan.work_size);  // check for NULL in real code
}

enum ggml_status status = ggml_graph_compute(graph, &plan);

// Release the work buffer, then the pool once no more graphs will use it
free(plan.work_data);
ggml_threadpool_free(pool);
```
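The barrier synchronization that a pool of this kind relies on can be sketched with C11 atomics. This is a minimal illustration under stated assumptions, not the implementation in ggml-cpu.c: the struct and function names are hypothetical, all operations use the default sequentially consistent ordering, and the wait loop is a bare spin where production code would typically add a CPU relax or yield hint.

```c
#include <assert.h>
#include <stdatomic.h>

// Hypothetical names: an illustrative reusable spin barrier.
struct sketch_barrier {
    atomic_int n_arrived;  // threads that have reached the current barrier
    atomic_int n_passed;   // number of barriers completed so far
    int        n_threads;  // participants expected at every barrier
};

static void sketch_barrier_init(struct sketch_barrier *b, int n_threads) {
    assert(n_threads > 0);
    atomic_init(&b->n_arrived, 0);
    atomic_init(&b->n_passed, 0);
    b->n_threads = n_threads;
}

// Every participating thread calls this between two phases of work.
static void sketch_barrier_wait(struct sketch_barrier *b) {
    // Remember which barrier generation we are in before announcing arrival.
    int passed_old = atomic_load(&b->n_passed);

    if (atomic_fetch_add(&b->n_arrived, 1) == b->n_threads - 1) {
        // Last thread to arrive: reset the counter for the next use,
        // then release everyone by advancing the generation.
        atomic_store(&b->n_arrived, 0);
        atomic_fetch_add(&b->n_passed, 1);
        return;
    }

    // Everyone else spins until the generation changes.
    while (atomic_load(&b->n_passed) == passed_old) {
        // busy-wait
    }
}
```

Resetting n_arrived before advancing n_passed is what makes the barrier reusable: a thread that races ahead into the next barrier only does so after seeing the new generation, by which point the arrival counter is already back at zero.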
Related Pages
- Ggml_org_Ggml_Cpu_backend_interface -- Backend API wrapper that calls ggml_graph_plan / ggml_graph_compute.
- Ggml_org_Ggml_Cpu_tensor_ops -- Operation implementations dispatched by the compute engine.
- Ggml_org_Ggml_Cpu_impl -- Internal header defining ggml_compute_params used by all forward functions.
- Ggml_org_Ggml_Cpu_sgemm -- llamafile SGEMM kernels invoked during matrix multiply dispatch.