Implementation:Ggml org Ggml Cpu compute engine

From Leeroopedia


Metadata

Field | Value
Page Type | Implementation (Core Engine)
Knowledge Sources | GGML
Domains | ML_Infrastructure, Tensor_Computing, CPU_Backend
Last Updated | 2025-05-15 12:00 GMT

Overview

Main C implementation of the CPU compute backend containing the thread pool, graph computation engine, operation dispatch, and NUMA-aware scheduling.

Description

ggml-cpu.c is the central execution engine of the CPU backend. All tensor computations on CPU flow through this file's graph compute infrastructure. It provides:

  1. Lookup table initialization: Precomputes ggml_table_f32_f16 (a 256 KB table holding the f32 value of every possible f16 bit pattern, i.e. 65,536 entries of 4 bytes) and ggml_table_f32_e8m0_half (a 1 KB e8m0 table) so that type conversion at runtime is a single table lookup.
  2. Portable threading layer: Implements atomic operations for MSVC (using InterlockedExchange/InterlockedCompareExchange/InterlockedExchangeAdd) alongside standard <stdatomic.h> for other compilers. Supports both POSIX pthreads and Windows threading.
  3. Thread pool (ggml_threadpool): A reusable pool with barrier synchronization, work polling, thread affinity control, and priority management. Threads spin or sleep waiting for work via atomic flags.
  4. NUMA topology: Detects NUMA node layout on Linux by parsing /sys/devices/system/node/, enabling thread-to-node affinity for memory locality.
  5. Graph computation: The ggml_graph_plan() function calculates required work buffer sizes per operation and thread count. The ggml_graph_compute() function executes a computation graph by dispatching each node to its ggml_compute_forward implementation across the thread pool.
  6. Operation dispatch: Routes each tensor operation to specialized implementations in ops.cpp, unary-ops.cpp, binary-ops.cpp, or accelerator-specific code (llamafile SGEMM, AMX, KleidiAI).
  7. OpenMP integration: When GGML_USE_OPENMP is defined, the thread pool is replaced with OpenMP parallel regions.
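
The lookup-table idea from item 1 can be illustrated with a small standalone sketch. It is not the code from ggml-cpu.c: the names demo_table_f32_f16, demo_f16_bits_to_f32, demo_init_f16_table, and demo_lookup_f16 are hypothetical, and the bit-level conversion is a generic IEEE 754 half-to-single routine chosen for illustration. What it shows is why the table is 256 KB (one 4-byte float for each of the 65,536 half-precision bit patterns) and how a conversion becomes an array index.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical table: one float per 16-bit pattern, 65536 * 4 bytes = 256 KB.
static float demo_table_f32_f16[1 << 16];

// Generic IEEE 754 binary16 -> binary32 conversion on raw bits (illustrative).
static float demo_f16_bits_to_f32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                        // signed zero
        } else {
            int shift = 0;                      // subnormal: normalize the mantissa
            while ((man & 0x400u) == 0) { man <<= 1; shift++; }
            man &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 + 1 - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13); // infinity or NaN
    } else {
        bits = sign | ((exp + 127 - 15) << 23) | (man << 13); // normal number
    }

    float f;
    memcpy(&f, &bits, sizeof f);                // reinterpret bits without aliasing issues
    return f;
}

// Fill the table once at startup.
static void demo_init_f16_table(void) {
    for (uint32_t i = 0; i < (1u << 16); ++i) {
        demo_table_f32_f16[i] = demo_f16_bits_to_f32((uint16_t) i);
    }
}

// At runtime a conversion is a single indexed load.
static inline float demo_lookup_f16(uint16_t h) {
    return demo_table_f32_f16[h];
}
```

After demo_init_f16_table() has run once, demo_lookup_f16(0x3C00) yields 1.0f. The trade-off is 256 KB of memory in exchange for avoiding the branching conversion on every element.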

Usage

This file is compiled as part of the CPU backend library. Its public functions (ggml_graph_plan, ggml_graph_compute, ggml_cpu_init) are called by the backend interface layer or directly by applications that bypass the backend abstraction.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/ggml-cpu.c (3726 lines).

Signature

// Plan computation: determine work buffer size and thread requirements
struct ggml_cplan ggml_graph_plan(
    const struct ggml_cgraph * cgraph,
    int n_threads,
    struct ggml_threadpool * threadpool);

// Execute a computation graph according to a plan
enum ggml_status ggml_graph_compute(
    struct ggml_cgraph * cgraph,
    struct ggml_cplan  * cplan);

// Convenience wrapper: plan + compute with a context
enum ggml_status ggml_graph_compute_with_ctx(
    struct ggml_context * ctx,
    struct ggml_cgraph  * cgraph,
    int n_threads);
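
The split between planning and computing can be sketched with a toy model that does not depend on GGML. Everything prefixed demo_ is hypothetical, and the per-operation scratch sizes are invented for illustration; the sketch also assumes the work buffer is shared across nodes, so the plan keeps the maximum requirement rather than the sum. It is meant only to show the contract: the plan step reports a size, the caller allocates, and the compute step uses the caller's buffer.

```c
#include <stddef.h>
#include <stdint.h>

enum demo_status { DEMO_STATUS_SUCCESS = 0, DEMO_STATUS_FAILED = -1 };
enum demo_op     { DEMO_OP_ADD, DEMO_OP_MUL_MAT, DEMO_OP_SOFT_MAX };

struct demo_node  { enum demo_op op; size_t n_elems; };
struct demo_graph { const struct demo_node * nodes; int n_nodes; };
struct demo_plan  { size_t work_size; uint8_t * work_data; int n_threads; };

// Invented scratch requirements: some ops need none, some need a shared
// buffer, some need one buffer per thread.
static size_t demo_node_work_size(const struct demo_node * node, int n_threads) {
    switch (node->op) {
        case DEMO_OP_MUL_MAT:  return node->n_elems * sizeof(float);
        case DEMO_OP_SOFT_MAX: return (size_t) n_threads * node->n_elems * sizeof(float);
        default:               return 0;
    }
}

// Plan: walk the graph and keep the largest scratch requirement.
static struct demo_plan demo_graph_plan(const struct demo_graph * g, int n_threads) {
    struct demo_plan plan = { 0, NULL, n_threads > 0 ? n_threads : 1 };
    for (int i = 0; i < g->n_nodes; ++i) {
        size_t cur = demo_node_work_size(&g->nodes[i], plan.n_threads);
        if (cur > plan.work_size) {
            plan.work_size = cur;
        }
    }
    return plan;
}

// Compute: refuse to run if the caller did not supply the required buffer.
static enum demo_status demo_graph_compute(const struct demo_graph * g,
                                           const struct demo_plan  * plan) {
    if (plan->work_size > 0 && plan->work_data == NULL) {
        return DEMO_STATUS_FAILED;
    }
    for (int i = 0; i < g->n_nodes; ++i) {
        // A real engine would dispatch g->nodes[i] to its forward function here.
    }
    return DEMO_STATUS_SUCCESS;
}
```

Keeping allocation in the caller's hands lets an application reuse one buffer across many graph evaluations instead of allocating on every call.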

Import

#include "ggml-cpu.h"
#include "ggml.h"

I/O Contract

Inputs

Parameter | Type | Required | Description
cgraph | struct ggml_cgraph * | Yes | The computation graph containing the tensor operations to execute.
n_threads | int | Yes | Number of CPU threads to use for parallel execution.
threadpool | struct ggml_threadpool * | No | Optional pre-created thread pool. If NULL, a temporary pool is created for the compute call.
cplan.work_data | uint8_t * | Conditional | Work buffer allocated by the caller. Required when the work_size reported by ggml_graph_plan is greater than zero, and must be at least that many bytes.

Outputs

Output | Type | Description
Return status | enum ggml_status | GGML_STATUS_SUCCESS on success, or an error code on failure/abort.
cplan | struct ggml_cplan | Populated by ggml_graph_plan with required work_size, n_threads, and operation metadata.

Usage Examples

Basic Graph Computation

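#include <stdlib.h>  // malloc, free

#include "ggml.h"
#include "ggml-cpu.h"

// After building a computation graph (ctx and result come from earlier setup)...
struct ggml_cgraph * graph = ggml_new_graph(ctx);
ggml_build_forward_expand(graph, result);

// Plan the computation: 4 threads, no pre-created thread pool
struct ggml_cplan plan = ggml_graph_plan(graph, 4, NULL);

// Allocate the work buffer only if the plan asks for one
if (plan.work_size > 0) {
    plan.work_data = malloc(plan.work_size);
    // handle plan.work_data == NULL (allocation failure) here
}

// Execute and check the result
enum ggml_status status = ggml_graph_compute(graph, &plan);
if (status != GGML_STATUS_SUCCESS) {
    // handle failure or abort
}

free(plan.work_data);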

Using a Thread Pool

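#include <stdlib.h>  // malloc, free

#include "ggml.h"
#include "ggml-cpu.h"

// Create a reusable thread pool with 8 threads
struct ggml_threadpool_params tpp = ggml_threadpool_params_default(8);
struct ggml_threadpool * pool = ggml_threadpool_new(&tpp);

// Plan with the thread pool; use the same thread count as the pool
struct ggml_cplan plan = ggml_graph_plan(graph, 8, pool);
if (plan.work_size > 0) {
    plan.work_data = malloc(plan.work_size);
}

enum ggml_status status = ggml_graph_compute(graph, &plan);
if (status != GGML_STATUS_SUCCESS) {
    // handle failure or abort
}

// The pool can serve further ggml_graph_compute calls before it is freed
free(plan.work_data);
ggml_threadpool_free(pool);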
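
The Description mentions that the pool's threads synchronize at barriers using atomic flags. A minimal sketch of that general technique, written with C11 <stdatomic.h> and not taken from ggml-cpu.c, is a generation-counting spin barrier. The names demo_barrier, demo_barrier_init, demo_barrier_arrive, and demo_barrier_wait are hypothetical, and the sketch spins unconditionally, whereas the source states that ggml's threads may also sleep.

```c
#include <stdatomic.h>
#include <stdbool.h>

// Hypothetical reusable barrier for a fixed number of threads.
struct demo_barrier {
    int        n_threads;   // number of participants
    atomic_int n_arrived;   // how many have reached the barrier this round
    atomic_int generation;  // incremented each time the barrier opens
};

static void demo_barrier_init(struct demo_barrier * b, int n_threads) {
    b->n_threads = n_threads;
    atomic_init(&b->n_arrived, 0);
    atomic_init(&b->generation, 0);
}

// Register arrival. Returns true if the caller was the last participant,
// in which case the counter is reset and the generation is advanced.
static bool demo_barrier_arrive(struct demo_barrier * b) {
    if (atomic_fetch_add(&b->n_arrived, 1) == b->n_threads - 1) {
        atomic_store(&b->n_arrived, 0);       // reset before releasing waiters
        atomic_fetch_add(&b->generation, 1);  // release everyone spinning below
        return true;
    }
    return false;
}

// Block (by spinning) until all participants have arrived.
static void demo_barrier_wait(struct demo_barrier * b) {
    // Read the generation before arriving so a release cannot be missed.
    int gen = atomic_load(&b->generation);
    if (demo_barrier_arrive(b)) {
        return;
    }
    while (atomic_load(&b->generation) == gen) {
        // spin; a production pool might pause, yield, or sleep here
    }
}
```

Because the arrival counter is reset before the generation changes, the same barrier object can be reused for the next graph node without reinitialization, which is the property a reusable pool needs.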
