Implementation:Ggml org Ggml Cpu backend api

Metadata

Field	Value
Page Type	Implementation (API Header)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing
Last Updated	2026-02-10 12:00 GMT

Overview

Declares the CPU backend interface, including compute plan creation, thread pool management, NUMA configuration, CPU feature detection, and data type conversion functions.

Description

ggml-cpu.h (151 lines) declares the CPU backend API. The CPU backend is the default execution backend and needs no accelerator hardware, so it can serve as a baseline alongside any other backend. The header provides:

Compute plan (ggml_cplan):

work_size / work_data -- scratch buffer: the size is computed by ggml_graph_plan(), and the memory is allocated by the caller before ggml_graph_compute() runs
n_threads / threadpool -- parallelism configuration
abort_callback -- allows early termination of computation
use_ref -- forces reference implementations for testing
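
As a rough illustration of the abort_callback field, the sketch below implements a cancellation flag. The callback shape, bool (*)(void *), is assumed to match ggml's ggml_abort_callback typedef, where returning true asks the compute loop to stop. The cancel_token type and its helper names are hypothetical, and the snippet avoids the ggml headers so that it compiles on its own.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical cancellation token shared between a controller thread and the
// compute threads. This type is not part of ggml.
struct cancel_token {
    atomic_bool cancelled;
};

// Put the token into the "not cancelled" state.
static void cancel_token_init(struct cancel_token * tok) {
    assert(tok != NULL);
    atomic_init(&tok->cancelled, false);
}

// May be called from any thread to request that the computation stop.
static void cancel_token_request(struct cancel_token * tok) {
    atomic_store(&tok->cancelled, true);
}

// Callback with the bool(void *) shape assumed for cplan.abort_callback.
// Returning true signals that the computation should be aborted.
static bool cancel_token_should_abort(void * data) {
    struct cancel_token * tok = data;
    return atomic_load(&tok->cancelled);
}
```

With ggml linked, the token would presumably be wired in by setting plan.abort_callback = cancel_token_should_abort and plan.abort_callback_data = &tok before calling ggml_graph_compute(). An aborted run is then expected to return a status other than GGML_STATUS_SUCCESS.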

NUMA configuration:

ggml_numa_strategy enum -- DISABLED, DISTRIBUTE, ISOLATE, NUMACTL, MIRROR
ggml_numa_init() / ggml_is_numa() -- detect and configure NUMA topology
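
One way an application might choose a strategy is to map a command-line option onto the enum before calling ggml_numa_init(). The sketch below uses a local stand-in enum that mirrors the strategy names listed above. Both the stand-in enum and the option strings are assumptions made for illustration, and real code would use enum ggml_numa_strategy from ggml-cpu.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Local stand-in mirroring the strategy names listed above. In real code,
// use enum ggml_numa_strategy from ggml-cpu.h instead.
enum numa_strategy_sketch {
    NUMA_SKETCH_DISABLED,
    NUMA_SKETCH_DISTRIBUTE,
    NUMA_SKETCH_ISOLATE,
    NUMA_SKETCH_NUMACTL,
    NUMA_SKETCH_MIRROR,
};

// Map a user-supplied option string (hypothetical spellings) to a strategy.
// Returns false and leaves *out untouched when the string is not recognized.
static bool parse_numa_strategy(const char * text, enum numa_strategy_sketch * out) {
    static const struct {
        const char * name;
        enum numa_strategy_sketch value;
    } table[] = {
        { "disabled",   NUMA_SKETCH_DISABLED   },
        { "distribute", NUMA_SKETCH_DISTRIBUTE },
        { "isolate",    NUMA_SKETCH_ISOLATE    },
        { "numactl",    NUMA_SKETCH_NUMACTL    },
        { "mirror",     NUMA_SKETCH_MIRROR     },
    };
    assert(text != NULL && out != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(text, table[i].name) == 0) {
            *out = table[i].value;
            return true;
        }
    }
    return false;
}
```

In a real program the parsed value would be translated to the matching ggml_numa_strategy constant and passed to ggml_numa_init() once during startup.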

Thread pool management:

ggml_threadpool_new/free/pause/resume/get_n_threads -- lifecycle and control

Graph computation:

ggml_graph_plan() -- build a ggml_cplan for the graph, including the required work buffer size (work_size)
ggml_graph_compute() -- execute the computation graph
ggml_graph_compute_with_ctx() -- convenience wrapper using context memory
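
Because the caller owns the work buffer, an application that evaluates many graphs may want to reuse one allocation instead of calling malloc() and free() for every plan. The following is a minimal sketch of such a grow-only buffer. The work_buffer type and its functions are hypothetical helpers, not part of ggml.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical grow-only scratch buffer reused across graph evaluations.
struct work_buffer {
    uint8_t * data;
    size_t    capacity;
};

// Ensure the buffer holds at least `size` bytes and return its address.
// Returns NULL if growing fails (the old buffer stays valid), or if no
// memory has ever been needed. Callers should treat NULL as an error only
// when size > 0.
static uint8_t * work_buffer_reserve(struct work_buffer * wb, size_t size) {
    assert(wb != NULL);
    if (size > wb->capacity) {
        uint8_t * grown = realloc(wb->data, size);
        if (grown == NULL) {
            return NULL;
        }
        wb->data     = grown;
        wb->capacity = size;
    }
    return wb->data;
}

// Release the allocation and reset the buffer to its empty state.
static void work_buffer_free(struct work_buffer * wb) {
    free(wb->data);
    wb->data     = NULL;
    wb->capacity = 0;
}
```

With ggml linked, the intended use would be plan.work_data = work_buffer_reserve(&wb, plan.work_size) after each ggml_graph_plan() call, with a single work_buffer_free() at shutdown.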

CPU feature detection (SIMD):

x86: SSE3, SSSE3, AVX, AVX2, AVX-VNNI, AVX-512, AVX-512-VBMI/VNNI/BF16, AMX-INT8, BMI2, F16C, FMA
ARM: NEON, ARM FMA, FP16 VA, DOTPROD, MATMUL_INT8, SVE, SME
Other: RISC-V V, VSX, VXE, WASM SIMD, llamafile
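
The detection functions all share the shape int f(void), so they can be driven from a table. The sketch below builds a space-separated summary of the enabled features from such a table. The cpu_feature_probe type, describe_cpu_features(), and the two stand-in probes are hypothetical. In real code the table entries would point at functions such as ggml_cpu_has_avx2().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// One table entry: a display name plus a probe with the int f(void) shape
// used by the ggml_cpu_has_*() functions.
struct cpu_feature_probe {
    const char * name;
    int (*probe)(void);
};

// Stand-in probes so the sketch compiles without ggml.
static int stand_in_present(void) { return 1; }
static int stand_in_absent(void)  { return 0; }

// Write the names of all features whose probe returns nonzero into buf,
// separated by single spaces. Stops early if the next name would not fit.
// Returns the number of names written.
static size_t describe_cpu_features(const struct cpu_feature_probe * probes,
                                    size_t n_probes, char * buf, size_t buf_size) {
    assert(buf != NULL && buf_size > 0);
    size_t used = 0;
    size_t enabled = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < n_probes; i++) {
        if (probes[i].probe() == 0) {
            continue;
        }
        size_t name_len = strlen(probes[i].name);
        size_t sep_len  = enabled > 0 ? 1 : 0;
        if (used + sep_len + name_len + 1 > buf_size) {
            break; // not enough room for separator, name, and terminator
        }
        if (sep_len > 0) {
            buf[used++] = ' ';
        }
        memcpy(buf + used, probes[i].name, name_len);
        used += name_len;
        buf[used] = '\0';
        enabled++;
    }
    return enabled;
}
```

A table-driven layout keeps the logging code unchanged when new probes are added, since only the table grows.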

Type traits:

ggml_type_traits_cpu -- per-type from_float, vec_dot, vec_dot_type, and nrows
ggml_get_type_traits_cpu() -- retrieves CPU-specific type traits

Data conversion:

ggml_cpu_fp32_to_fp16/bf16/i32, plus the reverse fp16-to-fp32 and bf16-to-fp32 conversions
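
To make the bf16 conversion concrete, here is a scalar sketch of the usual technique: bf16 keeps the top 16 bits of an IEEE-754 binary32 value, with round-to-nearest-even applied to the discarded bits. This is a generic illustration and not ggml's implementation. The function names are hypothetical, and the library's routines operate on arrays and may differ in details such as vectorization or NaN handling.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

// Convert one fp32 value to bf16 bits using round-to-nearest-even.
static uint16_t sketch_fp32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          // type-pun without aliasing issues
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        // NaN: keep the top bits and force the quiet bit so the payload
        // cannot be truncated into an infinity.
        return (uint16_t) ((bits >> 16) | 0x0040u);
    }
    // Add 0x7fff plus the lowest kept bit: exact ties round to even.
    bits += 0x7fffu + ((bits >> 16) & 1u);
    return (uint16_t) (bits >> 16);
}

// Convert bf16 bits back to fp32; this direction is exact.
static float sketch_bf16_to_fp32(uint16_t h) {
    uint32_t bits = (uint32_t) h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Values that need at most 8 significant bits survive the round trip unchanged. Everything else is rounded to the nearest representable bf16 value.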

Usage

Include this header in any code that needs to run GGML computation graphs on the CPU, configure threading, detect CPU SIMD capabilities, or manage compute plans.

Code Reference

Source Location

GGML repo, file: include/ggml-cpu.h, 151 lines.

Signature

// CPU backend lifecycle
GGML_BACKEND_API ggml_backend_t ggml_backend_cpu_init(void);
GGML_BACKEND_API bool ggml_backend_is_cpu(ggml_backend_t backend);
GGML_BACKEND_API void ggml_backend_cpu_set_n_threads(ggml_backend_t backend_cpu,
                                                      int n_threads);
GGML_BACKEND_API ggml_backend_reg_t ggml_backend_cpu_reg(void);

// Graph computation
GGML_BACKEND_API struct ggml_cplan ggml_graph_plan(
    const struct ggml_cgraph * cgraph, int n_threads,
    struct ggml_threadpool * threadpool);
GGML_BACKEND_API enum ggml_status ggml_graph_compute(
    struct ggml_cgraph * cgraph, struct ggml_cplan * cplan);

// CPU feature detection (representative subset)
GGML_BACKEND_API int ggml_cpu_has_avx2(void);
GGML_BACKEND_API int ggml_cpu_has_neon(void);
GGML_BACKEND_API int ggml_cpu_has_sve(void);

// Type traits
GGML_BACKEND_API const struct ggml_type_traits_cpu *
    ggml_get_type_traits_cpu(enum ggml_type type);

Import

#include "ggml-cpu.h"

Dependencies

ggml.h -- core GGML types
ggml-backend.h -- backend abstraction types

I/O Contract

Inputs

Parameter	Type	Required	Description
`cgraph`	`const ggml_cgraph *`	Yes (for graph_plan)	Computation graph to plan or execute.
`n_threads`	`int`	Yes (for graph_plan)	Number of threads for parallel execution.
`threadpool`	`ggml_threadpool *`	No	Optional thread pool (NULL for default).
`cplan`	`ggml_cplan *`	Yes (for graph_compute)	Pre-computed plan with allocated work buffer.

Outputs

Output	Type	Description
Backend handle	`ggml_backend_t`	Initialized CPU backend instance.
Compute plan	`ggml_cplan`	Plan struct with required `work_size` for the given graph.
Status	`ggml_status`	Success or failure of graph computation.
Feature flag	`int`	1 if the CPU supports the queried SIMD feature, 0 otherwise.

Usage Examples

CPU Backend with Graph Computation

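#include <stdlib.h>
#include "ggml-cpu.h"

// Initialize CPU backend (for use with the ggml-backend API)
ggml_backend_t cpu = ggml_backend_cpu_init();
ggml_backend_cpu_set_n_threads(cpu, 4);

// Plan and compute a graph directly; `graph` is an already built struct ggml_cgraph *
struct ggml_cplan plan = ggml_graph_plan(graph, 4, NULL);
if (plan.work_size > 0) {
    plan.work_data = malloc(plan.work_size);  // caller owns the scratch buffer
}
enum ggml_status status = ggml_graph_compute(graph, &plan);
free(plan.work_data);
if (status != GGML_STATUS_SUCCESS) {
    // handle an aborted or failed computation
}

// Release the backend handle when done
ggml_backend_free(cpu);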

Feature Detection

#include <stdio.h>
#include "ggml-cpu.h"

if (ggml_cpu_has_avx2()) {
    printf("AVX2 available\n");
}
if (ggml_cpu_has_neon()) {
    printf("ARM NEON available\n");
}
