Implementation:Ggml org Ggml Cpu backend api
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (API Header) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing |
| Last Updated | 2026-02-10 12:00 GMT |
Overview
Declares the CPU backend interface, including compute plan creation, thread pool management, NUMA configuration, CPU feature detection, and data type conversion functions.
Description
ggml-cpu.h (151 lines) defines the API of the CPU backend, which is always available. The CPU backend is the default execution backend and the baseline that other backends build on. The header provides:
Compute plan (ggml_cplan):
- work_size / work_data -- scratch buffer; size calculated by ggml_graph_plan(), memory allocated by the caller
- n_threads / threadpool -- parallelism configuration
- abort_callback -- allows early termination of computation
- use_ref -- forces reference implementations for testing
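The abort mechanism can be pictured with a small standalone sketch. It assumes the callback takes an opaque user-data pointer and returns true to request termination; the names poll_budget, budget_exhausted, and run_steps are hypothetical, and run_steps is only a stand-in for a compute loop, not GGML code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state handed to the callback through its user-data pointer. */
struct poll_budget {
    int polls_left;   /* polls allowed before an abort is requested */
};

/* Callback with the assumed shape bool (*)(void * data):
 * returning true asks the computation to stop early. */
bool budget_exhausted(void * data) {
    struct poll_budget * b = (struct poll_budget *) data;
    if (b->polls_left <= 0) {
        return true;
    }
    b->polls_left--;
    return false;
}

/* Stand-in for a compute loop (NOT GGML code): polls the callback before
 * each step and returns how many steps were completed. */
int run_steps(int n_steps, bool (*abort_cb)(void *), void * abort_data) {
    int done = 0;
    for (int i = 0; i < n_steps; i++) {
        if (abort_cb != NULL && abort_cb(abort_data)) {
            break;   /* early termination requested */
        }
        done++;
    }
    return done;
}
```

A callback of this shape keeps its state in caller-owned memory, so the same function can serve a deadline, a cancellation flag, or a step budget without any global variables.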
NUMA configuration:
- ggml_numa_strategy enum -- DISABLED, DISTRIBUTE, ISOLATE, NUMACTL, MIRROR
- ggml_numa_init() / ggml_is_numa() -- detect and configure NUMA topology
Thread pool management:
- ggml_threadpool_new/free/pause/resume/get_n_threads -- lifecycle and control
Graph computation:
- ggml_graph_plan() -- compute the required work buffer size
- ggml_graph_compute() -- execute the computation graph
- ggml_graph_compute_with_ctx() -- convenience wrapper using context memory
CPU feature detection (SIMD):
- x86: SSE3, SSSE3, AVX, AVX2, AVX-VNNI, AVX-512, AVX-512-VBMI/VNNI/BF16, AMX-INT8, BMI2, F16C, FMA
- ARM: NEON, ARM FMA, FP16 VA, DOTPROD, MATMUL_INT8, SVE, SME
- Other: RISC-V V, VSX, VXE, WASM SIMD, llamafile
Type traits:
- ggml_type_traits_cpu -- per-type from_float, vec_dot, vec_dot_type, and nrows
- ggml_get_type_traits_cpu() -- retrieves CPU-specific type traits
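The per-type traits idea is essentially a table of function pointers indexed by a type enum. The sketch below illustrates that pattern only: every name prefixed with sketch_ is hypothetical, the two types are toy examples, and the function signatures are simplified and do not match GGML's actual ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-entry type enum (not GGML's ggml_type). */
enum sketch_type { SKETCH_TYPE_F32 = 0, SKETCH_TYPE_I8 = 1, SKETCH_TYPE_COUNT };

/* Simplified signatures, assumed for illustration only. */
typedef void  (*sketch_from_float_t)(const float * x, void * y, int64_t n);
typedef float (*sketch_vec_dot_t)(int64_t n, const void * a, const void * b);

struct sketch_type_traits {
    sketch_from_float_t from_float;   /* convert a float row into this type */
    sketch_vec_dot_t    vec_dot;      /* dot product of two rows of this type */
    enum sketch_type    vec_dot_type; /* type the second operand must have */
    int64_t             nrows;        /* rows processed per call */
};

void f32_from_float(const float * x, void * y, int64_t n) {
    float * out = (float *) y;
    for (int64_t i = 0; i < n; i++) out[i] = x[i];
}

float f32_vec_dot(int64_t n, const void * a, const void * b) {
    const float * fa = (const float *) a;
    const float * fb = (const float *) b;
    float sum = 0.0f;
    for (int64_t i = 0; i < n; i++) sum += fa[i] * fb[i];
    return sum;
}

/* Toy int8 conversion: clamp to [-127, 127], round half away from zero. */
void i8_from_float(const float * x, void * y, int64_t n) {
    int8_t * out = (int8_t *) y;
    for (int64_t i = 0; i < n; i++) {
        float v = x[i];
        if (v >  127.0f) v =  127.0f;
        if (v < -127.0f) v = -127.0f;
        out[i] = (int8_t) (v >= 0.0f ? v + 0.5f : v - 0.5f);
    }
}

float i8_vec_dot(int64_t n, const void * a, const void * b) {
    const int8_t * qa = (const int8_t *) a;
    const int8_t * qb = (const int8_t *) b;
    int32_t sum = 0;   /* accumulate in a wider integer type */
    for (int64_t i = 0; i < n; i++) sum += (int32_t) qa[i] * (int32_t) qb[i];
    return (float) sum;
}

static const struct sketch_type_traits SKETCH_TRAITS[SKETCH_TYPE_COUNT] = {
    [SKETCH_TYPE_F32] = { f32_from_float, f32_vec_dot, SKETCH_TYPE_F32, 1 },
    [SKETCH_TYPE_I8]  = { i8_from_float,  i8_vec_dot,  SKETCH_TYPE_I8,  1 },
};

/* Lookup by type, in the spirit of a get-traits accessor. */
const struct sketch_type_traits * sketch_get_type_traits(enum sketch_type t) {
    return &SKETCH_TRAITS[t];
}
```

With such a table, calling code converts and multiplies rows through the looked-up pointers and never branches on the concrete type itself.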
Data conversion:
- ggml_cpu_fp32_to_fp16/bf16/i32 and reverse conversions
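To make concrete what an fp32/bf16 conversion involves, here is a minimal standalone sketch of the number format. It is not GGML's implementation: the sketch_ names are hypothetical, round-to-nearest-even is an assumed rounding mode, and the (source, destination, count) shape of the row function is an assumption about how such array conversions are typically declared.

```c
#include <stdint.h>
#include <string.h>

/* fp32 -> bf16 bits: keep the upper 16 bits of the fp32 pattern, rounding
 * to nearest even. NaN is handled first so rounding cannot turn it into
 * infinity. */
uint16_t sketch_fp32_to_bf16(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);                     /* reinterpret bits safely */
    if ((u & 0x7fffffffu) > 0x7f800000u) {        /* NaN */
        return (uint16_t) ((u >> 16) | 0x0040u);  /* force a quiet NaN */
    }
    u += 0x7fffu + ((u >> 16) & 1u);              /* round to nearest even */
    return (uint16_t) (u >> 16);
}

/* bf16 bits -> fp32: exact, since bf16 is a truncated fp32. */
float sketch_bf16_to_fp32(uint16_t h) {
    uint32_t u = (uint32_t) h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Row form with an assumed (source, destination, count) argument order. */
void sketch_fp32_to_bf16_row(const float * src, uint16_t * dst, int64_t n) {
    for (int64_t i = 0; i < n; i++) {
        dst[i] = sketch_fp32_to_bf16(src[i]);
    }
}
```

Because bf16 keeps the fp32 exponent and only shortens the mantissa, the conversion back to fp32 is lossless, while the forward direction loses precision but not range.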
Usage
Include this header in any code that needs to run GGML computation graphs on the CPU, configure threading, detect CPU SIMD capabilities, or manage compute plans.
Code Reference
Source Location
GGML repo, file: include/ggml-cpu.h, 151 lines.
Signature
// CPU backend lifecycle
GGML_BACKEND_API ggml_backend_t ggml_backend_cpu_init(void);
GGML_BACKEND_API bool ggml_backend_is_cpu(ggml_backend_t backend);
GGML_BACKEND_API void ggml_backend_cpu_set_n_threads(ggml_backend_t backend_cpu,
int n_threads);
GGML_BACKEND_API ggml_backend_reg_t ggml_backend_cpu_reg(void);
// Graph computation
GGML_BACKEND_API struct ggml_cplan ggml_graph_plan(
const struct ggml_cgraph * cgraph, int n_threads,
struct ggml_threadpool * threadpool);
GGML_BACKEND_API enum ggml_status ggml_graph_compute(
struct ggml_cgraph * cgraph, struct ggml_cplan * cplan);
// CPU feature detection (representative subset)
GGML_BACKEND_API int ggml_cpu_has_avx2(void);
GGML_BACKEND_API int ggml_cpu_has_neon(void);
GGML_BACKEND_API int ggml_cpu_has_sve(void);
// Type traits
GGML_BACKEND_API const struct ggml_type_traits_cpu *
ggml_get_type_traits_cpu(enum ggml_type type);
Import
#include "ggml-cpu.h"
Dependencies
- ggml.h -- core GGML types
- ggml-backend.h -- backend abstraction types
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| cgraph | const ggml_cgraph * | Yes (for graph_plan) | Computation graph to plan or execute. |
| n_threads | int | Yes (for graph_plan) | Number of threads for parallel execution. |
| threadpool | ggml_threadpool * | No | Optional thread pool (NULL for default). |
| cplan | ggml_cplan * | Yes (for graph_compute) | Pre-computed plan with allocated work buffer. |
Outputs
| Output | Type | Description |
|---|---|---|
| Backend handle | ggml_backend_t | Initialized CPU backend instance. |
| Compute plan | ggml_cplan | Plan struct with required work_size for the given graph. |
| Status | ggml_status | Success or failure of graph computation. |
| Feature flag | int | 1 if the CPU supports the queried SIMD feature, 0 otherwise. |
Usage Examples
CPU Backend with Graph Computation
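#include <stdlib.h>
#include "ggml-cpu.h"
// Initialize CPU backend (for use with the ggml-backend API)
ggml_backend_t cpu = ggml_backend_cpu_init();
ggml_backend_cpu_set_n_threads(cpu, 4);
// Plan and compute a graph; `graph` is a struct ggml_cgraph * built elsewhere
struct ggml_cplan plan = ggml_graph_plan(graph, 4, NULL);
if (plan.work_size > 0) {
    plan.work_data = malloc(plan.work_size);  // caller owns the scratch buffer
}
enum ggml_status status = ggml_graph_compute(graph, &plan);
free(plan.work_data);
if (status != GGML_STATUS_SUCCESS) {
    // handle failure or early abort
}
ggml_backend_free(cpu);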
Feature Detection
#include <stdio.h>
#include "ggml-cpu.h"
if (ggml_cpu_has_avx2()) {
    printf("AVX2 available\n");
}
if (ggml_cpu_has_neon()) {
    printf("ARM NEON available\n");
}