Implementation:Ggml org Ggml Cpu tensor ops
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (Tensor Operations) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, CPU_Backend |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Implements the CPU forward-compute functions for nearly all tensor operations, including dup, normalization, matrix multiply, attention, convolution, pooling, and more.
Description
ops.cpp is the largest source file in the CPU backend (10,900 lines), containing implementations of nearly all tensor operations needed for ML inference and training on CPU. Key operation categories include:
- Data movement: ggml_compute_forward_dup, ggml_compute_forward_cpy, ggml_compute_forward_cont, ggml_compute_forward_get_rows, ggml_compute_forward_set_rows, ggml_compute_forward_concat.
- Arithmetic: ggml_compute_forward_add, ggml_compute_forward_add1, ggml_compute_forward_acc, ggml_compute_forward_scale.
- Reductions: ggml_compute_forward_sum, ggml_compute_forward_sum_rows, ggml_compute_forward_mean, ggml_compute_forward_argmax, ggml_compute_forward_count_equal.
- Normalization: ggml_compute_forward_norm, ggml_compute_forward_rms_norm, ggml_compute_forward_group_norm, ggml_compute_forward_l2_norm.
- Matrix operations: ggml_compute_forward_out_prod, ggml_compute_forward_set.
- Attention: Flash attention with tiled implementation.
- Convolution/Pooling: ggml_compute_forward_conv_*, im2col, pooling.
- Positional encoding: RoPE (Rotary Positional Encoding) with multiple modes.
- Sequence models: SSM scan/conv, RWKV WKV kernels.
- Training: AdamW, SGD optimizer steps, cross-entropy loss.
Each function takes ggml_compute_params and a destination tensor, reads source tensors from dst->src[], and performs parallelized computation by splitting work across rows or blocks based on params->ith/params->nth. C++ templates handle type dispatch across f32, f16, and bf16 formats.
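The row-splitting pattern described above can be illustrated with a minimal sketch. It is not code from ops.cpp: params_sketch, row_range, and scale_rows are hypothetical names, and the even ceil-division split is only one plausible partitioning scheme. It assumes a contiguous f32 matrix and mirrors just the ith/nth fields of ggml_compute_params.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the two fields of ggml_compute_params used here.
struct params_sketch {
    int ith; // index of the calling thread, 0 <= ith < nth
    int nth; // total number of threads working on this operation
};

// Returns the half-open row range [first, last) owned by thread ith.
// Assumption: rows are split into nth nearly equal contiguous chunks.
inline std::pair<int64_t, int64_t> row_range(const params_sketch & p, int64_t nrows) {
    assert(p.nth > 0 && p.ith >= 0 && p.ith < p.nth);
    const int64_t per_thread = (nrows + p.nth - 1) / p.nth; // ceil(nrows / nth)
    const int64_t first = std::min<int64_t>(per_thread * p.ith, nrows);
    const int64_t last  = std::min<int64_t>(first + per_thread, nrows);
    return std::make_pair(first, last);
}

// Example operation: scale every element of a contiguous row-major matrix.
// Each thread writes only the rows in its own range, so no locking is needed.
inline void scale_rows(const params_sketch & p, const float * src, float * dst,
                       int64_t nrows, int64_t ncols, float s) {
    const std::pair<int64_t, int64_t> range = row_range(p, nrows);
    for (int64_t r = range.first; r < range.second; ++r) {
        for (int64_t c = 0; c < ncols; ++c) {
            dst[r * ncols + c] = src[r * ncols + c] * s;
        }
    }
}
```

Every thread calls the same function with its own ith; threads whose range is empty (more threads than rows) simply do no work.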
Usage
These functions are called indirectly through the compute engine's dispatch table. They are not meant to be called directly by user code.
Code Reference
Source Location
GGML repo, file: src/ggml-cpu/ops.cpp (10,900 lines).
Signature
// Representative signatures (all follow the same pattern):
void ggml_compute_forward_dup(const ggml_compute_params * params, ggml_tensor * dst);
void ggml_compute_forward_add(const ggml_compute_params * params, ggml_tensor * dst);
void ggml_compute_forward_rms_norm(const ggml_compute_params * params, ggml_tensor * dst);
void ggml_compute_forward_soft_max(const ggml_compute_params * params, ggml_tensor * dst);
void ggml_compute_forward_rope(const ggml_compute_params * params, ggml_tensor * dst);
void ggml_compute_forward_get_rows(const ggml_compute_params * params, ggml_tensor * dst);
void ggml_compute_forward_concat(const ggml_compute_params * params, ggml_tensor * dst);
Import
#include "ops.h"
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| params | const ggml_compute_params * | Yes | Thread index, thread count, work buffer, and threadpool reference. |
| dst | ggml_tensor * | Yes | Destination tensor; source tensors are accessed via dst->src[0], dst->src[1], etc. |
Outputs
| Output | Type | Description |
|---|---|---|
| dst->data | void * | The destination tensor's data buffer is filled with the operation result. |
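To make the contract concrete, here is a minimal sketch of an operation that reads its inputs from dst->src[] and writes its result into dst->data, using a template for the per-type kernel. The types and names (sketch_tensor, sketch_type, add_impl, forward_add_sketch) are hypothetical and greatly simplified. The real ggml_tensor carries shapes, strides, and more types, and f32/i32 are used here only as stand-ins for the f32/f16/bf16 dispatch mentioned earlier.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified element types (stand-ins for ggml's type enum).
enum sketch_type { SKETCH_F32, SKETCH_I32 };

// Hypothetical, simplified tensor: 1-D and contiguous for brevity.
struct sketch_tensor {
    sketch_type     type;
    int64_t         ne;     // number of elements
    void *          data;   // raw buffer, interpreted according to `type`
    sketch_tensor * src[2]; // operands of the op that produces this tensor
};

// Typed kernel: reads both operands through dst->src[], writes dst->data.
template <typename T>
static void add_impl(sketch_tensor * dst) {
    const T * a   = static_cast<const T *>(dst->src[0]->data);
    const T * b   = static_cast<const T *>(dst->src[1]->data);
    T *       out = static_cast<T *>(dst->data);
    for (int64_t i = 0; i < dst->ne; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Entry point: selects the template instantiation from the source type.
inline void forward_add_sketch(sketch_tensor * dst) {
    assert(dst->src[0] != nullptr && dst->src[1] != nullptr);
    switch (dst->src[0]->type) {
        case SKETCH_F32: add_impl<float>(dst);   break;
        case SKETCH_I32: add_impl<int32_t>(dst); break;
    }
}
```

The caller passes only the destination tensor; the operands travel with it, which is why every signature in this file can share the same two-parameter shape.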
Usage Examples
How Operations Are Dispatched (Internal)
// Inside the compute engine (ggml-cpu.c), each graph node is dispatched:
switch (node->op) {
    case GGML_OP_DUP:
        ggml_compute_forward_dup(&params, node);
        break;
    case GGML_OP_ADD:
        ggml_compute_forward_add(&params, node);
        break;
    case GGML_OP_RMS_NORM:
        ggml_compute_forward_rms_norm(&params, node);
        break;
    // ... ~80 more operations
}
Related Pages
- Ggml_org_Ggml_Cpu_compute_engine -- The graph compute engine that dispatches to these operations.
- Ggml_org_Ggml_Cpu_unary_ops -- Element-wise unary operations (abs, relu, sigmoid, etc.).
- Ggml_org_Ggml_Cpu_vec_api -- Vectorized math primitives used by these operations.
- Ggml_org_Ggml_Cpu_quantization -- Quantization functions used by matrix multiply and get_rows.