Implementation:Ggml org Ggml Cpu tensor ops

Metadata

Field	Value
Page Type	Implementation (Tensor Operations)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, CPU_Backend
Last Updated	2025-05-15 12:00 GMT

Overview

Implements the CPU forward-compute functions for most tensor operations, including dup, normalization, matrix operations, attention, convolution, pooling, and more.

Description

ops.cpp is the largest source file in the CPU backend (10,900 lines), containing implementations of nearly all tensor operations needed for ML inference and training on CPU. Key operation categories include:

Data movement: ggml_compute_forward_dup, ggml_compute_forward_cpy, ggml_compute_forward_cont, ggml_compute_forward_get_rows, ggml_compute_forward_set_rows, ggml_compute_forward_concat.
Arithmetic: ggml_compute_forward_add, ggml_compute_forward_add1, ggml_compute_forward_acc, ggml_compute_forward_scale.
Reductions: ggml_compute_forward_sum, ggml_compute_forward_sum_rows, ggml_compute_forward_mean, ggml_compute_forward_argmax, ggml_compute_forward_count_equal.
Normalization: ggml_compute_forward_norm, ggml_compute_forward_rms_norm, ggml_compute_forward_group_norm, ggml_compute_forward_l2_norm.
Matrix operations: ggml_compute_forward_out_prod, ggml_compute_forward_set.
Attention: Flash attention with tiled implementation.
Convolution/Pooling: ggml_compute_forward_conv_*, im2col, pooling.
Positional encoding: RoPE (Rotary Position Embedding) with multiple modes.
Sequence models: SSM scan/conv, RWKV WKV kernels.
Training: AdamW, SGD optimizer steps, cross-entropy loss.

Each function takes a const ggml_compute_params pointer and a destination tensor, reads its source tensors from dst->src[], and parallelizes the computation by splitting work across rows or blocks according to params->ith (thread index) and params->nth (thread count). C++ templates handle type dispatch across the f32, f16, and bf16 formats.
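The row-splitting scheme described above can be sketched as follows. This is a minimal illustration, not code copied from ops.cpp: row_range is a hypothetical helper name, and the clamping of the start index is an assumption added so that surplus threads receive an empty range.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical helper (not a ggml function): returns the half-open row range
// [first, second) that thread `ith` out of `nth` threads would process.
static std::pair<int, int> row_range(int nr, int ith, int nth) {
    assert(nr >= 0 && nth > 0 && ith >= 0 && ith < nth);

    const int dr  = (nr + nth - 1) / nth;     // rows per thread, rounded up
    const int ir0 = std::min(dr * ith, nr);   // first row for this thread
    const int ir1 = std::min(ir0 + dr, nr);   // one past the last row

    return std::make_pair(ir0, ir1);
}
```

With nr = 10 rows and nth = 4 threads, the ranges are [0, 3), [3, 6), [6, 9), and [9, 10). When there are more threads than rows, the trailing threads get an empty range and do no work.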

Usage

These functions are called indirectly: when a graph is computed, the compute engine dispatches each node to the matching function based on the node's op. They are not meant to be called directly by user code.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/ops.cpp (10,900 lines).

Signature

// Representative signatures (all follow the same pattern):
void ggml_compute_forward_dup(const ggml_compute_params * params, ggml_tensor * dst);
void ggml_compute_forward_add(const ggml_compute_params * params, ggml_tensor * dst);
void ggml_compute_forward_rms_norm(const ggml_compute_params * params, ggml_tensor * dst);
void ggml_compute_forward_soft_max(const ggml_compute_params * params, ggml_tensor * dst);
void ggml_compute_forward_rope(const ggml_compute_params * params, ggml_tensor * dst);
void ggml_compute_forward_get_rows(const ggml_compute_params * params, ggml_tensor * dst);
void ggml_compute_forward_concat(const ggml_compute_params * params, ggml_tensor * dst);

Import

#include "ops.h"

I/O Contract

Inputs

Parameter	Type	Required	Description
`params`	`const ggml_compute_params *`	Yes	Thread index, thread count, work buffer, and threadpool reference.
`dst`	`ggml_tensor *`	Yes	Destination tensor; source tensors are accessed via `dst->src[0]`, `dst->src[1]`, etc.

Outputs

Output	Type	Description
`dst->data`	`void *`	The destination tensor's data buffer is filled with the operation result.
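
To make the contract concrete, here is a minimal sketch of an operation with the same (params, dst) shape. mock_tensor, mock_params, mock_forward_scale, and run_mock_scale are all hypothetical stand-ins invented for this example. The real ggml structs carry more state (element types, byte strides, a work buffer, a threadpool reference), and the real scale operation may store its factor differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ggml_tensor: a contiguous 2-D f32 tensor.
struct mock_tensor {
    int           ne0;     // elements per row
    int           nrows;   // number of rows
    float         scale;   // stand-in for an operation parameter
    mock_tensor * src[2];  // source tensors, as in dst->src[]
    void *        data;    // data buffer
};

// Hypothetical stand-in for ggml_compute_params.
struct mock_params {
    int ith;  // index of the calling thread
    int nth;  // total number of threads
};

// Reads dst->src[0], writes this thread's share of rows into dst->data.
static void mock_forward_scale(const mock_params * params, mock_tensor * dst) {
    const mock_tensor * src0 = dst->src[0];
    assert(src0 && src0->ne0 == dst->ne0 && src0->nrows == dst->nrows);

    const int nr  = src0->nrows;
    const int dr  = (nr + params->nth - 1) / params->nth;
    const int ir0 = std::min(dr * params->ith, nr);
    const int ir1 = std::min(ir0 + dr, nr);

    const float * x = static_cast<const float *>(src0->data);
    float *       y = static_cast<float *>(dst->data);
    for (int ir = ir0; ir < ir1; ++ir) {
        for (int i = 0; i < dst->ne0; ++i) {
            y[ir * dst->ne0 + i] = x[ir * dst->ne0 + i] * dst->scale;
        }
    }
}

// Calls the op once per "thread", sequentially, purely for illustration.
static std::vector<float> run_mock_scale(std::vector<float> input, int ne0, float scale, int nth) {
    assert(ne0 > 0 && nth > 0 && input.size() % static_cast<std::size_t>(ne0) == 0);
    std::vector<float> output(input.size(), 0.0f);

    mock_tensor src = mock_tensor();
    src.ne0 = ne0;
    src.nrows = static_cast<int>(input.size()) / ne0;
    src.data = input.data();

    mock_tensor dst = mock_tensor();
    dst.ne0 = ne0;
    dst.nrows = src.nrows;
    dst.scale = scale;
    dst.src[0] = &src;
    dst.data = output.data();

    for (int ith = 0; ith < nth; ++ith) {
        mock_params params = {ith, nth};
        mock_forward_scale(&params, &dst);
    }
    return output;
}
```

Each call fills only its own rows, so the destination buffer is complete once every thread index from 0 to nth - 1 has run.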

Usage Examples

How Operations Are Dispatched (Internal)

// Inside the compute engine (ggml-cpu.c), each graph node is dispatched:
switch (node->op) {
    case GGML_OP_DUP:
        ggml_compute_forward_dup(&params, node);
        break;
    case GGML_OP_ADD:
        ggml_compute_forward_add(&params, node);
        break;
    case GGML_OP_RMS_NORM:
        ggml_compute_forward_rms_norm(&params, node);
        break;
    // ... ~80 more operations
}
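
The Description notes that C++ templates handle type dispatch inside individual operations. A rough sketch of that idea follows, assuming a second switch on the tensor's element type that selects a templated kernel. All names here (mock_type, mock_bf16, add_typed, mock_forward_add) are hypothetical. The bf16 conversion truncates the low mantissa bits for brevity, whereas a production conversion may round.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical element types for this sketch only.
enum mock_type { MOCK_TYPE_F32, MOCK_TYPE_BF16 };
struct mock_bf16 { uint16_t bits; };

// Conversions to f32: bf16 holds the upper 16 bits of an IEEE-754 f32.
static float to_f32(float v) { return v; }
static float to_f32(mock_bf16 v) {
    const uint32_t u = static_cast<uint32_t>(v.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Conversions from f32 (bf16 by truncation, a simplification).
template <typename T> T from_f32(float v);
template <> float from_f32<float>(float v) { return v; }
template <> mock_bf16 from_f32<mock_bf16>(float v) {
    uint32_t u;
    std::memcpy(&u, &v, sizeof u);
    mock_bf16 r;
    r.bits = static_cast<uint16_t>(u >> 16);
    return r;
}

// One templated kernel serves every element type: compute in f32, store as T.
template <typename T>
static void add_typed(const void * a, const void * b, void * out, int n) {
    const T * pa = static_cast<const T *>(a);
    const T * pb = static_cast<const T *>(b);
    T *       po = static_cast<T *>(out);
    for (int i = 0; i < n; ++i) {
        po[i] = from_f32<T>(to_f32(pa[i]) + to_f32(pb[i]));
    }
}

// Per-operation dispatch on element type; returns false for unhandled types.
static bool mock_forward_add(mock_type type, const void * a, const void * b, void * out, int n) {
    switch (type) {
        case MOCK_TYPE_F32:  add_typed<float>(a, b, out, n);     return true;
        case MOCK_TYPE_BF16: add_typed<mock_bf16>(a, b, out, n); return true;
    }
    return false;
}
```

The design point is that the arithmetic is written once and the per-type differences are confined to the conversion helpers.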

Related Pages

Ggml_org_Ggml_Cpu_compute_engine -- The graph compute engine that dispatches to these operations.
Ggml_org_Ggml_Cpu_unary_ops -- Element-wise unary operations (abs, relu, sigmoid, etc.).
Ggml_org_Ggml_Cpu_vec_api -- Vectorized math primitives used by these operations.
Ggml_org_Ggml_Cpu_quantization -- Quantization functions used by matrix multiply and get_rows.
