Implementation: ggml-org/ggml CANN aclnn_ops
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (Operation Kernels) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, NPU_Computing |
| Last Updated | 2026-02-10 12:00 GMT |
Overview
Implements all GGML tensor operations for the CANN backend by mapping them to Huawei Ascend ACLNN operator calls.
Description
aclnn_ops.cpp is the operation kernel library for the CANN backend (3,928 lines). Each GGML operation has a dedicated ggml_cann_* function that follows a consistent pattern:
- ACL tensor creation: Convert GGML source and destination tensors to ACL tensors via ggml_cann_create_tensor().
- Broadcast handling: If source tensors differ in shape, the bcast_shape() helper reshapes tensors with appropriate broadcast dimensions.
- Two-phase ACLNN execution: Call the operator's GetWorkspaceSize function to determine scratch memory requirements, allocate workspace from the CANN memory pool, then call the operator's execute function.
- Workspace cleanup: RAII pool allocations automatically free scratch memory.
The file implements approximately 60 ACLNN operations spanning:
- Arithmetic: add, sub, mul, div, pow, addcdiv
- Activations: relu, silu, gelu, elu, sigmoid, tanh, hardswish, hardsigmoid, leaky_relu
- Normalization: layer_norm, group_norm, rms_norm, add_rms_norm
- Matrix operations: mm, mv, batch_matmul, grouped_matmul, out_prod
- Attention: fused_infer_attention_score (critical for LLM inference)
- Pooling/Convolution: avgpool2d, max_pool, convolution, im2col
- Tensor manipulation: repeat, permute, concat, copy, cast, embedding, index_select, pad, roll, slice
- Reduction: sum, mean, argmax, reduce_sum
- Other: arange, clamp, softmax, log_softmax, tril, triu, upsample_nearest_2d
Helper functions ggml_cann_op_unary() and ggml_cann_op_unary_gated() provide shared wrappers for single-input and gated activation operations respectively.
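To make the gated-wrapper idea concrete, here is a sketch of what a gated activation computes: the unary op is applied to the "value" operand and the result is multiplied elementwise by the "gate" operand (the SwiGLU pattern used in many LLM feed-forward layers). The function names and the value/gate split below are illustrative, not the actual aclnn_ops.cpp signatures:

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Illustrative gated wrapper: out[i] = unary_op(value[i]) * gate[i].
// In the real backend the operands are ACL tensors on the NPU; plain
// vectors are used here to show the math.
std::vector<float> gated_unary(
        const std::function<float(float)> & unary_op,
        const std::vector<float> & value,
        const std::vector<float> & gate) {
    assert(value.size() == gate.size());
    std::vector<float> out(value.size());
    for (size_t i = 0; i < value.size(); ++i) {
        out[i] = unary_op(value[i]) * gate[i];
    }
    return out;
}

// SiLU (x * sigmoid(x)), a common choice of unary_op in gated
// feed-forward blocks such as SwiGLU.
float silu(float x) { return x / (1.0f + std::exp(-x)); }
```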
Usage
These functions are called internally by the CANN backend's graph_compute dispatch loop. They are not intended to be called directly by user code. Each function corresponds to a GGML_OP_* enum value.
Code Reference
Source Location
GGML repo, file: src/ggml-cann/aclnn_ops.cpp, 3928 lines.
Signature
// Broadcast helper
void bcast_shape(ggml_tensor * src0, ggml_tensor * src1, ggml_tensor * dst,
acl_tensor_ptr & acl_src0, acl_tensor_ptr & acl_src1,
acl_tensor_ptr & acl_dst);
// Unary operation wrappers
void ggml_cann_op_unary(
std::function<void(ggml_backend_cann_context &, aclTensor *, aclTensor *)> unary_op,
ggml_backend_cann_context & ctx, ggml_tensor * dst);
void ggml_cann_op_unary_gated(
std::function<void(ggml_backend_cann_context &, aclTensor *, aclTensor *)> unary_op,
ggml_backend_cann_context & ctx, ggml_tensor * dst);
// Example operation signatures (representative subset)
void ggml_cann_add(ggml_backend_cann_context & ctx, ggml_tensor * dst);
void ggml_cann_mul_mat(ggml_backend_cann_context & ctx, ggml_tensor * dst);
void ggml_cann_softmax(ggml_backend_cann_context & ctx, ggml_tensor * dst);
void ggml_cann_rms_norm(ggml_backend_cann_context & ctx, ggml_tensor * dst);
void ggml_cann_rope(ggml_backend_cann_context & ctx, ggml_tensor * dst);
Import
#include "aclnn_ops.h"
Dependencies
- aclnn_ops.h -- function declarations with Doxygen documentation
- acl_tensor.h -- ACL tensor creation and smart pointers
- common.h -- CANN context, memory pool, error handling
- ggml-common.h -- quantization block structures
- ~60 aclnnop/*.h headers -- individual ACLNN operator APIs
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| ctx | ggml_backend_cann_context & | Yes | CANN backend context holding device ID, ACL stream, and memory pool. |
| dst | ggml_tensor * | Yes | Destination tensor. Source tensors are read from dst->src[0] and dst->src[1]. Operation parameters are in dst->op_params. |
Outputs
| Output | Type | Description |
|---|---|---|
| dst->data | device memory | Result is written to the destination tensor's data buffer on the Ascend NPU device. |
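The dst-centric calling convention above can be illustrated with a minimal sketch. The struct below is a hypothetical teaching model, not the real ggml_tensor (which stores op_params as raw bytes and has many more fields); it only shows how a kernel finds its inputs in dst->src[] and its parameters in dst->op_params:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature of the GGML convention: a kernel receives
// only the destination tensor and discovers everything else from it.
struct mini_tensor {
    float         data[4]      = {};               // stand-in for the device buffer
    mini_tensor * src[2]       = {nullptr, nullptr};
    int32_t       op_params[4] = {};               // real GGML packs raw bytes here
};

// A clamp-style kernel in the dst-only convention: reads its single
// input from dst->src[0] and its bounds from dst->op_params.
void mini_clamp(mini_tensor * dst) {
    const mini_tensor * a  = dst->src[0];
    const float         lo = static_cast<float>(dst->op_params[0]);
    const float         hi = static_cast<float>(dst->op_params[1]);
    for (int i = 0; i < 4; ++i) {
        float v = a->data[i];
        dst->data[i] = v < lo ? lo : (v > hi ? hi : v);
    }
}
```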
Usage Examples
Internal Dispatch Pattern (within ggml-cann.cpp)
// Inside the CANN backend graph_compute function:
switch (node->op) {
case GGML_OP_ADD:
ggml_cann_add(ctx, node);
break;
case GGML_OP_MUL_MAT:
ggml_cann_mul_mat(ctx, node);
break;
case GGML_OP_SOFT_MAX:
ggml_cann_softmax(ctx, node);
break;
// ... other operations ...
}