
Implementation:Ggml org Ggml Cann aclnn ops

From Leeroopedia


Metadata

Field Value
Page Type Implementation (Operation Kernels)
Knowledge Sources GGML
Domains ML_Infrastructure, Tensor_Computing, NPU_Computing
Last Updated 2026-02-10 12:00 GMT

Overview

Implements all GGML tensor operations for the CANN backend by mapping them to Huawei Ascend ACLNN operator calls.

Description

aclnn_ops.cpp is the operation-kernel library for the CANN backend (3,928 lines). Each GGML operation has a dedicated ggml_cann_* function that follows a consistent four-step pattern:

  1. ACL tensor creation: Convert GGML source and destination tensors to ACL tensors via ggml_cann_create_tensor().
  2. Broadcast handling: If source tensors differ in shape, the bcast_shape() helper reshapes tensors with appropriate broadcast dimensions.
  3. Two-phase ACLNN execution: Call the operator's GetWorkspaceSize function to determine scratch memory requirements, allocate workspace from the CANN memory pool, then call the operator's execute function.
  4. Workspace cleanup: RAII pool allocations automatically free scratch memory.

The file implements approximately 60 ACLNN operations spanning:

  • Arithmetic: add, sub, mul, div, pow, addcdiv
  • Activations: relu, silu, gelu, elu, sigmoid, tanh, hardswish, hardsigmoid, leaky_relu
  • Normalization: layer_norm, group_norm, rms_norm, add_rms_norm
  • Matrix operations: mm, mv, batch_matmul, grouped_matmul, out_prod
  • Attention: fused_infer_attention_score (critical for LLM inference)
  • Pooling/Convolution: avgpool2d, max_pool, convolution, im2col
  • Tensor manipulation: repeat, permute, concat, copy, cast, embedding, index_select, pad, roll, slice
  • Reduction: sum, mean, argmax, reduce_sum
  • Other: arange, clamp, softmax, log_softmax, tril, triu, upsample_nearest_2d

Helper functions ggml_cann_op_unary() and ggml_cann_op_unary_gated() provide shared wrappers for single-input and gated activation operations, respectively.

Usage

These functions are called internally by the CANN backend's graph_compute dispatch loop. They are not intended to be called directly by user code. Each function corresponds to a GGML_OP_* enum value.

Code Reference

Source Location

GGML repository, file src/ggml-cann/aclnn_ops.cpp (3,928 lines).

Signature

// Broadcast helper
void bcast_shape(ggml_tensor * src0, ggml_tensor * src1, ggml_tensor * dst,
                 acl_tensor_ptr & acl_src0, acl_tensor_ptr & acl_src1,
                 acl_tensor_ptr & acl_dst);

// Unary operation wrappers
void ggml_cann_op_unary(
    std::function<void(ggml_backend_cann_context &, aclTensor *, aclTensor *)> unary_op,
    ggml_backend_cann_context & ctx, ggml_tensor * dst);

void ggml_cann_op_unary_gated(
    std::function<void(ggml_backend_cann_context &, aclTensor *, aclTensor *)> unary_op,
    ggml_backend_cann_context & ctx, ggml_tensor * dst);

// Example operation signatures (representative subset)
void ggml_cann_add(ggml_backend_cann_context & ctx, ggml_tensor * dst);
void ggml_cann_mul_mat(ggml_backend_cann_context & ctx, ggml_tensor * dst);
void ggml_cann_softmax(ggml_backend_cann_context & ctx, ggml_tensor * dst);
void ggml_cann_rms_norm(ggml_backend_cann_context & ctx, ggml_tensor * dst);
void ggml_cann_rope(ggml_backend_cann_context & ctx, ggml_tensor * dst);

Import

#include "aclnn_ops.h"

Dependencies

  • aclnn_ops.h -- function declarations with Doxygen documentation
  • acl_tensor.h -- ACL tensor creation and smart pointers
  • common.h -- CANN context, memory pool, error handling
  • ggml-common.h -- quantization block structures
  • ~60 aclnnop/*.h headers -- individual ACLNN operator APIs

I/O Contract

Inputs

Parameter Type Required Description
ctx ggml_backend_cann_context & Yes CANN backend context holding device ID, ACL stream, and memory pool.
dst ggml_tensor * Yes Destination tensor. Source tensors are read from dst->src[0] and dst->src[1]. Operation parameters are in dst->op_params.

Outputs

Output Type Description
dst->data device memory Result is written to the destination tensor's data buffer on the Ascend NPU device.

Usage Examples

Internal Dispatch Pattern (within ggml-cann.cpp)

// Inside the CANN backend graph_compute function:
switch (node->op) {
    case GGML_OP_ADD:
        ggml_cann_add(ctx, node);
        break;
    case GGML_OP_MUL_MAT:
        ggml_cann_mul_mat(ctx, node);
        break;
    case GGML_OP_SOFT_MAX:
        ggml_cann_softmax(ctx, node);
        break;
    // ... other operations ...
}
