
Implementation:Ggml org Ggml Cann aclnn ops

From Leeroopedia


Metadata

Field Value
Page Type Implementation (Operation Kernels)
Knowledge Sources GGML
Domains ML_Infrastructure, Tensor_Computing, NPU_Computing
Last Updated 2026-02-10 12:00 GMT

Overview

Implements all GGML tensor operations for the CANN backend by mapping them to Huawei Ascend ACLNN operator calls.

Description

aclnn_ops.cpp is the operation-kernel library for the CANN backend (3,928 lines). Each GGML operation has a dedicated ggml_cann_* function that follows a consistent four-step pattern:

  1. ACL tensor creation: Convert GGML source and destination tensors to ACL tensors via ggml_cann_create_tensor().
  2. Broadcast handling: If source tensors differ in shape, the bcast_shape() helper reshapes tensors with appropriate broadcast dimensions.
  3. Two-phase ACLNN execution: Call the operator's GetWorkspaceSize function to determine scratch memory requirements, allocate workspace from the CANN memory pool, then call the operator's execute function.
  4. Workspace cleanup: RAII pool allocations automatically free scratch memory.

The file implements approximately 60 ACLNN operations spanning:

  • Arithmetic: add, sub, mul, div, pow, addcdiv
  • Activations: relu, silu, gelu, elu, sigmoid, tanh, hardswish, hardsigmoid, leaky_relu
  • Normalization: layer_norm, group_norm, rms_norm, add_rms_norm
  • Matrix operations: mm, mv, batch_matmul, grouped_matmul, out_prod
  • Attention: fused_infer_attention_score (critical for LLM inference)
  • Pooling/Convolution: avgpool2d, max_pool, convolution, im2col
  • Tensor manipulation: repeat, permute, concat, copy, cast, embedding, index_select, pad, roll, slice
  • Reduction: sum, mean, argmax, reduce_sum
  • Other: arange, clamp, softmax, log_softmax, tril, triu, upsample_nearest_2d

Helper functions ggml_cann_op_unary() and ggml_cann_op_unary_gated() provide shared wrappers for single-input and gated activation operations, respectively.

Usage

These functions are called internally by the CANN backend's graph_compute dispatch loop. They are not intended to be called directly by user code. Each function corresponds to a GGML_OP_* enum value.

Code Reference

Source Location

GGML repository, file src/ggml-cann/aclnn_ops.cpp (3,928 lines).

Signature

// Broadcast helper
void bcast_shape(ggml_tensor * src0, ggml_tensor * src1, ggml_tensor * dst,
                 acl_tensor_ptr & acl_src0, acl_tensor_ptr & acl_src1,
                 acl_tensor_ptr & acl_dst);

// Unary operation wrappers
void ggml_cann_op_unary(
    std::function<void(ggml_backend_cann_context &, aclTensor *, aclTensor *)> unary_op,
    ggml_backend_cann_context & ctx, ggml_tensor * dst);

void ggml_cann_op_unary_gated(
    std::function<void(ggml_backend_cann_context &, aclTensor *, aclTensor *)> unary_op,
    ggml_backend_cann_context & ctx, ggml_tensor * dst);

// Example operation signatures (representative subset)
void ggml_cann_add(ggml_backend_cann_context & ctx, ggml_tensor * dst);
void ggml_cann_mul_mat(ggml_backend_cann_context & ctx, ggml_tensor * dst);
void ggml_cann_softmax(ggml_backend_cann_context & ctx, ggml_tensor * dst);
void ggml_cann_rms_norm(ggml_backend_cann_context & ctx, ggml_tensor * dst);
void ggml_cann_rope(ggml_backend_cann_context & ctx, ggml_tensor * dst);

Import

#include "aclnn_ops.h"

Dependencies

  • aclnn_ops.h -- function declarations with Doxygen documentation
  • acl_tensor.h -- ACL tensor creation and smart pointers
  • common.h -- CANN context, memory pool, error handling
  • ggml-common.h -- quantization block structures
  • ~60 aclnnop/*.h headers -- individual ACLNN operator APIs

I/O Contract

Inputs

Parameter Type Required Description
ctx ggml_backend_cann_context & Yes CANN backend context holding device ID, ACL stream, and memory pool.
dst ggml_tensor * Yes Destination tensor. Source tensors are read from dst->src[0] and dst->src[1]. Operation parameters are in dst->op_params.

Outputs

Output Type Description
dst->data device memory Result is written to the destination tensor's data buffer on the Ascend NPU device.

Usage Examples

Internal Dispatch Pattern (within ggml-cann.cpp)

// Inside the CANN backend graph_compute function:
switch (node->op) {
    case GGML_OP_ADD:
        ggml_cann_add(ctx, node);
        break;
    case GGML_OP_MUL_MAT:
        ggml_cann_mul_mat(ctx, node);
        break;
    case GGML_OP_SOFT_MAX:
        ggml_cann_softmax(ctx, node);
        break;
    // ... other operations ...
}
