Implementation:Ggml org Ggml Cann backend

Metadata

Field	Value
Page Type	Implementation (Backend)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, NPU_Computing
Last Updated	2026-02-10 12:00 GMT

Overview

Main implementation of the CANN backend, providing the complete GGML backend interface for running tensor operations on Huawei Ascend NPU devices.

Description

ggml-cann.cpp (2,896 lines) is the core CANN backend file that adapts GGML's backend abstraction to Huawei's ACL (Ascend Computing Language) runtime. It enables inference on Ascend 910B (training-class), 310P (inference-class), and other Ascend NPU variants.

The implementation covers:

Device management:

Thread-local device tracking via g_current_cann_device for efficient context switching
ggml_cann_set_device() avoids redundant aclrtSetDevice calls when the current thread already targets the correct device
Device initialization queries the device count, each device's total memory, and the VMM allocation granularity
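The thread-local caching pattern described above can be sketched as follows. This is a minimal, self-contained illustration rather than the code in ggml-cann.cpp: `fake_runtime_set_device` is a hypothetical stand-in for `aclrtSetDevice`, and `t_current_device` and `set_device_cached` are illustrative names chosen so the sketch compiles without the CANN toolkit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for aclrtSetDevice so the sketch builds without CANN.
// It only counts how often the "runtime" is asked to switch devices.
static int g_runtime_set_device_calls = 0;

static int fake_runtime_set_device(int32_t device) {
    (void) device;
    ++g_runtime_set_device_calls;
    return 0;  // 0 means success in this sketch
}

// Per-thread record of the device this thread selected last.
// -1 means no device has been selected on this thread yet.
static thread_local int32_t t_current_device = -1;

// Switch the calling thread to `device`, skipping the runtime call when the
// thread already targets that device.
void set_device_cached(int32_t device) {
    assert(device >= 0);
    if (t_current_device == device) {
        return;  // already current on this thread: nothing to do
    }
    if (fake_runtime_set_device(device) == 0) {
        t_current_device = device;  // remember only after a successful switch
    }
}
```

Because the cache is `thread_local`, each worker thread tracks its own current device, so no locking is needed around the check.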

Buffer management:

ggml_backend_cann_buffer_context -- manages device memory allocations
Supports buffer allocation, memset, and tensor data transfer (host-to-device, device-to-host, device-to-device)
Pinned host buffer support for faster CPU-NPU transfers
Buffer type registration per device

Graph computation:

Dispatches each graph node to the appropriate ggml_cann_* function from aclnn_ops.cpp
Supports operation capability checking to determine which ops can run on the NPU
Graph caching via LRU cache for compiled computation graphs
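To illustrate the LRU caching idea mentioned for compiled graphs, here is a minimal generic sketch. It is not the cache used by the backend: `CachedGraph`, `GraphLruCache`, and the integer `key` are hypothetical names, and the real cache entry type and matching logic are not described on this page.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical stand-in for a compiled graph entry.
struct CachedGraph {
    uint64_t key;      // illustrative graph signature
    int      payload;  // placeholder for the compiled artifact
};

class GraphLruCache {
  public:
    explicit GraphLruCache(size_t capacity) : capacity_(capacity) {}

    // Returns the entry and marks it most recently used, or nullptr on a miss.
    CachedGraph * find(uint64_t key) {
        auto it = index_.find(key);
        if (it == index_.end()) {
            return nullptr;
        }
        // Move the node to the front; list iterators stay valid across splice.
        entries_.splice(entries_.begin(), entries_, it->second);
        return &*it->second;
    }

    // Inserts or replaces an entry as most recently used and evicts the
    // least recently used entry when the capacity is exceeded.
    void insert(CachedGraph graph) {
        auto it = index_.find(graph.key);
        if (it != index_.end()) {
            entries_.erase(it->second);
            index_.erase(it);
        }
        entries_.push_front(graph);
        index_[graph.key] = entries_.begin();
        if (entries_.size() > capacity_) {
            index_.erase(entries_.back().key);
            entries_.pop_back();
        }
    }

    size_t size() const { return entries_.size(); }

  private:
    size_t capacity_;
    std::list<CachedGraph> entries_;  // front = most recently used
    std::unordered_map<uint64_t, std::list<CachedGraph>::iterator> index_;
};
```

Pairing a linked list with a hash map keeps both lookup and eviction at constant time on average, which matters when the cache is consulted on every graph execution.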

Memory pools:

ggml_cann_pool_buf -- buffer-based pool with priority-ordered free list
ggml_cann_pool_vmm -- virtual memory managed pool (when VMM is supported)
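The buffer-reuse idea behind a pool with an ordered free list can be sketched as below. This is an assumption-laden illustration, not the `ggml_cann_pool_buf` implementation: `SizeOrderedPool` is a hypothetical class, host `malloc`/`free` stand in for device allocation, and the ordering used here (smallest buffer that fits) is one plausible policy rather than the documented one.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>

// Hypothetical pool that keeps released buffers sorted by capacity and
// hands back the smallest one that satisfies a request.
class SizeOrderedPool {
  public:
    SizeOrderedPool() = default;
    SizeOrderedPool(const SizeOrderedPool &) = delete;
    SizeOrderedPool & operator=(const SizeOrderedPool &) = delete;

    // Frees only buffers currently sitting in the free list; callers must
    // release every buffer they obtained before the pool is destroyed.
    ~SizeOrderedPool() {
        for (auto & entry : free_) {
            std::free(entry.second);
        }
    }

    // Returns a buffer of at least `size` bytes and stores its real
    // capacity in *capacity.
    void * alloc(size_t size, size_t * capacity) {
        auto it = free_.lower_bound(size);  // smallest free buffer that fits
        if (it != free_.end()) {
            void * ptr = it->second;
            *capacity  = it->first;
            free_.erase(it);
            ++reuse_count_;
            return ptr;
        }
        *capacity = size;
        return std::malloc(size);  // stand-in for a device allocation
    }

    // Returns a buffer to the free list for later reuse.
    void release(void * ptr, size_t capacity) {
        free_.emplace(capacity, ptr);
    }

    size_t free_count() const { return free_.size(); }
    size_t reuse_count() const { return reuse_count_; }

  private:
    std::multimap<size_t, void *> free_;  // capacity -> buffer, sorted by capacity
    size_t reuse_count_ = 0;
};
```

Reusing released buffers avoids repeated trips to the device allocator, which is typically far slower than a lookup in a host-side container.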

Backend registration:

Implements the full ggml_backend_reg_i interface for dynamic backend discovery
Exports via GGML_BACKEND_DL_IMPL for dynamic library loading

Usage

This file is the entry point for CANN backend functionality. Users interact with it through the public API declared in ggml-cann.h: initializing backends, querying device information, and managing buffers.

Code Reference

Source Location

GGML repo, file: src/ggml-cann/ggml-cann.cpp, 2896 lines.

Signature

// Error handling (defined here, declared in common.h)
[[noreturn]] void ggml_cann_error(const char * stmt, const char * func,
                                   const char * file, int line, const char * msg);

// Device management
void ggml_cann_set_device(int32_t device);

// Environment utilities
std::optional<std::string> get_env_as_lowercase(const std::string & name);
bool parse_bool(const std::string & value);
int  parse_integer(const std::string & value);

// Public API (from ggml-cann.h)
ggml_backend_t ggml_backend_cann_init(int32_t device);
bool ggml_backend_is_cann(ggml_backend_t backend);
ggml_backend_buffer_type_t ggml_backend_cann_buffer_type(int32_t device);
int32_t ggml_backend_cann_get_device_count(void);
void ggml_backend_cann_get_device_memory(int32_t device, size_t * free, size_t * total);
ggml_backend_reg_t ggml_backend_cann_reg(void);
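The environment utilities listed in the signature suggest a pattern of reading a variable, normalizing its case, and interpreting it as a flag. A minimal sketch of that pattern follows. The helper names `env_lowercase` and `is_truthy`, and the set of accepted spellings, are assumptions made for illustration and may differ from what ggml-cann.cpp actually does.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <optional>
#include <string>

// Reads an environment variable and returns its value in lowercase,
// or std::nullopt when the variable is not set.
std::optional<std::string> env_lowercase(const char * name) {
    const char * raw = std::getenv(name);
    if (raw == nullptr) {
        return std::nullopt;
    }
    std::string value(raw);
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return value;
}

// Interprets an already-lowercased value as a boolean flag.
// The accepted spellings here are an assumption, not the backend's list.
bool is_truthy(const std::string & value) {
    return value == "1" || value == "on" || value == "yes" || value == "true";
}
```

Lowercasing once at the point of reading lets later comparisons stay simple, so `ON`, `On`, and `on` are all treated alike.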

Import

#include "ggml-cann.h"

Dependencies

ggml-backend-impl.h -- backend implementation interfaces
ggml-cann/aclnn_ops.h -- operation dispatch functions
ggml-cann/common.h -- shared context and memory pool types
ggml-impl.h -- internal utilities
acl/acl.h -- Huawei ACL runtime API
ggml-common.h -- quantization block structures

I/O Contract

Inputs

Parameter	Type	Required	Description
`device`	`int32_t`	Yes	Ascend NPU device index, from 0 to `ggml_backend_cann_get_device_count() - 1` (the backend supports at most `GGML_CANN_MAX_DEVICES` devices).
`cgraph`	`ggml_cgraph *`	Yes (for graph_compute)	Computation graph with tensor operations to execute on the NPU.

Outputs

Output	Type	Description
Backend handle	`ggml_backend_t`	Initialized CANN backend instance for the specified device.
Buffer type	`ggml_backend_buffer_type_t`	Device-specific buffer type for memory allocation.
Device count	`int32_t`	Number of available Ascend NPU devices.
Computation status	`ggml_status`	Success or failure of graph computation.

Usage Examples

Initializing the CANN Backend

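#include "ggml-cann.h"
#include <cstdio>

int main() {
    // Query available devices; stop early if no Ascend NPU is visible
    int32_t n_devices = ggml_backend_cann_get_device_count();
    if (n_devices <= 0) {
        fprintf(stderr, "no CANN devices found\n");
        return 1;
    }

    // Initialize backend for device 0 (NULL signals failure)
    ggml_backend_t cann = ggml_backend_cann_init(0);
    if (cann == NULL) return 1;

    // Get device memory info, in bytes
    size_t free_mem = 0, total_mem = 0;
    ggml_backend_cann_get_device_memory(0, &free_mem, &total_mem);

    ggml_backend_free(cann);  // release the backend when done
    return 0;
}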

Using CANN with the Backend Scheduler

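#include "ggml-backend.h"
#include "ggml-cann.h"
#include "ggml-cpu.h"

ggml_backend_t cann_backend = ggml_backend_cann_init(0);
ggml_backend_t cpu_backend  = ggml_backend_cpu_init();

// Earlier entries have higher priority; the CPU backend goes last as the fallback
ggml_backend_t backends[] = { cann_backend, cpu_backend };

// Upper bound on the number of graph nodes the scheduler must handle
size_t max_nodes = GGML_DEFAULT_GRAPH_SIZE;

// NULL buffer types selects each backend's default. Recent ggml versions take a
// trailing op_offload flag; omit it when building against an older ggml-backend.h.
ggml_backend_sched_t sched = ggml_backend_sched_new(
    backends, NULL, 2, max_nodes, /*parallel=*/false, /*op_offload=*/true);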
