Implementation:Ggml org Ggml Cann backend
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (Backend) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, NPU_Computing |
| Last Updated | 2026-02-10 12:00 GMT |
Overview
Main implementation of the CANN backend, providing the complete GGML backend interface for running tensor operations on Huawei Ascend NPU devices.
Description
ggml-cann.cpp (2,896 lines) is the core CANN backend file that adapts GGML's backend abstraction to Huawei's ACL (Ascend Computing Language) runtime. It enables inference on Ascend 910B (training-class), 310P (inference-class), and other Ascend NPU variants.
The implementation covers:
Device management:
- Thread-local device tracking via g_current_cann_device for efficient context switching
- ggml_cann_set_device() avoids redundant aclrtSetDevice calls when the current thread already targets the correct device
- Device initialization retrieves device count, VRAM, and VMM granularity
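The redundant-call avoidance described above can be illustrated with a small, self-contained sketch. It does not reproduce the code in ggml-cann.cpp: fake_acl_set_device is a hypothetical stand-in for aclrtSetDevice that only counts invocations, and sketch_set_device and t_current_device are illustrative names, not identifiers from the backend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for aclrtSetDevice: it only counts how often it is called.
static int g_fake_set_device_calls = 0;

static int fake_acl_set_device(int32_t /*device*/) {
    ++g_fake_set_device_calls;
    return 0;  // 0 means success in this sketch
}

// Per-thread record of the device this thread selected last (-1 = none yet).
static thread_local int32_t t_current_device = -1;

// Switch the calling thread to `device`, skipping the runtime call when the
// thread already targets that device.
void sketch_set_device(int32_t device) {
    assert(device >= 0);
    if (t_current_device == device) {
        return;  // already current: nothing to do
    }
    int rc = fake_acl_set_device(device);
    assert(rc == 0);
    (void) rc;  // silence unused-variable warnings when asserts are disabled
    t_current_device = device;
}
```

Because the cached value is thread_local, each thread keeps its own notion of the current device, so no locking is needed around the check.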
Buffer management:
- ggml_backend_cann_buffer_context -- manages device memory allocations
- Supports buffer allocation, memset, tensor data transfer (host-to-device, device-to-host, device-to-device)
- Pinned host buffer support for faster CPU-NPU transfers
- Buffer type registration per device
Graph computation:
- Dispatches each graph node to the appropriate ggml_cann_* function from aclnn_ops.cpp
- Supports operation capability checking to determine which ops can run on the NPU
- Graph caching via LRU cache for compiled computation graphs
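To make the LRU idea concrete, here is a minimal generic sketch of a least-recently-used cache. It is not the backend's graph cache: the 64-bit key (imagined here as a graph fingerprint), the std::string placeholder for a compiled graph, and the class name sketch_graph_lru are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Minimal LRU cache sketch. The key is a hypothetical graph fingerprint and
// the value is a placeholder for whatever a compiled graph would hold.
class sketch_graph_lru {
  public:
    explicit sketch_graph_lru(size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns the cached value or nullptr. A hit moves the entry to the front.
    const std::string * find(uint64_t key) {
        auto it = index_.find(key);
        if (it == index_.end()) {
            return nullptr;
        }
        entries_.splice(entries_.begin(), entries_, it->second);
        return &it->second->second;
    }

    // Inserts or updates an entry, evicting the least recently used one if full.
    void insert(uint64_t key, std::string value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            entries_.splice(entries_.begin(), entries_, it->second);
            return;
        }
        if (entries_.size() == capacity_) {
            index_.erase(entries_.back().first);  // back of the list = oldest use
            entries_.pop_back();
        }
        entries_.emplace_front(key, std::move(value));
        index_[key] = entries_.begin();
    }

    size_t size() const { return entries_.size(); }

  private:
    using entry_list = std::list<std::pair<uint64_t, std::string>>;

    size_t capacity_;
    entry_list entries_;                                        // most recent first
    std::unordered_map<uint64_t, entry_list::iterator> index_;  // key -> list node
};
```

The list keeps recency order while the hash map gives constant-time lookup; std::list::splice moves a node without invalidating the iterators stored in the map.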
Memory pools:
- ggml_cann_pool_buf -- buffer-based pool with priority-ordered free list
- ggml_cann_pool_vmm -- virtual memory managed pool (when VMM is supported)
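As a rough illustration of a buffer pool with an ordered free list, the sketch below keeps released buffers sorted by capacity and reuses the smallest one that fits. This is an assumption about what "priority-ordered" could mean, not a description of ggml_cann_pool_buf. Host malloc/free stand in for device allocation, and sketch_pool_buf is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>

// Buffer pool sketch: freed buffers are kept in a size-ordered free list and
// reused on later requests instead of being returned to the allocator.
class sketch_pool_buf {
  public:
    ~sketch_pool_buf() {
        // Only buffers sitting in the free list are released here; buffers
        // still held by callers must be handed back via release() first.
        for (auto & kv : free_) {
            std::free(kv.second);
        }
    }

    // Returns a buffer of at least `size` bytes; *actual receives its capacity.
    void * alloc(size_t size, size_t * actual) {
        assert(size > 0 && actual != nullptr);
        auto it = free_.lower_bound(size);  // smallest free buffer that fits
        if (it != free_.end()) {
            void * ptr = it->second;
            *actual = it->first;
            free_.erase(it);
            ++reuse_count_;
            return ptr;
        }
        void * ptr = std::malloc(size);  // stand-in for a device allocation
        assert(ptr != nullptr);
        *actual = size;
        return ptr;
    }

    // Hands a buffer back to the pool for later reuse.
    void release(void * ptr, size_t actual) { free_.emplace(actual, ptr); }

    size_t reuse_count() const { return reuse_count_; }
    size_t free_count() const { return free_.size(); }

  private:
    std::multimap<size_t, void *> free_;  // capacity -> buffer, ascending
    size_t reuse_count_ = 0;
};
```

Picking the smallest sufficient buffer limits wasted capacity per request. A VMM-style pool avoids this trade-off differently, by growing a single virtual address range instead of keeping separate buffers.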
Backend registration:
- Implements the full ggml_backend_reg_i interface for dynamic backend discovery
- Exports via GGML_BACKEND_DL_IMPL for dynamic library loading
Usage
This file is the entry point for CANN backend functionality. Users interact with it through the public API declared in ggml-cann.h: initializing backends, querying device information, and managing buffers.
Code Reference
Source Location
GGML repo, file: src/ggml-cann/ggml-cann.cpp, 2896 lines.
Signature
// Error handling (defined here, declared in common.h)
[[noreturn]] void ggml_cann_error(const char * stmt, const char * func,
const char * file, int line, const char * msg);
// Device management
void ggml_cann_set_device(int32_t device);
// Environment utilities
std::optional<std::string> get_env_as_lowercase(const std::string & name);
bool parse_bool(const std::string & value);
int parse_integer(const std::string & value);
// Public API (from ggml-cann.h)
ggml_backend_t ggml_backend_cann_init(int32_t device);
bool ggml_backend_is_cann(ggml_backend_t backend);
ggml_backend_buffer_type_t ggml_backend_cann_buffer_type(int32_t device);
int32_t ggml_backend_cann_get_device_count(void);
ggml_backend_reg_t ggml_backend_cann_reg(void);
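The environment utilities listed above are only given by signature here. The following sketch shows one plausible behaviour for helpers of this shape. The sketch_ prefixed names, the set of strings treated as true, and the fallback to 0 on a parse failure are all assumptions for illustration; the real functions may accept different values.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <optional>
#include <string>

// Reads an environment variable and lowercases it; empty optional if unset.
std::optional<std::string> sketch_get_env_as_lowercase(const std::string & name) {
    assert(!name.empty());
    const char * raw = std::getenv(name.c_str());
    if (raw == nullptr) {
        return std::nullopt;
    }
    std::string value(raw);
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return value;
}

// Treats a small, assumed set of lowercase strings as "true".
bool sketch_parse_bool(const std::string & value) {
    return value == "1" || value == "on" || value == "yes" || value == "true";
}

// Parses a decimal integer; returns 0 (an assumed default) if parsing fails.
int sketch_parse_integer(const std::string & value) {
    try {
        return std::stoi(value);
    } catch (...) {
        return 0;
    }
}
```

Lowercasing once at read time lets the boolean parser compare against a short fixed list, so values such as "ON" or "True" behave the same as "on" or "true".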
Import
#include "ggml-cann.h"
Dependencies
- ggml-backend-impl.h -- backend implementation interfaces
- ggml-cann/aclnn_ops.h -- operation dispatch functions
- ggml-cann/common.h -- shared context and memory pool types
- ggml-impl.h -- internal utilities
- acl/acl.h -- Huawei ACL runtime API
- ggml-common.h -- quantization block structures
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| device | int32_t | Yes | Ascend NPU device index (0 to GGML_CANN_MAX_DEVICES - 1). |
| cgraph | ggml_cgraph * | Yes (for graph_compute) | Computation graph with tensor operations to execute on the NPU. |
Outputs
| Output | Type | Description |
|---|---|---|
| Backend handle | ggml_backend_t | Initialized CANN backend instance for the specified device. |
| Buffer type | ggml_backend_buffer_type_t | Device-specific buffer type for memory allocation. |
| Device count | int32_t | Number of available Ascend NPU devices. |
| Computation status | ggml_status | Success or failure of graph computation. |
Usage Examples
Initializing the CANN Backend
#include "ggml-cann.h"
// Query available devices
int n_devices = ggml_backend_cann_get_device_count();
// Initialize backend for device 0
ggml_backend_t cann = ggml_backend_cann_init(0);
// Get device memory info
size_t free_mem, total_mem;
ggml_backend_cann_get_device_memory(0, &free_mem, &total_mem);
Using CANN with the Backend Scheduler
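#include "ggml-cann.h"
#include "ggml-cpu.h"
ggml_backend_t cann_backend = ggml_backend_cann_init(0);
ggml_backend_t cpu_backend = ggml_backend_cpu_init();
// Lower index means higher priority; the CPU backend goes last as the fallback
ggml_backend_t backends[] = { cann_backend, cpu_backend };
// Caller-chosen upper bound on the number of graph nodes
size_t max_nodes = GGML_DEFAULT_GRAPH_SIZE;
// Arguments: backends, buffer types (NULL = defaults), backend count,
// graph size, parallel, op_offload (older ggml revisions omit the last one)
ggml_backend_sched_t sched = ggml_backend_sched_new(
backends, NULL, 2, max_nodes, false, false);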