Implementation:Ggml org Ggml Cann backend

From Leeroopedia


Metadata

Field | Value
Page Type | Implementation (Backend)
Knowledge Sources | GGML
Domains | ML_Infrastructure, Tensor_Computing, NPU_Computing
Last Updated | 2026-02-10 12:00 GMT

Overview

Main implementation of the CANN backend, providing the complete GGML backend interface for running tensor operations on Huawei Ascend NPU devices.

Description

ggml-cann.cpp (2,896 lines) is the core CANN backend file that adapts GGML's backend abstraction to Huawei's ACL (Ascend Computing Language) runtime. It enables inference on Ascend 910B (training-class), 310P (inference-class), and other Ascend NPU variants.

The implementation covers:

Device management:

  • Thread-local device tracking via g_current_cann_device for efficient context switching
  • ggml_cann_set_device() avoids redundant aclrtSetDevice calls when the current thread already targets the correct device
  • Device initialization retrieves device count, VRAM, and VMM granularity
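
The redundant-call check described above can be sketched as follows. This is a minimal illustration, not the code in ggml-cann.cpp: `fake_acl_set_device`, `t_current_device`, and `set_device_cached` are hypothetical names, and the stub stands in for `aclrtSetDevice` so the sketch compiles without the ACL SDK.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for aclrtSetDevice; counts calls so the effect is visible.
static int g_set_device_calls = 0;

static int fake_acl_set_device(int32_t device) {
    assert(device >= 0);
    ++g_set_device_calls;
    return 0; // 0 models a success return code
}

// Each thread remembers the device it selected last; -1 means "none yet".
static thread_local int32_t t_current_device = -1;

// Switch the calling thread to `device`, skipping the runtime call
// when the thread already targets that device.
void set_device_cached(int32_t device) {
    if (t_current_device == device) {
        return;
    }
    const int rc = fake_acl_set_device(device);
    assert(rc == 0);
    (void) rc;
    t_current_device = device;
}
```

Because the cached index is `thread_local`, each thread pays for the runtime call once per device change and no locking is needed around the cache.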

Buffer management:

  • ggml_backend_cann_buffer_context -- manages device memory allocations
  • Supports buffer allocation, memset, tensor data transfer (host-to-device, device-to-host, device-to-device)
  • Pinned host buffer support for faster CPU-NPU transfers
  • Buffer type registration per device

Graph computation:

  • Dispatches each graph node to the appropriate ggml_cann_* function from aclnn_ops.cpp
  • Supports operation capability checking to determine which ops can run on the NPU
  • Graph caching via LRU cache for compiled computation graphs
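
To illustrate the LRU idea behind graph caching, here is a small sketch under stated assumptions: `cached_graph`, its `key` fingerprint, and `graph_lru_cache` are hypothetical names, and the real cache stores compiled graph handles rather than strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <utility>

// Hypothetical cache entry: a graph fingerprint plus an opaque payload.
struct cached_graph {
    uint64_t    key;
    std::string payload;
};

class graph_lru_cache {
public:
    explicit graph_lru_cache(size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Return the entry for `key` and mark it most recently used, or nullptr on a miss.
    cached_graph * find(uint64_t key) {
        for (auto it = entries_.begin(); it != entries_.end(); ++it) {
            if (it->key == key) {
                entries_.splice(entries_.begin(), entries_, it); // move to front
                return &entries_.front();
            }
        }
        return nullptr;
    }

    // Insert a new entry at the front, evicting the least recently used one when full.
    // The caller is expected to call find() first; duplicates are not merged here.
    void insert(cached_graph g) {
        if (entries_.size() >= capacity_) {
            entries_.pop_back();
        }
        entries_.push_front(std::move(g));
    }

    size_t size() const { return entries_.size(); }

private:
    size_t                  capacity_;
    std::list<cached_graph> entries_; // front = most recently used
};
```

A linear scan is adequate for the handful of entries such a cache would typically hold; a hash map plus list would be the usual choice for larger capacities.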

Memory pools:

  • ggml_cann_pool_buf -- buffer-based pool with priority-ordered free list
  • ggml_cann_pool_vmm -- virtual memory managed pool (when VMM is supported)
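
The buffer-reuse idea behind a buffer-based pool can be sketched as below. This is an assumption-laden illustration: `buffer_pool` is a hypothetical class, `std::malloc` stands in for device allocation, and the best-fit policy shown here is one plausible ordering, not necessarily the one ggml_cann_pool_buf uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

class buffer_pool {
public:
    ~buffer_pool() {
        for (auto & b : free_) {
            std::free(b.ptr);
        }
    }

    // Hand out a buffer of at least `size` bytes; `*actual` receives its real size.
    void * alloc(size_t size, size_t * actual) {
        assert(actual != nullptr);
        // Best fit: pick the smallest free buffer that is large enough.
        int best = -1;
        for (int i = 0; i < (int) free_.size(); ++i) {
            if (free_[i].size >= size && (best < 0 || free_[i].size < free_[best].size)) {
                best = i;
            }
        }
        if (best >= 0) {
            block b = free_[best];
            free_.erase(free_.begin() + best);
            *actual = b.size;
            ++reuse_count;
            return b.ptr;
        }
        void * p = std::malloc(size); // stand-in for a device allocation
        assert(p != nullptr);
        *actual = size;
        return p;
    }

    // Return a buffer to the pool instead of freeing it immediately.
    void release(void * ptr, size_t size) {
        free_.push_back({ ptr, size });
    }

    int reuse_count = 0; // how many allocations were served from the free list

private:
    struct block {
        void * ptr;
        size_t size;
    };
    std::vector<block> free_;
};
```

Keeping freed buffers around trades some memory for fewer allocator calls, which matters when device allocation is expensive.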

Backend registration:

  • Implements the full ggml_backend_reg_i interface for dynamic backend discovery
  • Exports via GGML_BACKEND_DL_IMPL for dynamic library loading

Usage

This file is the entry point for CANN backend functionality. Users interact with it through the public API declared in ggml-cann.h: initializing backends, querying device information, and managing buffers.

Code Reference

Source Location

GGML repo, file: src/ggml-cann/ggml-cann.cpp, 2896 lines.

Signature

// Error handling (defined here, declared in common.h)
[[noreturn]] void ggml_cann_error(const char * stmt, const char * func,
                                   const char * file, int line, const char * msg);

// Device management
void ggml_cann_set_device(int32_t device);

// Environment utilities
std::optional<std::string> get_env_as_lowercase(const std::string & name);
bool parse_bool(const std::string & value);
int  parse_integer(const std::string & value);

// Public API (from ggml-cann.h)
ggml_backend_t ggml_backend_cann_init(int32_t device);
bool ggml_backend_is_cann(ggml_backend_t backend);
ggml_backend_buffer_type_t ggml_backend_cann_buffer_type(int32_t device);
int32_t ggml_backend_cann_get_device_count(void);
void ggml_backend_cann_get_device_memory(int32_t device, size_t * free, size_t * total);
ggml_backend_reg_t ggml_backend_cann_reg(void);
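
The environment utilities in the listing above could plausibly behave as in the following sketch. The function names match the signatures, but the bodies are assumptions: in particular, the set of strings treated as true and the fallback of 0 for unparsable integers are illustrative choices, so consult ggml-cann.cpp for the actual behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <optional>
#include <string>
#include <unordered_set>

// Read an environment variable and lowercase its value; empty optional if unset.
std::optional<std::string> get_env_as_lowercase(const std::string & name) {
    assert(!name.empty());
    const char * val = std::getenv(name.c_str());
    if (val == nullptr) {
        return std::nullopt;
    }
    std::string res(val);
    std::transform(res.begin(), res.end(), res.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return res;
}

// Assumed truthy spellings; expects an already lowercased value.
bool parse_bool(const std::string & value) {
    static const std::unordered_set<std::string> truthy = {
        "on", "1", "yes", "y", "enable", "true"
    };
    return truthy.count(value) > 0;
}

// Parse a decimal integer; this sketch falls back to 0 on malformed input.
int parse_integer(const std::string & value) {
    try {
        return std::stoi(value);
    } catch (...) {
        return 0;
    }
}
```

Lowercasing before parsing lets a setting such as `ON` or `True` be matched against a single set of lowercase spellings.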

Import

#include "ggml-cann.h"

Dependencies

  • ggml-backend-impl.h -- backend implementation interfaces
  • ggml-cann/aclnn_ops.h -- operation dispatch functions
  • ggml-cann/common.h -- shared context and memory pool types
  • ggml-impl.h -- internal utilities
  • acl/acl.h -- Huawei ACL runtime API
  • ggml-common.h -- quantization block structures

I/O Contract

Inputs

Parameter | Type | Required | Description
device | int32_t | Yes | Ascend NPU device index, from 0 to ggml_backend_cann_get_device_count() - 1 (at most GGML_CANN_MAX_DEVICES devices).
cgraph | ggml_cgraph * | Yes (for graph_compute) | Computation graph with tensor operations to execute on the NPU.

Outputs

Output | Type | Description
Backend handle | ggml_backend_t | Initialized CANN backend instance for the specified device.
Buffer type | ggml_backend_buffer_type_t | Device-specific buffer type for memory allocation.
Device count | int32_t | Number of available Ascend NPU devices.
Computation status | ggml_status | Success or failure of graph computation.

Usage Examples

Initializing the CANN Backend

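#include "ggml-backend.h"
#include "ggml-cann.h"

// Query available devices
int32_t n_devices = ggml_backend_cann_get_device_count();

// Initialize backend for device 0; check the handle for NULL before use
ggml_backend_t cann = ggml_backend_cann_init(0);

// Get free and total device memory, in bytes
size_t free_mem, total_mem;
ggml_backend_cann_get_device_memory(0, &free_mem, &total_mem);

// Release the backend when it is no longer needed
ggml_backend_free(cann);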

Using CANN with the Backend Scheduler

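#include "ggml-backend.h"
#include "ggml-cann.h"
#include "ggml-cpu.h"

ggml_backend_t cann_backend = ggml_backend_cann_init(0);
ggml_backend_t cpu_backend  = ggml_backend_cpu_init();

// The scheduler expects the CPU backend last, as the fallback
ggml_backend_t backends[] = { cann_backend, cpu_backend };

// Graph size hint; GGML_DEFAULT_GRAPH_SIZE is the library default
size_t max_nodes = GGML_DEFAULT_GRAPH_SIZE;

// Recent ggml versions take a trailing `bool op_offload` argument; older
// versions omit it. Check ggml-backend.h in your checkout.
ggml_backend_sched_t sched = ggml_backend_sched_new(
    backends, NULL, 2, max_nodes, false, true);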
