Implementation:Ggml org Ggml Zendnn backend

**Implementation Metadata**
File Name	`src/ggml-zendnn/ggml-zendnn.cpp`
Repository	ggml-org/ggml
Lines	469
Language	C++
Domain Tags	ML_Infrastructure, Hardware_Abstraction, AMD_CPU
Status	Active
Last Updated	2025-05-15 12:00 GMT
Knowledge Sources	ggml-org/ggml repository

Overview

ggml-zendnn.cpp implements the ZenDNN backend for ggml. It accelerates matrix multiplication, the dominant workload, on AMD Zen CPUs by calling the ZenDNN library's "lowoha" (Low-Overhead Hardware Abstraction) path. The backend is deliberately narrow: the whole file is 469 lines.

Description

The ggml_backend_zendnn_context holds the thread count (n_threads) and a work buffer (work_data, with its size in work_size). A template function, ggml_to_zendnn_type, maps C++ element types (float, ggml_bf16_t) to the corresponding zendnnl::common::data_type_t value.
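
The type mapping described above can be sketched with explicit template specializations. This is a minimal illustration, not the code in ggml-zendnn.cpp: data_type_sketch stands in for zendnnl::common::data_type_t, bf16_sketch stands in for ggml_bf16_t, and the fallback value `none` is an assumption made so the sketch compiles without ZenDNN.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for zendnnl::common::data_type_t.
enum class data_type_sketch { none, f32, bf16 };

// Hypothetical stand-in for ggml_bf16_t: 16 bits of raw storage.
struct bf16_sketch { uint16_t bits; };

// Primary template: element types without a mapping resolve to `none`.
template <typename T>
constexpr data_type_sketch to_data_type_sketch() { return data_type_sketch::none; }

// One explicit specialization per supported element type.
template <>
constexpr data_type_sketch to_data_type_sketch<float>() { return data_type_sketch::f32; }

template <>
constexpr data_type_sketch to_data_type_sketch<bf16_sketch>() { return data_type_sketch::bf16; }
```

Resolving the data type at compile time means a templated matmul wrapper can query the tag for TA, TB, and TC without any runtime branching.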

The core ggml_zendnn_matmul template function uses zendnnl::lowoha::matmul_direct with:

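Row-major layout for the input and output
Transposed weights: the weight transpose flag is set to true, so the column-major weights are read as a row-major operand
alpha=1.0, beta=0.0, so the output is overwritten rather than accumulated into
is_weights_const=true, which lets ZenDNN cache the transformed weights across calls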
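
The semantics of these settings can be made concrete with a naive reference loop. This is only a sketch of what the call is expected to compute, assuming the layouts given in the I/O contract on this page; it does not call ZenDNN, and the name matmul_reference is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference semantics only, not the ZenDNN call.
// A: weights, [k, m] column-major, i.e. m runs of k contiguous values (stride lda)
// B: input,   [n, k] row-major (stride ldb)
// C: output,  [n, m] row-major (stride ldc)
// Computes C = alpha * (B x A) + beta * C.
void matmul_reference(int64_t m, int64_t n, int64_t k,
                      const float * A, int64_t lda,
                      const float * B, int64_t ldb,
                      float * C, int64_t ldc,
                      float alpha = 1.0f, float beta = 0.0f) {
    for (int64_t row = 0; row < n; ++row) {
        for (int64_t col = 0; col < m; ++col) {
            float acc = 0.0f;
            for (int64_t p = 0; p < k; ++p) {
                // Both operands are walked along their contiguous k dimension.
                acc += B[row * ldb + p] * A[col * lda + p];
            }
            // With beta == 0 the previous contents of C are ignored entirely.
            const float prev = (beta == 0.0f) ? 0.0f : beta * C[row * ldc + col];
            C[row * ldc + col] = alpha * acc + prev;
        }
    }
}
```

Because both A and B are read along contiguous runs of length k, the column-major weights need no physical transpose; the transpose flag only changes how the library interprets the same memory.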

The ggml_zendnn_sgemm dispatcher selects type-specific instantiations:

F32 x F32 -> F32
BF16 x BF16 -> BF16
BF16 x BF16 -> F32
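
A dispatcher of this shape can be sketched as follows. Everything here is illustrative: sketch_type stands in for ggml's type enum, sketch_bf16 for ggml_bf16_t, and sketch_matmul is a naive loop in place of the ZenDNN call. The BF16 store truncates for brevity, whereas production conversions usually round.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical type tags; SKETCH_F16 represents any unsupported type.
enum sketch_type { SKETCH_F32, SKETCH_BF16, SKETCH_F16 };

struct sketch_bf16 { uint16_t bits; };

inline float load(float v) { return v; }
inline float load(sketch_bf16 v) {
    // BF16 is the upper 16 bits of an IEEE-754 float32.
    uint32_t u = static_cast<uint32_t>(v.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
inline void store(float & dst, float v) { dst = v; }
inline void store(sketch_bf16 & dst, float v) {
    uint32_t u;
    std::memcpy(&u, &v, sizeof u);
    dst.bits = static_cast<uint16_t>(u >> 16); // truncation, for brevity
}

// Naive stand-in for the typed matmul wrapper.
template <typename TA, typename TB, typename TC>
bool sketch_matmul(int64_t m, int64_t n, int64_t k,
                   const TA * A, int64_t lda,
                   const TB * B, int64_t ldb,
                   TC * C, int64_t ldc) {
    for (int64_t row = 0; row < n; ++row) {
        for (int64_t col = 0; col < m; ++col) {
            float acc = 0.0f;
            for (int64_t p = 0; p < k; ++p) {
                acc += load(B[row * ldb + p]) * load(A[col * lda + p]);
            }
            store(C[row * ldc + col], acc);
        }
    }
    return true;
}

// Picks one of the three supported instantiations; anything else returns false.
bool sketch_sgemm(int64_t m, int64_t n, int64_t k,
                  const void * A, int64_t lda,
                  const void * B, int64_t ldb,
                  void * C, int64_t ldc,
                  sketch_type Atype, sketch_type Btype, sketch_type Ctype) {
    if (Atype == SKETCH_F32 && Btype == SKETCH_F32 && Ctype == SKETCH_F32) {
        return sketch_matmul(m, n, k, static_cast<const float *>(A), lda,
                             static_cast<const float *>(B), ldb,
                             static_cast<float *>(C), ldc);
    }
    if (Atype == SKETCH_BF16 && Btype == SKETCH_BF16 && Ctype == SKETCH_BF16) {
        return sketch_matmul(m, n, k, static_cast<const sketch_bf16 *>(A), lda,
                             static_cast<const sketch_bf16 *>(B), ldb,
                             static_cast<sketch_bf16 *>(C), ldc);
    }
    if (Atype == SKETCH_BF16 && Btype == SKETCH_BF16 && Ctype == SKETCH_F32) {
        return sketch_matmul(m, n, k, static_cast<const sketch_bf16 *>(A), lda,
                             static_cast<const sketch_bf16 *>(B), ldb,
                             static_cast<float *>(C), ldc);
    }
    return false; // unsupported combination; the caller must fall back
}
```

Returning false for an unsupported combination, rather than asserting, leaves the caller free to route the operation elsewhere.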

Usage

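#include "ggml-backend.h"

int main(void) {
    // Load every backend available to this build
    ggml_backend_load_all();
    // ZenDNN backend registers on AMD Zen systems with ZenDNN installed
    ggml_backend_t backend = ggml_backend_init_best();
    if (backend == NULL) {
        return 1; // no usable backend was found
    }
    // ...
    ggml_backend_free(backend);
    return 0;
}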

Code Reference

Source Location

Repository	File	Lines
ggml-org/ggml	`src/ggml-zendnn/ggml-zendnn.cpp`	469

Key Signatures

struct ggml_backend_zendnn_context {
    int n_threads = GGML_DEFAULT_N_THREADS;
    std::unique_ptr<char[]> work_data;
    size_t work_size = 0;
};

template<typename T>
zendnnl::common::data_type_t ggml_to_zendnn_type();

template <typename TA, typename TB, typename TC>
static bool ggml_zendnn_matmul(ggml_backend_zendnn_context * ctx,
    int64_t m, int64_t n, int64_t k,
    const TA * A, int64_t lda,
    const TB * B, int64_t ldb,
    TC * C, int64_t ldc);

static bool ggml_zendnn_sgemm(ggml_backend_zendnn_context * ctx,
    int64_t m, int64_t n, int64_t k,
    const void * A, int64_t lda,
    const void * B, int64_t ldb,
    void * C, int64_t ldc,
    int Atype, int Btype, int Ctype);
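
The work_data and work_size fields in the context suggest a grow-on-demand scratch buffer. The sketch below shows that pattern under stated assumptions: context_sketch mirrors the struct above with a placeholder thread count, reserve_work is a hypothetical helper, and whether ggml-zendnn.cpp resizes its buffer in exactly this way is not confirmed by this page.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Mirrors ggml_backend_zendnn_context; 4 is a placeholder for GGML_DEFAULT_N_THREADS.
struct context_sketch {
    int n_threads = 4;
    std::unique_ptr<char[]> work_data;
    std::size_t work_size = 0;
};

// Hypothetical helper: returns scratch space of at least `needed` bytes.
// The buffer is reallocated only when it has to grow, so repeated calls
// with the same or smaller sizes reuse the existing allocation.
char * reserve_work(context_sketch & ctx, std::size_t needed) {
    if (ctx.work_size < needed) {
        ctx.work_data.reset(new char[needed]); // old contents are discarded
        ctx.work_size = needed;
    }
    return ctx.work_data.get();
}
```

Keeping the buffer in the context amortizes allocation across calls, which matters when the same matmul shapes recur for every token or layer.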

I/O Contract

Inputs

A (weights) -- Weight matrix, shape (k, m), column-major
B (input) -- Input matrix, shape (n, k), row-major
Type parameters -- F32 or BF16 per matrix; only the three combinations listed under Description are dispatched

Outputs

C (output) -- Result matrix, shape (n, m), row-major
Boolean status -- true on success, false on failure

Usage Examples

Matrix multiplication with ZenDNN:

// ZenDNN computes C = B * A where:
//   A: weights [k, m] column-major
//   B: input   [n, k] row-major
//   C: output  [n, m] row-major
//
// The lowoha path provides:
// - Direct matmul execution with minimal API overhead
// - Weight transformation caching via is_weights_const=true
// - Multi-threaded execution via ctx->n_threads

Related Pages

Implements Principle

Principle:Ggml_org_Ggml_ZenDNN_Accelerated_Computation

Related Implementations

Implementation:Ggml_org_Ggml_Backend_impl_interface -- Backend interface contract
Implementation:Ggml_org_Ggml_Ggml_backend_load_all -- Backend discovery
