Implementation:Ggml org Ggml Zendnn backend
| File Name | src/ggml-zendnn/ggml-zendnn.cpp |
| Repository | ggml-org/ggml |
| Lines | 469 |
| Language | C++ |
| Domain Tags | ML_Infrastructure, Hardware_Abstraction, AMD_CPU |
| Status | Active |
| Last Updated | 2025-05-15 12:00 GMT |
| Knowledge Sources | ggml-org/ggml repository |
Overview
ggml-zendnn.cpp implements the ZenDNN backend, which accelerates matrix multiplication on AMD Zen CPUs. The 469-line file is deliberately narrow: it routes the dominant matrix multiplication workload through the ZenDNN library's "lowoha" (Low-Overhead Hardware Abstraction) path.
Description
The ggml_backend_zendnn_context holds thread count and a work buffer. A template function ggml_to_zendnn_type maps C++ types (float, ggml_bf16_t) to zendnnl::common::data_type_t.
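The type mapping described above can be sketched with explicit template specializations. This is a minimal illustration, not the file's actual code: `bf16_t`, `data_type`, and `to_backend_type` are hypothetical stand-ins for `ggml_bf16_t`, `zendnnl::common::data_type_t`, and `ggml_to_zendnn_type`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for ggml_bf16_t and zendnnl::common::data_type_t.
struct bf16_t { uint16_t bits; };
enum class data_type { f32, bf16 };

// The primary template is declared but never defined, so calling it with an
// unsupported element type fails at link time instead of silently
// choosing a wrong data type.
template <typename T>
data_type to_backend_type();

// One explicit specialization per supported element type.
template <>
inline data_type to_backend_type<float>() { return data_type::f32; }

template <>
inline data_type to_backend_type<bf16_t>() { return data_type::bf16; }
```

Resolving the mapping at compile time keeps the matmul template free of runtime type checks; the element type alone selects the library data type.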
The core ggml_zendnn_matmul template function uses zendnnl::lowoha::matmul_direct with:
- Row-major layout
- Transposed weights (column-major to row-major, with the transpose flag set to true)
- alpha=1.0, beta=0.0
- is_weights_const=true, for weight transformation caching across calls
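To make the parameters listed above concrete, here is an unoptimized reference loop for the same product. It is a sketch under one assumption about layout: the weight matrix is read as m contiguous runs of k values (a (k, m) column-major matrix). The function name `reference_matmul` is hypothetical, and the code does not call ZenDNN at all.

```cpp
#include <cassert>
#include <cstdint>

// Reference version of C = alpha * (B x A) + beta * C.
//   A: weights, m runs of k contiguous values (column-major (k, m)), stride lda
//   B: input,   n rows of k values (row-major), stride ldb
//   C: output,  n rows of m values (row-major), stride ldc
static void reference_matmul(int64_t m, int64_t n, int64_t k,
                             const float * A, int64_t lda,
                             const float * B, int64_t ldb,
                             float * C, int64_t ldc,
                             float alpha = 1.0f, float beta = 0.0f) {
    for (int64_t j = 0; j < n; ++j) {          // input row
        for (int64_t i = 0; i < m; ++i) {      // weight column / output column
            float acc = 0.0f;
            for (int64_t l = 0; l < k; ++l) {  // shared dimension
                acc += B[j * ldb + l] * A[i * lda + l];
            }
            // With beta == 0 the previous contents of C are ignored entirely,
            // so an uninitialized output buffer is safe.
            const float prev = (beta == 0.0f) ? 0.0f : beta * C[j * ldc + i];
            C[j * ldc + i] = alpha * acc + prev;
        }
    }
}
```

With alpha=1.0 and beta=0.0 the call reduces to a plain overwrite of C, which is why the backend does not need to zero the destination first.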
The ggml_zendnn_sgemm dispatcher selects type-specific instantiations:
- F32 x F32 -> F32
- BF16 x BF16 -> BF16
- BF16 x BF16 -> F32
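The dispatch over the three supported combinations might look roughly like the following. This is an illustrative sketch only: the `elem_type` tags, `bf16_t`, `typed_matmul`, and `dispatch_matmul` are hypothetical names, and the placeholder kernel does no arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical runtime type tags standing in for ggml's type enum values.
enum elem_type { TYPE_F32, TYPE_BF16, TYPE_F16 };
struct bf16_t { uint16_t bits; };

// Placeholder for the typed kernel; a real one would run the matmul.
template <typename TA, typename TB, typename TC>
static bool typed_matmul(const TA *, const TB *, TC *) {
    return true;
}

// Maps runtime type tags onto one of the three template instantiations.
// Any other combination returns false so the caller can fall back.
static bool dispatch_matmul(const void * A, const void * B, void * C,
                            int Atype, int Btype, int Ctype) {
    if (Atype != Btype) {
        return false;  // mixed input types are not supported
    }
    switch (Atype) {
        case TYPE_F32:
            if (Ctype != TYPE_F32) {
                return false;
            }
            return typed_matmul(static_cast<const float *>(A),
                                static_cast<const float *>(B),
                                static_cast<float *>(C));
        case TYPE_BF16:
            if (Ctype == TYPE_BF16) {
                return typed_matmul(static_cast<const bf16_t *>(A),
                                    static_cast<const bf16_t *>(B),
                                    static_cast<bf16_t *>(C));
            }
            if (Ctype == TYPE_F32) {
                return typed_matmul(static_cast<const bf16_t *>(A),
                                    static_cast<const bf16_t *>(B),
                                    static_cast<float *>(C));
            }
            return false;
        default:
            return false;
    }
}
```

Returning false for unsupported combinations, rather than asserting, lets a caller hand the operation to another backend.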
Usage
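#include "ggml-backend.h"

int main(void) {
    // Load every backend available to this build
    ggml_backend_load_all();
    // ZenDNN backend registers on AMD Zen systems with ZenDNN installed
    ggml_backend_t backend = ggml_backend_init_best();
    if (!backend) {
        return 1;
    }
    // ...
    ggml_backend_free(backend);
    return 0;
}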
Code Reference
Source Location
| Repository | File | Lines |
|---|---|---|
| ggml-org/ggml | src/ggml-zendnn/ggml-zendnn.cpp | 469 |
Key Signatures
struct ggml_backend_zendnn_context {
    int n_threads = GGML_DEFAULT_N_THREADS;
    std::unique_ptr<char[]> work_data;
    size_t work_size = 0;
};

template<typename T>
zendnnl::common::data_type_t ggml_to_zendnn_type();

template <typename TA, typename TB, typename TC>
static bool ggml_zendnn_matmul(ggml_backend_zendnn_context * ctx,
                               int64_t m, int64_t n, int64_t k,
                               const TA * A, int64_t lda,
                               const TB * B, int64_t ldb,
                               TC * C, int64_t ldc);

static bool ggml_zendnn_sgemm(ggml_backend_zendnn_context * ctx,
                              int64_t m, int64_t n, int64_t k,
                              const void * A, int64_t lda,
                              const void * B, int64_t ldb,
                              void * C, int64_t ldc,
                              int Atype, int Btype, int Ctype);
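The context struct pairs a `std::unique_ptr<char[]>` with a size, which suggests a grow-only scratch buffer. The source does not say how the buffer is used, so the following is only one plausible pattern; `work_context` and `reserve_work` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Simplified stand-in for ggml_backend_zendnn_context (thread count omitted).
struct work_context {
    std::unique_ptr<char[]> work_data;
    size_t work_size = 0;
};

// Returns a scratch pointer holding at least `needed` bytes.
// The buffer is reallocated only when it must grow, so repeated calls with
// the same or smaller sizes reuse the existing allocation.
static char * reserve_work(work_context & ctx, size_t needed) {
    if (ctx.work_size < needed) {
        ctx.work_data.reset(new char[needed]);  // old contents are discarded
        ctx.work_size = needed;
    }
    return ctx.work_data.get();
}
```

Because the old contents are discarded on growth, a buffer like this suits per-call scratch data such as type-converted inputs, not state that must persist between calls.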
I/O Contract
Inputs
- A (weights) -- Weight matrix, shape (k, m), column-major
- B (input) -- Input matrix, shape (n, k), row-major
- Type parameters -- F32 or BF16 for each matrix
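For readers unfamiliar with BF16: a bfloat16 value is the upper 16 bits of an IEEE 754 float32, so conversion is a bit shift. The sketch below shows that relationship; `bf16_t` is a hypothetical stand-in for `ggml_bf16_t`, and the truncating conversion is a simplification (production converters usually round to nearest even and treat NaN specially).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for ggml_bf16_t: 1 sign, 8 exponent, 7 mantissa bits.
struct bf16_t { uint16_t bits; };

// Widening is exact: place the 16 bits in the high half of a float32.
static float bf16_to_f32(bf16_t h) {
    const uint32_t u = static_cast<uint32_t>(h.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Narrowing by truncation: drop the low 16 mantissa bits.
// Simplified on purpose; it does not round and does not special-case NaN.
static bf16_t f32_to_bf16_truncate(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return bf16_t{ static_cast<uint16_t>(u >> 16) };
}
```

BF16 keeps float32's exponent range but only 7 mantissa bits, which is the usual reason to accumulate BF16 products into an F32 output.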
Outputs
- C (output) -- Result matrix, shape (n, m), row-major
- Boolean status -- true on success, false on failure
Usage Examples
Matrix multiplication with ZenDNN:
// ZenDNN computes C = B * A where:
//   A: weights [k, m] column-major
//   B: input   [n, k] row-major
//   C: output  [n, m] row-major
//
// The lowoha path provides:
//   - Direct matmul execution with minimal API overhead
//   - Weight transformation caching via is_weights_const=true
//   - Multi-threaded execution via ctx->n_threads
Related Pages
Implements Principle
Related Implementations
- Implementation:Ggml_org_Ggml_Backend_impl_interface -- Backend interface contract
- Implementation:Ggml_org_Ggml_Ggml_backend_load_all -- Backend discovery