Implementation:Ggml org Ggml Zdnn backend
| File Name | src/ggml-zdnn/ggml-zdnn.cpp |
| Repository | ggml-org/ggml |
| Lines | 633 |
| Language | C++ |
| Domain Tags | ML_Infrastructure, Hardware_Abstraction, IBM_Z |
| Status | Active |
| Last Updated | 2025-05-15 12:00 GMT |
| Knowledge Sources | ggml-org/ggml repository |
Overview
ggml-zdnn.cpp is the main implementation of the zDNN backend for IBM Z mainframes, providing the GGML backend interface for the Integrated Accelerator for AI (IBM AIU). This is an early-stage backend (633 lines) currently focused on matrix multiplication as the most impactful operation for inference performance.
Description
The backend currently supports only GGML_OP_MUL_MAT, delegating to ggml_zdnn_mul_mat_f for floating-point matrix multiplication. The ggml_zdnn_graph_compute function iterates over the graph nodes, skips empty, view, and metadata operations, and dispatches the remaining supported ops after checking the GGML_TENSOR_FLAG_COMPUTE flag.
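The traversal described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the backend's actual code: `Node`, `Op`, `Status`, `is_noop`, `compute_forward`, and `graph_compute` are hypothetical stand-ins for the ggml types and functions, and returning a failure status for an unhandled op is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for ggml types; these are NOT the real definitions.
enum class Op { None, Reshape, View, Transpose, Permute, MulMat, Add };
enum class Status { Success, Failed };

struct Node {
    Op      op;
    int64_t nelements; // 0 means the tensor is empty
    bool    compute;   // stands in for the GGML_TENSOR_FLAG_COMPUTE bit
    bool    computed;  // set once the sketch has "executed" the node
};

// View and metadata ops need no work on the accelerator.
static bool is_noop(Op op) {
    return op == Op::None || op == Op::Reshape || op == Op::View ||
           op == Op::Transpose || op == Op::Permute;
}

// Dispatch one node; returns false for ops this sketch does not handle.
static bool compute_forward(Node & node) {
    assert(node.nelements > 0); // empty nodes are filtered out by the caller
    switch (node.op) {
        case Op::MulMat:
            node.computed = true; // a real backend would call into zDNN here
            return true;
        default:
            return false;
    }
}

// Walk the graph in order, skipping nodes that require no computation.
static Status graph_compute(std::vector<Node> & graph) {
    for (Node & node : graph) {
        if (node.nelements == 0 || is_noop(node.op)) {
            continue; // empty tensor, view, or metadata op
        }
        if (!node.compute) {
            continue; // compute flag not set (assumed meaning of the flag check)
        }
        if (!compute_forward(node)) {
            return Status::Failed; // assumption: unhandled op fails the graph
        }
    }
    return Status::Success;
}
```

In this sketch only MulMat nodes that are non-empty and flagged for compute are marked as computed; every other node is either skipped or causes the graph to fail.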
The ggml_zdnn_supports_op function validates operation compatibility with strict requirements:
- MUL_MAT -- Requires contiguous 2D matrices with F32/F16/BF16 types and dimensions within max_size limits
- Passthrough ops -- NONE, RESHAPE, VIEW, TRANSPOSE, PERMUTE are supported as no-ops
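The rules in the list above could be expressed along these lines. The sketch uses hypothetical stand-in types (`Tensor`, `Type`, `Op`) and a hypothetical `supports_op` helper rather than the real ggml structures. Which dimensions are compared against `max_size`, and whether the type check applies to both operands, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for ggml types; these are NOT the real definitions.
enum class Type { F32, F16, BF16, Q4_0 };
enum class Op { None, Reshape, View, Transpose, Permute, MulMat, Add };

struct Tensor {
    Type    type;
    int64_t ne[4];      // number of elements in each dimension
    bool    contiguous; // stands in for ggml_is_contiguous()
};

// A matrix has no extent beyond the first two dimensions.
static bool is_matrix(const Tensor & t) {
    return t.ne[2] == 1 && t.ne[3] == 1;
}

static bool is_supported_type(Type t) {
    return t == Type::F32 || t == Type::F16 || t == Type::BF16;
}

// Assumption: both leading dimensions must fit within max_size.
static bool fits(const Tensor & t, int64_t max_size) {
    return t.ne[0] <= max_size && t.ne[1] <= max_size;
}

static bool supports_op(Op op, const Tensor & weights, const Tensor & inputs,
                        int64_t max_size) {
    assert(max_size > 0);
    switch (op) {
        // Passthrough ops are accepted and treated as no-ops.
        case Op::None:
        case Op::Reshape:
        case Op::View:
        case Op::Transpose:
        case Op::Permute:
            return true;
        case Op::MulMat:
            return is_matrix(weights) && is_matrix(inputs) &&
                   weights.contiguous && inputs.contiguous &&
                   is_supported_type(weights.type) &&
                   is_supported_type(inputs.type) &&
                   fits(weights, max_size) && fits(inputs, max_size);
        default:
            return false; // every other op is left to another backend
    }
}
```

Rejecting an op here does not fail the graph by itself; in ggml the scheduler is expected to place unsupported nodes on a different backend, such as the CPU.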
Usage
The zDNN backend is loaded on IBM Z hardware with the Integrated Accelerator for AI:
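#include "ggml-backend.h"

int main(void) {
    ggml_backend_load_all();
    // zDNN backend registers if running on IBM Z with AIU
    ggml_backend_t backend = ggml_backend_init_best();
    if (backend == NULL) {
        return 1; // no usable backend was found
    }
    // ...
    ggml_backend_free(backend);
    return 0;
}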
Code Reference
Source Location
| Repository | File | Lines |
|---|---|---|
| ggml-org/ggml | src/ggml-zdnn/ggml-zdnn.cpp | 633 |
Key Signatures
static void ggml_zdnn_compute_forward_mul_mat(
    const ggml_backend_zdnn_context * ctx,
    ggml_tensor * dst);

static bool ggml_zdnn_compute_forward(
    ggml_backend_zdnn_context * ctx,
    ggml_tensor * dst);

static enum ggml_status ggml_zdnn_graph_compute(
    ggml_backend_t backend, ggml_cgraph * gf);

static bool ggml_zdnn_supports_op(
    const ggml_backend_zdnn_device_context * ctx_dev,
    const ggml_tensor * op);
I/O Contract
Inputs
- GGML compute graph -- Tensor operation graph
- Weight tensors -- Contiguous 2D matrices (F32, F16, or BF16)
- Input tensors -- Contiguous 2D matrices (F32, F16, or BF16)
Outputs
- Computed result -- Matrix multiplication output
- GGML_STATUS_SUCCESS -- Returned on successful graph computation
Usage Examples
Supported operation check:
// The backend checks:
// 1. Weights and inputs must be 2D matrices (ggml_is_matrix)
// 2. Both must be contiguous (ggml_is_contiguous)
// 3. Types must be F32, F16, or BF16
// 4. Dimensions must not exceed ctx_dev->max_size
bool supported = ggml_zdnn_supports_op(ctx_dev, mul_mat_node);
Related Pages
Implements Principle
Related Implementations
- Implementation:Ggml_org_Ggml_Backend_impl_interface -- Backend interface contract
- Implementation:Ggml_org_Ggml_Ggml_backend_load_all -- Backend discovery