Implementation:Ggml org Ggml Zdnn backend

**Implementation Metadata**
File Name	`src/ggml-zdnn/ggml-zdnn.cpp`
Repository	ggml-org/ggml
Lines	633
Language	C++
Domain Tags	ML_Infrastructure, Hardware_Abstraction, IBM_Z
Status	Active
Last Updated	2025-05-15 12:00 GMT
Knowledge Sources	ggml-org/ggml repository

Overview

ggml-zdnn.cpp is the main implementation of the zDNN backend for IBM Z mainframes. It implements the GGML backend interface on top of the IBM Z Deep Neural Network (zDNN) library, which drives the on-chip Integrated Accelerator for AI. The backend is at an early stage (633 lines) and currently focuses on matrix multiplication, the operation with the largest impact on inference performance.

Description

The backend currently supports only one compute operation, GGML_OP_MUL_MAT, which it delegates to ggml_zdnn_mul_mat_f for floating-point matrix multiplication. The ggml_zdnn_graph_compute function iterates over the graph nodes, skips empty tensors as well as view and metadata operations (which need no computation), and dispatches each remaining node after checking its GGML_TENSOR_FLAG_COMPUTE flag.
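The node loop described above can be sketched as a small self-contained model. This is not the backend's source: Tensor, Op, FLAG_COMPUTE, compute_forward, and graph_compute are hypothetical stand-ins for the real ggml types, GGML_TENSOR_FLAG_COMPUTE, ggml_zdnn_compute_forward, and ggml_zdnn_graph_compute, and reporting an unsupported op as a failure status is an assumption (the real code may abort instead).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for ggml types; the real ggml_tensor,
// ggml_cgraph, and op enum live in ggml.h and differ in detail.
enum class Op { None, Reshape, View, Transpose, Permute, MulMat, Add };

constexpr uint32_t FLAG_COMPUTE = 1u << 0;  // stand-in for GGML_TENSOR_FLAG_COMPUTE

struct Tensor {
    Op       op;
    int64_t  nelements;        // 0 means the tensor is empty
    uint32_t flags;
    bool     computed = false; // records whether dispatch reached this node
};

enum class Status { Success, Failed };

// Stand-in for ggml_zdnn_compute_forward: returns false for unsupported ops.
static bool compute_forward(Tensor & dst) {
    switch (dst.op) {
        case Op::MulMat:
            dst.computed = true;  // placeholder for the actual matrix multiplication
            return true;
        default:
            return false;
    }
}

// Sketch of the graph loop: skip empty and no-op nodes, honor the compute
// flag, dispatch the rest (assumption: unsupported op -> Status::Failed).
static Status graph_compute(std::vector<Tensor> & nodes) {
    for (Tensor & node : nodes) {
        if (node.nelements == 0) {
            continue;  // skip empty tensors
        }
        switch (node.op) {
            case Op::None: case Op::Reshape: case Op::View:
            case Op::Transpose: case Op::Permute:
                continue;  // view/metadata ops: nothing to compute
            default:
                break;
        }
        if ((node.flags & FLAG_COMPUTE) == 0) {
            continue;  // node not marked for computation
        }
        if (!compute_forward(node)) {
            return Status::Failed;
        }
    }
    return Status::Success;
}
```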

The ggml_zdnn_supports_op function validates operation compatibility with strict requirements:

MUL_MAT -- Requires contiguous 2D matrices with F32/F16/BF16 types and dimensions within max_size limits
Passthrough ops -- NONE, RESHAPE, VIEW, TRANSPOSE, PERMUTE are supported as no-ops
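The requirements above can be expressed as a single predicate. This is a minimal sketch, assuming simplified hypothetical descriptors (TensorDesc, Node, Type) in place of ggml_tensor and ggml_backend_zdnn_device_context; is_matrix mirrors ggml_is_matrix (extents 2 and 3 equal to 1), and the assumption that max_size bounds each of the first two extents of both operands is ours, since the real backend may compare different quantities.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified descriptors; not the real ggml structs.
enum class Type { F32, F16, BF16, Q4_0 };
enum class Op { None, Reshape, View, Transpose, Permute, MulMat, Add };

struct TensorDesc {
    Type    type;
    int64_t ne[4];       // ggml-style extents, ne[0] is the innermost dimension
    bool    contiguous;  // stand-in for ggml_is_contiguous()
};

struct Node {
    Op         op;
    TensorDesc src0;  // weights
    TensorDesc src1;  // inputs
};

// Mirrors ggml_is_matrix: a 2D tensor has ne[2] == ne[3] == 1.
static bool is_matrix(const TensorDesc & t) {
    return t.ne[2] == 1 && t.ne[3] == 1;
}

static bool is_float_type(Type t) {
    return t == Type::F32 || t == Type::F16 || t == Type::BF16;
}

// Assumption: max_size limits each of the two matrix extents.
static bool within_limit(const TensorDesc & t, int64_t max_size) {
    return t.ne[0] <= max_size && t.ne[1] <= max_size;
}

static bool supports_op(const Node & node, int64_t max_size) {
    switch (node.op) {
        case Op::None: case Op::Reshape: case Op::View:
        case Op::Transpose: case Op::Permute:
            return true;  // passthrough ops are accepted as no-ops
        case Op::MulMat:
            return is_matrix(node.src0) && is_matrix(node.src1) &&
                   node.src0.contiguous && node.src1.contiguous &&
                   is_float_type(node.src0.type) && is_float_type(node.src1.type) &&
                   within_limit(node.src0, max_size) && within_limit(node.src1, max_size);
        default:
            return false;  // everything else is left to other backends
    }
}
```

Rejecting quantized weights here means a scheduler would route such MUL_MAT nodes to another backend rather than failing at compute time.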

Usage

The zDNN backend is loaded on IBM Z hardware with the Integrated Accelerator for AI:

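#include "ggml-backend.h"

int main(void) {
    // Load all available backends; the zDNN backend registers only when
    // running on IBM Z hardware with the Integrated Accelerator for AI
    ggml_backend_load_all();
    ggml_backend_t backend = ggml_backend_init_best();
    if (backend == NULL) {
        return 1;
    }
    // ... build and compute graphs ...
    ggml_backend_free(backend);
    return 0;
}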

Code Reference

Source Location

Repository	File	Lines
ggml-org/ggml	`src/ggml-zdnn/ggml-zdnn.cpp`	633

Key Signatures

static void ggml_zdnn_compute_forward_mul_mat(
    const ggml_backend_zdnn_context * ctx,
    ggml_tensor * dst);

static bool ggml_zdnn_compute_forward(
    ggml_backend_zdnn_context * ctx,
    ggml_tensor * dst);

static enum ggml_status ggml_zdnn_graph_compute(
    ggml_backend_t backend, ggml_cgraph * gf);

static bool ggml_zdnn_supports_op(
    const ggml_backend_zdnn_device_context * ctx_dev,
    const ggml_tensor * op);

I/O Contract

Inputs

GGML compute graph -- Tensor operation graph
Weight tensors -- Contiguous 2D matrices (F32, F16, or BF16)
Input tensors -- Contiguous 2D matrices (F32, F16, or BF16)

Outputs

Computed result -- Matrix multiplication output
GGML_STATUS_SUCCESS -- Returned on successful graph computation
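To check a MUL_MAT output produced by the accelerator, a plain CPU reference is useful. The sketch below follows ggml's documented ggml_mul_mat shape convention (result extents {src0->ne[1], src1->ne[1]}, shared dimension innermost in both operands), restricted to 2D F32 data; mul_mat_ref is a hypothetical helper, not part of ggml or zDNN.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Reference semantics of a 2D GGML_OP_MUL_MAT on F32 data.
// Layout follows ggml: ne[0] is the innermost (contiguous) dimension.
//   src0 (weights): ne = {K, M} -> element (k, m) at src0[m*K + k]
//   src1 (inputs):  ne = {K, N} -> element (k, n) at src1[n*K + k]
//   dst:            ne = {M, N} -> element (m, n) at dst[n*M + m]
static std::vector<float> mul_mat_ref(const std::vector<float> & src0,
                                      const std::vector<float> & src1,
                                      size_t K, size_t M, size_t N) {
    if (src0.size() != K * M || src1.size() != K * N) {
        throw std::invalid_argument("shape mismatch");
    }
    std::vector<float> dst(M * N, 0.0f);
    for (size_t n = 0; n < N; ++n) {
        for (size_t m = 0; m < M; ++m) {
            float acc = 0.0f;
            // dot product of weight row m with input row n over K
            for (size_t k = 0; k < K; ++k) {
                acc += src0[m * K + k] * src1[n * K + k];
            }
            dst[n * M + m] = acc;
        }
    }
    return dst;
}
```

For example, weights {1, 2, 3, 4} with K = 2, M = 2 and a single input row {5, 6} give {17, 39}.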

Usage Examples

Supported operation check:

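// ggml_zdnn_supports_op is static (internal to ggml-zdnn.cpp) and is reached
// through the device interface. It checks that:
// 1. Weights and inputs are 2D matrices (ggml_is_matrix)
// 2. Both are contiguous (ggml_is_contiguous)
// 3. Types are F32, F16, or BF16
// 4. Dimensions do not exceed ctx_dev->max_size
// From application code, the same check is queried via the public device API:
bool supported = ggml_backend_dev_supports_op(dev, mul_mat_node);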

Related Pages

Implements Principle

Principle:Ggml_org_Ggml_ZDNN_Accelerated_Computation

Related Implementations

Implementation:Ggml_org_Ggml_Backend_impl_interface -- Backend interface contract
Implementation:Ggml_org_Ggml_Ggml_backend_load_all -- Backend discovery
