

Implementation:Ggml org Ggml Zdnn backend

From Leeroopedia


Implementation Metadata
File Name src/ggml-zdnn/ggml-zdnn.cpp
Repository ggml-org/ggml
Lines 633
Language C++
Domain Tags ML_Infrastructure, Hardware_Abstraction, IBM_Z
Status Active
Last Updated 2025-05-15 12:00 GMT
Knowledge Sources ggml-org/ggml repository

Overview

ggml-zdnn.cpp is the main implementation of the zDNN backend for IBM Z mainframes. It implements the GGML backend interface on top of the IBM Integrated Accelerator for AI, the on-chip accelerator that the zDNN library drives through the NNPA (Neural Network Processing Assist) facility. This is an early-stage backend (633 lines) that currently focuses on matrix multiplication, the operation with the greatest impact on inference performance.

Description

The backend currently executes only GGML_OP_MUL_MAT, which it delegates to ggml_zdnn_mul_mat_f for floating-point matrix multiplication. The ggml_zdnn_graph_compute function iterates over the graph's nodes, skips empty tensors, views, and metadata-only operations, and dispatches each remaining supported node subject to a GGML_TENSOR_FLAG_COMPUTE check.
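The node-walking logic described above can be modeled in a few lines. This is a minimal sketch under stated assumptions, not the backend's code: the Node, Op, and Status types, the kFlagCompute constant (a stand-in for GGML_TENSOR_FLAG_COMPUTE), and the rule that nodes without that flag are skipped are all hypothetical simplifications. Only MUL_MAT has a kernel here, so any other op that reaches dispatch makes the walk report failure.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for ggml types; these are NOT the real ggml definitions.
enum class Op { None, Reshape, View, Transpose, Permute, MulMat, Add };
enum class Status { Success, Failed };

constexpr unsigned kFlagCompute = 1u << 4;  // stand-in for GGML_TENSOR_FLAG_COMPUTE

struct Node {
    Op       op;
    bool     empty;         // stand-in for ggml_is_empty(node)
    unsigned flags;
    int      executed = 0;  // how many times this node was computed
};

// Per-node dispatch: only MUL_MAT has a kernel in this sketch.
static bool compute_forward(Node & n) {
    switch (n.op) {
        case Op::MulMat:
            ++n.executed;   // a real backend would run the zDNN matmul here
            return true;
        default:
            return false;   // no kernel for this op
    }
}

// Graph walk: skip empty nodes and metadata-only ops, honor the (assumed)
// compute flag, and dispatch everything else.
static Status graph_compute(std::vector<Node> & graph) {
    for (Node & n : graph) {
        if (n.empty) continue;
        switch (n.op) {
            case Op::None: case Op::Reshape: case Op::View:
            case Op::Transpose: case Op::Permute:
                continue;   // views and metadata: nothing to compute
            default:
                break;
        }
        if ((n.flags & kFlagCompute) == 0) continue;  // assumed meaning of the flag check
        if (!compute_forward(n)) return Status::Failed;
    }
    return Status::Success;
}
```

Returning an error status instead of aborting is a choice made for this model; in practice the scheduler only assigns nodes that passed the support check.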

The ggml_zdnn_supports_op function validates operation compatibility with strict requirements:

  • MUL_MAT -- Requires contiguous 2D matrices of type F32, F16, or BF16 whose dimensions do not exceed the device's max_size limit
  • Passthrough ops -- NONE, RESHAPE, VIEW, TRANSPOSE, PERMUTE are supported as no-ops
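These rules can be expressed as a standalone predicate. The Tensor struct, the is_matrix helper (which mirrors ggml_is_matrix by requiring ne[2] and ne[3] to be 1), and the way max_size is applied to every 2D extent are assumptions made for illustration; the real function reads these properties from ggml_tensor and ctx_dev.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified tensor model; NOT the real ggml_tensor.
enum class Type { F32, F16, BF16, Q4_0 };
enum class Op { None, Reshape, View, Transpose, Permute, MulMat, Add };

struct Tensor {
    Op             op;
    Type           type;
    int64_t        ne[4];       // elements per dimension, as in ggml
    bool           contiguous;  // stand-in for ggml_is_contiguous()
    const Tensor * src[2];      // src[0] = weights, src[1] = input
};

// Mirrors ggml_is_matrix(): only the first two dimensions may exceed 1.
static bool is_matrix(const Tensor & t) { return t.ne[2] == 1 && t.ne[3] == 1; }

static bool is_float_type(Type t) {
    return t == Type::F32 || t == Type::F16 || t == Type::BF16;
}

// max_size plays the role of ctx_dev->max_size (its real value comes from the device).
static bool supports_op(int64_t max_size, const Tensor & op) {
    switch (op.op) {
        case Op::None: case Op::Reshape: case Op::View:
        case Op::Transpose: case Op::Permute:
            return true;  // passthrough: no computation needed
        case Op::MulMat: {
            const Tensor & w = *op.src[0];
            const Tensor & x = *op.src[1];
            if (!is_matrix(w) || !is_matrix(x)) return false;
            if (!w.contiguous || !x.contiguous) return false;
            if (!is_float_type(w.type) || !is_float_type(x.type)) return false;
            // Assumed form of the size limit: every 2D extent must fit.
            return w.ne[0] <= max_size && w.ne[1] <= max_size &&
                   x.ne[0] <= max_size && x.ne[1] <= max_size;
        }
        default:
            return false;  // everything else falls back to another backend
    }
}
```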

Usage

The zDNN backend registers only on IBM Z hardware that provides the Integrated Accelerator for AI. Backends are discovered and initialized through the standard ggml backend registry:

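#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    ggml_backend_load_all();  // load and register all available backends
    // zDNN appears only on IBM Z with the Integrated Accelerator for AI;
    // init_best prefers GPU-type devices and otherwise falls back to the CPU
    ggml_backend_t backend = ggml_backend_init_best();
    if (backend == NULL) {
        return 1;
    }
    printf("selected backend: %s\n", ggml_backend_name(backend));
    // ... build and compute graphs ...
    ggml_backend_free(backend);
    return 0;
}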

Code Reference

Source Location

Repository File Lines
ggml-org/ggml src/ggml-zdnn/ggml-zdnn.cpp 633

Key Signatures

static void ggml_zdnn_compute_forward_mul_mat(
    const ggml_backend_zdnn_context * ctx,
    ggml_tensor * dst);

static bool ggml_zdnn_compute_forward(
    ggml_backend_zdnn_context * ctx,
    ggml_tensor * dst);

static enum ggml_status ggml_zdnn_graph_compute(
    ggml_backend_t backend, ggml_cgraph * gf);

static bool ggml_zdnn_supports_op(
    const ggml_backend_zdnn_device_context * ctx_dev,
    const ggml_tensor * op);

I/O Contract

Inputs

  • GGML compute graph (ggml_cgraph) -- The graph of tensor operations to execute
  • Weight tensors -- Contiguous 2D matrices (F32, F16, or BF16)
  • Input tensors -- Contiguous 2D matrices (F32, F16, or BF16)

Outputs

  • Computed result -- Matrix multiplication output
  • GGML_STATUS_SUCCESS -- Returned on successful graph computation

Usage Examples

Supported operation check:

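// ggml_zdnn_supports_op is static, so callers reach it through the public
// device API (ggml_backend_dev_supports_op). For a MUL_MAT node it checks:
// 1. Weights and inputs are 2D matrices (ggml_is_matrix)
// 2. Both are contiguous (ggml_is_contiguous)
// 3. Types are F32, F16, or BF16
// 4. Dimensions do not exceed ctx_dev->max_size
// zdnn_dev and mul_mat_node are placeholders for the zDNN device and a MUL_MAT tensor
bool supported = ggml_backend_dev_supports_op(zdnn_dev, mul_mat_node);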
