Implementation:Ggml org Ggml Hexagon matmul ops

**Implementation Metadata**
File Name	`src/ggml-hexagon/htp/matmul-ops.c`
Repository	ggml-org/ggml
Lines	2476
Language	C
Domain Tags	ML_Infrastructure, DSP_Computing, Matrix_Operations
Status	Active
Last Updated	2025-05-15 12:00 GMT
Knowledge Sources	ggml-org/ggml repository

Overview

matmul-ops.c is the DSP-side implementation of the matrix multiplication operations (mul_mat and mul_mat_id) for the Hexagon HVX vector processor. At 2476 lines, it is the largest operation file in the Hexagon backend and the most performance-critical, because matrix multiplication dominates inference compute time. Alongside F32 and F16, it supports the quantized weight formats Q4_0, Q8_0, and MXFP4 for memory-efficient inference.

Description

The file defines struct htp_matmul_type, which holds function pointers for two dot-product variants: vec_dot (single row) and vec_dot_rx2 (two rows at once). It also defines HVX vdelta control tables that replicate values across vector lanes:

repl_1x_f32 -- Replicate first FP32 value across all 32 elements
repl_4x_f32 -- Replicate first 4 FP32 values across lanes
repl_interleave_8x_f32 -- Replicate and interleave 8 values
repl_1x_f16, repl_2x_f16 -- FP16 value replication
expand_x32_e8m0 -- Expand 32 e8m0 values to uint32
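
The effect of these tables can be illustrated with a scalar model. The sketch below is not the HVX code: the function names are hypothetical, the repl_4x_f32 lane pattern (out[i] = in[i % 4]) is an assumption about the layout, and expand_x32_e8m0 is modelled as a plain zero-extension. The only format fact relied on is that an e8m0 byte e in 1..254 encodes the scale 2^(e-127), so placing it in the FP32 exponent field yields that scale directly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES_F32 32  /* a 128-byte HVX vector holds 32 FP32 lanes */

/* Scalar model of repl_1x_f32: every output lane takes input lane 0. */
static void model_repl_1x_f32(const float *in, float *out) {
    assert(in && out);
    for (int i = 0; i < LANES_F32; i++) out[i] = in[0];
}

/* Scalar model of repl_4x_f32, ASSUMING the pattern out[i] = in[i % 4]. */
static void model_repl_4x_f32(const float *in, float *out) {
    assert(in && out);
    for (int i = 0; i < LANES_F32; i++) out[i] = in[i % 4];
}

/* Scalar model of expand_x32_e8m0: widen 32 bytes to 32 uint32 lanes. */
static void model_expand_x32_e8m0(const uint8_t *e, uint32_t *out) {
    assert(e && out);
    for (int i = 0; i < 32; i++) out[i] = (uint32_t) e[i];
}

/* Turn a widened e8m0 exponent (1..254) into its FP32 scale 2^(e-127). */
static float model_e8m0_to_f32(uint32_t e) {
    uint32_t bits = e << 23;  /* exponent field of an IEEE-754 binary32 */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

On the DSP, a single vdelta permutation performs each replication on a whole vector at once; the loops here only describe the resulting lane contents.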

Supported type combinations include F32xF32, F32xF16, and F16xF16, plus the quantized weight types Q4_0, Q8_0, and MXFP4. Scratchpad tiling uses fixed row counts set by compile-time constants (MM_SPAD_SRC0_NROWS=16, MM_SPAD_SRC1_NROWS=16, MM_SPAD_DST_NROWS=2).
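
The idea behind row-tiled scratchpads can be sketched in portable C. This is a simplified illustration, not the file's actual loop: the function name tiled_matvec_f32 and the MAX_K cap are hypothetical, a memcpy stands in for whatever transfer mechanism moves rows into the scratchpad, and only the src0 tile size is modelled.

```c
#include <assert.h>
#include <string.h>

#define MM_SPAD_SRC0_NROWS 16   /* tile height, as in the source */

enum { MAX_K = 64 };            /* hypothetical cap on row length for this sketch */

/* dst[r] = dot(src0 row r, src1), processing src0 in tiles of
 * MM_SPAD_SRC0_NROWS rows that are first staged in a local buffer. */
static void tiled_matvec_f32(int nrows, int k,
                             const float *src0, const float *src1, float *dst) {
    assert(k > 0 && k <= MAX_K);
    float spad[MM_SPAD_SRC0_NROWS * MAX_K];  /* stand-in for the scratchpad */

    for (int r0 = 0; r0 < nrows; r0 += MM_SPAD_SRC0_NROWS) {
        /* The last tile may be shorter than MM_SPAD_SRC0_NROWS. */
        int tile = nrows - r0 < MM_SPAD_SRC0_NROWS ? nrows - r0 : MM_SPAD_SRC0_NROWS;

        /* Stage the tile (assumption: rows are contiguous in src0). */
        memcpy(spad, src0 + (size_t) r0 * k, (size_t) tile * k * sizeof(float));

        for (int r = 0; r < tile; r++) {
            float sum = 0.0f;
            for (int i = 0; i < k; i++) sum += spad[r * k + i] * src1[i];
            dst[r0 + r] = sum;
        }
    }
}
```

Staging a fixed number of rows keeps the working set bounded regardless of matrix size, which is the usual motivation for this kind of tiling.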

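The file also implements mul_mat_id, the expert-routed matrix multiplication used by mixture-of-experts models. Its mmid_row_mapping structure maps expert IDs to the row indices that each expert must process.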
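
A minimal sketch of how such a mapping might be built follows. The struct example_row_mapping, its slot and token fields, the group_rows_by_expert function, and the N_EXPERTS and MAX_ROWS limits are all hypothetical; the fields of the real mmid_row_mapping are not shown on this page. The sketch only illustrates the general technique of bucketing rows by the expert they were routed to.

```c
#include <assert.h>
#include <stdint.h>

enum { N_EXPERTS = 4, MAX_ROWS = 8 };  /* hypothetical limits for this sketch */

/* Hypothetical mapping entry: which routing slot of which token a row came from. */
struct example_row_mapping {
    int32_t slot;
    int32_t token;
};

/* ids[t * n_used + s] is the expert chosen for slot s of token t.
 * Afterwards, map[e][0 .. counts[e]-1] lists the rows routed to expert e. */
static void group_rows_by_expert(const int32_t *ids, int n_tokens, int n_used,
                                 struct example_row_mapping map[N_EXPERTS][MAX_ROWS],
                                 int counts[N_EXPERTS]) {
    for (int e = 0; e < N_EXPERTS; e++) counts[e] = 0;

    for (int t = 0; t < n_tokens; t++) {
        for (int s = 0; s < n_used; s++) {
            int32_t e = ids[t * n_used + s];
            assert(e >= 0 && e < N_EXPERTS);
            assert(counts[e] < MAX_ROWS);
            map[e][counts[e]++] = (struct example_row_mapping){ s, t };
        }
    }
}
```

With the rows grouped this way, each expert's weight matrix can be multiplied against all of its rows in one pass, and the mapping tells the kernel where to write each result.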

Usage

Dispatched from the DSP-side message loop for GGML_OP_MUL_MAT and GGML_OP_MUL_MAT_ID operations.

Code Reference

Source Location

Repository	File	Lines
ggml-org/ggml	`src/ggml-hexagon/htp/matmul-ops.c`	2476

Key Signatures

#define MM_SPAD_SRC0_NROWS 16
#define MM_SPAD_SRC1_NROWS 16
#define MM_SPAD_DST_NROWS  2

struct htp_matmul_type {
    const char * type;
    void (*vec_dot)(const int n, float * restrict s, const void * restrict vx, const void * restrict vy);
    void (*vec_dot_rx2)(const int n, float * restrict s, const void * restrict vx,
        uint32_t vx_row_size, const void * restrict vy);
};

struct mmid_row_mapping {
    // Maps expert IDs to row indices for MoE dispatch
};

I/O Contract

Inputs

src0 -- Weight matrix (may be quantized: F32, F16, Q4_0, Q8_0, MXFP4)
src1 -- Input matrix (typically F32 or F16)
op_params -- Operation parameters including type combination info

Outputs

dst -- Result matrix (F32)

Usage Examples

Internal matmul dispatch:

// Selected based on the src0/src1 type combination
struct htp_matmul_type matmul = get_matmul_type(src0_type, src1_type);

// Single-row dot product: writes one result
float result;
matmul.vec_dot(n, &result, src0_row, src1_row);

// Or the dual-row variant for higher throughput: writes two results
float results[2];
matmul.vec_dot_rx2(n, results, src0_rows, row_size, src1_row);
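
To make the two function-pointer signatures concrete, here is a scalar F32 reference that fits them. The struct layout is copied from Key Signatures; the functions ref_vec_dot_f32 and ref_vec_dot_f32_rx2 and the table ref_f32_type are hypothetical names for this sketch. It assumes that vec_dot_rx2 processes two consecutive src0 rows spaced vx_row_size bytes apart and writes their dot products to s[0] and s[1]; the real HVX kernels may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

struct htp_matmul_type {
    const char * type;
    void (*vec_dot)(const int n, float * restrict s, const void * restrict vx, const void * restrict vy);
    void (*vec_dot_rx2)(const int n, float * restrict s, const void * restrict vx,
        uint32_t vx_row_size, const void * restrict vy);
};

/* Scalar reference: *s = sum of x[i] * y[i] over n FP32 elements. */
static void ref_vec_dot_f32(const int n, float * restrict s,
                            const void * restrict vx, const void * restrict vy) {
    assert(n >= 0);
    const float * x = vx;
    const float * y = vy;
    float sum = 0.0f;
    for (int i = 0; i < n; i++) sum += x[i] * y[i];
    *s = sum;
}

/* Scalar reference for the dual-row variant (ASSUMED semantics):
 * row 0 starts at vx, row 1 starts vx_row_size bytes later. */
static void ref_vec_dot_f32_rx2(const int n, float * restrict s,
                                const void * restrict vx, uint32_t vx_row_size,
                                const void * restrict vy) {
    assert(n >= 0);
    const float * x0 = vx;
    const float * x1 = (const float *) ((const uint8_t *) vx + vx_row_size);
    const float * y  = vy;
    float sum0 = 0.0f, sum1 = 0.0f;
    for (int i = 0; i < n; i++) {
        sum0 += x0[i] * y[i];  /* each y[i] is loaded once and used for both rows */
        sum1 += x1[i] * y[i];
    }
    s[0] = sum0;
    s[1] = sum1;
}

/* Hypothetical table entry pairing the two variants under one type name. */
static const struct htp_matmul_type ref_f32_type = {
    "f32-f32-ref", ref_vec_dot_f32, ref_vec_dot_f32_rx2
};
```

The dual-row form reuses each src1 element for two accumulations, which is one plausible reason a two-row kernel achieves higher throughput than calling the single-row kernel twice.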

Related Pages

Implements Principle

Principle:Ggml_org_Ggml_Hexagon_DSP_Computation

Related Implementations

Implementation:Ggml_org_Ggml_Hexagon_htp_main -- Message dispatcher
Implementation:Ggml_org_Ggml_Hexagon_flash_attn -- Also uses dot products
Implementation:Ggml_org_Ggml_Hexagon_hvx_arith -- Low-level HVX arithmetic
Implementation:Ggml_org_Ggml_Hexagon_backend -- Host-side backend
