Implementation:Ggml org Ggml Hexagon matmul ops
| File Name | src/ggml-hexagon/htp/matmul-ops.c |
| Repository | ggml-org/ggml |
| Lines | 2476 |
| Language | C |
| Domain Tags | ML_Infrastructure, DSP_Computing, Matrix_Operations |
| Status | Active |
| Last Updated | 2025-05-15 12:00 GMT |
| Knowledge Sources | ggml-org/ggml repository |
Overview
matmul-ops.c is the DSP-side implementation of the matrix multiplication and mul_mat_id operations for the Hexagon HVX vector unit. At 2476 lines, it is the largest and most performance-critical operation file in the Hexagon backend: matrix multiplication dominates inference compute time, and the file supports quantized weight formats (Q4_0, Q8_0, MXFP4) alongside F32 and F16 for memory-efficient inference.
Description
The file defines htp_matmul_type with function pointers for vec_dot and vec_dot_rx2 (dual-row dot product) variants. It uses HVX vdelta control tables for value replication across vector lanes:
- repl_1x_f32 -- Replicate first FP32 value across all 32 elements
- repl_4x_f32 -- Replicate first 4 FP32 values across lanes
- repl_interleave_8x_f32 -- Replicate and interleave 8 values
- repl_1x_f16, repl_2x_f16 -- FP16 value replication
- expand_x32_e8m0 -- Expand 32 e8m0 values to uint32
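The replication patterns above can be illustrated in portable scalar C. This is only a sketch of the intended result, not of the vdelta control tables themselves: the function names `repl_1x_f32_ref` and `repl_4x_f32_ref` are hypothetical, and the lane order chosen for the 4-value case (lane i takes value i mod 4) is an assumption that may not match the real table.

```c
#include <stddef.h>

#define LANES_F32 32 /* 32 FP32 elements per vector, as stated above */

/* Hypothetical scalar stand-in for repl_1x_f32: every lane receives src[0]. */
static void repl_1x_f32_ref(const float *src, float *dst) {
    for (size_t i = 0; i < LANES_F32; i++) {
        dst[i] = src[0];
    }
}

/* Hypothetical scalar stand-in for repl_4x_f32.
 * Assumption: lane i receives src[i % 4]; the real lane order may differ. */
static void repl_4x_f32_ref(const float *src, float *dst) {
    for (size_t i = 0; i < LANES_F32; i++) {
        dst[i] = src[i % 4];
    }
}
```

On the DSP the same effect would come from a single vector permute driven by a precomputed control table, which is why the file keeps these patterns as tables rather than loops.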
Supported type combinations include F32xF32, F32xF16, F16xF16, Q4_0, Q8_0, and MXFP4. Scratchpad tiling uses configurable row counts (MM_SPAD_SRC0_NROWS=16, MM_SPAD_SRC1_NROWS=16, MM_SPAD_DST_NROWS=2).
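The tiling idea can be sketched as a plain F32 loop nest that walks src0 and src1 in tiles of the row counts given above. This is a minimal illustration under assumptions: `matmul_tiled_ref` and `min_sz` are hypothetical names, the output layout (one dst row per src1 row, one column per src0 row) follows ggml's usual convention, and nothing here models how the real kernels stage data in the scratchpad.

```c
#include <stddef.h>

#define MM_SPAD_SRC0_NROWS 16
#define MM_SPAD_SRC1_NROWS 16

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Hypothetical reference: dst[j * nrows0 + i] = dot(src0 row i, src1 row j).
 * All buffers are row-major F32 with n elements per input row. */
static void matmul_tiled_ref(size_t n, size_t nrows0, size_t nrows1,
                             const float *src0, const float *src1, float *dst) {
    for (size_t i0 = 0; i0 < nrows0; i0 += MM_SPAD_SRC0_NROWS) {
        size_t i_end = min_sz(i0 + MM_SPAD_SRC0_NROWS, nrows0);
        for (size_t j0 = 0; j0 < nrows1; j0 += MM_SPAD_SRC1_NROWS) {
            size_t j_end = min_sz(j0 + MM_SPAD_SRC1_NROWS, nrows1);
            /* Inside one tile pair: at most 16 x 16 dot products. */
            for (size_t j = j0; j < j_end; j++) {
                for (size_t i = i0; i < i_end; i++) {
                    float sum = 0.0f;
                    for (size_t k = 0; k < n; k++) {
                        sum += src0[i * n + k] * src1[j * n + k];
                    }
                    dst[j * nrows0 + i] = sum;
                }
            }
        }
    }
}
```

Keeping the tile row counts as compile-time constants lets the working set of a tile pair be sized against a fixed scratchpad budget.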
The file also implements mul_mat_id, used for expert routing in mixture-of-experts models; its mmid_row_mapping structure maps expert IDs to row indices.
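One way to picture the row mapping is as a grouping pass: for every token and every expert slot selected for that token, record the pair under the chosen expert so that each expert's rows can be multiplied together. The sketch below assumes this grouping scheme; `row_mapping_sketch`, `build_row_mapping`, the field names, and the fixed sizes are all hypothetical and do not reflect the actual mmid_row_mapping definition.

```c
#include <stdint.h>

#define N_EXPERT 4 /* hypothetical sizes for the sketch */
#define N_TOKENS 3
#define N_USED   2 /* experts selected per token */

/* Hypothetical mapping entry; not the real mmid_row_mapping layout. */
struct row_mapping_sketch {
    int32_t slot;  /* which of the token's selected-expert slots */
    int32_t token; /* which token (input row) */
};

/* Group (slot, token) pairs by expert ID and count the rows per expert. */
static void build_row_mapping(const int32_t ids[N_TOKENS][N_USED],
                              int32_t counts[N_EXPERT],
                              struct row_mapping_sketch rows[N_EXPERT][N_TOKENS * N_USED]) {
    for (int e = 0; e < N_EXPERT; e++) {
        counts[e] = 0;
    }
    for (int t = 0; t < N_TOKENS; t++) {
        for (int s = 0; s < N_USED; s++) {
            int32_t e = ids[t][s]; /* expert chosen for this token/slot */
            rows[e][counts[e]].slot  = s;
            rows[e][counts[e]].token = t;
            counts[e]++;
        }
    }
}
```

After this pass, an expert with a count of zero can be skipped entirely, and the others can run an ordinary matrix multiplication over only their own rows.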
Usage
The functions in this file are dispatched from the DSP-side message loop for GGML_OP_MUL_MAT and GGML_OP_MUL_MAT_ID operations.
Code Reference
Source Location
| Repository | File | Lines |
|---|---|---|
| ggml-org/ggml | src/ggml-hexagon/htp/matmul-ops.c | 2476 |
Key Signatures
#define MM_SPAD_SRC0_NROWS 16
#define MM_SPAD_SRC1_NROWS 16
#define MM_SPAD_DST_NROWS  2

struct htp_matmul_type {
    const char * type;
    void (*vec_dot)(const int n, float * restrict s, const void * restrict vx, const void * restrict vy);
    void (*vec_dot_rx2)(const int n, float * restrict s, const void * restrict vx,
                        uint32_t vx_row_size, const void * restrict vy);
};

struct mmid_row_mapping {
    // Maps expert IDs to row indices for MoE dispatch
};
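To make the two function-pointer shapes concrete, here is a scalar F32 reference that fits the htp_matmul_type signatures shown above. It is a sketch, not the HVX kernels: the `_ref` function names and the `matmul_f32_ref` table entry are hypothetical, and it assumes that vx_row_size is a byte stride between two consecutive src0 rows and that the dual-row variant writes two results to s.

```c
#include <stdint.h>

struct htp_matmul_type {
    const char * type;
    void (*vec_dot)(const int n, float * restrict s, const void * restrict vx, const void * restrict vy);
    void (*vec_dot_rx2)(const int n, float * restrict s, const void * restrict vx,
                        uint32_t vx_row_size, const void * restrict vy);
};

/* One dot product of length n between an F32 src0 row and an F32 src1 row. */
static void vec_dot_f32_ref(const int n, float * restrict s,
                            const void * restrict vx, const void * restrict vy) {
    const float * x = (const float *) vx;
    const float * y = (const float *) vy;
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    *s = sum;
}

/* Two src0 rows against one src1 row, so each y[i] is loaded once for both.
 * Assumption: vx_row_size is in bytes and s has room for two floats. */
static void vec_dot_rx2_f32_ref(const int n, float * restrict s, const void * restrict vx,
                                uint32_t vx_row_size, const void * restrict vy) {
    const float * x0 = (const float *) vx;
    const float * x1 = (const float *) ((const uint8_t *) vx + vx_row_size);
    const float * y  = (const float *) vy;
    float s0 = 0.0f;
    float s1 = 0.0f;
    for (int i = 0; i < n; i++) {
        s0 += x0[i] * y[i];
        s1 += x1[i] * y[i];
    }
    s[0] = s0;
    s[1] = s1;
}

/* Hypothetical table entry wiring the reference functions into the struct. */
static const struct htp_matmul_type matmul_f32_ref = {
    "f32-f32-ref", vec_dot_f32_ref, vec_dot_rx2_f32_ref,
};
```

The dual-row form reuses each src1 element for two accumulators, which is the usual motivation for providing it alongside the single-row variant.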
I/O Contract
Inputs
- src0 -- Weight matrix (may be quantized: F32, F16, Q4_0, Q8_0, MXFP4)
- src1 -- Input matrix (typically F32 or F16)
- op_params -- Operation parameters including type combination info
Outputs
- dst -- Result matrix (F32)
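The contract above (quantized weights in, F32 result out) can be illustrated with a block-quantized dot product. The sketch assumes a Q8_0-style layout of 32 signed 8-bit values sharing one scale; for portability it stores the scale as a float, whereas ggml's Q8_0 uses an FP16 scale. `q8_block_sketch` and `dot_q8_f32_ref` are hypothetical names, and the HVX kernels may arrange this computation quite differently.

```c
#include <stdint.h>

#define QK 32 /* values per block in a Q8_0-style layout */

/* Simplified block; assumption: float scale instead of an FP16 scale. */
struct q8_block_sketch {
    float  d;      /* per-block scale */
    int8_t qs[QK]; /* quantized values */
};

/* Dot product of a quantized src0 row with an F32 src1 row.
 * n is the row length in elements and must be a multiple of QK. */
static float dot_q8_f32_ref(int n, const struct q8_block_sketch * x, const float * y) {
    float sum = 0.0f;
    for (int b = 0; b < n / QK; b++) {
        float block_sum = 0.0f;
        for (int i = 0; i < QK; i++) {
            block_sum += (float) x[b].qs[i] * y[b * QK + i];
        }
        sum += x[b].d * block_sum; /* apply the scale once per block */
    }
    return sum;
}
```

Applying the scale once per block rather than per element is what makes block formats cheap to consume: the inner loop works on small integers and the scale multiplies only the block total.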
Usage Examples
Internal matmul dispatch:
// Selected based on src0/src1 type combination
htp_matmul_type matmul = get_matmul_type(src0_type, src1_type);
matmul.vec_dot(n, &result, src0_row, src1_row);

// Or dual-row variant for higher throughput
matmul.vec_dot_rx2(n, results, src0_rows, row_size, src1_row);
Related Pages
Implements Principle
Related Implementations
- Implementation:Ggml_org_Ggml_Hexagon_htp_main -- Message dispatcher
- Implementation:Ggml_org_Ggml_Hexagon_flash_attn -- Also uses dot products
- Implementation:Ggml_org_Ggml_Hexagon_hvx_arith -- Low-level HVX arithmetic
- Implementation:Ggml_org_Ggml_Hexagon_backend -- Host-side backend