Implementation:Ggml org Ggml Hexagon binary ops
| File Name | src/ggml-hexagon/htp/binary-ops.c |
| Repository | ggml-org/ggml |
| Lines | 343 |
| Language | C |
| Domain Tags | ML_Infrastructure, DSP_Computing, Element_Wise_Operations |
| Status | Active |
| Last Updated | 2025-05-15 12:00 GMT |
| Knowledge Sources | ggml-org/ggml repository |
Overview
binary-ops.c is the DSP-side implementation of element-wise binary operations (multiply, add, subtract) on the Hexagon HVX vector processor. These operations are used throughout ML inference for residual connections, scaling, and masking.
Description
The file uses two function-pointer tables, func_table_HVX and func_table_HVX_opt, to dispatch each operation to an HVX kernel. The optimized variants (hvx_mul_f32_aa, hvx_add_f32_aa, hvx_sub_f32_aa) are selected when the source and destination buffers are 128-byte aligned and row sizes are VLEN-aligned; otherwise the general variants (hvx_mul_f32, hvx_add_f32, hvx_sub_f32) are used.
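The table-based selection described above can be sketched in portable C. This is an illustrative sketch, not the file's actual code: the `_sketch` names, the `binary_op_sketch` enum, the `pick_kernel_sketch` helper, and the 128-byte value of `VLEN_SKETCH` are all assumptions, and the kernels are scalar stand-ins for the HVX versions. It also assumes that all three buffers are checked for alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLEN_SKETCH 128 /* assumed vector length and alignment, in bytes */

typedef void (*elemwise_f32_func)(uint8_t * data_dst, const uint8_t * src0,
                                  const uint8_t * src1, const uint32_t num_elems);

/* Scalar stand-ins; the real kernels use HVX intrinsics. */
#define DEFINE_KERNEL_SKETCH(name, OP)                                      \
    static void name(uint8_t * data_dst, const uint8_t * src0,              \
                     const uint8_t * src1, const uint32_t num_elems) {      \
        float *       d = (float *) data_dst;                               \
        const float * a = (const float *) src0;                             \
        const float * b = (const float *) src1;                             \
        for (uint32_t i = 0; i < num_elems; i++) d[i] = a[i] OP b[i];       \
    }

DEFINE_KERNEL_SKETCH(mul_f32_sketch, *)
DEFINE_KERNEL_SKETCH(add_f32_sketch, +)
DEFINE_KERNEL_SKETCH(sub_f32_sketch, -)
DEFINE_KERNEL_SKETCH(mul_f32_aa_sketch, *)
DEFINE_KERNEL_SKETCH(add_f32_aa_sketch, +)
DEFINE_KERNEL_SKETCH(sub_f32_aa_sketch, -)

/* Hypothetical op enum; its values index both tables in the same order. */
enum binary_op_sketch { OP_MUL_SKETCH = 0, OP_ADD_SKETCH = 1, OP_SUB_SKETCH = 2 };

static elemwise_f32_func table_sketch[] = {
    mul_f32_sketch, add_f32_sketch, sub_f32_sketch };
static elemwise_f32_func table_opt_sketch[] = {
    mul_f32_aa_sketch, add_f32_aa_sketch, sub_f32_aa_sketch };

static int is_aligned_sketch(const void * p, size_t align) {
    return ((uintptr_t) p % align) == 0;
}

/* Pick the aligned kernel only when every buffer and the row size qualify. */
static elemwise_f32_func pick_kernel_sketch(enum binary_op_sketch op,
                                            const void * dst, const void * src0,
                                            const void * src1, size_t row_bytes) {
    const int fast = is_aligned_sketch(dst, VLEN_SKETCH) &&
                     is_aligned_sketch(src0, VLEN_SKETCH) &&
                     is_aligned_sketch(src1, VLEN_SKETCH) &&
                     (row_bytes % VLEN_SKETCH) == 0;
    return fast ? table_opt_sketch[op] : table_sketch[op];
}
```

Keeping both tables in the same operation order means a single index selects the kernel, and the alignment test only decides which table to read from.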
The binary_job_f32_per_thread function parallelizes work by distributing source rows across threads. It handles broadcasting by computing source row indices via modular arithmetic when source and destination dimensions differ.
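The row distribution and modular broadcasting can be illustrated with a simplified two-dimensional case. The sketch below is an assumption-laden illustration rather than the file's logic: `thread_row_range_sketch` and `add_rows_broadcast_sketch` are hypothetical helpers, rows are split by ceiling division (the real code reads per-thread row counts from the operation context), and only the row dimension is broadcast.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the half-open row range [start, end) owned by thread ith of nth.
 * Ceiling division is an assumed splitting policy for this sketch. */
static void thread_row_range_sketch(uint32_t nrows, uint32_t nth, uint32_t ith,
                                    uint32_t * start, uint32_t * end) {
    const uint32_t per_thread = (nrows + nth - 1) / nth;
    const uint32_t s = per_thread * ith;
    const uint32_t e = s + per_thread;
    *start = s < nrows ? s : nrows; /* clamp so trailing threads get no rows */
    *end   = e < nrows ? e : nrows;
}

/* dst and src0 hold ne1 rows of ne0 floats; src1 holds ne11 rows of ne0 floats.
 * When ne11 < ne1, src1 rows repeat via the modulo on the row index. */
static void add_rows_broadcast_sketch(float * dst, const float * src0,
                                      const float * src1, uint32_t ne0,
                                      uint32_t ne1, uint32_t ne11,
                                      uint32_t nth, uint32_t ith) {
    uint32_t r0, r1;
    thread_row_range_sketch(ne1, nth, ith, &r0, &r1);
    for (uint32_t i1 = r0; i1 < r1; i1++) {
        const uint32_t i11 = i1 % ne11; /* broadcast source row index */
        for (uint32_t i0 = 0; i0 < ne0; i0++) {
            dst[i1 * ne0 + i0] = src0[i1 * ne0 + i0] + src1[i11 * ne0 + i0];
        }
    }
}
```

Each thread writes a disjoint range of destination rows, so calling the helper once per thread index covers the whole tensor without synchronization between threads.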
The htp_binary_preamble macro extracts all tensor dimensions (ne00-ne13, ne0-ne3), strides (nb00-nb13, nb0-nb3), and per-thread row counts from the operation context.
Usage
These operations are dispatched from the DSP-side message loop when the host sends binary operation requests (GGML_OP_ADD, GGML_OP_MUL, GGML_OP_SUB).
Code Reference
Source Location
| Repository | File | Lines |
|---|---|---|
| ggml-org/ggml | src/ggml-hexagon/htp/binary-ops.c | 343 |
Key Signatures
typedef void (*hvx_elemwise_f32_func)(uint8_t * data_dst, const uint8_t * src0,
                                      const uint8_t * src1, const uint32_t num_elems);

static hvx_elemwise_f32_func func_table_HVX[] = { hvx_mul_f32, hvx_add_f32, hvx_sub_f32 };
static hvx_elemwise_f32_func func_table_HVX_opt[] = { hvx_mul_f32_aa, hvx_add_f32_aa, hvx_sub_f32_aa };

static void binary_job_f32_per_thread(struct htp_ops_context * octx,
                                      uint8_t * spad_data, uint32_t nth, uint32_t ith, enum htp_op op);
I/O Contract
Inputs
- octx -- Operation context containing source tensors, destination tensor, and per-thread row counts
- spad_data -- Scratchpad memory for intermediate storage
- nth, ith -- Total thread count and current thread index
- op -- Binary operation type (multiply, add, subtract)
Outputs
- dst tensor -- Result of element-wise binary operation written to destination buffer
Usage Examples
Internal dispatch:
// Called from main.c op_binary() dispatch
binary_job_f32_per_thread(octx, spad_data, nth, ith, HTP_OP_MUL);
binary_job_f32_per_thread(octx, spad_data, nth, ith, HTP_OP_ADD);
binary_job_f32_per_thread(octx, spad_data, nth, ith, HTP_OP_SUB);
Related Pages
Implements Principle
Related Implementations
- Implementation:Ggml_org_Ggml_Hexagon_hvx_arith -- HVX arithmetic intrinsics used by this file
- Implementation:Ggml_org_Ggml_Hexagon_htp_main -- Message dispatcher
- Implementation:Ggml_org_Ggml_Hexagon_backend -- Host-side backend