Implementation:Ggml org Ggml Hexagon binary ops

From Leeroopedia


Implementation Metadata
File Name: src/ggml-hexagon/htp/binary-ops.c
Repository: ggml-org/ggml
Lines: 343
Language: C
Domain Tags: ML_Infrastructure, DSP_Computing, Element_Wise_Operations
Status: Active
Last Updated: 2025-05-15 12:00 GMT
Knowledge Sources: ggml-org/ggml repository

Overview

binary-ops.c is the DSP-side implementation of the element-wise binary operations (multiply, add, subtract) for f32 tensors, built on Hexagon HVX vector kernels. These operations appear throughout ML inference in residual connections, scaling, and masking.

Description

The file dispatches through two function pointer tables, func_table_HVX and func_table_HVX_opt, each listing the multiply, add, and subtract kernels in that order. The optimized variants (hvx_mul_f32_aa, hvx_add_f32_aa, hvx_sub_f32_aa) are selected when the source and destination buffers are all 128-byte aligned and the row size is a multiple of VLEN, the HVX vector length.
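
The table selection described above can be sketched in portable C. This is a minimal illustration, not the code in binary-ops.c: every name prefixed with sketch_ or SKETCH_ is hypothetical, the kernels are scalar stand-ins for the HVX routines, and the 128-byte value for the vector length is an assumption taken from the alignment requirement stated above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VLEN 128u /* assumed HVX vector length in bytes */

typedef void (*elemwise_f32_func)(uint8_t * dst, const uint8_t * src0,
                                  const uint8_t * src1, const uint32_t num_elems);

/* Table index per operation: multiply, add, subtract (same order as the tables). */
enum sketch_op { SKETCH_OP_MUL = 0, SKETCH_OP_ADD = 1, SKETCH_OP_SUB = 2 };

static int sketch_is_aligned(const void * p) {
    return ((uintptr_t) p % SKETCH_VLEN) == 0;
}

/* Scalar stand-in for the vector loop; the real kernels use HVX intrinsics. */
static void sketch_loop(uint8_t * dst, const uint8_t * src0, const uint8_t * src1,
                        uint32_t n, enum sketch_op op) {
    float *       d = (float *) dst;
    const float * a = (const float *) src0;
    const float * b = (const float *) src1;
    for (uint32_t i = 0; i < n; i++) {
        d[i] = (op == SKETCH_OP_MUL) ? a[i] * b[i]
             : (op == SKETCH_OP_ADD) ? a[i] + b[i]
                                     : a[i] - b[i];
    }
}

/* Generic variants: no alignment assumptions. */
static void sketch_mul_f32(uint8_t * d, const uint8_t * a, const uint8_t * b, const uint32_t n) { sketch_loop(d, a, b, n, SKETCH_OP_MUL); }
static void sketch_add_f32(uint8_t * d, const uint8_t * a, const uint8_t * b, const uint32_t n) { sketch_loop(d, a, b, n, SKETCH_OP_ADD); }
static void sketch_sub_f32(uint8_t * d, const uint8_t * a, const uint8_t * b, const uint32_t n) { sketch_loop(d, a, b, n, SKETCH_OP_SUB); }

/* "_aa" variants: the caller guarantees that all three buffers are aligned. */
static void sketch_mul_f32_aa(uint8_t * d, const uint8_t * a, const uint8_t * b, const uint32_t n) {
    assert(sketch_is_aligned(d) && sketch_is_aligned(a) && sketch_is_aligned(b));
    sketch_loop(d, a, b, n, SKETCH_OP_MUL);
}
static void sketch_add_f32_aa(uint8_t * d, const uint8_t * a, const uint8_t * b, const uint32_t n) {
    assert(sketch_is_aligned(d) && sketch_is_aligned(a) && sketch_is_aligned(b));
    sketch_loop(d, a, b, n, SKETCH_OP_ADD);
}
static void sketch_sub_f32_aa(uint8_t * d, const uint8_t * a, const uint8_t * b, const uint32_t n) {
    assert(sketch_is_aligned(d) && sketch_is_aligned(a) && sketch_is_aligned(b));
    sketch_loop(d, a, b, n, SKETCH_OP_SUB);
}

static elemwise_f32_func sketch_table[]     = { sketch_mul_f32,    sketch_add_f32,    sketch_sub_f32 };
static elemwise_f32_func sketch_table_opt[] = { sketch_mul_f32_aa, sketch_add_f32_aa, sketch_sub_f32_aa };

/* Use the optimized table only when every buffer is aligned and the row size
 * in bytes is a whole number of vectors; otherwise fall back to the generic table. */
static elemwise_f32_func sketch_select(enum sketch_op op, const uint8_t * dst,
                                       const uint8_t * src0, const uint8_t * src1,
                                       uint32_t row_bytes) {
    const int opt = sketch_is_aligned(dst) && sketch_is_aligned(src0) &&
                    sketch_is_aligned(src1) && (row_bytes % SKETCH_VLEN) == 0;
    return opt ? sketch_table_opt[op] : sketch_table[op];
}
```

Making the choice once per job rather than once per row keeps the alignment checks out of the inner loop, which is one plausible reason for keeping two parallel tables.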

The binary_job_f32_per_thread function parallelizes work by distributing source rows across threads. It handles broadcasting by computing source row indices via modular arithmetic when source and destination dimensions differ.
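
The row split and the broadcast index computation can be sketched as follows. The helper names (sketch_thread_rows, sketch_src1_row) are hypothetical, and the ceiling-division split and the per-dimension modulo follow the usual ggml convention for binary operations; the exact arithmetic in binary-ops.c may differ.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_row_range {
    uint32_t start; /* first row handled by this thread */
    uint32_t end;   /* one past the last row handled by this thread */
};

/* Split nrows across nth threads in contiguous chunks (ceiling division).
 * Threads beyond the last chunk receive an empty range. */
static struct sketch_row_range sketch_thread_rows(uint32_t nrows, uint32_t nth, uint32_t ith) {
    assert(nth > 0 && ith < nth);
    const uint32_t per_thread = (nrows + nth - 1) / nth;

    struct sketch_row_range r;
    r.start = per_thread * ith;
    if (r.start > nrows) { r.start = nrows; }
    r.end = r.start + per_thread;
    if (r.end > nrows) { r.end = nrows; }
    return r;
}

/* Map a flat destination row index to the matching flat src1 row index.
 * The destination has ne1 x ne2 x ne3 rows and src1 has ne11 x ne12 x ne13 rows;
 * a src1 dimension smaller than the destination one repeats, hence the modulo. */
static uint32_t sketch_src1_row(uint32_t ir, uint32_t ne1, uint32_t ne2,
                                uint32_t ne11, uint32_t ne12, uint32_t ne13) {
    const uint32_t i3 = ir / (ne2 * ne1);
    const uint32_t i2 = (ir / ne1) % ne2;
    const uint32_t i1 = ir % ne1;

    const uint32_t i13 = i3 % ne13;
    const uint32_t i12 = i2 % ne12;
    const uint32_t i11 = i1 % ne11;

    return (i13 * ne12 + i12) * ne11 + i11;
}
```

For example, with 10 rows and 4 threads each thread takes up to 3 rows, so the last thread handles only row 9. When src1 has a single row per plane (ne11 = 1), every destination row in a plane maps to that one src1 row.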

The htp_binary_preamble macro extracts all tensor dimensions (ne00-ne13, ne0-ne3), strides (nb00-nb13, nb0-nb3), and per-thread row counts from the operation context.
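
A preamble macro of this kind typically expands into a block of const locals at the top of the job function. The sketch below shows the pattern with a reduced set of names; the struct layout (sketch_tensor, sketch_ctx) and the macro SKETCH_BINARY_PREAMBLE are hypothetical and do not reproduce the fields of htp_ops_context.

```c
#include <stdint.h>

/* Hypothetical tensor descriptor: element counts and byte strides per dimension. */
struct sketch_tensor {
    uint32_t ne[4]; /* number of elements in each dimension */
    uint32_t nb[4]; /* stride in bytes for each dimension */
};

/* Hypothetical operation context holding the two sources and the destination. */
struct sketch_ctx {
    struct sketch_tensor src0;
    struct sketch_tensor src1;
    struct sketch_tensor dst;
};

/* Declares short local names for the fields that the job body needs.
 * Only a subset is shown; the real macro covers all dimensions and strides. */
#define SKETCH_BINARY_PREAMBLE(ctx)              \
    const uint32_t ne00 = (ctx)->src0.ne[0];     \
    const uint32_t ne01 = (ctx)->src0.ne[1];     \
    const uint32_t ne02 = (ctx)->src0.ne[2];     \
    const uint32_t ne03 = (ctx)->src0.ne[3];     \
    const uint32_t ne11 = (ctx)->src1.ne[1];     \
    const uint32_t nb01 = (ctx)->src0.nb[1];     \
    const uint32_t nb1  = (ctx)->dst.nb[1]

struct sketch_summary {
    uint32_t rows;            /* total src0 rows across dimensions 1..3 */
    uint32_t src0_row_bytes;  /* byte stride between src0 rows */
    uint32_t dst_row_bytes;   /* byte stride between dst rows */
    int      rows_contiguous; /* 1 if src0 rows are densely packed f32 */
    int      src1_broadcast;  /* 1 if src1 has fewer rows than src0 in dim 1 */
};

static struct sketch_summary sketch_summarize(const struct sketch_ctx * ctx) {
    SKETCH_BINARY_PREAMBLE(ctx);

    struct sketch_summary s;
    s.rows            = ne01 * ne02 * ne03;
    s.src0_row_bytes  = nb01;
    s.dst_row_bytes   = nb1;
    s.rows_contiguous = (nb01 == ne00 * (uint32_t) sizeof(float));
    s.src1_broadcast  = (ne11 != ne01);
    return s;
}
```

The short names follow the ggml convention: the first digit selects the tensor (0 for src0, 1 for src1, none for dst) and the last digit selects the dimension.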

Usage

These operations are dispatched from the DSP-side message loop when the host sends binary operation requests (GGML_OP_ADD, GGML_OP_MUL, GGML_OP_SUB).

Code Reference

Source Location

Repository: ggml-org/ggml
File: src/ggml-hexagon/htp/binary-ops.c
Lines: 343

Key Signatures

typedef void (*hvx_elemwise_f32_func)(uint8_t * data_dst, const uint8_t * src0,
    const uint8_t * src1, const uint32_t num_elems);

static hvx_elemwise_f32_func func_table_HVX[]     = { hvx_mul_f32, hvx_add_f32, hvx_sub_f32 };
static hvx_elemwise_f32_func func_table_HVX_opt[] = { hvx_mul_f32_aa, hvx_add_f32_aa, hvx_sub_f32_aa };

static void binary_job_f32_per_thread(struct htp_ops_context * octx,
    uint8_t * spad_data, uint32_t nth, uint32_t ith, enum htp_op op);

I/O Contract

Inputs

  • octx -- Operation context containing source tensors, destination tensor, and per-thread row counts
  • spad_data -- Scratchpad memory for intermediate storage
  • nth, ith -- Total thread count and current thread index
  • op -- Binary operation type (multiply, add, subtract)

Outputs

  • dst tensor -- Result of element-wise binary operation written to destination buffer

Usage Examples

Internal dispatch:

// Called from main.c op_binary() dispatch
binary_job_f32_per_thread(octx, spad_data, nth, ith, HTP_OP_MUL);
binary_job_f32_per_thread(octx, spad_data, nth, ith, HTP_OP_ADD);
binary_job_f32_per_thread(octx, spad_data, nth, ith, HTP_OP_SUB);

