Implementation:Ggml org Ggml Hexagon hvx arith
| File Name | src/ggml-hexagon/htp/hvx-arith.h |
| Repository | ggml-org/ggml |
| Lines | 457 |
| Language | C |
| Domain Tags | DSP_Computing, SIMD_Intrinsics, Arithmetic |
| Status | Active |
| Last Updated | 2025-05-15 12:00 GMT |
| Knowledge Sources | ggml-org/ggml repository |
Overview
hvx-arith.h provides HVX-intrinsic implementations of element-wise binary arithmetic operations (add, subtract, multiply) on FP32 vectors, with alignment-aware variants. It is the core arithmetic building block used by binary-ops.c and other operations. The alignment-aware design lets callers use the faster aligned HVX loads and stores whenever the buffers permit.
Description
The file defines a generic hvx_arith_loop_body macro that processes HVX vectors in a loop with #pragma unroll(4), handling full vectors and leftover elements. Architecture-conditional macros select between two instruction paths:
- HVX arch < 79 -- Uses qfloat intermediate operations (Q6_Vqf32_vadd_VsfVsf + Q6_Vsf_equals_Vqf32)
- HVX arch >= 79 -- Uses native FP operations (Q6_Vsf_vadd_VsfVsf)
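The loop structure described here (whole vectors first, then leftover elements, with the operation supplied as a macro) can be sketched in portable C. This is an illustration under stated assumptions, not the code in hvx-arith.h: every SKETCH_* and sketch_* name is hypothetical, a 32-element chunk stands in for one 128-byte HVX vector, and scalar operators stand in for the HVX_OP_* intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one HVX vector is 128 bytes, i.e. 32 FP32 lanes. */
#define SKETCH_LANES 32u

/* Hypothetical scalar stand-ins for HVX_OP_ADD / HVX_OP_SUB / HVX_OP_MUL. */
#define SKETCH_OP_ADD(a, b) ((a) + (b))
#define SKETCH_OP_SUB(a, b) ((a) - (b))
#define SKETCH_OP_MUL(a, b) ((a) * (b))

/* Generic loop body: process full chunks, then the leftover elements.
 * The real macro works on HVX vectors and unrolls the main loop. */
#define SKETCH_ARITH_LOOP_BODY(dst, src0, src1, n, OP)                       \
    do {                                                                     \
        const uint32_t nvec_ = (n) / SKETCH_LANES; /* full chunks */         \
        const uint32_t nloe_ = (n) % SKETCH_LANES; /* leftover elements */   \
        uint32_t i_ = 0;                                                     \
        for (uint32_t v_ = 0; v_ < nvec_; v_++) {                            \
            for (uint32_t l_ = 0; l_ < SKETCH_LANES; l_++, i_++) {           \
                (dst)[i_] = OP((src0)[i_], (src1)[i_]);                      \
            }                                                                \
        }                                                                    \
        for (uint32_t t_ = 0; t_ < nloe_; t_++, i_++) {                      \
            (dst)[i_] = OP((src0)[i_], (src1)[i_]);                          \
        }                                                                    \
    } while (0)

static inline void sketch_add_f32(float *dst, const float *src0,
                                  const float *src1, uint32_t n) {
    assert(dst != NULL && src0 != NULL && src1 != NULL);
    SKETCH_ARITH_LOOP_BODY(dst, src0, src1, n, SKETCH_OP_ADD);
}

static inline void sketch_sub_f32(float *dst, const float *src0,
                                  const float *src1, uint32_t n) {
    assert(dst != NULL && src0 != NULL && src1 != NULL);
    SKETCH_ARITH_LOOP_BODY(dst, src0, src1, n, SKETCH_OP_SUB);
}

static inline void sketch_mul_f32(float *dst, const float *src0,
                                  const float *src1, uint32_t n) {
    assert(dst != NULL && src0 != NULL && src1 != NULL);
    SKETCH_ARITH_LOOP_BODY(dst, src0, src1, n, SKETCH_OP_MUL);
}
```

Passing the operation as a macro argument keeps a single loop body for all three operations, which is the same reason the header routes ADD, SUB, and MUL through one generic macro.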
For each binary operation (ADD, SUB, MUL), four alignment variants are provided:
- _aa -- Both source and destination aligned (fastest)
- _au -- Destination aligned, source1 unaligned
- _ua -- Destination unaligned, source aligned
- _uu -- Both unaligned (slowest, but always correct)
A generic dispatcher (e.g., hvx_add_f32) selects the appropriate variant based on runtime alignment checks.
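A minimal sketch of such a runtime alignment check is given below. It is not the dispatcher from hvx-arith.h: the sketch_* names are hypothetical, the 128-byte vector size is an assumption, and the exact conditions (in particular, that src0 must be aligned for the _au case) are a conservative reading of the variant descriptions rather than something confirmed by the source. Any combination that does not match falls back to the always-correct _uu case.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VLEN 128u /* assumed HVX vector size in bytes */

typedef enum { SKETCH_AA, SKETCH_AU, SKETCH_UA, SKETCH_UU } sketch_variant;

/* True if p is a multiple of align; align must be a power of two. */
static inline int sketch_is_aligned(const void *p, uint32_t align) {
    assert(align != 0 && (align & (align - 1u)) == 0);
    return ((uintptr_t) p & (uintptr_t) (align - 1u)) == 0;
}

/* Pick a variant name from the pointer alignments (hypothetical rules). */
static inline sketch_variant sketch_pick_variant(const void *dst,
                                                 const void *src0,
                                                 const void *src1) {
    const int d  = sketch_is_aligned(dst, SKETCH_VLEN);
    const int s0 = sketch_is_aligned(src0, SKETCH_VLEN);
    const int s1 = sketch_is_aligned(src1, SKETCH_VLEN);

    if (d && s0 && s1)  return SKETCH_AA; /* everything aligned */
    if (d && s0 && !s1) return SKETCH_AU; /* only src1 unaligned */
    if (!d && s0 && s1) return SKETCH_UA; /* only dst unaligned */
    return SKETCH_UU;                     /* safe fallback */
}
```

The check costs a few mask operations per call, so it is cheap relative to the work done on any buffer longer than a handful of vectors.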
Usage
Included as a header by operation files that need element-wise arithmetic.
Code Reference
Source Location
| Repository | File | Lines |
|---|---|---|
| ggml-org/ggml | src/ggml-hexagon/htp/hvx-arith.h | 457 |
Key Signatures
// Architecture-conditional operation macros
#if __HVX_ARCH__ < 79
#define HVX_OP_ADD(a, b) Q6_Vsf_equals_Vqf32(Q6_Vqf32_vadd_VsfVsf(a, b))
#define HVX_OP_SUB(a, b) Q6_Vsf_equals_Vqf32(Q6_Vqf32_vsub_VsfVsf(a, b))
#define HVX_OP_MUL(a, b) Q6_Vsf_equals_Vqf32(Q6_Vqf32_vmpy_VsfVsf(a, b))
#else
#define HVX_OP_ADD(a, b) Q6_Vsf_vadd_VsfVsf(a, b)
#define HVX_OP_SUB(a, b) Q6_Vsf_vsub_VsfVsf(a, b)
#define HVX_OP_MUL(a, b) Q6_Vsf_vmpy_VsfVsf(a, b)
#endif
// Alignment variants for ADD
static inline void hvx_add_f32_aa(uint8_t * restrict dst, const uint8_t * restrict src0,
const uint8_t * restrict src1, uint32_t n);
static inline void hvx_add_f32_au(uint8_t * restrict dst, const uint8_t * restrict src0,
const uint8_t * restrict src1, uint32_t n);
static inline void hvx_add_f32_ua(...);
static inline void hvx_add_f32_uu(...);
// Generic dispatcher with runtime alignment detection
static inline void hvx_add_f32(uint8_t * dst, const uint8_t * src0, const uint8_t * src1, uint32_t n);
I/O Contract
Inputs
- dst -- Destination buffer for result
- src0, src1 -- Source operand buffers (FP32 elements)
- n -- Number of FP32 elements to process
Outputs
- dst -- Element-wise result (add, sub, or mul) written to destination buffer
Usage Examples
Used by binary-ops.c:
// Select optimized path based on alignment
hvx_elemwise_f32_func func = is_aligned ? func_table_HVX_opt[op] : func_table_HVX[op];
// Execute element-wise operation
func(dst_ptr, src0_ptr, src1_ptr, num_elements);
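The table-lookup pattern in this snippet can be tried on a host machine with a portable sketch. Everything below is hypothetical (the sketch_* names, the op enum, and the scalar function bodies); it only illustrates selecting between two function-pointer tables indexed by operation and does not reproduce the tables in binary-ops.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef void (*sketch_elemwise_f32_func)(uint8_t *dst, const uint8_t *src0,
                                         const uint8_t *src1, uint32_t n);

enum sketch_op { SKETCH_ADD = 0, SKETCH_SUB, SKETCH_MUL, SKETCH_OP_COUNT };

/* Generic path: byte-wise copies, so no alignment is assumed. */
#define SKETCH_DEFINE_GENERIC(name, EXPR)                                    \
    static void name(uint8_t *dst, const uint8_t *src0,                      \
                     const uint8_t *src1, uint32_t n) {                      \
        for (uint32_t i = 0; i < n; i++) {                                   \
            float a, b, r;                                                   \
            memcpy(&a, src0 + i * sizeof(float), sizeof a);                  \
            memcpy(&b, src1 + i * sizeof(float), sizeof b);                  \
            r = EXPR;                                                        \
            memcpy(dst + i * sizeof(float), &r, sizeof r);                   \
        }                                                                    \
    }

/* "Optimized" path: direct float access, valid only for aligned float data. */
#define SKETCH_DEFINE_OPT(name, EXPR)                                        \
    static void name(uint8_t *dst, const uint8_t *src0,                      \
                     const uint8_t *src1, uint32_t n) {                      \
        float *d = (float *) dst;                                            \
        const float *x = (const float *) src0;                               \
        const float *y = (const float *) src1;                               \
        for (uint32_t i = 0; i < n; i++) {                                   \
            const float a = x[i], b = y[i];                                  \
            d[i] = EXPR;                                                     \
        }                                                                    \
    }

SKETCH_DEFINE_GENERIC(sketch_add_generic, a + b)
SKETCH_DEFINE_GENERIC(sketch_sub_generic, a - b)
SKETCH_DEFINE_GENERIC(sketch_mul_generic, a * b)
SKETCH_DEFINE_OPT(sketch_add_opt, a + b)
SKETCH_DEFINE_OPT(sketch_sub_opt, a - b)
SKETCH_DEFINE_OPT(sketch_mul_opt, a * b)

/* One table per path, indexed by operation. */
static const sketch_elemwise_f32_func sketch_table[SKETCH_OP_COUNT] = {
    sketch_add_generic, sketch_sub_generic, sketch_mul_generic
};
static const sketch_elemwise_f32_func sketch_table_opt[SKETCH_OP_COUNT] = {
    sketch_add_opt, sketch_sub_opt, sketch_mul_opt
};

/* Select the table by alignment, then the function by operation. */
static inline void sketch_run(enum sketch_op op, int is_aligned, uint8_t *dst,
                              const uint8_t *src0, const uint8_t *src1,
                              uint32_t n) {
    assert((int) op >= 0 && (int) op < (int) SKETCH_OP_COUNT);
    const sketch_elemwise_f32_func func =
        is_aligned ? sketch_table_opt[op] : sketch_table[op];
    func(dst, src0, src1, n);
}
```

Resolving the function pointer once per call keeps the alignment and operation decisions out of the inner loop.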
Related Pages
Implements Principle
Related Implementations
- Implementation:Ggml_org_Ggml_Hexagon_binary_ops -- Primary consumer of these arithmetic functions
- Implementation:Ggml_org_Ggml_Hexagon_softmax_ops -- Also uses HVX arithmetic