Implementation:Ggml org Ggml Hexagon hvx arith

**Implementation Metadata**
File Name	`src/ggml-hexagon/htp/hvx-arith.h`
Repository	ggml-org/ggml
Lines	457
Language	C
Domain Tags	DSP_Computing, SIMD_Intrinsics, Arithmetic
Status	Active
Last Updated	2025-05-15 12:00 GMT
Knowledge Sources	ggml-org/ggml repository

Overview

hvx-arith.h provides HVX-intrinsic implementations of element-wise binary arithmetic operations (add, subtract, multiply) on FP32 vectors, with alignment-aware variants. It is the core arithmetic building block used by binary-ops.c and other operations. The alignment-aware design uses faster aligned vector loads and stores whenever the buffers allow it and falls back to unaligned accesses otherwise.

Description

The file defines a generic hvx_arith_loop_body macro that processes HVX vectors in a loop with #pragma unroll(4), handling full vectors and leftover elements. Architecture-conditional macros select between two instruction paths:

HVX arch < 79 -- Uses qfloat intermediate operations (Q6_Vqf32_vadd_VsfVsf + Q6_Vsf_equals_Vqf32)
HVX arch >= 79 -- Uses native FP operations (Q6_Vsf_vadd_VsfVsf)
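The two instruction paths and the generic loop body can be sketched in portable C. This is a hypothetical illustration, not the header's code. `SKETCH_HVX_ARCH` stands in for `__HVX_ARCH__`. `double` stands in for the qf32 intermediate format and does not model its rounding. A 32-lane inner loop stands in for one 128-byte HVX vector instruction. All `SKETCH_*` names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __HVX_ARCH__; build with -DSKETCH_HVX_ARCH=79
   to select the native path instead. */
#ifndef SKETCH_HVX_ARCH
#define SKETCH_HVX_ARCH 73
#endif

#define VLEN_F32 32u  /* assumed: one 128-byte HVX vector holds 32 floats */

#if SKETCH_HVX_ARCH < 79
/* Two-step path, mirroring "qfloat op, then Q6_Vsf_equals_Vqf32":
   compute in an intermediate format, then convert back to float.
   double is only a stand-in and does not model qf32 behaviour. */
#define SKETCH_OP_ADD(a, b) ((float)((double)(a) + (double)(b)))
#define SKETCH_OP_SUB(a, b) ((float)((double)(a) - (double)(b)))
#define SKETCH_OP_MUL(a, b) ((float)((double)(a) * (double)(b)))
#else
/* Native path: a single IEEE single-precision operation. */
#define SKETCH_OP_ADD(a, b) ((a) + (b))
#define SKETCH_OP_SUB(a, b) ((a) - (b))
#define SKETCH_OP_MUL(a, b) ((a) * (b))
#endif

/* Generic loop body: full 32-lane "vectors" first, then the leftover
   elements. On HVX each inner lane loop would be one vector instruction,
   and the header's loop also carries #pragma unroll(4); here it is plain C. */
#define SKETCH_LOOP_BODY(OP, dst, src0, src1, n)                         \
    do {                                                                 \
        uint32_t full_ = (n) - (n) % VLEN_F32, i_ = 0;                   \
        for (; i_ < full_; i_ += VLEN_F32)                               \
            for (uint32_t l_ = 0; l_ < VLEN_F32; l_++)                   \
                (dst)[i_ + l_] = OP((src0)[i_ + l_], (src1)[i_ + l_]);   \
        for (; i_ < (n); i_++)                                           \
            (dst)[i_] = OP((src0)[i_], (src1)[i_]);                      \
    } while (0)

static void sketch_add_f32(float *dst, const float *src0,
                           const float *src1, uint32_t n) {
    assert(n == 0 || (dst && src0 && src1));
    SKETCH_LOOP_BODY(SKETCH_OP_ADD, dst, src0, src1, n);
}

static void sketch_mul_f32(float *dst, const float *src0,
                           const float *src1, uint32_t n) {
    assert(n == 0 || (dst && src0 && src1));
    SKETCH_LOOP_BODY(SKETCH_OP_MUL, dst, src0, src1, n);
}
```

Because the architecture choice is made by the preprocessor, the loop body is written once against the OP macros and compiles unchanged on both paths.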

For each binary operation (ADD, SUB, MUL), four alignment variants are provided:

_aa -- Both source and destination aligned (fastest)
_au -- Destination aligned, source1 unaligned
_ua -- Destination unaligned, source aligned
_uu -- Both unaligned (slowest, but always correct)
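One way to produce the four variants is to stamp them out from a single macro parameterized by a (store, load) pair. The following portable sketch rests on several hypothetical assumptions. Vectors are 128 bytes, and a `vec_f32` struct stands in for an HVX register. The "aligned" helpers only assert the 128-byte contract; on real HVX the two helper kinds would map to aligned and unaligned vector accesses. The second suffix letter is taken to cover both sources. For brevity, only whole vectors are processed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VLEN_BYTES 128u  /* assumed: HVX 128-byte vector mode */
#define VLEN_F32   32u   /* floats per vector */

/* Hypothetical stand-in for an HVX vector register. */
typedef struct { float lane[VLEN_F32]; } vec_f32;

/* "a" helpers enforce the 128-byte alignment contract; "u" helpers accept
   any address. Both copy with memcpy in this sketch. */
static vec_f32 load_a(const uint8_t *p) {
    assert((uintptr_t)p % VLEN_BYTES == 0);
    vec_f32 v;
    memcpy(&v, p, sizeof v);
    return v;
}
static vec_f32 load_u(const uint8_t *p) {
    vec_f32 v;
    memcpy(&v, p, sizeof v);
    return v;
}
static void store_a(uint8_t *p, vec_f32 v) {
    assert((uintptr_t)p % VLEN_BYTES == 0);
    memcpy(p, &v, sizeof v);
}
static void store_u(uint8_t *p, vec_f32 v) { memcpy(p, &v, sizeof v); }

/* Stamp out one variant from a (dst store, source load) pair.
   Whole vectors only: n must be a multiple of 32. */
#define DEFINE_ADD_VARIANT(SUFFIX, STORE, LOAD)                          \
    static void add_f32_##SUFFIX(uint8_t *dst, const uint8_t *src0,      \
                                 const uint8_t *src1, uint32_t n) {      \
        assert(n % VLEN_F32 == 0);                                       \
        for (uint32_t off = 0; off < n * (uint32_t)sizeof(float);        \
             off += VLEN_BYTES) {                                        \
            vec_f32 a = LOAD(src0 + off), b = LOAD(src1 + off), r;       \
            for (uint32_t i = 0; i < VLEN_F32; i++)                      \
                r.lane[i] = a.lane[i] + b.lane[i];                       \
            STORE(dst + off, r);                                         \
        }                                                                \
    }

DEFINE_ADD_VARIANT(aa, store_a, load_a) /* dst aligned,   sources aligned   */
DEFINE_ADD_VARIANT(au, store_a, load_u) /* dst aligned,   sources unaligned */
DEFINE_ADD_VARIANT(ua, store_u, load_a) /* dst unaligned, sources aligned   */
DEFINE_ADD_VARIANT(uu, store_u, load_u) /* both unaligned                   */
```

The arithmetic is written once, so the variants can only differ in how they touch memory.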

A generic dispatcher (e.g., hvx_add_f32) selects the appropriate variant based on runtime alignment checks.
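The runtime selection can be sketched as follows. The helper names and the 128-byte alignment granularity are assumptions for illustration. So is the rule that the source side counts as aligned only when both src0 and src1 are aligned. Instead of calling the chosen variant, the sketch returns it so the choice can be inspected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLEN_BYTES 128u  /* assumed HVX vector size in bytes */

typedef enum { VARIANT_AA, VARIANT_AU, VARIANT_UA, VARIANT_UU } add_variant;

static bool is_vec_aligned(const void *p) {
    return ((uintptr_t)p % VLEN_BYTES) == 0;
}

/* Hypothetical dispatcher logic: first letter = dst, second = sources.
   A dispatcher such as hvx_add_f32 would call the chosen variant. */
static add_variant select_add_variant(const void *dst, const void *src0,
                                      const void *src1) {
    assert(dst && src0 && src1);
    bool d = is_vec_aligned(dst);
    bool s = is_vec_aligned(src0) && is_vec_aligned(src1);
    if (d && s) return VARIANT_AA;
    if (d)      return VARIANT_AU;
    if (s)      return VARIANT_UA;
    return VARIANT_UU;
}
```

In this sketch the check runs once per call, so its cost is amortized over all n elements and the inner loops of the chosen variant stay free of alignment branches.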

Usage

Included as a header by operation files that need element-wise arithmetic.

Code Reference

Source Location

Repository	File	Lines
ggml-org/ggml	`src/ggml-hexagon/htp/hvx-arith.h`	457

Key Signatures

```c
// Architecture-conditional operation macros
#if __HVX_ARCH__ < 79
#define HVX_OP_ADD(a, b) Q6_Vsf_equals_Vqf32(Q6_Vqf32_vadd_VsfVsf(a, b))
#define HVX_OP_SUB(a, b) Q6_Vsf_equals_Vqf32(Q6_Vqf32_vsub_VsfVsf(a, b))
#define HVX_OP_MUL(a, b) Q6_Vsf_equals_Vqf32(Q6_Vqf32_vmpy_VsfVsf(a, b))
#else
#define HVX_OP_ADD(a, b) Q6_Vsf_vadd_VsfVsf(a, b)
#define HVX_OP_SUB(a, b) Q6_Vsf_vsub_VsfVsf(a, b)
#define HVX_OP_MUL(a, b) Q6_Vsf_vmpy_VsfVsf(a, b)
#endif

// Alignment variants for ADD
static inline void hvx_add_f32_aa(uint8_t * restrict dst, const uint8_t * restrict src0,
    const uint8_t * restrict src1, uint32_t n);
static inline void hvx_add_f32_au(uint8_t * restrict dst, const uint8_t * restrict src0,
    const uint8_t * restrict src1, uint32_t n);
static inline void hvx_add_f32_ua(...);
static inline void hvx_add_f32_uu(...);

// Generic dispatcher with runtime alignment detection
static inline void hvx_add_f32(uint8_t * dst, const uint8_t * src0, const uint8_t * src1, uint32_t n);
```

I/O Contract

Inputs

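dst -- Destination buffer for the FP32 results (passed as uint8_t *)
src0, src1 -- Source operand buffers holding FP32 elements (passed as const uint8_t *)
n -- Number of FP32 elements to process (an element count, not a byte count)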

Outputs

dst -- Element-wise result (add, sub, or mul) written to destination buffer
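For testing, the contract can be pinned down by a scalar reference. `ref_binary_f32` and `ref_op` are hypothetical names, not part of hvx-arith.h. The reference reads the byte pointers with memcpy, so it accepts any alignment and matches the uint8_t * signatures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { REF_OP_ADD, REF_OP_SUB, REF_OP_MUL } ref_op;

/* Hypothetical scalar reference: dst[i] = src0[i] OP src1[i] for i in [0, n).
   Each element is copied in and out with memcpy, so any byte alignment works,
   and dst may equal src0 or src1 (each element is read before it is written). */
static void ref_binary_f32(ref_op op, uint8_t *dst, const uint8_t *src0,
                           const uint8_t *src1, uint32_t n) {
    assert(n == 0 || (dst && src0 && src1));
    for (uint32_t i = 0; i < n; i++) {
        float a, b, r;
        memcpy(&a, src0 + i * sizeof(float), sizeof a);
        memcpy(&b, src1 + i * sizeof(float), sizeof b);
        switch (op) {
        case REF_OP_ADD: r = a + b; break;
        case REF_OP_SUB: r = a - b; break;
        default:         r = a * b; break;
        }
        memcpy(dst + i * sizeof(float), &r, sizeof r);
    }
}
```

When comparing HVX output against a reference like this, an exact match may not hold on the qfloat path (HVX arch < 79), so a small relative tolerance is a safer default.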

Usage Examples

Used by binary-ops.c:

```c
// Select optimized path based on alignment
hvx_elemwise_f32_func func = is_aligned ? func_table_HVX_opt[op] : func_table_HVX[op];

// Execute element-wise operation
func(dst_ptr, src0_ptr, src1_ptr, num_elements);
```
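The table-based selection in binary-ops.c can be mimicked in portable C. Everything here is hypothetical: `elemwise_f32_func` only mirrors the shape of hvx_elemwise_f32_func, the op enum and table names are invented, and scalar loops stand in for the HVX kernels. The "opt" entries access the buffers as float arrays directly and rely on the caller's alignment check. The generic entries copy through memcpy and accept any alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the function-pointer shape used in binary-ops.c. */
typedef void (*elemwise_f32_func)(uint8_t *dst, const uint8_t *src0,
                                  const uint8_t *src1, uint32_t n);

enum { OP_ADD, OP_SUB, OP_MUL, OP_COUNT };

/* Generic entries: per-element memcpy, safe for any alignment. */
#define DEFINE_GENERIC(NAME, OPER)                                        \
    static void NAME(uint8_t *dst, const uint8_t *s0, const uint8_t *s1,  \
                     uint32_t n) {                                        \
        for (uint32_t i = 0; i < n; i++) {                                \
            float a, b, r;                                                \
            memcpy(&a, s0 + i * sizeof(float), sizeof a);                 \
            memcpy(&b, s1 + i * sizeof(float), sizeof b);                 \
            r = a OPER b;                                                 \
            memcpy(dst + i * sizeof(float), &r, sizeof r);                \
        }                                                                 \
    }

/* "Optimized" entries: assume the caller verified alignment. */
#define DEFINE_OPT(NAME, OPER)                                            \
    static void NAME(uint8_t *dst, const uint8_t *s0, const uint8_t *s1,  \
                     uint32_t n) {                                        \
        float *d = (float *)dst;                                          \
        const float *a = (const float *)s0, *b = (const float *)s1;       \
        for (uint32_t i = 0; i < n; i++) d[i] = a[i] OPER b[i];           \
    }

DEFINE_GENERIC(add_generic, +)
DEFINE_GENERIC(sub_generic, -)
DEFINE_GENERIC(mul_generic, *)
DEFINE_OPT(add_opt, +)
DEFINE_OPT(sub_opt, -)
DEFINE_OPT(mul_opt, *)

static const elemwise_f32_func table_opt[OP_COUNT] = { add_opt, sub_opt, mul_opt };
static const elemwise_f32_func table_generic[OP_COUNT] = { add_generic, sub_generic, mul_generic };

/* Pick the table once per call, as in the excerpt above. */
static elemwise_f32_func select_func(bool is_aligned, int op) {
    assert(op >= 0 && op < OP_COUNT);
    return is_aligned ? table_opt[op] : table_generic[op];
}
```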

Related Pages

Implements Principle

Principle:Ggml_org_Ggml_Hexagon_DSP_Computation

Related Implementations

Implementation:Ggml_org_Ggml_Hexagon_binary_ops -- Primary consumer of these arithmetic functions
Implementation:Ggml_org_Ggml_Hexagon_softmax_ops -- Also uses HVX arithmetic
