Implementation:Ggml org Ggml Hexagon act ops
| File Name | src/ggml-hexagon/htp/act-ops.c |
| Repository | ggml-org/ggml |
| Lines | 675 |
| Language | C |
| Domain Tags | ML_Infrastructure, DSP_Computing, Activation_Functions |
| Status | Active |
| Last Updated | 2025-05-15 12:00 GMT |
| Knowledge Sources | ggml-org/ggml repository |
Overview
act-ops.c is the DSP-side implementation of activation operations (SwiGLU and its SwiGLU-OAI variant) running on the Hexagon HVX vector processor. It provides hardware-accelerated versions of these functions for transformer inference. SwiGLU matters here because it is the gated activation in the feed-forward layers of LLaMA-family models.
Description
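The file defines preamble macros (htp_act_preamble2, htp_act_preamble3) that extract tensor dimensions (ne) and byte strides (nb) into local variables. The suffix is the number of tensors covered: htp_act_preamble3 handles operations with two sources and a destination (src0, src1, dst), while htp_act_preamble2 handles one source and a destination (src0, dst).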
Per-thread activation functions (e.g., glu_swiglu_f32_per_thread) process tensor rows in parallel, using HVX intrinsics for SIMD computation. Each thread handles a slice of rows selected by ith (thread index), nth (total thread count), and src0_nrows_per_thread (rows assigned to each thread). Data moves between main memory and the HVX units through DMA transfers staged in scratchpad memory (htp_spad).
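The row split and staging pattern can be illustrated in portable C. This is a sketch under stated assumptions: rows per thread are taken as the ceiling of nrows / nth, each thread owns the half-open range starting at ith * nrows_per_thread, `memcpy` stands in for DMA into a scratchpad, and a multiply by 2 stands in for the HVX kernel. All `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rows per thread: ceiling of nrows / nth (assumed split policy). */
static uint32_t demo_rows_per_thread(uint32_t nrows, uint32_t nth) {
    assert(nth > 0);
    return (nrows + nth - 1) / nth;
}

/* Half-open row range [*start, *end) owned by thread ith, clamped to nrows. */
static void demo_thread_rows(uint32_t nrows, uint32_t nrows_per_thread,
                             uint32_t ith, uint32_t *start, uint32_t *end) {
    uint32_t s = ith * nrows_per_thread;
    uint32_t e = s + nrows_per_thread;
    *start = s < nrows ? s : nrows;
    *end   = e < nrows ? e : nrows;
}

/* One thread's work: stage each row in a scratch buffer (memcpy stands in for
   DMA), transform it there (x2 stands in for the HVX kernel), then copy out. */
static void demo_per_thread(const float *src, float *dst, float *spad,
                            uint32_t ncols, uint32_t nrows,
                            uint32_t nth, uint32_t ith) {
    uint32_t start, end;
    demo_thread_rows(nrows, demo_rows_per_thread(nrows, nth), ith, &start, &end);
    for (uint32_t r = start; r < end; r++) {
        memcpy(spad, src + (size_t) r * ncols, ncols * sizeof(float));  /* "DMA in"  */
        for (uint32_t c = 0; c < ncols; c++) {
            spad[c] *= 2.0f;                                           /* compute   */
        }
        memcpy(dst + (size_t) r * ncols, spad, ncols * sizeof(float));  /* "DMA out" */
    }
}
```

Threads whose start index falls past the last row simply get an empty range. A real kernel would likely overlap transfers with compute (for example by double-buffering the scratchpad), which this synchronous sketch does not attempt.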
Usage
These operations are dispatched from the DSP-side message loop in main.c when the host sends activation operation requests. They are not called directly by application code.
Code Reference
Source Location
| Repository | File | Lines |
|---|---|---|
| ggml-org/ggml | src/ggml-hexagon/htp/act-ops.c | 675 |
Key Signatures
// Preamble macros for tensor dimension extraction
#define htp_act_preamble3 // Extracts ne00-ne03, ne10-ne13, ne0-ne3, nb00-nb03, nb10-nb13, nb0-nb3
#define htp_act_preamble2 // Extracts ne00-ne03, ne0-ne3, nb00-nb03, nb0-nb3

// Per-thread activation function
static void glu_swiglu_f32_per_thread(
    const struct htp_tensor * src0,
    const struct htp_tensor * src1,
    struct htp_tensor * dst,
    const int32_t * op_params,
    struct htp_spad * src0_spad,
    struct htp_spad * src1_spad,
    struct htp_spad * dst_spad,
    uint32_t nth,
    uint32_t ith,
    uint32_t src0_nrows_per_thread,
    dma_queue * dma_queue);
I/O Contract
Inputs
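- src0 -- First source tensor (gating input), as htp_tensor
- src1 -- Second source tensor (value input), as htp_tensor
- op_params -- Operation parameters as an int32 array
- src0_spad, src1_spad, dst_spad -- htp_spad scratchpad buffers for DMA staging
- nth, ith, src0_nrows_per_thread -- Total thread count, thread index, and rows assigned to each thread
- dma_queue -- Queue used to issue the DMA transfers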
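Integer slots can carry float-valued parameters: ggml stores such values in a tensor's int32 op_params array by reinterpreting the float's bits. A minimal sketch of reading and writing one slot follows, assuming the DSP receives ggml's op_params unchanged; the `demo_*` helper names are hypothetical, and the slot index used in the check (2) is illustrative rather than the actual SwiGLU-OAI layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The bit-reinterpretation below requires float and int32_t to be the same size. */
_Static_assert(sizeof(float) == sizeof(int32_t), "float must be 32 bits");

/* Read the float stored bit-for-bit in op_params[idx].
   memcpy avoids the strict-aliasing problem of a (float *) cast. */
static float demo_get_op_param_f32(const int32_t *op_params, int idx) {
    assert(idx >= 0);
    float v;
    memcpy(&v, &op_params[idx], sizeof v);
    return v;
}

/* Host-side counterpart: store a float's bits into an int32 slot. */
static void demo_set_op_param_f32(int32_t *op_params, int idx, float v) {
    assert(idx >= 0);
    memcpy(&op_params[idx], &v, sizeof v);
}
```

Because only the bits are copied, the round trip is exact; no float-to-integer conversion takes place.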
Outputs
- dst -- Destination tensor with activation applied, written via DMA
Usage Examples
Internal dispatch from the HTP message loop:
// Reached when op_act() in main.c processes GGML_OP_GLU_SWIGLU; because the
// function is static, the direct call itself is made from within act-ops.c
glu_swiglu_f32_per_thread(src0, src1, dst, op_params,
                          src0_spad, src1_spad, dst_spad,
                          nth, ith, nrows_per_thread, dma_q);
Related Pages
Related Implementations
- Implementation:Ggml_org_Ggml_Hexagon_htp_main -- Message dispatcher calling these operations
- Implementation:Ggml_org_Ggml_Hexagon_hvx_arith -- HVX arithmetic primitives
- Implementation:Ggml_org_Ggml_Hexagon_backend -- Host-side backend