Implementation:Ggml org Ggml Hexagon act ops

**Implementation Metadata**
File Name	`src/ggml-hexagon/htp/act-ops.c`
Repository	ggml-org/ggml
Lines	675
Language	C
Domain Tags	ML_Infrastructure, DSP_Computing, Activation_Functions
Status	Active
Last Updated	2025-05-15 12:00 GMT
Knowledge Sources	ggml-org/ggml repository

Overview

act-ops.c implements the DSP-side activation operations (SwiGLU and its SwiGLU-OAI variant) that run on the Hexagon HVX vector processor. These kernels matter for transformer inference because SwiGLU is the feed-forward activation used in LLaMA-family models.

Description

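The file defines preamble macros (htp_act_preamble2, htp_act_preamble3) that extract tensor dimensions and strides into local variables. The numeric suffix matches the number of tensors covered: htp_act_preamble2 handles one source plus the destination, and htp_act_preamble3 handles two sources plus the destination.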
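A minimal sketch of how a preamble macro of this kind could work is shown below. It is an illustration, not the macro from act-ops.c: struct demo_tensor, demo_act_preamble2, and demo_elem_offsets are hypothetical names, and the field layout (ne[4] for element counts, nb[4] for byte strides) is assumed from the ne/nb naming in the signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct htp_tensor; only the fields used here. */
struct demo_tensor {
    uint32_t ne[4]; /* number of elements in each dimension */
    uint32_t nb[4]; /* stride in bytes for each dimension */
};

/* Two-tensor preamble sketch: expects pointers named `src0` and `dst` in scope
 * and declares ne00..ne03, ne0..ne3, nb00..nb03, nb0..nb3 as locals. */
#define demo_act_preamble2                                              \
    const uint32_t ne00 = src0->ne[0], ne01 = src0->ne[1],              \
                   ne02 = src0->ne[2], ne03 = src0->ne[3];              \
    const uint32_t ne0 = dst->ne[0], ne1 = dst->ne[1],                  \
                   ne2 = dst->ne[2], ne3 = dst->ne[3];                  \
    const size_t nb00 = src0->nb[0], nb01 = src0->nb[1],                \
                 nb02 = src0->nb[2], nb03 = src0->nb[3];                \
    const size_t nb0 = dst->nb[0], nb1 = dst->nb[1],                    \
                 nb2 = dst->nb[2], nb3 = dst->nb[3]

/* Computes the byte offset of element idx[0..3] in both tensors.
 * Returns 1 on success, 0 if the two shapes differ. */
int demo_elem_offsets(const struct demo_tensor *src0, const struct demo_tensor *dst,
                      const uint32_t idx[4], size_t *src_off, size_t *dst_off) {
    assert(src0 != NULL && dst != NULL && src_off != NULL && dst_off != NULL);
    demo_act_preamble2;
    if (ne0 != ne00 || ne1 != ne01 || ne2 != ne02 || ne3 != ne03) {
        return 0;
    }
    *src_off = idx[0] * nb00 + idx[1] * nb01 + idx[2] * nb02 + idx[3] * nb03;
    *dst_off = idx[0] * nb0 + idx[1] * nb1 + idx[2] * nb2 + idx[3] * nb3;
    return 1;
}
```

The benefit of the macro form is that every kernel body can refer to the same short names (ne00, nb1, and so on) without repeating the field accesses.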

Per-thread activation functions (e.g., glu_swiglu_f32_per_thread) process tensor rows in parallel using HVX intrinsics for SIMD computation. Each thread processes a slice of rows determined by nth (total threads) and ith (thread index). The implementation uses scratchpad memory (htp_spad) and DMA for efficient data movement between main memory and the HVX processing units.
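The row slicing described above can be sketched as follows. This is one plausible scheme, not code copied from act-ops.c: it assumes rows are divided into contiguous chunks of ceil(nrows / nth) rows, and the names row_range, rows_per_thread, and thread_rows are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open range [start, end) of rows owned by one thread. */
struct row_range {
    uint32_t start;
    uint32_t end;
};

/* Rows assigned to each thread, rounded up so that all rows are covered. */
uint32_t rows_per_thread(uint32_t nrows, uint32_t nth) {
    assert(nth > 0);
    return (nrows + nth - 1) / nth;
}

/* Rows for thread `ith`; the range is empty when there are more threads than work. */
struct row_range thread_rows(uint32_t nrows, uint32_t nrows_per_thread, uint32_t ith) {
    struct row_range r;
    r.start = nrows_per_thread * ith;
    if (r.start > nrows) {
        r.start = nrows; /* clamp so trailing threads get an empty range */
    }
    r.end = r.start + nrows_per_thread;
    if (r.end > nrows) {
        r.end = nrows; /* the last active thread may get a short chunk */
    }
    return r;
}
```

With 10 rows and 4 threads, this scheme assigns 3 rows to each of the first three threads and 1 row to the last; a thread whose range is empty can return immediately.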

Usage

These operations are dispatched from the DSP-side message loop in main.c when the host sends activation operation requests. They are not called directly by application code.

Code Reference

Source Location

Repository	File	Lines
ggml-org/ggml	`src/ggml-hexagon/htp/act-ops.c`	675

Key Signatures

// Preamble macros for tensor dimension extraction
#define htp_act_preamble3  // Extracts ne00-ne03, ne10-ne13, ne0-ne3, nb00-nb03, nb10-nb13, nb0-nb3
#define htp_act_preamble2  // Extracts ne00-ne03, ne0-ne3, nb00-nb03, nb0-nb3

// Per-thread activation function
static void glu_swiglu_f32_per_thread(
    const struct htp_tensor * src0,
    const struct htp_tensor * src1,
    struct htp_tensor *       dst,
    const int32_t *           op_params,
    struct htp_spad *         src0_spad,
    struct htp_spad *         src1_spad,
    struct htp_spad *         dst_spad,
    uint32_t                  nth,
    uint32_t                  ith,
    uint32_t                  src0_nrows_per_thread,
    dma_queue *               dma_queue);

I/O Contract

Inputs

src0 -- First source tensor (gating input), as htp_tensor
src1 -- Second source tensor (value input), as htp_tensor
op_params -- Operation parameters as int32 array
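src0_spad, src1_spad, dst_spad -- htp_spad scratchpad buffers for DMA staging
nth, ith -- Total thread count and this thread's index
src0_nrows_per_thread -- Number of src0 rows assigned to each thread
dma_queue -- DMA queue used for the transfers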

Outputs

dst -- Destination tensor with activation applied, written via DMA

Usage Examples

Internal dispatch from the HTP message loop:

// Called by op_act() in main.c when processing GGML_OP_GLU_SWIGLU
glu_swiglu_f32_per_thread(src0, src1, dst, op_params,
    src0_spad, src1_spad, dst_spad,
    nth, ith, nrows_per_thread, dma_q);

Related Pages

Implements Principle

Principle:Ggml_org_Ggml_Hexagon_DSP_Computation

Related Implementations

Implementation:Ggml_org_Ggml_Hexagon_htp_main -- Message dispatcher calling these operations
Implementation:Ggml_org_Ggml_Hexagon_hvx_arith -- HVX arithmetic primitives
Implementation:Ggml_org_Ggml_Hexagon_backend -- Host-side backend
