Implementation:Ggml org Ggml Hexagon act ops

From Leeroopedia


Implementation Metadata
File Name: src/ggml-hexagon/htp/act-ops.c
Repository: ggml-org/ggml
Lines: 675
Language: C
Domain Tags: ML_Infrastructure, DSP_Computing, Activation_Functions
Status: Active
Last Updated: 2025-05-15 12:00 GMT
Knowledge Sources: ggml-org/ggml repository

Overview

act-ops.c is the DSP-side implementation of the activation operations (SwiGLU and SwiGLU-OAI variants) that run on the Hexagon HVX vector processor. These hardware-accelerated activations matter for transformer inference because SwiGLU is the feed-forward activation used in LLaMA-family models.

Description

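The file defines two preamble macros, htp_act_preamble2 and htp_act_preamble3, which extract tensor dimensions (ne*) and byte strides (nb*) into local variables. htp_act_preamble2 covers the two-tensor case (one source plus the destination); htp_act_preamble3 covers the three-tensor case (two sources plus the destination).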
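The preamble pattern described above can be sketched as follows. This is a minimal illustration, not the macro from act-ops.c: struct demo_tensor, demo_act_preamble2, and demo_rows_contiguous are hypothetical names, the struct layout is an assumption, and the macro is abbreviated to the variables the example function actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct htp_tensor; the real layout may differ. */
struct demo_tensor {
    uint32_t ne[4]; /* number of elements in each dimension */
    uint32_t nb[4]; /* stride in bytes for each dimension   */
};

/* Abbreviated two-tensor preamble. It expects pointers named `src0` and
 * `dst` to be in scope and declares short local aliases for their fields. */
#define demo_act_preamble2             \
    const uint32_t ne00 = src0->ne[0]; \
    const uint32_t ne01 = src0->ne[1]; \
    const uint32_t ne02 = src0->ne[2]; \
    const uint32_t ne03 = src0->ne[3]; \
    const uint32_t ne0  = dst->ne[0];  \
    const uint32_t ne1  = dst->ne[1];  \
    const uint32_t ne2  = dst->ne[2];  \
    const uint32_t ne3  = dst->ne[3];  \
    const size_t   nb01 = src0->nb[1]; \
    const size_t   nb1  = dst->nb[1]

/* Returns 1 when both tensors hold densely packed f32 rows and have the
 * same total row count, 0 otherwise. */
static int demo_rows_contiguous(const struct demo_tensor * src0,
                                const struct demo_tensor * dst) {
    demo_act_preamble2;

    const int src_packed = (nb01 == ne00 * sizeof(float));
    const int dst_packed = (nb1 == ne0 * sizeof(float));
    const int same_rows  = (ne01 * ne02 * ne03 == ne1 * ne2 * ne3);

    return src_packed && dst_packed && same_rows;
}
```

The benefit of this style is that each kernel body can refer to ne00 or nb1 directly instead of repeating the field accesses; the cost is that the macro silently depends on the caller's variable names.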

Per-thread activation functions (e.g., glu_swiglu_f32_per_thread) process tensor rows in parallel, using HVX intrinsics for the SIMD computation. Each thread handles a slice of rows determined by nth (total number of threads), ith (this thread's index), and src0_nrows_per_thread. Rows are staged in scratchpad memory (htp_spad) and moved by DMA between main memory and the HVX processing units.
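The row slicing described above can be sketched as follows. This is a minimal illustration under assumptions: the names demo_row_range, demo_rows_per_thread, and demo_thread_rows are hypothetical, and the ceiling-division split with a clamped end is one common way to divide rows, not necessarily the exact arithmetic used in act-ops.c.

```c
#include <stdint.h>

/* Half-open row interval [start, end) assigned to one thread. */
struct demo_row_range {
    uint32_t start;
    uint32_t end;
};

/* Ceiling division so that nth threads together cover all nrows rows.
 * Assumes nth > 0. */
static uint32_t demo_rows_per_thread(uint32_t nrows, uint32_t nth) {
    return (nrows + nth - 1) / nth;
}

/* Rows handled by thread `ith`. Both bounds are clamped to nrows, so a
 * thread whose slice starts past the end receives an empty range. */
static struct demo_row_range demo_thread_rows(uint32_t nrows,
                                              uint32_t rows_per_thread,
                                              uint32_t ith) {
    struct demo_row_range r;

    r.start = rows_per_thread * ith;
    if (r.start > nrows) {
        r.start = nrows;
    }

    r.end = r.start + rows_per_thread;
    if (r.end > nrows) {
        r.end = nrows;
    }

    return r;
}
```

With this scheme the last active thread may receive fewer rows than the others, and threads beyond the row count do no work, so each kernel only needs a single loop from start to end.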

Usage

These operations are dispatched from the DSP-side message loop in main.c when the host sends activation operation requests. They are not called directly by application code.

Code Reference

Source Location

Repository      File                             Lines
ggml-org/ggml   src/ggml-hexagon/htp/act-ops.c   675

Key Signatures

// Preamble macros for tensor dimension extraction
#define htp_act_preamble3  // Extracts ne00-ne03, ne10-ne13, ne0-ne3, nb00-nb03, nb10-nb13, nb0-nb3
#define htp_act_preamble2  // Extracts ne00-ne03, ne0-ne3, nb00-nb03, nb0-nb3

// Per-thread activation function
static void glu_swiglu_f32_per_thread(
    const struct htp_tensor * src0,
    const struct htp_tensor * src1,
    struct htp_tensor *       dst,
    const int32_t *           op_params,
    struct htp_spad *         src0_spad,
    struct htp_spad *         src1_spad,
    struct htp_spad *         dst_spad,
    uint32_t                  nth,
    uint32_t                  ith,
    uint32_t                  src0_nrows_per_thread,
    dma_queue *               dma_queue);

I/O Contract

Inputs

  • src0 -- First source tensor (gating input), as htp_tensor
  • src1 -- Second source tensor (value input), as htp_tensor
  • op_params -- Operation parameters, passed as an int32_t array
  • Scratchpad memory -- htp_spad buffers for DMA staging

Outputs

  • dst -- Destination tensor with activation applied, written via DMA

Usage Examples

Internal dispatch from the HTP message loop:

// Called by op_act() in main.c when processing GGML_OP_GLU_SWIGLU
glu_swiglu_f32_per_thread(src0, src1, dst, op_params,
    src0_spad, src1_spad, dst_spad,
    nth, ith, nrows_per_thread, dma_q);

Related Pages

Implements Principle

Related Implementations
