Implementation:Ggml org Ggml Hexagon softmax ops

**Implementation Metadata**
File Name	`src/ggml-hexagon/htp/softmax-ops.c`
Repository	ggml-org/ggml
Lines	395
Language	C
Domain Tags	ML_Infrastructure, DSP_Computing, Normalization
Status	Active
Last Updated	2025-05-15 12:00 GMT
Knowledge Sources	ggml-org/ggml repository

Overview

softmax-ops.c is the DSP-side implementation of the softmax operation for the Hexagon HVX vector processor, with support for attention bias (ALiBi) and optional FP16 output. Transformer attention layers rely on softmax to normalize attention scores into per-row probabilities.

Description

The file uses a softmax_th_ctx struct to hold scale, max_bias, the head count, and the precomputed ALiBi base slopes (m0, m1). The init_softmax_ctx function extracts scale and max_bias from op_params and computes n_head_log2, from which the power-of-2 ALiBi slopes m0 and m1 are derived.

The htp_softmax_preamble3 macro handles both src0 and optional src1 (mask/bias) tensors with null-safe dimension extraction (defaults to 1 when src1 is absent).
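
The idea of null-safe dimension extraction can be illustrated with a small sketch. The struct example_tensor and the helpers dim_or_one and broadcast_index are hypothetical names invented for this illustration; the real macro's expansion is not reproduced here. One plausible reason for defaulting to 1, assumed rather than confirmed by the source, is that modulo-based broadcast indexing stays well defined when no mask is present.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a tensor descriptor with four dimensions.
struct example_tensor {
    uint32_t ne[4];
};

// Returns dimension i of t, or 1 when the tensor is absent (NULL).
static uint32_t dim_or_one(const struct example_tensor * t, int i) {
    return t ? t->ne[i] : 1u;
}

// Maps an index of the larger tensor onto a broadcast dimension of size ne.
// With ne == 1 (the default for an absent tensor) this always yields 0.
static uint32_t broadcast_index(uint32_t i, uint32_t ne) {
    return i % ne;
}
```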

The vectorized softmax implementation hvx_fast_softmax_prep_f32 performs the following steps for each row:

Scaled input preparation with optional mask and slope
Row-wise max computation
Exponentiation (vectorized exp)
Sum reduction
Normalization (divide by sum)

Multi-threaded execution distributes rows across HVX threads.
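
One common way to split rows across threads is a ceiling division into contiguous blocks, sketched below. The helper thread_row_range and its parameters are hypothetical; the source does not state which partitioning scheme softmax-ops.c actually uses.

```c
#include <stdint.h>

// Hypothetical helper: computes the half-open row range [*start, *end)
// handled by thread ith out of nth threads (nth >= 1, ith < nth).
static void thread_row_range(uint32_t n_rows, uint32_t nth, uint32_t ith,
                             uint32_t * start, uint32_t * end) {
    // Rows per thread, rounded up so every row is covered.
    const uint32_t per_thread = (n_rows + nth - 1) / nth;
    uint32_t s = per_thread * ith;
    uint32_t e = s + per_thread;
    // Clamp so trailing threads get a short or empty range.
    if (s > n_rows) {
        s = n_rows;
    }
    if (e > n_rows) {
        e = n_rows;
    }
    *start = s;
    *end   = e;
}
```

Because each row's softmax is independent, threads working on disjoint row ranges need no synchronization beyond a final join.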

Usage

The DSP-side message loop dispatches to this file for GGML_OP_SOFT_MAX operations.

Code Reference

Source Location

Repository	File	Lines
ggml-org/ggml	`src/ggml-hexagon/htp/softmax-ops.c`	395

Key Signatures

struct softmax_th_ctx {
    bool     use_f16;
    bool     use_src1;
    uint32_t n_head;
    uint32_t n_head_log2;
    float    scale;
    float    max_bias;
    float    m0;
    float    m1;
    struct htp_ops_context * octx;
};

static void init_softmax_ctx(struct softmax_th_ctx * softmax_ctx, struct htp_ops_context * octx);

static void hvx_fast_softmax_prep_f32(const uint8_t * restrict src, uint8_t * restrict dst,
    const int num_elems, float scale, const uint8_t * restrict mask, float slope);

I/O Contract

Inputs

src0 -- Input logits tensor
src1 -- Optional mask/bias tensor (may be FP16 or FP32)
op_params -- Contains scale and max_bias parameters

Outputs

dst -- Normalized softmax output (probabilities summing to 1.0 per row)

Usage Examples

Softmax with ALiBi support:

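// Initialize the softmax context; this also derives the ALiBi base slopes
struct softmax_th_ctx ctx;
init_softmax_ctx(&ctx, octx);
// ctx.m0 = powf(2.0f, -ctx.max_bias / ctx.n_head_log2)
// ctx.m1 = powf(2.0f, -(ctx.max_bias / 2.0f) / ctx.n_head_log2)

// Perform vectorized softmax on one row: scale -> mask -> max -> exp -> sum -> normalize
// (src, dst, mask, and slope are the per-row pointers and per-head slope chosen by the caller)
hvx_fast_softmax_prep_f32(src, dst, num_elems, ctx.scale, mask, slope);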

Related Pages

Implements Principle

Principle:Ggml_org_Ggml_Hexagon_DSP_Computation

Related Implementations

Implementation:Ggml_org_Ggml_Hexagon_flash_attn -- Flash attention uses softmax
Implementation:Ggml_org_Ggml_Hexagon_htp_main -- Message dispatcher
Implementation:Ggml_org_Ggml_Hexagon_hvx_arith -- HVX arithmetic primitives
