Implementation:Ggml org Ggml Hexagon softmax ops
| File Name | src/ggml-hexagon/htp/softmax-ops.c |
| Repository | ggml-org/ggml |
| Lines | 395 |
| Language | C |
| Domain Tags | ML_Infrastructure, DSP_Computing, Normalization |
| Status | Active |
| Last Updated | 2025-05-15 12:00 GMT |
| Knowledge Sources | ggml-org/ggml repository |
Overview
softmax-ops.c is the DSP-side implementation of the softmax operation on the Hexagon HVX vector processor. It supports an optional mask with attention bias (ALiBi) and optional FP16 output. Transformer attention layers rely on softmax to normalize each row of attention scores into a probability distribution.
Description
The file uses a softmax_th_ctx struct to hold scale, max_bias, the head count, and the precomputed ALiBi base slopes (m0, m1). The init_softmax_ctx function extracts scale and max_bias from op_params and computes n_head_log2, the power-of-2 head count from which m0 and m1 are derived.
The htp_softmax_preamble3 macro handles both src0 and optional src1 (mask/bias) tensors with null-safe dimension extraction (defaults to 1 when src1 is absent).
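A default of 1 for a missing tensor is convenient because broadcast indexing by modulo then stays well defined. The sketch below illustrates that idea only; `struct tensor_dims`, `dim_or_one`, and `broadcast_index` are hypothetical helpers, and representing an absent src1 as a NULL pointer is an assumption rather than something taken from the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal tensor descriptor: only the four dimension extents. */
struct tensor_dims {
    uint32_t ne[4];
};

/* Return extent i of t, or 1 when the tensor is absent (NULL here by assumption). */
static uint32_t dim_or_one(const struct tensor_dims * t, int i) {
    assert(i >= 0 && i < 4);
    return t ? t->ne[i] : 1u;
}

/* Map an index of the larger tensor onto a broadcast dimension.
 * With extent 1 every index maps to 0, and the modulo never divides by zero. */
static uint32_t broadcast_index(uint32_t i, uint32_t extent) {
    assert(extent > 0);
    return i % extent;
}
```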
The vectorized softmax path, entered through hvx_fast_softmax_prep_f32, performs the following steps for each row:
- Scaled input preparation with optional mask and slope
- Row-wise max computation
- Exponentiation (vectorized exp)
- Sum reduction
- Normalization (divide by sum)
Multi-threaded execution distributes rows across HVX threads.
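One common way to split rows across worker threads is a contiguous block per thread. The sketch below shows that scheme under stated assumptions: `struct row_range` and `thread_row_range` are hypothetical, and the actual partitioning used in softmax-ops.c may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open range of rows [first, last) assigned to one thread. */
struct row_range {
    uint32_t first;
    uint32_t last;
};

/* Give thread ith (0-based) a contiguous block of ceil(n_rows / n_threads) rows.
 * Trailing threads may receive a shorter or empty range. */
static struct row_range thread_row_range(uint32_t n_rows, uint32_t n_threads, uint32_t ith) {
    assert(n_threads > 0 && ith < n_threads);

    const uint32_t per_thread = (n_rows + n_threads - 1) / n_threads;

    uint32_t first = per_thread * ith;
    if (first > n_rows) {
        first = n_rows;
    }
    uint32_t last = first + per_thread;
    if (last > n_rows) {
        last = n_rows;
    }

    struct row_range r = { first, last };
    return r;
}
```

Because each softmax row is independent, threads working on disjoint row ranges need no synchronization beyond a final join.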
Usage
Dispatched from the DSP-side message loop for GGML_OP_SOFT_MAX operations.
Code Reference
Source Location
| Repository | File | Lines |
|---|---|---|
| ggml-org/ggml | src/ggml-hexagon/htp/softmax-ops.c | 395 |
Key Signatures
struct softmax_th_ctx {
    bool     use_f16;
    bool     use_src1;
    uint32_t n_head;
    uint32_t n_head_log2;
    float    scale;
    float    max_bias;
    float    m0;
    float    m1;
    struct htp_ops_context * octx;
};

static void init_softmax_ctx(struct softmax_th_ctx * softmax_ctx, struct htp_ops_context * octx);

static void hvx_fast_softmax_prep_f32(const uint8_t * restrict src, uint8_t * restrict dst,
                                      const int num_elems, float scale,
                                      const uint8_t * restrict mask, float slope);
I/O Contract
Inputs
- src0 -- Input logits tensor
- src1 -- Optional mask/bias tensor (may be FP16 or FP32)
- op_params -- Contains scale and max_bias parameters
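Operator parameters in ggml are typically stored as 32-bit integer slots, with float values copied in bit for bit. The sketch below reads two floats back with `memcpy`, which avoids strict-aliasing problems. `read_softmax_params` is a hypothetical helper, and the slot order (scale at index 0, max_bias at index 1) is an assumption based on the common ggml convention, not confirmed by this page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy scale and max_bias out of an int32 parameter array.
 * Assumed layout: op_params[0] holds scale, op_params[1] holds max_bias. */
static void read_softmax_params(const int32_t * op_params, float * scale, float * max_bias) {
    assert(op_params && scale && max_bias);
    memcpy(scale,    &op_params[0], sizeof(float));
    memcpy(max_bias, &op_params[1], sizeof(float));
}
```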
Outputs
- dst -- Normalized softmax output (probabilities summing to 1.0 per row)
Usage Examples
Softmax with ALiBi support:
// Initialize softmax context with ALiBi slope computation
init_softmax_ctx(&ctx, octx);
// ctx.m0 = pow(2.0, -max_bias / n_head_log2)
// ctx.m1 = pow(2.0, -(max_bias/2.0) / n_head_log2)

// Perform vectorized softmax: scale -> mask -> exp -> sum -> normalize
hvx_fast_softmax_prep_f32(src, dst, num_elems, scale, mask, slope);
Related Pages
Implements Principle
Related Implementations
- Implementation:Ggml_org_Ggml_Hexagon_flash_attn -- Flash attention uses softmax
- Implementation:Ggml_org_Ggml_Hexagon_htp_main -- Message dispatcher
- Implementation:Ggml_org_Ggml_Hexagon_hvx_arith -- HVX arithmetic primitives