Implementation:Ggml org Ggml Hexagon htp main
| File Name | src/ggml-hexagon/htp/main.c |
| Repository | ggml-org/ggml |
| Lines | 1043 |
| Language | C |
| Domain Tags | DSP_Computing, Runtime_Dispatch, Hardware_Abstraction |
| Status | Active |
| Last Updated | 2025-05-15 12:00 GMT |
| Knowledge Sources | ggml-org/ggml repository |
Overview
main.c is the entry point for the DSP-side HTP skeleton. It implements the IDL interface (htp_iface_open/close) and the dspqueue message processing loop that dispatches tensor operations on the Hexagon processor. It is the core of the DSP-side runtime: host requests arrive here and are executed on Hexagon hardware.
Description
The htp_iface_open function allocates an htp_context, configures DCVS power settings to maximum performance, powers on HVX and HMX coprocessors, and sets the client class to compute. The context is used as the remote handle.
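The context-as-handle pattern described above can be sketched in portable C. This is a minimal illustration, not the code in main.c: `sketch_handle64`, `sketch_context`, `sketch_open`, and `sketch_close` are hypothetical names, and the power requests are reduced to two flags so the example runs without the Hexagon SDK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the SDK's 64-bit remote handle type. */
typedef uint64_t sketch_handle64;

/* Hypothetical context; the real htp_context holds far more state. */
struct sketch_context {
    int hvx_powered; /* set once the vector unit is powered on */
    int hmx_powered; /* set once the matrix unit is powered on */
};

/* Allocate a context and return it to the caller as an opaque handle. */
int sketch_open(sketch_handle64 *handle) {
    assert(handle != NULL);
    struct sketch_context *ctx = calloc(1, sizeof(*ctx));
    if (ctx == NULL) {
        return -1; /* allocation failed */
    }
    /* The real code issues power requests here; the sketch only records them. */
    ctx->hvx_powered = 1;
    ctx->hmx_powered = 1;
    *handle = (sketch_handle64)(uintptr_t)ctx;
    return 0;
}

/* Recover the context from the handle and release it. */
int sketch_close(sketch_handle64 handle) {
    struct sketch_context *ctx = (struct sketch_context *)(uintptr_t)handle;
    if (ctx == NULL) {
        return -1; /* nothing to close */
    }
    free(ctx);
    return 0;
}
```

Passing the context pointer through the handle means every later call can recover its state with a single cast, without a lookup table.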
The message processing loop reads htp_general_req messages from the dspqueue and dispatches them to operation functions based on the op code:
- op_matmul -- Matrix multiplication
- op_binary -- Element-wise add/mul/sub
- op_softmax -- Softmax with optional ALiBi
- op_rope -- Rotary position embedding
- op_act -- Activation functions (SwiGLU)
- op_flash_attn -- Flash attention
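The dispatch step can be pictured as a switch on the op code. The sketch below is an assumption-laden illustration: the `sketch_op` enum, `sketch_req` struct, and stub handlers are hypothetical, and the real htp_general_req layout and op-code values are defined by the HTP headers, not here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op codes; the real values live in the HTP message headers. */
enum sketch_op {
    SKETCH_OP_MATMUL,
    SKETCH_OP_BINARY,
    SKETCH_OP_SOFTMAX,
    SKETCH_OP_ROPE,
    SKETCH_OP_ACT,
    SKETCH_OP_FLASH_ATTN,
    SKETCH_OP_COUNT
};

/* Hypothetical request; the real message also carries tensor descriptors. */
struct sketch_req {
    enum sketch_op op;
};

/* Counts how often each stub handler ran, so routing can be observed. */
int sketch_calls[SKETCH_OP_COUNT];

/* Stub handlers standing in for op_matmul, op_binary, and so on. */
int stub_matmul(const struct sketch_req *r)     { (void)r; sketch_calls[SKETCH_OP_MATMUL]++;     return 0; }
int stub_binary(const struct sketch_req *r)     { (void)r; sketch_calls[SKETCH_OP_BINARY]++;     return 0; }
int stub_softmax(const struct sketch_req *r)    { (void)r; sketch_calls[SKETCH_OP_SOFTMAX]++;    return 0; }
int stub_rope(const struct sketch_req *r)       { (void)r; sketch_calls[SKETCH_OP_ROPE]++;       return 0; }
int stub_act(const struct sketch_req *r)        { (void)r; sketch_calls[SKETCH_OP_ACT]++;        return 0; }
int stub_flash_attn(const struct sketch_req *r) { (void)r; sketch_calls[SKETCH_OP_FLASH_ATTN]++; return 0; }

/* Route one request to its handler; unknown op codes return an error. */
int sketch_dispatch(const struct sketch_req *req) {
    assert(req != NULL);
    switch (req->op) {
        case SKETCH_OP_MATMUL:     return stub_matmul(req);
        case SKETCH_OP_BINARY:     return stub_binary(req);
        case SKETCH_OP_SOFTMAX:    return stub_softmax(req);
        case SKETCH_OP_ROPE:       return stub_rope(req);
        case SKETCH_OP_ACT:        return stub_act(req);
        case SKETCH_OP_FLASH_ATTN: return stub_flash_attn(req);
        default:                   return -1;
    }
}
```

Returning an error for unrecognized op codes lets the loop report a failed status to the host instead of silently dropping the message.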
The runtime manages VTCM (Vector Tightly Coupled Memory) allocation, worker pool initialization, and DMA queue setup for high-performance data movement.
Usage
This code runs exclusively on the Hexagon DSP. It is loaded as a skeleton library by the FastRPC runtime when the host opens a session.
Code Reference
Source Location
| Repository | File | Lines |
|---|---|---|
| ggml-org/ggml | src/ggml-hexagon/htp/main.c | 1043 |
Key Signatures
// IDL interface entry points
AEEResult htp_iface_open(const char * uri, remote_handle64 * handle);
AEEResult htp_iface_close(remote_handle64 handle);

// Power configuration -- sets DCVS to max performance
HAP_power_request_t request;
request.dcvs_v3.dcvs_option = HAP_DCVS_V2_PERFORMANCE_MODE;
request.dcvs_v3.bus_params.min_corner = HAP_DCVS_VCORNER_MAX;
I/O Contract
Inputs
- dspqueue messages -- htp_general_req containing operation type, tensor descriptors, and parameters
- Shared memory buffers -- Tensor data in rpcmem-allocated regions
Outputs
- dspqueue responses -- htp_general_rsp with status, timing, and profiling data
- Computed tensor data -- Results written to shared memory for host access
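The output side of this contract (results written in place, plus a response carrying status and timing) can be sketched as follows. Everything here is hypothetical: `sketch_rsp` is not the real htp_general_rsp layout, the standard `clock()` call stands in for whatever timer the DSP code uses, and a plain array stands in for an rpcmem-allocated buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical response; the real htp_general_rsp has its own layout. */
struct sketch_rsp {
    int32_t  status;        /* 0 on success, non-zero on failure */
    uint64_t elapsed_ticks; /* processor time spent in the operation */
};

/* Hypothetical operation signature: works directly on a shared buffer. */
typedef int (*sketch_op_fn)(float *data, int n);

/* Run one operation and fill in the response with status and timing. */
void sketch_execute(sketch_op_fn fn, float *data, int n, struct sketch_rsp *rsp) {
    assert(fn != NULL && rsp != NULL);
    clock_t start = clock();
    rsp->status = fn(data, n);
    clock_t end = clock();
    rsp->elapsed_ticks = (uint64_t)(end - start);
}

/* Example operation: doubles every element in place. */
int sketch_double_in_place(float *data, int n) {
    if (data == NULL || n < 0) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        data[i] *= 2.0f;
    }
    return 0;
}
```

Because the operation writes into the buffer it was given, the response only needs to carry metadata; the host reads the results from the same memory it supplied.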
Usage Examples
DSP-side lifecycle:
// 1. Host opens the remote handle
AEEResult err = htp_iface_open(uri, &handle);
// -> Allocates context, powers on HVX/HMX, sets max performance

// 2. Host sends operations via dspqueue
// -> Message loop dispatches to op_matmul, op_binary, op_softmax, etc.

// 3. Host closes the handle
htp_iface_close(handle);
// -> Frees context, releases resources
Related Pages
Implements Principle
Related Implementations
- Implementation:Ggml_org_Ggml_Hexagon_backend -- Host-side counterpart
- Implementation:Ggml_org_Ggml_Hexagon_matmul_ops -- Matrix multiply operations
- Implementation:Ggml_org_Ggml_Hexagon_flash_attn -- Flash attention operations
- Implementation:Ggml_org_Ggml_Hexagon_softmax_ops -- Softmax operations
- Implementation:Ggml_org_Ggml_Hexagon_act_ops -- Activation operations
- Implementation:Ggml_org_Ggml_Hexagon_binary_ops -- Binary operations
- Implementation:Ggml_org_Ggml_Hexagon_rope_ops -- RoPE operations