Implementation:Ggml org Ggml Sycl dequantize
| Knowledge Sources | |
|---|---|
| Domains | ML_Infrastructure, GPU_Compute, Quantization |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Inline dequantization kernel functions for all quantization formats on SYCL, serving as the core building block for dmmv, mmvq, convert, and get_rows kernels.
Description
dequantize.hpp provides the lowest-level dequantization primitives for the SYCL backend. Each function takes a pointer to quantized data, a block index, and a quant sub-index, and outputs a dfloat2 pair of dequantized floating-point values. The header covers all GGML quantization formats:
- Standard types: dequantize_q4_0, dequantize_q4_1, dequantize_q5_0, dequantize_q5_1, dequantize_q8_0
- K-quant types: dequantize_q2_K, dequantize_q3_K, dequantize_q4_K, dequantize_q5_K, dequantize_q6_K
- IQ types: Various importance-quantization format dequantizers
- Reorder variants: dequantize_q4_0_reorder and similar functions that work with separated scale/quant memory layouts (Structure-of-Arrays), optimized for GPU memory access patterns
All functions are declared static __dpct_inline__ for maximum inlining at call sites. The file supports both F16 and F32 computation modes via the GGML_SYCL_F16 compile flag, using sycl::half or float arithmetic respectively.
The dequantize_kernel_t and dequantize_kernel_t_reorder function pointer typedefs allow these kernels to be passed as template parameters to higher-level operations.
Usage
Included by convert.cpp, dmmv.cpp, cpy.cpp, and other kernel files that need to convert quantized blocks to floating-point values. The dequantization functions are typically called inside parallel_for kernels, with each work-item dequantizing a pair of values.
Code Reference
Source Location
- Repository: GGML
- File: src/ggml-sycl/dequantize.hpp
- Lines: 841
Signatures
// Function pointer types
typedef void (*dequantize_kernel_t)(const void * vx, const int64_t ib, const int iqs, dfloat2 & v);
typedef void (*dequantize_kernel_t_reorder)(const void *d, const int64_t ib, const void *qs,
const int iqs, dfloat2 &v);
// Standard dequantization (representative examples)
static __dpct_inline__ void dequantize_q4_0(const void *vx, const int64_t ib,
const int iqs, dfloat2 &v);
static __dpct_inline__ void dequantize_q4_1(const void *vx, const int64_t ib,
const int iqs, dfloat2 &v);
static __dpct_inline__ void dequantize_q8_0(const void *vx, const int64_t ib,
const int iqs, dfloat2 &v);
// Reorder-aware variant
static __dpct_inline__ void dequantize_q4_0_reorder(const void *d_ptr, const int64_t ib,
const void *qs, const int iqs, dfloat2 &v);
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| vx | const void * | Yes | Pointer to quantized block data |
| ib | int64_t | Yes | Block index within the quantized tensor |
| iqs | int | Yes | Quant sub-index within the block |
Outputs
| Name | Type | Description |
|---|---|---|
| v | dfloat2 & | Pair of dequantized values (v.x() and v.y()) |
Usage Examples
// Inside a SYCL kernel, dequantize two values from a q4_0 block:
dfloat2 v;
dequantize_q4_0(quantized_data, block_index, quant_index, v);
float val0 = v.x(); // first dequantized value
float val1 = v.y(); // second dequantized value
// Using as a template parameter for generic dequantization:
dequantize_block<QK4_0, QR4_0, dequantize_q4_0>(vx, y, k, item_ct1);