Implementation:Ggml org Ggml Sycl dequantize

Knowledge Sources	GGML
Domains	ML_Infrastructure, GPU_Compute, Quantization
Last Updated	2025-05-15 12:00 GMT

Overview

Inline dequantization kernel functions for all quantization formats on SYCL, serving as the core building block for dmmv, mmvq, convert, and get_rows kernels.

Description

dequantize.hpp provides the lowest-level dequantization primitives for the SYCL backend. Each function takes a pointer to quantized data, a block index, and a quant sub-index, and outputs a dfloat2 pair of dequantized floating-point values. The header covers all GGML quantization formats:

Standard types: dequantize_q4_0, dequantize_q4_1, dequantize_q5_0, dequantize_q5_1, dequantize_q8_0
K-quant types: dequantize_q2_K, dequantize_q3_K, dequantize_q4_K, dequantize_q5_K, dequantize_q6_K
IQ types: Various importance-quantization format dequantizers
Reorder variants: dequantize_q4_0_reorder and similar functions that work with separated scale/quant memory layouts (Structure-of-Arrays), optimized for GPU memory access patterns

All functions are declared static __dpct_inline__ for maximum inlining at call sites. The file supports both F16 and F32 computation modes via the GGML_SYCL_F16 compile flag, using sycl::half or float arithmetic respectively.

The dequantize_kernel_t and dequantize_kernel_t_reorder function pointer typedefs allow these kernels to be passed as template parameters to higher-level operations.

Usage

Included by convert.cpp, dmmv.cpp, cpy.cpp, and other kernel files that need to convert quantized blocks to floating-point values. The dequantization functions are typically called inside parallel_for kernels, with each work-item dequantizing a pair of values.

Code Reference

Source Location

Repository: GGML
File: src/ggml-sycl/dequantize.hpp
Lines: 841

Signatures

// Function pointer types
typedef void (*dequantize_kernel_t)(const void * vx, const int64_t ib, const int iqs, dfloat2 & v);
typedef void (*dequantize_kernel_t_reorder)(const void *d, const int64_t ib, const void *qs,
                                            const int iqs, dfloat2 &v);

// Standard dequantization (representative examples)
static __dpct_inline__ void dequantize_q4_0(const void *vx, const int64_t ib,
                                            const int iqs, dfloat2 &v);
static __dpct_inline__ void dequantize_q4_1(const void *vx, const int64_t ib,
                                            const int iqs, dfloat2 &v);
static __dpct_inline__ void dequantize_q8_0(const void *vx, const int64_t ib,
                                            const int iqs, dfloat2 &v);

// Reorder-aware variant
static __dpct_inline__ void dequantize_q4_0_reorder(const void *d_ptr, const int64_t ib,
                                            const void *qs, const int iqs, dfloat2 &v);

I/O Contract

Inputs

Name	Type	Required	Description
vx	const void *	Yes	Pointer to quantized block data
ib	int64_t	Yes	Block index within the quantized tensor
iqs	int	Yes	Quant sub-index within the block

Outputs

Name	Type	Description
v	dfloat2 &	Pair of dequantized values (v.x() and v.y())

Usage Examples

// Inside a SYCL kernel, dequantize two values from a q4_0 block:
dfloat2 v;
dequantize_q4_0(quantized_data, block_index, quant_index, v);
float val0 = v.x();  // first dequantized value
float val1 = v.y();  // second dequantized value

// Using as a template parameter for generic dequantization:
dequantize_block<QK4_0, QR4_0, dequantize_q4_0>(vx, y, k, item_ct1);

Related Pages

Implements Principle

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment