Implementation: vLLM SGL-Kernels vec.h
| Knowledge Sources | Details |
|---|---|
| Domains | SIMD Vectorization, Data Type Conversion |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Defines SGLang vector type conversion utilities, including FP32-to-FP16/BF16 and FP8-to-BF16 converters, built on AVX512 intrinsics for CPU-optimized kernel operations.
Description
This header provides low-level SIMD conversion routines critical for quantized inference on CPU. It includes convert_from_float_ext for efficient float-to-reduced-precision conversion using native AVX512-BF16 instructions, multiple FP8 (E4M3) to BF16 conversion variants (cvt_e4m3_bf16_intrinsic_no_nan, cvt_e4m3_bf16_intrinsic_with_denorm, cvt_e4m3_bf16_intrinsic_without_denorm), and scalar reduction functions (vec_reduce_sum, vec_reduce_max). It also provides quantize_row_int8 for dynamic per-row INT8 quantization used in w8a8 MoE kernels.
Usage
This header is included by the SGL-kernels MoE implementations (moe_fp8.cpp and moe_int8.cpp). It is compiled when building the vLLM CPU extension with AVX512 support enabled.
Code Reference
Source Location
- Repository: vllm
- File: csrc/cpu/sgl-kernels/vec.h
- Lines: 1-308
Signature
// Float to reduced precision conversion (with AVX512-BF16 specialization)
template <typename scalar_t>
inline Vectorized<scalar_t> convert_from_float_ext(
    const Vectorized<float>& a, const Vectorized<float>& b);
// FP8 E4M3 to BF16 conversion (multiple variants)
inline __m512bh cvt_e4m3_bf16_intrinsic_no_nan(__m256i fp8_vec);
inline __m512bh cvt_e4m3_bf16_intrinsic_without_denorm(__m256i fp8_vec);
inline __m512bh cvt_e4m3_bf16_intrinsic_with_denorm(__m256i fp8_vec);
inline __m512bh CVT_FP8_TO_BF16(__m256i a);
// Scalar reduction functions
inline float vec_reduce_sum(const Vectorized<float>& a);
inline float vec_reduce_max(const Vectorized<float>& a);
// Dynamic INT8 row quantization
template <typename scalar_t>
inline void quantize_row_int8(uint8_t* __restrict__ Aq, float& As,
    const scalar_t* __restrict__ A, int64_t K, float eps = 1e-7);
Import
#include "vec.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| fp8_vec | __m256i | Yes | 32 packed FP8 E4M3 values for conversion to BF16 |
| a, b | Vectorized<float> | Yes | Two float vectors packed into one reduced-precision scalar_t vector |
| A | const scalar_t* | Yes | Input row data for INT8 quantization |
| K | int64_t | Yes | Number of elements in the row to quantize |
Outputs
| Name | Type | Description |
|---|---|---|
| (return) | __m512bh | 32 BF16 values converted from FP8 input |
| (return) | Vectorized<scalar_t> | Reduced precision vector converted from float pairs |
| Aq | uint8_t* | Quantized INT8 output row |
| As | float& | Computed quantization scale for the row |
Usage Examples
// Convert FP8 weights to BF16 for GEMM computation
__m256i fp8_data = _mm256_loadu_si256((__m256i*)fp8_ptr);
__m512bh bf16_data = CVT_FP8_TO_BF16(fp8_data);
// Convert float pair back to BFloat16
Vectorized<float> f0, f1;
auto bf16_vec = convert_from_float_ext<at::BFloat16>(f0, f1);
// Quantize a row to INT8 with dynamic scaling
float scale;
quantize_row_int8<at::BFloat16>(quant_buf, scale, input_row, K);