Implementation:Interpretml Interpret Avx512f 32
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, EBM_Core |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
Implements the AVX-512F SIMD specialization for 32-bit floating-point EBM computations, processing 16 samples in parallel using 512-bit vector operations.
Description
The avx512f_32.cpp file defines the Avx512f_32_Float and Avx512f_32_Int SIMD wrapper types, which use AVX-512F 512-bit intrinsics (__m512 and __m512i) to process 16 32-bit values at a time. It follows the same architecture as the AVX2 implementation but uses registers twice as wide, so each operation covers 16 lanes instead of 8. k_cAlignment is 64 bytes, the size of one 512-bit register and the alignment that aligned 512-bit loads and stores require. Avx512f_32_Int sets k_cSIMDShift=4, giving k_cSIMDPack = 1 << 4 = 16. The file provides the same operator overloads and special functions as the AVX2 variant, using AVX-512-specific intrinsics for masked operations, gather/scatter, and conditional execution. On CPUs that support AVX-512F, this is the highest-performance compute path available in the library.
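The relationship between k_cSIMDShift and k_cSIMDPack can be illustrated with a small portable sketch. The two constants come from the signature of Avx512f_32_Int; the helper names FullPacks, Remainder, and PaddedCount are hypothetical and are not taken from libebm. They only show how a shift of 4 splits a sample count into 16-lane batches.

```cpp
#include <cassert>
#include <cstddef>

// Values as documented for Avx512f_32_Int.
static constexpr int k_cSIMDShift = 4;
static constexpr int k_cSIMDPack = 1 << k_cSIMDShift; // 16 lanes

// 16 float32 lanes occupy exactly one 512-bit (64-byte) register.
static_assert(k_cSIMDPack * sizeof(float) == 64, "16 x float32 must equal 64 bytes");

// Hypothetical helpers (assumptions for illustration, not libebm functions).

// Number of complete 16-lane packs in cSamples.
constexpr std::size_t FullPacks(std::size_t cSamples) {
   return cSamples >> k_cSIMDShift;
}

// Samples left over after the last complete pack.
constexpr std::size_t Remainder(std::size_t cSamples) {
   return cSamples & static_cast<std::size_t>(k_cSIMDPack - 1);
}

// cSamples rounded up to the next multiple of the pack size.
constexpr std::size_t PaddedCount(std::size_t cSamples) {
   return (cSamples + static_cast<std::size_t>(k_cSIMDPack - 1)) &
         ~static_cast<std::size_t>(k_cSIMDPack - 1);
}
```

Because the pack size is a power of two, the division and modulo reduce to a shift and a mask, which is presumably why the type exposes the shift as well as the pack size.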
Usage
This path is selected automatically at runtime when AVX-512F CPU support is detected. It is the fastest available SIMD path, processing 16 float32 samples per operation for gradient/hessian computations and score updates.
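How such a runtime selection might look is sketched below. This is not the libebm dispatch code: the SimdPath enum, ChoosePath, LanesFor, and DetectPath are hypothetical names, and the fallback order (AVX-512F, then AVX2, then scalar) is an assumption. The detection uses the GCC/Clang builtin __builtin_cpu_supports on x86 and falls back to the scalar path elsewhere.

```cpp
#include <cassert>

// Hypothetical enum for illustration; libebm uses its own AccelerationFlags type.
enum class SimdPath { Scalar, Avx2, Avx512f };

// Pure selection logic: prefer the widest instruction set that is available.
constexpr SimdPath ChoosePath(bool bAvx512f, bool bAvx2) {
   return bAvx512f ? SimdPath::Avx512f : (bAvx2 ? SimdPath::Avx2 : SimdPath::Scalar);
}

// float32 lanes per operation: 512/32 = 16, 256/32 = 8, scalar = 1.
constexpr int LanesFor(SimdPath path) {
   return path == SimdPath::Avx512f ? 16 : (path == SimdPath::Avx2 ? 8 : 1);
}

// Queries the CPU where the compiler provides a builtin for it.
inline SimdPath DetectPath() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
   __builtin_cpu_init();
   return ChoosePath(__builtin_cpu_supports("avx512f") != 0,
         __builtin_cpu_supports("avx2") != 0);
#else
   return SimdPath::Scalar; // no portable detection in this sketch
#endif
}
```

Keeping the selection logic separate from the CPU query makes the preference order testable on any machine, regardless of which instruction sets it actually supports.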
Code Reference
Source Location
- Repository: Interpretml_Interpret
- File: shared/libebm/compute/avx512f_ebm/avx512f_32.cpp
Signature
static constexpr size_t k_cAlignment = 64;
struct alignas(k_cAlignment) Avx512f_32_Int final {
using T = uint32_t;
using TPack = __m512i;
static constexpr AccelerationFlags k_zone = AccelerationFlags_AVX512F;
static constexpr int k_cSIMDShift = 4;
static constexpr int k_cSIMDPack = 1 << k_cSIMDShift; // 16
inline static Avx512f_32_Int Load(const T* const a) noexcept;
inline void Store(T* const a) const noexcept;
inline static Avx512f_32_Int LoadBytes(const uint8_t* const a) noexcept;
template<typename TFunc> static inline void Execute(
const TFunc& func, const Avx512f_32_Int& val0) noexcept;
inline static Avx512f_32_Int MakeIndexes() noexcept;
};
struct alignas(k_cAlignment) Avx512f_32_Float final { ... };
template<bool bNegateInput, bool bNaNPossible, bool bUnderflowPossible, bool bOverflowPossible>
inline Avx512f_32_Float Exp(const Avx512f_32_Float& val) noexcept;
template<bool bNegateOutput, bool bNaNPossible, bool bNegativePossible,
bool bZeroPossible, bool bPositiveInfinityPossible>
inline Avx512f_32_Float Log(const Avx512f_32_Float& val) noexcept;
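To make the shape of this interface easier to follow, here is a scalar stand-in that mirrors the Load, Store, MakeIndexes, and Execute members with a plain array instead of __m512i. ScalarPack16 and SumLanes are hypothetical names. Two behaviours are assumed rather than confirmed by the signature: that MakeIndexes places the value i in lane i, and that Execute calls the callback once per lane with the lane index and the lane value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for a 16-lane uint32 pack (not the libebm type).
struct ScalarPack16 final {
   using T = uint32_t;
   static constexpr int k_cSIMDShift = 4;
   static constexpr int k_cSIMDPack = 1 << k_cSIMDShift; // 16

   T m_lanes[k_cSIMDPack];

   // Copies 16 consecutive values from memory into the pack.
   static ScalarPack16 Load(const T* const a) noexcept {
      ScalarPack16 ret;
      for(int i = 0; i < k_cSIMDPack; ++i) {
         ret.m_lanes[i] = a[i];
      }
      return ret;
   }

   // Writes the 16 lanes back to memory.
   void Store(T* const a) const noexcept {
      for(int i = 0; i < k_cSIMDPack; ++i) {
         a[i] = m_lanes[i];
      }
   }

   // Assumption: lane i holds the value i.
   static ScalarPack16 MakeIndexes() noexcept {
      ScalarPack16 ret;
      for(int i = 0; i < k_cSIMDPack; ++i) {
         ret.m_lanes[i] = static_cast<T>(i);
      }
      return ret;
   }

   // Assumption: func is invoked once per lane as func(laneIndex, laneValue).
   template<typename TFunc>
   static void Execute(const TFunc& func, const ScalarPack16& val0) noexcept {
      for(int i = 0; i < k_cSIMDPack; ++i) {
         func(i, val0.m_lanes[i]);
      }
   }
};

// Example use of Execute: add up all lanes of a pack.
inline uint64_t SumLanes(const ScalarPack16& pack) {
   uint64_t total = 0;
   ScalarPack16::Execute([&total](int, uint32_t v) { total += v; }, pack);
   return total;
}
```

A per-lane callback of this kind is a common escape hatch for operations that have no vector equivalent, at the cost of leaving the SIMD registers for the duration of the call.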
I/O Contract
| Type | Description |
|---|---|
| Avx512f_32_Float | 16x float32 SIMD wrapper using __m512 |
| Avx512f_32_Int | 16x uint32 SIMD wrapper using __m512i |
| k_cSIMDPack | 16 (processes 16 samples per operation) |
| k_cAlignment | 64 bytes (AVX-512 alignment requirement) |
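The numbers in this table fit together: 16 lanes of 4 bytes fill exactly 64 bytes, which is also the alignment. The sketch below checks this with standard C++ only. AlignedFloatPack and IsAligned are hypothetical names, not libebm types, and the constants are redeclared locally for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Values as listed in the I/O contract, redeclared for this sketch.
static constexpr std::size_t k_cAlignment = 64;
static constexpr std::size_t k_cSIMDPack = 16;

// Hypothetical buffer holding one pack of float32 values, aligned like the wrappers.
struct alignas(k_cAlignment) AlignedFloatPack {
   float m_values[k_cSIMDPack];
};

// One pack is exactly one 512-bit register in size and alignment.
static_assert(sizeof(AlignedFloatPack) == k_cAlignment, "pack must be 64 bytes");
static_assert(alignof(AlignedFloatPack) == k_cAlignment, "pack must be 64-byte aligned");

// Returns true when p sits on a 64-byte boundary.
inline bool IsAligned(const void* p) noexcept {
   return reinterpret_cast<std::uintptr_t>(p) % k_cAlignment == 0;
}
```

A check like IsAligned can be useful in debug assertions before handing a pointer to code that assumes aligned access.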
Usage Examples
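# The AVX-512F path is called internally via the native bindings
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier
X = np.random.default_rng(0).random((200, 4)) # placeholder feature matrix
y = (X[:, 0] > 0.5).astype(int) # placeholder binary labels
ebm = ExplainableBoostingClassifier()
ebm.fit(X, y) # Automatically uses AVX-512 SIMD on supported CPUs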