Implementation:Interpretml Interpret Avx512f 32

Knowledge Sources	Interpretml Interpret
Domains	Machine_Learning, EBM_Core
Last Updated	2026-02-07 12:00 GMT

Overview

Implements the AVX-512F SIMD specialization for 32-bit floating-point EBM computations, processing 16 samples in parallel using 512-bit vector operations.

Description

The avx512f_32.cpp file defines the Avx512f_32_Float and Avx512f_32_Int SIMD wrapper types, which use AVX-512F 512-bit intrinsics (__m512 and __m512i) to process 16 32-bit values at a time. It follows the same architecture as the AVX2 implementation but uses registers twice as wide, so each operation covers twice as many lanes. k_cAlignment is 64 bytes, the size of one 512-bit register, which aligned AVX-512 loads and stores require. Avx512f_32_Int sets k_cSIMDShift = 4, giving k_cSIMDPack = 1 << 4 = 16. The file provides the same operator overloads and special functions as the AVX2 variant, implemented with AVX-512-specific intrinsics for masked operations, gather/scatter, and conditional execution. It is the widest SIMD compute path in the library and is intended to be the fastest on CPUs that support it.
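The relationship between the shift, the pack size, and the alignment can be checked with a small portable sketch. It uses no intrinsics, and every name in it (kAlignment, kSimdShift, kSimdPack, Pack16f, CountFullPacks, CountRemainder, IsPackAligned, LaneAt) is a hypothetical stand-in rather than something taken from avx512f_32.cpp. How the library itself handles samples that do not fill a whole pack is not described here, so the remainder helper is only illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins mirroring the constants named in the description.
constexpr std::size_t kAlignment = 64;                           // bytes in one 512-bit register
constexpr int kSimdShift = 4;                                    // log2 of the lane count
constexpr std::size_t kSimdPack = std::size_t{1} << kSimdShift;  // 16 lanes

static_assert(kSimdPack == 16, "a shift of 4 yields 16 lanes");
static_assert(kSimdPack * sizeof(float) == kAlignment, "16 float32 lanes fill exactly 64 bytes");

// Portable stand-in for a 512-bit register holding 16 float32 lanes.
struct alignas(kAlignment) Pack16f {
   float lanes[kSimdPack];
};

// Number of full 16-sample packs in cSamples (the shift replaces a division by 16).
constexpr std::size_t CountFullPacks(std::size_t cSamples) {
   return cSamples >> kSimdShift;
}

// Samples left over after the full packs (the mask replaces a modulo by 16).
constexpr std::size_t CountRemainder(std::size_t cSamples) {
   return cSamples & (kSimdPack - 1);
}

// True when p sits on a 64-byte boundary, as an aligned 512-bit load would need.
inline bool IsPackAligned(const void* p) {
   return reinterpret_cast<std::uintptr_t>(p) % kAlignment == 0;
}

// Bounds-checked read of one lane.
inline float LaneAt(const Pack16f& pack, std::size_t iLane) {
   assert(iLane < kSimdPack);
   return pack.lanes[iLane];
}
```

With these definitions, 100 samples split into 6 full packs plus 4 leftover samples, and any Pack16f object starts on a 64-byte boundary because of the alignas specifier.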

Usage

This path is selected automatically at runtime when AVX-512F support is detected on the CPU. It is the widest SIMD path available, processing 16 float32 samples per operation for gradient/hessian computations and score updates.
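Runtime selection of this kind is commonly written as a check of detected CPU capability bits, widest path first. The sketch below assumes that pattern and is not the library's dispatch code: the CpuFeature bits, the ComputePath struct, the path names, and SelectPath are all hypothetical, and the 8-lane figure for AVX2 is simply 256 bits divided by 32.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical capability bits; the library's AccelerationFlags values are not shown on this page.
enum CpuFeature : std::uint32_t {
   kFeatureNone = 0,
   kFeatureAvx2 = 1u << 0,
   kFeatureAvx512f = 1u << 1,
};

// Hypothetical description of a compute path.
struct ComputePath {
   const char* name;   // illustrative label only
   std::size_t cPack;  // float32 samples handled per SIMD operation
};

// Choose the widest path among the features the CPU reports.
inline ComputePath SelectPath(std::uint32_t detected) {
   if (detected & kFeatureAvx512f) {
      return {"avx512f_32", 16};  // 512-bit registers: 16 float32 lanes
   }
   if (detected & kFeatureAvx2) {
      return {"avx2_32", 8};      // 256-bit registers: 8 float32 lanes (assumed)
   }
   return {"scalar", 1};          // fallback: one sample at a time
}
```

On GCC or Clang targeting x86, the detected bits could be filled in from `__builtin_cpu_supports("avx512f")` and `__builtin_cpu_supports("avx2")`. Checking the widest option first means a CPU that reports both features takes the 16-lane path.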

Code Reference

Source Location

Repository: Interpretml_Interpret
File: shared/libebm/compute/avx512f_ebm/avx512f_32.cpp

Signature

static constexpr size_t k_cAlignment = 64;

struct alignas(k_cAlignment) Avx512f_32_Int final {
   using T = uint32_t;
   using TPack = __m512i;
   static constexpr AccelerationFlags k_zone = AccelerationFlags_AVX512F;
   static constexpr int k_cSIMDShift = 4;
   static constexpr int k_cSIMDPack = 1 << k_cSIMDShift; // 16

   inline static Avx512f_32_Int Load(const T* const a) noexcept;
   inline void Store(T* const a) const noexcept;
   inline static Avx512f_32_Int LoadBytes(const uint8_t* const a) noexcept;
   template<typename TFunc> static inline void Execute(
       const TFunc& func, const Avx512f_32_Int& val0) noexcept;
   inline static Avx512f_32_Int MakeIndexes() noexcept;
};

struct alignas(k_cAlignment) Avx512f_32_Float final { ... };

template<bool bNegateInput, bool bNaNPossible, bool bUnderflowPossible, bool bOverflowPossible>
inline Avx512f_32_Float Exp(const Avx512f_32_Float& val) noexcept;

template<bool bNegateOutput, bool bNaNPossible, bool bNegativePossible,
    bool bZeroPossible, bool bPositiveInfinityPossible>
inline Avx512f_32_Float Log(const Avx512f_32_Float& val) noexcept;
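The bool template parameters on Exp and Log suggest a compile-time flag pattern: a caller that knows a special case cannot occur turns off its check, and the compiler drops that code. The following portable sketch illustrates the pattern on a plain array of 16 floats. ExpPackSketch, kPack, and kMaxInput are hypothetical, only two of the four Exp flags are modelled, and the meaning given to each flag is one plausible reading of its name rather than the library's documented behaviour.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

constexpr std::size_t kPack = 16;  // lanes per pack, matching k_cSIMDPack in the signature

// Hypothetical lane-wise exp over 16 floats. The flags are template parameters,
// so a check that is switched off is a compile-time constant the compiler can remove.
template<bool bNegateInput, bool bOverflowPossible>
inline void ExpPackSketch(const float* in, float* out) {
   // Assumed cutoff: exp overflows float32 for inputs slightly above 88.7.
   constexpr float kMaxInput = 88.5f;
   for (std::size_t i = 0; i < kPack; ++i) {
      const float x = bNegateInput ? -in[i] : in[i];
      float r = std::exp(x);
      if (bOverflowPossible && x > kMaxInput) {
         // Explicitly saturate to +infinity when the caller says overflow can occur.
         r = std::numeric_limits<float>::infinity();
      }
      out[i] = r;
   }
}
```

A call such as `ExpPackSketch<true, false>(in, out)` computes exp(-x) for each lane and skips the overflow check entirely, which is the kind of specialization the template flags in the signature make possible.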

I/O Contract

Name	Description
Avx512f_32_Float	16x float32 SIMD wrapper using __m512
Avx512f_32_Int	16x uint32 SIMD wrapper using __m512i
k_cSIMDPack	16 (processes 16 samples per operation)
k_cAlignment	64 bytes (AVX-512 alignment requirement)

Usage Examples

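# Called internally via native bindings; X and y below are synthetic stand-in data
from sklearn.datasets import make_classification
from interpret.glassbox import ExplainableBoostingClassifier
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
ebm = ExplainableBoostingClassifier()
ebm.fit(X, y)  # Automatically uses AVX-512 SIMD on supported CPUs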

Related Pages

Environment:Interpretml_Interpret_Native_Libebm_Environment
