Implementation:Interpretml Interpret Avx2 32
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, EBM_Core |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
Implements the AVX2 SIMD specialization for 32-bit floating-point EBM computations, processing 8 samples in parallel using 256-bit vector operations.
Description
The avx2_32.cpp file defines two SIMD wrapper types, Avx2_32_Float and Avx2_32_Int, which wrap the 256-bit vector types __m256 and __m256i so that 8 32-bit values are processed at once. The file provides operator overloads for arithmetic (+, -, *, /), comparisons, and bitwise operations, along with special functions (Load, Store, FusedMultiplyAdd, Sqrt). It also defines SIMD-accelerated Exp and Log functions. The alignment constant k_cAlignment is set to 32 bytes, the size of one 256-bit register. This implementation is selected automatically at runtime when the CPU supports AVX2 instructions; because each operation handles 8 samples rather than one, it speeds up gradient and hessian computations relative to the scalar CPU implementation. All objective registrations are pulled in through the objective_registrations.hpp header, which the file includes inside its SIMD namespace.
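To make the wrapper idea concrete, the following is a minimal pure-Python model of an 8-lane float pack. It is only a sketch, not the library's code: the class name Pack8 and its methods are hypothetical stand-ins that mimic the general shape described above (lane-wise operators, a load, a store, and a fused multiply-add). Python floats are 64-bit, so rounding will differ from real float32 hardware.

```python
from dataclasses import dataclass
from typing import List, Sequence

LANES = 8  # one 256-bit register holds 8 float32 values


@dataclass(frozen=True)
class Pack8:
    """Hypothetical stand-in for an 8-lane SIMD float wrapper."""
    lanes: tuple

    @staticmethod
    def load(buf: Sequence[float], offset: int = 0) -> "Pack8":
        # Read 8 consecutive values, as a single vector load would.
        chunk = tuple(float(v) for v in buf[offset:offset + LANES])
        if len(chunk) != LANES:
            raise ValueError("need 8 values for a full pack")
        return Pack8(chunk)

    def store(self, buf: List[float], offset: int = 0) -> None:
        # Write all 8 lanes back to consecutive positions.
        buf[offset:offset + LANES] = self.lanes

    def __add__(self, other: "Pack8") -> "Pack8":
        return Pack8(tuple(a + b for a, b in zip(self.lanes, other.lanes)))

    def __mul__(self, other: "Pack8") -> "Pack8":
        return Pack8(tuple(a * b for a, b in zip(self.lanes, other.lanes)))

    @staticmethod
    def fused_multiply_add(a: "Pack8", b: "Pack8", c: "Pack8") -> "Pack8":
        # Lane-wise a * b + c. Hardware FMA rounds once; this model rounds twice.
        return Pack8(tuple(x * y + z for x, y, z in zip(a.lanes, b.lanes, c.lanes)))


def demo() -> List[float]:
    # Compute 2 * x + 1 for 8 samples in one "vector" step.
    x = Pack8.load([1, 2, 3, 4, 5, 6, 7, 8])
    two = Pack8.load([2.0] * LANES)
    one = Pack8.load([1.0] * LANES)
    out = [0.0] * LANES
    Pack8.fused_multiply_add(x, two, one).store(out)
    return out
```

The point of the operator overloads is that code written once against such a wrapper reads like scalar arithmetic while each statement acts on all 8 lanes.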
Usage
Selected automatically at runtime when AVX2 CPU support is detected. Used for all per-sample gradient/hessian computations including BinSumsBoosting, BinSumsInteraction, and ApplyUpdate operations, processing 8 samples per SIMD operation.
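The runtime selection described above can be pictured with a small sketch. Everything here is an assumption made for illustration: the function name, the feature strings, and the returned labels are hypothetical, and the real selection logic lives inside the native library and may take other instruction sets into account.

```python
from typing import Iterable


def select_compute_zone(cpu_features: Iterable[str]) -> str:
    """Pick a compute path from detected CPU feature names (hypothetical names)."""
    features = {f.lower() for f in cpu_features}
    if "avx2" in features:
        return "avx2_32"  # 8 float32 lanes per operation
    return "scalar"  # fall back to processing one sample at a time
```

A caller would query the CPU once, pick a path, and then route every per-sample kernel through the chosen implementation.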
Code Reference
Source Location
- Repository: Interpretml_Interpret
- File: shared/libebm/compute/avx2_ebm/avx2_32.cpp
Signature
static constexpr size_t k_cAlignment = 32;

struct alignas(k_cAlignment) Avx2_32_Int final {
    using T = uint32_t;
    using TPack = __m256i;
    static constexpr AccelerationFlags k_zone = AccelerationFlags_AVX2;
    static constexpr int k_cSIMDShift = 3;
    static constexpr int k_cSIMDPack = 1 << k_cSIMDShift; // 8
    inline static Avx2_32_Int Load(const T* const a) noexcept;
    inline void Store(T* const a) const noexcept;
    inline static Avx2_32_Int LoadBytes(const uint8_t* const a) noexcept;
    template<typename TFunc> static inline void Execute(
        const TFunc& func, const Avx2_32_Int& val0) noexcept;
    inline static Avx2_32_Int MakeIndexes() noexcept;
};

struct alignas(k_cAlignment) Avx2_32_Float final { ... };

template<bool bNegateInput, bool bNaNPossible, bool bUnderflowPossible, bool bOverflowPossible>
inline Avx2_32_Float Exp(const Avx2_32_Float& val) noexcept;

template<bool bNegateOutput, bool bNaNPossible, bool bNegativePossible,
         bool bZeroPossible, bool bPositiveInfinityPossible>
inline Avx2_32_Float Log(const Avx2_32_Float& val) noexcept;
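The relation between k_cSIMDShift and k_cSIMDPack in the signature is plain bit arithmetic, and a short sketch shows why a shift is convenient. The constants mirror the values in the signature, but the helper split_samples is hypothetical: the source does not say how the library treats a sample count that is not a multiple of 8, so the leftover value is shown only as arithmetic.

```python
from typing import Tuple

K_SIMD_SHIFT = 3                  # mirrors k_cSIMDShift
K_SIMD_PACK = 1 << K_SIMD_SHIFT   # mirrors k_cSIMDPack, i.e. 8


def split_samples(n_samples: int) -> Tuple[int, int]:
    """Return (full 8-wide packs, leftover samples) for a sample count."""
    if n_samples < 0:
        raise ValueError("sample count must be non-negative")
    full_packs = n_samples >> K_SIMD_SHIFT      # same as n_samples // 8
    leftover = n_samples & (K_SIMD_PACK - 1)    # same as n_samples % 8
    return full_packs, leftover
```

Because the pack size is a power of two, division and remainder reduce to a shift and a mask.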
I/O Contract
| Type | Description |
|---|---|
| Avx2_32_Float | 8x float32 SIMD wrapper using __m256 |
| Avx2_32_Int | 8x uint32 SIMD wrapper using __m256i |
| k_cSIMDPack | 8 (processes 8 samples per operation) |
| k_cAlignment | 32 bytes (AVX2 alignment requirement) |
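The numbers in the table are linked: 32 bytes of alignment divided by 4 bytes per float32 gives the 8 lanes of k_cSIMDPack. The sketch below works through that arithmetic and shows a common way to test or round up an address to a 32-byte boundary. The helper names are hypothetical and are not taken from the library.

```python
K_ALIGNMENT = 32    # mirrors k_cAlignment, in bytes
FLOAT32_BYTES = 4   # size of one float32 value


def lanes_per_register() -> int:
    # 32 bytes / 4 bytes per value = 8 lanes
    return K_ALIGNMENT // FLOAT32_BYTES


def is_aligned(address: int) -> bool:
    # An address is aligned when it is a multiple of 32.
    return address % K_ALIGNMENT == 0


def align_up(address: int) -> int:
    # Round up to the next multiple of 32 (valid because 32 is a power of two).
    return (address + K_ALIGNMENT - 1) & ~(K_ALIGNMENT - 1)
```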
Usage Examples
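# The AVX2 code path is reached internally through the native bindings
from sklearn.datasets import make_classification
from interpret.glassbox import ExplainableBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)  # synthetic example data
ebm = ExplainableBoostingClassifier()
ebm.fit(X, y)  # Automatically uses AVX2 SIMD on supported CPUs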