Implementation:Interpretml Interpret Avx2 32

Knowledge Sources	Interpretml Interpret
Domains	Machine_Learning, EBM_Core
Last Updated	2026-02-07 12:00 GMT

Overview

Implements the AVX2 SIMD specialization for 32-bit floating-point EBM computations, processing 8 samples in parallel using 256-bit vector operations.

Description

The avx2_32.cpp file defines the Avx2_32_Float and Avx2_32_Int SIMD wrapper types, which use AVX2 256-bit intrinsics (__m256 and __m256i) to process eight 32-bit values (float32 and uint32, respectively) at once. The wrappers provide operator overloads for arithmetic (+, -, *, /), comparisons, and bitwise operations, along with helper functions such as Load, Store, FusedMultiplyAdd, and Sqrt. The file also defines SIMD-accelerated Exp and Log functions. Alignment is set to 32 bytes (k_cAlignment), the width of one 256-bit register, as aligned AVX2 loads and stores require. This implementation is selected automatically at runtime when the CPU supports AVX2 instructions; because each operation handles eight samples instead of one, it is expected to run gradient and hessian computations considerably faster than the scalar CPU implementation. The file pulls in all objective registrations by including the objective_registrations.hpp header inside its SIMD namespace.
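
To make the wrapper idea concrete, the following is a minimal, portable sketch of an 8-lane float type with Load, Store, operator overloads, FusedMultiplyAdd, and Sqrt. It is not the code in avx2_32.cpp: the name Pack8f is hypothetical, the lanes live in a plain array rather than an __m256 register, and the helpers are written as free functions purely for illustration. Each per-lane loop marks the place where a real AVX2 version would presumably issue a single intrinsic.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for Avx2_32_Float: eight float32 lanes stored in a
// plain array instead of an __m256 register (assumption for portability).
struct alignas(32) Pack8f {
   static constexpr std::size_t k_cSIMDPack = 8;
   std::array<float, k_cSIMDPack> lanes;

   // Load eight consecutive floats from memory into the pack.
   static Pack8f Load(const float* const a) noexcept {
      assert(a != nullptr);
      Pack8f r;
      for(std::size_t i = 0; i < k_cSIMDPack; ++i) r.lanes[i] = a[i];
      return r;
   }

   // Store the eight lanes back to consecutive floats in memory.
   void Store(float* const a) const noexcept {
      assert(a != nullptr);
      for(std::size_t i = 0; i < k_cSIMDPack; ++i) a[i] = lanes[i];
   }
};

// Lane-wise addition: one loop here, one vector instruction in real SIMD.
inline Pack8f operator+(const Pack8f& x, const Pack8f& y) noexcept {
   Pack8f r;
   for(std::size_t i = 0; i < Pack8f::k_cSIMDPack; ++i) r.lanes[i] = x.lanes[i] + y.lanes[i];
   return r;
}

// Lane-wise multiplication.
inline Pack8f operator*(const Pack8f& x, const Pack8f& y) noexcept {
   Pack8f r;
   for(std::size_t i = 0; i < Pack8f::k_cSIMDPack; ++i) r.lanes[i] = x.lanes[i] * y.lanes[i];
   return r;
}

// Lane-wise mul1 * mul2 + add, computed with a single rounding per lane.
inline Pack8f FusedMultiplyAdd(const Pack8f& mul1, const Pack8f& mul2, const Pack8f& add) noexcept {
   Pack8f r;
   for(std::size_t i = 0; i < Pack8f::k_cSIMDPack; ++i) {
      r.lanes[i] = std::fma(mul1.lanes[i], mul2.lanes[i], add.lanes[i]);
   }
   return r;
}

// Lane-wise square root.
inline Pack8f Sqrt(const Pack8f& x) noexcept {
   Pack8f r;
   for(std::size_t i = 0; i < Pack8f::k_cSIMDPack; ++i) r.lanes[i] = std::sqrt(x.lanes[i]);
   return r;
}
```

Because every operation is defined on the whole pack, code written against such a type reads like scalar arithmetic while operating on eight samples per statement, which is the property the wrapper types in this file rely on.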

Usage

This implementation is selected automatically at runtime when AVX2 CPU support is detected. It is used for all per-sample gradient/hessian computations, including the BinSumsBoosting, BinSumsInteraction, and ApplyUpdate operations, and processes 8 samples per SIMD operation.
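
The pack-at-a-time access pattern can be illustrated with a simplified bin-summing loop. This is only a sketch of the general idea, not the BinSumsBoosting implementation: the function name SumGradientsIntoBins, its parameters, and the requirement that the sample count be a multiple of 8 are all assumptions made for the example, and the inner per-lane loop stands in for work a vectorized version would do on whole registers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int k_cSIMDShift = 3;
constexpr std::size_t k_cSIMDPack = std::size_t{1} << k_cSIMDShift; // 8

// Hypothetical helper: adds each sample's gradient into the bin selected by
// its bin index, consuming the samples one pack of 8 at a time.
// Assumption: cSamples is a multiple of k_cSIMDPack (padded by the caller).
inline void SumGradientsIntoBins(const std::uint32_t* const binIndexes,
      const float* const gradients,
      const std::size_t cSamples,
      std::vector<double>& bins) {
   assert(cSamples % k_cSIMDPack == 0);
   for(std::size_t iPack = 0; iPack < cSamples; iPack += k_cSIMDPack) {
      // A SIMD version would load these 8 indexes and 8 gradients as packs;
      // this sketch visits the lanes one by one to scatter into the bins.
      for(std::size_t iLane = 0; iLane < k_cSIMDPack; ++iLane) {
         const std::uint32_t iBin = binIndexes[iPack + iLane];
         assert(iBin < bins.size());
         bins[iBin] += gradients[iPack + iLane];
      }
   }
}
```

For example, with bin indexes {0, 1, 0, 2, 1, 0, 2, 2} and gradients 1 through 8, the three bins end up holding 10, 7, and 19.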

Code Reference

Source Location

Repository: Interpretml_Interpret
File: shared/libebm/compute/avx2_ebm/avx2_32.cpp

Signature

static constexpr size_t k_cAlignment = 32;

struct alignas(k_cAlignment) Avx2_32_Int final {
   using T = uint32_t;
   using TPack = __m256i;
   static constexpr AccelerationFlags k_zone = AccelerationFlags_AVX2;
   static constexpr int k_cSIMDShift = 3;
   static constexpr int k_cSIMDPack = 1 << k_cSIMDShift; // 8

   inline static Avx2_32_Int Load(const T* const a) noexcept;
   inline void Store(T* const a) const noexcept;
   inline static Avx2_32_Int LoadBytes(const uint8_t* const a) noexcept;
   template<typename TFunc> static inline void Execute(
       const TFunc& func, const Avx2_32_Int& val0) noexcept;
   inline static Avx2_32_Int MakeIndexes() noexcept;
};

struct alignas(k_cAlignment) Avx2_32_Float final { ... };

template<bool bNegateInput, bool bNaNPossible, bool bUnderflowPossible, bool bOverflowPossible>
inline Avx2_32_Float Exp(const Avx2_32_Float& val) noexcept;

template<bool bNegateOutput, bool bNaNPossible, bool bNegativePossible,
    bool bZeroPossible, bool bPositiveInfinityPossible>
inline Avx2_32_Float Log(const Avx2_32_Float& val) noexcept;

I/O Contract

Type	Description
Avx2_32_Float	8x float32 SIMD wrapper using __m256
Avx2_32_Int	8x uint32 SIMD wrapper using __m256i
k_cSIMDPack	8 (processes 8 samples per operation)
k_cAlignment	32 bytes (AVX2 alignment requirement)
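
The constants in this table are tied together by simple arithmetic: 8 lanes of 4 bytes each fill exactly one 32-byte aligned block. The sketch below works through that relationship. The constant values are taken from the signature above, while the helper functions (CountFullPacks, RoundUpToPack, IsAligned, PackStart) are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t k_cAlignment = 32;
constexpr int k_cSIMDShift = 3;
constexpr std::size_t k_cSIMDPack = std::size_t{1} << k_cSIMDShift; // 8

// Eight 4-byte floats occupy exactly one 32-byte (256-bit) block.
static_assert(k_cSIMDPack * sizeof(float) == k_cAlignment, "pack size must equal alignment");

// Hypothetical: number of complete packs contained in cSamples samples.
constexpr std::size_t CountFullPacks(const std::size_t cSamples) noexcept {
   return cSamples >> k_cSIMDShift;
}

// Hypothetical: cSamples rounded up to the next multiple of the pack size.
constexpr std::size_t RoundUpToPack(const std::size_t cSamples) noexcept {
   return (cSamples + k_cSIMDPack - 1) & ~(k_cSIMDPack - 1);
}

// Hypothetical: true when p sits on a k_cAlignment-byte boundary.
inline bool IsAligned(const void* const p) noexcept {
   return reinterpret_cast<std::uintptr_t>(p) % k_cAlignment == 0;
}

// Hypothetical: start of pack iPack inside an aligned float buffer. Since a
// pack is exactly k_cAlignment bytes, every pack start is aligned as well.
inline const float* PackStart(const float* const aligned, const std::size_t iPack) noexcept {
   assert(IsAligned(aligned));
   return aligned + (iPack << k_cSIMDShift);
}
```

Under these definitions, 13 samples contain one full pack and would be padded to 16, and each pack in a buffer declared with alignas(32) begins on a 32-byte boundary.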

Usage Examples

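# The AVX2 code is called internally via the native libebm bindings
from sklearn.datasets import load_breast_cancer  # example data only (assumption)
from interpret.glassbox import ExplainableBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)
ebm = ExplainableBoostingClassifier()
ebm.fit(X, y)  # Automatically uses AVX2 SIMD on supported CPUs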

Related Pages

Environment:Interpretml_Interpret_Native_Libebm_Environment
