Implementation:Interpretml Interpret Gradient Histogram Bin
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, EBM_Core |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
Defines the templated Bin data structure used to accumulate gradient and hessian statistics during the EBM boosting and interaction detection processes.
Description
The Bin.hpp header provides the core histogram bin data structures for the EBM native library. It defines a hierarchy of bin types: BinBase is the non-templated base, BinData holds the actual data storage (with template specializations for different combinations of sample count, weight, and hessian tracking), and Bin is the complete templated type used throughout the codebase. The bins are POD (Plain Old Data) types: they are allocated and released only with malloc/free, and they use the struct hack pattern (via ArrayToPointer) to store the gradient pairs contiguously with the rest of the bin. Each bin accumulates the sample count, weight, and gradient/hessian pairs that tree-based split finding needs.
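The flag-driven layout described above can be illustrated with a minimal sketch. All names below (SketchGradientPair, SketchBinBase, SketchBinData, FullBin, LeanBin) are hypothetical stand-ins rather than the library's definitions, and only two of the possible count/weight combinations are shown. The sketch assumes that fields are simply omitted when their flag is false and that the gradient pair array is kept as the last member.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a gradient pair; the hessian is stored only when requested.
template<typename TFloat, bool bHessian> struct SketchGradientPair;

template<typename TFloat> struct SketchGradientPair<TFloat, true> {
   TFloat m_sumGradients;
   TFloat m_sumHessians;
};

template<typename TFloat> struct SketchGradientPair<TFloat, false> {
   TFloat m_sumGradients;
};

// Non-templated base, so code can hold a bin pointer without knowing the flags.
struct SketchBinBase {};

// The primary template is only declared; each flag combination gets its own layout.
template<typename TFloat, typename TUInt, bool bCount, bool bWeight, bool bHessian, std::size_t cCompilerScores>
struct SketchBinData;

// Count and weight are both tracked.
template<typename TFloat, typename TUInt, bool bHessian, std::size_t cCompilerScores>
struct SketchBinData<TFloat, TUInt, true, true, bHessian, cCompilerScores> : SketchBinBase {
   TUInt m_cSamples;
   TFloat m_weight;
   // Kept last so the array can extend past the struct (struct hack).
   SketchGradientPair<TFloat, bHessian> m_aGradientPairs[cCompilerScores];
};

// Neither count nor weight is tracked: only the gradient pairs are stored.
template<typename TFloat, typename TUInt, bool bHessian, std::size_t cCompilerScores>
struct SketchBinData<TFloat, TUInt, false, false, bHessian, cCompilerScores> : SketchBinBase {
   SketchGradientPair<TFloat, bHessian> m_aGradientPairs[cCompilerScores];
};

using FullBin = SketchBinData<double, std::uint64_t, true, true, true, 1>;
using LeanBin = SketchBinData<double, std::uint64_t, false, false, false, 1>;

// Trivial and standard-layout types can be created with malloc and cleared with memset.
static_assert(std::is_trivial<FullBin>::value, "bin must stay trivial");
static_assert(std::is_standard_layout<FullBin>::value, "bin must stay standard layout");
static_assert(sizeof(LeanBin) < sizeof(FullBin), "dropping fields shrinks the bin");
```

Selecting the layout at compile time means a histogram that does not need counts or weights pays no memory or bandwidth cost for them, which matters because the bins are touched once per sample in the inner boosting loop.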
Usage
Used internally during boosting and interaction detection to accumulate per-bin gradient and hessian statistics. BinSumsBoosting and BinSumsInteraction populate these bins, and the partition algorithms (e.g., PartitionMultiDimensionalFull) read from them to compute gain and find optimal splits.
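The populate-then-read flow can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions, not the code of BinSumsBoosting or the partition algorithms: SketchBin, SketchBinSums, SketchPartialGain, and SketchBestSplit are hypothetical names, gradients and hessians are assumed to be multiplied by the sample weight, and the gain is assumed to be the common sum-of-gradients-squared over sum-of-hessians form.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified bin: one score, with count, weight, and hessian tracking.
struct SketchBin {
   std::size_t cSamples = 0;
   double weight = 0.0;
   double sumGradients = 0.0;
   double sumHessians = 0.0;
};

// Sum each sample's statistics into the bin selected by its bin index.
inline std::vector<SketchBin> SketchBinSums(std::size_t cBins,
      const std::vector<std::size_t>& aiBin,
      const std::vector<double>& aGradients,
      const std::vector<double>& aHessians,
      const std::vector<double>& aWeights) {
   assert(aiBin.size() == aGradients.size());
   assert(aiBin.size() == aHessians.size());
   assert(aiBin.size() == aWeights.size());
   std::vector<SketchBin> bins(cBins);
   for(std::size_t i = 0; i < aiBin.size(); ++i) {
      assert(aiBin[i] < cBins);
      SketchBin& bin = bins[aiBin[i]];
      ++bin.cSamples;
      bin.weight += aWeights[i];
      bin.sumGradients += aWeights[i] * aGradients[i]; // assumption: weighted sums
      bin.sumHessians += aWeights[i] * aHessians[i];
   }
   return bins;
}

// Gain contribution G^2 / H of one side; zero when the side has no hessian mass.
inline double SketchPartialGain(double sumGradients, double sumHessians) {
   return sumHessians > 0.0 ? sumGradients * sumGradients / sumHessians : 0.0;
}

// Try every cut between adjacent bins and return the index of the last bin
// on the left side of the best cut. The parent's own G^2 / H term is the same
// for every cut, so it is left out of the comparison.
inline std::size_t SketchBestSplit(const std::vector<SketchBin>& bins) {
   assert(bins.size() >= 2);
   double totalGradients = 0.0;
   double totalHessians = 0.0;
   for(const SketchBin& bin : bins) {
      totalGradients += bin.sumGradients;
      totalHessians += bin.sumHessians;
   }
   double bestGain = -1.0; // gains are never negative, so the first cut always wins initially
   std::size_t iBest = 0;
   double leftGradients = 0.0;
   double leftHessians = 0.0;
   for(std::size_t i = 0; i + 1 < bins.size(); ++i) {
      leftGradients += bins[i].sumGradients;
      leftHessians += bins[i].sumHessians;
      const double gain = SketchPartialGain(leftGradients, leftHessians) +
            SketchPartialGain(totalGradients - leftGradients, totalHessians - leftHessians);
      if(gain > bestGain) {
         bestGain = gain;
         iBest = i;
      }
   }
   return iBest;
}
```

Because the split search reads only the bin sums, its cost depends on the number of bins rather than the number of samples; the per-sample work is confined to the accumulation pass.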
Code Reference
Source Location
- Repository: Interpretml_Interpret
- File: shared/libebm/bridge/Bin.hpp
Signature
template<typename TFloat, typename TUInt, bool bCount, bool bWeight, bool bHessian, size_t cCompilerScores = 1>
struct Bin;

struct BinBase {
   template<typename TFloat, typename TUInt, bool bCount, bool bWeight, bool bHessian, size_t cCompilerScores = 1>
   GPU_BOTH inline Bin<TFloat, TUInt, bCount, bWeight, bHessian, cCompilerScores>* Specialize();
   GPU_BOTH inline void ZeroMem(const size_t cBytesPerBin, const size_t cBins = 1, const size_t iBin = 0);
};

// Primary template; the specializations below select the stored fields
template<typename TFloat, typename TUInt, bool bCount, bool bWeight, bool bHessian, size_t cCompilerScores>
struct BinData;

// Specialization that tracks both the sample count and the weight
template<typename TFloat, typename TUInt, bool bHessian, size_t cCompilerScores>
struct BinData<TFloat, TUInt, true, true, bHessian, cCompilerScores> : BinBase {
   GPU_BOTH inline TUInt GetCountSamples() const;
   GPU_BOTH inline void SetCountSamples(const TUInt cSamples);
   GPU_BOTH inline TFloat GetWeight() const;
   GPU_BOTH inline void SetWeight(const TFloat weight);
};

template<typename TFloat, typename TUInt>
static bool IsOverflowBinSize(const bool bCount, const bool bWeight, const bool bHessian, const size_t cScores);

template<typename TFloat, typename TUInt>
GPU_BOTH inline constexpr static size_t GetBinSize(
      const bool bCount, const bool bWeight, const bool bHessian, const size_t cScores);
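Because the number of scores can be chosen at run time, the byte size of a bin is computed rather than taken from sizeof, and bins are addressed by byte stride. The following sketch shows one plausible way to do this; it is not the library's implementation. The Sketch-prefixed names are hypothetical, the TFloat/TUInt template parameters are fixed to double and a 64-bit unsigned integer for brevity, and all tracking flags are assumed to be on. Note that reading gradient pairs beyond index 0 through the one-element array relies on compiler behavior that the C++ standard does not guarantee, which is the usual caveat of the struct hack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <limits>

struct SketchPair {
   double sumGradients;
   double sumHessians;
};

// Hypothetical simplified bin; the one-element trailing array is the struct hack.
struct SketchBin {
   std::uint64_t cSamples;
   double weight;
   SketchPair aGradientPairs[1]; // must stay the last member
};

// Bytes needed for one bin that holds cScores gradient pairs.
constexpr std::size_t SketchGetBinSize(std::size_t cScores) {
   return offsetof(SketchBin, aGradientPairs) + sizeof(SketchPair) * cScores;
}

// True when SketchGetBinSize(cScores) would wrap around size_t.
inline bool SketchIsOverflowBinSize(std::size_t cScores) {
   constexpr std::size_t cMax = std::numeric_limits<std::size_t>::max();
   return (cMax - offsetof(SketchBin, aGradientPairs)) / sizeof(SketchPair) < cScores;
}

// Bins have a run-time size, so bin iBin is reached by byte arithmetic, not aBins[iBin].
inline SketchBin* SketchIndexBin(void* aBins, std::size_t cBytesPerBin, std::size_t iBin) {
   return reinterpret_cast<SketchBin*>(static_cast<unsigned char*>(aBins) + cBytesPerBin * iBin);
}

// Allocate cBins zeroed bins with malloc. Returns nullptr on overflow or allocation failure.
// The caller releases the memory with free.
inline void* SketchAllocateBins(std::size_t cBins, std::size_t cScores) {
   if(SketchIsOverflowBinSize(cScores)) {
      return nullptr;
   }
   const std::size_t cBytesPerBin = SketchGetBinSize(cScores);
   if(std::numeric_limits<std::size_t>::max() / cBytesPerBin < cBins) {
      return nullptr; // cBytesPerBin * cBins would overflow
   }
   void* const aBins = std::malloc(cBytesPerBin * cBins);
   if(nullptr != aBins) {
      // All-zero bytes represent 0 for the integer count and 0.0 for IEEE doubles.
      std::memset(aBins, 0, cBytesPerBin * cBins);
   }
   return aBins;
}
```

Checking for overflow before computing the size keeps a very large score count from producing a small wrapped-around allocation that later writes would overrun.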
I/O Contract
| Template Parameter | Description |
|---|---|
| TFloat | Floating-point type (float, double, or a SIMD pack) |
| TUInt | Unsigned integer type for sample counts |
| bCount | Whether to track sample counts per bin |
| bWeight | Whether to track weight per bin |
| bHessian | Whether to track hessians in addition to gradients |
| cCompilerScores | Compile-time number of scores (defaults to 1; 1 for binary, k for multiclass) |

| Output | Description |
|---|---|
| GradientPairs | Accumulated gradient (and optionally hessian) per score |
| CountSamples | Number of samples in bin (when bCount is true) |
| Weight | Accumulated weight (when bWeight is true) |
Usage Examples
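# Bins are not exposed in Python; the native library fills them while fit() runs
from sklearn.datasets import make_classification
from interpret.glassbox import ExplainableBoostingClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)  # synthetic data for illustration
ebm = ExplainableBoostingClassifier()
ebm.fit(X, y)  # Internally bins are used during boosting rounds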