Implementation:Interpretml Interpret DataSetBoosting Hpp
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, EBM_Core |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
Defines the DataSubsetBoosting and DataSetBoosting structures that hold training and validation data subsets for the EBM boosting process.
Description
The DataSetBoosting.hpp header declares two data structures. DataSubsetBoosting holds a single SIMD-aligned data subset: its sample count, a pointer to the objective wrapper, the gradient/hessian array, sample scores, target data, per-term packed feature data, and inner bag weights. Its ObjectiveApplyUpdate and BinSumsBoosting methods apply objective updates and compute bin sums by delegating to function pointers stored in the objective wrapper. DataSetBoosting is the top-level container: it holds the DataSubsetBoosting instances (one per SIMD compute zone) together with the total sample count, subset count, total weight, inner bag data, and term inner bag data. Both structures are kept as plain-old-data (POD) types and use only malloc/free for memory management.
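The POD-plus-malloc/free pattern described above can be illustrated with a minimal sketch. It is not the library's code: `PodSubsetSketch`, `SafeInit`, `Allocate`, and `Destruct` are hypothetical names chosen to mirror the SafeInit/Destruct pairing in the signature, and the two-values-per-sample gradient/hessian layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <type_traits>

// Hypothetical stand-in for a POD data subset; NOT the real DataSubsetBoosting layout.
struct PodSubsetSketch {
   std::size_t m_cSamples;
   double* m_aGradHess;      // assumed layout: 2 values per sample (gradient, hessian)
   double* m_aSampleScores;  // 1 score per sample

   // Puts the struct into a state where Destruct() is always safe to call.
   void SafeInit() {
      m_cSamples = 0;
      m_aGradHess = nullptr;
      m_aSampleScores = nullptr;
   }

   // Returns false on failure; the caller is then expected to call Destruct().
   bool Allocate(const std::size_t cSamples) {
      assert(nullptr == m_aGradHess && nullptr == m_aSampleScores);
      // Reject empty requests and sizes whose byte count would overflow.
      if(0 == cSamples || SIZE_MAX / (2 * sizeof(double)) < cSamples) {
         return false;
      }
      m_aGradHess = static_cast<double*>(std::malloc(2 * sizeof(double) * cSamples));
      m_aSampleScores = static_cast<double*>(std::malloc(sizeof(double) * cSamples));
      if(nullptr == m_aGradHess || nullptr == m_aSampleScores) {
         return false;
      }
      m_cSamples = cSamples;
      return true;
   }

   // free(nullptr) is a no-op, so this is safe after SafeInit() or a failed Allocate().
   void Destruct() {
      std::free(m_aGradHess);
      std::free(m_aSampleScores);
      SafeInit();
   }
};

// No constructors or destructors are declared, so the type stays trivial and standard-layout.
static_assert(std::is_trivial<PodSubsetSketch>::value, "sketch must stay trivial");
static_assert(std::is_standard_layout<PodSubsetSketch>::value, "sketch must stay standard-layout");
```

Because the type has no constructor, a caller would invoke `SafeInit()` immediately after obtaining the storage, and `Destruct()` on every exit path.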
Usage
Instances are created during boosting initialization to hold the training and validation datasets. The BoosterCore holds one DataSetBoosting for training and one for validation. During each boosting round, the code iterates over the data subsets to compute gradient histograms and apply score updates.
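The per-round iteration over subsets can be sketched roughly as follows. This is an illustration under assumptions, not the library's implementation: `IterSubsetSketch`, `IterDataSetSketch`, and `SumGradientsPerBin` are hypothetical, and the real code works on packed feature data through the objective wrapper rather than on plain index arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified subset: one gradient and one bin index per sample.
struct IterSubsetSketch {
   std::size_t m_cSamples;
   const double* m_aGradients;
   const std::size_t* m_aBinIndexes;
};

// Hypothetical container mirroring GetCountSubsets()/GetSubsets() from the signature.
struct IterDataSetSketch {
   std::size_t m_cSubsets;
   IterSubsetSketch* m_aSubsets;

   std::size_t GetCountSubsets() const { return m_cSubsets; }
   IterSubsetSketch* GetSubsets() { return m_aSubsets; }
};

// Walks every subset and accumulates a gradient histogram over cBins bins.
inline std::vector<double> SumGradientsPerBin(IterDataSetSketch& dataSet, const std::size_t cBins) {
   std::vector<double> binSums(cBins, 0.0);
   IterSubsetSketch* pSubset = dataSet.GetSubsets();
   const IterSubsetSketch* const pSubsetsEnd = pSubset + dataSet.GetCountSubsets();
   for(; pSubsetsEnd != pSubset; ++pSubset) {
      for(std::size_t i = 0; i < pSubset->m_cSamples; ++i) {
         const std::size_t iBin = pSubset->m_aBinIndexes[i];
         assert(iBin < cBins);
         binSums[iBin] += pSubset->m_aGradients[i];
      }
   }
   return binSums;
}
```

The pointer-range loop reflects the container exposing its subsets as a contiguous array plus a count; each subset contributes to the same shared histogram.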
Code Reference
Source Location
- Repository: Interpretml_Interpret
- File: shared/libebm/DataSetBoosting.hpp
Signature
struct DataSubsetBoosting final {
inline void SafeInitDataSubsetBoosting();
void DestructDataSubsetBoosting(const size_t cTerms, const size_t cInnerBags);
inline size_t GetCountSamples() const;
inline const ObjectiveWrapper* GetObjectiveWrapper() const;
inline ErrorEbm ObjectiveApplyUpdate(ApplyUpdateBridge* const pData);
inline ErrorEbm BinSumsBoosting(BinSumsBoostingBridge* const pParams);
inline void* GetGradHess();
inline void* GetSampleScores();
inline const void* GetTargetData() const;
inline const void* GetTermData(const size_t iTerm) const;
inline const SubsetInnerBag* GetSubsetInnerBag(const size_t iBag) const;
};
struct DataSetBoosting final {
inline void SafeInitDataSetBoosting();
void DestructDataSetBoosting(const size_t cTerms, const size_t cInnerBags);
inline size_t GetCountSamples() const;
inline size_t GetCountSubsets() const;
inline DataSubsetBoosting* GetSubsets();
inline double GetWeightTotal() const;
};
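To make the delegation in ObjectiveApplyUpdate concrete, here is a minimal sketch of a subset forwarding a call through a function pointer stored in its objective wrapper. Every type and field below (`ObjectiveWrapperSketch`, `ApplyUpdateBridgeSketch`, `m_pApplyUpdate`, the integer error code) is a hypothetical simplification; the real ObjectiveWrapper, ApplyUpdateBridge, and ErrorEbm definitions live elsewhere in libebm and are not shown in this page.

```cpp
#include <cassert>
#include <cstddef>

typedef int ErrorSketch;  // assumption: 0 means success

struct ObjectiveWrapperSketch;  // forward declaration for the function pointer type

// Hypothetical bridge carrying the arguments of one update call.
struct ApplyUpdateBridgeSketch {
   std::size_t m_cSamples;
   double m_updateValue;      // simplified: a single additive update for all samples
   double* m_aSampleScores;
};

// Hypothetical wrapper: holds the function pointer selected for a compute zone.
struct ObjectiveWrapperSketch {
   ErrorSketch (*m_pApplyUpdate)(const ObjectiveWrapperSketch*, ApplyUpdateBridgeSketch*);
};

// One possible zone-specific implementation (a plain scalar loop).
inline ErrorSketch ApplyUpdateScalarSketch(const ObjectiveWrapperSketch*, ApplyUpdateBridgeSketch* pData) {
   for(std::size_t i = 0; i < pData->m_cSamples; ++i) {
      pData->m_aSampleScores[i] += pData->m_updateValue;
   }
   return 0;
}

struct DelegatingSubsetSketch {
   std::size_t m_cSamples;
   const ObjectiveWrapperSketch* m_pObjective;
   double* m_aSampleScores;

   // Fills in the subset-specific fields, then forwards to the wrapper's function pointer.
   ErrorSketch ObjectiveApplyUpdate(ApplyUpdateBridgeSketch* const pData) {
      assert(nullptr != m_pObjective);
      assert(nullptr != m_pObjective->m_pApplyUpdate);
      pData->m_cSamples = m_cSamples;
      pData->m_aSampleScores = m_aSampleScores;
      return (*m_pObjective->m_pApplyUpdate)(m_pObjective, pData);
   }
};
```

Keeping the function pointer in the wrapper lets each subset run the implementation chosen for its compute zone while the calling code stays identical.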
I/O Contract
| Component | Description |
|---|---|
| DataSubsetBoosting.m_cSamples | Number of samples in this SIMD-aligned subset |
| DataSubsetBoosting.m_pObjective | Pointer to the ObjectiveWrapper for this subset's compute zone |
| DataSubsetBoosting.m_aGradHess | Per-sample gradient/hessian storage |
| DataSubsetBoosting.m_aSampleScores | Per-sample current model scores |
| DataSetBoosting.m_cSubsets | Number of SIMD subsets (1 per compute zone) |
Usage Examples
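# DataSetBoosting is not exposed to Python; it is used internally via native bindings
from sklearn.datasets import load_breast_cancer
from interpret.glassbox import ExplainableBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)  # any tabular dataset works here
ebm = ExplainableBoostingClassifier()
ebm.fit(X, y)  # DataSetBoosting holds training/validation data internally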