Implementation:Interpretml Interpret DataSetBoosting Hpp

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, EBM_Core
Last Updated 2026-02-07 12:00 GMT

Overview

Defines the DataSubsetBoosting and DataSetBoosting structures that hold training and validation data subsets for the EBM boosting process.

Description

The DataSetBoosting.hpp header declares two key data structures.

DataSubsetBoosting holds a single SIMD-aligned data subset: its sample count, a pointer to the objective wrapper, gradient/hessian arrays, sample scores, target data, per-term packed feature data, and inner bag weights. Its ObjectiveApplyUpdate and BinSumsBoosting methods apply objective updates and compute bin sums by delegating to the objective wrapper's function pointers.

DataSetBoosting is the top-level container. It holds multiple DataSubsetBoosting instances (one per SIMD compute zone) together with the total sample count, subset count, total weight, inner bag data, and term inner bag data.

Both structures maintain POD (plain old data) status and use only malloc/free for memory management.
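The POD-plus-malloc/free design described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: SubsetSketch, its three members, and the Allocate helper are hypothetical and do not reproduce the real DataSubsetBoosting layout. It only shows why a "safe init" method paired with an explicit "destruct" method is a natural fit for a type that has no constructors or destructors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>

// Hypothetical stand-in for a POD data subset (NOT the real layout).
struct SubsetSketch {
   size_t m_cSamples;
   double* m_aGradHess;     // assumed layout: interleaved gradient/hessian pairs
   double* m_aSampleScores; // one score per sample

   // Put the struct into a state that is always safe to pass to Destruct().
   void SafeInit() {
      m_cSamples = 0;
      m_aGradHess = nullptr;
      m_aSampleScores = nullptr;
   }

   // Allocate the buffers with malloc/calloc. Returns false on failure;
   // the caller is expected to call Destruct() in either case.
   // (A production version would also guard against size overflow.)
   bool Allocate(const size_t cSamples) {
      assert(nullptr == m_aGradHess && nullptr == m_aSampleScores);
      m_aGradHess = static_cast<double*>(std::malloc(sizeof(double) * 2 * cSamples));
      m_aSampleScores = static_cast<double*>(std::calloc(cSamples, sizeof(double)));
      if(nullptr == m_aGradHess || nullptr == m_aSampleScores) {
         return false;
      }
      m_cSamples = cSamples;
      return true;
   }

   // free(nullptr) is a no-op, so this is safe after SafeInit() or a failed Allocate().
   void Destruct() {
      std::free(m_aSampleScores);
      std::free(m_aGradHess);
      SafeInit();
   }
};

// Member functions do not break POD status; constructors or virtuals would.
static_assert(std::is_standard_layout<SubsetSketch>::value, "must be standard layout");
static_assert(std::is_trivial<SubsetSketch>::value, "must be trivial");
```

Because the type is trivial, an array of such structs can itself be obtained with malloc and then initialized element by element with SafeInit(), which is one plausible reason for keeping this kind of structure POD.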

Usage

Instances are created during boosting initialization to hold the training and validation datasets. The BoosterCore holds one DataSetBoosting for training and one for validation. During each boosting round, the data subsets are iterated over to compute gradient histograms and apply score updates.
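The per-round iteration over subsets can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the library's code: BinSketch, SubsetViewSketch, AccumulateBinSums, and ApplyScoreUpdate are hypothetical names, bin indices are stored unpacked rather than bit-packed, only a single term is handled, and the recomputation of gradients after a score update (which a real objective would perform) is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified types for illustration only.
struct BinSketch {
   double sumGradients;
   double sumHessians;
};

struct SubsetViewSketch {
   size_t cSamples;
   const size_t* aBinIndex; // bin of each sample for ONE term (unpacked for clarity)
   const double* aGradHess; // assumed interleaved: grad0, hess0, grad1, hess1, ...
   double* aSampleScores;   // current model score of each sample
};

// Step 1: visit every subset and add each sample's gradient/hessian to its bin,
// producing one histogram shared across all subsets.
void AccumulateBinSums(const SubsetViewSketch* aSubsets, size_t cSubsets,
      std::vector<BinSketch>& bins) {
   for(size_t iSubset = 0; iSubset < cSubsets; ++iSubset) {
      const SubsetViewSketch& subset = aSubsets[iSubset];
      for(size_t i = 0; i < subset.cSamples; ++i) {
         const size_t iBin = subset.aBinIndex[i];
         assert(iBin < bins.size());
         bins[iBin].sumGradients += subset.aGradHess[2 * i];
         bins[iBin].sumHessians += subset.aGradHess[2 * i + 1];
      }
   }
}

// Step 2: visit every subset again and add the per-bin update to each sample's score.
void ApplyScoreUpdate(const SubsetViewSketch* aSubsets, size_t cSubsets,
      const std::vector<double>& updatePerBin) {
   for(size_t iSubset = 0; iSubset < cSubsets; ++iSubset) {
      const SubsetViewSketch& subset = aSubsets[iSubset];
      for(size_t i = 0; i < subset.cSamples; ++i) {
         const size_t iBin = subset.aBinIndex[i];
         assert(iBin < updatePerBin.size());
         subset.aSampleScores[i] += updatePerBin[iBin];
      }
   }
}
```

Splitting the data into subsets while keeping a single shared histogram means each subset can be processed by whatever kernel suits its compute zone, while the rest of the boosting step sees one combined result.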

Code Reference

Source Location

Signature

struct DataSubsetBoosting final {
   inline void SafeInitDataSubsetBoosting();
   void DestructDataSubsetBoosting(const size_t cTerms, const size_t cInnerBags);
   inline size_t GetCountSamples() const;
   inline const ObjectiveWrapper* GetObjectiveWrapper() const;
   inline ErrorEbm ObjectiveApplyUpdate(ApplyUpdateBridge* const pData);
   inline ErrorEbm BinSumsBoosting(BinSumsBoostingBridge* const pParams);
   inline void* GetGradHess();
   inline void* GetSampleScores();
   inline const void* GetTargetData() const;
   inline const void* GetTermData(const size_t iTerm) const;
   inline const SubsetInnerBag* GetSubsetInnerBag(const size_t iBag) const;
};

struct DataSetBoosting final {
   inline void SafeInitDataSetBoosting();
   void DestructDataSetBoosting(const size_t cTerms, const size_t cInnerBags);
   inline size_t GetCountSamples() const;
   inline size_t GetCountSubsets() const;
   inline DataSubsetBoosting* GetSubsets();
   inline double GetWeightTotal() const;
};
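
The delegation performed by ObjectiveApplyUpdate and BinSumsBoosting can be pictured with the sketch below. It is a guess at the general pattern, not the actual implementation: ErrorSketch, ApplyUpdateBridgeSketch, ObjectiveWrapperSketch, its m_pApplyUpdate member, ApplyUpdateScalar, and SubsetDelegationSketch are all hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>

// All names below are hypothetical stand-ins, not the library's real types.
typedef int ErrorSketch;

struct ApplyUpdateBridgeSketch {
   size_t m_cSamples;
   double* m_aSampleScores;
   double m_update; // a single constant update, to keep the example tiny
};

struct ObjectiveWrapperSketch {
   // Function pointer selected when the wrapper is set up, e.g. per compute zone.
   ErrorSketch (*m_pApplyUpdate)(const ObjectiveWrapperSketch*, ApplyUpdateBridgeSketch*);
};

// One possible zone-specific implementation: add the update to every score.
static ErrorSketch ApplyUpdateScalar(const ObjectiveWrapperSketch*,
      ApplyUpdateBridgeSketch* pData) {
   for(size_t i = 0; i < pData->m_cSamples; ++i) {
      pData->m_aSampleScores[i] += pData->m_update;
   }
   return 0; // 0 stands for "no error" in this sketch
}

struct SubsetDelegationSketch {
   const ObjectiveWrapperSketch* m_pObjective;

   // The subset does no math itself; it forwards to the wrapper's function pointer.
   ErrorSketch ObjectiveApplyUpdate(ApplyUpdateBridgeSketch* const pData) {
      assert(nullptr != m_pObjective);
      assert(nullptr != m_pObjective->m_pApplyUpdate);
      return (*m_pObjective->m_pApplyUpdate)(m_pObjective, pData);
   }
};
```

Routing calls through a plain function pointer instead of a virtual method keeps the structures POD while still letting each subset use a different compiled kernel.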

I/O Contract

Component Description
DataSubsetBoosting.m_cSamples Number of samples in this SIMD-aligned subset
DataSubsetBoosting.m_pObjective Pointer to the ObjectiveWrapper for this subset's compute zone
DataSubsetBoosting.m_aGradHess Per-sample gradient/hessian storage
DataSubsetBoosting.m_aSampleScores Per-sample current model scores
DataSetBoosting.m_cSubsets Number of SIMD subsets (one per compute zone)

Usage Examples

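# DataSetBoosting is not called directly; it is used internally via the native bindings.
from sklearn.datasets import load_breast_cancer
from interpret.glassbox import ExplainableBoostingClassifier

# Example data (an illustrative choice; any feature matrix X and labels y will do)
X, y = load_breast_cancer(return_X_y=True)
ebm = ExplainableBoostingClassifier()
ebm.fit(X, y)  # DataSetBoosting holds training/validation data internally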
