Implementation:Interpretml Interpret PartitionMultiDimensionalTree
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, EBM_Core |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
PartitionMultiDimensionalTree is a C++ module that builds a multi-dimensional decision tree over interaction terms during EBM boosting, enabling recursive tree-based splits across multiple dimensions.
Description
This module implements a tree-based partitioning algorithm for multi-dimensional interaction terms. Unlike the corner-based or straight-based algorithms, this approach builds an actual decision tree over the multi-dimensional bin space using TreeNodeMulti nodes, allowing splits on any dimension at each tree node.
Key components include:
MakeTensor: Traverses the completed tree to construct the boosting update tensor. It walks the tree nodes, collects split points per dimension, and builds the full tensor with predictions for each leaf region. It supports optional purification viaPurifyInternalto decompose the multi-dimensional tensor update into interpretable components. When purification is enabled, it also outputs tensor weights, gradients, and hessians for the purification algorithm.
PartitionMultiDimensionalTreeInternal::Func: The main tree-building function that performs a best-first (priority-queue-based) greedy tree construction:- Starts with a root node containing all bins
- Uses
TensorTotalsSumto evaluate potential splits along each dimension - Pushes promising splits into a priority queue ordered by gain
- Pops the best split and creates child nodes
- Continues until the maximum number of leaves is reached or no beneficial splits remain
- Respects regularization parameters (alpha, lambda), minimum leaf size, minimum hessian, and maximum step size constraints
The tree nodes use TreeNodeMulti which stores both the split dimension index and split position, allowing non-uniform splits across dimensions.
Usage
This module is called during boosting when processing multi-dimensional interaction terms with the tree-based splitting strategy. It is the most flexible multi-dimensional partitioner, suitable for capturing complex interaction patterns.
Code Reference
Source Location
- Repository: Interpretml_Interpret
- File:
shared/libebm/PartitionMultiDimensionalTree.cpp
Signature
template<bool bHessian, size_t cCompilerScores, size_t cCompilerDimensions>
static ErrorEbm MakeTensor(
const size_t cRuntimeScores,
const size_t cRuntimeRealDimensions,
const TermBoostFlags flags,
const Bin<FloatMain, UIntMain, true, true, bHessian, GetArrayScores(cCompilerScores)>* const aBins,
const FloatCalc regAlpha,
const FloatCalc regLambda,
const FloatCalc deltaStepMax,
double* const aTensorWeights,
double* const aTensorGrad,
double* const aTensorHess,
const size_t cPossibleSplits,
unsigned char** const aaSplits,
TreeNodeMulti<bHessian, GetArrayScores(cCompilerScores)>* const pRootTreeNode,
const size_t* const aiOriginalIndex,
TensorSumDimension* const aDimensions,
Bin<...>* const aAuxiliaryBins,
Tensor* const pInnerTermUpdate
...
);
template<bool bHessian, size_t cCompilerScores>
class PartitionMultiDimensionalTreeInternal final {
public:
static ErrorEbm Func(
RandomDeterministic* const pRng,
const size_t cRuntimeScores,
const size_t cRealDimensions,
const TermBoostFlags flags,
const IntEbm* const aLeavesMax,
const size_t cSamplesLeafMin,
const FloatCalc hessianMin,
const FloatCalc regAlpha,
const FloatCalc regLambda,
const FloatCalc deltaStepMax,
const BinBase* const aBinsBase,
BinBase* const aAuxiliaryBinsBase,
Tensor* const pInnerTermUpdate,
const size_t* const acBins,
double* const aTensorWeights,
double* const aTensorGrad,
double* const aTensorHess,
double* const pTotalGain
...
);
};
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pRng | RandomDeterministic* | No | Random number generator for purification |
| cRuntimeScores | size_t | Yes | Number of score outputs |
| cRealDimensions | size_t | Yes | Number of dimensions with more than 1 bin |
| flags | TermBoostFlags | Yes | Boosting flags (DisableNewtonGain, DisableNewtonUpdate) |
| aLeavesMax | const IntEbm* | Yes | Maximum number of leaves per dimension |
| cSamplesLeafMin | size_t | Yes | Minimum samples per leaf |
| hessianMin | FloatCalc | Yes | Minimum hessian sum per leaf |
| regAlpha | FloatCalc | Yes | L1 regularization parameter |
| regLambda | FloatCalc | Yes | L2 regularization parameter |
| deltaStepMax | FloatCalc | Yes | Maximum update step size |
| aBinsBase | const BinBase* | Yes | Pre-computed histogram bins |
| acBins | const size_t* | Yes | Bin counts per dimension |
Outputs
| Name | Type | Description |
|---|---|---|
| return value | ErrorEbm | Error code (Error_None on success) |
| pInnerTermUpdate | Tensor* | Updated tensor containing tree-based split scores |
| aTensorWeights | double* | Tensor weights (for purification) |
| aTensorGrad | double* | Tensor gradients (for purification) |
| aTensorHess | double* | Tensor hessians (for purification) |
| pTotalGain | double* | Total gain achieved by the tree splits |
Usage Examples
Pipeline Context
# This C++ module is called internally via the native bindings
# during boosting of multi-dimensional interaction terms
from interpret.glassbox import ExplainableBoostingClassifier
ebm = ExplainableBoostingClassifier(interactions=10)
ebm.fit(X, y) # Internally calls PartitionMultiDimensionalTree for pair boosting