Implementation:Interpretml Interpret DataSetInteraction

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, EBM_Core
Last Updated: 2026-02-07 12:00 GMT

Overview

DataSetInteraction is a C++ module that manages dataset initialization and memory layout for the interaction detection phase of Explainable Boosting Machines (EBMs).

Description

This module implements the DataSetInteraction and DataSubsetInteraction classes, which manage the data required during interaction detection. As in DataSetBoosting, the dataset is divided into subsets so that both the CPU and SIMD processing pipelines, which use different numeric precision levels, can be supported.
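The subset division can be pictured with a small Python sketch. This is an illustration only, not the library's algorithm: the function name plan_subsets, the simd_width parameter, and the rule "full SIMD packs go to the SIMD pipeline, the leftover tail goes to the CPU" are assumptions made for the example. Only the idea of a per-subset cap (cSubsetItemsMax) and a CPU/SIMD assignment based on subset size comes from this page.

```python
def plan_subsets(c_samples, c_subset_items_max, simd_width):
    """Split c_samples into (kind, size) chunks.

    Hypothetical rule: a chunk holding at least one full SIMD pack is
    rounded down to a multiple of simd_width and labeled "simd"; a chunk
    smaller than one pack is labeled "cpu".
    """
    if c_samples < 0 or c_subset_items_max <= 0 or simd_width <= 0:
        raise ValueError("sizes must be positive (c_samples may be zero)")
    subsets = []
    remaining = c_samples
    while remaining > 0:
        size = min(remaining, c_subset_items_max)  # respect the per-subset cap
        if size >= simd_width:
            size -= size % simd_width  # keep only whole SIMD packs
            subsets.append(("simd", size))
        else:
            subsets.append(("cpu", size))  # leftover tail
        remaining -= size
    return subsets


if __name__ == "__main__":
    # 10 samples, at most 8 per subset, 4 lanes per SIMD pack
    print(plan_subsets(10, 8, 4))
```

Every sample lands in exactly one subset, so the subset sizes always sum to the number of included samples.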

Key responsibilities include:

  • Gradient/Hessian allocation (InitGradHess): Allocates aligned memory for gradient and hessian arrays per data subset.
  • Feature data initialization (InitFeatureData): Bit-packs individual feature bin indices from the shared dataset format into SIMD-aligned integer arrays. Unlike boosting (which packs multi-dimensional term tensor indices), interaction detection packs each feature independently since interaction terms are evaluated dynamically.
  • Weight initialization (InitWeights): Extracts sample weights from the shared dataset, computes total weight across all subsets, and validates against overflow.
  • Top-level initialization (InitDataSetInteraction): Orchestrates subset creation, assigns CPU vs SIMD objective wrappers based on subset size, and calls all sub-initialization routines.
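The per-feature bit-packing described for InitFeatureData can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names pack_bins and unpack_bins, the word_bits parameter, and the low-bits-first ordering inside each word are hypothetical and may not match the layout the C++ code uses. The sketch shows only the general technique of storing several small bin indices in one integer word.

```python
def pack_bins(bin_indices, c_bins, word_bits=64):
    """Pack one feature's bin indices into fixed-width integer words.

    Each index uses just enough bits to represent c_bins distinct values.
    Assumed ordering: the first item occupies the lowest bits of the word.
    """
    if c_bins < 1:
        raise ValueError("c_bins must be at least 1")
    c_bits_per_item = max(1, (c_bins - 1).bit_length())
    c_items_per_word = word_bits // c_bits_per_item
    if c_items_per_word == 0:
        raise ValueError("a single bin index does not fit in one word")
    words = []
    for start in range(0, len(bin_indices), c_items_per_word):
        word = 0
        chunk = bin_indices[start:start + c_items_per_word]
        for offset, b in enumerate(chunk):
            if not 0 <= b < c_bins:
                raise ValueError(f"bin index {b} out of range")
            word |= b << (offset * c_bits_per_item)
        words.append(word)
    return words, c_bits_per_item


def unpack_bins(words, c_bits_per_item, c_samples, word_bits=64):
    """Inverse of pack_bins: recover the first c_samples bin indices."""
    c_items_per_word = word_bits // c_bits_per_item
    mask = (1 << c_bits_per_item) - 1
    return [
        (words[i // c_items_per_word] >> ((i % c_items_per_word) * c_bits_per_item)) & mask
        for i in range(c_samples)
    ]


if __name__ == "__main__":
    bins = [3, 0, 2, 1, 3, 3, 0, 1, 2]
    words, bits = pack_bins(bins, c_bins=4, word_bits=8)
    print(words, bits)
    print(unpack_bins(words, bits, len(bins), word_bits=8) == bins)
```

Because each feature is packed on its own, any pair of features can later be combined by unpacking both arrays at the same sample position, which fits the dynamic evaluation of interaction terms noted above.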

The interaction dataset includes only training samples (those with positive bag values), because interaction detection operates solely on training data.
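A rough Python sketch of how a bag array could select training samples and feed the total-weight computation is shown here. Several details are assumptions made for illustration: the function name select_training is hypothetical, a positive bag value is treated as a replication count, all non-positive values are simply skipped, and a missing weight array is treated as unit weights. The overflow check mirrors the validation mentioned for InitWeights only in spirit.

```python
import math


def select_training(bag, weights=None):
    """Return (indices, total_weight) for the training samples.

    Assumptions: bag[i] > 0 means sample i is a training sample repeated
    bag[i] times; any other value means the sample is left out.
    """
    indices = []
    for i, count in enumerate(bag):
        if count > 0:
            indices.extend([i] * count)  # replicate the sample index
    if weights is None:
        total = float(len(indices))  # unit weight per included sample
    else:
        total = sum(weights[i] for i in indices)
    if not math.isfinite(total):
        raise OverflowError("total weight is not finite")
    return indices, total


if __name__ == "__main__":
    bag = [1, -1, 2, 0]
    weights = [0.5, 9.0, 1.5, 4.0]
    print(select_training(bag, weights))
```

In this sketch the number of returned indices plays the role of cIncludedSamples, while the length of the bag array corresponds to cSharedSamples.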

Usage

This module is instantiated during InteractionCore::Create at the start of interaction detection. It provides the data layout that interaction scoring algorithms, such as PartitionMultiDimensionalStraight, use to evaluate feature interaction strengths.

Code Reference

Source Location

Signature

ErrorEbm DataSetInteraction::InitDataSetInteraction(
    const bool bAllocateHessians,
    const size_t cScores,
    const size_t cSubsetItemsMax,
    const ObjectiveWrapper* const pObjectiveCpu,
    const ObjectiveWrapper* const pObjectiveSIMD,
    const unsigned char* const pDataSetShared,
    const size_t cSharedSamples,
    const BagEbm* const aBag,
    const size_t cIncludedSamples,
    const size_t cWeights,
    const size_t cFeatures);

void DataSetInteraction::DestructDataSetInteraction(
    const size_t cFeatures);

I/O Contract

Inputs

Name               Type                     Required  Description
bAllocateHessians  bool                     Yes       Whether to allocate hessian arrays
cScores            size_t                   Yes       Number of score outputs
cSubsetItemsMax    size_t                   Yes       Maximum samples per data subset
pObjectiveCpu      const ObjectiveWrapper*  Yes       CPU objective function wrapper
pObjectiveSIMD     const ObjectiveWrapper*  Yes       SIMD objective function wrapper
pDataSetShared     const unsigned char*     Yes       Shared dataset binary blob
aBag               const BagEbm*            No        Bag replication array
cIncludedSamples   size_t                   Yes       Number of training samples to include
cFeatures          size_t                   Yes       Number of features in the dataset

Outputs

Name                        Type        Description
return value                ErrorEbm    Error code (Error_None on success)
DataSetInteraction members  (internal)  Initialized subsets with gradient, feature data, and weight arrays

Usage Examples

Pipeline Context

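# This C++ module is called internally via the native bindings
# during interaction detection
from sklearn.datasets import load_breast_cancer
from interpret.glassbox import ExplainableBoostingClassifier

# Example data only; any tabular classification dataset works here
X, y = load_breast_cancer(return_X_y=True)
ebm = ExplainableBoostingClassifier()
ebm.fit(X, y)  # Internally creates DataSetInteraction for interaction scoring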
