Workflow:Interpretml Interpret EBM Training And Prediction

Knowledge Sources	InterpretML, InterpretML Docs, InterpretML Paper, GA2M Paper
Domains	Machine_Learning, Interpretability, Generalized_Additive_Models
Last Updated	2026-02-07 12:00 GMT

Overview

End-to-end process for training an Explainable Boosting Machine (EBM) on tabular data and generating predictions using the InterpretML library.

Description

This workflow covers the complete pipeline for building an interpretable machine learning model using the Explainable Boosting Machine, the flagship model of InterpretML. EBMs are a modern implementation of Generalized Additive Models with pairwise interactions (GA2Ms) that combine gradient boosting, bagging, and automatic interaction detection to achieve accuracy competitive with black-box models while remaining fully interpretable. The process begins with raw tabular data and produces a fitted classifier or regressor that yields exact per-feature contribution scores for every prediction.

Key outputs:

A fitted EBM model with learned feature shape functions (term scores)
Per-feature contribution scores for each prediction
Standard deviations across bags for confidence estimation

Scope:

Covers data cleaning, feature binning, boosting, and prediction
Supports classification (binary and multiclass) and regression
Includes differential privacy variants (DP-EBM)

Strategy:

Uses quantile-based binning to discretize continuous features
Applies cyclic (round-robin) gradient boosting, updating one term (feature or interaction) at a time
Employs bagging (outer bags) for variance reduction and confidence intervals
Automatically detects and includes pairwise feature interactions

Usage

Execute this workflow when you have a tabular dataset (structured data with rows and columns) and need to train a machine learning model that provides exact, auditable explanations for every prediction. This is particularly suited for regulated industries (healthcare, finance, insurance) where model transparency is required, or when domain experts need to review and potentially edit model behavior. The EBM handles mixed feature types (continuous, categorical, and string data) natively without requiring manual preprocessing.

Execution Steps

Step 1: Data Preparation and Validation

Clean and validate the input training data. The framework accepts pandas DataFrames and NumPy arrays and handles string and categorical data natively. Input features (X) and the target variable (y) are validated for consistency, for example that they have the same number of rows. Optional sample weights and initialization scores (init_score) can be provided for custom weighting or for transfer-learning scenarios in which the EBM boosts on top of another model's predictions.

Key considerations:

The framework auto-detects feature types (continuous vs categorical) from the data
Missing values are handled natively through a dedicated missing bin
For classification, target labels are automatically mapped to class indices
Sample weights allow emphasizing certain observations during training
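The checks above can be sketched in plain Python and NumPy. This is a minimal illustration, not InterpretML's implementation: the helper names infer_feature_type, encode_labels, and check_inputs are hypothetical, and the type rule (a column is continuous when every non-missing value parses as a number) is an assumed simplification of the library's auto-detection.

```python
import numpy as np

def infer_feature_type(column):
    """Hypothetical rule: 'continuous' if every non-missing value parses as a number,
    otherwise 'nominal'. InterpretML's own auto-detection is more involved."""
    for value in column:
        if value is None:
            continue  # missing values do not decide the type
        try:
            float(value)
        except (TypeError, ValueError):
            return "nominal"
    return "continuous"

def encode_labels(y):
    """Map arbitrary class labels to indices 0..k-1; classes come back sorted."""
    classes, y_index = np.unique(np.asarray(y), return_inverse=True)
    return classes, y_index

def check_inputs(columns, y, sample_weight=None):
    """Require one row per target value and non-negative weights (default: all ones)."""
    n = len(y)
    for name, column in columns.items():
        if len(column) != n:
            raise ValueError(f"feature {name!r} has {len(column)} rows, expected {n}")
    weights = np.ones(n) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    if weights.shape != (n,) or np.any(weights < 0):
        raise ValueError("sample_weight needs one non-negative entry per row")
    return weights

# Toy mixed-type data with missing values in both columns.
columns = {"age": [34, 51, None, 28], "color": ["red", "blue", "red", None]}
y = ["yes", "no", "yes", "yes"]
weights = check_inputs(columns, y)
types = {name: infer_feature_type(col) for name, col in columns.items()}
classes, y_index = encode_labels(y)
```

Here "age" is treated as continuous, "color" as nominal, and the labels "no"/"yes" become class indices 0/1.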

Step 2: Feature Binning and Discretization

Transform raw feature values into discrete bins suitable for the boosting algorithm. By default, continuous features are binned using quantile-based cuts to create roughly equal-frequency bins. Categorical features are mapped to integer bin indices. This step produces two binning resolutions: finer bins for main effects (max_bins) and coarser bins for interaction terms (max_interaction_bins), which keeps the two-dimensional interaction tensors computationally manageable.

What happens:

Quantile cuts are computed for continuous features respecting minimum samples per bin
Categorical values are enumerated and mapped to bin indices
Bin weights (sample counts per bin) are recorded for importance calculations
Feature bounds (min/max observed values) are stored for later visualization
Histogram data is computed for density plots in explanations
For DP-EBM: bin cuts are computed with a differentially private procedure that spends part of the privacy budget
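A minimal NumPy sketch of quantile binning with a dedicated missing bin follows. It assumes bin 0 holds missing values and observed values fall into bins 1 and up; quantile_cuts and bin_values are hypothetical helpers, and InterpretML's real binning handles ties, categorical data, and weighting more carefully. Calling quantile_cuts twice with different max_bins mimics the finer main-effect and coarser interaction resolutions.

```python
import numpy as np

def quantile_cuts(values, max_bins, min_samples_bin=1):
    """Equal-frequency cut points over the non-missing values (a sketch, not libebm)."""
    x = np.sort(values[~np.isnan(values)])
    # Never ask for more bins than the data can fill with min_samples_bin rows each.
    n_bins = max(1, min(max_bins, len(x) // max(1, min_samples_bin)))
    quantiles = np.linspace(0, 1, n_bins + 1)[1:-1]
    return np.unique(np.quantile(x, quantiles))  # drop duplicate cuts caused by ties

def bin_values(values, cuts):
    """Bin 0 is reserved for missing values; observed values go to bins 1..len(cuts)+1."""
    bins = np.searchsorted(cuts, values, side="right") + 1
    bins[np.isnan(values)] = 0
    return bins

x = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, np.nan])
main_cuts = quantile_cuts(x, max_bins=4)   # finer resolution for the main effect
pair_cuts = quantile_cuts(x, max_bins=2)   # coarser resolution for interactions
main_bins = bin_values(x, main_cuts)
bin_weights = np.bincount(main_bins, minlength=len(main_cuts) + 2)  # rows per bin
bounds = (np.nanmin(x), np.nanmax(x))      # stored for later visualization
```

With this data the main-effect cuts are [2.0, 3.5, 5.25], the missing value lands in bin 0, and the bin counts are [1, 1, 3, 2, 2].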

Step 3: Interaction Detection

Automatically identify pairs of features that have strong interaction effects. The framework uses the FAST algorithm from the GA2M paper to rank candidate feature pairs by interaction strength, measured by how much a simple two-dimensional model fit to the current residuals reduces the error, then adds the top-ranked pairs as additional terms in the model.

Key considerations:

Interaction detection runs on the native C++ engine for performance
The number of interactions is set by the interactions parameter; max_interaction_bins controls the binning resolution of interaction terms, not how many are selected
Each selected interaction creates a two-dimensional lookup table (tensor) of scores
Users can also manually specify interactions to include or exclude
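The idea behind FAST can be illustrated with a brute-force sketch: for each candidate pair, split each feature at one cut, fit a constant to the residuals in each of the four resulting quadrants, and score the pair by how much squared error that removes. This assumes the residuals come from a main-effects model and uses hypothetical helpers (quadrant_gain, rank_pairs); the actual algorithm uses precomputed lookup tables in the native engine rather than this nested loop.

```python
import itertools
import numpy as np

def quadrant_gain(residuals, bins_a, bins_b):
    """Best squared-error reduction from one cut on each feature, predicting the
    mean residual in each of the four quadrants (brute-force version of the FAST idea)."""
    base = np.sum((residuals - residuals.mean()) ** 2)
    best = 0.0
    for cut_a in np.unique(bins_a)[1:]:
        for cut_b in np.unique(bins_b)[1:]:
            sse = 0.0
            for mask_a in (bins_a < cut_a, bins_a >= cut_a):
                for mask_b in (bins_b < cut_b, bins_b >= cut_b):
                    cell = residuals[mask_a & mask_b]
                    if cell.size:
                        sse += np.sum((cell - cell.mean()) ** 2)
            best = max(best, base - sse)
    return best

def rank_pairs(residuals, binned, top_k):
    """Score every feature pair and keep the top_k strongest candidates."""
    scores = {
        (i, j): quadrant_gain(residuals, binned[:, i], binned[:, j])
        for i, j in itertools.combinations(range(binned.shape[1]), 2)
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

a = np.array([0, 0, 1, 1] * 4)
b = np.array([0, 1, 0, 1] * 4)
c = np.array([0] * 8 + [1] * 8)
binned = np.column_stack([a, b, c])
# Residuals left after the main effects: a pure a-by-b (XOR-style) interaction.
residuals = np.where(a == b, 1.0, -1.0)
top = rank_pairs(residuals, binned, top_k=1)
```

Only the pair (0, 1) explains the XOR pattern, so it is ranked first; pairs involving the unrelated feature c score zero.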

Step 4: Bagged Gradient Boosting

Train the model as an ensemble of boosting runs across multiple outer bags. Each outer bag draws an independent random training/validation split of the data. Within each bag, the boosting algorithm cycles through the terms one at a time, fitting a shallow tree over that term's bins to the current residuals before moving to the next term. This round-robin approach ensures the model learns additive contributions that can be attributed to individual features and feature pairs.

What happens:

Multiple outer bags are created (outer_bags parameter; the same default applies to classification and regression)
Each bag gets a random train/validation split
The boosting loop cycles through all terms (features and interactions)
For each term, a shallow tree is fit to the current residuals over that term's bins (one-dimensional for main effects, two-dimensional for interactions)
Term updates are applied with a learning rate to prevent overfitting
A greedy phase selects high-gain terms for additional boosting rounds
Early stopping monitors validation loss to prevent overfitting
Smoothing rounds add random splits to improve generalization
For DP-EBM: Gaussian noise is added to each term update based on the privacy budget
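The loop described above can be sketched for regression with squared loss. This is a simplified illustration under several assumptions: every term is a main effect over pre-binned features, each update is a two-leaf tree (one split over the ordered bins), and the greedy phase, smoothing rounds, inner bags, interaction terms, and DP noise are omitted. All function names are hypothetical.

```python
import numpy as np

def fit_stump(bins, residuals, n_bins):
    """Two-leaf tree over one feature's ordered bins, fit to the residuals.
    Returns the per-bin update (left-leaf mean below the cut, right-leaf mean above)."""
    sums = np.bincount(bins, weights=residuals, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins).astype(float)
    best_gain, best_update = 0.0, np.zeros(n_bins)
    for cut in range(1, n_bins):
        n_left, n_right = counts[:cut].sum(), counts[cut:].sum()
        if n_left == 0 or n_right == 0:
            continue
        mean_left = sums[:cut].sum() / n_left
        mean_right = sums[cut:].sum() / n_right
        # SSE reduction up to a constant shared by every cut: n_L*m_L^2 + n_R*m_R^2.
        gain = n_left * mean_left**2 + n_right * mean_right**2
        if gain > best_gain:
            best_gain = gain
            best_update = np.where(np.arange(n_bins) < cut, mean_left, mean_right)
    return best_update

def boost_one_bag(X_bins, y, n_bins, rng, learning_rate=0.1, max_rounds=500,
                  early_stopping_rounds=20, validation_size=0.15):
    """Cyclic boosting on one random train/validation split, with early stopping."""
    n, n_terms = X_bins.shape
    order = rng.permutation(n)
    n_val = max(1, int(round(validation_size * n)))
    val, train = order[:n_val], order[n_val:]
    intercept = y[train].mean()
    scores = np.zeros((n_terms, n_bins))

    def predict(rows):
        return intercept + sum(scores[t, X_bins[rows, t]] for t in range(n_terms))

    best_loss, best_scores, stale_rounds = np.inf, scores.copy(), 0
    for _ in range(max_rounds):
        for t in range(n_terms):  # round-robin: update one term at a time
            residuals = y[train] - predict(train)
            scores[t] += learning_rate * fit_stump(X_bins[train, t], residuals, n_bins)
        val_loss = np.mean((y[val] - predict(val)) ** 2)
        if val_loss < best_loss - 1e-12:
            best_loss, best_scores, stale_rounds = val_loss, scores.copy(), 0
        else:
            stale_rounds += 1
            if stale_rounds >= early_stopping_rounds:
                break
    return intercept, best_scores

def fit_outer_bags(X_bins, y, n_bins, outer_bags=8, seed=0):
    """Run one boosting job per outer bag and keep every bag's results."""
    rng = np.random.default_rng(seed)
    results = [boost_one_bag(X_bins, y, n_bins, rng) for _ in range(outer_bags)]
    intercepts = np.array([r[0] for r in results])
    bagged_scores = np.stack([r[1] for r in results])  # shape (bags, terms, bins)
    return intercepts, bagged_scores

data_rng = np.random.default_rng(1)
X_bins = data_rng.integers(0, 5, size=(200, 2))
# Additive target: a step in feature 0 plus a linear trend in feature 1.
y = np.where(X_bins[:, 0] >= 2, 1.0, -1.0) + 0.5 * X_bins[:, 1]
intercepts, bagged_scores = fit_outer_bags(X_bins, y, n_bins=5, outer_bags=4)
mean_shapes = bagged_scores.mean(axis=0)
```

Because each update touches only one term, the learned score tables can be read directly as per-feature shape functions; the averaged shapes recover the step in feature 0 and the staircase in feature 1.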

Step 5: Model Aggregation and Postprocessing

Aggregate results from all outer bags to produce the final model. Term scores (shape functions) from each bag are averaged to produce stable estimates, and the standard deviations across bags provide confidence intervals. The intercept (base prediction) absorbs the weighted mean of each term, so that every shape function is centered on zero.

Key considerations:

Term scores from all bags are averaged for the final model
Standard deviations across bags quantify model uncertainty
Score tensors are purified using functional ANOVA decomposition to ensure identifiability
For multiclass: separate score tensors are maintained for each class
The model stores all bag-level results for later merge operations
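A sketch of the aggregation and purification steps, assuming bag-level scores are stored as an array of shape (bags, terms, bins) and bin weights are per-term counts: term scores are averaged across bags, the per-bin standard deviation is kept, and each main effect is shifted so its weighted mean is zero, with the shift moved into the intercept. purify_pair shows the two-dimensional case by repeatedly moving the weighted row and column means of an interaction table into main-effect adjustments. Function names are hypothetical and this is not the library's code.

```python
import numpy as np

def aggregate_bags(bag_intercepts, bagged_scores, bin_weights):
    """Average term scores across outer bags, keep their spread, and center each
    main-effect term on zero (weighted by bin counts), moving the offset into the intercept."""
    term_scores = bagged_scores.mean(axis=0)          # shape (terms, bins)
    standard_deviations = bagged_scores.std(axis=0)   # spread across bags, per bin
    intercept = float(bag_intercepts.mean())
    for t in range(term_scores.shape[0]):
        offset = np.average(term_scores[t], weights=bin_weights[t])
        term_scores[t] -= offset
        intercept += offset
    return intercept, term_scores, standard_deviations

def purify_pair(pair_scores, pair_weights, n_iter=50):
    """Split a 2-D term into main-effect adjustments plus a pure interaction whose
    weighted row and column means are zero (iterative functional ANOVA purification)."""
    pure = np.array(pair_scores, dtype=float)
    main_a = np.zeros(pure.shape[0])
    main_b = np.zeros(pure.shape[1])
    for _ in range(n_iter):
        row_means = np.average(pure, axis=1, weights=pair_weights)
        pure -= row_means[:, None]
        main_a += row_means
        col_means = np.average(pure, axis=0, weights=pair_weights)
        pure -= col_means[None, :]
        main_b += col_means
    return main_a, main_b, pure

bag_intercepts = np.array([0.5, 1.5])
bagged_scores = np.array([[[0.0, 1.0, 4.0]], [[2.0, 1.0, 2.0]]])  # 2 bags, 1 term, 3 bins
bin_weights = np.array([[1.0, 2.0, 1.0]])
intercept, term_scores, stds = aggregate_bags(bag_intercepts, bagged_scores, bin_weights)

pair = np.array([[1.0, 2.0], [3.0, 5.0]])
pair_weights = np.array([[1.0, 2.0], [3.0, 4.0]])
main_a, main_b, pure = purify_pair(pair, pair_weights)
```

Centering and purification only move mass between the intercept, main effects, and the interaction, so predictions are unchanged: main_a[i] + main_b[j] + pure[i, j] reproduces the original pair table.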

Step 6: Prediction

Generate predictions for new data by evaluating the learned shape functions. Each feature value is looked up in its corresponding term score table, and contributions from all terms are summed together with the intercept. The resulting raw score is passed through the inverse link function to produce the final prediction.

What happens:

New data is binned using the same bin cuts learned during training
Each feature value maps to a bin, which maps to a score contribution
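All term contributions are summed with the intercept: raw score = intercept + sum of term contributions
For classification: the logistic function converts the raw score to a probability (binary), and softmax converts per-class scores to probabilities (multiclass)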
For regression: identity link returns scores directly
The predict_proba method returns class probabilities for classifiers
The predict method returns the most likely class or regression value
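Prediction for main-effect terms can be sketched as bin lookup plus a sum, followed by the inverse link. The cut points and score tables below are made-up values, bin 0 is assumed to be the missing bin, and the helpers are hypothetical; the returned per-term contributions form the local explanation for each row. For regression, the raw score would be returned as-is.

```python
import numpy as np

def bin_column(values, cuts):
    """Bin 0 = missing; observed values map to bins 1..len(cuts)+1 (assumed layout)."""
    bins = np.searchsorted(cuts, values, side="right") + 1
    bins[np.isnan(values)] = 0
    return bins

def raw_scores(X, cuts, intercept, term_scores):
    """Look up each feature's score table, then add the intercept.
    Returns the raw scores and the per-term contributions (one column per term)."""
    contributions = np.column_stack([
        term_scores[f][bin_column(X[:, f], cuts[f])] for f in range(X.shape[1])
    ])
    return intercept + contributions.sum(axis=1), contributions

def predict_proba_binary(raw):
    """Logistic inverse link: column 0 is P(class 0), column 1 is P(class 1)."""
    p1 = 1.0 / (1.0 + np.exp(-raw))
    return np.column_stack([1.0 - p1, p1])

def softmax(raw_per_class):
    """Multiclass inverse link over one raw score per class (rows sum to 1)."""
    shifted = raw_per_class - raw_per_class.max(axis=1, keepdims=True)
    expd = np.exp(shifted)
    return expd / expd.sum(axis=1, keepdims=True)

# Made-up model: two main-effect terms, each table indexed [missing, bin 1, bin 2, ...].
cuts = [np.array([30.0, 50.0]), np.array([0.5])]
term_scores = [np.array([0.0, -1.0, 0.2, 0.8]), np.array([0.1, -0.3, 0.3])]
intercept = -0.5
X = np.array([[25.0, 1.0], [60.0, np.nan]])
raw, contributions = raw_scores(X, cuts, intercept, term_scores)
proba = predict_proba_binary(raw)
labels = proba.argmax(axis=1)  # most likely class, as predict() would return
```

In the binary case the logistic function equals a softmax over the scores [0, s], which is why the two inverse-link helpers agree on two-class problems.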
