Principle: InterpretML Feature Binning and Discretization
| Metadata | |
|---|---|
| Sources | InterpretML, EBM Binning |
| Domains | Data_Preprocessing, Interpretability |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
A discretization technique that converts continuous and categorical features into fixed bin indices for use in additive model training.
Description
Feature Binning and Discretization partitions the feature space into discrete bins, enabling EBMs to learn piecewise-constant shape functions. For continuous features, quantile-based (or uniform/humanized) cut points divide the range into approximately equal-frequency bins. For categorical features, each unique category maps to a bin index. The binning process also computes bin weights (sample counts per bin), handles missing values with a dedicated bin, and supports differential privacy through noise injection during bin boundary selection.
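The continuous case can be sketched as follows. This is an illustrative implementation of quantile binning with a dedicated missing-value bin, not InterpretML's actual code; the function name `quantile_bin` and the convention of mapping valid values to bins 1..k are assumptions for this sketch.

```python
import numpy as np

def quantile_bin(x, n_bins=8):
    """Sketch of quantile binning for a continuous feature.

    Returns interior cut points and per-sample bin indices. Bin 0 is
    reserved for missing values (NaN); valid values map to 1..n_bins.
    Illustrative only; InterpretML's internals may differ.
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    valid = x[~missing]
    # Interior cut points at quantile levels i/k, i = 1..k-1.
    levels = np.arange(1, n_bins) / n_bins
    cuts = np.unique(np.quantile(valid, levels))
    idx = np.empty(x.shape, dtype=int)
    idx[missing] = 0  # dedicated missing bin
    # searchsorted assigns each valid value to the bin its cut points bound.
    idx[~missing] = np.searchsorted(cuts, valid, side="right") + 1
    return cuts, idx
```

Note that `np.unique` collapses duplicate cut points, so heavily repeated values can yield fewer than `n_bins` bins, mirroring the approximate equal-frequency behavior described above.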
Usage
Use this principle after data preparation and before model training. It is essential for any GAM-based model that learns lookup-table style shape functions rather than parametric functions.
Theoretical Basis
Quantile binning: given N samples sorted in ascending order, place interior cut points at quantile levels q_i = i/k for i = 1, ..., k-1, yielding k bins.
For a feature x with sorted values x(1) ≤ x(2) ≤ ... ≤ x(N), cut points c_i are chosen such that approximately N/k samples fall in each bin [c_{i-1}, c_i).
For categorical features, each unique value v maps to bin index b(v).
Missing values always get bin index 0 (dedicated missing bin).
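The categorical mapping and the missing bin can be sketched together. The function name `categorical_bin`, the sorted category order, and the use of `None` as the missing marker are assumptions of this sketch; InterpretML's internal ordering and missing-value handling may differ in detail.

```python
import numpy as np

def categorical_bin(values):
    """Sketch of categorical binning with a dedicated missing bin.

    Each unique category v maps to a bin index b(v) in 1..n (sorted
    order assumed here); missing values (None) map to bin 0.
    Also returns bin weights, i.e. sample counts per bin.
    """
    cats = sorted({v for v in values if v is not None})
    mapping = {v: i + 1 for i, v in enumerate(cats)}  # b(v)
    idx = np.array([0 if v is None else mapping[v] for v in values])
    # Bin weights: count of samples falling in each bin, missing bin included.
    weights = np.bincount(idx, minlength=len(cats) + 1)
    return mapping, idx, weights
```

The returned weights correspond to the per-bin sample counts the Description mentions, which downstream EBM training uses when averaging contributions within a bin.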