Principle: InterpretML Feature Binning and Discretization
| Metadata | |
|---|---|
| Sources | InterpretML, EBM Binning |
| Domains | Data_Preprocessing, Interpretability |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
A discretization technique that converts continuous and categorical features into fixed bin indices for use in additive model training.
Description
Feature Binning and Discretization partitions the feature space into discrete bins, enabling EBMs to learn piecewise-constant shape functions. For continuous features, quantile-based (or uniform/humanized) cut points divide the range into approximately equal-frequency bins. For categorical features, each unique category maps to a bin index. The binning process also computes bin weights (sample counts per bin), handles missing values with a dedicated bin, and supports differential privacy through noise injection during bin boundary selection.
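The continuous case can be sketched as follows. This is an illustrative implementation of quantile binning with a dedicated missing-value bin, not InterpretML's actual code; the function name `quantile_bin` and the convention of mapping valid values to bins 1..k are assumptions for this sketch.

```python
import numpy as np

def quantile_bin(x, n_bins=8):
    """Sketch of quantile binning for a continuous feature.

    Returns interior cut points and per-sample bin indices. Bin 0 is
    reserved for missing values (NaN); valid values map to 1..n_bins.
    Illustrative only; InterpretML's internals may differ.
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    valid = x[~missing]
    # Interior cut points at quantile levels i/k, i = 1..k-1.
    levels = np.arange(1, n_bins) / n_bins
    cuts = np.unique(np.quantile(valid, levels))
    idx = np.empty(x.shape, dtype=int)
    idx[missing] = 0  # dedicated missing bin
    # searchsorted assigns each valid value to the bin its cut points bound.
    idx[~missing] = np.searchsorted(cuts, valid, side="right") + 1
    return cuts, idx
```

Note that `np.unique` collapses duplicate cut points, so heavily repeated values can yield fewer than `n_bins` bins, mirroring the approximate equal-frequency behavior described above.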
Usage
Use this principle after data preparation and before model training. It is essential for any GAM-based model that learns lookup-table style shape functions rather than parametric functions.
Theoretical Basis
Quantile binning: given N samples sorted in ascending order, place interior cut points at quantile levels q_i = i/k for i = 1, ..., k-1, yielding k bins.
For a feature x with sorted values x(1) ≤ x(2) ≤ ... ≤ x(N), cut points c_i are chosen such that approximately N/k samples fall in each bin [c_{i-1}, c_i).
For categorical features, each unique value v maps to bin index b(v).
Missing values always get bin index 0 (dedicated missing bin).
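The categorical mapping and the missing bin can be sketched together. The function name `categorical_bin`, the sorted category order, and the use of `None` as the missing marker are assumptions of this sketch; InterpretML's internal ordering and missing-value handling may differ in detail.

```python
import numpy as np

def categorical_bin(values):
    """Sketch of categorical binning with a dedicated missing bin.

    Each unique category v maps to a bin index b(v) in 1..n (sorted
    order assumed here); missing values (None) map to bin 0.
    Also returns bin weights, i.e. sample counts per bin.
    """
    cats = sorted({v for v in values if v is not None})
    mapping = {v: i + 1 for i, v in enumerate(cats)}  # b(v)
    idx = np.array([0 if v is None else mapping[v] for v in values])
    # Bin weights: count of samples falling in each bin, missing bin included.
    weights = np.bincount(idx, minlength=len(cats) + 1)
    return mapping, idx, weights
```

The returned weights correspond to the per-bin sample counts the Description mentions, which downstream EBM training uses when averaging contributions within a bin.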