Implementation:Scikit learn Scikit learn KBinsDiscretizer
| Knowledge Sources | |
|---|---|
| Domains | Data Preprocessing, Feature Engineering |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete scikit-learn tool for discretizing continuous features into bins.
Description
KBinsDiscretizer bins continuous data into intervals. The resulting bin index can be encoded in three ways: 'onehot' (a sparse one-hot matrix), 'onehot-dense' (a dense one-hot array), or 'ordinal' (the bin index itself). The binning strategy can be 'uniform' (equal-width bins), 'quantile' (bins holding roughly the same number of samples), or 'kmeans' (bins based on 1D k-means clustering).
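To make the difference between the three binning strategies concrete, the sketch below fits each one to the same skewed toy feature and prints the resulting bin edges. The data is made up for illustration. Depending on the installed scikit-learn version, fitting with strategy='quantile' and the default quantile_method may emit a FutureWarning.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical toy feature: five small values and one large outlier.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 100.0]).reshape(-1, 1)

edges = {}
for strategy in ("uniform", "quantile", "kmeans"):
    est = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy=strategy)
    est.fit(X)
    # bin_edges_ holds one array of edges per feature; take the only feature.
    edges[strategy] = est.bin_edges_[0]
    print(strategy, edges[strategy])
```

With this data, 'uniform' splits the value range at its midpoint, while 'quantile' places the split near the median, so the outlier affects the two strategies very differently.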
Usage
Use KBinsDiscretizer when you need to convert continuous features into discrete or categorical features, which can be useful for models that work better with categorical inputs or when you want to introduce non-linearity into linear models.
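As a rough illustration of introducing non-linearity into a linear model, the sketch below compares a plain LinearRegression with the same model fitted on one-hot encoded bins. The data (a noise-free parabola), the bin count, and the use of make_pipeline are assumptions chosen for the example, not recommendations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic, noise-free data (an assumption for illustration): y = x^2.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = X.ravel() ** 2

# A plain linear model cannot follow the symmetric curve.
linear = LinearRegression().fit(X, y)

# One-hot bins let the same model learn one level per interval.
binned = make_pipeline(
    KBinsDiscretizer(n_bins=10, encode="onehot-dense", strategy="uniform"),
    LinearRegression(),
).fit(X, y)

linear_r2 = linear.score(X, y)
binned_r2 = binned.score(X, y)
print(f"linear R^2: {linear_r2:.3f}")
print(f"binned R^2: {binned_r2:.3f}")
```

The binned model predicts a piecewise-constant curve, so its fit improves as bins are added, at the cost of more features and a higher risk of overfitting on noisy data.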
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/preprocessing/_discretization.py
Signature
class KBinsDiscretizer(TransformerMixin, BaseEstimator):
    def __init__(
        self,
        n_bins=5,
        *,
        encode="onehot",
        strategy="quantile",
        quantile_method="warn",
        dtype=None,
        subsample=200_000,
        random_state=None,
    ):
Import
from sklearn.preprocessing import KBinsDiscretizer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n_bins | int or array-like of shape (n_features,) | No | The number of bins to produce. Default is 5. Raises ValueError if n_bins < 2. |
| encode | str | No | Method used to encode the transformed result: 'onehot', 'onehot-dense', or 'ordinal'. Default is 'onehot'. |
| strategy | str | No | Strategy used to define the widths of the bins: 'uniform', 'quantile', or 'kmeans'. Default is 'quantile'. |
| quantile_method | str | No | Method passed to np.percentile when strategy='quantile'. Default is 'warn', which falls back to 'linear' and emits a FutureWarning about an upcoming change of default. |
| dtype | np.float32 or np.float64 | No | The desired data-type for the output. Default is None (consistent with input). |
| subsample | int or None | No | Maximum number of samples used to fit the model. Default is 200,000. |
| random_state | int, RandomState instance, or None | No | Seed or random state that controls which samples are drawn when subsampling. Default is None. |
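The following sketch exercises a few of the inputs above: n_bins given per feature, two encode values, and the rejection of n_bins below 2. The data is hypothetical, and strategy='uniform' is used only to keep the bin edges easy to predict.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical data: two features on different scales.
X = np.array([[0.0, 0.0], [1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

# n_bins as an array-like: 2 bins for feature 0, 4 bins for feature 1.
ordinal = KBinsDiscretizer(n_bins=[2, 4], encode="ordinal", strategy="uniform")
X_ordinal = ordinal.fit_transform(X)  # dense, one column per feature

onehot = KBinsDiscretizer(n_bins=[2, 4], encode="onehot", strategy="uniform")
X_onehot = onehot.fit_transform(X)  # sparse, one column per bin

print(X_ordinal.shape, X_onehot.shape, sparse.issparse(X_onehot))

# n_bins below 2 is rejected when fit is called.
try:
    KBinsDiscretizer(n_bins=1, strategy="uniform").fit(X)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```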
Outputs
| Name | Type | Description |
|---|---|---|
| X_transformed | ndarray or sparse matrix | The discretized feature matrix, encoded according to the encode parameter. |
| bin_edges_ | ndarray of ndarray of shape (n_features,) | Fitted attribute. Entry i holds the n_bins_[i] + 1 bin edges for feature i. |
| n_bins_ | ndarray of shape (n_features,) | Fitted attribute. Number of bins per feature, which may be smaller than n_bins because bins that are too narrow (width <= 1e-8) are removed with a warning. |
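To show how the fitted attributes relate to the requested n_bins, the sketch below fits a quantile discretizer on a made-up feature dominated by repeated zeros. Several quantiles coincide, so fewer bins survive than were requested. The fit is expected to emit a warning about removed bins, and possibly a FutureWarning about quantile_method depending on the installed version.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical feature dominated by repeated zeros.
X = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 2.0]).reshape(-1, 1)

est = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
est.fit(X)  # expect a warning about bins that are too narrow

requested = est.n_bins       # the constructor argument, unchanged
kept = int(est.n_bins_[0])   # bins actually kept for feature 0
edges = est.bin_edges_[0]    # kept + 1 edges for feature 0

print(requested, kept, edges)
```

Code that sizes downstream arrays from the bin count should therefore read n_bins_ after fitting instead of relying on the constructor argument.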
Usage Examples
Basic Usage
from sklearn.preprocessing import KBinsDiscretizer
import numpy as np
# Four samples, four continuous features.
X = np.array([[-2, 1, -4, -1],
              [-1, 2, -3, -0.5],
              [0, 3, -2, 0.5],
              [1, 4, -1, 2]])
# Split each feature's range into 3 equal-width bins and return the bin indices.
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
est.fit(X)
X_transformed = est.transform(X)
print(X_transformed)
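Continuing from the basic usage, a fitted discretizer can also map bin indices back to approximate feature values with inverse_transform, which returns the midpoint of each bin. The sketch below repeats the setup so that it runs on its own and measures how much information the round trip loses.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

X = np.array([[-2, 1, -4, -1],
              [-1, 2, -3, -0.5],
              [0, 3, -2, 0.5],
              [1, 4, -1, 2]])

est = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
X_transformed = est.fit_transform(X)

# Map each bin index back to the midpoint of its interval.
X_approx = est.inverse_transform(X_transformed)

# The round trip is lossy: each error is at most half a bin width.
max_error = np.abs(X - X_approx).max()
print(X_approx)
print(max_error)
```

Every feature here spans a range of 3 split into 3 uniform bins, so each bin has width 1 and no reconstructed value is off by more than 0.5.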