Principle:Scikit learn Scikit learn Feature Encoding

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Feature Engineering, Data Preprocessing
Last Updated	2026-02-08 15:00 GMT

Overview

Feature encoding transforms raw data features into numerical representations suitable for machine learning algorithms, including conversions of categorical variables, generation of interaction terms, and discretization of continuous values.

Description

Most machine learning algorithms operate on numerical input matrices, so categorical variables, text labels, and other non-numeric data must first be converted into numerical form. Feature encoding also covers generating polynomial and interaction features from existing numeric variables, discretizing continuous features into bins, and applying arbitrary transformations. Together, these techniques bridge the gap between raw, heterogeneous data and the numeric input that learning algorithms require, which makes feature encoding a critical step in the data preprocessing and feature engineering pipeline.

Usage

Use label encoding to convert ordinal categorical variables into integers when the ordering is meaningful.
Use one-hot encoding (label binarization) when categorical variables are nominal (no inherent order) and integer codes would suggest an order that the model might wrongly exploit.
Use polynomial features to generate interaction terms and higher-order terms, enabling linear models to capture non-linear relationships.
Use target encoding to replace each category with the mean of the target variable for that category, which exploits the relationship between feature and target.
Use binning (discretization) to convert continuous features into categorical ones, which can help capture non-linear relationships and reduce sensitivity to outliers.
Use function transformers to apply arbitrary element-wise transformations (e.g., log, square root) to features.

Theoretical Basis

Label Encoding maps each category to a unique integer:

$c \mapsto i, \quad i \in \{0, 1, \dots, K - 1\}$

where $K$ is the number of unique categories. This creates an implicit ordering, which is appropriate only for ordinal variables.
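The mapping above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: the helper names `fit_label_encoder` and `label_encode` are hypothetical, and assigning codes in sorted order is an assumption made here for determinism.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer in 0..K-1 (sorted order, an assumption)."""
    classes = sorted(set(values))
    return {category: index for index, category in enumerate(classes)}


def label_encode(values, mapping):
    """Replace every category by its integer code."""
    return [mapping[value] for value in values]


sizes = ["small", "large", "medium", "small"]
mapping = fit_label_encoder(sizes)
codes = label_encode(sizes, mapping)

print(mapping)  # {'large': 0, 'medium': 1, 'small': 2}
print(codes)    # [2, 0, 1, 2]
```

Note that the codes follow alphabetical order here (large < medium < small), not the natural size order. For a truly ordinal variable, the category order would have to be supplied explicitly.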

One-Hot Encoding (Label Binarization) converts a categorical variable with $K$ levels into $K$ binary indicator variables:

$c \mapsto [\mathbf{1}(c = c_{1}), \mathbf{1}(c = c_{2}), \dots, \mathbf{1}(c = c_{K})]$

This avoids imposing an artificial ordinal relationship between categories.
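As a rough sketch of the indicator construction above, the following plain-Python helper builds one binary column per category. The name `one_hot_encode` and the sorted column order are assumptions for illustration only.

```python
def one_hot_encode(values):
    """Return the category list and one indicator row per input value."""
    categories = sorted(set(values))  # column order: an assumption of this sketch
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each row contains exactly one 1, so no category is numerically "larger" than another.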

Polynomial Features generates all polynomial combinations of features up to degree $d$. For features $[x_{1}, x_{2}]$ and degree 2:

$[x_{1}, x_{2}] \mapsto [1, x_{1}, x_{2}, x_{1}^{2}, x_{1} x_{2}, x_{2}^{2}]$

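The number of output features is $\binom{n + d}{d}$ for $n$ input features and degree $d$, counting the constant (bias) term. With the interaction-only option, terms in which any single feature is raised to a power higher than 1 are excluded, which reduces the count to $\sum_{k=0}^{d} \binom{n}{k}$.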
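The expansion and its feature count can be checked with a small standard-library sketch. The function `polynomial_features` and its `interaction_only` flag are hypothetical names chosen for this example; the output is ordered by increasing degree, which matches the degree-2 expansion shown above.

```python
from itertools import combinations, combinations_with_replacement
from math import comb, prod


def polynomial_features(row, degree, interaction_only=False):
    """Expand one sample into all monomials of its features up to `degree`."""
    # With replacement: a feature may repeat (powers). Without: distinct features only.
    choose = combinations if interaction_only else combinations_with_replacement
    features = []
    for d in range(degree + 1):
        for index_group in choose(range(len(row)), d):
            # The empty product (d = 0) is 1, which yields the bias term.
            features.append(prod(row[i] for i in index_group))
    return features


full = polynomial_features([2, 3], degree=2)
interactions = polynomial_features([2, 3], degree=2, interaction_only=True)

print(full)          # [1, 2, 3, 4, 6, 9]  -> [1, x1, x2, x1^2, x1*x2, x2^2]
print(interactions)  # [1, 2, 3, 6]        -> [1, x1, x2, x1*x2]
print(len(full) == comb(2 + 2, 2))  # True
```

Because the count grows combinatorially in both the number of features and the degree, high degrees are usually only practical for a small number of inputs.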

Target Encoding replaces each category with a statistic of the target variable for that category, typically the mean:

$\mathrm{encode}(c) = \frac{\sum_{i : x_{i} = c} y_{i}}{|\{i : x_{i} = c\}|}$

Smoothing is applied to prevent overfitting on rare categories:

$\mathrm{encode}(c) = \lambda \cdot \bar{y}_{c} + (1 - \lambda) \cdot \bar{y}_{\mathrm{global}}$

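where $\bar{y}_{c}$ is the target mean for category $c$, $\bar{y}_{\mathrm{global}}$ is the target mean over all observations, and $\lambda \in [0, 1]$ depends on the number of observations for category $c$, typically moving toward 1 as that count grows.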
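The smoothed formula can be illustrated with a short sketch. The source does not specify how the blending weight is computed, so this example assumes $n_{c} / (n_{c} + m)$, where $n_{c}$ is the category count and `smoothing` ($m$) is a hypothetical strength parameter. The function name `target_encode` is likewise illustrative.

```python
from collections import defaultdict


def target_encode(categories, targets, smoothing=2.0):
    """Blend each category's target mean with the global mean.

    Assumption of this sketch: weight = n_c / (n_c + smoothing).
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, y in zip(categories, targets):
        sums[category] += y
        counts[category] += 1

    encoding = {}
    for category, n in counts.items():
        category_mean = sums[category] / n
        weight = n / (n + smoothing)  # close to 1 for frequent categories
        encoding[category] = weight * category_mean + (1 - weight) * global_mean
    return encoding


cities = ["a", "a", "a", "a", "b"]
clicked = [1, 1, 0, 0, 1]
encoding = target_encode(cities, clicked)

# "b" appears once, so its raw mean of 1.0 is pulled strongly toward the
# global mean of 0.6; "a" appears four times and moves much less from 0.5.
print(encoding)
```

This sketch computes the encoding on the same rows it would be applied to, which leaks target information into the feature. In practice the statistics are usually estimated on held-out folds.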

K-Bins Discretization partitions a continuous feature into $K$ bins using strategies such as:

Uniform: Equal-width bins
Quantile: Equal-frequency bins
K-Means: Bins based on 1D K-Means clustering
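The difference between the first two strategies can be seen in a small sketch. The helper names are hypothetical, and the quantile rule used here (picking order statistics without interpolation) is a simplifying assumption; library implementations may compute quantiles differently. The K-Means strategy is omitted for brevity.

```python
from bisect import bisect_right


def uniform_edges(values, n_bins):
    """Inner bin edges that split [min, max] into equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    return [low + k * width for k in range(1, n_bins)]


def quantile_edges(values, n_bins):
    """Inner bin edges taken at evenly spaced ranks (no interpolation)."""
    ordered = sorted(values)
    return [ordered[(k * len(ordered)) // n_bins] for k in range(1, n_bins)]


def assign_bins(values, inner_edges):
    """Bin index of each value: the number of inner edges that are <= value."""
    return [bisect_right(inner_edges, value) for value in values]


data = [1, 2, 3, 4, 5, 6, 7, 100]  # one large outlier
uniform_bins = assign_bins(data, uniform_edges(data, 4))
quantile_bins = assign_bins(data, quantile_edges(data, 4))

print(uniform_bins)   # [0, 0, 0, 0, 0, 0, 0, 3]
print(quantile_bins)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With equal-width bins the outlier stretches the range so that almost every value lands in the first bin, whereas equal-frequency bins keep two values per bin.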

Function Transformer applies a user-specified function to the feature values. With an element-wise function such as $x \mapsto \log (1 + x)$, each value is transformed independently of the others.
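A minimal sketch of the element-wise case, using only the standard library; `apply_elementwise` is a hypothetical helper, not a library function.

```python
import math


def apply_elementwise(rows, func):
    """Apply `func` to every value while keeping the row/column layout."""
    return [[func(value) for value in row] for row in rows]


counts = [[0, 9], [99, 999]]
logged = apply_elementwise(counts, math.log1p)  # x -> log(1 + x)

for row in logged:
    print([round(value, 3) for value in row])
```

Using `log1p` rather than a plain logarithm keeps zero counts well defined, since $\log(1 + 0) = 0$.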
