Principle:Scikit learn Scikit learn Feature Encoding
| Knowledge Sources | |
|---|---|
| Domains | Feature Engineering, Data Preprocessing |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Feature encoding transforms raw data features into numerical representations suitable for machine learning algorithms, including conversions of categorical variables, generation of interaction terms, and discretization of continuous values.
Description
Most machine learning algorithms operate on numerical input matrices, so categorical variables, text labels, and other non-numeric data must first be converted into numerical form. Feature encoding also covers generating polynomial and interaction features from existing numeric variables, discretizing continuous features into bins, and applying arbitrary transformations. Together, these techniques bridge the gap between raw, heterogeneous data and the numeric inputs that learning algorithms require, which makes feature encoding a critical step in the data preprocessing and feature engineering pipeline.
Usage
Choose the technique according to the variable type and the model:
- Use label encoding to convert ordinal categorical variables into integers when the ordering is meaningful.
- Use one-hot encoding (label binarization) when categorical variables are nominal (no inherent order) and the model would otherwise misread integer codes as an ordering.
- Use polynomial features to generate interaction terms and higher-order terms, enabling linear models to capture non-linear relationships.
- Use target encoding to encode categorical features with the mean of the target variable, leveraging the relationship between feature and target.
- Use binning (discretization) to convert continuous features into categorical ones, which can help with non-linear relationships and reduce sensitivity to outliers.
- Use function transformers to apply arbitrary transformations, typically element-wise (e.g., log, square root), to features.
Theoretical Basis
Label Encoding maps each category to a unique integer:
$$f : \{c_1, c_2, \dots, c_K\} \to \{0, 1, \dots, K-1\}$$
where $K$ is the number of unique categories. This creates an implicit ordering, which is appropriate only for ordinal variables.
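As an illustration of the integer mapping, the sketch below uses scikit-learn's `LabelEncoder`, which assigns codes following the sorted category values, and `OrdinalEncoder` with an explicit category list for the case where the order carries meaning. The sample values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical sample data with a natural order: low < medium < high.
sizes = ["low", "high", "medium", "low"]

# LabelEncoder assigns integers 0..K-1 following the sorted order of the
# category values (alphabetical here), not their semantic order.
label_encoder = LabelEncoder()
label_codes = label_encoder.fit_transform(sizes)
print(label_encoder.classes_)  # ['high' 'low' 'medium']
print(label_codes)             # [1 0 2 1]

# To make the integers follow the meaningful order, pass the categories
# explicitly to OrdinalEncoder, which expects a 2D feature matrix.
ordinal_encoder = OrdinalEncoder(categories=[["low", "medium", "high"]])
ordinal_codes = ordinal_encoder.fit_transform(np.array(sizes).reshape(-1, 1))
print(ordinal_codes.ravel())   # [0. 2. 1. 0.]
```

In scikit-learn, `LabelEncoder` is intended for target labels, while `OrdinalEncoder` is the counterpart for input features; both produce the same kind of integer mapping.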
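One-Hot Encoding (Label Binarization) converts a categorical variable with $K$ levels $c_1, \dots, c_K$ into $K$ binary indicator variables:
$$x_k = \begin{cases} 1 & \text{if the category is } c_k \\ 0 & \text{otherwise} \end{cases} \qquad k = 1, \dots, K$$
This avoids imposing an artificial ordinal relationship between categories.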
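A minimal sketch of the indicator representation, using scikit-learn's `LabelBinarizer` on made-up color labels. Each row contains exactly one 1, in the column of its category.

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical nominal variable with three levels.
colors = ["red", "green", "blue", "green"]

binarizer = LabelBinarizer()
one_hot = binarizer.fit_transform(colors)

# Columns follow the sorted category values.
print(binarizer.classes_)  # ['blue' 'green' 'red']
print(one_hot)
# [[0 0 1]
#  [0 1 0]
#  [1 0 0]
#  [0 1 0]]
```

Note that with exactly two classes `LabelBinarizer` returns a single column rather than two. For multi-column feature matrices, `OneHotEncoder` plays the equivalent role.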
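Polynomial Features generates all polynomial combinations of features up to degree $d$. For features $(x_1, x_2)$ and degree 2:
$$(1,\; x_1,\; x_2,\; x_1^2,\; x_1 x_2,\; x_2^2)$$
The number of output features is $\binom{n+d}{d}$ for $n$ input features and degree $d$, counting the constant (bias) term. The interaction-only option excludes powers higher than 1 of any individual feature, which reduces the count to $\sum_{k=0}^{d} \binom{n}{k}$.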
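The expansion can be checked on a single sample. The sketch below uses scikit-learn's `PolynomialFeatures` with two input features and degree 2, then compares the column count of the full expansion (bias term included) against the binomial coefficient $\binom{n+d}{d}$. The input values are arbitrary.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with x1 = 2 and x2 = 3.
X = np.array([[2.0, 3.0]])

# Full expansion: 1, x1, x2, x1^2, x1*x2, x2^2
full = PolynomialFeatures(degree=2).fit_transform(X)
print(full)  # [[1. 2. 3. 4. 6. 9.]]

# Interaction-only expansion drops x1^2 and x2^2: 1, x1, x2, x1*x2
interactions = PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X)
print(interactions)  # [[1. 2. 3. 6.]]

# Column count of the full expansion, bias column included.
n, d = X.shape[1], 2
print(full.shape[1] == comb(n + d, d))  # True
```

Because the column count grows quickly with both the number of features and the degree, high degrees on wide inputs can become expensive.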
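Target Encoding replaces each category with a statistic of the target variable for that category, typically the mean:
$$\bar{y}_c = \frac{1}{n_c} \sum_{i \,:\, x_i = c} y_i$$
where $n_c$ is the number of observations in category $c$. Smoothing is applied to prevent overfitting on rare categories:
$$\hat{y}_c = \lambda_c \, \bar{y}_c + (1 - \lambda_c) \, \bar{y}$$
where $\bar{y}$ is the global target mean and $\lambda_c \in [0, 1]$ depends on the number of observations $n_c$ for category $c$.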
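The smoothed estimate can be sketched in plain Python. This is an illustrative implementation rather than scikit-learn's own: the helper `smoothed_target_means` and its parameter `m` are hypothetical names, and the weight $n_c / (n_c + m)$ is one common choice that is assumed here. The weight moves toward 1 as a category gains observations, so rare categories are pulled toward the global mean.

```python
from collections import defaultdict


def smoothed_target_means(categories, targets, m=2.0):
    """Return {category: smoothed target mean}.

    `m` is a hypothetical smoothing strength; larger values shrink
    category means more strongly toward the global mean.
    """
    global_mean = sum(targets) / len(targets)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1

    encoding = {}
    for c, n_c in counts.items():
        category_mean = sums[c] / n_c
        weight = n_c / (n_c + m)  # assumed form of the smoothing weight
        encoding[c] = weight * category_mean + (1 - weight) * global_mean
    return encoding


# Made-up data: category "b" is rare (a single observation).
categories = ["a", "a", "a", "b"]
targets = [1, 0, 1, 1]

encoding = smoothed_target_means(categories, targets, m=2.0)
encoded_feature = [encoding[c] for c in categories]
print(encoding)
```

With this data the raw mean for "b" is 1.0, but its smoothed value is pulled toward the global mean of 0.75. Encoding the same rows that were used to compute the means leaks target information into the feature; scikit-learn's `TargetEncoder` addresses this with cross-fitting in `fit_transform`.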
K-Bins Discretization partitions a continuous feature into $k$ bins using strategies such as:
- Uniform: Equal-width bins
- Quantile: Equal-frequency bins
- K-Means: Bins based on 1D K-Means clustering
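The difference between strategies shows up clearly on data with an outlier. The sketch below uses scikit-learn's `KBinsDiscretizer` with ordinal output on made-up values. Depending on the installed version, the quantile strategy may emit a warning about changing defaults, so its output is printed rather than stated.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Made-up feature with one large outlier.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [100.0]])

# Equal-width bins over [0, 100]: the outlier stretches the range, so
# all ordinary values collapse into the first bin.
uniform = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
uniform_bins = uniform.fit_transform(X).ravel()
print(uniform_bins)  # [0. 0. 0. 0. 0. 2.]

# Equal-frequency bins: edges are taken from quantiles of the data, so
# the samples are spread more evenly across the bins.
quantile = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
quantile_bins = quantile.fit_transform(X).ravel()
print(quantile_bins)
```

Setting `strategy="kmeans"` selects the clustering-based variant, and `encode="onehot"` returns indicator columns instead of bin indices.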
Function Transformer applies a user-specified function element-wise to feature values, e.g., $f(x) = \log(1 + x)$.
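A minimal sketch of a log transform wrapped as a transformer, using scikit-learn's `FunctionTransformer` with NumPy's `log1p` and its inverse `expm1`. The choice of function and the input values are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Made-up non-negative features spanning several orders of magnitude.
X = np.array([[0.0, 1.0], [9.0, 99.0]])

# log1p computes log(1 + x) element-wise; expm1 is its exact inverse.
log_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

X_log = log_transformer.fit_transform(X)
X_back = log_transformer.inverse_transform(X_log)

print(X_log)
print(np.allclose(X_back, X))  # True
```

Wrapping the function this way lets it sit inside a `Pipeline` alongside other preprocessing steps, so the same transformation is applied at training and prediction time.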