Principle:Scikit learn Scikit learn Feature Selection
| Knowledge Sources | |
|---|---|
| Domains | Feature Engineering, Model Selection |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Feature selection identifies and retains the most relevant features from a dataset while discarding redundant or irrelevant ones, improving model performance and interpretability.
Description
Feature selection reduces the dimensionality of the input space by keeping a subset of the original features rather than transforming them into new ones (as feature extraction methods such as PCA do). It mitigates overfitting, reduces computational cost, and improves model interpretability. Feature selection methods fall into three categories: filter methods, which score features independently of the model; wrapper methods, which evaluate feature subsets using a specific model's performance; and embedded methods, which perform selection as part of the model training process. Feature selection is a critical component of the feature engineering pipeline, especially for high-dimensional datasets.
Usage
Use filter methods (SelectKBest, VarianceThreshold) for fast, model-agnostic feature screening as a preprocessing step. Use wrapper methods (RFE, SequentialFeatureSelector) when you want to optimize feature subsets specifically for a given estimator and can afford the additional computational cost. Use embedded methods (SelectFromModel with L1-regularized models or tree-based feature importances) when feature selection should be integrated with model training. Use mutual information-based scoring when features have non-linear relationships with the target. Use variance threshold as a simple baseline to remove constant or near-constant features.
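As a rough illustration of using filter methods as a preprocessing step, the sketch below chains two selectors with a classifier in a scikit-learn `Pipeline`, so that selection is refit inside each cross-validation fold. The dataset, the value `k=10`, and the step names are illustrative assumptions rather than recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Example dataset (an assumption for illustration): 569 samples, 30 features.
X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("drop_constant", VarianceThreshold()),               # baseline: remove zero-variance features
    ("scale", StandardScaler()),                          # scaling helps the linear classifier converge
    ("select", SelectKBest(score_func=f_classif, k=10)),  # k=10 is an arbitrary choice
    ("clf", LogisticRegression(max_iter=1000)),
])

# Selection happens inside each fold, so held-out data never influences the scores.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Placing the selectors inside the pipeline, rather than applying them to the full dataset beforehand, avoids leaking information from the validation folds into the feature scores.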
Theoretical Basis
Filter Methods score each feature independently of any downstream model, using a statistic computed from the data:
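Variance Threshold: Remove features whose variance does not exceed a threshold $\tau$, i.e. keep feature $j$ only if $\mathrm{Var}(X_j) > \tau$. With $\tau = 0$, only constant features are removed.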
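A minimal sketch of variance thresholding with scikit-learn's `VarianceThreshold`. The toy matrix and the threshold of 0.2 are assumptions chosen only to make the effect visible.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy matrix (illustrative): column 0 is constant, column 1 is nearly
# constant, column 2 varies substantially.
X = np.array([
    [1.0, 0.0, 2.0],
    [1.0, 0.0, 4.0],
    [1.0, 0.0, 6.0],
    [1.0, 1.0, 8.0],
])

# The threshold value is an arbitrary choice for this example.
selector = VarianceThreshold(threshold=0.2)
X_reduced = selector.fit_transform(X)

print(selector.variances_)     # per-feature variances computed during fit
print(selector.get_support())  # boolean mask of retained features
print(X_reduced.shape)         # shape after dropping low-variance columns
```

Because the variance depends on feature scale, a threshold like this is only meaningful when features are on comparable scales or when the threshold is chosen per unit.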
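SelectKBest: Select the $k$ features with the highest scores according to a scoring function:
- ANOVA F-value (for classification): the ratio of between-class variance to within-class variance of the feature, $F = \frac{\text{between-class variance}}{\text{within-class variance}}$
- Chi-squared test: for non-negative features such as counts or frequencies
- Mutual information: $I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}$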
Mutual information captures arbitrary (non-linear) dependencies between features and the target, unlike correlation-based measures.
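The scoring functions above can be tried with `SelectKBest` roughly as follows. This is a sketch on the Iris dataset with `k=2`, both of which are assumptions for illustration. `mutual_info_classif` relies on a randomized estimator, so the seed is fixed through `functools.partial`.

```python
from functools import partial

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

# Example dataset (an assumption): 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# ANOVA F-value scoring: keep the 2 highest-scoring features.
anova = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print(anova.scores_)                    # one F-value per feature
print(anova.get_support(indices=True))  # column indices of the kept features
X_anova = anova.transform(X)

# Mutual information scoring; the estimate is randomized, so fix the seed.
mi_score = partial(mutual_info_classif, random_state=0)
mi = SelectKBest(score_func=mi_score, k=2).fit(X, y)
print(mi.scores_)                       # one MI estimate per feature
X_mi = mi.transform(X)
```

Comparing `anova.scores_` with `mi.scores_` on the same data is a quick way to see whether a feature's relationship with the target is mostly captured by class-mean differences or needs a dependence measure that is not limited to them.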
Wrapper Methods search for optimal feature subsets by evaluating model performance:
Recursive Feature Elimination (RFE):
- Train the model on all features.
- Rank features by importance (e.g., coefficient magnitude, feature importance).
- Remove the least important feature(s).
- Repeat until the desired number of features is reached.
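The elimination loop above corresponds roughly to scikit-learn's `RFE`. The sketch below assumes a synthetic dataset and a logistic regression estimator, whose coefficient magnitudes serve as the importance ranking. Both are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption): 10 features, 3 of them informative.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# step=1 removes one feature per iteration; the estimator is refit each time.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3, step=1)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask of the 3 surviving features
print(rfe.ranking_)   # 1 = selected; larger values were eliminated earlier
X_selected = rfe.transform(X)
```

A larger `step` removes several features per round and trades ranking resolution for speed, which matters when the estimator is expensive to refit.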
Sequential Feature Selector (SFS):
- Forward selection: Start with no features; iteratively add the feature that most improves cross-validated performance.
- Backward elimination: Start with all features; iteratively remove the feature whose removal least degrades performance.
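Both search directions are available through scikit-learn's `SequentialFeatureSelector`. The following sketch assumes the Iris dataset, a k-nearest-neighbors estimator, and a target of two features, all chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)  # estimator choice is an assumption

# Forward selection: start from no features and add one at a time.
forward = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="forward", cv=5
).fit(X, y)

# Backward elimination: start from all features and remove one at a time.
backward = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="backward", cv=5
).fit(X, y)

print(forward.get_support())   # mask chosen by forward selection
print(backward.get_support())  # mask chosen by backward elimination
```

The two directions need not agree on the final subset, since each is a greedy search. Forward selection is usually cheaper when few features are wanted, and backward elimination when most are kept.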
Embedded Methods perform selection during training:
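SelectFromModel uses an estimator's learned feature importances or coefficients and keeps the features whose importance reaches a threshold: feature $j$ is selected when $|s_j| \ge \tau$, where $s_j$ is its coefficient or importance score and $\tau$ is the threshold.
For L1-regularized models, the penalty drives many coefficients exactly to zero, so the features with nonzero coefficients form a natural selection. For tree-based models, feature importance is typically measured by the total reduction in impurity contributed by each feature.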
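A sketch of both embedded variants with `SelectFromModel`, assuming a synthetic dataset, an L1-penalized `LinearSVC` as the sparse linear model, and a random forest as the tree-based model. The value `C=0.05` and the `"mean"` threshold are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data (an assumption): 20 features, 4 of them informative.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=4, n_redundant=0, random_state=0
)
X = StandardScaler().fit_transform(X)  # L1 penalties are sensitive to feature scale

# L1-regularized linear model: smaller C means stronger regularization
# and therefore more coefficients at exactly zero.
l1_model = LinearSVC(penalty="l1", dual=False, C=0.05, max_iter=5000)
l1_selector = SelectFromModel(l1_model).fit(X, y)
l1_mask = l1_selector.get_support()
print(l1_mask.sum(), "features kept by the L1 model")

# Tree-based model: keep features whose impurity-based importance
# is at least the mean importance across all features.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
rf_selector = SelectFromModel(forest, threshold="mean").fit(X, y)
rf_mask = rf_selector.get_support()
print(rf_mask.sum(), "features kept by the random forest")
```

The regularization strength and the threshold together control how aggressive the selection is, so in practice they are usually tuned alongside the downstream model's hyperparameters.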
Related Pages
- Implementation:Scikit_learn_Scikit_learn_SelectFromModel
- Implementation:Scikit_learn_Scikit_learn_RFE
- Implementation:Scikit_learn_Scikit_learn_SequentialFeatureSelector
- Implementation:Scikit_learn_Scikit_learn_SelectKBest
- Implementation:Scikit_learn_Scikit_learn_MutualInfoClassif
- Implementation:Scikit_learn_Scikit_learn_SelectorMixin
- Implementation:Scikit_learn_Scikit_learn_VarianceThreshold