

Principle:Scikit learn Scikit learn Feature Selection

From Leeroopedia


Knowledge Sources
Domains: Feature Engineering, Model Selection
Last Updated: 2026-02-08 15:00 GMT

Overview

Feature selection identifies and retains the most relevant features of a dataset while discarding redundant or irrelevant ones, which can improve model performance and interpretability.

Description

Feature selection reduces the dimensionality of the input space by keeping a subset of the original features, rather than constructing new features from combinations of them as feature-extraction methods such as PCA do. It can reduce overfitting, lowers computational cost, and keeps the model expressed in terms of the original, interpretable features. Feature selection methods fall into three categories: filter methods, which score features independently of any model; wrapper methods, which evaluate candidate subsets by the performance of a specific model; and embedded methods, which perform selection as part of model training. Feature selection is a critical component of the feature engineering pipeline, especially for high-dimensional datasets.
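A minimal sketch of the selection-versus-extraction distinction, assuming scikit-learn and its bundled iris dataset (the dataset and k=2 are illustrative choices, not taken from this page). SelectKBest returns columns copied verbatim from the input, whereas PCA returns new columns that are linear combinations of all inputs.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Feature selection: keep 2 of the 4 original columns, unchanged.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
X_selected = selector.transform(X)
kept = selector.get_support(indices=True)  # indices of retained columns

# Feature extraction (for contrast): PCA builds 2 new columns, each a
# linear combination of all 4 original features.
X_pca = PCA(n_components=2).fit_transform(X)

print("kept columns:", kept)
print("selected output equals original columns:",
      np.array_equal(X_selected, X[:, kept]))
```

Because the selected output is literally a column subset, the retained features keep their original meaning and units, which is where the interpretability benefit comes from.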

Usage

Use filter methods (SelectKBest, VarianceThreshold) for fast, model-agnostic feature screening as a preprocessing step. Use wrapper methods (RFE, SequentialFeatureSelector) when you want to optimize the feature subset for a specific estimator and can afford the extra cost of refitting that estimator many times. Use embedded methods (SelectFromModel with L1-regularized models or tree-based feature importances) when selection should be integrated with model training. Use mutual information-based scoring (mutual_info_classif, mutual_info_regression) when features may relate to the target non-linearly. Use VarianceThreshold as a simple baseline to remove constant or near-constant features.
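As an illustration of filter methods used as a preprocessing step, here is a sketch assuming scikit-learn's breast-cancer dataset, k=10, and a logistic-regression classifier (all arbitrary choices for the example). Placing the selectors inside a Pipeline means cross-validation refits them on each training fold, so the held-out fold never influences which features are chosen.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = make_pipeline(
    VarianceThreshold(threshold=0.0),         # filter: drop constant columns
    StandardScaler(),                         # put features on a common scale
    SelectKBest(score_func=f_classif, k=10),  # filter: keep 10 top ANOVA F scores
    LogisticRegression(max_iter=1000),
)

# Each CV fold refits both selectors on its own training split only.
scores = cross_val_score(pipe, X, y, cv=5)
print("mean CV accuracy:", scores.mean())

# After a final fit, inspect which columns the SelectKBest step kept
# (indices refer to the columns that reach it, i.e. after VarianceThreshold).
pipe.fit(X, y)
kept = pipe.named_steps["selectkbest"].get_support(indices=True)
print("kept column indices:", kept)
```

Swapping f_classif for mutual_info_classif (or SelectKBest for RFE or SelectFromModel) changes only one pipeline step, which makes it easy to compare filter, wrapper, and embedded approaches under the same cross-validation.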

Theoretical Basis

Filter Methods score each feature independently of any downstream model, typically with a univariate statistic:

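Variance Threshold: Remove features whose variance is at or below a threshold τ (scikit-learn's VarianceThreshold keeps only features with variance strictly greater than τ; the default τ = 0 removes constant features): $\mathrm{Var}(X_j) = \frac{1}{n}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2 \le \tau$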
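A small sketch, assuming scikit-learn and a hypothetical 4×3 toy matrix, that computes the 1/n variance from the formula by hand and compares it with VarianceThreshold.variances_. With τ = 0.05, the constant column (variance 0) and the slightly varying column (variance 0.0025) should be dropped, leaving only the last column (variance 5).

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: column 0 is constant, column 1 varies slightly,
# column 2 varies a lot.
X = np.array([
    [1.0, 0.0, 2.0],
    [1.0, 0.1, 8.0],
    [1.0, 0.0, 4.0],
    [1.0, 0.1, 6.0],
])

tau = 0.05
selector = VarianceThreshold(threshold=tau).fit(X)

# Per-feature variance with the 1/n normalisation used in the formula.
manual_var = ((X - X.mean(axis=0)) ** 2).mean(axis=0)

print("manual variances:  ", manual_var)
print("selector.variances_:", selector.variances_)
print("kept mask:", selector.get_support())
```

Note that the threshold is on the raw variance, so it is scale-dependent: features measured in large units can pass a threshold that the same information in small units would fail.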

SelectKBest: Select the k features with the highest scores according to a scoring function:

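  • ANOVA F-value (for classification): $F = \frac{\text{between-group variance}}{\text{within-group variance}}$, where each variance is a mean square (a sum of squares divided by its degrees of freedom)
  • Chi-squared test: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O and E are observed and expected frequencies; it requires non-negative features such as counts or term frequencies
  • Mutual information: $I(X;Y) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$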
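To make the first two scoring functions concrete, the sketch below restates the ANOVA F and chi-squared formulas in NumPy and checks them against scikit-learn's f_classif and chi2 with np.allclose, assuming the iris dataset (whose features are non-negative lengths, so chi2 applies). The helper names anova_f and chi2_stat are hypothetical, written only for this illustration; the chi-squared version treats each feature's per-class totals as the observed frequencies.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, f_classif

X, y = load_iris(return_X_y=True)  # 4 non-negative measurements (cm), 3 classes
classes = np.unique(y)


def anova_f(X, y):
    """Hypothetical helper: one-way ANOVA F per feature (between / within mean square)."""
    n, g = len(y), len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = sum((y == c).sum() * (X[y == c].mean(axis=0) - grand_mean) ** 2
                     for c in classes)
    ss_within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                    for c in classes)
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def chi2_stat(X, y):
    """Hypothetical helper: chi-squared per feature, feature values as per-class frequencies."""
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot labels, shape (n, g)
    observed = Y.T @ X                                  # O: per-class feature totals
    expected = np.outer(Y.mean(axis=0), X.sum(axis=0))  # E: class share x feature total
    return ((observed - expected) ** 2 / expected).sum(axis=0)


f_scores, _ = f_classif(X, y)
chi2_scores, _ = chi2(X, y)
print("ANOVA F matches f_classif:", np.allclose(anova_f(X, y), f_scores))
print("chi-squared matches chi2:", np.allclose(chi2_stat(X, y), chi2_scores))

# SelectKBest keeps the k highest-scoring columns under the chosen score_func.
kept = {}
for score_func in (f_classif, chi2):
    selector = SelectKBest(score_func=score_func, k=2).fit(X, y)
    kept[score_func.__name__] = selector.get_support(indices=True).tolist()
print(kept)
```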

Mutual information captures arbitrary (non-linear) dependencies between features and the target, unlike correlation-based measures.
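A sketch of that difference, assuming hypothetical synthetic data where the target is y = x0² with x0 uniform on [-1, 1] and x1 is unrelated noise. Because the relationship is symmetric, the Pearson correlation between x0 and y is close to zero, so the linear F-test from f_regression has little to detect, while the k-nearest-neighbour mutual-information estimate in mutual_info_regression should rate x0 far above x1.

```python
import numpy as np
from sklearn.feature_selection import f_regression, mutual_info_regression

rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(-1.0, 1.0, size=(n, 2))  # hypothetical synthetic features

# y depends on feature 0 through a symmetric, non-monotonic function;
# feature 1 is independent of y.
y = X[:, 0] ** 2

f_scores, _ = f_regression(X, y)                          # linear-dependence score
mi_scores = mutual_info_regression(X, y, random_state=0)  # MI estimate (nats)
pearson_r = np.corrcoef(X[:, 0], y)[0, 1]

print("Pearson r(x0, y):", pearson_r)
print("F scores: ", f_scores)
print("MI scores:", mi_scores)
```

The random_state argument fixes the small noise that mutual_info_regression adds to continuous features, so repeated runs give the same scores.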

Wrapper Methods search for optimal feature subsets by evaluating model performance:

Recursive Feature Elimination (RFE):

  1. Train the model on the current feature set (initially all features).
  2. Rank features by importance (e.g., coefficient magnitude, feature importance).
  3. Remove the least important feature(s).
  4. Repeat until the desired number of features is reached.
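The four steps above can be sketched as an explicit loop, assuming a LinearRegression model ranked by absolute coefficient, one feature removed per round, and synthetic data from make_regression (all illustrative choices). The hypothetical helper manual_rfe is compared against scikit-learn's RFE with step=1, which applies the same rule for a single-output linear model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)
n_keep = 3


def manual_rfe(X, y, n_keep):
    """Hypothetical helper: RFE steps 1-4 with one feature removed per round."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:                           # 4. repeat until n_keep left
        model = LinearRegression().fit(X[:, remaining], y)  # 1. train on current set
        importance = np.abs(model.coef_)                     # 2. rank by |coefficient|
        weakest = remaining[int(np.argmin(importance))]
        remaining.remove(weakest)                            # 3. drop the weakest feature
    return sorted(remaining)


rfe = RFE(LinearRegression(), n_features_to_select=n_keep, step=1).fit(X, y)
print("manual:     ", manual_rfe(X, y, n_keep))
print("sklearn RFE:", np.flatnonzero(rfe.support_).tolist())
print("ranking:", rfe.ranking_)  # 1 = selected; larger = eliminated earlier
```

Raw coefficient magnitude is only a fair importance measure when features share a scale; make_regression draws standard-normal features, but real data would usually be standardized first.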

Sequential Feature Selector (SFS):

  • Forward selection: Start with no features; iteratively add the feature that most improves cross-validated performance.
  • Backward elimination: Start with all features; iteratively remove the feature whose removal least degrades performance.
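Both search directions are available through scikit-learn's SequentialFeatureSelector. A minimal sketch, assuming the iris dataset, a 3-nearest-neighbour classifier, a target of 2 features, and 5-fold cross-validation (all arbitrary for the example):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

selected = {}
for direction in ("forward", "backward"):
    # Each candidate add/remove is scored by 5-fold cross-validated accuracy.
    sfs = SequentialFeatureSelector(knn, n_features_to_select=2,
                                    direction=direction, cv=5)
    sfs.fit(X, y)
    selected[direction] = sfs.get_support(indices=True).tolist()
    print(direction, selected[direction])
```

Because both searches are greedy, forward and backward selection are not guaranteed to return the same subset; backward elimination also costs more when the target size is much smaller than the total number of features.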

Embedded Methods perform selection during training:

SelectFromModel uses an estimator's learned feature importances or coefficients (in absolute value) and keeps the features whose importance is at least a threshold τ: $\text{selected} = \{\, j : |\hat{\beta}_j| \ge \tau \,\}$
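A sketch of the selection rule, assuming a Lasso model on synthetic make_regression data (alpha=1.0 is an arbitrary choice). It reads the fitted coefficients and the threshold that SelectFromModel used, reapplies |β̂_j| ≥ τ by hand, and checks that the result matches the selector's own mask. When no threshold is given, scikit-learn uses 1e-5 for Lasso-type and L1-penalized estimators and the mean importance otherwise.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

sfm = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
coef = sfm.estimator_.coef_  # beta_hat from the fitted Lasso
tau = sfm.threshold_         # threshold actually applied (default 1e-5 for Lasso)

# Reapply the rule |beta_hat_j| >= tau and compare with the selector's mask.
manual_mask = np.abs(coef) >= tau
print("threshold:", tau)
print("kept indices:", np.flatnonzero(sfm.get_support()))
print("manual mask matches:", np.array_equal(manual_mask, sfm.get_support()))
```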

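For L1-regularized models, the penalty drives many coefficients to exactly zero, so selection falls out of training naturally. For tree-based models, feature importance is typically measured by the total reduction in impurity contributed by each feature (mean decrease in impurity); scikit-learn normalizes these importances so that they sum to one.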
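Both embedded mechanisms can be observed directly. This sketch assumes the breast-cancer dataset standardized with StandardScaler, an L1-penalized LogisticRegression with C=0.1 (liblinear solver), and a 100-tree random forest selected with threshold="mean"; these settings are illustrative, not recommendations.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# L1 penalty: many coefficients are driven to exactly 0.0.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
n_zero = int(np.sum(l1.coef_ == 0.0))
print("exactly-zero L1 coefficients:", n_zero, "of", X.shape[1])

# Tree ensemble: impurity-based importances, keep those >= their mean.
sfm = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0),
                      threshold="mean").fit(X, y)
importances = sfm.estimator_.feature_importances_
mask = sfm.get_support()
print("importances sum to:", importances.sum())
print("features kept by the forest:", int(mask.sum()))
```

Impurity-based importances are computed on the training data and can favour high-cardinality features; permutation importance on held-out data is a common cross-check when that matters.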
