Principle: DistrictDataLabs Yellowbrick Recursive Feature Elimination
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Model_Selection, Hyperparameter_Tuning |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Recursive feature elimination (RFE) is a wrapper-based feature selection method that iteratively removes the least important features from a model, using cross-validation to identify the optimal subset size that maximizes predictive performance.
Description
Recursive feature elimination is a greedy feature selection algorithm that works by repeatedly fitting a model and removing the weakest feature(s) at each step. The process begins with all features and proceeds by (1) fitting the model, (2) ranking features by importance (via coef_ or feature_importances_ attributes), and (3) discarding the lowest-ranked feature(s). This cycle repeats until a specified number of features remains or until only one feature is left.
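The fit-rank-discard cycle above can be sketched with scikit-learn's `RFE` wrapper (a minimal sketch; the synthetic dataset and choice of `LogisticRegression` are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, only 3 of which are informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# Rank features by |coef_| and drop the weakest one per iteration (step=1),
# stopping when 3 features remain.
selector = RFE(LogisticRegression(max_iter=1000),
               n_features_to_select=3, step=1)
selector.fit(X, y)

print(selector.support_)   # boolean mask over columns: True = retained
print(selector.ranking_)   # 1 = selected; larger rank = eliminated earlier
```

The `ranking_` attribute records the elimination order, which is often as informative as the final mask itself.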
When combined with cross-validation (RFECV), the algorithm evaluates model performance at each feature subset size. For every candidate number of features, the model is trained and scored using k-fold cross-validation. This produces a curve of cross-validated scores as a function of the number of selected features. The optimal number of features is the point where the cross-validated score is maximized. The final model is then refit with that optimal feature subset.
The step parameter controls how aggressively features are removed at each iteration. A step of 1 removes one feature at a time (most thorough but slowest), while larger values or fractional values (interpreted as a percentage of remaining features) speed up the process at the cost of granularity. The shape of the resulting score-vs-features curve provides diagnostic information: a flat curve suggests the model is insensitive to many features (possible redundancy), while a curve that drops sharply after the optimum indicates that remaining features are important.
Usage
Recursive feature elimination should be used when:
- You want to identify the smallest feature set that preserves or maximizes model performance.
- You have a high-dimensional dataset and need to reduce dimensionality for interpretability or computational efficiency.
- You want a model-aware feature selection approach that accounts for feature interactions (unlike univariate filter methods).
- You need to determine how many features are truly necessary for your specific estimator.
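In the dimensionality-reduction use case, the fitted selector doubles as a transformer that keeps only the optimal columns. A usage sketch (synthetic regression data; parameter values are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=0.1, random_state=0)

# Fit RFECV, then project X onto the selected feature subset in one call.
selector = RFECV(LinearRegression(), step=1, cv=5, scoring="r2")
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)
print("selected columns:", selector.get_support(indices=True))
```

The reduced matrix can then be fed to any downstream estimator, keeping the pipeline interpretable in terms of the original column indices.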
Theoretical Basis
RFE is a backward elimination strategy. Starting with the full feature set $S_0 = \{1, 2, \dots, p\}$, the algorithm iterates:
- Train the model on the current feature set $S_t$.
- Compute feature importance scores $w_j$ for all $j \in S_t$.
- Remove the $k$ features with the smallest $|w_j|$.
- Set $S_{t+1} = S_t \setminus \{\text{removed features}\}$.
The step parameter $k$ controls the number of features removed per iteration. For $k = 1$, there are $p - 1$ iterations total.
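The relationship between the step size and the number of elimination rounds can be checked numerically (a sketch; the helper `n_iterations` and the example values of $p$ and $k$ are illustrative, not part of any library API):

```python
import math

def n_iterations(p, n_select, step):
    """Number of elimination rounds needed to go from p features down to
    n_select, removing `step` features per round (the last round may
    remove fewer)."""
    return math.ceil((p - n_select) / step)

print(n_iterations(30, 1, 1))   # step=1: one feature per round
print(n_iterations(30, 1, 5))   # step=5: far fewer, coarser rounds
```

With $p = 30$ and $n_{\text{select}} = 1$, step 1 needs 29 rounds while step 5 needs only 6, which is the speed-vs-granularity trade-off described above.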
With cross-validation, at each candidate subset size $n$ the model is evaluated using $K$-fold CV:

$$\bar{s}(n) = \frac{1}{K} \sum_{i=1}^{K} s_i(n)$$

where $s_i(n)$ is the validation score on fold $i$ using the top-$n$ features. The optimal number of features is:

$$n^* = \arg\max_{n} \bar{s}(n)$$
The computational complexity of RFECV is approximately $O\!\left(K \cdot \frac{p}{k} \cdot T\right)$, where $T$ is the cost of fitting the base estimator once, $K$ is the number of CV folds, $p$ is the number of features, and $k$ is the step size.