Principle: DistrictDataLabs Yellowbrick Recursive Feature Elimination
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Model_Selection, Hyperparameter_Tuning |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Recursive feature elimination (RFE) is a wrapper-based feature selection method that iteratively removes the least important features from a model, using cross-validation to identify the optimal subset size that maximizes predictive performance.
Description
Recursive feature elimination is a greedy feature selection algorithm that works by repeatedly fitting a model and removing the weakest feature(s) at each step. The process begins with all features and proceeds by (1) fitting the model, (2) ranking features by importance (via coef_ or feature_importances_ attributes), and (3) discarding the lowest-ranked feature(s). This cycle repeats until a specified number of features remains or until only one feature is left.
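The fit-rank-discard cycle above can be sketched with scikit-learn's `RFE` wrapper (a minimal sketch; the synthetic dataset and choice of `LogisticRegression` are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, only 3 of which are informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# Rank features by |coef_| and drop the weakest one per iteration (step=1),
# stopping when 3 features remain.
selector = RFE(LogisticRegression(max_iter=1000),
               n_features_to_select=3, step=1)
selector.fit(X, y)

print(selector.support_)   # boolean mask over columns: True = retained
print(selector.ranking_)   # 1 = selected; larger rank = eliminated earlier
```

The `ranking_` attribute records the elimination order, which is often as informative as the final mask itself.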
When combined with cross-validation (RFECV), the algorithm evaluates model performance at each feature subset size. For every candidate number of features, the model is trained and scored using k-fold cross-validation. This produces a curve of cross-validated scores as a function of the number of selected features. The optimal number of features is the point where the cross-validated score is maximized. The final model is then refit with that optimal feature subset.
The step parameter controls how aggressively features are removed at each iteration. A step of 1 removes one feature at a time (most thorough but slowest), while larger values or fractional values (interpreted as a percentage of remaining features) speed up the process at the cost of granularity. The shape of the resulting score-vs-features curve provides diagnostic information: a flat curve suggests the model is insensitive to many features (possible redundancy), while a curve that drops sharply after the optimum indicates that remaining features are important.
Usage
Recursive feature elimination should be used when:
- You want to identify the smallest feature set that preserves or maximizes model performance.
- You have a high-dimensional dataset and need to reduce dimensionality for interpretability or computational efficiency.
- You want a model-aware feature selection approach that accounts for feature interactions (unlike univariate filter methods).
- You need to determine how many features are truly necessary for your specific estimator.
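In the dimensionality-reduction use case, the fitted selector doubles as a transformer that keeps only the optimal columns. A usage sketch (synthetic regression data; parameter values are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=0.1, random_state=0)

# Fit RFECV, then project X onto the selected feature subset in one call.
selector = RFECV(LinearRegression(), step=1, cv=5, scoring="r2")
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)
print("selected columns:", selector.get_support(indices=True))
```

The reduced matrix can then be fed to any downstream estimator, keeping the pipeline interpretable in terms of the original column indices.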
Theoretical Basis
RFE is a backward elimination strategy. Starting with the full feature set $S_0 = \{1, 2, \dots, p\}$, the algorithm iterates:
- Train the model on the current feature set $S_t$.
- Compute feature importance scores $w_j$ for all $j \in S_t$.
- Remove the $k$ features with the smallest $|w_j|$.
- Set $S_{t+1} = S_t \setminus \{\text{removed features}\}$.
The step parameter $k$ controls the number of features removed per iteration. For $k = 1$, there are $p - 1$ iterations total.
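The relationship between the step size and the number of elimination rounds can be checked numerically (a sketch; the helper `n_iterations` and the example values of $p$ and $k$ are illustrative, not part of any library API):

```python
import math

def n_iterations(p, n_select, step):
    """Number of elimination rounds needed to go from p features down to
    n_select, removing `step` features per round (the last round may
    remove fewer)."""
    return math.ceil((p - n_select) / step)

print(n_iterations(30, 1, 1))   # step=1: one feature per round
print(n_iterations(30, 1, 5))   # step=5: far fewer, coarser rounds
```

With $p = 30$ and $n_{\text{select}} = 1$, step 1 needs 29 rounds while step 5 needs only 6, which is the speed-vs-granularity trade-off described above.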
With cross-validation, at each candidate subset size $n$ the model is evaluated using $K$-fold CV:

$$\bar{s}(n) = \frac{1}{K} \sum_{i=1}^{K} s_i(n)$$

where $s_i(n)$ is the validation score on fold $i$ using the top-$n$ features. The optimal number of features is:

$$n^* = \arg\max_{n} \bar{s}(n)$$
The computational complexity of RFECV is approximately $O\!\left(K \cdot \frac{p}{k} \cdot T\right)$, where $T$ is the cost of fitting the base estimator once, $K$ is the number of CV folds, $p$ is the number of features, and $k$ is the step size.