
Implementation:DistrictDataLabs Yellowbrick RFECV Visualizer

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Model_Selection, Visualization
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete tool for performing recursive feature elimination with cross-validation and visualizing the optimal number of features, provided by the Yellowbrick library.

Description

The RFECV visualizer performs recursive feature elimination with cross-validation to determine the optimal number of features for a given estimator. It wraps scikit-learn's sklearn.feature_selection.RFE and sklearn.model_selection.cross_val_score internally (note: it does not wrap sklearn.feature_selection.RFECV because it needs access to the internals of both the CV and RFE processes for visualization).

The class extends ModelVisualizer from the Yellowbrick base module. When fit(X, y) is called, the visualizer creates feature subset sizes based on the total number of features and the step parameter. For each subset size, it configures an RFE instance, performs cross-validation, and collects the scores. The subset with the highest mean cross-validated score is selected as optimal. A final RFE model is fit with that optimal number of features and stored as rfe_estimator_, which is also set as the wrapped model so the visualizer can be used directly for predictions.
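The subset-size and best-subset selection logic described above can be sketched in plain Python. This is an illustrative sketch only; the helper names are made up and do not reflect Yellowbrick's actual internals, which operate on NumPy arrays and real cross-validation splits.

```python
# Hedged sketch of the selection logic described above.
# Function names are hypothetical, not Yellowbrick internals.

def feature_subset_sizes(n_features, step=1):
    """Candidate subset sizes from 1 up to n_features, spaced by `step`."""
    sizes = list(range(1, n_features + step, step))
    # Clamp the final candidate so it never exceeds the total feature count
    return [min(s, n_features) for s in sizes]

def pick_optimal(subset_sizes, cv_scores):
    """Return the subset size whose mean cross-validated score is highest."""
    means = [sum(row) / len(row) for row in cv_scores]
    best = max(range(len(means)), key=means.__getitem__)
    return subset_sizes[best]
```

For example, with 10 features and step=3 the candidate sizes would be [1, 4, 7, 10]; the real visualizer then configures an RFE instance per size, cross-validates it, and keeps the size with the highest mean score.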

The visualization plots the mean cross-validated score against the number of features selected, with a shaded band for one standard deviation. A vertical dashed line marks the optimal number of features.
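The quantities plotted are simple per-subset statistics over the CV splits. A minimal stdlib sketch of that computation (not Yellowbrick's plotting code) might look like:

```python
import statistics

def score_band(cv_scores):
    """Per-subset mean CV score plus a one-standard-deviation band,
    i.e. the curve and shaded region the visualizer draws."""
    means = [statistics.mean(row) for row in cv_scores]
    stds = [statistics.stdev(row) for row in cv_scores]
    lower = [m - s for m, s in zip(means, stds)]
    upper = [m + s for m, s in zip(means, stds)]
    return means, lower, upper
```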

Usage

Use this visualizer when you need to determine the optimal number of features for a model that exposes coef_ or feature_importances_ after fitting. The fitted visualizer can also serve as a predictor since it wraps the final RFE estimator.
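One quick way to check compatibility up front is to test for the required attributes on a fitted instance. The helper and the stand-in classes below are illustrative, not part of Yellowbrick's API:

```python
def supports_rfe(fitted_estimator):
    """RFE-style elimination needs per-feature importances after fitting."""
    return hasattr(fitted_estimator, "coef_") or hasattr(
        fitted_estimator, "feature_importances_"
    )

class FittedLinearModel:
    """Stand-in for a fitted model exposing coef_ (e.g. LogisticRegression)."""
    coef_ = [0.2, -0.8]

class FittedKNN:
    """Stand-in for a model with no per-feature importances
    (e.g. KNeighborsClassifier), which RFE cannot rank."""
```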

Code Reference

Source Location

  • Repository: yellowbrick
  • File: yellowbrick/model_selection/rfecv.py
  • Class Lines: L35-261 (class), L140-142 (__init__), L153-219 (fit)
  • Quick Method Lines: L268-365

Signature

class RFECV(ModelVisualizer):
    def __init__(
        self, estimator, ax=None, step=1, groups=None, cv=None, scoring=None, **kwargs
    ):

Import

from yellowbrick.model_selection import RFECV

I/O Contract

Inputs

  • estimator (scikit-learn estimator, required): A model with coef_ or feature_importances_ after fitting. Cloned for each validation.
  • ax (matplotlib.Axes, optional): The axes object to plot on. Default: None (current axes).
  • step (int or float, optional): Number of features to remove per iteration (int >= 1) or fraction to remove (0.0 < float < 1.0). Default: 1.
  • groups (array-like, shape (n_samples,), optional): Group labels for train/test splitting. Default: None.
  • cv (int, CV generator, or iterable, optional): Cross-validation splitting strategy. Default: None (3-fold).
  • scoring (string, callable, or None, optional): Scoring metric. Default: None (estimator's default scorer).
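The two modes of the step parameter can be resolved to a per-iteration feature count. This sketch mirrors scikit-learn's documented RFE semantics for step; the function name is hypothetical:

```python
def features_removed_per_iter(step, n_features):
    """Resolve `step` to an integer count of features removed per iteration.
    A float in (0, 1) is a fraction of the total feature count, floored at 1,
    matching scikit-learn's RFE semantics."""
    if 0.0 < step < 1.0:
        return max(1, int(step * n_features))
    if step >= 1:
        return int(step)
    raise ValueError("step must be >= 1 or a fraction in (0.0, 1.0)")
```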

The fit(X, y) method accepts:

  • X (array-like, shape (n_samples, n_features), required): Training feature matrix.
  • y (array-like, shape (n_samples,), optional): Target values for classification or regression.

Outputs

  • n_features_ (int): The number of features in the selected optimal subset.
  • support_ (array, shape (n_features,)): Boolean mask of selected features.
  • ranking_ (array, shape (n_features,)): Feature ranking, where rank 1 indicates a selected feature.
  • cv_scores_ (array, shape (n_subsets, n_splits)): Cross-validation scores for each feature subset and CV split.
  • rfe_estimator_ (sklearn.feature_selection.RFE): The fitted RFE estimator wrapping the original model.
  • n_feature_subsets_ (array, shape (n_subsets,)): The number of features evaluated at each RFE iteration.
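The support_ mask is what you would use to reduce a dataset to the selected feature subset. A minimal, library-free sketch of that column selection:

```python
def apply_support(rows, support):
    """Keep only the columns flagged True in the boolean support mask,
    as when reducing X to the selected feature subset."""
    return [[value for value, keep in zip(row, support) if keep] for row in rows]
```

With NumPy arrays you would typically write X[:, viz.support_] instead, or call transform(X) on the fitted sklearn RFE stored in rfe_estimator_.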

Usage Examples

Basic Usage

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from yellowbrick.model_selection import RFECV

# Example data so the snippet runs end to end
X, y = make_classification(n_samples=500, n_features=15, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Create and fit the visualizer
viz = RFECV(RandomForestClassifier(n_estimators=100), cv=5, scoring="f1_weighted")
viz.fit(X_train, y_train)
viz.show()

# The fitted visualizer wraps the final RFE model, so it can predict directly
y_pred = viz.predict(X_test)

Quick Method

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from yellowbrick.model_selection import rfecv

# Example data so the snippet runs end to end
X, y = make_classification(n_samples=500, n_features=15, random_state=42)

# One-line alternative: fits, scores, and shows the plot, returning the visualizer
visualizer = rfecv(RandomForestClassifier(n_estimators=100), X, y, cv=5)

