Implementation: DistrictDataLabs Yellowbrick RFECV Visualizer
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Model_Selection, Visualization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for performing recursive feature elimination with cross-validation and visualizing the optimal number of features, provided by the Yellowbrick library.
Description
The RFECV visualizer performs recursive feature elimination with cross-validation to determine the optimal number of features for a given estimator. It wraps scikit-learn's sklearn.feature_selection.RFE and sklearn.model_selection.cross_val_score internally (note: it does not wrap sklearn.feature_selection.RFECV because it needs access to the internals of both the CV and RFE processes for visualization).
The class extends ModelVisualizer from the Yellowbrick base module. When fit(X, y) is called, the visualizer creates feature subset sizes based on the total number of features and the step parameter. For each subset size, it configures an RFE instance, performs cross-validation, and collects the scores. The subset with the highest mean cross-validated score is selected as optimal. A final RFE model is fit with that optimal number of features and stored as rfe_estimator_, which is also set as the wrapped model so the visualizer can be used directly for predictions.
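The fit procedure described above can be sketched with plain scikit-learn pieces. This is an illustrative approximation, not Yellowbrick's actual source: it evaluates an RFE instance at each subset size with cross_val_score and picks the size with the highest mean score (the dataset, estimator, and 3-fold CV here are assumptions for the example).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data standing in for X, y
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
n_features = X.shape[1]
step = 1
subset_sizes = np.arange(1, n_features + 1, step)

# For each candidate subset size, score an RFE-reduced model via CV
scores = []
for n in subset_sizes:
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=int(n))
    scores.append(cross_val_score(rfe, X, y, cv=3))
scores = np.array(scores)  # shape (n_subsets, n_splits), like cv_scores_

# The subset size with the highest mean CV score is "optimal"
optimal_n = int(subset_sizes[scores.mean(axis=1).argmax()])
```

A final `RFE(estimator, n_features_to_select=optimal_n).fit(X, y)` would then play the role of `rfe_estimator_`.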
The visualization plots the mean cross-validated score against the number of features selected, with a shaded band for one standard deviation. A vertical dashed line marks the optimal number of features.
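The plot itself can be reproduced with a few matplotlib calls. A minimal sketch, using made-up scores (the subset sizes and score matrix below are assumptions, not real results):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt

# Fabricated CV scores: rows are subset sizes, columns are CV splits
sizes = np.arange(1, 11)
scores = np.random.RandomState(0).uniform(0.6, 0.9, size=(10, 3))
means, stds = scores.mean(axis=1), scores.std(axis=1)

fig, ax = plt.subplots()
ax.plot(sizes, means, marker="o", label="mean CV score")
ax.fill_between(sizes, means - stds, means + stds, alpha=0.25)  # one-std band
ax.axvline(sizes[means.argmax()], ls="--", c="k")  # optimal feature count
ax.set(xlabel="Number of Features Selected", ylabel="Score")
```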
Usage
Use this visualizer when you need to determine the optimal number of features for a model that exposes coef_ or feature_importances_ after fitting. The fitted visualizer can also serve as a predictor since it wraps the final RFE estimator.
Code Reference
Source Location
- Repository: yellowbrick
- File: yellowbrick/model_selection/rfecv.py
- Class Lines: L35-261 (class), L140-142 (__init__), L153-219 (fit)
- Quick Method Lines: L268-365
Signature
class RFECV(ModelVisualizer):
def __init__(
self, estimator, ax=None, step=1, groups=None, cv=None, scoring=None, **kwargs
):
Import
from yellowbrick.model_selection import RFECV
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| estimator | scikit-learn estimator | Yes | A model with coef_ or feature_importances_ after fitting. Cloned for each validation. |
| ax | matplotlib.Axes | No | The axes object to plot on. Default: None (current axes). |
| step | int or float | No | Number of features to remove per iteration (int >= 1) or fraction to remove (0.0 < float < 1.0). Default: 1. |
| groups | array-like, shape (n_samples,) | No | Group labels for train/test splitting. Default: None. |
| cv | int, CV generator, or iterable | No | Cross-validation splitting strategy. Default: None (3-fold). |
| scoring | string, callable, or None | No | Scoring metric. Default: None (estimator's default scorer). |
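To make the step semantics concrete, here is a hypothetical helper (`subset_sizes` is not a Yellowbrick function) that maps the step parameter to the list of feature-subset sizes that would be evaluated, treating a fraction as a per-pass share of the total feature count:

```python
def subset_sizes(n_features, step):
    """Illustrative sketch: sizes evaluated for a given step (not library code)."""
    if isinstance(step, float) and 0.0 < step < 1.0:
        step = max(1, int(step * n_features))  # fraction -> feature count
    if step < 1:
        raise ValueError("step must be an int >= 1 or a float in (0, 1)")
    # Always include the full feature set as the largest subset
    return list(range(1, n_features, step)) + [n_features]

print(subset_sizes(10, 1))     # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(subset_sizes(10, 3))     # [1, 4, 7, 10]
print(subset_sizes(10, 0.25))  # [1, 3, 5, 7, 9, 10]
```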
The fit(X, y) method accepts:
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like, shape (n_samples, n_features) | Yes | Training feature matrix. |
| y | array-like, shape (n_samples,) | No | Target values for classification or regression. |
Outputs
| Name | Type | Description |
|---|---|---|
| n_features_ | int | The number of features in the selected optimal subset. |
| support_ | array, shape (n_features,) | Boolean mask of selected features. |
| ranking_ | array, shape (n_features,) | Feature ranking where rank 1 indicates a selected feature. |
| cv_scores_ | array, shape (n_subsets, n_splits) | Cross-validation scores for each feature subset and CV split. |
| rfe_estimator_ | sklearn.feature_selection.RFE | The fitted RFE estimator wrapping the original model. |
| n_feature_subsets_ | array, shape (n_subsets,) | The number of features evaluated at each RFE iteration. |
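Because rfe_estimator_ is a fitted sklearn.feature_selection.RFE, the support_ and ranking_ attributes follow RFE's conventions. A minimal sketch with plain RFE (the dataset and estimator here are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=6, n_informative=3,
                           random_state=0)
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)

print(rfe.support_)   # boolean mask over the 6 features
print(rfe.ranking_)   # rank 1 == selected feature
X_reduced = X[:, rfe.support_]  # keep only the selected subset
print(X_reduced.shape)          # (100, 3)
```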
Usage Examples
Basic Usage
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from yellowbrick.model_selection import RFECV
# Example data; any supervised dataset with a feature matrix and target works
X, y = make_classification(n_samples=500, n_features=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Create and fit the visualizer
viz = RFECV(RandomForestClassifier(n_estimators=100), cv=5, scoring="f1_weighted")
viz.fit(X_train, y_train)
viz.show()
# The fitted visualizer wraps the final RFE model, so it can predict directly
y_pred = viz.predict(X_test)
Quick Method
from sklearn.ensemble import RandomForestClassifier
from yellowbrick.model_selection import rfecv
# One call fits, draws, and returns the fitted RFECV visualizer
# (reusing X_train and y_train from the example above)
viz = rfecv(RandomForestClassifier(n_estimators=100), X_train, y_train, cv=5)