Implementation: DistrictDataLabs Yellowbrick DroppingCurve Visualizer
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Model_Selection, Visualization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for visualizing a random-feature dropping curve to assess model sensitivity to the number of input features, provided by the Yellowbrick library.
Description
The DroppingCurve visualizer selects random subsets of features at various sizes and evaluates the training and cross-validation performance of the wrapped model at each size. The result is a curve that shows how model performance scales with the number of available features. This is also referred to as a random-input-dropout curve or neuron dropping curve (NDC) in neural decoding research.
The class extends ModelVisualizer from the Yellowbrick base module. Internally, the visualizer constructs a pipeline that prepends a SelectKBest feature selector (with a random scoring function) to the user-provided estimator. It then leverages scikit-learn's sklearn.model_selection.validation_curve to sweep the selectkbest__k parameter across the specified feature sizes. This clever reuse of the validation curve machinery avoids reimplementing cross-validation logic.
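The internal mechanism described above can be sketched directly with scikit-learn. The snippet below is a simplified reconstruction, not the library's actual code: `random_scores` is a hypothetical stand-in for the visualizer's random scoring function, and the pipeline step name `selectkbest` is chosen to match the `selectkbest__k` parameter mentioned above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import validation_curve
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for the visualizer's random scoring function:
# each feature gets a random score, so SelectKBest keeps a random subset.
rng = np.random.RandomState(42)

def random_scores(X, y):
    return rng.rand(X.shape[1])

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("selectkbest", SelectKBest(score_func=random_scores)),
    ("model", GaussianNB()),
])

# Sweep the number of retained features, reusing validation_curve to
# handle cross-validation exactly as the visualizer does internally.
feature_sizes = np.arange(1, X.shape[1] + 1)
train_scores, valid_scores = validation_curve(
    pipe, X, y,
    param_name="selectkbest__k",
    param_range=feature_sizes,
    cv=5,
)
```

Each row of `train_scores` and `valid_scores` corresponds to one feature count and each column to one cross-validation fold, which is the raw material for the curve and its variance bands.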
When fit(X, y) is called, the visualizer converts fractional feature sizes (e.g. 0.1 to 1.0) into absolute counts based on the total number of features in X. For each feature count, cross-validated training and validation scores are computed. The mean scores and standard deviations are stored as attributes, and draw() renders training and cross-validation score curves with shaded variance bands. The x-axis can optionally use a logarithmic scale via the logx parameter.
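The fractional-to-absolute conversion can be illustrated with a short sketch. This is one plausible implementation of the conversion, not necessarily the library's exact code; `n_features = 30` is assumed for illustration (the breast cancer dataset's width).

```python
import numpy as np

# Default grid of fractional feature sizes, per the signature below
feature_sizes = np.linspace(0.1, 1.0, 5)

# Assumed total number of features in X (e.g., breast cancer dataset)
n_features = 30

# Scale fractions to absolute counts and drop duplicate ticks
absolute_sizes = np.unique((feature_sizes * n_features).astype(int))
```

With these assumptions, the five fractions 0.1, 0.325, 0.55, 0.775, and 1.0 map to 3, 9, 16, 23, and 30 features respectively.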
Usage
Use this visualizer when you want to understand how many features your model truly needs, or when you want to assess the robustness of your model to random feature dropout. It works with any scikit-learn estimator that implements fit and predict.
Code Reference
Source Location
- Repository: yellowbrick
- File: yellowbrick/model_selection/dropping_curve.py
- Class Lines: L36-299 (class), L149-162 (__init__), L186-253 (fit)
- Quick Method Lines: L307-426
Signature
class DroppingCurve(ModelVisualizer):
def __init__(
self,
estimator,
ax=None,
feature_sizes=DEFAULT_FEATURE_SIZES,
groups=None,
logx=False,
cv=None,
scoring=None,
n_jobs=None,
pre_dispatch='all',
random_state=None,
**kwargs
):
Import
from yellowbrick.model_selection import DroppingCurve
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| estimator | scikit-learn estimator | Yes | An object implementing fit and predict. Cloned for each validation. |
| ax | matplotlib.Axes | No | The axes object to plot on. Default: None (current axes). |
| feature_sizes | array-like, shape (n_values,) | No | Relative (float) or absolute (int) numbers of features to evaluate. Default: np.linspace(0.1, 1.0, 5). |
| groups | array-like, shape (n_samples,) | No | Group labels for train/test splitting. Default: None. |
| logx | boolean | No | If True, uses logarithmic scale for x-axis. Default: False. |
| cv | int, CV generator, or iterable | No | Cross-validation splitting strategy. Default: None (3-fold). |
| scoring | string, callable, or None | No | Scoring metric. Default: None (estimator's default scorer). |
| n_jobs | integer | No | Number of parallel jobs. Default: None. |
| pre_dispatch | integer or string | No | Number of predispatched jobs. Default: "all". |
| random_state | int, RandomState, or None | No | Seed for random feature selection. Default: None. |
The fit(X, y) method accepts:
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like, shape (n_samples, n_features) | Yes | Input feature matrix. |
| y | array-like, shape (n_samples,) | No | Target values. None for unsupervised learning. |
Outputs
| Name | Type | Description |
|---|---|---|
| feature_sizes_ | array, shape (n_unique_ticks,), dtype int | Absolute numbers of features used at each evaluation point. |
| train_scores_ | array, shape (n_ticks, n_cv_folds) | Raw scores on training sets for each feature size and fold. |
| train_scores_mean_ | array, shape (n_ticks,) | Mean training score for each feature size. |
| train_scores_std_ | array, shape (n_ticks,) | Standard deviation of training scores for each feature size. |
| valid_scores_ | array, shape (n_ticks, n_cv_folds) | Raw scores on validation sets for each feature size and fold. |
| valid_scores_mean_ | array, shape (n_ticks,) | Mean cross-validated score for each feature size. |
| valid_scores_std_ | array, shape (n_ticks,) | Standard deviation of cross-validated scores for each feature size. |
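The relationship between the raw score arrays and their summary attributes is a simple per-row aggregation. The sketch below uses a mock score matrix with assumed dimensions (5 ticks, 3 folds) rather than a fitted visualizer:

```python
import numpy as np

# Mock raw validation scores: one row per feature size, one column per fold
rng = np.random.RandomState(0)
valid_scores = rng.rand(5, 3)               # shape (n_ticks, n_cv_folds)

# The *_mean_ and *_std_ attributes summarize across folds (axis=1)
valid_scores_mean = valid_scores.mean(axis=1)   # shape (n_ticks,)
valid_scores_std = valid_scores.std(axis=1)     # shape (n_ticks,)
```

draw() plots the mean curve and uses the standard deviation for the shaded variance band at each feature size.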
Usage Examples
Basic Usage
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from yellowbrick.model_selection import DroppingCurve
# Load a dataset and split off a training set
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Create and fit the visualizer
viz = DroppingCurve(GaussianNB(), cv=5, scoring="accuracy", random_state=42)
viz.fit(X_train, y_train)
viz.show()
Quick Method
from sklearn.naive_bayes import GaussianNB
from yellowbrick.model_selection import dropping_curve
# The quick method fits, draws, and returns the visualizer in one call
viz = dropping_curve(GaussianNB(), X_train, y_train, cv=5, scoring="accuracy", random_state=42)