
Implementation:DistrictDataLabs Yellowbrick DroppingCurve Visualizer

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Model_Selection, Visualization
Last Updated 2026-02-08 00:00 GMT

Overview

A concrete tool from the Yellowbrick library for visualizing a random-feature dropping curve, which assesses a model's sensitivity to the number of input features.

Description

The DroppingCurve visualizer selects random subsets of features at various sizes and evaluates the training and cross-validation performance of the wrapped model at each size. The result is a curve that shows how model performance scales with the number of available features. This is also referred to as a random-input-dropout curve or neuron dropping curve (NDC) in neural decoding research.

The class extends ModelVisualizer from the Yellowbrick base module. Internally, the visualizer constructs a pipeline that prepends a SelectKBest feature selector (with a random scoring function) to the user-provided estimator. It then leverages scikit-learn's sklearn.model_selection.validation_curve to sweep the selectkbest__k parameter across the specified feature sizes. This clever reuse of the validation curve machinery avoids reimplementing cross-validation logic.

When fit(X, y) is called, the visualizer converts fractional feature sizes (e.g. 0.1 to 1.0) into absolute counts based on the total number of features in X. For each feature count, cross-validated training and validation scores are computed. The mean scores and standard deviations are stored as attributes, and draw() renders training and cross-validation score curves with shaded variance bands. The x-axis can optionally use a logarithmic scale via the logx parameter.
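The mechanism described above can be sketched directly with scikit-learn's public API. Note that the pipeline step name (selectkbest), the random scoring function, and the truncation-based rounding of fractional sizes are illustrative assumptions for this sketch, not Yellowbrick's exact internals:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import validation_curve
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)

# A scoring function that returns random scores, so SelectKBest keeps
# an effectively random subset of k features at each setting
rng = np.random.default_rng(0)
def random_scores(X, y):
    return rng.uniform(size=X.shape[1])

pipeline = Pipeline([
    ("selectkbest", SelectKBest(score_func=random_scores)),
    ("estimator", GaussianNB()),
])

# Convert fractional feature sizes into absolute counts (truncation
# here is an assumed rounding rule, for illustration only)
fractions = np.linspace(0.1, 1.0, 5)
feature_sizes = np.unique((fractions * X.shape[1]).astype(int))

# Sweep the selector's k across the sizes, reusing validation_curve's
# cross-validation machinery the same way the visualizer does
train_scores, valid_scores = validation_curve(
    pipeline, X, y,
    param_name="selectkbest__k",
    param_range=feature_sizes,
    cv=3,
)
train_scores_mean = train_scores.mean(axis=1)
valid_scores_mean = valid_scores.mean(axis=1)
```

Plotting valid_scores_mean against feature_sizes reproduces the essential shape of the dropping curve.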

Usage

Use this visualizer when you want to understand how many features your model truly needs, or when you want to assess the robustness of your model to random feature dropout. It works with any scikit-learn estimator that implements fit and predict.

Code Reference

Source Location

  • Repository: yellowbrick
  • File: yellowbrick/model_selection/dropping_curve.py
  • Class Lines: L36-299 (class), L149-162 (__init__), L186-253 (fit)
  • Quick Method Lines: L307-426

Signature

class DroppingCurve(ModelVisualizer):
    def __init__(
        self,
        estimator,
        ax=None,
        feature_sizes=DEFAULT_FEATURE_SIZES,
        groups=None,
        logx=False,
        cv=None,
        scoring=None,
        n_jobs=None,
        pre_dispatch='all',
        random_state=None,
        **kwargs
    ):
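Per the I/O contract below, DEFAULT_FEATURE_SIZES corresponds to np.linspace(0.1, 1.0, 5), i.e. five evenly spaced fractions of the feature count:

```python
import numpy as np

# Default sweep: five evenly spaced fractions from 10% to 100% of the
# available features: 0.1, 0.325, 0.55, 0.775, 1.0
DEFAULT_FEATURE_SIZES = np.linspace(0.1, 1.0, 5)
```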

Import

from yellowbrick.model_selection import DroppingCurve

I/O Contract

Inputs

Name Type Required Description
estimator scikit-learn estimator Yes An object implementing fit and predict. Cloned for each validation.
ax matplotlib.Axes No The axes object to plot on. Default: None (current axes).
feature_sizes array-like, shape (n_values,) No Relative (float) or absolute (int) numbers of features to evaluate. Default: np.linspace(0.1, 1.0, 5).
groups array-like, shape (n_samples,) No Group labels for train/test splitting. Default: None.
logx boolean No If True, uses logarithmic scale for x-axis. Default: False.
cv int, CV generator, or iterable No Cross-validation splitting strategy. Default: None (scikit-learn's default strategy, 5-fold since scikit-learn 0.22).
scoring string, callable, or None No Scoring metric. Default: None (estimator's default scorer).
n_jobs integer No Number of parallel jobs. Default: None.
pre_dispatch integer or string No Number of predispatched jobs. Default: "all".
random_state int, RandomState, or None No Seed for random feature selection. Default: None.

The fit(X, y) method accepts:

Name Type Required Description
X array-like, shape (n_samples, n_features) Yes Input feature matrix.
y array-like, shape (n_samples,) No Target values. None for unsupervised learning.

Outputs

Name Type Description
feature_sizes_ array, shape (n_unique_ticks,), dtype int Absolute numbers of features used at each evaluation point.
train_scores_ array, shape (n_ticks, n_cv_folds) Raw scores on training sets for each feature size and fold.
train_scores_mean_ array, shape (n_ticks,) Mean training score for each feature size.
train_scores_std_ array, shape (n_ticks,) Standard deviation of training scores for each feature size.
valid_scores_ array, shape (n_ticks, n_cv_folds) Raw scores on validation sets for each feature size and fold.
valid_scores_mean_ array, shape (n_ticks,) Mean cross-validated score for each feature size.
valid_scores_std_ array, shape (n_ticks,) Standard deviation of cross-validated scores for each feature size.
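The relationship between the raw per-fold score arrays and the aggregated mean/std attributes, along with the shaded variance bands that draw() renders, can be sketched as follows (the score values here are made up for illustration; this is not Yellowbrick's actual draw() code):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# Hypothetical raw scores: 4 feature sizes x 3 CV folds, mimicking the
# shape of the valid_scores_ attribute
feature_sizes = np.array([5, 10, 15, 20])
valid_scores = np.array([
    [0.60, 0.62, 0.58],
    [0.72, 0.70, 0.74],
    [0.80, 0.79, 0.81],
    [0.83, 0.84, 0.82],
])

# Aggregate across folds, as the *_mean_ / *_std_ attributes do
valid_mean = valid_scores.mean(axis=1)
valid_std = valid_scores.std(axis=1)

# The mean curve with a +/- one-standard-deviation shaded band
fig, ax = plt.subplots()
ax.plot(feature_sizes, valid_mean, marker="o", label="Cross Validation Score")
ax.fill_between(feature_sizes, valid_mean - valid_std,
                valid_mean + valid_std, alpha=0.25)
ax.set_xlabel("number of features")
ax.set_ylabel("score")
ax.legend()
```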

Usage Examples

Basic Usage

from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB
from yellowbrick.model_selection import DroppingCurve

# Load a dataset so the example runs end-to-end
X, y = load_digits(return_X_y=True)

# Create the visualizer, fit it, and render the dropping curve
viz = DroppingCurve(GaussianNB(), cv=5, scoring="accuracy", random_state=42)
viz.fit(X, y)
viz.show()

Quick Method

from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB
from yellowbrick.model_selection import dropping_curve

# Load a dataset so the example runs end-to-end
X, y = load_digits(return_X_y=True)

# Fits, draws, and returns the visualizer in a single call
viz = dropping_curve(GaussianNB(), X, y, cv=5, scoring="accuracy", random_state=42)

Related Pages

Implements Principle

Requires Environment
