Implementation: DistrictDataLabs Yellowbrick DroppingCurve Visualizer
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Model_Selection, Visualization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for visualizing a random-feature dropping curve to assess model sensitivity to the number of input features, provided by the Yellowbrick library.
Description
The DroppingCurve visualizer selects random subsets of features at various sizes and evaluates the training and cross-validation performance of the wrapped model at each size. The result is a curve that shows how model performance scales with the number of available features. This is also referred to as a random-input-dropout curve or neuron dropping curve (NDC) in neural decoding research.
The class extends ModelVisualizer from the Yellowbrick base module. Internally, the visualizer constructs a pipeline that prepends a SelectKBest feature selector (with a random scoring function) to the user-provided estimator. It then leverages scikit-learn's sklearn.model_selection.validation_curve to sweep the selectkbest__k parameter across the specified feature sizes. This clever reuse of the validation curve machinery avoids reimplementing cross-validation logic.
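The internal mechanism described above can be sketched directly with scikit-learn. The snippet below is a simplified reconstruction, not the library's actual code: `random_scores` is a hypothetical stand-in for the visualizer's random scoring function, and the pipeline step name `selectkbest` is chosen to match the `selectkbest__k` parameter mentioned above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import validation_curve
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for the visualizer's random scoring function:
# each feature gets a random score, so SelectKBest keeps a random subset.
rng = np.random.RandomState(42)

def random_scores(X, y):
    return rng.rand(X.shape[1])

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("selectkbest", SelectKBest(score_func=random_scores)),
    ("model", GaussianNB()),
])

# Sweep the number of retained features, reusing validation_curve to
# handle cross-validation exactly as the visualizer does internally.
feature_sizes = np.arange(1, X.shape[1] + 1)
train_scores, valid_scores = validation_curve(
    pipe, X, y,
    param_name="selectkbest__k",
    param_range=feature_sizes,
    cv=5,
)
```

Each row of `train_scores` and `valid_scores` corresponds to one feature count and each column to one cross-validation fold, which is the raw material for the curve and its variance bands.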
When fit(X, y) is called, the visualizer converts fractional feature sizes (e.g. 0.1 to 1.0) into absolute counts based on the total number of features in X. For each feature count, cross-validated training and validation scores are computed. The mean scores and standard deviations are stored as attributes, and draw() renders training and cross-validation score curves with shaded variance bands. The x-axis can optionally use a logarithmic scale via the logx parameter.
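The fractional-to-absolute conversion can be illustrated with a short sketch. This is one plausible implementation of the conversion, not necessarily the library's exact code; `n_features = 30` is assumed for illustration (the breast cancer dataset's width).

```python
import numpy as np

# Default grid of fractional feature sizes, per the signature below
feature_sizes = np.linspace(0.1, 1.0, 5)

# Assumed total number of features in X (e.g., breast cancer dataset)
n_features = 30

# Scale fractions to absolute counts and drop duplicate ticks
absolute_sizes = np.unique((feature_sizes * n_features).astype(int))
```

With these assumptions, the five fractions 0.1, 0.325, 0.55, 0.775, and 1.0 map to 3, 9, 16, 23, and 30 features respectively.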
Usage
Use this visualizer when you want to understand how many features your model truly needs, or when you want to assess the robustness of your model to random feature dropout. It works with any scikit-learn estimator that implements fit and predict.
Code Reference
Source Location
- Repository: yellowbrick
- File: yellowbrick/model_selection/dropping_curve.py
- Class Lines: L36-299 (class), L149-162 (__init__), L186-253 (fit)
- Quick Method Lines: L307-426
Signature
class DroppingCurve(ModelVisualizer):
def __init__(
self,
estimator,
ax=None,
feature_sizes=DEFAULT_FEATURE_SIZES,
groups=None,
logx=False,
cv=None,
scoring=None,
n_jobs=None,
pre_dispatch='all',
random_state=None,
**kwargs
):
Import
from yellowbrick.model_selection import DroppingCurve
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| estimator | scikit-learn estimator | Yes | An object implementing fit and predict. Cloned for each validation. |
| ax | matplotlib.Axes | No | The axes object to plot on. Default: None (current axes). |
| feature_sizes | array-like, shape (n_values,) | No | Relative (float) or absolute (int) numbers of features to evaluate. Default: np.linspace(0.1, 1.0, 5). |
| groups | array-like, shape (n_samples,) | No | Group labels for train/test splitting. Default: None. |
| logx | boolean | No | If True, uses logarithmic scale for x-axis. Default: False. |
| cv | int, CV generator, or iterable | No | Cross-validation splitting strategy. Default: None (3-fold). |
| scoring | string, callable, or None | No | Scoring metric. Default: None (estimator's default scorer). |
| n_jobs | integer | No | Number of parallel jobs. Default: None. |
| pre_dispatch | integer or string | No | Number of predispatched jobs. Default: "all". |
| random_state | int, RandomState, or None | No | Seed for random feature selection. Default: None. |
The fit(X, y) method accepts:
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like, shape (n_samples, n_features) | Yes | Input feature matrix. |
| y | array-like, shape (n_samples,) | No | Target values. None for unsupervised learning. |
Outputs
| Name | Type | Description |
|---|---|---|
| feature_sizes_ | array, shape (n_unique_ticks,), dtype int | Absolute numbers of features used at each evaluation point. |
| train_scores_ | array, shape (n_ticks, n_cv_folds) | Raw scores on training sets for each feature size and fold. |
| train_scores_mean_ | array, shape (n_ticks,) | Mean training score for each feature size. |
| train_scores_std_ | array, shape (n_ticks,) | Standard deviation of training scores for each feature size. |
| valid_scores_ | array, shape (n_ticks, n_cv_folds) | Raw scores on validation sets for each feature size and fold. |
| valid_scores_mean_ | array, shape (n_ticks,) | Mean cross-validated score for each feature size. |
| valid_scores_std_ | array, shape (n_ticks,) | Standard deviation of cross-validated scores for each feature size. |
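The relationship between the raw score arrays and their summary attributes is a simple per-row aggregation. The sketch below uses a mock score matrix with assumed dimensions (5 ticks, 3 folds) rather than a fitted visualizer:

```python
import numpy as np

# Mock raw validation scores: one row per feature size, one column per fold
rng = np.random.RandomState(0)
valid_scores = rng.rand(5, 3)               # shape (n_ticks, n_cv_folds)

# The *_mean_ and *_std_ attributes summarize across folds (axis=1)
valid_scores_mean = valid_scores.mean(axis=1)   # shape (n_ticks,)
valid_scores_std = valid_scores.std(axis=1)     # shape (n_ticks,)
```

draw() plots the mean curve and uses the standard deviation for the shaded variance band at each feature size.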
Usage Examples
Basic Usage
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from yellowbrick.model_selection import DroppingCurve
# Load a dataset and split off a training set
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Create and fit the visualizer
viz = DroppingCurve(GaussianNB(), cv=5, scoring="accuracy", random_state=42)
viz.fit(X_train, y_train)
viz.show()
Quick Method
from sklearn.naive_bayes import GaussianNB
from yellowbrick.model_selection import dropping_curve
# The quick method fits, draws, and returns the visualizer in one call
viz = dropping_curve(GaussianNB(), X_train, y_train, cv=5, scoring="accuracy", random_state=42)