Implementation:DistrictDataLabs Yellowbrick MissingValuesDispersion
| Knowledge Sources | |
|---|---|
| Domains | Data_Quality, Visualization |
| Last Updated | 2026-02-08 05:00 GMT |
Overview
A visualizer from the Yellowbrick contrib module that plots where missing values occur across the features and samples of a dataset.
Description
MissingValuesDispersion renders a scatter-style plot that marks the exact location of each missing (NaN) value in a dataset. Each feature occupies one horizontal row, and a marker is drawn at every sample index where that feature is missing. This exposes structure in the missingness, for example whether values are missing in contiguous blocks of samples or scattered at random.
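The data behind a dispersion plot is the list of (sample index, feature index) pairs at which a value is missing. The sketch below computes those pairs with NumPy; `nan_coordinates` is a hypothetical helper written for illustration, not part of Yellowbrick's API, and it assumes numeric input that can be cast to float.

```python
import numpy as np

def nan_coordinates(X):
    """Return (sample_indices, feature_indices) for every NaN in X.

    Hypothetical helper: a sketch of the positions a dispersion plot
    needs, not Yellowbrick's own implementation.
    """
    X = np.asarray(X, dtype=float)
    # np.argwhere yields one [row, column] pair per NaN, in row-major order
    locs = np.argwhere(np.isnan(X))
    return locs[:, 0], locs[:, 1]

X = np.array([
    [1.0, np.nan, 3.0],
    [np.nan, 2.0, 3.0],
    [np.nan, np.nan, 3.0],
])
samples, features = nan_coordinates(X)
# Each pair would become one marker: x = sample index, y = feature row
print(list(zip(samples.tolist(), features.tolist())))
# → [(0, 1), (1, 0), (2, 0), (2, 1)]
```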
Usage
Use this visualizer when you need to understand the pattern of missing data, not just the counts. It complements MissingValuesBar: the bar chart shows how many values are missing per feature, while the dispersion plot shows where they are missing.
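To illustrate why locations carry information that counts do not, the following NumPy sketch builds two made-up features with the same number of NaNs, one missing in a contiguous block and one scattered. A per-feature count cannot tell them apart, but the sample indices can. The `is_contiguous` helper is an illustrative assumption, not a Yellowbrick function.

```python
import numpy as np

n = 10
block = np.arange(n, dtype=float)
block[3:6] = np.nan                 # missing for samples 3, 4, 5
scattered = np.arange(n, dtype=float)
scattered[[0, 4, 9]] = np.nan       # missing for samples 0, 4, 9
X = np.column_stack([block, scattered])

mask = np.isnan(X)
counts = mask.sum(axis=0)           # the per-feature totals a bar chart shows
print(counts.tolist())              # [3, 3]: identical by count

# The sample indices of the NaNs (what a dispersion plot shows) differ
rows_block = np.flatnonzero(mask[:, 0])
rows_scattered = np.flatnonzero(mask[:, 1])

def is_contiguous(rows):
    """True if the sample indices form one unbroken run (hypothetical helper)."""
    return rows.size > 0 and bool(np.all(np.diff(rows) == 1))

print(rows_block.tolist(), is_contiguous(rows_block))          # [3, 4, 5] True
print(rows_scattered.tolist(), is_contiguous(rows_scattered))  # [0, 4, 9] False
```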
Code Reference
Source Location
- Repository: DistrictDataLabs_Yellowbrick
- File: yellowbrick/contrib/missing/dispersion.py
- Lines: 1-235
Signature
class MissingValuesDispersion(MissingDataVisualizer):
    def __init__(self, alpha=0.5, marker="|", classes=None, **kwargs):
        """Missing values dispersion plot visualizer."""
def missing_dispersion(X, y=None, ax=None, classes=None, alpha=0.5, marker="|", **kwargs):
    """Quick method for one-off missing values dispersion visualization."""
Import
from yellowbrick.contrib.missing import MissingValuesDispersion
from yellowbrick.contrib.missing.dispersion import missing_dispersion
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like or DataFrame | Yes | Feature data with potential NaN values |
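| y | array-like | No | Target labels; when given, markers are colored by class |
| alpha | float | No | Marker transparency (default: 0.5) |
| marker | str | No | Matplotlib marker style used for missing values (default: "\|") |
| classes | list of str | No | Class names used in the legend when y is provided |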
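When `y` is supplied, markers are colored by class. One way to picture the data behind that is to group the NaN positions by each sample's label, as in this sketch. `nan_coordinates_by_class` is a hypothetical helper, and grouping with `np.unique` is an assumption about how per-class colors could be assigned, not a description of Yellowbrick's internals.

```python
import numpy as np

def nan_coordinates_by_class(X, y):
    """Group NaN (sample, feature) positions by the label of each sample.

    Hypothetical helper: one group per class, i.e. one marker color per class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = {}
    for label in np.unique(y):
        # Keep the original sample indices so x positions stay correct
        rows = np.flatnonzero(y == label)
        locs = np.argwhere(np.isnan(X[rows]))
        groups[label.item()] = [(int(rows[r]), int(c)) for r, c in locs]
    return groups

X = np.array([
    [np.nan, 1.0],
    [2.0, np.nan],
    [np.nan, np.nan],
    [4.0, 4.0],
])
y = np.array([0, 1, 1, 0])
print(nan_coordinates_by_class(X, y))
# → {0: [(0, 0)], 1: [(1, 1), (2, 0), (2, 1)]}
```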
Outputs
| Name | Type | Description |
|---|---|---|
| ax | matplotlib.Axes | Axes with dispersion scatter plot |
Usage Examples
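import numpy as np
import pandas as pd
from yellowbrick.contrib.missing import MissingValuesDispersion
# Toy data: each column is missing values at different sample indices
df = pd.DataFrame({
"A": [1, np.nan, 3, np.nan, 5, 6, np.nan, 8],
"B": [np.nan, 2, 3, 4, 5, np.nan, 7, 8],
"C": [1, 2, np.nan, 4, np.nan, 6, 7, np.nan],
})
# Fit on the raw array; each NaN is drawn as a marker at (sample index, feature)
viz = MissingValuesDispersion()
viz.fit(df.values)
# Finalize and render the figure
viz.show()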
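As a quick check of what the plot above should display, this pandas sketch lists, for each feature, the sample indices that should receive a marker. It assumes the default integer index, so row positions and index labels coincide.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, np.nan, 3, np.nan, 5, 6, np.nan, 8],
    "B": [np.nan, 2, 3, 4, 5, np.nan, 7, 8],
    "C": [1, 2, np.nan, 4, np.nan, 6, 7, np.nan],
})
# For each feature (one row of the plot), the sample positions holding NaN
expected = {
    col: np.flatnonzero(df[col].isna().to_numpy()).tolist() for col in df.columns
}
print(expected)
# → {'A': [1, 3, 6], 'B': [0, 5], 'C': [2, 4, 7]}
```

Features A and C each have three scattered gaps and B has two, so the figure should show no contiguous block of missing values.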