Implementation:DistrictDataLabs Yellowbrick DispersionPlot
| Knowledge Sources | |
|---|---|
| Domains | NLP, Visualization |
| Last Updated | 2026-02-08 05:00 GMT |
Overview
Concrete tool for visualizing the lexical dispersion of search terms across documents in a corpus, provided by the Yellowbrick text module.
Description
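The DispersionPlot visualizes where specified search terms appear across the documents of a corpus. Each search term occupies its own row on the y-axis, and a marker is drawn at every word offset where the term occurs. Offsets are counted cumulatively from the start of the corpus rather than restarting in each document. Optionally, vertical lines mark document boundaries (`annotate_docs=True`), and markers are colored by target class when `y` is passed to `fit`.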
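The geometry of the plot can be approximated in a few lines of plain Python. The sketch below is a hypothetical re-implementation, not Yellowbrick's code: `dispersion_points` is an illustrative name, and it assumes each document is already a list of tokens. It walks the corpus once and advances a word offset that keeps counting across documents. For every occurrence of a search term it emits an `(offset, row, target)` tuple. It also records the last offset of each document as a boundary, which corresponds to what `annotate_docs` draws. The row order (first search term on the top row) mirrors, as far as we can tell, how Yellowbrick reverses the term list.

```python
from itertools import repeat
from typing import Hashable, Iterable, List, Optional, Sequence, Tuple

Point = Tuple[int, int, Optional[Hashable]]


def dispersion_points(
    search_terms: Sequence[str],
    docs: Iterable[Sequence[str]],
    y: Optional[Iterable[Hashable]] = None,
    ignore_case: bool = False,
) -> Tuple[List[Point], List[int]]:
    """Hypothetical helper: return (points, boundaries) for a dispersion plot.

    Each point is (word_offset, row, target). The term list is reversed so
    that the first search term gets the highest row, i.e. the top of the plot.
    """
    terms = list(reversed(search_terms))  # row 0 = last term (bottom row)
    if ignore_case:
        terms = [t.lower() for t in terms]

    targets = repeat(None) if y is None else y
    points: List[Point] = []
    boundaries: List[int] = []
    offset = 0  # cumulative 1-based word position across the whole corpus

    for doc, target in zip(docs, targets):
        for word in doc:
            offset += 1
            if ignore_case:
                word = word.lower()
            # A term listed twice gets a marker on each of its rows.
            for row, term in enumerate(terms):
                if term == word:
                    points.append((offset, row, target))
        boundaries.append(offset)  # offset of the last word in this document

    return points, boundaries


# Usage: three tiny tokenized documents with one class label each.
corpus = [["The", "film", "was", "a", "hit"], ["game", "night"], ["film", "and", "game"]]
points, boundaries = dispersion_points(["film", "game"], corpus, y=["a", "b", "b"])
# "film" is on row 1 (top), "game" on row 0; offsets keep counting across docs.
```

Because offsets never reset, the x-axis shows position within the corpus as a whole, so a term that clusters in a few documents appears as a dense band of markers rather than an even spread.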
Usage
Use this visualizer to see where specific terms occur in a corpus and whether they are spread evenly across it or concentrated in particular documents or regions.
Code Reference
Source Location
- Repository: DistrictDataLabs_Yellowbrick
- File: yellowbrick/text/dispersion.py
- Lines: 1-421
Signature
```python
class DispersionPlot(TextVisualizer):
    def __init__(
        self,
        search_terms,
        ax=None,
        colors=None,
        colormap=None,
        ignore_case=False,
        annotate_docs=False,
        labels=None,
        **kwargs,
    ):
        """Lexical dispersion plot for search terms across a corpus."""


def dispersion(
    search_terms, corpus, y=None, ax=None, colors=None, colormap=None,
    annotate_docs=False, ignore_case=False, labels=None, show=True, **kwargs,
):
    """Quick method for one-off dispersion visualization."""
```
Import
```python
from yellowbrick.text import DispersionPlot
from yellowbrick.text.dispersion import dispersion
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
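| search_terms | list of str | Yes | Words to track across the corpus; each term gets its own row |
| X | list (or generator) of token lists | Yes | Corpus passed to `fit`; each document must be a sequence of words in order, not a raw string |
| y | array-like | No | Target labels passed to `fit`; markers are colored by class |
| ignore_case | bool | No | Case-insensitive matching (default: False) |
| annotate_docs | bool | No | Draw vertical lines at document boundaries (default: False) |
| labels | list of str | No | Class names used for the legend; must match the classes in sorted order |
| colors | list or tuple of colors | No | Colors for the individual classes |
| colormap | str or matplotlib colormap | No | Qualitative colormap for a discrete target |
| ax | matplotlib.Axes | No | Axes to draw on (default: None) |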
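`fit` iterates over each document word by word, so every document should already be a sequence of tokens. A raw string is iterated character by character, so multi-character search terms such as `"film"` never match. The sketch below shows the difference in plain Python. `tokenize` is a hypothetical helper that splits on whitespace and strips surrounding punctuation; it is a deliberately naive stand-in for whatever tokenizer suits the corpus.

```python
import string
from typing import List


def tokenize(doc: str) -> List[str]:
    """Hypothetical whitespace tokenizer that strips edge punctuation."""
    tokens = (tok.strip(string.punctuation) for tok in doc.split())
    return [tok for tok in tokens if tok]  # drop tokens that were pure punctuation


raw_corpus = ["A new film, and a new game.", "Sport: the book of the year!"]

# Iterating a raw string yields single characters, so "film" can never match.
chars = list(raw_corpus[0])

# Tokenized documents are lists of words, which is the shape fit expects for X.
token_docs = [tokenize(doc) for doc in raw_corpus]
```

Note that the second document yields `"Sport"` with a capital letter; with the default `ignore_case=False`, a search term `"sport"` would not match it.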
Outputs
| Name | Type | Description |
|---|---|---|
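| ax | matplotlib.Axes | Axes holding the dispersion plot, exposed as the visualizer's `ax` attribute |
| (return value) | DispersionPlot | `fit` returns the visualizer itself; the `dispersion` quick method returns the fitted visualizer |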
Usage Examples
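```python
from yellowbrick.text import DispersionPlot
from yellowbrick.datasets import load_hobbies

# Load the hobbies corpus; corpus.data is a list of raw document strings
corpus = load_hobbies()

# fit expects each document as a list of words, so tokenize first
docs = [doc.split() for doc in corpus.data]

# Each search term gets its own row in the plot
viz = DispersionPlot(["film", "game", "sport", "book"])
viz.fit(docs)
viz.show()
```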