Implementation:DistrictDataLabs Yellowbrick WordCorrelationPlot
| Knowledge Sources | |
|---|---|
| Domains | NLP, Visualization |
| Last Updated | 2026-02-08 05:00 GMT |
Overview
Concrete tool for visualizing word co-occurrence correlation as a heatmap across a text corpus, provided by the Yellowbrick text module.
Description
The WordCorrelationPlot computes pairwise Pearson correlation coefficients between the specified words, based on their co-occurrence patterns across documents. It renders the result as a heatmap with color-coded correlation values and an optional colorbar. The underlying computation uses scikit-learn's CountVectorizer to build a document-term matrix.
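The computation described above can be illustrated with a minimal, dependency-free sketch. It is not the library's code: it swaps CountVectorizer for a simple regex tokenizer and assumes that co-occurrence is measured as binary presence of each word per document (in which case the Pearson coefficient equals the phi coefficient). The helper names `presence_columns`, `pearson`, and `correlation_matrix` are hypothetical.

```python
import math
import re


def presence_columns(words, docs, ignore_case=False):
    """Return one 0/1 column per word: 1 if the word occurs in that document."""
    if ignore_case:
        words = [w.lower() for w in words]
        docs = [d.lower() for d in docs]
    # Assumption: a plain \w+ tokenizer stands in for CountVectorizer here.
    token_sets = [set(re.findall(r"\w+", d)) for d in docs]
    return [[1 if w in tokens else 0 for tokens in token_sets] for w in words]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(words, docs, ignore_case=False):
    """Square matrix of pairwise correlations between word presence columns."""
    cols = presence_columns(words, docs, ignore_case)
    return [[pearson(a, b) for b in cols] for a in cols]


# Tiny made-up corpus, for illustration only.
docs = [
    "the game was a great sport event",
    "music and movie night",
    "a sport game on tv",
    "new movie with great music",
]
words = ["game", "sport", "music"]
matrix = correlation_matrix(words, docs)
for word, row in zip(words, matrix):
    print(word, [round(v, 2) for v in row])
```

In this toy corpus "game" and "sport" always appear together, so their coefficient is 1, while "game" and "music" never share a document, giving -1. A word that appears in every document or in none has zero variance, and the sketch returns nan for it.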
Usage
Import this visualizer when analyzing relationships between specific words in a text corpus. It is useful for understanding which terms tend to appear together in documents.
Code Reference
Source Location
- Repository: DistrictDataLabs_Yellowbrick
- File: yellowbrick/text/correlation.py
- Lines: 1-349
Signature
class WordCorrelationPlot(TextVisualizer):
    def __init__(
        self,
        words,
        ignore_case=False,
        ax=None,
        cmap="RdYlBu",
        colorbar=True,
        fontsize=None,
        **kwargs,
    ):
        """Visualizes word correlation as a heatmap."""

def word_correlation(
    words, corpus, ignore_case=True, ax=None, cmap="RdYlBu",
    show=True, colorbar=True, fontsize=None, **kwargs,
):
    """Quick method for one-off word correlation visualization."""
Import
from yellowbrick.text import WordCorrelationPlot
from yellowbrick.text.correlation import word_correlation
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| words | list of str | Yes | Words to compute correlations for |
| X | list of str | Yes | Corpus of documents (fit) |
| ignore_case | bool | No | Case-insensitive matching (default: False for WordCorrelationPlot, True for word_correlation) |
| cmap | str | No | Colormap for heatmap (default: "RdYlBu") |
Outputs
| Name | Type | Description |
|---|---|---|
| ax | matplotlib.Axes | Axes with correlation heatmap |
Usage Examples
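from yellowbrick.text import WordCorrelationPlot
from yellowbrick.datasets import load_hobbies

# Load the hobbies sample corpus
corpus = load_hobbies()

# Words whose pairwise correlations will be plotted
words = ["game", "sport", "music", "movie"]

# Fit on the raw documents, then render the heatmap
viz = WordCorrelationPlot(words)
viz.fit(corpus.data)
viz.show()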
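For one-off plots, the `word_correlation` quick method from the signature above can be used in place of the class. The following is a hedged sketch: the wrapper name `plot_word_correlation` is hypothetical, and the argument values are illustrative choices rather than recommendations. The import is deferred so that defining the helper does not itself require Yellowbrick to be installed.

```python
def plot_word_correlation(words, docs):
    """Draw a word correlation heatmap for `docs` via the quick method."""
    # Lazy import: Yellowbrick (and matplotlib) are only needed when called.
    from yellowbrick.text.correlation import word_correlation

    # show=False leaves displaying or saving the figure to the caller;
    # ignore_case=True matches the quick method's default in the signature.
    return word_correlation(words, docs, ignore_case=True, show=False)
```

With the hobbies corpus loaded as in the example above, this could be called as `plot_word_correlation(["game", "sport", "music", "movie"], corpus.data)`.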