Implementation:DistrictDataLabs Yellowbrick TSNEVisualizer
| Knowledge Sources | |
|---|---|
| Domains | NLP, Visualization, Dimensionality_Reduction |
| Last Updated | 2026-02-08 05:00 GMT |
Overview
Concrete tool for visualizing document similarity in 2D space using t-SNE dimensionality reduction, provided by the Yellowbrick text module.
Description
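The TSNEVisualizer applies t-SNE (t-distributed Stochastic Neighbor Embedding) to high-dimensional document vectors and draws a 2D scatter plot in which similar documents tend to land close together. To make the projection faster, it can first reduce the vectors to `decompose_by` dimensions (50 by default) with truncated SVD (`decompose="svd"`, the default) or PCA (`decompose="pca"`). Passing `decompose=None` skips this step and feeds the original vectors to t-SNE. Points are colored by their document class label.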
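The computation described above can be approximated with scikit-learn and matplotlib directly. This is a sketch under assumptions, not Yellowbrick's own code. It uses `TruncatedSVD` for the pre-reduction, `TSNE` for the 2D embedding, and one `scatter` call per class for coloring. The matrix and labels are random stand-ins, so no real clusters should be expected.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE

# Stand-in corpus: 60 "documents" x 200 "terms", sparse like a TF-IDF matrix,
# with three made-up class labels (assumption: not real data).
X = sp.random(60, 200, density=0.1, format="csr", random_state=0)
y = np.array(["books", "cooking", "sports"] * 20)

# Pre-reduce to 50 dimensions with TruncatedSVD (accepts sparse input),
# then embed the reduced vectors in 2D with t-SNE.
reduced = TruncatedSVD(n_components=50, random_state=0).fit_transform(X)
points = TSNE(n_components=2, random_state=0).fit_transform(reduced)

# One scatter call per class so each label gets its own color and legend entry.
fig, ax = plt.subplots()
for label in np.unique(y):
    mask = y == label
    ax.scatter(points[mask, 0], points[mask, 1], alpha=0.7, label=label)
ax.legend()
```

The per-class loop is one simple way to get a distinct color and legend entry per label; a single `scatter` with a numeric color array would also work but needs a separate legend.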
Usage
Use this visualizer when exploring document similarity in a text corpus. It takes an already vectorized document-term matrix (for example, TF-IDF output) and helps reveal clusters of similar documents.
Code Reference
Source Location
- Repository: DistrictDataLabs_Yellowbrick
- File: yellowbrick/text/tsne.py
- Lines: 1-428
Signature
```python
class TSNEVisualizer(TextVisualizer):
    def __init__(
        self,
        ax=None,
        decompose="svd",
        decompose_by=50,
        labels=None,
        classes=None,
        colors=None,
        colormap=None,
        random_state=None,
        alpha=0.7,
        **kwargs,
    ):
        """t-SNE document similarity visualizer."""

    def make_transformer(self, decompose="svd", decompose_by=50, tsne_kwargs={}):
        """Creates the decomposition + t-SNE pipeline."""


def tsne(
    X, y=None, ax=None, decompose="svd", decompose_by=50, labels=None,
    colors=None, colormap=None, alpha=0.7, show=True, **kwargs,
):
    """Quick method for one-off t-SNE visualization."""
```
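The `make_transformer` docstring only says that it creates the decomposition + t-SNE pipeline. A minimal sketch of such a builder is shown below, assuming it chains a scikit-learn `TruncatedSVD` or `PCA` step into a 2-component `TSNE` inside a `Pipeline`. `build_transformer` is a hypothetical name, not part of Yellowbrick, and it uses `None` instead of `{}` as the default for `tsne_kwargs` to avoid a shared mutable default.

```python
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.manifold import TSNE
from sklearn.pipeline import Pipeline


def build_transformer(decompose="svd", decompose_by=50, random_state=None, tsne_kwargs=None):
    """Hypothetical stand-in for make_transformer: optional pre-reduction + t-SNE."""
    decompositions = {"svd": TruncatedSVD, "pca": PCA}
    tsne_kwargs = dict(tsne_kwargs or {})  # copy so callers' dicts are never mutated

    steps = []
    if decompose is not None:
        key = decompose.lower()
        if key not in decompositions:
            raise ValueError(f"unknown decomposition {decompose!r}; use 'svd', 'pca', or None")
        # Linear reduction to decompose_by dimensions before t-SNE.
        estimator = decompositions[key](n_components=decompose_by, random_state=random_state)
        steps.append((key, estimator))

    # Final step always embeds into 2 dimensions for plotting.
    steps.append(("tsne", TSNE(n_components=2, random_state=random_state, **tsne_kwargs)))
    return Pipeline(steps)


svd_pipe = build_transformer("svd", 50)
print([name for name, _ in svd_pipe.steps])  # ['svd', 'tsne']

raw_pipe = build_transformer(None, tsne_kwargs={"perplexity": 5})
print([name for name, _ in raw_pipe.steps])  # ['tsne']
```

The builder returns an unfitted pipeline; calling `fit_transform(X)` on it yields an `(n_documents, 2)` array of coordinates.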
Import
```python
from yellowbrick.text import TSNEVisualizer
from yellowbrick.text.tsne import tsne
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | sparse or dense matrix | Yes | Document-term matrix passed to `fit()` |
| y | array-like | No | Document labels used to color the points |
| decompose | str or None | No | Pre-reduction before t-SNE: "svd", "pca", or None to skip it (default: "svd") |
| decompose_by | int | No | Number of dimensions kept by the pre-reduction (default: 50) |
| alpha | float | No | Point transparency (default: 0.7) |
Outputs
| Name | Type | Description |
|---|---|---|
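| viz | TSNEVisualizer | Fitted visualizer; `fit()` returns it, and the `tsne()` quick method returns it as well |
| ax | matplotlib.Axes | Axes holding the 2D t-SNE scatter plot, available as `viz.ax` |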
Usage Examples
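```python
from sklearn.feature_extraction.text import TfidfVectorizer
from yellowbrick.text import TSNEVisualizer
from yellowbrick.datasets import load_hobbies

# Load the hobbies corpus (raw documents in .data, class labels in .target)
corpus = load_hobbies()

# Vectorize the documents into a sparse TF-IDF document-term matrix
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(corpus.data)

# Fit the visualizer (SVD -> t-SNE by default) and render the plot
viz = TSNEVisualizer()
viz.fit(X, corpus.target)
viz.show()
```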