Implementation:Scikit learn Scikit learn BenchPlotNMF
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Benchmarking |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
A concrete tool for benchmarking the Non-Negative Matrix Factorization (NMF) solvers provided by scikit-learn.
Description
This benchmark script compares multiple NMF solver implementations, including coordinate descent, multiplicative update, and a projected gradient variant (_PGNMF). For each NMF configuration it records the reconstruction error, measured as a beta divergence, and the runtime. The projected gradient solver is a custom implementation kept inside the script for historical comparison, and the evaluation data is TF-IDF vectorized text.
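The reconstruction error mentioned above can be illustrated with a minimal sketch. It is not taken from the benchmark script: the synthetic matrix stands in for the TF-IDF data, and the component count and initialization are arbitrary choices made for illustration. It computes the Frobenius case of the beta divergence by hand and prints it next to the reconstruction_err_ attribute that a fitted NMF model exposes.

```python
import numpy as np
from sklearn.decomposition import NMF

# Assumption: a small random non-negative matrix stands in for the TF-IDF data.
rng = np.random.RandomState(0)
X = rng.rand(60, 40)

# Illustrative settings only; the benchmark script uses its own configurations.
model = NMF(n_components=5, solver="cd", init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)
H = model.components_

# Frobenius beta divergence (beta = 2): 0.5 * ||X - W H||_F^2
residual_norm = np.linalg.norm(X - W @ H)
frobenius_loss = 0.5 * residual_norm ** 2

# With the default Frobenius loss, reconstruction_err_ is the norm ||X - W H||_F.
print(f"0.5 * ||X - WH||^2   = {frobenius_loss:.4f}")
print(f"||X - WH||_F         = {residual_norm:.4f}")
print(f"reconstruction_err_  = {model.reconstruction_err_:.4f}")
```

The attribute and the hand-computed norm should agree up to floating-point rounding, which is a quick sanity check before comparing solvers on the same loss.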
Usage
Use this benchmark to compare the convergence speed and accuracy of different NMF solver strategies available in scikit-learn, especially when choosing between coordinate descent and multiplicative update solvers.
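As a rough sketch of the comparison described above, the two public solvers can be timed on the same data through the NMF estimator. This is a simplified stand-in for the benchmark, not its actual code: the synthetic matrix, the number of components, and the iteration limit are assumptions chosen to keep the example fast.

```python
import time
import warnings

import numpy as np
from sklearn.decomposition import NMF
from sklearn.exceptions import ConvergenceWarning

# Assumption: random non-negative data instead of the TF-IDF matrix.
rng = np.random.RandomState(0)
X = rng.rand(200, 100)

results = {}
for solver in ("cd", "mu"):  # coordinate descent and multiplicative update
    model = NMF(n_components=10, solver=solver, init="nndsvda",
                max_iter=200, tol=1e-4, random_state=0)
    with warnings.catch_warnings():
        # A solver may stop at max_iter before reaching tol; that is fine here.
        warnings.simplefilter("ignore", ConvergenceWarning)
        start = time.perf_counter()
        model.fit(X)
        elapsed = time.perf_counter() - start
    results[solver] = (elapsed, model.reconstruction_err_, model.n_iter_)

for solver, (elapsed, error, n_iter) in results.items():
    print(f"{solver}: {elapsed:.3f}s, {n_iter} iterations, error {error:.4f}")
```

A single timing like this is noisy, so repeated runs and several iteration limits give a fairer picture of which solver suits a given dataset.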
Code Reference
Source Location
- Repository: scikit-learn
- File: benchmarks/bench_plot_nmf.py
Signature
# Key internal functions
def _nls_subproblem(X, W, H, tol, max_iter, alpha=0.0, l1_ratio=0.0, sigma=0.01, beta=0.1)
def _norm(x)
# Main NMF class used
from sklearn.decomposition import NMF
Import
from sklearn.decomposition import NMF
from sklearn.decomposition._nmf import _beta_divergence, _check_init, _initialize_nmf
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like | Yes | Input non-negative data matrix (typically TF-IDF features) |
| W | array-like | Yes | Basis matrix (n_samples, n_components) |
| H | array-like | Yes | Coefficient matrix (n_components, n_features) |
| tol | float | Yes | Tolerance for stopping condition |
| max_iter | int | Yes | Maximum number of iterations |
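The inputs above are the arguments of the non-negative least squares subproblem: with X and W held fixed, find a non-negative H that makes W H close to X. The sketch below shows the general projected gradient idea under simplifying assumptions. It uses a fixed step size instead of the line search controlled by sigma and beta, it ignores the regularization arguments alpha and l1_ratio, and the name projected_gradient_nnls is hypothetical, so it should not be read as the script's _nls_subproblem.

```python
import numpy as np


def projected_gradient_nnls(X, W, H, tol=1e-6, max_iter=500):
    """Approximately minimise 0.5 * ||X - W @ H||_F^2 over H >= 0, W fixed.

    Simplified sketch: fixed step size 1 / L, no regularization.
    """
    WtX = W.T @ X
    WtW = W.T @ W
    # L is the largest eigenvalue of W^T W (Lipschitz constant of the gradient).
    step = 1.0 / np.linalg.norm(WtW, 2)
    H = np.maximum(H, 0.0)  # start from a feasible copy
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        grad = WtW @ H - WtX
        # Projected gradient: ignore entries that push an active zero further down.
        proj_grad = grad * np.logical_or(grad < 0, H > 0)
        if np.linalg.norm(proj_grad) < tol:
            break
        # Gradient step followed by projection onto the non-negative orthant.
        H = np.maximum(H - step * grad, 0.0)
    return H, n_iter


def half_squared_error(X, W, H):
    return 0.5 * np.linalg.norm(X - W @ H) ** 2


# Hypothetical test data: X is built from a known non-negative H_true.
rng = np.random.RandomState(0)
W = rng.rand(30, 5)
H_true = rng.rand(5, 20)
X = W @ H_true
H0 = rng.rand(5, 20)

H_est, n_iter = projected_gradient_nnls(X, W, H0)
loss_before = half_squared_error(X, W, H0)
loss_after = half_squared_error(X, W, H_est)
print(f"iterations: {n_iter}, loss before: {loss_before:.4f}, loss after: {loss_after:.6f}")
```

An alternating scheme would call such a routine once to update H and once, on the transposed problem, to update W.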
Outputs
| Name | Type | Description |
|---|---|---|
| Console output | text | Beta divergence and timing for each solver |
| Plot | matplotlib figure | Convergence plots comparing NMF solvers |
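One simple way to obtain the kind of console lines and convergence plot listed above is to refit the model with increasing iteration budgets and record the time and error of each fit. The following is a sketch under assumptions, not the script's plotting code: the helper names convergence_curve and plot_curves are hypothetical, the data is synthetic, and the budgets are arbitrary.

```python
import time
import warnings

import numpy as np
from sklearn.decomposition import NMF
from sklearn.exceptions import ConvergenceWarning


def convergence_curve(X, solver, iteration_budgets, n_components=5):
    """Return (seconds, reconstruction error) pairs, one per iteration budget."""
    points = []
    for max_iter in iteration_budgets:
        model = NMF(n_components=n_components, solver=solver, init="nndsvda",
                    max_iter=max_iter, tol=1e-10, random_state=0)
        with warnings.catch_warnings():
            # Stopping at max_iter is intended here, so silence the warning.
            warnings.simplefilter("ignore", ConvergenceWarning)
            start = time.perf_counter()
            model.fit(X)
            elapsed = time.perf_counter() - start
        points.append((elapsed, model.reconstruction_err_))
    return points


def plot_curves(curves):
    """Plot error against time for each solver; matplotlib is imported lazily."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for solver, points in curves.items():
        seconds, errors = zip(*points)
        ax.plot(seconds, errors, marker="o", label=solver)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Reconstruction error")
    ax.legend()
    return fig


rng = np.random.RandomState(0)
X = rng.rand(100, 50)  # assumption: synthetic stand-in for the benchmark data
budgets = [1, 5, 20, 50]
curves = {solver: convergence_curve(X, solver, budgets) for solver in ("cd", "mu")}

for solver, points in curves.items():
    for max_iter, (seconds, error) in zip(budgets, points):
        print(f"{solver} max_iter={max_iter:3d} time={seconds:.4f}s error={error:.4f}")
```

Calling plot_curves(curves) returns a matplotlib figure with one line per solver, which can then be shown or saved.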
Usage Examples
Basic Usage
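from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
# Load the 20 newsgroups training set (downloaded on first use)
newsgroups = fetch_20newsgroups(subset='train')
# Build a non-negative TF-IDF matrix limited to the 1000 most frequent terms
vectorizer = TfidfVectorizer(max_features=1000)
X = vectorizer.fit_transform(newsgroups.data)
# Factorize X into W and H with the coordinate descent solver
nmf = NMF(n_components=10, solver='cd', max_iter=200, random_state=42)
W = nmf.fit_transform(X)  # shape (n_samples, 10)
H = nmf.components_  # shape (10, n_features)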