Implementation:Scikit learn Scikit learn NMF
| Knowledge Sources | |
|---|---|
| Domains | Dimensionality Reduction, Matrix Factorization |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for Non-Negative Matrix Factorization provided by scikit-learn.
Description
NMF (Non-Negative Matrix Factorization) finds two non-negative matrices W and H whose product approximates the non-negative input matrix X. The factorization is useful for dimensionality reduction, source separation, and topic extraction. The objective function combines a data fit term using beta-divergence (Frobenius norm by default) with L1 and L2 regularization on both the W and H matrices. The module also provides MiniBatchNMF for scalable online learning.
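The factorization described above can be checked numerically. The following is a minimal sketch, assuming a small made-up matrix and the default Frobenius loss; the variable names and the choice of `init="nndsvda"` are illustrative only.

```python
import numpy as np
from sklearn.decomposition import NMF

# Small non-negative matrix (illustrative data, not from the source).
X = np.array([[1.0, 1.0, 2.0],
              [2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 2.0]])

model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
W = model.fit_transform(X)   # shape (n_samples, n_components)
H = model.components_        # shape (n_components, n_features)

# Both factors are non-negative.
print((W >= 0).all(), (H >= 0).all())

# With the default Frobenius loss, the stored error should match ||X - WH||_F.
manual_err = np.linalg.norm(X - W @ H)
print(manual_err, model.reconstruction_err_)
```

With only two components the product `W @ H` is a low-rank approximation, so a non-zero error is expected here.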
Usage
Use NMF when your data is non-negative and you want interpretable, parts-based representations. Common applications include topic modeling on term-frequency matrices, image feature extraction, audio source separation, and recommender systems where the non-negativity constraint leads to naturally additive and interpretable components.
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/decomposition/_nmf.py
Signature
class NMF(_BaseNMF):
    def __init__(
        self,
        n_components="auto",
        *,
        init=None,
        solver="cd",
        beta_loss="frobenius",
        tol=1e-4,
        max_iter=200,
        random_state=None,
        alpha_W=0.0,
        alpha_H="same",
        l1_ratio=0.0,
        verbose=0,
        shuffle=False,
    ):
Import
from sklearn.decomposition import NMF
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n_components | int or 'auto' | No | Number of components. If 'auto', inferred from W or H shapes (default='auto'). |
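| init | str or None | No | Initialization method: 'random', 'nndsvd', 'nndsvda', 'nndsvdar', or 'custom'. If None, uses 'nndsvda' when n_components <= min(n_samples, n_features), otherwise 'random' (default=None). |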
| solver | str | No | Numerical solver: 'cd' (coordinate descent) or 'mu' (multiplicative update) (default='cd'). |
| beta_loss | str or float | No | Beta divergence loss: 'frobenius', 'kullback-leibler', 'itakura-saito', or float (default='frobenius'). Values other than 'frobenius' are used only with solver='mu'. |
| tol | float | No | Stopping tolerance (default=1e-4). |
| max_iter | int | No | Maximum number of iterations (default=200). |
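| random_state | int, RandomState or None | No | Seed for the 'random' and 'nndsvdar' initializations and for the coordinate descent solver. Pass an int for reproducible results (default=None). |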
| alpha_W | float | No | Regularization constant for W (default=0.0). |
| alpha_H | float or str | No | Regularization constant for H. 'same' uses alpha_W value (default='same'). |
| l1_ratio | float | No | Ratio of L1 vs L2 regularization, in [0, 1] (default=0.0). |
| verbose | int | No | Verbosity level (default=0). |
| shuffle | bool | No | If True, randomizes the order of coordinates in the coordinate descent solver (default=False). |
Outputs
| Name | Type | Description |
|---|---|---|
| components_ | ndarray of shape (n_components, n_features) | Factorization matrix H (the dictionary/components). |
| n_components_ | int | The number of components, validated against input. |
| reconstruction_err_ | float | Frobenius norm of the difference, or beta-divergence (depending on beta_loss), between the training data X and the reconstruction WH. |
| n_iter_ | int | Actual number of iterations. |
| n_features_in_ | int | Number of features seen during fit. |
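The fitted attributes above can be inspected after training, and the learned components can then be applied to data not seen during fit. The sketch below assumes synthetic random data and hypothetical variable names; it projects new rows onto the fitted H with `transform` and maps them back with `inverse_transform`.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X_train = rng.random((30, 5))  # synthetic training data (assumption)
X_new = rng.random((4, 5))     # unseen rows with the same number of features

model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
model.fit(X_train)

# Attributes set during fit.
print(model.n_components_, model.n_features_in_, model.n_iter_)

# Encode new data against the fixed components, then reconstruct it.
W_new = model.transform(X_new)              # shape (4, 2)
X_new_hat = model.inverse_transform(W_new)  # shape (4, 5)
print(np.linalg.norm(X_new - X_new_hat))
```

During `transform`, H stays fixed and only the coefficients for the new rows are solved for.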
Usage Examples
Basic Usage
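import numpy as np
from sklearn.decomposition import NMF

# Non-negative input: 4 samples, 3 features.
X = np.array([[1, 1, 2], [2, 1, 0], [0, 2, 1], [1, 0, 2]], dtype=float)

# Factorize X into W (4 x 2) and H (2 x 3); the seed makes the run reproducible.
nmf = NMF(n_components=2, init='random', random_state=0)
W = nmf.fit_transform(X)
H = nmf.components_

print(W.shape) # (4, 2)
print(H.shape) # (2, 3)
print(nmf.reconstruction_err_)  # Frobenius norm of X - W @ H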