Implementation: scikit-learn GaussianMixture
| Knowledge Sources | |
|---|---|
| Domains | Mixture Models, Density Estimation |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
GaussianMixture is the concrete tool provided by scikit-learn for estimating the parameters of a Gaussian mixture model probability distribution.
Description
GaussianMixture represents a Gaussian mixture model probability distribution and estimates its parameters using the Expectation-Maximization (EM) algorithm. It supports four covariance structures, selected with covariance_type: "full", "tied", "diag" (diagonal), and "spherical". The class extends BaseMixture and provides methods for fitting the model, predicting component membership (hard labels or posterior probabilities), computing per-sample log-likelihoods, scoring, and sampling from the fitted distribution. It can also be initialized with user-provided weights, means, and precision matrices.
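To make the four covariance structures concrete, the following minimal sketch fits the same data with each covariance_type and records the shape of the fitted covariances_ attribute. The synthetic data, the variable names, and the choice of two components are assumptions made for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Assumed toy data: two 2-D Gaussian blobs centred at 0 and at 6.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0, 1, size=(100, 2)),
    rng.normal(6, 1, size=(100, 2)),
])

shapes = {}
for cov_type in ("full", "tied", "diag", "spherical"):
    gm = GaussianMixture(
        n_components=2, covariance_type=cov_type, random_state=0
    ).fit(X)
    shapes[cov_type] = gm.covariances_.shape
    print(cov_type, gm.covariances_.shape)

# With n_components=2 and n_features=2 the shapes are:
#   full      -> (2, 2, 2)  one full matrix per component
#   tied      -> (2, 2)     one full matrix shared by all components
#   diag      -> (2, 2)     one diagonal (as a vector) per component
#   spherical -> (2,)       one variance per component
```

The more constrained structures estimate fewer parameters, which can help when the number of samples is small relative to the number of features.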
Usage
Use GaussianMixture when you want a soft (probabilistic) clustering method, when you need density estimation for generative modeling, or when the data is well-described by a mixture of Gaussian distributions. It is commonly used for speaker recognition, image segmentation, anomaly detection via likelihood thresholding, and as a clustering method that provides uncertainty estimates.
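As an illustration of anomaly detection via likelihood thresholding, the sketch below fits a mixture to "normal" data and flags new points whose log-density falls below a cutoff. The training data, the single component, and the 1st-percentile cutoff are all assumptions chosen for this example; a real application would tune them.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Assumed "normal" training data: a single 2-D standard Gaussian blob.
rng = np.random.RandomState(0)
X_train = rng.normal(0, 1, size=(500, 2))

gm = GaussianMixture(n_components=1, random_state=0).fit(X_train)

# score_samples returns the log-likelihood of each sample under the model.
train_log_density = gm.score_samples(X_train)

# Assumption: treat the lowest 1% of training log-densities as the cutoff.
threshold = np.percentile(train_log_density, 1)

# One point near the training mass, one far away from it.
X_new = np.array([[0.0, 0.0], [8.0, 8.0]])
is_anomaly = gm.score_samples(X_new) < threshold
print(is_anomaly)  # [False  True]
```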
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/mixture/_gaussian_mixture.py
Signature
class GaussianMixture(BaseMixture):
    def __init__(
        self,
        n_components=1,
        *,
        covariance_type="full",
        tol=1e-3,
        reg_covar=1e-6,
        max_iter=100,
        n_init=1,
        init_params="kmeans",
        weights_init=None,
        means_init=None,
        precisions_init=None,
        random_state=None,
        warm_start=False,
        verbose=0,
        verbose_interval=10,
    ):
Import
from sklearn.mixture import GaussianMixture
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n_components | int | No | Number of mixture components. Default is 1. |
| covariance_type | str | No | Covariance parameter type: "full", "tied", "diag", or "spherical". Default is "full". |
| tol | float | No | Convergence threshold for EM iterations. Default is 1e-3. |
| reg_covar | float | No | Regularization added to diagonal of covariance matrices. Default is 1e-6. |
| max_iter | int | No | Maximum number of EM iterations. Default is 100. |
| n_init | int | No | Number of initializations; best result is kept. Default is 1. |
| init_params | str | No | Initialization method: "kmeans", "random", "random_from_data", or "k-means++". Default is "kmeans". |
| weights_init | array-like of shape (n_components,) or None | No | User-provided initial weights. Default is None. |
| means_init | array-like of shape (n_components, n_features) or None | No | User-provided initial means. Default is None. |
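| precisions_init | array-like or None | No | User-provided initial precisions (inverses of the covariance matrices). The expected shape depends on covariance_type: (n_components,) for "spherical", (n_features, n_features) for "tied", (n_components, n_features) for "diag", and (n_components, n_features, n_features) for "full". Default is None. |
| random_state | int, RandomState instance, or None | No | Seed or random number generator used for initialization and sampling; pass an int for reproducible results. Default is None. |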
| warm_start | bool | No | Whether to use solution of last fit as initialization. Default is False. |
| verbose | int | No | Verbosity level. Default is 0. |
| verbose_interval | int | No | Interval between verbose log messages. Default is 10. |
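
The user-provided initialization parameters can be sketched as follows. The data, the initial values, and the choice of identity precision matrices are assumptions for illustration; precisions_init uses the (n_components, n_features, n_features) shape because covariance_type is left at its default, "full".

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Assumed toy data: two 2-D blobs centred near (0, 0) and (10, 10).
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0, 1, size=(200, 2)),
    rng.normal(10, 1, size=(200, 2)),
])

gm = GaussianMixture(
    n_components=2,
    weights_init=[0.5, 0.5],                      # shape (n_components,)
    means_init=[[0.0, 0.0], [10.0, 10.0]],        # shape (n_components, n_features)
    precisions_init=np.stack([np.eye(2), np.eye(2)]),  # one precision matrix per component
    random_state=0,
).fit(X)

print(gm.converged_)
print(gm.means_)
```

Supplying means_init also fixes which component index corresponds to which region of the data, which is convenient when downstream code relies on a stable component order.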
Outputs
| Name | Type | Description |
|---|---|---|
| weights_ | ndarray of shape (n_components,) | Weight of each mixture component. |
| means_ | ndarray of shape (n_components, n_features) | Mean of each mixture component. |
| covariances_ | ndarray | Covariance of each mixture component (shape depends on covariance_type). |
| precisions_ | ndarray | Precision matrices for each component. |
| precisions_cholesky_ | ndarray | Cholesky decomposition of the precision matrices. |
| converged_ | bool | Whether EM converged. |
| n_iter_ | int | Number of EM iterations performed for the best initialization. |
| lower_bound_ | float | Lower bound value on the log-likelihood of the best fit. |
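| bic(X) | float | Bayesian Information Criterion of the fitted model on the data X, returned by the bic(X) method (not a fitted attribute); lower is better. |
| aic(X) | float | Akaike Information Criterion of the fitted model on the data X, returned by the aic(X) method (not a fitted attribute); lower is better. |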
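
One common use of bic(X) is choosing the number of components. The sketch below fits models over a range of n_components and keeps the one with the lowest BIC. The synthetic three-blob data and the candidate range 1 to 5 are assumptions for this example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Assumed toy data: three well-separated 2-D blobs.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(-5, 1, size=(150, 2)),
    rng.normal(0, 1, size=(150, 2)),
    rng.normal(5, 1, size=(150, 2)),
])

bics = {}
for k in range(1, 6):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics[k] = gm.bic(X)  # bic is a method and needs the data it is evaluated on

# The candidate with the lowest BIC is preferred.
best_k = min(bics, key=bics.get)
print(best_k, bics)
```

The same loop works with aic(X); BIC penalizes additional parameters more heavily, so it tends to favor smaller models.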
Usage Examples
Basic Usage
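import numpy as np
from sklearn.mixture import GaussianMixture

# Two well-separated groups of points: x near 1 and x near 10.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Fit a two-component mixture; random_state makes the run reproducible.
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gm.predict(X))        # hard component label for each sample
print(gm.means_)            # fitted component means, shape (2, 2)
print(gm.predict_proba(X))  # posterior probability of each component per sample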
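
Beyond predict and predict_proba, a fitted model can also score data and generate new samples. The following minimal sketch reuses the same toy data; the variable names and the number of generated samples are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
gm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Per-sample log-likelihood under the fitted mixture.
log_density = gm.score_samples(X)

# score returns the average of the per-sample log-likelihoods.
avg_log_likelihood = gm.score(X)

# sample returns the generated points and the component each one came from.
X_new, component_labels = gm.sample(n_samples=5)
print(X_new.shape, component_labels.shape)  # (5, 2) (5,)
```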