Implementation:Scikit learn Scikit learn KMeans

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Clustering, Centroid-Based Clustering
Last Updated	2026-02-08 15:00 GMT

Overview

Concrete tool provided by scikit-learn for performing K-Means clustering.

Description

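KMeans is one of the most widely used clustering algorithms. It partitions n samples into k clusters by alternating two steps. First, each sample is assigned to its nearest cluster center. Then each center is recomputed as the mean of the samples assigned to it. The loop stops when the centers stop moving (within tol) or max_iter is reached. The implementation supports both Lloyd's algorithm and Elkan's variant, which uses the triangle inequality to skip unnecessary distance computations. By default it uses k-means++ initialization, which spreads the initial centers apart to speed up convergence. KMeans inherits from _BaseKMeans, which also provides the TransformerMixin interface. As a result, transform maps data to cluster-distance space, that is, the distance from each sample to every center.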
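The two alternating steps can be sketched in plain NumPy. This is a minimal illustration under simplifying assumptions, not scikit-learn's implementation (its Lloyd loop is written in Cython). The helper name lloyd_kmeans is hypothetical, and the sketch does not handle empty clusters or sample weights. It also applies tol directly to the squared center shift, whereas scikit-learn first scales tol by the data's mean per-feature variance.

```python
import numpy as np


def lloyd_kmeans(X, init_centers, max_iter=300, tol=1e-4):
    """Hypothetical minimal sketch of Lloyd's k-means iteration.

    Assumes every cluster keeps at least one sample (no empty-cluster
    handling) and ignores sample weights and scikit-learn's tol scaling.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for n_iter in range(1, max_iter + 1):
        # Assignment step: squared Euclidean distance from each sample to each center.
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # Update step: each center becomes the mean of the samples assigned to it.
        new_centers = np.array(
            [X[labels == k].mean(axis=0) for k in range(len(centers))]
        )
        # Convergence check on the squared Frobenius norm of the center shift.
        shift = ((new_centers - centers) ** 2).sum()
        centers = new_centers
        if shift <= tol:
            break
    # Final assignment so that labels and inertia match the returned centers.
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = sq_dist.argmin(axis=1)
    inertia = sq_dist[np.arange(len(X)), labels].sum()
    return centers, labels, inertia, n_iter


X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)
# Start from the first and fourth samples as initial centers.
centers, labels, inertia, n_iter = lloyd_kmeans(X, X[[0, 3]])
print(centers, labels, inertia, n_iter)
```

On this toy data the initial centers already equal the group means, so the loop stops after one iteration.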

Usage

Use KMeans when you need a fast, general-purpose clustering algorithm and know the number of clusters in advance. It works best when clusters are roughly spherical and of similar size. It is commonly used as a baseline clustering method and scales well to large datasets. For very large datasets, consider MiniBatchKMeans instead. It updates the centers from small random batches of samples, which reduces computation at the cost of slightly worse clusters.

Code Reference

Source Location

Repository: scikit-learn
File: sklearn/cluster/_kmeans.py

Signature

class KMeans(_BaseKMeans):
    def __init__(
        self,
        n_clusters=8,
        *,
        init="k-means++",
        n_init="auto",
        max_iter=300,
        tol=1e-4,
        verbose=0,
        random_state=None,
        copy_x=True,
        algorithm="lloyd",
    ):

Import

from sklearn.cluster import KMeans

I/O Contract

Inputs

Name	Type	Required	Description
n_clusters	int	No	Number of clusters to form and centroids to generate. Default is 8.
init	str, callable, or array-like	No	Initialization method: "k-means++", "random", array of shape (n_clusters, n_features), or callable. Default is "k-means++".
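n_init	"auto" or int	No	Number of times k-means is run with different centroid seeds; the run with the lowest inertia is kept. With "auto", the number of runs is 1 if init is "k-means++" or an array, and 10 if init is "random" or a callable. Default is "auto".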
max_iter	int	No	Maximum iterations per single run. Default is 300.
tol	float	No	Relative tolerance on the Frobenius norm of the difference between the cluster centers of two consecutive iterations, used to declare convergence. Default is 1e-4.
verbose	int	No	Verbosity mode. Default is 0.
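random_state	int, RandomState instance, or None	No	Random state for centroid initialization; pass an int for reproducible results. Default is None.
copy_x	bool	No	The data is centered before distances are pre-computed, for numerical accuracy. If True, the input data is not modified; if False, it is centered in place and restored before the method returns. Default is True.
algorithm	str	No	K-Means algorithm to use: "lloyd" (classical EM-style) or "elkan", which uses the triangle inequality and can be faster on data with well-defined clusters but needs more memory. Default is "lloyd".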

Outputs

Name	Type	Description
cluster_centers_	ndarray of shape (n_clusters, n_features)	Coordinates of cluster centers.
labels_	ndarray of shape (n_samples,)	Label of each sample (index of closest center).
inertia_	float	Sum of squared distances of samples to their closest cluster center.
n_iter_	int	Number of iterations run (for the best of the n_init runs).
n_features_in_	int	Number of features seen during fit.

Usage Examples

Basic Usage

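from sklearn.cluster import KMeans
import numpy as np

# Six 2-D points forming two well-separated groups (x = 1 and x = 10).
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Fit two clusters; a fixed random_state makes the k-means++ initialization reproducible.
kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto").fit(X)
print(kmeans.labels_)                     # cluster index of each training sample
print(kmeans.cluster_centers_)            # one row per cluster center
print(kmeans.predict([[0, 0], [12, 3]]))  # assign new points to the nearest center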

Related Pages

Principle:Scikit_learn_Scikit_learn_Clustering
