Implementation:Scikit learn Scikit learn KMeans

From Leeroopedia


Knowledge Sources
Domains: Clustering, Centroid-Based Clustering
Last Updated: 2026-02-08 15:00 GMT

Overview

KMeans is the concrete tool that scikit-learn provides for performing K-Means clustering.

Description

KMeans is one of the most widely used clustering algorithms. It partitions n samples into k clusters by alternating two steps until the centers stop moving: assign each sample to its nearest cluster center, then recompute each center as the mean of the samples assigned to it. The implementation supports both Lloyd's and Elkan's algorithms and defaults to k-means++ initialization, which spreads the starting centers apart so that runs typically converge faster. The class inherits from _BaseKMeans and implements the TransformerMixin interface, so a fitted model can also transform data into cluster-distance space (the distance from each sample to each center).
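The assign-then-recompute loop described above can be sketched in plain NumPy. This is an illustrative sketch only, not scikit-learn's implementation: the helper name `lloyd_step`, the starting centers, and the choice to leave an empty cluster's center unchanged are all assumptions made for the example.

```python
import numpy as np

def lloyd_step(X, centers):
    """One assignment-plus-update step of Lloyd's algorithm (illustrative only)."""
    # Squared Euclidean distance from every sample to every center: shape (n, k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Assignment step: each sample gets the index of its nearest center.
    labels = d2.argmin(axis=1)
    # Update step: each center becomes the mean of its assigned samples.
    new_centers = centers.copy()
    for j in range(centers.shape[0]):
        members = X[labels == j]
        if len(members) > 0:  # simplification: keep the old center if a cluster is empty
            new_centers[j] = members.mean(axis=0)
    return labels, new_centers

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)
centers = X[[0, 3]].copy()  # hypothetical starting centers: two of the samples

for _ in range(10):
    labels, new_centers = lloyd_step(X, centers)
    if np.allclose(new_centers, centers):  # stop once the centers no longer move
        break
    centers = new_centers

print(labels)
print(centers)
```

In practice the library class should be preferred; the sketch only makes the two alternating steps visible.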

Usage

Use KMeans when you need a fast, general-purpose clustering algorithm and the number of clusters is known in advance. Because each sample is assigned to its nearest center, it works best when clusters are roughly spherical and of similar size. It is commonly used as a baseline clustering method and scales well to large datasets. For very large datasets, consider MiniBatchKMeans instead.

Code Reference

Source Location

Signature

class KMeans(_BaseKMeans):
    def __init__(
        self,
        n_clusters=8,
        *,
        init="k-means++",
        n_init="auto",
        max_iter=300,
        tol=1e-4,
        verbose=0,
        random_state=None,
        copy_x=True,
        algorithm="lloyd",
    ):

Import

from sklearn.cluster import KMeans

I/O Contract

Inputs

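Name | Type | Required | Description
n_clusters | int | No | Number of clusters to form, and therefore the number of centroids to generate. Default is 8.
init | str, callable, or array-like | No | Initialization method: "k-means++", "random", an array of shape (n_clusters, n_features), or a callable. Default is "k-means++".
n_init | "auto" or int | No | Number of times k-means is run with different centroid seeds; the run with the lowest inertia is kept. Default is "auto", which runs once when init is "k-means++" or array-like and 10 times when init is "random" or a callable.
max_iter | int | No | Maximum number of iterations for a single run. Default is 300.
tol | float | No | Relative tolerance on the Frobenius norm of the change in cluster centers between two consecutive iterations, used to declare convergence. Default is 1e-4.
verbose | int | No | Verbosity mode. Default is 0.
random_state | int, RandomState instance, or None | No | Controls the random number generation used for centroid initialization; pass an int for reproducible results. Default is None.
copy_x | bool | No | Whether to copy the input data before it is centered; if False, the input may be modified in place. Default is True.
algorithm | str | No | K-Means algorithm to use: "lloyd" or "elkan". Default is "lloyd".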
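The parameters above can be combined as in the following sketch, which passes explicit starting centers through init and selects Elkan's algorithm. The starting coordinates in `init_centers` are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)

# Explicit starting centers, shape (n_clusters, n_features); values are illustrative.
init_centers = np.array([[0.0, 0.0], [12.0, 3.0]])

km = KMeans(
    n_clusters=2,
    init=init_centers,   # array-like init: centers start exactly here
    n_init=1,            # a fixed init makes repeated runs redundant
    max_iter=100,
    tol=1e-4,
    algorithm="elkan",
).fit(X)

print(km.labels_)
print(km.cluster_centers_)
print(km.n_iter_)
```

With a fixed init array the result does not depend on random_state, which can be convenient for reproducible tests.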

Outputs

Name | Type | Description
cluster_centers_ | ndarray of shape (n_clusters, n_features) | Coordinates of cluster centers.
labels_ | ndarray of shape (n_samples,) | Label of each sample (index of closest center).
inertia_ | float | Sum of squared distances of samples to their closest cluster center.
n_iter_ | int | Number of iterations run.
n_features_in_ | int | Number of features seen during fit.

Usage Examples

Basic Usage

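from sklearn.cluster import KMeans
import numpy as np

# Six 2-D samples forming two groups, around x=1 and x=10.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Fit two clusters; random_state fixes the initialization for reproducibility.
kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto").fit(X)
print(kmeans.labels_)            # cluster index of each training sample
print(kmeans.cluster_centers_)   # coordinates of the two centers
# Assign new points to the nearest learned center.
print(kmeans.predict([[0, 0], [12, 3]]))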
