Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Scikit learn Scikit learn BisectingKMeans

From Leeroopedia


Knowledge Sources
Domains Clustering, Divisive Clustering
Last Updated 2026-02-08 15:00 GMT

Overview

Concrete tool for performing bisecting K-Means clustering provided by scikit-learn.

Description

BisectingKMeans is a variant of KMeans that uses a divisive (top-down) hierarchical approach. It starts with all data points in a single cluster and iteratively bisects the cluster with the largest score (based on either inertia or number of data points) using standard 2-means until the desired number of clusters is reached. Internally it builds a binary tree structure (_BisectingTree) to track the hierarchical cluster decomposition. It inherits from _BaseKMeans and supports both Lloyd and Elkan algorithms.

Usage

Use BisectingKMeans when you want a hierarchical variant of K-Means that produces more regular clusters with a consistent structure. It is well-suited when you need a divisive clustering approach, or when standard K-Means initialization strategies yield poor results due to data geometry.

Code Reference

Source Location

Signature

class BisectingKMeans(_BaseKMeans):
    def __init__(
        self,
        n_clusters=8,
        *,
        init="random",
        n_init=1,
        random_state=None,
        max_iter=300,
        verbose=0,
        tol=1e-4,
        copy_x=True,
        algorithm="lloyd",
        bisecting_strategy="biggest_inertia",
    ):

Import

from sklearn.cluster import BisectingKMeans

I/O Contract

Inputs

Name Type Required Description
n_clusters int No Number of clusters to form. Default is 8.
init str or callable No Initialization method: "k-means++", "random", or a callable. Default is "random".
n_init int No Number of inner k-means runs per bisection. Default is 1.
random_state int or RandomState No Random state for reproducibility. Default is None.
max_iter int No Maximum iterations per inner k-means run. Default is 300.
verbose int No Verbosity level. Default is 0.
tol float No Relative tolerance for convergence. Default is 1e-4.
copy_x bool No Whether to copy input data. Default is True.
algorithm str No K-Means algorithm to use: "lloyd" or "elkan". Default is "lloyd".
bisecting_strategy str No Strategy for choosing cluster to bisect: "biggest_inertia" or "largest_cluster". Default is "biggest_inertia".

Outputs

Name Type Description
cluster_centers_ ndarray of shape (n_clusters, n_features) Coordinates of cluster centers.
labels_ ndarray of shape (n_samples,) Cluster labels for each sample.
inertia_ float Sum of squared distances of samples to their closest cluster center.
n_iter_ int Number of bisection iterations performed.

Usage Examples

Basic Usage

from sklearn.cluster import BisectingKMeans
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

bisect_km = BisectingKMeans(n_clusters=2, random_state=0).fit(X)
print(bisect_km.labels_)
print(bisect_km.cluster_centers_)

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment