Implementation:Scikit learn Scikit learn AgglomerativeClustering

From Leeroopedia
Revision as of 16:33, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Scikit_learn_Scikit_learn_AgglomerativeClustering.md)


Knowledge Sources
Domains Clustering, Hierarchical Clustering
Last Updated 2026-02-08 15:00 GMT

Overview

Concrete tool for performing hierarchical agglomerative clustering provided by scikit-learn.

Description

AgglomerativeClustering starts with every sample in its own cluster and recursively merges the closest pair of clusters, working bottom-up. It supports the ward, complete, average, and single linkage criteria. Ward linkage requires the Euclidean metric; the other criteria also accept metrics such as "manhattan", "cosine", or "precomputed". Merging can be constrained with a connectivity matrix so that only neighboring samples are joined, and it stops either at a fixed number of clusters (n_clusters) or once the linkage distance reaches a threshold (distance_threshold).
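The two stopping criteria can be compared on toy data. The following is a minimal sketch, not taken from the library's documentation: the points, the threshold of 5.0, and the choice of average linkage are assumptions picked so that both criteria end at the same partition.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy data (assumed for illustration): two well-separated groups of three points.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)

# Stop by count: request exactly two clusters.
by_count = AgglomerativeClustering(n_clusters=2, linkage="average").fit(X)

# Stop by distance: n_clusters must be None when distance_threshold is set.
# Merges whose linkage distance is at or above 5.0 are not performed.
by_distance = AgglomerativeClustering(
    n_clusters=None, distance_threshold=5.0, linkage="average"
).fit(X)

print(by_count.n_clusters_, by_distance.n_clusters_)
```

Within each group the merges happen at small distances, while joining the two groups would need an average distance above 9, so a threshold of 5.0 stops with the same two groups that n_clusters=2 produces.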

Usage

Use AgglomerativeClustering when you need the full tree of merges (which can be drawn as a dendrogram), when clusters must respect spatial or neighborhood constraints supplied through a connectivity matrix, or when you want to compare several levels of clustering granularity. It is best suited to small and medium datasets where the hierarchical structure is meaningful: without a connectivity constraint, every possible merge is considered at each step, which becomes computationally expensive as the number of samples grows.
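A connectivity constraint is commonly built from a k-nearest-neighbors graph. The sketch below assumes random 2-D data and n_neighbors=5 purely for illustration; kneighbors_graph comes from sklearn.neighbors and returns a sparse matrix that can be passed as connectivity.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import kneighbors_graph

# Random 2-D points (assumed data, seeded for reproducibility).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))

# Sparse graph linking each sample to its 5 nearest neighbours.
# Only clusters that are adjacent in this graph may be merged.
connectivity = kneighbors_graph(X, n_neighbors=5, include_self=False)

model = AgglomerativeClustering(
    n_clusters=3, linkage="ward", connectivity=connectivity
).fit(X)

print(model.labels_.shape)
print(model.n_connected_components_)
```

Restricting merges to graph neighbors also reduces the number of candidate merges per step, which is one reason a sparse connectivity matrix is often suggested for larger datasets.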

Code Reference

Source Location

Signature

class AgglomerativeClustering(ClusterMixin, BaseEstimator):
    def __init__(
        self,
        n_clusters=2,
        *,
        metric="euclidean",
        memory=None,
        connectivity=None,
        compute_full_tree="auto",
        linkage="ward",
        distance_threshold=None,
        compute_distances=False,
    ):

Import

from sklearn.cluster import AgglomerativeClustering

I/O Contract

Inputs

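| Name | Type | Required | Description |
|---|---|---|---|
| n_clusters | int or None | No | The number of clusters to find. Must be None if distance_threshold is set. Default is 2. |
| metric | str or callable | No | Metric used to compute the linkage, for example "euclidean", "manhattan", "cosine", or "precomputed". Only "euclidean" is accepted when linkage is "ward". Default is "euclidean". |
| memory | str or joblib.Memory | No | Directory path or joblib.Memory object used to cache the computed tree. Default is None (no caching). |
| connectivity | array-like, sparse matrix, or callable | No | Connectivity matrix defining the neighboring samples of each sample, or a callable that builds one from the data. Default is None (no structure imposed). |
| compute_full_tree | "auto" or bool | No | Whether to build the full tree instead of stopping early at n_clusters. Must not be False if distance_threshold is set. Default is "auto". |
| linkage | str | No | Linkage criterion: "ward", "complete", "average", or "single". Default is "ward". |
| distance_threshold | float or None | No | Linkage distance at or above which clusters will not be merged. If set, n_clusters must be None. Default is None. |
| compute_distances | bool | No | Whether to compute distances between clusters even when distance_threshold is not used, for example to draw a dendrogram. Adds computation and memory overhead. Default is False. |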

Outputs

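| Name | Type | Description |
|---|---|---|
| n_clusters_ | int | The number of clusters found. Equals n_clusters when distance_threshold is None. |
| labels_ | ndarray of shape (n_samples,) | Cluster labels for each sample. |
| n_leaves_ | int | Number of leaves in the hierarchical tree. |
| n_connected_components_ | int | Estimated number of connected components in the graph. |
| children_ | ndarray of shape (n_samples-1, 2) | Children of each non-leaf node. Values less than n_samples are original samples (leaves); a value i >= n_samples refers to the node created by the merge in row i - n_samples. |
| distances_ | ndarray of shape (n_nodes-1,) | Distances between the clusters merged in the corresponding row of children_. Only available when distance_threshold is set or compute_distances=True. |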
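The children_ and distances_ attributes can be combined into a SciPy-style linkage matrix, which is the usual route to a dendrogram. The helper name to_linkage_matrix below is hypothetical (it is not part of scikit-learn), and the sketch assumes the full tree was built and compute_distances=True was set so that distances_ exists.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering


def to_linkage_matrix(model):
    """Build a SciPy-style linkage matrix from a fitted model (hypothetical helper)."""
    n_samples = len(model.labels_)
    counts = np.zeros(model.children_.shape[0])
    for i, merge in enumerate(model.children_):
        for child in merge:
            # Indices below n_samples are original samples (one leaf each);
            # larger indices point at an earlier merge, whose size is reused.
            counts[i] += 1 if child < n_samples else counts[child - n_samples]
    # Columns: left child, right child, merge distance, samples under the node.
    return np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)


X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

model = AgglomerativeClustering(n_clusters=2, compute_distances=True).fit(X)
Z = to_linkage_matrix(model)

# no_plot=True returns the dendrogram layout without requiring matplotlib.
info = dendrogram(Z, no_plot=True)
print(Z.shape)
print(sorted(info["leaves"]))
```

Dropping no_plot=True draws the tree with matplotlib, if it is installed.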

Usage Examples

Basic Usage

from sklearn.cluster import AgglomerativeClustering
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

clustering = AgglomerativeClustering(n_clusters=2, linkage="ward").fit(X)
print(clustering.labels_)
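Exploring Granularity

To compare several values of n_clusters on the same data, the memory parameter can cache the tree between fits. This is a sketch under assumptions: the temporary directory is an arbitrary scratch location, and compute_full_tree=True is set so that the cached tree does not depend on n_clusters.

```python
import shutil
import tempfile

import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

cache_dir = tempfile.mkdtemp()  # scratch directory for the cache (assumed location)
cluster_counts = {}
try:
    for k in (2, 3, 4):
        model = AgglomerativeClustering(
            n_clusters=k,
            linkage="ward",
            memory=cache_dir,        # cache the tree on disk
            compute_full_tree=True,  # build the whole tree, then cut it at k
        ).fit(X)
        cluster_counts[k] = len(np.unique(model.labels_))
finally:
    # Remove the cache once the comparison is done.
    shutil.rmtree(cache_dir, ignore_errors=True)

print(cluster_counts)
```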
