Implementation:Scikit learn Scikit learn AgglomerativeClustering

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Clustering, Hierarchical Clustering
Last Updated	2026-02-08 15:00 GMT

Overview

Concrete tool for performing hierarchical agglomerative clustering provided by scikit-learn.

Description

AgglomerativeClustering builds a hierarchy bottom-up: every sample starts as its own cluster, and at each step the pair of clusters that minimally increases the chosen linkage criterion is merged. It supports ward, complete, average, and single linkage. The distance metric is configurable, except that ward linkage accepts only "euclidean". A connectivity matrix can restrict merges to neighboring samples, and the stopping criterion is either a fixed number of clusters (n_clusters) or a linkage distance threshold (distance_threshold), but not both.
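
As a minimal sketch of the distance-threshold stopping criterion, the following uses made-up points and an arbitrary threshold of 5.0 (both are illustrative assumptions, not values from the scikit-learn documentation). With n_clusters=None, merges whose linkage distance is at or above the threshold are not performed, and the resulting number of clusters can be read from n_clusters_.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Illustrative data: two tight pairs and one isolated point, spaced 10 apart in x.
X = np.array([[0.0, 0.0], [0.0, 1.0],
              [10.0, 0.0], [10.0, 1.0],
              [20.0, 0.0]])

# n_clusters must be None when distance_threshold is set.
model = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=5.0,  # arbitrary value chosen for this sketch
    linkage="single",
).fit(X)

print(model.n_clusters_)  # number of clusters left after thresholding
print(model.labels_)
```

Here the two pairs merge at distance 1, while every remaining merge would need a distance of about 10, so the threshold leaves three clusters.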

Usage

Use AgglomerativeClustering when you need hierarchical clustering that records a dendrogram-like tree of merges, when clusters must respect spatial or neighborhood structure (via a connectivity matrix), or when you want to explore several levels of clustering granularity. It is best suited to small and medium datasets where the hierarchical structure is meaningful, because without a connectivity constraint the cost grows at least quadratically with the number of samples.
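
One possible way to supply a connectivity constraint is a k-nearest-neighbors graph built with sklearn.neighbors.kneighbors_graph. The sketch below assumes random 2-D data and n_neighbors=5; both choices are illustrative only, and a different graph (for example a pixel grid for images) may suit real data better.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import kneighbors_graph

# Illustrative data: 40 random 2-D points (seeded for repeatability).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))

# Sparse graph linking each sample to its 5 nearest neighbors.
# Only samples connected in this graph may be merged directly.
connectivity = kneighbors_graph(X, n_neighbors=5, include_self=False)

model = AgglomerativeClustering(
    n_clusters=2,
    connectivity=connectivity,
    linkage="ward",
).fit(X)

print(model.labels_)
print(model.n_connected_components_)
```

If the graph turns out to have more than one connected component, scikit-learn emits a warning and completes the graph so that a single tree can still be built.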

Code Reference

Source Location

Repository: scikit-learn
File: sklearn/cluster/_agglomerative.py

Signature

class AgglomerativeClustering(ClusterMixin, BaseEstimator):
    def __init__(
        self,
        n_clusters=2,
        *,
        metric="euclidean",
        memory=None,
        connectivity=None,
        compute_full_tree="auto",
        linkage="ward",
        distance_threshold=None,
        compute_distances=False,
    ):

Import

from sklearn.cluster import AgglomerativeClustering

I/O Contract

Inputs

Name	Type	Required	Description
n_clusters	int or None	No	The number of clusters to find. Must be None if distance_threshold is set. Default is 2.
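metric	str or callable	No	Metric used to compute the linkage, for example "euclidean", "manhattan", "cosine", or "precomputed" (in which case fit expects a distance matrix). With linkage="ward", only "euclidean" is accepted. Default is "euclidean".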
memory	str or joblib.Memory	No	Used to cache computation of the tree. Default is None.
connectivity	array-like or callable	No	Connectivity matrix defining neighboring samples. Default is None.
compute_full_tree	"auto" or bool	No	Whether to build the full tree rather than stopping early at n_clusters. Must be True if distance_threshold is set. Default is "auto".
linkage	str	No	Linkage criterion: "ward", "complete", "average", or "single". Default is "ward".
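distance_threshold	float or None	No	Linkage distance threshold at or above which clusters will not be merged. If set, n_clusters must be None and compute_full_tree must be True. Default is None.
compute_distances	bool	No	Whether to compute distances between clusters even when distance_threshold is not used, for example to draw a dendrogram. Adds computational and memory overhead. Default is False.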
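
The memory and compute_full_tree parameters are mainly useful together when several values of n_clusters are tried on the same data. The following is a rough sketch under assumed conditions (random data, a temporary cache directory, and candidate values 2 to 4, all chosen for illustration): the full tree is built on the first fit and later fits can reuse the cached result.

```python
import tempfile

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Illustrative data: 50 random 2-D points (seeded for repeatability).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

labels_by_k = {}
with tempfile.TemporaryDirectory() as cache_dir:
    for k in (2, 3, 4):
        model = AgglomerativeClustering(
            n_clusters=k,
            memory=cache_dir,        # directory used to cache the tree
            compute_full_tree=True,  # same tree for every k, so the cache can be reused
        )
        labels_by_k[k] = model.fit_predict(X)

for k, labels in labels_by_k.items():
    print(k, np.bincount(labels))  # cluster sizes for each k
```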

Outputs

Name	Type	Description
n_clusters_	int	The number of clusters found. Equals n_clusters when distance_threshold is None.
labels_	ndarray of shape (n_samples,)	Cluster labels for each sample.
n_leaves_	int	Number of leaves in the hierarchical tree.
n_connected_components_	int	Estimated number of connected components in the graph.
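children_	ndarray of shape (n_nodes-1, 2)	Children of each non-leaf node. Values less than n_samples refer to leaves (original samples); a value i greater than or equal to n_samples refers to the non-leaf node formed at merge step i - n_samples.
distances_	ndarray of shape (n_nodes-1,)	Distances between the nodes merged at the corresponding row of children_. Only available when distance_threshold is set or compute_distances=True.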
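
To illustrate how children_ and distances_ fit together, the sketch below assembles them into a SciPy-style linkage matrix, a pattern commonly used to draw dendrograms. The helper name build_linkage_matrix is hypothetical (it is not part of scikit-learn), and the data are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering


def build_linkage_matrix(model):
    """Hypothetical helper: combine children_, distances_, and subtree sizes."""
    n_samples = len(model.labels_)
    counts = np.zeros(model.children_.shape[0])
    for i, (left, right) in enumerate(model.children_):
        for child in (left, right):
            if child < n_samples:
                counts[i] += 1  # leaf: one original sample
            else:
                counts[i] += counts[child - n_samples]  # earlier merge
    return np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)


X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# distance_threshold=0 with n_clusters=None builds the full tree
# and populates distances_.
model = AgglomerativeClustering(n_clusters=None, distance_threshold=0).fit(X)

Z = build_linkage_matrix(model)
info = dendrogram(Z, no_plot=True)  # layout only; no plotting backend needed
print(Z.shape)
print(info["leaves"])  # leaf order along the dendrogram axis
```

Passing the same matrix to dendrogram without no_plot=True would draw the tree, assuming matplotlib is installed.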

Usage Examples

Basic Usage

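import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Six 2-D points forming two groups, one at x=1 and one at x=4
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# Fit with Ward linkage and cut the tree into two clusters
clustering = AgglomerativeClustering(n_clusters=2, linkage="ward").fit(X)
print(clustering.labels_)  # one integer cluster label per sample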

Related Pages

Principle:Scikit_learn_Scikit_learn_Clustering
