Implementation:Scikit learn Scikit learn AgglomerativeClustering
| Knowledge Sources | |
|---|---|
| Domains | Clustering, Hierarchical Clustering |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for performing hierarchical agglomerative clustering provided by scikit-learn.
Description
AgglomerativeClustering builds a hierarchy bottom-up: every sample starts as its own cluster, and the two closest clusters are merged repeatedly. It supports ward, complete, average, and single linkage, and accepts a range of distance metrics (ward linkage works only with the Euclidean metric). A connectivity matrix can restrict merges to neighboring samples, and merging stops at either a fixed number of clusters or a distance threshold.
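The two stopping criteria mentioned above can be compared on a small example. This is a minimal sketch: the data, the choice of average linkage, and the threshold value of 5.0 are illustrative assumptions, not recommendations.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Three well-separated groups of 2-D points (illustrative data).
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
    [20.0, 0.0], [20.0, 1.0], [21.0, 0.0],
])

# Stopping criterion 1: a fixed number of clusters.
fixed = AgglomerativeClustering(n_clusters=3, linkage="average").fit(X)

# Stopping criterion 2: a linkage distance threshold.
# n_clusters must be None when distance_threshold is set.
thresholded = AgglomerativeClustering(
    n_clusters=None, distance_threshold=5.0, linkage="average"
).fit(X)

print(fixed.n_clusters_, thresholded.n_clusters_)
```

With the threshold variant the number of clusters is an outcome of the fit (read it from `n_clusters_`) rather than an input.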
Usage
Use AgglomerativeClustering when you need hierarchical clustering that produces a dendrogram-like tree of merges, when spatial constraints on clusters are important (via a connectivity matrix), or when you want to explore different levels of clustering granularity. It works well for small to medium datasets where the hierarchical structure is meaningful.
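As a sketch of the connectivity use case, the example below treats six samples as an ordered sequence and only lets neighbors in that sequence merge. The data and the chain-shaped connectivity matrix are assumptions made for illustration; in practice the matrix often comes from a helper such as `sklearn.neighbors.kneighbors_graph`.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# One feature per sample; samples 0-1 and 4-5 have similar values,
# but they are separated in the sequence by samples 2-3.
X = np.array([[0.0], [0.1], [5.0], [5.1], [0.3], [0.4]])
n_samples = X.shape[0]

# Chain connectivity (assumed structure): sample i is a neighbor of i-1 and i+1 only.
connectivity = np.eye(n_samples, k=1) + np.eye(n_samples, k=-1)

unconstrained = AgglomerativeClustering(n_clusters=2, linkage="ward").fit(X)
constrained = AgglomerativeClustering(
    n_clusters=2, linkage="ward", connectivity=connectivity
).fit(X)

print("unconstrained:", unconstrained.labels_)
print("constrained:  ", constrained.labels_)
```

Without the constraint, samples are grouped purely by value. With it, every cluster is a contiguous run of the sequence, because clusters that are not connected cannot merge.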
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/cluster/_agglomerative.py
Signature
class AgglomerativeClustering(ClusterMixin, BaseEstimator):
    def __init__(
        self,
        n_clusters=2,
        *,
        metric="euclidean",
        memory=None,
        connectivity=None,
        compute_full_tree="auto",
        linkage="ward",
        distance_threshold=None,
        compute_distances=False,
    ):
Import
from sklearn.cluster import AgglomerativeClustering
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n_clusters | int or None | No | The number of clusters to find. Must be None if distance_threshold is set. Default is 2. |
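| metric | str or callable | No | Metric used to compute the linkage. With linkage="ward", only "euclidean" is accepted. With "precomputed", fit expects a distance matrix as input. Default is "euclidean". |
| memory | str or joblib.Memory | No | Used to cache the computation of the tree. A string is interpreted as the path to the caching directory. Default is None (no caching). |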
| connectivity | array-like or callable | No | Connectivity matrix defining neighboring samples. Default is None. |
| compute_full_tree | "auto" or bool | No | Whether to compute the full tree or stop early once n_clusters is reached. Must be True (or "auto") if distance_threshold is set. Default is "auto". |
| linkage | str | No | Linkage criterion: "ward", "complete", "average", or "single". Default is "ward". |
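| distance_threshold | float or None | No | Linkage distance threshold at or above which clusters will not be merged. If set, n_clusters must be None. Default is None. |
| compute_distances | bool | No | Whether to compute distances between clusters even when distance_threshold is not used. Useful for dendrogram plots, at the cost of extra computation and memory. Default is False. |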
Outputs
| Name | Type | Description |
|---|---|---|
| n_clusters_ | int | The number of clusters found. |
| labels_ | ndarray of shape (n_samples,) | Cluster labels for each sample. |
| n_leaves_ | int | Number of leaves in the hierarchical tree. |
| n_connected_components_ | int | Estimated number of connected components in the graph. |
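| children_ | ndarray of shape (n_samples-1, 2) | Children of each non-leaf node. Values below n_samples are leaves (original samples); a value i >= n_samples refers to the node formed by the merge in row i - n_samples. |
| distances_ | ndarray of shape (n_nodes-1,) | Distances between the merged clusters at each merge step. Only available when distance_threshold is set or compute_distances=True. |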
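To illustrate how `children_` and `distances_` describe the merge tree, the sketch below converts them into a SciPy-style linkage matrix with columns (child, child, distance, cluster size). The helper `linkage_matrix` is hypothetical and not part of scikit-learn, and the data is illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def linkage_matrix(model):
    """Build a SciPy-style linkage matrix from a fitted model (hypothetical helper)."""
    n_samples = len(model.labels_)
    counts = np.zeros(model.children_.shape[0])
    for i, (left, right) in enumerate(model.children_):
        for child in (left, right):
            # Indices below n_samples are leaves; larger ones refer to the
            # cluster created at merge step (child - n_samples).
            counts[i] += 1 if child < n_samples else counts[child - n_samples]
    return np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)


X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.5], [12.0, 0.0]])

# compute_distances=True makes distances_ available even though
# no distance_threshold is used.
model = AgglomerativeClustering(
    n_clusters=2, linkage="average", compute_distances=True
).fit(X)

Z = linkage_matrix(model)
print(model.children_.shape, model.distances_.shape)
print(Z)
```

A matrix in this layout is what `scipy.cluster.hierarchy.dendrogram` expects, so it can serve as a starting point for plotting the hierarchy.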
Usage Examples
Basic Usage
from sklearn.cluster import AgglomerativeClustering
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
clustering = AgglomerativeClustering(n_clusters=2, linkage="ward").fit(X)
print(clustering.labels_)