Implementation:Rapidsai Cuml KMeans DBSCAN HDBSCAN Fit

Knowledge Sources	cuML cuML Docs
Domains	Machine_Learning, Clustering, GPU_Computing
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete tool for fitting KMeans, DBSCAN, and HDBSCAN clustering models on the GPU with the cuML library, through each estimator's `fit()` method.

Description

These `fit()` methods execute the core clustering algorithms on GPU:

KMeans.fit runs Lloyd's iterative algorithm on the GPU via RAFT C++ kernels. It alternates between assigning each sample to its nearest center and recomputing each center from its assigned samples, stopping at convergence or after the maximum number of iterations.
DBSCAN.fit computes pairwise distances in batches (batch size is controlled by `max_mbytes_per_batch`), identifies core samples, and expands clusters via GPU-parallel neighborhood queries.
HDBSCAN.fit builds a KNN graph (brute force or NN Descent), constructs the mutual reachability graph, builds its minimum spanning tree, extracts the condensed tree, and selects clusters with the EOM (excess of mass) or leaf method.

Usage

Call `estimator.fit(X)` after constructing and configuring the clustering estimator. The input data X should be a 2D array-like of shape (n_samples, n_features).

Code Reference

KMeans.fit

Source Location

Repository: cuML
File: python/cuml/cuml/cluster/kmeans.pyx
Lines: 512-592

Signature

def fit(self, X, y=None, sample_weight=None, *, convert_dtype=True):
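
For intuition about the assignment/update loop named in the Description, the following is a minimal CPU sketch of Lloyd's iteration in plain Python. It is not cuML's GPU implementation: the function name `lloyd_kmeans`, the explicit `init_centers` argument, and the `max_iter` and `tol` values are assumptions made for illustration only.

```python
import math


def lloyd_kmeans(points, init_centers, max_iter=300, tol=1e-4):
    """Illustrative Lloyd's iteration on the CPU (hypothetical helper, not cuML code)."""
    centers = [list(c) for c in init_centers]
    n_clusters = len(centers)
    labels = [0] * len(points)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Assignment step: each point is labeled with the index of its nearest center.
        labels = [
            min(range(n_clusters), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of the points assigned to it.
        new_centers = []
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[k])  # an empty cluster keeps its old center
        # Convergence test: stop once no center moved farther than tol.
        shift = max(math.dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    # Inertia: sum of squared distances from each point to its assigned center.
    inertia = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, inertia, n_iter


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, labels, inertia, n_iter = lloyd_kmeans(points, init_centers=[points[0], points[3]])
print(labels, n_iter)
```

The four returned values play the same roles as the `cluster_centers_`, `labels_`, `inertia_`, and `n_iter_` attributes listed under Outputs, although the real estimator computes them on the GPU and may differ in initialization and stopping details.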

DBSCAN.fit

Source Location

Repository: cuML
File: python/cuml/cuml/cluster/dbscan.pyx
Lines: 301-487

Signature

def fit(self, X, y=None, sample_weight=None, *, out_dtype='int32', convert_dtype=True):
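
The batch, core-sample, and expansion steps named in the Description can be sketched on the CPU as follows. This is an illustration under stated assumptions, not cuML's kernel code: `dbscan_sketch` and its `batch_rows` parameter are hypothetical (cuML sizes batches through `max_mbytes_per_batch` instead), and the sketch assumes the scikit-learn convention that a point counts toward its own `min_samples` neighborhood.

```python
import math
from collections import deque


def dbscan_sketch(points, eps, min_samples, batch_rows=2):
    """Illustrative DBSCAN on the CPU (hypothetical helper, not cuML code)."""
    n = len(points)
    # Neighborhood queries, processed a few rows at a time to mimic batching.
    neighbors = []
    for start in range(0, n, batch_rows):
        for i in range(start, min(start + batch_rows, n)):
            neighbors.append(
                [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            )
    # A core sample has at least min_samples points (itself included) within eps.
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels = [-1] * n  # -1 marks noise until a cluster claims the point
    cluster_id = 0
    for seed in range(n):
        if labels[seed] != -1 or not is_core[seed]:
            continue
        # Breadth-first expansion: only core samples propagate the cluster further.
        labels[seed] = cluster_id
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in neighbors[i]:
                if labels[j] == -1:
                    labels[j] = cluster_id
                    if is_core[j]:
                        queue.append(j)
        cluster_id += 1
    core_sample_indices = [i for i in range(n) if is_core[i]]
    return labels, core_sample_indices


points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
          (5.0, 5.0), (5.0, 5.1), (5.1, 5.0),
          (9.0, 0.0)]
labels, core_sample_indices = dbscan_sketch(points, eps=0.2, min_samples=3)
print(labels, core_sample_indices)
```

Batching changes only how much of the neighborhood computation is held in memory at once; the resulting labels do not depend on the batch size in this sketch.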

HDBSCAN.fit

Source Location

Repository: cuML
File: python/cuml/cuml/cluster/hdbscan/hdbscan.pyx
Lines: 918-1068

Signature

def fit(self, X, y=None, *, convert_dtype=True):
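
Two of the stages named in the Description, the mutual reachability graph and its minimum spanning tree, can be sketched densely on the CPU. This is a sketch for intuition only, not cuML's implementation: the helper names are hypothetical, the core distance is assumed to be the distance to the `min_samples`-th nearest neighbor with the point itself counted, and Prim's algorithm stands in for whatever MST routine the library uses.

```python
import math


def core_distances(points, min_samples):
    """Distance from each point to its min_samples-th nearest neighbor (self counted)."""
    return [
        sorted(math.dist(p, q) for q in points)[min_samples - 1]
        for p in points
    ]


def mutual_reachability(points, min_samples):
    """Dense matrix with entry max(core(a), core(b), dist(a, b)) for a != b."""
    core = core_distances(points, min_samples)
    n = len(points)
    return [
        [
            0.0 if i == j
            else max(core[i], core[j], math.dist(points[i], points[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


def prim_mst(weights):
    """Minimum spanning tree of a dense symmetric weight matrix (Prim's algorithm)."""
    n = len(weights)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight connecting each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest node not yet in the tree and attach it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, weights[parent[u]][u]))
        # Relax edges from the newly added node.
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return edges


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
core = core_distances(points, min_samples=2)
mst_edges = prim_mst(mutual_reachability(points, min_samples=2))
print(core, mst_edges)
```

In this toy input the isolated point at (10, 10) has a large core distance, so every edge reaching it is heavy in the mutual reachability graph. That is the effect the transformation is meant to have: sparse points are pushed away from dense regions before the hierarchy is built.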

I/O Contract

Inputs

Name	Type	Required	Description
X	array-like	Yes	Feature matrix of shape (n_samples, n_features), float32 or float64.
y	array-like	No (ignored)	Not used. Present for API compatibility.
sample_weight	array-like	No (KMeans/DBSCAN only)	Per-sample weights for weighted clustering.
convert_dtype	bool	No (default True)	Auto-convert input to float32 if needed.
out_dtype	str	No (DBSCAN only, default 'int32')	Label dtype: 'int32' or 'int64'.

Outputs

Name	Type	Description
self	estimator	Returns fitted estimator with computed attributes.
KMeans attributes	—	`cluster_centers_`, `labels_`, `inertia_`, `n_iter_`
DBSCAN attributes	—	`labels_`, `core_sample_indices_`, `components_`
HDBSCAN attributes	—	`labels_`, `probabilities_`, `cluster_persistence_`, `n_clusters_`
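
For DBSCAN and HDBSCAN, `labels_` uses -1 for samples treated as noise. The short sketch below summarizes a label vector on the host; `summarize_labels` is a hypothetical helper, and it assumes the labels have already been copied from the GPU into a plain Python list.

```python
from collections import Counter


def summarize_labels(labels, noise_label=-1):
    """Count clusters, noise samples, and per-cluster sizes (hypothetical helper)."""
    counts = Counter(labels)
    n_noise = counts.pop(noise_label, 0)  # noise is not a cluster
    return {
        "n_clusters": len(counts),
        "n_noise": n_noise,
        "sizes": dict(sorted(counts.items())),
    }


summary = summarize_labels([0, 0, 1, -1, 1, -1, 2])
print(summary)
```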

Usage Examples

import cupy as cp
from cuml.cluster import KMeans, DBSCAN, HDBSCAN

X = cp.random.rand(10000, 50, dtype=cp.float32)

# KMeans fitting
kmeans = KMeans(n_clusters=10)
kmeans.fit(X)
print(kmeans.inertia_, kmeans.n_iter_)

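# DBSCAN fitting
# Note: uniform random data in 50 dimensions is sparse, so with eps=0.5
# most or all samples are likely to be labeled noise (-1).
dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan.fit(X)
print(dbscan.labels_)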

# HDBSCAN fitting
hdbscan = HDBSCAN(min_cluster_size=25)
hdbscan.fit(X)
print(hdbscan.n_clusters_, hdbscan.probabilities_)

Related Pages

Implements Principle

Principle:Rapidsai_Cuml_Cluster_Model_Fitting

Requires Environment

Uses Heuristic

Heuristic:Rapidsai_Cuml_Batch_Size_Memory_Tradeoff
