Implementation:Rapidsai Cuml Cluster Predict

Knowledge Sources	cuML, cuML Docs
Domains	Machine_Learning, Clustering
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete tool for assigning cluster labels with cuML's KMeans, DBSCAN, and HDBSCAN estimators: prediction on new data where the algorithm supports it, and combined fitting and labeling otherwise.

Description

Label assignment methods for the three clustering algorithms:

KMeans.predict assigns new data points to the nearest cluster center by computing distances on GPU, returning integer labels.
DBSCAN.fit_predict performs fitting and label assignment in a single call (DBSCAN does not support standalone predict for new data).
HDBSCAN.fit_predict performs fitting and label assignment in a single call.
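approximate_predict is a module-level function (imported from cuml.cluster.hdbscan) that assigns approximate labels and membership probabilities to new data points, using the condensed tree of an HDBSCAN model fitted with `prediction_data=True`.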
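The nearest-center rule behind KMeans.predict can be illustrated with a minimal plain-Python sketch. This is not cuML's GPU implementation; the helper name `assign_to_nearest_center` and the sample points are hypothetical and exist only for illustration.

```python
import math


def assign_to_nearest_center(points, centers):
    """Return, for each point, the index of its closest center (Euclidean distance)."""
    labels = []
    for p in points:
        # Distance from this point to every cluster center.
        distances = [math.dist(p, c) for c in centers]
        # Index of the smallest distance; ties resolve to the lowest index.
        labels.append(distances.index(min(distances)))
    return labels


# Two hypothetical 2-D cluster centers and three new points.
centers = [(0.0, 0.0), (10.0, 10.0)]
new_points = [(1.0, 0.5), (9.0, 11.0), (4.0, 4.0)]

labels = assign_to_nearest_center(new_points, centers)
print(labels)  # [0, 1, 0]
```

Every point receives a label under this rule, however far it lies from all centers. That is one difference from DBSCAN and HDBSCAN, which can mark points as noise.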

Usage

For KMeans, call `predict(X_new)` on a fitted model. For DBSCAN and HDBSCAN, use `fit_predict(X)`, which fits the model and returns labels for `X` in one call. To label new data with HDBSCAN, fit the model with `prediction_data=True` and then call `approximate_predict(fitted_model, X_new)`.

Code Reference

KMeans.predict

Source Location

Repository: cuML
File: python/cuml/cuml/cluster/kmeans.pyx
Lines: 673-685

Signature

def predict(self, X, *, convert_dtype=True):

approximate_predict (module-level function in cuml.cluster.hdbscan)

Source Location

Repository: cuML
File: python/cuml/cuml/cluster/hdbscan/hdbscan.pyx
Lines: 1282-1375

Signature

def approximate_predict(clusterer, points_to_predict, convert_dtype=True):

I/O Contract

Inputs

Name	Type	Required	Description
X	array-like	Yes	Data matrix of shape (n_samples, n_features) for prediction.
convert_dtype	bool	No (default True)	If True, convert the input when necessary to the dtype the model was trained with (for example float32).
clusterer	HDBSCAN	Yes (approximate_predict)	Fitted HDBSCAN model with `prediction_data=True`.
points_to_predict	array-like	Yes (approximate_predict)	New points for approximate prediction.

Outputs

Name	Type	Description
labels	CumlArray	KMeans predict: cluster labels (n_samples,) int32.
labels	CumlArray	DBSCAN fit_predict: labels with -1 for noise points.
labels, probabilities	tuple	HDBSCAN approximate_predict: (labels, probabilities) tuple.
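A common follow-up to the outputs above is to count noise points and to discard low-confidence assignments. The sketch below works on plain Python lists, so it assumes the returned arrays have already been copied to host memory. The helper names and the 0.5 threshold are hypothetical choices, not part of cuML.

```python
from collections import Counter

NOISE = -1  # label used for noise points


def summarize_labels(labels):
    """Return (cluster_sizes, n_noise) for a sequence of integer labels."""
    counts = Counter(int(label) for label in labels)
    n_noise = counts.pop(NOISE, 0)  # remove noise from the per-cluster counts
    return dict(counts), n_noise


def keep_confident(labels, probabilities, threshold=0.5):
    """Relabel points whose membership probability is below threshold as noise."""
    return [
        int(label) if prob >= threshold else NOISE
        for label, prob in zip(labels, probabilities)
    ]


# Stand-in for DBSCAN fit_predict output: two clusters plus two noise points.
dbscan_like = [0, 0, 1, -1, 1, 1, -1]
sizes, n_noise = summarize_labels(dbscan_like)

# Stand-in for an approximate_predict (labels, probabilities) pair.
pred_labels = [0, 1, 1, 0]
pred_probs = [0.9, 0.2, 0.75, 0.5]
filtered = keep_confident(pred_labels, pred_probs)
```

Here `sizes` maps each cluster id to its point count, `n_noise` is the number of `-1` labels, and `filtered` keeps only assignments whose probability is at least the threshold.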

Usage Examples

import cupy as cp
from cuml.cluster import KMeans, DBSCAN, HDBSCAN
from cuml.cluster.hdbscan import approximate_predict

X_train = cp.random.rand(5000, 20, dtype=cp.float32)
X_test = cp.random.rand(1000, 20, dtype=cp.float32)

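# KMeans: fit on training data, then label unseen points by nearest cluster center
kmeans = KMeans(n_clusters=5).fit(X_train)
kmeans_labels = kmeans.predict(X_test)

# DBSCAN: fit and label in one call (no separate predict); noise points get -1
dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan_labels = dbscan.fit_predict(X_train)

# HDBSCAN: prediction_data=True is required before calling approximate_predict
hdbscan_model = HDBSCAN(min_cluster_size=25, prediction_data=True).fit(X_train)
hdbscan_labels, hdbscan_probs = approximate_predict(hdbscan_model, X_test)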

Related Pages

Implements Principle

Principle:Rapidsai_Cuml_Cluster_Label_Assignment

Requires Environment
