Implementation:Rapidsai Cuml Cluster Predict
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Clustering |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for assigning cluster labels to data using fitted KMeans, DBSCAN, and HDBSCAN models.
Description
Label assignment methods for the three clustering algorithms:
- KMeans.predict assigns new data points to the nearest cluster center by computing distances on the GPU, and returns integer labels.
- DBSCAN.fit_predict performs fitting and label assignment in a single call (DBSCAN does not support standalone predict for new data).
- HDBSCAN.fit_predict performs fitting and label assignment in a single call.
- approximate_predict is a module-level function (not a method) that approximately assigns labels to new data points using the condensed tree of a fitted HDBSCAN model. The model must have been fitted with `prediction_data=True`.
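The nearest-center rule described for KMeans.predict can be illustrated on the CPU. The following is a minimal NumPy sketch of the idea only, not cuML's GPU implementation; the helper name `nearest_centroid_labels` and the toy data are assumptions made for illustration.

```python
import numpy as np

def nearest_centroid_labels(X, centers):
    """Return, for each row of X, the index of the closest center (Euclidean).

    Hypothetical helper: illustrates the assignment rule, not cuML's code.
    """
    # Pairwise differences, shape (n_samples, n_clusters, n_features)
    diffs = X[:, None, :] - centers[None, :, :]
    # Squared Euclidean distances, shape (n_samples, n_clusters)
    sq_dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    # The label is the column index of the smallest distance in each row
    return sq_dists.argmin(axis=1).astype(np.int32)

# Two toy cluster centers and three new points
centers = np.array([[0.0, 0.0], [10.0, 10.0]], dtype=np.float32)
X_new = np.array([[1.0, 0.5], [9.0, 11.0], [4.0, 4.0]], dtype=np.float32)

labels = nearest_centroid_labels(X_new, centers)
print(labels)
```

Under this rule every new point receives a label, which is why KMeans can label unseen data directly, whereas the density-based models above need either a refit or an approximate procedure.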
Usage
For KMeans, call `predict(X_new)` on a fitted model. For DBSCAN and HDBSCAN, use `fit_predict(X)`, which fits the model and returns labels for `X` in a single call. To label new data with HDBSCAN, fit the model with `prediction_data=True` and then call `approximate_predict(fitted_model, X_new)`.
Code Reference
KMeans.predict
Source Location
- Repository: cuML
- File: python/cuml/cuml/cluster/kmeans.pyx
- Lines: 673-685
Signature
def predict(self, X, *, convert_dtype=True):
HDBSCAN.approximate_predict
Source Location
- Repository: cuML
- File: python/cuml/cuml/cluster/hdbscan/hdbscan.pyx
- Lines: 1282-1375
Signature
def approximate_predict(clusterer, points_to_predict, convert_dtype=True):
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like | Yes | Data matrix of shape (n_samples, n_features) for prediction. |
| convert_dtype | bool | No (default True) | When necessary, convert the input to the dtype the model was trained with (typically float32). |
| clusterer | HDBSCAN | Yes (approximate_predict) | Fitted HDBSCAN model with `prediction_data=True`. |
| points_to_predict | array-like | Yes (approximate_predict) | New points for approximate prediction. |
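The shape and dtype expectations in the table can be sketched as a small pre-check. This is a hypothetical NumPy helper that mimics the idea behind `convert_dtype`; it is not cuML's input-validation code, and raising `TypeError` on a mismatch is an assumption chosen for the illustration.

```python
import numpy as np

def match_training_dtype(X_new, train_dtype=np.float32, convert_dtype=True):
    """Check that X_new is 2-D and bring it to the training dtype.

    Hypothetical helper for illustration; cuML performs its own checks.
    """
    X_new = np.asarray(X_new)
    if X_new.ndim != 2:
        raise ValueError("expected shape (n_samples, n_features)")
    if X_new.dtype == np.dtype(train_dtype):
        return X_new  # already matches, no copy needed
    if not convert_dtype:
        # Assumed behavior for this sketch: refuse silent conversion
        raise TypeError(f"expected {np.dtype(train_dtype)}, got {X_new.dtype}")
    return X_new.astype(train_dtype)

# float64 input is converted to float32 when convert_dtype is left enabled
X64 = np.random.default_rng(0).random((4, 3))
X32 = match_training_dtype(X64)
print(X32.dtype, X32.shape)
```

Converting creates a copy of the data, so supplying input already in the training dtype avoids the extra memory.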
Outputs
| Name | Type | Description |
|---|---|---|
| labels | CumlArray | KMeans predict: cluster labels (n_samples,) int32. |
| labels | CumlArray | DBSCAN fit_predict: labels with -1 for noise points. |
| labels, probabilities | tuple | HDBSCAN approximate_predict: (labels, probabilities) tuple. |
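Downstream code usually has to treat the noise label `-1` separately from real cluster ids. The sketch below summarizes a label array and, optionally, counts points whose membership probability clears a threshold. The helper name, the threshold of 0.5, and the toy arrays are assumptions for illustration. The arrays are plain NumPy, so labels held on the GPU would first need to be copied to the host.

```python
import numpy as np

def summarize_labels(labels, probabilities=None, min_probability=0.5):
    """Count clusters, noise points, and confidently assigned points.

    Hypothetical helper; -1 is treated as the noise label.
    """
    labels = np.asarray(labels)
    noise_mask = labels == -1
    # Distinct cluster ids, ignoring noise
    cluster_ids = np.unique(labels[~noise_mask])
    summary = {
        "n_clusters": int(cluster_ids.size),
        "n_noise": int(noise_mask.sum()),
    }
    if probabilities is not None:
        probabilities = np.asarray(probabilities)
        # Non-noise points whose probability meets the threshold
        confident = (~noise_mask) & (probabilities >= min_probability)
        summary["n_confident"] = int(confident.sum())
    return summary

# Toy output in the shape of (labels, probabilities)
labels = np.array([0, 0, 1, -1, 1, -1, 2], dtype=np.int32)
probs = np.array([0.9, 0.4, 1.0, 0.0, 0.7, 0.0, 0.55])

summary = summarize_labels(labels, probs)
print(summary)
```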
Usage Examples
import cupy as cp
from cuml.cluster import KMeans, DBSCAN, HDBSCAN
from cuml.cluster.hdbscan import approximate_predict

# Synthetic float32 data generated directly on the GPU
X_train = cp.random.rand(5000, 20, dtype=cp.float32)
X_test = cp.random.rand(1000, 20, dtype=cp.float32)

# KMeans: fit on training data, then label unseen points
kmeans = KMeans(n_clusters=5).fit(X_train)
kmeans_labels = kmeans.predict(X_test)

# DBSCAN: fit and label in one call (no separate predict); noise points get -1
dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan_labels = dbscan.fit_predict(X_train)

# HDBSCAN: fit with prediction_data=True so new points can be labeled later
hdbscan_model = HDBSCAN(min_cluster_size=25, prediction_data=True).fit(X_train)
hdbscan_labels, probs = approximate_predict(hdbscan_model, X_test)
Related Pages
Implements Principle
Requires Environment