Implementation:Rapidsai Cuml Cluster Predict

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, Clustering
Last Updated: 2026-02-08 00:00 GMT

Overview

Concrete tool for assigning cluster labels to data with cuML's KMeans, DBSCAN, and HDBSCAN models.

Description

Label assignment methods for the three clustering algorithms:

  • KMeans.predict assigns new data points to the nearest cluster center by computing distances on GPU, returning integer labels.
  • DBSCAN.fit_predict performs fitting and label assignment in a single call (DBSCAN does not support standalone predict for new data).
  • HDBSCAN.fit_predict performs fitting and label assignment in a single call.
  • approximate_predict is a standalone function, not a method, that provides approximate label assignment for new data points on a fitted HDBSCAN model using the condensed tree.
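
The nearest-center rule described for KMeans.predict can be illustrated with a small pure-Python sketch. It runs on the CPU and is not cuML's implementation; the function name `assign_to_nearest_center` and the sample data are hypothetical and chosen only for illustration.

```python
def assign_to_nearest_center(points, centers):
    """Return, for each point, the index of the closest center.

    Illustrative CPU sketch only; cuML performs this computation on the GPU.
    """
    labels = []
    for point in points:
        # Squared Euclidean distance is sufficient for comparing candidates.
        distances = [
            sum((p - c) ** 2 for p, c in zip(point, center))
            for center in centers
        ]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical cluster centers and new points (2 features each).
centers = [(0.0, 0.0), (10.0, 10.0)]
new_points = [(1.0, 0.5), (9.0, 11.0), (4.0, 4.0)]
labels = assign_to_nearest_center(new_points, centers)
```

Here `labels` holds one integer index per input point, mirroring the integer labels that predict returns.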

Usage

For KMeans, call `predict(X_new)` on a fitted model. For DBSCAN and HDBSCAN, use `fit_predict(X)`, which fits the model and returns labels for `X` in one step. To label new data with HDBSCAN, fit the model with `prediction_data=True` and then call `approximate_predict(fitted_model, X_new)`.

Code Reference

KMeans.predict

Source Location

  • Repository: cuML
  • File: python/cuml/cuml/cluster/kmeans.pyx
  • Lines: 673-685

Signature

def predict(self, X, *, convert_dtype=True):

approximate_predict (HDBSCAN)

Source Location

  • Repository: cuML
  • File: python/cuml/cuml/cluster/hdbscan/hdbscan.pyx
  • Lines: 1282-1375

Signature

def approximate_predict(clusterer, points_to_predict, convert_dtype=True):

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like | Yes | Data matrix of shape (n_samples, n_features) for prediction. |
| convert_dtype | bool | No (default True) | When True, convert the input to the dtype the model was trained with, if necessary. |
| clusterer | HDBSCAN | Yes (approximate_predict) | Fitted HDBSCAN model with `prediction_data=True`. |
| points_to_predict | array-like | Yes (approximate_predict) | New points for approximate prediction. |

Outputs

| Name | Type | Description |
|---|---|---|
| labels | CumlArray | KMeans predict: int32 cluster labels of shape (n_samples,). |
| labels | CumlArray | DBSCAN fit_predict: cluster labels, with -1 for noise points. |
| labels, probabilities | tuple | HDBSCAN approximate_predict: (labels, probabilities) tuple. |
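
Because noise points share the label -1, a label array usually needs a small amount of post-processing before cluster counts are meaningful. The sketch below shows one way to do that on host-side labels; `summarize_labels` and the example values are hypothetical, and labels held on the GPU would presumably need to be copied to host memory first.

```python
from collections import Counter


def summarize_labels(labels, noise_label=-1):
    """Count clusters and noise points in a sequence of integer labels."""
    counts = Counter(int(label) for label in labels)
    # Noise is not a cluster, so remove it before counting clusters.
    n_noise = counts.pop(noise_label, 0)
    return {
        "n_clusters": len(counts),
        "n_noise": n_noise,
        "sizes": dict(counts),
    }


# Hypothetical labels standing in for the output of DBSCAN.fit_predict.
example_labels = [0, 0, 1, -1, 1, 1, -1, 2]
summary = summarize_labels(example_labels)
```

With these example labels the summary reports three clusters and two noise points.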

Usage Examples

import cupy as cp
from cuml.cluster import KMeans, DBSCAN, HDBSCAN
from cuml.cluster.hdbscan import approximate_predict

X_train = cp.random.rand(5000, 20, dtype=cp.float32)
X_test = cp.random.rand(1000, 20, dtype=cp.float32)

# KMeans: fit on training data, then label unseen points
kmeans = KMeans(n_clusters=5).fit(X_train)
kmeans_labels = kmeans.predict(X_test)

# DBSCAN: fit_predict only (no separate predict); noise points get label -1
dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan_labels = dbscan.fit_predict(X_train)

# HDBSCAN: prediction_data=True is required for approximate_predict
hdbscan_model = HDBSCAN(min_cluster_size=25, prediction_data=True).fit(X_train)
hdbscan_labels, probs = approximate_predict(hdbscan_model, X_test)
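
The probabilities returned alongside the labels by approximate_predict can be used to discard weak assignments. The following is a minimal sketch of one such policy, not part of cuML: the helper `mask_low_confidence`, the 0.5 threshold, and the example values are all assumptions made for illustration, and it operates on plain host-side sequences.

```python
def mask_low_confidence(labels, probabilities, min_probability=0.5, noise_label=-1):
    """Relabel points whose membership probability is below a threshold as noise."""
    return [
        int(label) if prob >= min_probability else noise_label
        for label, prob in zip(labels, probabilities)
    ]


# Hypothetical values standing in for an approximate_predict result.
example_labels = [0, 1, 1, -1, 2]
example_probs = [0.9, 0.2, 0.75, 0.0, 0.5]
filtered = mask_low_confidence(example_labels, example_probs)
```

A suitable threshold depends on the data and on how costly a wrong assignment is, so the default here should be treated as a placeholder.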
